WO2010029472A1

WO2010029472A1 - Inserting advertisements in connection with user-created content

Info

Publication number: WO2010029472A1
Application number: PCT/IB2009/053843
Authority: WO
Inventors: Srinivas Rao Kudavelly; Mauro Barbieri
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2008-09-11
Filing date: 2009-09-03
Publication date: 2010-03-18
Also published as: TW201016004A

Abstract

An advantageous way of inserting an advertisement in connection with content created by a user is disclosed. The invention comprises detecting (102) one or more dramatic moments in the content when a viewer's attention is substantially high, and inserting (104) the advertisement when at least one of the detected dramatic moments occurs. The invention can assist advertisers to promote their advertisements employing the user-created content.

Description

Inserting advertisements in connection with user-created content

Field of the invention

The present subject matter relates to inserting advertisements in connection with user-created content.

Background of the invention

US 2008/0098423 discloses methods, systems and computer program products that facilitate selecting advertisements for insertion into advertisement slots in broadcast content. Broadcast criteria for the content and viewer criteria are identified. Broadcast criteria include information about the content and viewer criteria include information about viewers to whom the content is targeted. Information is retrieved from a plurality of tags attached to a respective plurality of stored advertisements that are available for insertion into the advertising slot. Each tag includes information about a respective advertisement. An advertisement having tag information that is compatible with identified broadcast criteria and viewer criteria is selected and inserted into the advertising slot. The solution disclosed generally works well with the broadcast content.

There is a continuous increase of user-created content on the web, and an increase in the number of social network sites (e.g. YouTube, Flickr) that host user-created content. This has caused advertisers to explore opportunities of using these new channels to promote their advertisements using the user-created content.

Summary of the invention

It is an object of the present subject matter to allow potential advertisers to make use of user-created content for promoting their advertisements. The invention is defined by the independent claims. The dependent claims define advantageous embodiments.

This object and several other objects are obtained in a first aspect of the present subject matter by providing a method of inserting an advertisement in connection with content created by a user. The method comprises:
- detecting one or more dramatic moments in the content when a viewer's attention is substantially high; and
- inserting the advertisement when at least one of the detected dramatic moments occurs.

The present subject matter discloses a method that can automatically create metadata for the user-created content. The metadata here refers to dramatic moments wherein the viewer's attention is substantially high. The dramatic moments can be used for inserting the advertisements, thereby maximizing the impact on the end user. Along with the metadata, a quality attribute describing what type of content precedes or follows the metadata can be used, e.g. the Dutch camera angle as reported by Joseph V. Mascelli, "The five Cs of Cinematography", Silman James Press, Los Angeles.

Professional video production uses certain common conventions often referred to as "the grammar of the film". Film grammar includes conventions for conveying meaning through particular camera and editing techniques. One of the relevant film grammar rules is the "Dutch" angle that is used to emphasize scenes of high emotional content and strong mental stress.

The sources of content can be e.g. i) hard disk ii) a user's personal handy cam videos iii) favorite tune content. The content preferably can be edited content. Raw footage generally is video that comes directly from a camera (e.g. DV handy cam) and is unedited. Edited video is video that results from the process of cutting and pasting (montage) and the addition of transitions, effects, extra sound tracks etc.

Professional video is generally highly and nicely edited. Personal content is generally not always nicely edited but one can observe a trend towards more and better editing. The reason could be the availability of fast computers, large HDD capacity and bandwidth and better (e.g. free and also online) video editing software.

Besides this trend, there is also a trend of personal content being made by remixing professional content generally known as mash-ups. The last type of content is popular among young generations. The disclosed method can be used preferably with edited content (i.e. including professionally made content and consumer made content).

The disclosed subject matter provides a mechanism to assist the user to derive revenue for his created content. The user who wants to share his/her content can derive revenue out of his/her content (instead of freely distributing the content) by negotiating with potential advertisers.

Considering the huge amount of content that is produced and uploaded every day on sites such as YouTube, advertisers generally cannot afford to have editors review the content to insert the advertisements, as it would cost more. Further, the user content has an ephemeral value (e.g. 1 day, 1 week or 1 month). Hence, an automated way of finding the interesting dramatic moments and inserting the advertisements makes business sense to the advertiser as well as to the user who owns the content.

From the advertiser's perspective, the advertisers get to know whether the content on which their advertisements piggy back is worthy enough. The advertisers can check whether the content has any relation to the advertisements which they are trying to promote. As an illustrative example, a cartoon clip which is popular may nevertheless not be suitable to promote mouthwash for adults.

In an embodiment, the dramatic moments are detected using heuristic rules derived from film grammar. The advantage is that advertisements can be inserted automatically into user-generated content at moments in which the viewer's attention is high.

This increases the probability that the viewers will actually watch the advertisements. An advertisement inserted at the beginning or end of a video item can be more easily skipped.

In a further embodiment, the heuristic rules are formed using a linear combination of audio and video features preferably selected from the following:
- visual saliency
- shot cut rate
- object motion
- presence of close-ups
- low depth-of-field shots
- zoom-in sequences
- audio energy
- start of music
- sudden change of music genre or tempo

The audio and video features can be automatically calculated from the audio visual content. In a still further embodiment, the method comprises computing a saliency score based on the linear combination of the audio and video features, the saliency score being associated to segments of video preferably obtained after shot cut scene detection.

The audio and visual features can all be calculated automatically from the audiovisual signals. They can all be normalized to the [0, 1] interval, where high values are correlated with high likelihood of having a dramatic moment in the video. To detect dramatic moments in a video it is therefore sufficient to calculate a linear combination of all the above mentioned features (saliency score) and select the video segments that correspond to peaks of the linear combination. For example, if we indicate with $f_i$ the i-th feature (value defined in the interval [0, 1]), the saliency score $S$ is given by:

$$S = \sum_i w_i f_i$$

where $w_i$ is the weight given to the i-th feature. The weights given to the features can be determined empirically by analyzing a sufficiently large set of videos, or depending on the type of videos considered.

In a still further embodiment, the insertion of the advertisement is based on the temporal distribution of the detected dramatic moments and the duration of each of the detected dramatic moments.

Selecting the advertisement slots with appropriate time intervals can enhance the effectiveness of the advertisements and can draw the attention of the viewer. The duration of advertisement at dramatic moments can depend on how close the dramatic moments are to each other. The combined moment of all inserted advertisements may depend on how many dramatic moments were found (i.e. more dramatic moments, more advertisements). If the dramatic moments are clustered in a particular region, then the advertisement insertion can be based on distributing the advertisements across the entire video content. After obtaining the saliency score, which can be a measure of importance or saliency for each video segment in a video item, a rule based system can be used that can insert advertisements depending upon the temporal distribution of the most important detected dramatic moments. As an example, the method can have a rule that can insert the advertisement when there is a peak of the "saliency score" or "right after a peak" when the viewer's attention is still high. Other rules can constrain the amount of advertisements within acceptable levels.

In a still further embodiment, a color bar code is generated for the detected dramatic moments and the variation in color band of the color bar code can be an indicator of the detected dramatic moment. This embodiment provides an easy way of identifying the dramatic moments wherein the advertisements can be inserted.

A color band visually indicating where the dramatic moments are can allow a fast and easy placement of advertisements because the user can immediately see the position and duration of the advertisements with respect to the content and the expected peaks in the viewer's attention. In a still further embodiment, the method comprises uploading the content along with the detected dramatic moments on a website; tracking information associated with content popularity; and communicating the tracked information to a potential advertiser thereby enabling the potential advertiser to decide whether his advertisement is to be inserted within the content.

The tracked information can comprise parameters such as the number of people viewing the content, the number of times the content is downloaded.

In a still further embodiment, the detected dramatic moments can be used by the potential advertiser to dynamically insert advertisements while streaming the content. This embodiment can circumvent the existing mechanism of showing stale advertisements.

In a second aspect of the present subject matter, a device for inserting an advertisement in connection with content created by a user is disclosed. The device comprises: a detection means configured to detect one or more dramatic moments in the content wherein viewer's attention is substantially high; and an insertion means configured to insert the advertisement when at least one of the detected dramatic moments occurs.

In a further aspect of the present subject matter, a computer program product for inserting an advertisement in connection with content created by a user is disclosed. The computer program product comprises a computer readable storage medium having computer readable program code embodied therein, the computer readable program code being configured to enable a programmable device to carry out the disclosed method of inserting the advertisement in connection with the content created by the user.

Brief description of the drawings:

These and other aspects, features and advantages will be further explained by the following description, by way of example only, with reference to the accompanying drawings, in which same reference numerals indicate same or similar parts, and in which:

Fig. 1 schematically illustrates an exemplary embodiment of the method according to the present invention;

Fig. 2 schematically illustrates an exemplary embodiment of generating a color bar code for the detected dramatic moments; and

Fig. 3 schematically illustrates an exemplary device according to the present invention.

Detailed description of the embodiments

Referring now to Fig. 1, the method 100 of inserting an advertisement in connection with content created by a user comprises a step 102 of detecting a plurality of dramatic moments in the content wherein a viewer's attention can be substantially high. The sources of content can be e.g. i) hard disk ii) a user's personal handy cam videos iii) favorite tune content. The content preferably can be edited content. Raw footage generally is video that comes directly from a camera (e.g. DV handy cam) and is unedited. Edited video is video that results from the process of cutting and pasting (montage) and the addition of transitions, effects, extra sound tracks etc. Professional video is generally highly and nicely edited. Personal content is generally not always nicely edited but one can observe a trend towards more and better editing. The reason could be the availability of fast computers, large HDD capacity, bandwidth and better (and free and also online) video editing software.

Besides this trend, there is also an important trend of personal content being made by re-mixing professional content generally known as mash-ups. The last type of content is popular among young generations. The disclosed method can be used preferably with edited content (i.e. including professionally made content and consumer made content).

The dramatic moments can be detected using heuristic rules derived from film grammar. The heuristic rules can be formed using a linear combination of audio and video features preferably selected from the following:
- visual saliency
- shot cut rate
- object motion
- presence of close-ups
- low depth-of-field shots
- zoom-in sequences
- audio energy
- start of music
- sudden change of music genre or tempo

Visual saliency: The idea is that viewers' attention is generally high during visually salient scenes. Visual saliency can be defined in terms of visual sharpness of a video signal. Algorithms for calculating sharpness and visual saliency found in the prior art can be used to compute the visual saliency.

Shot cut rate: Video items are generally the result of production procedures that involve what is called montage or video editing. Video is first captured in so-called camera takes. A camera take is a sequence of frames captured uninterruptedly by the same camera from the moment the camera starts capturing to the moment it stops. During montage, camera takes are trimmed, split, and inserted one after the other to compose an edited version of a video. The basic element of an edited video is called a shot. A shot is a contiguous sequence of frames belonging to the same camera take in an edited video. Content-wise, shots usually possess a high degree of uniformity, either visual or auditory. Efficient and effective algorithms for automatic detection of shot cuts are available in the prior art and can be used to detect shot boundaries.

The amount of action perceived in a video segment depends on the actual motion present as well as on the duration of a shot with respect to its neighboring shots. This aspect, usually referred to as editing rhythm or film pace, can be modeled by means of the shot cut rate. The shot cut rate gives an indication of the frequency at which shot cuts occur in a video sequence. For each shot, it is defined with respect to the duration of the neighboring shots by computing a running average of the shots' durations in a window of length 2r+1 shots (r can be set to 2 for a window length of 5 shots). The reason for using a running average is that only a sequence of short shots contributes to the perception of action, not isolated short segments between long ones. An increasing shot cut rate translates into a perceived increase in rhythm or pace and this usually occurs in dramatic moments.
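The running-average definition above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name, the choice to express the rate as the inverse of the averaged shot duration, and the truncation of the window at the start and end of the video are all assumptions made for the example.

```python
def shot_cut_rate(durations, r=2):
    """Return one cut-rate value (cuts per second) per shot.

    durations: shot lengths in seconds, in playback order.
    r: half-width of the window; the window spans up to 2*r + 1 shots.
    """
    rates = []
    for i in range(len(durations)):
        lo = max(0, i - r)                   # window truncated at the edges (assumption)
        hi = min(len(durations), i + r + 1)
        window = durations[lo:hi]
        mean_duration = sum(window) / len(window)
        rates.append(1.0 / mean_duration)    # inverse of the running average (assumption)
    return rates


# One isolated short shot between long ones versus a run of short shots.
isolated = shot_cut_rate([10.0, 10.0, 1.0, 10.0, 10.0])
burst = shot_cut_rate([10.0, 1.0, 1.0, 1.0, 1.0, 1.0, 10.0])
```

In this sketch the isolated one-second shot barely raises the rate, while the middle of the burst reaches one cut per second, which is the behavior the running average is meant to produce.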

Object motion: Generally, the amount of perceived action in a video is strongly correlated to the amount of object motion. Dramatic moments are often characterized by a sudden increase of object motion (e.g. a fight, a person falling, a car chase). Object motion can be quantified using motion estimation algorithms.

Presence of close-ups: Among the various film grammar rules used by content producers to convey meaning through video, the field of view plays an important role. It is determined by the size of a subject in relation to the overall frame, which depends on the distance of the camera from the subject, and the focal length of the lens used. Based on field of view, shots can be classified into different types usually labeled by how big and how near an object appears to the viewers: for example, extreme long, long, medium, medium close-up, full close-up, and extreme close-up.

Of these various shot types, close-up shots can be the most interesting for identifying dramatic moments. They show a fairly small part of a scene, such as a character's face, in great detail so that it fills the screen. They focus attention on a person's feelings or reactions, and are used to show people in a state of emotional excitement, grief or joy. Close-ups transport the viewers into the scene, eliminate all non-essentials and isolate whatever significant incident should receive narrative emphasis. More specifically they can be used to:

• Underline narrative highlights, such as important dialogues, player actions or reactions. Whenever dramatic emphasis or increased audience attention is required, the subject should be brought closer to the viewer.

• Isolate significant subject matter and eliminate all non-essential material from view. Audience attention thus can be concentrated on an important action, a particular object or a meaningful facial expression.

• Cue the audience on how they should react. A reaction close-up of an actor portraying fear, tension, awe, pity or any other action, will stimulate a similar feeling in the viewer.

Close-ups can be automatically detected by using algorithms for face detection and measuring the ratio between the size of a detected face and the size of the video frame. Other more robust methods available in prior art can also be used.
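The face-to-frame ratio test described above can be sketched as follows, assuming a face detector has already returned a bounding box. The function names and the 0.15 threshold are hypothetical and chosen only for illustration; the source does not specify a threshold or a particular detector.

```python
def face_area_ratio(face_box, frame_size):
    """Ratio of face area to frame area.

    face_box: (x, y, width, height) in pixels, as a detector might return.
    frame_size: (width, height) of the video frame in pixels.
    """
    _, _, face_w, face_h = face_box
    frame_w, frame_h = frame_size
    return (face_w * face_h) / (frame_w * frame_h)


def is_close_up(face_box, frame_size, min_ratio=0.15):
    # min_ratio is an illustrative threshold, not a value from the source.
    return face_area_ratio(face_box, frame_size) >= min_ratio


large_face = is_close_up((600, 200, 640, 540), (1920, 1080))
small_face = is_close_up((900, 400, 100, 100), (1920, 1080))
```

The ratio itself already lies in the [0, 1] interval, so it could be used directly as the close-up feature instead of a yes/no decision.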

Low-depth of field shots: Close-up shots are examples of low-depth of field shots. These types of shots are characterized by having a small part of the image in focus and the background out of focus. Similar to close-ups, low-depth of field shots can be used in situations where high relative importance is given to a subject. Algorithms available in the prior art for the detection of low-depth of field shots can be made use of.

Zoom-in sequences: According to film production conventions, reducing the distance between a subject or scene and the audience is a way of signaling the audience that something is important. This can be achieved by either moving the camera toward the subject (camera dollying) or by using optical zoom. Zoom-in sequences are used to give emphasis to a particular object, situation or character. Zoom-in sequences can be detected by applying camera motion or global motion estimation techniques.

Audio energy: The perception of action in video is influenced not only by visual clues such as object motion and shot-cut rate, but also by auditory clues such as audio loudness. High audio energy scenes correspond usually to dramatic or action scenes. The audio energy can be easily calculated from the audio signal and normalized with respect to the average audio energy to detect the peaks corresponding to the dramatic moments.
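As a rough sketch of the audio energy feature, the example below computes the mean squared amplitude per frame and divides it by the average energy of the whole signal. The frame length, the sample values and the factor 2.0 used to flag loud frames are assumptions made for the illustration only.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each consecutive, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def normalized_energies(energies):
    """Divide each frame energy by the average energy of the whole signal."""
    average = sum(energies) / len(energies)
    if average == 0:
        return [0.0 for _ in energies]
    return [e / average for e in energies]


# Three quiet frames and one loud frame (made-up sample values).
samples = [0.1] * 4 + [0.1] * 4 + [0.9] * 4 + [0.1] * 4
relative = normalized_energies(frame_energies(samples, frame_len=4))
loud_frames = [i for i, e in enumerate(relative) if e > 2.0]
```

Values above 1.0 mark frames louder than the average; a further rescaling step would be needed to bring this feature into the [0, 1] interval used for the saliency score.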

Start of music and sudden change of music genre or tempo: According to the cinematographic principle of contrast and affinity, particularly dramatic moments are highlighted by means of perceptible sudden variations in the content. A powerful clue that directors often use to indirectly tell the audience that something important is going to happen, is a sudden change in the audio track. In particular, the start of music is usually an indicator for an interesting point. Automatic audio classification can be used to detect speech and silence, and also to estimate the probability that an audio segment contains music. A sudden change of music genre or pace can also be a signal that something interesting might be happening in the film. This clue requires an audio classifier capable of fine distinctions between music genres.

The saliency score can be computed based on the linear combination of the audio and video features, the saliency score being associated to segments of video preferably obtained after shot cut scene detection.

The audio and visual features can all be calculated automatically from the audiovisual signals. They can all be normalized to the [0, 1] interval, where high values are correlated with high likelihood of having a dramatic moment in the video. To detect dramatic moments in a video it is therefore sufficient to calculate a linear combination of all the above mentioned features (saliency score) and select the video segments that correspond to peaks of the linear combination. For example, if we indicate with $f_i$ the i-th feature (value defined in the interval [0, 1]), the saliency score $S$ is given by:

$$S = \sum_i w_i f_i$$

where $w_i$ is the weight given to the i-th feature. The weights given to the features can be determined empirically by analyzing a sufficiently large set of videos, or depending on the type of videos considered.
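The weighted sum that defines the saliency score can be sketched per video segment as follows. The feature names and the equal weights are illustrative assumptions; the description states only that the weights are determined empirically.

```python
def saliency_score(features, weights):
    """Weighted sum of normalized features for one video segment."""
    for name, value in features.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"feature {name!r} is outside [0, 1]: {value}")
    return sum(weights[name] * value for name, value in features.items())


# Illustrative weights only; the source says weights are set empirically.
weights = {"shot_cut_rate": 0.25, "object_motion": 0.25,
           "close_up": 0.25, "audio_energy": 0.25}
calm = {"shot_cut_rate": 0.0, "object_motion": 0.0,
        "close_up": 0.0, "audio_energy": 0.5}
tense = {"shot_cut_rate": 1.0, "object_motion": 0.5,
         "close_up": 1.0, "audio_energy": 1.0}

calm_score = saliency_score(calm, weights)
tense_score = saliency_score(tense, weights)
```

When the weights sum to 1, as in this example, the score itself stays within [0, 1], which makes scores comparable across segments.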

After obtaining the saliency score, which can be a measure of importance or saliency for each video segment in a video item, a rule based system can be used that can insert advertisements depending upon the temporal distribution of the most important detected dramatic moments. As an example, dramatic moments can be detected every time the saliency score exceeds a certain threshold. The threshold can be empirically set to a constant value or can be made dependent on the average and standard deviation of the saliency score. As an example, the method can have a rule that can insert the advertisement when there is a peak of the "saliency score" or "right after a peak" when the viewer's attention is still high. Other rules can constrain the amount of advertisements within acceptable levels.
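A threshold that depends on the average and standard deviation of the saliency score could, for instance, take the form mean plus k standard deviations. The sketch below assumes that form; the factor k and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def detect_dramatic_segments(scores, k=1.0):
    """Indices of segments whose score exceeds mean + k * standard deviation."""
    threshold = mean(scores) + k * pstdev(scores)
    return [i for i, s in enumerate(scores) if s > threshold]


# Per-segment saliency scores (made-up values).
scores = [0.2, 0.2, 0.9, 0.2, 0.2, 0.8, 0.2, 0.2]
dramatic = detect_dramatic_segments(scores)
```

An adaptive threshold of this kind scales with the video: a uniformly calm clip and a uniformly hectic clip both yield only the segments that stand out from their own baseline.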

In step 104 the advertisement is inserted when at least one of the detected dramatic moments occurs. The advertisement can be inserted based on the temporal distribution of the detected dramatic moments. As an illustrative example, in the user-created content there could be 8 dramatic moments detected, namely d₁, d₂, d₃, d₄, d₅, d₆, d₇ and d₈. Among the 8 dramatic moments, the advertisement can be inserted only at the dramatic moments d₃, d₅ and d₇, for example based on the temporal distribution of the dramatic moments. As a further illustrative example, if at the end of the video there are more dramatic scene changes and as a consequence more metadata points are generated at the end, then the advertisements are inserted at that location.

The duration of each of the dramatic moments and the proximity between the dramatic moments can be considered in inserting the advertisement. Selecting the dramatic moments with appropriate time intervals can enhance the effectiveness of the advertisements and can draw the attention of the viewer.
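One possible rule for choosing among detected moments by their temporal distribution is a greedy selection with a minimum spacing and a cap on the number of advertisements. The sketch below is an assumption made for illustration, not a rule stated in the description; the times, scores, `min_gap` and `max_ads` values are made up.

```python
def select_ad_slots(moments, min_gap, max_ads):
    """Pick ad slots from (time_seconds, score) pairs.

    Strongest moments are considered first; a moment is skipped when it lies
    closer than min_gap seconds to a slot that was already chosen.
    """
    chosen = []
    for time, score in sorted(moments, key=lambda m: m[1], reverse=True):
        if len(chosen) == max_ads:
            break
        if all(abs(time - other) >= min_gap for other in chosen):
            chosen.append(time)
    return sorted(chosen)


# Eight detected moments, in the spirit of the eight-moment example above.
moments = [(10, 0.6), (25, 0.7), (40, 0.9), (55, 0.5),
           (130, 0.95), (140, 0.6), (250, 0.8), (260, 0.7)]
slots = select_ad_slots(moments, min_gap=60, max_ads=3)
```

With these made-up values the third, fifth and seventh moments are retained, so advertisements are spread over the video instead of being concentrated where the moments happen to be close together.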

The user can run the content through a content management algorithm to obtain the metadata points (e.g. dramatic moments). The content management algorithm can for example be the MIAMI algorithm (movie in a minute algorithm) which is disclosed in M. Barbieri, H. Weda, N. Dimitrova, "Browsing video recordings using Movie-in-a-minute", Proceedings of the IEEE International Conference on Consumer Electronics (ICCE 2006), Las Vegas, USA, January 2006. The MIAMI algorithm can summarize content of any duration in 1 minute. The metadata points that are obtained using the MIAMI algorithm can be the points in the content where dramatic scene changes have occurred (e.g. the MIAMI algorithm can summarize a movie of, say, 3 hours into 1 minute by providing the list of tags (metadata) in the form of PTS (start time) and PTS (end time)).

A color bar code can be generated for the detected dramatic moments and the variation in color band of the color bar code can be an indicator of the dramatic moment. This can be an easy way of identifying the dramatic moments wherein the advertisements can be inserted.
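The description does not say how the color bands are computed. As one possible reading, the sketch below maps each segment's saliency score to a color running from blue (low) to red (high), so that a change in band color marks a candidate moment. The mapping and the function names are assumptions for illustration only.

```python
def score_to_color(score):
    """Map a score in [0, 1] to a hex color from blue (low) to red (high)."""
    clamped = min(1.0, max(0.0, score))
    red = round(255 * clamped)
    blue = 255 - red
    return f"#{red:02x}00{blue:02x}"


def color_bar(scores):
    """One color band per video segment, in playback order."""
    return [score_to_color(s) for s in scores]


bar = color_bar([0.0, 0.2, 1.0, 0.2])
```

Each returned string could be drawn as a vertical stripe whose width is proportional to the segment duration, giving a timeline on which the advertisement positions can be overlaid.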

Referring now to Fig. 2, arrows indicate the most interesting part of the content, and one can see that in the soccer match the change in players, foul, tackle is observed by way of change in the color bar and can be ideal candidate locations for inserting the advertisements. The user can upload the content (e.g. 1 minute summarized content generated using MIAMI algorithm) along with the detected dramatic moments on a website. The information related to content popularity such as number of people visiting the content, the number of times the content is downloaded and other related information can be tracked. The tracked information can be communicated to potential advertisers. This can help the potential advertisers to decide whether to use the user created content for their advertisement.

This embodiment can provide a framework or infrastructure wherein a business organization can host a website where the user can advertise or upload his 1 minute content (e.g. generated using MIAMI algorithm). By this way the advertisers have the option to check the content popularity (e.g. number of click/count/eye balls that a particular 1 minute video trailer has caught) based on which the advertiser can contact the user for inserting his brand advertisements into the user's created content. The user can negotiate the price directly with the advertiser thereby assisting the user to derive revenue for his created content.

It is also possible for the advertiser to make use of the detected dramatic moments and dynamically insert advertisements while streaming the content. This can circumvent the mechanism of showing stale advertisements. Further, the advertisers can know whether the content on which their advertisements piggy back is worthy enough. The advertisers can check whether the content has any relation to the advertisements which they are trying to promote. As an illustrative example, a cartoon clip which is popular may nevertheless not be suitable to promote mouthwash for adults.

In essence, the disclosed embodiments of the invention use audio visual features for detecting salient points in a video, and employ them to automatically insert advertisements when viewer's attention is high.

A typical user scenario could be as illustrated below. The user copies the favorite content (e.g. recording) from the handy cam onto the NAS device (network attached storage device). Content management algorithms can be run on the NAS device. The NAS device can be connected directly to a computer network to provide centralized data access and storage to heterogeneous network clients. The user can run content management algorithms such as MIAMI and obtain the dramatic moments. The user can then upload the content with the detected dramatic moments to a website like YouTube. The content popularity is tracked on the website using content popularity tracking mechanisms (which can include the number of people visiting the website, the number of downloads made etc). Advertisers can check the content, the popularity of the content, and the relevance of the detected dramatic moments to their advertisements. The advertisers can insert the advertisements at the detected dramatic moments. The advertisers can further configure the advertisements to be inserted dynamically at the detected dramatic moments as the content is streamed from the YouTube site. Alternatively, the user just uploads his content to a website, and the website analyzes the content so as to detect salient points so that advertisements can be displayed adjacent to the content when the viewer's attention is high.

Referring now to Fig. 3, the device 300 for inserting an advertisement within content created by a user comprises a detector 302 configured to detect a plurality of dramatic moments in the content wherein viewer's attention is substantially high, and an inserter 304 configured to insert the advertisements when at least one of the detected dramatic moments occurs. Both the detector and the inserter are preferably embodied by a programmable device programmed to carry out the above-mentioned algorithms.

In summary, an advantageous way of inserting an advertisement within content created by a user is disclosed. The invention comprises detecting one or more dramatic moments in the content wherein a viewer's attention is substantially high, identifying a plurality of advertisement slots based on the plurality of detected dramatic moments and inserting the advertisement when at least one of the identified advertisement slots occurs. The invention is useful for advertisers to promote their advertisements using the user-created content.

Although claims have been formulated in this application to particular combinations of features, it should be understood that the scope of the disclosure of the present subject matter also includes any novel features or any novel combination of features disclosed herein either explicitly or implicitly or any generalization thereof, whether or not it relates to the same subject matter as presently claimed in any claim and whether or not it mitigates any or all of the same technical problems as does the present subject matter.

Further, while the subject matter has been illustrated in detail in the drawings and foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive; the subject matter is not limited to the disclosed embodiments. Other variations to the disclosed embodiments can be understood and effected by those skilled in the art of practicing the claimed subject matter, from a study of the drawings, the disclosure and the appended claims. A single unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Use of the verb "comprise" and its conjugates does not exclude the presence of elements other than those stated in a claim or in the description. Use of the indefinite article "a" or "an" preceding an element or step does not exclude the presence of a plurality of such elements or steps. The figures and description are to be regarded as illustrative only and do not limit the subject matter. Any reference signs in the claims should not be construed as limiting the scope.

Claims


1. A method of inserting an advertisement in connection with content created by a user, the method comprising:
- detecting (102) one or more dramatic moments in the content when a viewer's attention is substantially high; and
- inserting (104) the advertisement when at least one of the detected dramatic moments occurs.

2. The method as claimed in claim 1, wherein the dramatic moments are detected using heuristic rules derived from film grammar.

3. The method as claimed in claim 2, wherein the heuristic rules are formed using a linear combination of audio and video features preferably selected from the following:
- visual saliency
- shot cut rate
- object motion
- presence of close-ups
- low depth-of-field shots
- zoom-in sequences
- audio energy
- start of music
- sudden change of music genre or tempo

4. The method as claimed in claim 3, further comprising computing a saliency score based on the linear combination of the audio and video features, the saliency score being associated to segments of video preferably obtained after shot cut scene detection.

5. The method as claimed in claim 4, wherein the insertion of the advertisement is based on a temporal distribution of the detected dramatic moments and a duration of each of the detected dramatic moments.

6. The method as claimed in claim 1, wherein a color bar code is generated for the detected dramatic moments, and the variation in color band of the color bar code is an indicator of the detected dramatic moment.

7. The method as claimed in claim 6, further comprising uploading the content along with the detected dramatic moments on a website; tracking information related to content popularity; and communicating the tracked information to a potential advertiser thereby enabling the potential advertiser to decide whether his advertisement can be inserted within the content.

8. The method as claimed in claim 7, wherein the detected dramatic moments are used by the potential advertiser to dynamically insert advertisements while streaming the content.

9. Device for inserting an advertisement in connection with content created by a user, the device comprising: a detection means (302) configured to detect one or more dramatic moments in the content when a viewer's attention is substantially high; and an insertion means (304) configured to insert the advertisement when at least one of the detected dramatic moments occurs.

10. A computer program product for inserting an advertisement in connection with content created by a user, the computer program product comprising a computer readable storage medium having computer readable program code embodied therein, the computer readable program code being configured to enable a programmable device to carry out the method of claim 1.