AU2005201690A1 - Method for Creating Highlights for Recorded and Streamed Programs - Google Patents

Method for Creating Highlights for Recorded and Streamed Programs

Info

Publication number
AU2005201690A1
AU2005201690A1 AU2005201690A AU2005201690A AU2005201690A1 AU 2005201690 A1 AU2005201690 A1 AU 2005201690A1 AU 2005201690 A AU2005201690 A AU 2005201690A AU 2005201690 A AU2005201690 A AU 2005201690A AU 2005201690 A1 AU2005201690 A1 AU 2005201690A1
Authority
AU
Australia
Prior art keywords
segment
segments
highlight
episode
program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2005201690A
Inventor
Jeroen Vendrig
Ernest Yiu Cheong Wan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2004903135A
Application filed by Canon Inc
Priority to AU2005201690A
Publication of AU2005201690A1

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Description

S&FRef: 711857
AUSTRALIA
PATENTS ACT 1990
COMPLETE SPECIFICATION FOR A STANDARD PATENT

Name and Address of Applicant: Canon Kabushiki Kaisha, of 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo, 146, Japan
Actual Inventor(s): Ernest Yiu Cheong Wan, Jeroen Vendrig
Address for Service: Spruson & Ferguson, St Martins Tower Level 31 Market Street Sydney NSW 2000 (CCN 3710000177)
Invention Title: Method for Creating Highlights for Recorded and Streamed Programs

Associated Provisional Application Details:
[33] Country: AU
[31] Appl'n No(s): 2004903135
[32] Application Date: 09 Jun 2004

The following statement is a full description of this invention, including the best method of performing it known to me/us:-

METHOD FOR CREATING HIGHLIGHTS FOR RECORDED AND STREAMED PROGRAMS

Field of the Invention
The present invention relates to the customization of recorded and streamed content and, in particular, to the summarisation of television (TV) and radio programs for user consumption.
Background
Consumers of audio-visual content often consume only those parts of content that are most interesting to them. For instance, a TV viewer often switches channels at the uninteresting parts of a program and often watches only certain parts of a recorded program. In the case of recorded programs, this is usually achieved manually by the viewer scanning forward for the next interesting part.
More recently, with the advent of digital TV, a number of methods have been proposed to summarise sports programs recorded on a digital medium such as a Digital Video Recorder (DVR). These methods detect important segments, such as slow motion replays, and use those segments to create the summary. However, the methods are usually applicable only to a very specific type of program, such as sports programs, or even to one particular type of sports program such as baseball or soccer.
The TV-Anytime Forum has also defined segmentation metadata for describing program segments. Segmentation metadata includes, amongst others, a segment locator and content descriptions of the segment. The segment locator comprises a start time and a duration for locating the segment. The content descriptions, including keywords and free text descriptions, are similar to those used for describing the content of programs but apply only to the part of the program identified by the segment locator. The availability of segmentation metadata has made it possible to re-purpose the content of a large range of programs. A content provider or a value-added metadata provider may use the segmentation metadata to define program segments and specify a program summary in terms of the defined program segments. An application can then use the segmentation metadata to generate highlights for recorded or streamed programs, where a streamed program is a program being broadcast or a recorded program. Segmentation metadata also allows a TV application to automatically match the characteristics of the program segments against a viewer's preferences to create a personalized summary for the viewer.
For many years, artificial intelligence and statistical approaches have been applied to a TV viewer's viewing history to learn the viewer's viewing preferences and make program suggestions. The use of segmentation metadata for selecting program segments is often achieved by taking learning algorithms typically used on program (i.e. content description) metadata and applying them directly to segmentation metadata instead. For instance, these may include a number of algorithms used to create user profiles based on the genre of the programs the viewer watched as well as genres explicitly specified by the viewer. As segmentation metadata becomes available, the algorithms are simply extended to use the genre of the program segments.
Such a straightforward extension of an existing algorithm may cause unexpected results. For instance, a viewer may have a strong preference for sports and news programs but watch them on different channels. When summarising a news program from a particular channel, the algorithm may include unwanted sports segments in the highlight.
On the other hand, another viewer may not show a strong preference for sports programs but always watch the sport segment of a particular news program. The same algorithm may filter out the sport segment of the program while generating the highlight.
Summary
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
Disclosed are arrangements which seek to address the above problems by removing a reliance upon segmentation metadata by interpreting the media consumption habits of a user with respect to prior episodes of a program series to thereby allow for segmentation as well as the creation of highlights of a current episode based upon a frequency of consumption of previous episodes of the series.
According to a first aspect of the present disclosure, there is provided a method of generating a highlight for a current streamed or recorded episode of a program series, there being segmentation metadata related to said current episode and describing features of said current episode, said method comprising the steps of: gathering information on parts of at least one other episode of said series that have been consumed by a user, said parts being indexed by time based on a pre-determined time unit; analysing said information to obtain a frequency graph of the consumed parts; analysing the frequency graph to thereby identify segments of the current episode that are most relevant to the user; and forming said highlight of the current episode by compiling said identified segments.
According to a second aspect of the present disclosure, there is provided a method of refining the segment boundaries estimated from a watch frequency graph of a streamed or recorded program episode, said method comprising the steps of: computing a difference between the time a user tunes to a segment of a consumed program and the time of the closest boundary of said segment for a set of programs; determining a distribution of said computed time differences; and adjusting said segment boundaries based on said distribution.
According to a third aspect of the present disclosure, there is provided a method of presenting a video sequence of images, there being segment locators related to said video sequence and describing features of said video sequence, said sequence being formed of at least one segment, said method comprising the steps of: analysing said describing features to obtain a viewer relevance value for each segment in said video sequence; and adjusting the presentation speed of each segment according to said relevance value.
According to a fourth aspect of the present disclosure, there is provided a method of storing a video sequence of images having plural segments, there being segmentation metadata related to said video sequence and describing features of said video sequence, said method comprising the steps of: analysing said segmentation metadata and said describing features to obtain a viewer relevance value for each segment in said video sequence; and adjusting audio and video encoding parameters of each said segment according to said relevance value.
According to another aspect of the present disclosure, there is provided an apparatus for implementing any one of the aforementioned methods.
According to another aspect of the present disclosure there is provided a computer program product including a computer readable medium having recorded thereon a computer program for implementing any one of the methods described above.
Other aspects of the invention are also disclosed.
Brief Description of the Drawings
One or more embodiments of the present invention will now be described with reference to the drawings, in which:
Fig. 1 is a block diagram of the viewing data gathering process;
Fig. 2 is a block diagram of a first highlight generation process;
Fig. 3 is a block diagram of another highlight generation process;
Figs. 4A and 4B show two ways to shorten a highlight;
Figs. 5A and 5B show a data structure used in the described process for representing the Time Graph of the watched fragments of a program episode, and its encoding in time unit values;
Fig. 6 is a flow chart of a viewing data gathering process;
Fig. 7 is a flow chart of the highlight generation process;
Fig. 8 shows a mapping of Time Graphs onto a Watch Frequency Graph;
Figs. 9A to 9C show an example of adjusting a detected segment boundary based on viewing statistics;
Fig. 10 is a schematic block diagram of a general purpose computer upon which arrangements described can be practiced;
Fig. 11 is a flow chart depicting a process for adjusting the start-time of detected segments; and
Fig. 12 is a flow chart depicting a process for variable speed playback.
Detailed Description including Best Mode
The arrangements described herein are described using television (TV) programs. However, those skilled in the art will recognize that the arrangements may be used for other non-TV materials such as radio programs, multi-media training materials, etc. In addition, the term 'TV system' is used in this description to refer to a system capable of:
  • receiving and showing live streamed (or broadcast) or recorded TV programs;
  • storing an electronic programming guide (EPG) which contains program information and schedules, TV applications which are computer programs written for running in a typical TV environment, TV programs and other data; and
  • running the TV applications.
Such a system may be a single device or formed of multiple devices, such as a digital-TV set top box, a TV receiver and a digital video recorder working together to provide the required functionality. TV program material may also be described as a video sequence of images and includes those formed in the analogue and digital domains. A recorded sequence of such images may be colloquially termed a "video".
Before the highlighting method is discussed in detail, some terminology used in the context of this document has to be clarified. In general, a TV program that is part of a series is called an episode of the series. The terms "episode" and "series" are commonly used for programs such as situation comedies (for example, "Friends"), talk shows (for example, "Ricki Lake") and soap operas (for example, "Neighbours"). In this document, the terms "episode" and "series" are extended to cover other types of program groups that comprise a set of regularly presented programs each of which is complete in itself.
Examples of such series are "ABC News" which has a new episode every evening, and "English Premier League Highlights" which has a new episode every week during the English soccer season. Typically, episodes in such series share many characteristics, such as the title of the program (for example, "ABC News"), the presenter (for example, the newsreader "Tony Eastley"), and the format (for example, local news, followed by international and business news, sport and ending with a weather report).
Fig. 1 depicts a TV viewer 1110 watching a program episode presented upon a TV system 1210. As noted above, the TV system 1210 may include a stand-alone TV receiver 1290 and a set-top box 1292 having an integrated digital video recorder. Functionally indicated in the schematic box 1294 are components of the system 1210 which may be formed within the TV receiver 1290, in an integrated implementation, or in the set-top box 1292 in a stand-alone implementation. The viewer 1110 uses a remote control 1120 to generate user commands 1220 which operate to switch channels and play recorded program episodes. The user commands 1220 are captured and interpreted by a Navigator module 1230 of the TV system 1210. Preferably, the Navigator module 1230 forwards the commands to other modules, including system modules and TV applications, which have registered their interest in the commands and, in some instances, acts on the commands directly if the commands are basic TV commands such as a channel switching command.
For example, in Fig. 1 basic TV commands are acted on directly by tuning the TV system 1210. In addition, in Fig. 1 the TV commands are forwarded to a Viewing Monitor Module 1240 explained below and other modules 1241, such as the digital video recorder.
In the described arrangement a viewer 1110 is assumed to be watching TV programs as long as the TV 1290 is on. That is, if the viewer 1110 falls asleep or goes to the toilet while the TV 1290 is on, the current program is still considered watched. Alternatively, the viewing environment may be monitored to assess whether a viewer actually watches a program. For example, the viewer may press a button when going away and coming back, or correct the viewing history at a later time (e.g. after waking up).
A Viewing Monitor Module 1240 monitors user commands forwarded by the Navigator module 1230, such as selecting the previous channel or selecting the next channel, and creates viewing records 1260 to record viewing patterns of the viewer 1110.
Preferably, the viewing records 1260 are stored locally on the TV System 1210 as a Viewing History 1280. The Viewing History 1280 may alternatively be stored on a remote server (not illustrated) coupled to the system 1210 by a communications link.
Each Viewing Record 1260 contains information that identifies the channel and the program that was watched by the viewer 1110. In the example of Fig. 1, a particular viewing record 1262 indicates that the channel and program watched are ABC and ABC News, respectively. The Viewing Records 1260 also store the time the viewer 1110 tunes to the program for the first time or starts playing the program. This time may be referred to as Watch Time. In the example of the viewing record 1262, the watch time for the ABC News program episode was 03/05/28 20:00. The viewing record 1260 also stores information about which fragment of the program episode was watched. In the described arrangement, the information takes the form of a Time Graph 1270, seen for example in the viewing record 1262, and the fragments are indexed by time based on a predetermined fixed time unit c, from the beginning of the program episode. A "watched_fragment" value, which is represented in the Time Graph 1270, having an x-axis representing time (fragment), and y-axis being the watched value, may be determined according to Equation 1 below:

Equation 1:

\[
\text{watched\_fragment}(i) =
\begin{cases}
1, & \text{if the program episode is played at normal rate or in slow motion for more than a time fraction } h \text{ of the } i\text{-th time interval } [\, i \times c,\ (i+1) \times c \,) \\
0, & \text{otherwise.}
\end{cases}
\]
That is, a recorded fragment is not considered watched if the fragment is skipped, fast-forwarded or if the viewer switches to another device while playing the fragment.
Similarly, a TV program episode fragment is not considered watched if the viewer switches channels. The threshold h, used for determining whether a fragment is watched or not, may be fixed or adaptive (that is, adapted over time to improve the 'watched'/'unwatched' classification).
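The classification of Equation 1 can be illustrated with a minimal Python sketch. The function name, the representation of playback as (start, end) intervals in seconds, and the values used in the example call are assumptions made for illustration; the specification does not prescribe them.

```python
def watched_fragments(played, episode_len, c, h):
    """Return a list of 0/1 watched_fragment values, one per time unit.

    played      -- list of (start, end) offsets in seconds from the start of the
                   episode during which it was shown at normal rate or in slow
                   motion (assumed non-overlapping; a hypothetical input format)
    episode_len -- episode length in seconds
    c           -- time unit in seconds
    h           -- threshold fraction of a time unit that must be played
    """
    n_units = -(-episode_len // c)  # ceiling division
    graph = []
    for i in range(n_units):
        lo, hi = i * c, (i + 1) * c
        # Seconds of the i-th interval [i*c, (i+1)*c) covered by playback.
        covered = sum(max(0, min(end, hi) - max(start, lo)) for start, end in played)
        graph.append(1 if covered > h * c else 0)
    return graph


# Three-minute episode, 30-second time units, illustrative threshold of one half.
graph = watched_fragments([(0, 70), (100, 120)], episode_len=180, c=30, h=0.5)
```

Skipped or fast-forwarded stretches are simply absent from `played`, so they contribute nothing to the coverage of a time unit.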
The above definition of watched_fragment assumes that the viewer 1110 is watching a program during the period that it is shown on the TV 1290. The definition does not consider the possibility that the viewer 1110 may have fallen asleep, left the room or is not paying attention to the TV 1290. Additional apparatus including audio and video devices may be used to monitor whether the viewer 1110 is actually watching the program to provide a more accurate definition and determination of watched_fragment.
According to the definition of Equation 1, watched_fragment is defined as a binary indicator. In an alternative implementation, watched_fragment may take on three values, where the original 'not-watching' value of 0 is replaced by two values: a -1 for not watching while being aware of the program, and a 0 for not watching for some other reason. The two new values allow a Highlight Maker (to be described below) to distinguish between the case where the viewer 1110 explicitly showed a lack of interest in the program by switching to another channel, or by fast forwarding to another part of the program, and the case where the viewer 1110 did not watch a program fragment, possibly because the viewer 1110 did not know that the program was on.
When computing watched_fragment, a different value for time unit c may be used for different types of programs. For instance, a time unit value of half a minute may be used for news programs while a time unit value of 2 minutes may be used for live soccer matches. A half hour news program would then result in a Time Graph 1270 having 60 bits of watched_fragment data, while a 2 hour soccer match would result in a Time Graph 1270 also having 60 bits. In a preferred implementation, 8 bytes are used for storing the Time Graph of the program episodes as a bitmap, which would be sufficient in the case of both the above examples.
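As a rough illustration of the 8-byte bitmap storage, the following Python sketch packs a list of watched-fragment bits into a fixed-size byte string and unpacks it again. The bit order (first fragment in the most significant bit) and the zero padding of unused bits are assumptions; the specification does not state them.

```python
def pack_time_graph(bits, size_bytes=8):
    """Pack 0/1 fragment values into a fixed-size bitmap (first fragment first)."""
    capacity = size_bytes * 8
    if len(bits) > capacity:
        raise ValueError("Time Graph does not fit in the bitmap")
    value = 0
    for b in bits:
        value = (value << 1) | (1 if b else 0)
    value <<= capacity - len(bits)  # left-align; unused trailing bits stay 0
    return value.to_bytes(size_bytes, "big")


def unpack_time_graph(data, n_bits):
    """Recover the first n_bits fragment values from a packed bitmap."""
    total = len(data) * 8
    value = int.from_bytes(data, "big")
    return [(value >> (total - 1 - i)) & 1 for i in range(n_bits)]


# A 60-fragment Time Graph (e.g. a half hour program at half-minute units).
bits = [1, 1, 0, 1] + [0] * 56
packed = pack_time_graph(bits)
```

Sixty fragments occupy 60 of the 64 available bits, which matches the observation that 8 bytes suffice for both examples.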
Fig. 5A shows a data structure 5100 including the bitmap used for representing the Time Graph. A Time Graph Bitmap 5110 is preceded by a header 5120 which includes a 3-bit field 5130 for the value of the time unit and another 3-bit field 5140 for the size of the bitmap 5110. Fig. 5B shows a table 5150 relating the time unit bit values 5130 to specific time periods.
If storage capacity within the system 1210 is not a constraint, viewing information on a watched fragment may be stored in a Time Graph in more detail. For example, in an alternative implementation, the Time Graph stores the percentage of the fragment watched during a time interval instead of a binary indicator signalling whether the fragment has been watched or not. Each bit in the Time Graph Bitmap 5110 may be extended to a byte containing more detailed viewing information. The advantage of such a Time Graph is that a different threshold, h, (see Equation 1) can then be applied to the Time Graph to generate different Time Graph Bitmaps 5110 and, hence, changing the threshold, h, will not invalidate existing Time Graph data.
Information in the Viewing Record is not restricted to a Time Graph. Other data such as a personal rating of the program by the viewer 1110 may be stored in the Viewing Record 1260. In addition, other program and schedule information on the watched program may also be copied from an Electronic Program Guide (EPG) 1250 to the Viewing Record 1260 if old entries in the EPG 1250 are regularly purged to reclaim storage space.
Fig. 2 shows how the Viewing History 1280 is used for generating a program Highlight 2300 for the latest episode in a series. A Viewing Pattern Analyser 2220 retrieves the Viewing Records 1260 of all episodes of the series and uses their Time Graphs 1270 to compute a frequency graph on how often each time fragment of the program series is watched. Such a frequency graph may be referred to as a Watch Frequency Graph 2240. For example, the viewer 1110 may always watch the first five minutes of an episode (for example because the viewer 1110 is interested in the local news), but the viewer 1110 may always skip the last two minutes of an episode (for example because the viewer 1110 is not interested in the weather report). In this simple example, the Watch Frequency Graph 2240 for the program series would have values of 100% for fragments in the first five minutes of the episodes and 0% for fragments in the last two minutes of the episodes.
All or a subset of the past Viewing Records 1260 of the program series may be used to compute the Watch Frequency Graph 2240. To account for possible changes in preferences of the viewer 1110 over time, the Viewing Pattern Analyser 2220 may use only the more recent Viewing Records 1260 or it may give more weight to the more recent Viewing Records 1260. In addition, episodes that have a significantly different length than other episodes are considered special editions, for example an extended Christmas special of a series. Generally, an episode may be considered a special edition when its length is over a certain percentage (say 15%) or amount of time (say 10 minutes) longer or shorter than the average length of the episodes in a series. Preferably, the weight of special edition episodes is set to 0 so that they do not contribute to the Watch Frequency Graph 2240.
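The special-edition rule above might be sketched as follows in Python. The function name is hypothetical, only the percentage criterion is implemented (the absolute-time criterion would be a drop-in alternative), and regular episodes are given a uniform weight of 1 for simplicity rather than a recency-based weight.

```python
def episode_weights(lengths_min, max_fraction=0.15):
    """Return one weight per episode: 0 for special editions, 1 otherwise.

    lengths_min  -- episode lengths in minutes
    max_fraction -- allowed deviation from the average length (15% is the
                    example value given in the text)
    """
    average = sum(lengths_min) / len(lengths_min)
    weights = []
    for length in lengths_min:
        is_special = abs(length - average) > max_fraction * average
        weights.append(0 if is_special else 1)
    return weights


# Four regular half hour episodes and one extended 45-minute special.
weights = episode_weights([30, 30, 30, 30, 45])
```

Episodes with weight 0 are then simply skipped when the Watch Frequency Graph is accumulated.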
When aggregating Time Graphs 1270 from the past Viewing Records 1260 of the program series into one Watch Frequency Graph 2240 for the program series, variations in length of the program episodes have to be taken into account. Time Graphs 1270 for past Viewing Records 1260 may be mapped to the length of the program episode for which the Highlight is being made, as explained below. Typically, the Highlight is being made for the latest recorded episode. In an alternative implementation, the length of the Watch Frequency Graph 2240 is set to the length of the longest contributing program episode, where a contributing program episode is an episode with a weight greater than 0.
Fig. 8 shows three ways that may be used for mapping a Time Graph 8020 of a contributing program episode 8010 to the set length of a Watch Frequency Graph 8030, those being:
Normalization: In a first mapping, the Time Graph 8020 is normalized such that its new length equals the set length of the Watch Frequency Graph 8030, resulting in a new Time Graph 8040. If the original data from the Viewing Monitor Module 1240 from which the original Time Graph is computed is still available, the normalization can be done by recomputing the Time Graph with a different time unit c such that the number of time units in the new Time Graph equals the number of time units in the Watch Frequency Graph 8030, a process referred to as "resampling". If the original data is no longer available, the original Time Graph 8020 is converted into the new Time Graph 8040 by interpolating the values of the original Time Graph 8020.
Padding or truncation: In a second mapping, the Time Graph 8020 is extended with additional values or is truncated by having values removed from it such that the length of the new Time Graph 8042 equals the set length of the Watch Frequency Graph 8030. In the former case, the padding values may be set to the value of the last time unit in the original Time Graph 8020 or to 0.
Direct mapping: In a third mapping, where the length of the Time Graph 8020 of a contributing episode equals the set length of the Watch Frequency Graph 8030, the Time Graph 8020 is used directly as a new Time Graph 8044 for computing the Watch Frequency Graph 8030.
Preferably, the normalization approach is followed if the length of the Time Graph 8020 of a contributing episode exceeds the set length of the Watch Frequency Graph 8030.
The padding approach is followed if the length of the Time Graph 8020 of a contributing episode is shorter than the set length of the Watch Frequency Graph 8030, and the direct mapping approach is used when the two graphs are of the same length.
After the mapping, the Time Graphs 8040-8044 for all contributing episodes will have the same length as the set length of the Watch Frequency Graph 8030. The Watch Frequency Graph 2240 is then computed by summing the Time Graphs 8040-8044 of the contributing episodes.
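The preferred mapping policy and the summation can be sketched in Python as below. The function names are hypothetical, padding uses 0, and nearest-neighbour resampling is used as a simple stand-in for the interpolation step, which the specification does not define in detail.

```python
def map_time_graph(graph, target_len, pad_value=0):
    """Map a Time Graph to target_len time units.

    Longer graphs are normalized, shorter graphs are padded, and graphs of
    equal length are used directly.
    """
    n = len(graph)
    if n == target_len:
        return list(graph)                                   # direct mapping
    if n < target_len:
        return list(graph) + [pad_value] * (target_len - n)  # padding
    # Normalization: nearest-neighbour resampling (an assumed interpolation).
    return [graph[min(n - 1, (j * n) // target_len)] for j in range(target_len)]


def watch_frequency_graph(graphs, target_len, weights=None):
    """Sum the mapped Time Graphs of all contributing (weight > 0) episodes."""
    if weights is None:
        weights = [1] * len(graphs)
    wfg = [0] * target_len
    for graph, weight in zip(graphs, weights):
        if weight == 0:
            continue  # e.g. special editions do not contribute
        for j, value in enumerate(map_time_graph(graph, target_len)):
            wfg[j] += weight * value
    return wfg


# One episode of equal length, one shorter and one longer than the target.
wfg = watch_frequency_graph([[1, 1, 0, 0], [1, 0], [1, 1, 1, 0, 0, 0, 0, 0]], 4)
```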
For program series that have a more or less fixed format, such as news and sports, the Watch Frequency Graph 2240 will show prominent peaks at segments of the program series in which the viewer 1110 is most interested. In order to locate any prominent peaks, a low-pass filter such as the well-known Gaussian filter or Butterworth filter is first applied to the Watch Frequency Graph 2240 to remove high frequency ripples. A prominent peak is then taken as a peak where the values of the graph over a predetermined time interval, say 2 minutes, around the peak consistently exceed a certain fraction, say 0.9, of its peak value. Assuming the presence of at least one prominent peak in the Watch Frequency Graph 2240, a Highlight Maker 2250 (seen in Fig. 2) will use the Watch Frequency Graph 2240 to determine which segments should be included in the Highlight 2300. The length (i.e. duration) of the Highlight 2300 may be automatically determined by the Highlight Maker 2250 based on attributes (content description metadata) of the series or the episodes. Such attributes may include the program series' genre and/or previous highlight requests. The length of the Highlight 2300 may also be specified by the viewer 1110. In the latter case, the Highlight Maker 2250 may provide the viewer 1110 with information about possible lengths of the Highlight 2300 and allow the viewer 1110 to make a selection.
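The smoothing and prominent-peak test described above can be sketched in Python as follows. The function names, the kernel radius of three standard deviations, the treatment of graph edges and the handling of plateaus are all assumptions made for this illustration.

```python
import math


def gaussian_smooth(values, sigma=1.0):
    """Low-pass filter a graph with a truncated, edge-renormalised Gaussian."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        acc = weight_sum = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(values):
                acc += w * values[j]
                weight_sum += w
        smoothed.append(acc / weight_sum)
    return smoothed


def prominent_peaks(values, window, fraction=0.9):
    """Return indices of local maxima whose surrounding window (in time
    units, centred on the peak) stays above fraction * peak value."""
    half = window // 2
    n = len(values)
    peaks = []
    for i, v in enumerate(values):
        left = values[i - 1] if i > 0 else float("-inf")
        right = values[i + 1] if i < n - 1 else float("-inf")
        if v <= 0 or v < left or v < right:
            continue  # not a local maximum
        neighbourhood = values[max(0, i - half):min(n, i + half + 1)]
        if all(x > fraction * v for x in neighbourhood):
            peaks.append(i)
    return peaks


# A broad plateau (prominent) and a narrow spike (not prominent).
peaks = prominent_peaks([0, 0, 10, 10, 10, 10, 10, 0, 0, 1, 9, 1, 0], window=4)
```

In practice the Watch Frequency Graph would first be passed through `gaussian_smooth` and the result handed to `prominent_peaks`, with `window` chosen so that it spans roughly 2 minutes at the time unit in use.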
In an alternative implementation, the length of the Highlight 2300 may be automatically determined by the Highlight Maker 2250 based on the Watch Frequency Graph 2240. The length may be determined by applying a threshold to the values in the Watch Frequency Graph 2240. For example, the sum of the lengths of the segments that are watched more than 10 times is used as the length of the Highlight 2300. The values in the Watch Frequency Graph 2240 may be normalized by dividing them by the number of episodes that contributed to the Watch Frequency Graph 2240. If episodes made a weighted contribution to the Watch Frequency Graph 2240, the normalization value should be weighted accordingly. After normalization, the threshold can be expressed as a percentage, for example the length of the Highlight 2300 is determined by the segments that are watched in 80% of the episodes. Use of normalization allows comparison with Watch Frequency Graphs 2240 of other series, and the threshold can be computed automatically based on the general viewing behaviour. In such an implementation, the threshold may be set proportional to the average value in the Watch Frequency Graphs 2240 of all watched series. For example, the threshold may be set to 1.5 times the average. If the average value in the Watch Frequency Graphs 2240 of all watched series is 50%, the threshold would be set to 75% in the example.
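The normalization and automatic threshold can be illustrated with a short Python sketch. The function names are hypothetical, and the graphs are assumed to be plain lists with one value per time unit.

```python
def normalise(wfg, weights):
    """Divide a Watch Frequency Graph by the total weight of its contributors."""
    total = sum(weights)
    return [value / total for value in wfg]


def adaptive_threshold(normalised_graphs, factor=1.5):
    """Threshold proportional to the average value over all watched series."""
    values = [v for graph in normalised_graphs for v in graph]
    return factor * sum(values) / len(values)


def highlight_length(normalised_wfg, threshold, c):
    """Total duration of the time units whose value exceeds the threshold."""
    return sum(c for value in normalised_wfg if value > threshold)


# Four equally weighted episodes of the series being summarised.
series = normalise([4, 4, 1, 0], weights=[1, 1, 1, 1])
# Normalised graphs of all watched series; their average value is 50%.
threshold = adaptive_threshold([[0.5, 0.5], [0.25, 0.75]])
length = highlight_length(series, threshold, c=30)
```

With an average of 50% and a factor of 1.5 the threshold comes out at 75%, as in the example in the text, and only the two fragments watched in every episode count towards the highlight length.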
The main tasks of the Highlight Maker 2250, described in more detail below, are: segmenting the program episode, ranking the segments based on their relevance to the viewer (for deciding which segments should be part of the highlight), determining the length of the highlight, and incorporating relevant segments into the highlight to achieve the required length. An important aspect of these tasks is assessing the relevance of a segment, which is the likelihood that the viewer 1110 wants to watch the segment.
The goal of the segmentation task is to find parts of a TV program (segments) that are semantically coherent. Examples of such semantically coherent parts are scenes in a movie or a comedy show, and topics in a news program. The start and end of the parts are referred to as logical segment boundaries, which mark a change in the content of the TV program.
Approaches to program episode segmentation fall into three main categories, being segmentation that is (i) segmentation metadata based, (ii) multimedia content based, and (iii) viewer statistics based. The viewer statistics based approach is hitherto not known, and methods for such segmentation are described in more detail below. Initially however, the advantages and disadvantages of each approach are discussed in more detail, so that the contribution and necessity of the viewer statistics based approach becomes evident.
The first and most accurate way of segmenting episodes is to use available segmentation metadata, this being metadata about the program episode's segments such as defined by the TV-Anytime Forum. A Segmentation Metadata 2265 storage, seen in Fig. 2, can be used to determine the boundaries of the included segments exactly. Each segment is also associated with a set of keywords that characterize the segment. In the case that a segment is itself a group of smaller segments, a set of keywords of the segment may be specified explicitly or derived from the child segments. However, segmentation metadata for program episodes is not always available. The Highlight Maker 2250 may employ one or both of the two other ways in the absence of segmentation metadata 2265.
The second way to segment episodes is to analyse the video and audio content of the Recorded/Streamed Program 2280 to determine suitable boundaries for the segments included in the Highlight 2300. Content-based detection of segment boundaries may locate boundaries that are explicit or implicit. Explicit boundaries are edit effects, such as transition effects and black frames, which were inserted by the content creators in order to separate shots. A typical example is the black frames that are inserted to separate program content and commercial breaks. Implicit boundaries are significant changes in the multimedia content over a pre-determined period of time, or specific changes in the multimedia content. Examples of significant changes in the multimedia content over a predetermined period of time are changes in colour, intensity, motion, shot length, soundtrack, or caption topics, these being typical of segments with low video and/or audio activity.
Examples of specific changes in the multimedia content are a pause by a newsreader, the display of a billboard, a pan shot (to give a broad overview of a scene), zoom in and zoom out (such as a long shot that serves as an establishing shot). Features such as colour histogram, motion vectors, etc. can be derived from the compressed data (such as MPEG-2) or uncompressed data so that they can be used for detecting segment boundaries.
The detection of logical segment boundaries is done by relating multimedia features to the semantic content of a program. For example, by using the colour histograms of a sequence of shots, it may be determined that the shots share the same locale and therefore are related semantically. Logical boundaries are then found where there is a change in locale. In another example, a significant change in multimedia feature values or characteristics indicates an event that may be interpreted as a logical boundary. An example in the context of movies is a change from an indoor scene to an outdoor scene indicated by a change in the brightness of the visual content. An example in the context of sports broadcasts is a change in the audio volume as audience and commentators become excited about what is happening on the field. In another example, logical boundaries are determined by the repetition of content. For example, in news program episodes, the returning picture of a newsreader in a studio setting indicates the start of a new topic after a news report has finished. This may be detected by finding the setting that occurs most in an episode or in a series. In other program types, several story lines may be interleaved, such as a travel program that alternates segments of several exotic destinations in order to keep the viewer interested. The repetition of a content theme then indicates both a logical boundary and the relation between several segments in the program episode. Colour characteristics, for instance, may be used to detect the boundaries of segments and their relationship. As repetition of a content theme may span several episodes of a program series, content characteristics of previous episodes may be used to detect a fixed program format. An example is the special graphics associated with the weather forecast segment at the end of a news program episode.
Although content-based analysis has proven to perform well in a number of domains, it is computationally expensive and depends on the availability of the video stream in a supported digital format, which in practice is the source of many technical difficulties.
The third way to perform segmentation is based on viewer statistics. Such an approach is computationally inexpensive and does not depend on availability of the video stream. This method detects segment boundaries based on the characteristics of the Watch Frequency Graph 2240, as the viewer 1110 switches to a channel when a segment of interest starts, and leaves the channel when the segment of interest ends. This statistical approach is most useful for generating highlights for new episodes of programs that have a fixed format. In the context of generating highlights, the approach does not have to find all segments and their boundaries. It is sufficient to find segment boundaries between the most frequently watched (or interesting) segments that are to be included in the Highlight 2300, and the less frequently watched segments that are to be excluded from the Highlight 2300. Consecutive segments that are watched with similar frequency are of similar interest to the viewer 1110 and may all be included into or excluded from the Highlight 2300. Hence, the detection of mutual boundaries of consecutive segments with similar frequency values is unimportant as such will not influence the make up of the Highlight 2300. In a preferred implementation, segment boundaries are detected in the Watch Frequency Graph 2240 by locating the time points where the viewing pattern before and after the point differs significantly. Segmentation based on the Watch Frequency Graph 2240 may be used alone, or in combination with content-based analysis described earlier to achieve better accuracy and to reduce the amount of content to be analysed.
Having established the advantages of the viewer statistics based segmentation method, a method of segmenting episodes based on analysis of Watch Frequency Graph 2240 may now be described. In a preferred implementation, a peak or a sequence of neighbouring peaks in the Watch Frequency Graph 2240 is assumed to correspond to a segment of interest, as the most often viewed parts of a series are most likely to be interesting to the viewer 1110. If the viewer normally switches channels or fast-forwards the recording at the right moment at the start and the end of an interesting segment, the boundaries of the segment can be very sharp and can be determined precisely. More likely, the viewer 1110 will switch channels or fast-forward too early or too late, resulting in a spread around the actual boundary value when multiple episodes of a program series have been watched. The spread (or distribution) can be computed by analysing the Time Graphs 1270 of program episodes for which segmentation metadata is available and can then be applied to other episodes or series. A distribution computed from the Time Graphs 1270 of other episodes of the same program series is also desirable, as the spread will likely accurately reflect the channel switching behaviour of the viewer 1110 for the program series. Nevertheless, Time Graphs 1270 from other program series may also be used as they reflect the general channel switching behaviour of the viewer 1110. A different spread is computed for the start and the end of a segment. If the mean of the spread shows that the viewer 1110 is on average T' seconds too early or too late at the start/end of a segment of interest, the detected boundaries of the segment of interest will be adjusted accordingly. To avoid inclusion of channel switching behaviour that is not related to a specific segment, a threshold (say 1 minute or 10% of a segment's length) may be applied so that only small values for T' contribute to the spread.
A method 11000 for adjusting the start-time of detected segments is shown in Fig. 11. A similar process may be used for adjusting the end-time of the detected segments. The method 11000 commences at step 11002 where previous episodes for a series are retrieved. Step 11004 checks if more episodes are available. If so, step 11006 operates to retrieve the time graph and segmentation metadata for the episode. Step 11008 then tests if there exist "zap-in" moments in the time graph. A zap-in moment is that point in time where the viewer commences viewing (ie. consuming) the particular episode of the series, for example upon switching channels. If no zap-in moment exists, control returns to step 11004. If one exists, step 11010 follows to identify the segment boundary in the segmentation metadata that is closest to the zap-in moment. Step 11012 then calculates the time lapse between the zap-in moment and the start of the segment boundary. If the time lapse is lower than a predetermined threshold, as determined at step 11014, step 11016 follows to add the time lapse to a time lapse data set. Control returns from step 11016, and from step 11014 where the time lapse is not lower than the threshold, to step 11008 to test for the presence of more zap-in moments.
When all zap-in moments have been accommodated (step 11008) and all episodes processed (step 11004) step 11018 operates to compute the mean, T, of the time lapse data set. Step 11020 then locates the segment boundaries of the current episode based on those zap-in moments of the watch frequency graph. The method 11000 concludes with step 11022 then adjusting the segment boundaries by adding the mean time lapse value T to the start of each segment.
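By way of illustration only, the start-time adjustment of method 11000 may be sketched as follows. The function names, the representation of each previous episode as a pair of lists (zap-in times and segment start times, in seconds), the sample history and the 60 second threshold are all assumptions made for this sketch and are not taken from the figures.

```python
from statistics import mean


def mean_zap_in_lapse(episodes, threshold=60.0):
    """Average the time lapse between zap-in moments and the nearest segment start.

    `episodes` is a list of (zap_in_times, segment_starts) pairs, in seconds.
    Lapses at or above `threshold` are ignored (compare steps 11014 and 11016).
    """
    lapses = []
    for zap_ins, starts in episodes:
        if not starts:
            continue
        for zap in zap_ins:
            # Compare step 11010: closest segment boundary to the zap-in moment.
            nearest = min(starts, key=lambda s: abs(s - zap))
            lapse = nearest - zap  # positive when the viewer tuned in early
            if abs(lapse) < threshold:
                lapses.append(lapse)
    # Compare step 11018: mean of the time lapse data set.
    return mean(lapses) if lapses else 0.0


def adjust_starts(detected_starts, mean_lapse):
    """Compare step 11022: shift each detected start by the mean time lapse."""
    return [s + mean_lapse for s in detected_starts]


# Hypothetical history: two previous episodes with known segment starts.
history = [
    ([85.0, 595.0], [100.0, 600.0, 900.0]),
    ([90.0, 300.0], [100.0, 600.0, 900.0]),  # the zap-in at 300 s is rejected
]
t = mean_zap_in_lapse(history)
print(t, adjust_starts([250.0], t))
```

With this hypothetical history the mean lapse is 10 seconds, so a boundary detected 250 seconds into the current episode is moved to 260 seconds.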
An example of using the spread to adjust computed segment boundaries is shown in Figs. 9A to 9C. Fig. 9A shows that the "zap-in" moment, such as when a viewer switches channels, occurs a period Tn before the time Ss at which the segment actually starts. The value of Tn may be different for each Time Graph 1270, as shown by the two example Time Graphs of Fig. 9A. For a collection of Time Graphs, the spread of values for Tn can be computed resulting in, for example, a normal spread as depicted in Fig. 9B. In the example, the zero value, which corresponds to the time the segment starts, is on the right of the mean indicating that the viewer 1110 consistently zaps to the channel too early.
Fig. 9C shows that the mean T' may be used for adjusting an automatically detected segment boundary S', resulting in an adjusted detected segment boundary. In this particular example, the segmentation detection process performed on the Watch Frequency Graph 2240 returns an estimate of the boundary S' 250 seconds into the episode. Statistical analysis performed on a collection of the Time Graphs of the series results in a mean delay of 10 sec. That is, the viewer is on average 10 seconds early tuning to the segment. As a result, the segment boundary is adjusted to a boundary which is 260 seconds into the episode.
Once the program episode's segments are known, the Highlight Maker 2250 can select the most relevant segments for inclusion into the Highlight 2300. Methods for the computation of segment relevance will be described later. First attention is given to the number of segments that should be included in the Highlight 2300, which depends on the length of the Highlight 2300. The length of the Highlight 2300 may be set in a variety of ways. In one approach, the length of the Highlight 2300 is specified explicitly by the viewer 1110 when the viewer 1110 requests the Highlight 2300 to be played. The viewer 1110 may be given a number of highlights of different lengths from which to choose. That is, the viewer can further select a set of complete segments from a highlight (which is pregenerated or generated after the highlight option is selected) to obtain a highlight of the desired length. In most cases, these highlights will contain only complete segments.
In another approach, the length of the Highlight 2300 is computed automatically based on the genre of the program and/or viewing statistics. The Watch Frequency Graph 2240 of the program series may be compared to the Watch Frequency Graph 2240 of other program series to determine the importance of the program series and its segments. The results are then used to determine the length of the Highlight 2300.
Given the length of the Highlight 2300, the Highlight Maker 2250 selects the most relevant segments from the program episode for inclusion into the Highlight 2300 up to the required length. Desirably, the relevance of a segment is computed from the Watch Frequency Graph 2240. The segments with the highest peaks are considered to be the most relevant to the viewer 1110. Alternatively, the area of a segment under the Watch Frequency Graph 2240 may be computed and normalized by the duration of the segment.
The normalized areas of the segments are then compared to determine the relative relevance of the segments with a larger normalised area indicating higher relevance. In a further alternative, only the area of the graph around a pre-determined interval of the peaks of a segment is computed, normalised and compared. More than one peak per segment may be selected. This effectively gives more weight to segments that on average contain parts that are of great interest to the viewer 1110. Alternatively, the normalised areas around the minima of the segments are computed and compared. This gives more weight to segments that on average do not contain parts that the viewer 1110 is not interested in.
This method focuses on consistent viewing behaviour and is less sensitive to extreme values in the Watch Frequency Graph 2240 than selection of maxima.
Instead of using a normalised area which is equivalent to using the mean value of the Watch Frequency Graph 2240 of a segment, the mode and the median of the segment's graph can be used as an indication of relevancy. The mode selects the value that appears most in the graph. The mode may be interpreted as the most frequent decision of the viewer 1110 on whether to watch a segment or not. The median selects the graph values in the middle of the ordered graph values and is robust against extreme values. This is useful as extreme values may not be a good indication of the relevance of the whole segment for some viewers.
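The alternative relevance measures discussed above may be compared with a short sketch. It assumes, purely for illustration, that the Watch Frequency Graph is held as a list of watch counts sampled at uniform time steps and that a segment is a half-open index range; the function name and sample values are hypothetical.

```python
from statistics import mean, median, mode

# Candidate relevance measures: normalised area (mean), median, mode, highest peak.
_MEASURES = {"mean": mean, "median": median, "mode": mode, "peak": max}


def segment_relevance(graph, start, end, measure="mean"):
    """Relevance of the segment covering graph[start:end] under the chosen measure."""
    values = graph[start:end]
    if not values:
        raise ValueError("segment contains no samples")
    return _MEASURES[measure](values)


# Hypothetical graph: watch counts per time step over one episode.
graph = [0, 1, 1, 5, 5, 5, 9, 2, 2, 2]
for name in ("mean", "median", "mode", "peak"):
    print(name, segment_relevance(graph, 3, 7, name))
```

For the middle segment the single extreme value 9 raises the peak and the mean, whereas the median and mode remain at 5, which illustrates the robustness against extreme values noted above.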
The relevancy of a segment may not only be determined by the corresponding part of the Watch Frequency Graph 2240, but also by the graph of the neighbouring segments.
For example, if segment Si is preceded by segment Si-1, and the end of segment Si-1 shows relatively high watch frequencies, it may be concluded that the viewer 1110 is consistently tuning-in early so as not to miss the segment Si. This confirms the importance of segment Si.
In a preferred implementation, the Highlights 2300 are generated based on relevance computed from the Watch Frequency Graph 2240 or based on a repository of Viewer Preferences 2260 (see Fig. 2) specified explicitly by the viewer 1110 or inferred using content description metadata (such as keywords and descriptions) associated with previously watched and skipped segments, episodes and series. In a further implementation, the segmentation metadata-based relevance value is combined with the Watch Frequency Graph-based relevance value into one relevance value for the segment.
For instance, the resulting relevance value can be computed as a weighted sum, the product, the maximum or the minimum of the two relevance values. As the two relevance values have different ranges, they have to be normalised first. The normalization maps the two relevance values to the range [0, 1] before combining them. Further, instead of combining the relevance values and then using the resultant relevance to rank the segments, the rankings based on the two relevance values may be combined by computing their weighted average.
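A minimal sketch of the normalisation and combination step follows. It assumes min-max normalisation onto the range 0 to 1, which is one possible reading of the normalisation described; the function names, the equal default weighting and the sample scores are hypothetical.

```python
def normalise(values):
    """Min-max normalise a list of relevance values onto the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread: all segments equally relevant
    return [(v - lo) / (hi - lo) for v in values]


def combine(a, b, how="weighted", weight=0.5):
    """Combine two normalised relevance values into one."""
    if how == "weighted":
        return weight * a + (1.0 - weight) * b
    if how == "product":
        return a * b
    if how == "max":
        return max(a, b)
    if how == "min":
        return min(a, b)
    raise ValueError("unknown combination: " + how)


# Hypothetical per-segment scores from the two sources.
graph_scores = normalise([2, 4, 6])          # Watch Frequency Graph-based
metadata_scores = normalise([10, 30, 20])    # segmentation metadata-based
combined = [combine(g, m) for g, m in zip(graph_scores, metadata_scores)]
print(combined)
```

The choice between the weighted sum, product, maximum and minimum determines how strongly a low value from one source can suppress a segment that the other source rates highly.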
Preferably, the Highlight Maker 2250 uses segment selection to adjust the length of the highlight to satisfy the requested length. If the total length of the included segments exceeds the requested length, segments that correspond to the lower peaks in the Watch Frequency Graph 2240 and/or materials from some of the included segments may be excluded to shorten the Highlight 2300 to the required length. The discarded materials typically correspond to lower frequency values in the Watch Frequency Graph 2240.
As an example, in Fig. 4A, a segment 4112 that corresponds to the lowest peak of the Watch Frequency Graph 4110 is excluded from a very short highlight. The corresponding Highlight 4130 indicates the two segments 4114 and 4116 that are included in the Highlight 4130, whilst the markers 4118 on the timeline 4120 mark the episode's segments. In Fig. 4B, the segments that correspond to the two highest peaks are included in the Highlight 4140. However, as for the segment that corresponds to the third peak 4112, only those fragments whose Watch Frequency Graph 4110 is above a certain threshold 4150 are included in the Highlight 4140. These fragments correspond to the most frequently watched portion of a segment. The threshold 4150 may be dynamically adjusted to alter the length of the Highlight 4140.
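The threshold-based selection of fragments described with reference to Fig. 4B may be sketched as below. The sketch assumes a uniformly sampled graph held as a list and half-open index ranges; the function name and sample values are hypothetical.

```python
def fragments_above(graph, start, end, threshold):
    """Return (start, end) index pairs inside [start, end) where graph > threshold."""
    fragments = []
    frag_start = None
    for i in range(start, end):
        if graph[i] > threshold:
            if frag_start is None:
                frag_start = i  # a new fragment opens here
        elif frag_start is not None:
            fragments.append((frag_start, i))  # fragment closes before i
            frag_start = None
    if frag_start is not None:
        fragments.append((frag_start, end))  # fragment runs to the segment end
    return fragments


# Hypothetical watch counts for one segment.
graph = [1, 3, 4, 2, 5, 5, 1]
print(fragments_above(graph, 0, len(graph), 2))
print(fragments_above(graph, 0, len(graph), 4))
```

Raising the threshold from 2 to 4 in this example shortens the retained material from four time steps to two, which corresponds to the dynamic adjustment of the threshold to alter the length of the highlight.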
On the other hand, if after all the parts with significant frequency value in the Watch Frequency Graphs 2240 have been included, the total length of the included segments is still less than the requested length, additional segments that matched the Viewer Preferences 2260 (of Fig. 2) may be added to the Highlight 2300. Segment attributes such as keywords, description, etc. from the Segmentation Metadata 2265 may be matched against preferences that are explicitly specified by the viewer 1110 or learned from the viewing history of the viewer 1110. Knowledge of Viewer Preferences 2260 also allows the generation of a highlight, even when the Watch Frequency Graph 2240 shows no prominent peaks. These 'backup' arrangements are the primary mechanisms used by the prior art systems for generating a highlight. In the present case, the use of the general preferences of a viewer 1110 instead of program-specific preference information in such a 'backup' mechanism is justified as it is used only after the Watch Frequency Graph 2240 has shown that the viewer 1110 has no specific preferences on any particular segments of the program episode.
In the arrangement of Fig. 2, the Highlight Maker 2250 outputs a highlight as Highlight Metadata 2270 which points to the corresponding segments in the episode. The segmentation metadata defined by the TV-Anytime Forum may be used for this purpose.
A Highlight Presenter 2290 then uses the Highlight Metadata 2270 to extract the required segments from the Recorded Program 2280 for presentation as the Program Highlight 2300. It is noted that the adding of a segment to the Highlight 2300 is in fact effected by adding metadata describing the segment to a group of metadata 2270 that collectively describe the Highlight 2300. Reproduction of the Highlight 2300 is effected by accessing the select portions of recorded program material 2280 using the Highlight Metadata 2270.
To avoid abrupt transitions from segment to segment, appropriate transitional effects may be added between segments.
In an alternative implementation, schematically illustrated in Fig. 3, a Highlight Maker 3250 extracts the required segment from a Recorded Program 3280, adds in transitional effects as required and outputs a resulting Highlight 3270 to storage. In this arrangement, the Highlight Presenter 3290 simply plays the stored Highlight 3270 when requested by the viewer 1110, without additional processing to reproduce the Program Highlight 3300.
Figs. 6 and 7 are the flow charts depicting a method of forming a highlight.
The methods of Figs. 6 and 7, which perform the above described processes, are preferably practiced using a system 1000, such as that shown in Fig. 10, wherein the processes of Figs. 1 to 9 may be implemented as software, such as an application program executing within a computer module 1001 of the system 1000. In particular, the steps of the method of highlighting are effected by instructions in the software that are carried out by the computer module 1001. The instructions may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part performs the highlighting methods and a second part manages a user interface between the first part and the user. The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer from the computer readable medium, and then executed by the computer module 1001. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer module 1001 preferably effects an advantageous apparatus for highlighting.
The system 1000 includes the computer module 1001, which in Fig. 10 is represented in a fashion akin to the set-top box 1292 described above. The system 1000 also includes input devices such as a keyboard 1002 and mouse/pointer device 1003, a communications network 1020 and a radio frequency antenna 1015. The antenna 1015 provides for reception of RF broadcast signals such as digital television signals whereas the communications network 1020 represents a medium via which cable television signals may be received and also for connection to other computer devices, such as a server computer 1040 which can provide a source of the segmentation metadata 2265, 3265. The keyboard 1002 and mouse pointer 1003 are preferably combined as a single unit in the form of a hand-held remote control device 1025 which provides for RF or IR communications 1030 with the computer module 1001. The system 1000 also includes an output device including a TV display/receiver 1014. A communications interface transceiver 1016 is used by the computer module 1001 for communicating to and from the communications network 1020, for example connectable via a telephone line 1021 or other functional medium. The interface 1016 can be used to obtain access to the Internet, and other network systems, such as a Local Area Network (LAN) or a Wide Area Network (WAN), and may be incorporated into the computer module 1001 in some implementations.
The computer module 1001 typically includes at least one processor unit 1005, and a memory unit 1006, for example formed from semiconductor random access memory (RAM) and read only memory (ROM). The module 1001 also includes a number of input/output interfaces including an audio-video interface 1007 that couples to the TV display/receiver 1014, an I/O interface 1013 for the keyboard 1002 and mouse pointer 1003 or remote control 1025, and an interface 1008 for the communications interface 1016 and antenna 1015. In some implementations, the communications interface 1016 may be incorporated within the computer module 1001, for example within the interface 1008. A storage device 1009 is provided and typically includes a hard disk drive 1010 and a floppy disk drive 1011. A magnetic tape drive (not illustrated) may also be used. A CD-ROM drive 1012 is typically provided as a non-volatile source of data.
The CD-ROM drive 1012 may be a read/write device thereby permitting the permanent recording of program highlights. The components 1005 to 1013 of the computer module 1001 typically communicate via an interconnected bus 1004 and in a manner which results in a conventional mode of operation of the computer module 1001 known to those in the relevant art. Although described as a set-top box, the computer module 1001 may be implemented in a traditional personal computer device such as IBM-PCs and compatibles, Sun Sparcstations or like computer systems evolved therefrom.
Typically, the application program is resident on the hard disk drive 1010 and read and controlled in its execution by the processor 1005. Intermediate storage of the program and any data fetched from the network 1020 or received via the antenna 1015 may be accomplished using the semiconductor memory 1006, possibly in concert with the hard disk drive 1010. In some instances, the application program may be supplied to the user encoded on a CD-ROM or floppy disk and read via the corresponding drive 1012 or 1011, or alternatively may be read by the user from the network 1020 via the interface device 1016. Still further, the software can also be loaded into the computer module 1001 from other computer readable media. The term "computer readable medium" as used herein refers to any storage or transmission medium that participates in providing instructions and/or data to the computer module 1001 for execution and/or processing.
Examples of storage media include floppy disks, magnetic tape, CD-ROM, DVD, a hard disk drive, a ROM or integrated circuit, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 1001. Examples of transmission media include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
The methods of highlighting may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub functions to be described. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories.
More specifically, the generalised processes depicted in Fig. 1 are typically performed by corresponding sub-programs that execute within the computer module 1001. The Navigator module 1230, Other modules 1241 and EPG 1250 each perform traditional roles and therefore need not be further described.
Fig. 6 shows a method 6100 for the creation of the viewing history which is essentially performed by the Viewing Monitor 1240, being a software application operating within the computer module 1001 as executed by the processor 1005. Viewing events are monitored by the Viewing Monitor 1240 (of Fig. 1) in step 6110. For each program watched, a Viewing Record 1262 (of Fig. 1) is created from the viewing events in step 6120. The Viewing Record 1262 includes a Time Graph 1270 (of Fig. 1) that shows the parts of the program that have been watched. In step 6130, the Viewing Records 1260 created are written to storage, such as the HDD 1010, to form the Viewing History 1280 (of Fig. 1).
Fig. 7 shows a method 7100 for creation of a highlight of a program episode. Again the method 7100 is performed as a software program executing within the computer module 1001. The method 7100 starts with step 7110 where Viewing Records 1260 of the watched episodes of the program are retrieved from the Viewing History 1280, for example by the processor accessing the HDD 1010. A Watch Frequency Graph 2240 is computed by the processor 1005 from the Viewing Records 1260 in step 7120. In step 7130, the Watch Frequency Graph 2240 is low-pass filtered and then checked for prominent peaks that consistently exceed a predetermined fraction of its peak value over a predetermined time interval. If prominent peaks are present in the graph 2240, the program episode is checked for associated Segmentation Metadata 2265 in step 7140. This check may involve access via the communications network 1020 to the server computer 1040 where such metadata 2265 may be retained. Alternatively, the metadata 2265 may be extracted from the recorded program material. If Segmentation Metadata 2265 is not available for the program episode, step 7150 causes the processor 1005 to operate to locate the segment boundaries around the peaks. This is achieved by detecting, on either side of a peak, a steep section of the graph 2240 for which the absolute value of slope exceeds a predetermined threshold, and then using the mid-point of the section as a boundary. If Segmentation Metadata 2265 for the program episode is available in the recorded program stream 2280 or on the server 1040, such metadata will be used. Using either the segment boundaries located in step 7150 or the segment boundaries specified in the Segmentation Metadata 2265, the segment that corresponds to the next highest peak (that is, the highest peak that has not been considered so far) of the Watch Frequency Graph 2240 is located and added to the Highlight in step 7160.
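The boundary location of step 7150 may be illustrated by the following sketch, which walks outward from a peak until it meets a run of steep steps and returns the mid-point of that run. The uniformly sampled list representation, the use of the difference between adjacent samples as the slope, the function name and the sample values are assumptions made for this sketch.

```python
def steep_boundary(graph, peak, direction, slope_threshold):
    """Locate a boundary beside `peak` by walking left (-1) or right (+1).

    Returns the mid-point index of the first run of steps whose absolute
    slope exceeds `slope_threshold`, or None if no steep section is found.
    """
    n = len(graph)

    def in_range(j):
        return 0 <= j + direction < n

    def slope(j):
        return abs(graph[j + direction] - graph[j])

    i = peak
    # Skip the relatively flat top of the peak.
    while in_range(i) and slope(i) <= slope_threshold:
        i += direction
    if not in_range(i):
        return None
    run_start = i
    # Follow the steep section to its far end.
    while in_range(i) and slope(i) > slope_threshold:
        i += direction
    return (run_start + i) / 2


# Hypothetical low-pass filtered graph with one peak around index 5.
graph = [0, 0, 1, 4, 7, 8, 8, 7, 3, 0, 0]
print(steep_boundary(graph, 5, -1, 2), steep_boundary(graph, 5, +1, 2))
```

In this example the segment around the peak is bounded at index 3.0 on the left and 8.0 on the right.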
After the addition of the new segment, the length of the Highlight is checked against the required length in step 7170. If the length is smaller than the required length, step 7180 checks if all the peaks of the Watch Frequency Graph 2240 have been processed. If not, the process loops back to step 7160 and other segments corresponding to the next highest peaks of the Watch Frequency Graph 2240 are added one by one.
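The loop formed by steps 7160 to 7180 may be sketched as a selection of segments in descending order of peak height. The list representation of the graph, the half-open index ranges for segments, the function name and the sample values are hypothetical, and trimming of an over-long result is outside the scope of the sketch.

```python
def build_highlight(segments, graph, required_length):
    """Add segments in order of descending peak until the required length is met.

    `segments` is a list of (start, end) index pairs into `graph`.
    Returns the chosen segments in episode order and their total length.
    """
    ranked = sorted(segments, key=lambda s: max(graph[s[0]:s[1]]), reverse=True)
    chosen, total = [], 0
    for start, end in ranked:
        if total >= required_length:
            break  # compare step 7170: the required length has been reached
        chosen.append((start, end))
        total += end - start
    return sorted(chosen), total


# Hypothetical episode of ten time steps split into four segments.
graph = [1, 1, 6, 6, 6, 2, 2, 9, 9, 3]
segments = [(0, 2), (2, 5), (5, 7), (7, 10)]
print(build_highlight(segments, graph, 5))
```

Here the two segments with the highest peaks are selected and the total length of 6 exceeds the required length of 5, which is the situation in which lower-frequency parts would subsequently be discarded.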
Once all the peaks of the Watch Frequency Graph 2240 have been processed, transition effects are added in step 7220 to make the transition between segments less abrupt. The Highlight 2300 or Highlight Metadata 2270 is then sent for storage in step 7230, for example upon the HDD 1010.
If in step 7170, the length of the Highlight 2300 is found to exceed the required length, the segments that have already been included into the Highlight will be reexamined in step 7190. Segment parts that have low frequency readings in the Watch Frequency Graph 2240 will be discarded to trim the Highlight 2300 to the required length.
Transition effects are then added in step 7220 to make the transition between segments less abrupt. The Highlight 2300 or Highlight Metadata 2270 is then output to storage in step 7230 in a similar fashion to that described above.
If, in step 7130, the Watch Frequency Graph 2240 shows no prominent peaks, step 7145 checks whether Segmentation Metadata 2265 for the program episode is available. If Segmentation Metadata 2265 is not available, step 7240 indicates that no Highlight is available for the program episode, concluding the method 7100. If Segmentation Metadata 2265 is available, step 7200 makes use of the metadata to rank the segments of the program episode according to some known viewing preferences of the viewer 1110. Step 7210 will then include the most highly ranked segments into the Highlight 2300 until the required length of the Highlight 2300 is reached. Transition effects are then added in step 7220 to make the transition between segments less abrupt. The Highlight 2300 or Highlight Metadata 2270 is output to storage in step 7230.
In an alternative implementation, the Highlight Presenter 2290 plays the entire Recorded Program 2280, but the speed at which a segment is played depends on the Highlight Metadata 2270. Such an implementation accommodates viewers who do not want to watch the entire program episode, but who want to make sure that they do not miss any segments that may be interesting. One way to achieve this is to play the segments included in the highlight at normal speed and those segments that are not included in the highlight at faster speed as in an automatic fast-forward. Another way is to show only key frames or selected frames of the non-highlight segments to give an illusion of a slide show.
The viewer can select an initially non-highlight segment for inclusion into the highlight and reset its playing speed to normal.
Those skilled in the art will recognize that the variable speed playback described above may use an approach other than the one described earlier to compute the relevance of a segment or a fragment. That is, the manner in which variable speed playback may be achieved is independent of the approach used for computing segment relevance.
Nevertheless, no approach can guarantee a relevance measure that always reflects the viewer's preferences accurately. Variable speed playback gives the viewer a chance to inspect the 'non-highlight' segments and leaves the viewer in charge of deciding what to watch and what to miss.
The viewer does not have to scan the highlight and start and stop fast-forwarding at very precise moments. The viewer needs to intervene only if the computed relevance of a segment does not match the viewer's actual interest in the segment. As the actual content of a segment is still shown, albeit in a compressed manner, the viewer can make an informed decision on whether to intervene or not. As such, the variable speed playback approach enables viewers to watch the automatically compiled highlights without worrying too much about missing out on interesting portions of a program episode.
Fig. 12 is a flowchart showing the steps of a variable speed playback method 12000.
The method 12000, like the others described herein is preferably implemented as an application program executable in the computer system 1000. The method 12000 commences at step 12002 where segmentation metadata (eg. 2265, 3265) for a program episode is retrieved.
Step 12004 then determines if all the segments have been played back. If so (yes), step 12006 operates to conclude variable speed playback. Such may include playback at normal speed. Where there are segments remaining, as will be the case when commencing playback, step 12008 follows to compute the relevance of the first encountered segment.
Step 12010 then tests the computed relevance to determine whether it is greater than a predetermined relevance. If so, step 12012 follows which operates to playback the current segment at normal speed. If not, step 12014 operates to play the current segment at a variable speed based on the computed relevance. On conclusion of each of steps 12012 and 12014, control returns to step 12004 where a check is made for more segments, to identify the next segment for the program episode.
The computation of relevance may be performed for example as indicated above, based upon whether or not a segment appears in a corresponding program highlight 2300, 3300.
The influence of relevance on playback speed is not necessarily restricted to the segment level. Analysis of audiovisual characteristics or of viewing behaviour may show different relevancies for several fragments in a segment, especially in long segments.
Playback may slow down close to normal speed for fragments in the segment that may be more relevant to the viewer than the rest of the segment. This makes it easier for the viewer to intervene and watch the fragment, and possibly the segment, at normal speed. If the next few fragments have a low relevance, the playback speed may increase again.
In a preferred implementation, segments whose normalized relevance is higher than 0.7 are shown at normal speed. This threshold may be adapted based on the viewer actions over previous segments. For example, if the viewer manually fast-forwarded previous highlight segments, the threshold could be set higher. If the viewer intervened when the automatic fast-forward kicked in, the threshold could be set lower.
Segments with relevance lower than the threshold are generally shown at a faster playback speed, where the speed varies inversely to the relevance of the segment. While the speed may be a continuous function of relevance, in typical implementations, the speed varies in discrete steps, resulting in 2x, 4x, or 8x speed-up. The speed may also vary during a non-highlight segment. For example, the first 10 seconds of a non-highlight segment may be shown at normal speed, after which the speed gradually increases until it reaches the speed corresponding to the relevance of the segment. Playback may slow down when the relevance of a fragment in the segment is significantly higher, say 0.25 higher, than the overall relevance of the segment. Changes in speed are preferably performed at a viewer perceptual rate, thereby maintaining viewer perspective during the segmented reproduction.
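One possible mapping from normalised relevance to a discrete playback speed is sketched below. The 0.7 threshold follows the preferred implementation mentioned above, whereas the division of the range below the threshold into three equal bands, and the function name, are assumptions made only for this sketch.

```python
def playback_speed(relevance, threshold=0.7):
    """Map a normalised relevance (0..1) to a speed-up factor of 1, 2, 4 or 8."""
    if relevance > threshold:
        return 1  # highlight segment: normal speed
    fraction = relevance / threshold  # position within the non-highlight range
    if fraction > 2 / 3:
        return 2
    if fraction > 1 / 3:
        return 4
    return 8  # least relevant segments are played fastest


for r in (0.9, 0.6, 0.3, 0.1):
    print(r, playback_speed(r))
```

Adapting the threshold to viewer actions, as described above, amounts to passing a different `threshold` value for subsequent segments.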
Increase in playback speed of the visual content may be implemented in several ways. One way is to drop frames. That is, where at normal speed 25 frames are displayed per second, at a playback rate of 2 only 50% of those frames are displayed in a second.
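Frame dropping of this kind may be illustrated by the following sketch, which selects the source frames to display for a given playback rate. The function name and the choice of evenly spaced frames are assumptions of the sketch; other selection strategies (for example, preferring key frames) are equally possible.

```python
import math


def frames_to_display(num_frames, rate):
    """Indices of the source frames shown when playing at `rate` times speed.

    Frames are picked at evenly spaced positions, so at a rate of 2 every
    second frame is kept and the rest are dropped.
    """
    if rate < 1:
        raise ValueError("rate must be at least 1 for fast playback")
    shown = math.ceil(num_frames / rate)
    return [int(i * rate) for i in range(shown)]
```

With 50 source frames (two seconds of content at 25 frames per second) and a rate of 2, the sketch keeps 25 frames, which fill one second of display time.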
For very high playback rates, apart from dropping frames, the content of several remaining frames may be shown at the same time as a mosaic. Remaining frames may also be frozen for a short time to give the effect of a slide show, to reduce the jerkiness or erratic nature of the playback.
Playback speed of the audio content may be independent of the visual content. Compared to visual content, audio can be sped up only slightly if the audio content is to be comprehensible to the viewer. One way to deal with this problem is to make audio and visual content asynchronous. That is, the audio is played back at normal speed (or at a slightly increased speed), and is not precisely related to the visual content displayed. After playing some audio content for, say, 5 seconds or one sentence, the audio content in the segment is skipped to the point where it is synchronized with the visual content again.
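The asynchronous audio playback may be sketched as a schedule of audio intervals, each played at normal speed, with a jump back to the video position after each interval. The function name and the fixed interval length are assumptions of the sketch; the description also allows the interval to end at a sentence boundary, which is not modelled here.

```python
def audio_chunks(segment_length, video_rate, chunk=5.0):
    """Source-time (start, end) audio intervals played at normal speed.

    While one chunk of audio plays for `chunk` seconds, the video advances
    chunk * video_rate seconds of content, so the next chunk starts there.
    Assumes video_rate >= 1 and chunk > 0.
    """
    chunks = []
    start = 0.0
    while start < segment_length:
        chunks.append((start, min(start + chunk, segment_length)))
        start += chunk * video_rate  # resynchronize with the video position
    return chunks
```

For a 60 second segment shown at 4x, the sketch plays the audio for seconds 0 to 5, 20 to 25 and 40 to 45 of the segment.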
The Highlight Maker 2250 may also be used to control the parameters of a video encoder and an audio encoder when outputting a program episode to storage, especially for adjusting quality and compression settings. Examples of encoding parameters that may be set by the Highlight Maker 2250 are the bit rate, frame rate, and the compression algorithm. This is done when storage space is limited (for example when the storage is beginning to fill up with recorded programs), or when the highlight is copied to another device with limited storage, or when processing power (for high quality compression) is limited and has to be shared by a number of applications. Typically, the encoding parameters are made identical for all segments included in the highlight, while segments not included in the highlight are discarded (or compressed away). In this way, they are not part of the resulting video but there may be an indication that a part of the original video was skipped. The encoder may be a 'null' encoder, meaning that included segments are directly copied from the original stream to the output. Alternatively, the encoding parameters may be set to normal quality for all segments included in the highlight, and to low quality for segments not included in the highlight. With this, viewers can still see the entire video, but only the highlight segments are of normal quality. In a further implementation, the encoding settings for each segment or fragment depend on their relevance value. The most relevant segments are compressed with the highest quality setting, while the least relevant segments are compressed with the lowest quality setting.
The manner in which the Highlight Maker 2250 sets the encoding parameters will depend on various factors, such as the factory settings, any user preferences, the available storage space, the processing power of the recorder, the perceived performance of the Highlight Maker 2250, and how important it is to have the possibility to watch the entire episode at a later time.
It is observed that the setting of the video encoder and audio encoder parameters based on the results of the Highlight Maker 2250 may use any method to compute the relevance of a segment or a fragment. Methods for determining the encoder parameters based on segment relevance may be similar to those used for determining the playback rate of the variable speed playback method described above. An important difference however is that viewers can't intervene to adjust the settings. The advantage of varying the encoder setting based on relevance is that storage space is saved while the viewer still has access to all the content, albeit in a more compressed, lower-quality form. Encoding using the relevance-based encoder settings may be done during recording, after recording, or whenever more disk space is needed.
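A relevance-based choice of encoder settings may be sketched as follows. The class and function names, the linear interpolation, and the bit rate and minimum frame rate bounds are all hypothetical values chosen for this sketch; the description only states that more relevant segments receive higher quality settings. The upper frame rate of 25 corresponds to the normal display rate mentioned earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderSettings:
    video_kbps: int   # video bit rate in kilobits per second
    frame_rate: int   # frames per second


def settings_for(relevance, min_kbps=500, max_kbps=4000, min_fps=12, max_fps=25):
    """Interpolate encoder settings from a normalized relevance value.

    The bounds are assumed example values. Relevance outside [0, 1] is
    clamped, so the least relevant segments get the lowest quality and the
    most relevant segments the highest.
    """
    r = min(1.0, max(0.0, relevance))
    return EncoderSettings(
        video_kbps=round(min_kbps + r * (max_kbps - min_kbps)),
        frame_rate=round(min_fps + r * (max_fps - min_fps)),
    )
```

Because the viewer cannot intervene during encoding, an implementation might choose a conservative lower bound so that low-relevance segments remain watchable.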
A further variation to the described arrangements is to alter when the Highlight Metadata 2270 is created. Although the arrangements shown in Fig. 2 and Fig. 3 create a Highlight 2300, 3300 for a Recorded Program 2280, 3280, in an alternative arrangement the highlight may be created before a program has been broadcast or streamed. This is possible, because the selection of segments of a program episode for a highlight depends only on the Viewing Records 1260 of previous episodes, the content description metadata for the particular episode and the series, and the preferences 2260, 3260 of the viewer 1110. Program episode (content description) metadata is usually available when the program episode was created at an earlier date or when the episode was last screened.
Viewer preferences 2260, 3260 are specified or computed independently of a specific program episode. Hence, the segments to be included in the highlight can be determined even before the program episode is broadcast. The availability of a recorded copy of the program episode merely allows the Highlight Maker 2250, 3250 to determine the boundaries of the highlight segments more precisely. When processing a program episode while it is being broadcast, the Highlight Maker 2250, 3250 may record only those segments included in the highlight, or apply different encoding parameters to highlight or non-highlight segments. In a further variation, the viewer 1110 is warned before a highlight segment starts, so that the viewer 1110 can switch to the channel while the episode is being broadcast. In this variation, recording is optional.
Industrial Applicability
It is apparent from the above that the arrangements described are applicable to multimedia distribution systems where users desire to create condensed versions of broadcast material for subsequent consumption. Such has direct application to digital television broadcasts and to cable TV systems. The arrangements may be adapted to traditional analogue TV broadcasts where such are supplemented by repositories of appropriate segmentation data and/or an electronic program guide. Further, the arrangements may be applied to other media types, such as digital radio broadcasts, where a user may only listen to certain aspects of programming. In this fashion, the arrangements described are just as applicable to radio news broadcasts, for example, as they are to TV news broadcasts. The actual viewing of TV material and listening to radio material may be collectively thought of as consumption of the media by the user. For instance, the "watch frequency graph" for the above described TV arrangements may be a "listen frequency graph" for radio implementations, and collectively a "consume frequency graph".
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
(Australia Only) In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.

Claims (40)

1. A method of generating a highlight for a current streamed or recorded episode of a program series, there being segmentation metadata related to said current episode and describing features of said current episode, said method comprising the steps of: gathering information on parts of at least one other episode of said series that have been consumed by a user, said parts being indexed by time based on a pre-determined time unit; analysing said information to obtain a frequency graph of the consumed parts; analysing the frequency graph to thereby identify segments of the current episode that are most relevant to the user; and forming said highlight of the current episode by compiling said identified segments.
2. A method according to claim 1, wherein said information on the consumed parts of said previous episodes is stored as part of a user viewing record.
3. A method according to claim 1 further comprising, after step and before step the further step of: determining a length for the highlight based on at least one of the content description metadata of said series, the content description metadata of said episode, said frequency graph for said series, and an express selection by the user.
4. A method according to claim 3 wherein step comprises compiling select ones of said identified segments having a cumulative length closest to said determined length.
5. A method according to claim 1, wherein step comprises computing a length for the highlight from said frequency graph by applying a threshold to values of the frequency graph.
6. A method according to claim 5, wherein values of said frequency graph are normalized for the number of episodes in the series before said threshold is applied.
7. A method according to claim 6, wherein said threshold is proportional to the average of the normalized values of said frequency graph of segments in series other than said series.
8. A method according to claim 1, wherein said identified segments correspond to the highest peaks in said frequency graph.
9. A method according to claim 1, wherein said identified segments correspond to segments whose frequency of consumption is higher than a predetermined threshold.
10. A method according to claim 1, wherein said identified segments are determined by a statistic of a frequency graph of each said segment.
11. A method according to claim 1, wherein said identified segments are determined by a statistic of the frequency graph around a pre-determined interval at the highest peaks of each segment.
12. A method according to claim 1, wherein said identified segments are determined by a statistic of the frequency graph around a pre-determined interval at the minima of each segment that contains the highest peaks of the frequency graph.
13. A method according to any one of claims 10 to 12, wherein said statistic is the mean.
14. A method according to any one of claims 10 to 12, wherein said statistic is the mode.
15. A method according to any one of claims 10 to 12, wherein said statistic is the median.
16. A method according to any one of claims 1 to 15, wherein step further comprises identifying segments of said current episode that match preferences of the user that are specified explicitly by the user or learned from a viewing history of the user.
17. A method according to claim 3, wherein step further comprises: (ca) summarising said identified segments by limiting said identified segments to the most frequently consumed portions of said identified segments to thereby meet said determined length of the highlight.
18. A method according to any one of claims 1 to 17, wherein step further comprises: (cb) adjusting a boundary of said identified segments to some adjacent detectable logical boundary.
19. A method according to any one of the preceding claims wherein said episode comprises a radio broadcast having audio content able to be segmented.
20. A method according to any one of the preceding claims wherein said episode comprises a television broadcast having audio-visual content able to be segmented.
21. A method according to claim 19 or 20, wherein said logical boundary is the start or the end of a commercial break.
22. A method according to claim 20, wherein said logical boundary is the start or the end of a sequence of frames of low video and/or audio activity.
23. A method according to claim 20, wherein said logical boundary is the start or the end of a sequence of frames which display the same locale.
24. A method according to claim 20, wherein said logical boundary corresponds to a difference in video content and/or audio content and/or content description metadata between the frames preceding said logical boundary and the frames succeeding said logical boundary.
25. A method according to claim 20, wherein said logical boundary corresponds to a repetition of video content and/or audio content and/or content description metadata between the frames preceding said logical boundary and the frames succeeding said logical boundary.
26. A method according to claim 20, wherein said logical boundary corresponds to the repetition of video content and/or audio content and/or content description metadata from other episodes of a program series in the frames succeeding and/or preceding said logical boundary for said episode.
27. A method according to any one of claims 1 to 26, wherein step further comprises: (dc) introducing transition effects around the boundaries of said identified segments.
28. A method according to any one of claims 1 to 27, wherein step outputs said compiled highlight as metadata that points to the actual fragments of said recorded program episode.
29. A method according to claim 28, wherein said highlight metadata is used by an application to extract fragments from said current episode to thereby enable presentation of the highlight.
30. A method according to any one of claims 1 to 28, wherein step further comprises: (cd) extracting said identified segments from said recorded current episode; and step further comprises outputting said highlight to storage with said compiled segments inserted.
31. A method according to any one of claims 1 to 28, wherein said highlight metadata is used by an application to present the entire recorded program episode using different speeds for segments that are part of said highlight and segments that are not.
32. A method according to any one of claims 1 to 25, wherein step outputs the entire program episode using different encoding parameters for segments that are part of said highlight and segments that are not.
33. A method according to claim 32, wherein said encoding parameters are determined from a statistic of the frequency graph of said segment.
34. A method according to any one of claims 1 to 27, wherein step extracts said identified segments from said recorded episode and outputs said compiled highlight to storage with said segments inserted wherein the encoding parameters are determined from values of said frequency graph of said segment.
35. A method according to any one of claims 1 to 27, wherein encoding parameters for recording of said segments of said episode are determined from values of said frequency graph of said segment prior to broadcast of said episode.
36. A method according to any one of claims 1 to 27, wherein encoding parameters for recording of said segments of said episode are determined from said compiled highlight metadata and values of said frequency graph of said segment prior to broadcast of said episode.
37. A method of refining the segment boundaries estimated from a watch frequency graph of a streamed or recorded program episode, said method comprising steps of: computing a difference between the time a user tunes to a segment of a consumed program and the time of the closest boundary of said segment for a set of programs; determining a distribution of said computed time differences; and adjusting said segment boundaries based on said distribution.
38. A method according to claim 37, where said determined distribution comprises time differences that are less than a predetermined threshold value.
39. A method according to claim 37 wherein said determined distribution consists of time differences that are less than a predetermined threshold value.
40. A method of presenting a video sequence of images, there being segment locators related to said video sequence and describing features of said video sequence, said sequence being formed of at least one segment, said method comprising the steps of: analysing said describing features to obtain a viewer relevance value for each segment in said video sequence; and adjusting the presentation speed of each segment according to said relevance value.
41. A method according to claim 40, wherein said describing features are derived from at least one of: (i) content description metadata of said sequence, (ii) audiovisual video content of said sequence, (iii) a watch frequency graph of a series of said video sequences, and (iv) viewer preferences.
42. A method according to claim 40, wherein step comprises the sub-steps of: (ba) maintaining an original presentation speed of said segment where said relevance value is higher than or equal to a threshold value; and (bb) increasing the presentation speed of said segment from said original presentation speed to a speed that corresponds to said relevance value where said relevance value is lower than said threshold value.
43. A method according to claim 42, wherein said increasing of the presentation speed is performed at a viewer perceptual rate.
44. A method according to claim 41, wherein step (bb) further comprises the steps of: (bba) analysing said describing features to obtain a viewer relevance value for fragments in said segment; and (bbc) adjusting the playback speed of said fragment corresponding to said viewer relevance value for said fragment, wherein said adjusted presentation speed is greater than or equal to the original presentation speed and smaller than said increased presentation speed of said segment.
45. A method of storing a video sequence of images having plural segments, there being segmentation metadata related to said video sequence and describing features of said video sequence, said method comprising the steps of: analysing said segmentation metadata and said describing features to obtain a viewer relevance value for each segment in said video sequence; and adjusting audio and video encoding parameters of each said segment according to said relevance value.
46. A method according to claim 45, wherein said describing features are derived from at least one of: (i) content description metadata of said sequence, (ii) audiovisual video content of said sequence, (iii) a watch frequency graph of a series of said video sequences, and (iv) viewer preferences.
47. A method of presenting video images substantially as described herein with reference to any one of the embodiments as that embodiment is illustrated in the drawings.
48. A computer readable medium having a computer program recorded thereon and adapted to make a computer execute a procedure to perform the method of any one of the preceding claims.
49. Computer apparatus adapted to perform the method of any one of claims 1 to
DATED this TWENTY-FIRST Day of APRIL 2005
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
SPRUSON & FERGUSON
AU2005201690A 2004-06-09 2005-04-21 Method for Creating Highlights for Recorded and Streamed Programs Abandoned AU2005201690A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2005201690A AU2005201690A1 (en) 2004-06-09 2005-04-21 Method for Creating Highlights for Recorded and Streamed Programs

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
AU2004903135 2004-06-09
AU2004903135A AU2004903135A0 (en) 2004-06-09 Method for Creating Highlights for Recorded and Streamed Programs
AU2005201690A AU2005201690A1 (en) 2004-06-09 2005-04-21 Method for Creating Highlights for Recorded and Streamed Programs

Publications (1)

Publication Number Publication Date
AU2005201690A1 true AU2005201690A1 (en) 2006-01-05

Family

ID=35767645

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2005201690A Abandoned AU2005201690A1 (en) 2004-06-09 2005-04-21 Method for Creating Highlights for Recorded and Streamed Programs

Country Status (1)

Country Link
AU (1) AU2005201690A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023033799A1 (en) * 2021-08-31 2023-03-09 Google Llc Automatic adjustment of audio playback rates

Similar Documents

Publication Publication Date Title
JP5322550B2 (en) Program recommendation device
JP4615166B2 (en) Video information summarizing apparatus, video information summarizing method, and video information summarizing program
US9098172B2 (en) Apparatus, systems and methods for a thumbnail-sized scene index of media content
JP5227382B2 (en) Method and apparatus for switching to similar video content
CA2738430C (en) Delete viewed portions of recorded programs
US7333712B2 (en) Visual summary for scanning forwards and backwards in video content
JP2005524271A (en) System and method for indexing commercials in video presentation
JP2005524290A (en) Black field detection system and method
JP2009533993A (en) Data summarization system and data stream summarization method
KR20020001820A (en) Methods and apparatus for recording programs prior to or beyond a preset recording time period
EP1820346B1 (en) Customizing commercials
JP4735413B2 (en) Content playback apparatus and content playback method
JP2010226630A (en) Image processing apparatus having comment processing function, and method of processing comments of the same
US20200186852A1 (en) Methods and Systems for Switching Between Summary, Time-shifted, or Live Content
JP2010246000A (en) Video search reproduction device
JP2007066409A (en) Recording and reproducing apparatus, and recording and reproducing method
JPWO2007039995A1 (en) Digest creation device and program thereof
AU2005201690A1 (en) Method for Creating Highlights for Recorded and Streamed Programs
US20080025689A1 (en) Device And Method For Recording Multimedia Data
JP2002271739A (en) Video device and reproduction control information distribution method
JP2007184781A (en) Digest video generating device, and digest video generating method

Legal Events

Date Code Title Description
MK1 Application lapsed section 142(2)(a) - no request for examination in relevant period