VIDEO DATA STRUCTURE FOR VIDEO BROWSING BASED ON CONTENT
BACKGROUND OF THE INVENTION

Field of the Invention
The present invention relates to a video browsing system, and more particularly to a video data structure for video browsing based on content, in which the contents of a video may be summarized and browsed by defining relations among characters, places and events.

Background of the Related Art

Typically, users simply view movies and/or dramas as broadcast through a TV or played at a movie theater. However, a user may wish to view a particular movie or drama at a particular time, or wish to view only a particular section of a movie or a drama. Accordingly, various techniques which enable selective watching of a movie/drama or sections of a movie/drama have been suggested.
In the related art, for example, various video data may be represented or classified into format chunk, index chunk, media chunk, segment chunk, target chunk, and/or representation chunk. Also, data on various characters or objects, such as the name of an object, its position on the screen, and numeric data with relation to a segment of the video data in which the object appears, may be represented by the target and representation chunks. Accordingly, a user can select an object through a table and reproduce for display a particular segment where the object is shown in the video.
In another related art, various additional data of a video are obtained before, during or after the production of the video data. Thereafter, an additional information table of the obtained data is composed and provided to users. Namely, the additional data table may include a position where an actor appears, a position where a character of the actor appears, and a position where stage properties appear, such that a scene can be reproduced as selected by a user through the additional data table. For example, if a user selects a stage property, information on the selected stage property such as the manufacturer and price may be displayed on a screen, and the user may be able to connect with the manufacturer or a seller of the stage property through a network connection.

In still another related art, recording information on each segment of a video in a video map has been suggested. That is, information such as the degree of violence, the degree of adult content, the degree of importance of contents, the characters' positions, and the degree of difficulty in understanding may be indicated for each segment of a video in the video map. Thus, the user may set a degree of preference for one or more items of the video map, and only segments of the video meeting the set degree of preference would be reproduced, thereby limiting a display of particular contents to unauthorized viewers.

Other techniques in the related art which allow users to selectively view a portion of a video include a temporal relational graph of shots for a video. However, viewing a temporal relational graph is similar to viewing several representative scenes of a video and would not allow a user to easily follow the contents of a video. Similarly, other techniques in the related art as described above provide items simply arranged, based upon the selection of the user, without any relation to the objects appearing in the movie or drama. However, the contents of a movie or drama generally build around relations between characters, places and events. For example, relations between characters may remain unchanged from the beginning to the end of the story or may continuously vary. Moreover, since one or more characters relate to a specific character in the movie or drama, the browsing methods in the related art substantially fail to provide an accurate understanding of the story of the movie or drama to the user.
Therefore, techniques in the related art have the disadvantage that they do not allow a user to understand a video centering on relations among characters according to the development of events, the changes of those relations, and the relations among characters and places as events develop.
SUMMARY OF THE INVENTION
Accordingly, an object of the present invention is to solve at least the problems and disadvantages of the related art.
Another object of the present invention is to provide a video data structure for video browsing based on content which allows users to easily browse and understand the contents of a video.

Another object of the present invention is to provide a video data structure for video browsing based on content which allows users to fully understand relations between characters of a video by representing relations which vary according to the development of events in the video.
A further object of the present invention is to provide a video data structure for video browsing based on content which allows users to easily recognize the whole content of a video by summarizing the content.
A still further object of the present invention is to provide a video data structure for video browsing based on content which allows users to watch a portion of a video based upon a particular event.

Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and advantages of the invention may be realized and attained as particularly pointed out in the appended claims.
To achieve the objects and in accordance with the purposes of the invention, as embodied and broadly described herein, a video data structure for video browsing based on content comprises factors of a video content, including characters and events as objects, places used as backgrounds of the objects, and a plot of the video content. Here, the content of a video is expressed based on relations between the factors, including relations which change during the development of a movie or drama. Thus, a user can browse and view a segment of an event corresponding to a selected relation.
In another embodiment of the present invention, a video data structure for video browsing based on content centering on relations between factors includes data representing actual segments of a video; data representing the video content; and connecting data which links the factors of the video content to the data representing the actual segments of the video.
In still another embodiment of the present invention, a video data structure for video browsing based on content includes semantic data including relations between characters and changes in the relations; and data which connects the relations between characters with video segments corresponding to the changes in the relations. Thus, a user can browse a video by selecting relations and changes in the relations between characters.
In a further embodiment of the present invention, a video data structure for video browsing based on content includes semantic data including objects, relations between the objects and the places where the objects appear, and data for connecting corresponding video segments to the relations between the objects and the places. Thus, a user can browse a video and view a segment of an event corresponding to a selected relation between objects and places.
In still a further embodiment of the present invention, in a video data structure for video browsing based on content, the whole content of a video is expressed by a set of paragraphs corresponding to events as factors constituting the video. Thus, a user can browse and view the content of an event corresponding to an event paragraph.
BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be described in detail with reference to the following drawings in which like reference numerals refer to like elements, wherein:
Fig. 1 is a view of a video browser supported by a data structure according to the present invention;

Fig. 2 shows a video data structure in which video data is represented centering on relations among characters according to the present invention;
Fig. 3 is a view of a video story browser supported by a data structure according to the present invention;

Fig. 4 is a video data structure for representing a video in an object-place relation graph according to a first embodiment of the present invention;
Fig. 5 is a video data structure for representing a video in an object-place relation graph according to another embodiment of the present invention;
Fig. 6 is a video data structure for video browsing based on segments of an event after representing a content of a video by a set of event paragraphs according to the present invention;
Fig. 7 is a video data structure which includes video summary data according to the present invention; and
Fig. 8 is a video data structure for video browsing based on content centering on relations between characters according to the present invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Generally, the present video data structure of relations and changes in relations between objects in a video allows a user to easily understand and browse a video. Fig. 1 shows an example screen of a video browser which is based on relations between objects and is supported by a video data structure in accordance with the present invention. Particularly, a video browser based on contents is disclosed in copending U.S. Patent Application Serial Nos. ? and 09/239,531, which are fully incorporated herein.
Referring to Fig. 1, a screen of character relations, a screen of main scenes, and a main screen are displayed. Although an object may be any person, item or place that appears in a video, for purposes of explanation, objects will be assumed to be characters and places in a video.
The screen of character relations includes a character screen 101 which represents the characters of a video, and a relation screen 102 which represents relations and changes in the
relations between a character selected in the character screen 101 and other characters of the video. Here, a constant relation between the selected character and other characters is shown in an upper level of a relation tree structure, while changes in relations are shown in a lower level of the relation tree structure. Also, a main scene screen 103 displays key frames of events which show constant relations or changes in relations between characters as selected from the relation screen 102, and a main screen 104 displays a video segment corresponding to a scene of an event selected from the main scene screen 103.
In the present invention, a constant relation means either a relation between characters that cannot change throughout a video, such as a parent to child relation, or a relation which is most representative of the relations between characters. Also, the displayed constant relation may include additional information such as the number of variable relations in the lower tree structure. For example, in the displayed constant relation between 'character 1' and 'character 2,' the number '2' displayed above 'character 2' indicates that there are two variable relations in the tree structure.
In Fig. 1, for example, a user selected 'character 1' from the character screen 101, and relations with 'character 2' and 'character 3' are shown in the relation screen 102 by a tree structure. Also, since the user selected relation 2 with 'character 2' from the relation screen 102, scenes of significant events corresponding to relation 2 between 'character 1' and
'character 2' are shown by key frames in the main scene screen 103. In the present invention, a significant event may mean an event which shows a corresponding relation or an event which brought about a change in a corresponding relation. Finally, 'scene 10' is selected from the main scene screen 103 and the video segment corresponding to 'scene 10' is displayed in the main screen 104.
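The relation tree described above, with a constant relation at the top level and the variable relations beneath it, might be sketched as follows. All class names, relation names and scene identifiers are illustrative assumptions, not part of the described scheme.

```python
# Hypothetical sketch of the relation tree behind the relation screen:
# a constant relation sits in the upper level, and the variable relations
# that develop over the story sit in the lower level. Names are made up.

from dataclasses import dataclass, field

@dataclass
class VariableRelation:
    name: str            # e.g. 'relation 2'
    scene_ids: list      # scenes of significant events for this relation

@dataclass
class ConstantRelation:
    other_character: str
    name: str                                    # most representative relation
    changes: list = field(default_factory=list)  # lower-level variable relations

# 'character 1' has a constant relation with 'character 2' that varies twice,
# so the browser would display the count '2' above 'character 2'.
rel = ConstantRelation(
    other_character="character 2",
    name="relation 1",
    changes=[VariableRelation("relation 2", ["scene 10", "scene 12"]),
             VariableRelation("relation 3", ["scene 20"])],
)
print(len(rel.changes))  # count displayed above 'character 2'
```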
The video browser as shown in Fig. 1 may be implemented by a video data structure as shown in Fig. 2, which represents video data based primarily upon relations between characters.
Referring to Fig. 2, a video data structure is a visual description scheme of a video and begins from a visual DS 201. The visual DS 201 is divided into a syntactic structure DS 202 and a semantic structure DS 203. The syntactic structure DS 202 includes data information corresponding to actual segments of a video, while the semantic structure DS 203 includes additional data information describing each segment of the video.
The syntactic structure DS 202 is organized into actual video segments in a segment DS 204 and corresponding temporal positions of each video segment in a time DS 205.
The semantic structure DS 203 is divided into event information in an event DS 206; object information in an object DS 207; and relation information, such as constant relations or variable relations between characters or between characters and events, in an event/object relation graph DS 208.
Particularly, the event DS 206 is organized into a Reference
to Segment 209 including information necessary for displaying a video segment corresponding to an event when a user selects the event, and an annotation DS 210 including text information which connects events with the actual positions of the events in the video and information for explaining the events in the video.
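As a concrete illustration, the path from a selected event to a playable interval, through the Reference to Segment and the time DS, might be sketched as follows. The data values and identifiers here are hypothetical, not part of the described scheme.

```python
# Illustrative sketch: resolving a selected event to its playable interval
# through the Reference to Segment and the time DS. All data is made up.

events = {  # event DS: Reference to Segment + annotation DS
    "scene 10": {"segment": "seg-42", "annotation": "the characters reconcile"},
}
segments = {  # segment DS, with temporal position from the time DS
    "seg-42": {"start": 3605.0, "end": 3710.5},
}

def resolve(event_name: str):
    """Follow the Reference to Segment to the (start, end) time of the event."""
    seg = segments[events[event_name]["segment"]]
    return seg["start"], seg["end"]

print(resolve("scene 10"))  # (3605.0, 3710.5)
```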
The object DS 207 is organized into an object annotation DS 211 including information for describing objects such as characters or places.
The event/object relation graph 208 is organized into an entity relation 212 with a return which allows a tree structure display of relations between characters. According to the present invention, to realize the character relation screen of Fig. 1, constant relations between characters are displayed in the top level of the tree while changes of relations between characters are displayed in a lower level of the tree.
The entity relation 212 is divided into a relation 213, a Reference to Object 216, and a Reference to Event 217. The relation 213 is organized into a type 214 including information on the nature of relations, and a name 215 including information on the titles of relations. For example, a nature of relation may be 'family' and a title of relation may be 'spouse.' The Reference to Object 216 connects related characters with each other, and the Reference to Event 217 connects events which show particular relations between related characters.

In the above video data structure, the notation above each data element, such as {0,1}, {0,*}, or {1,*}, indicates the allowed number of instances of the corresponding data. For example, the notation of {0,1} for the syntactic structure DS 202 indicates that the visual DS 201 can have zero or one syntactic structure DS. On the other hand, the notation of {0,*} for the segment DS 204 indicates that the syntactic structure DS 202 may have from zero to any number of segment DSs.
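The cardinality notation can be illustrated by a small validity check. The table entries and names below are illustrative assumptions drawn from the examples just given.

```python
# Hypothetical check of the cardinality notation of Fig. 2:
# {0,1} means zero or one child, {0,*} zero or more, {1,*} one or more.
# The entries below are illustrative, not a complete schema.

CARDINALITY = {
    "syntactic_structure": (0, 1),   # visual DS may have zero or one: {0,1}
    "segment": (0, None),            # syntactic structure DS: {0,*}
    "entity_relation": (1, None),    # example of {1,*}: one or more
}

def valid(child: str, count: int) -> bool:
    """True if `count` instances of `child` satisfy its cardinality."""
    lo, hi = CARDINALITY[child]
    return count >= lo and (hi is None or count <= hi)

print(valid("syntactic_structure", 1))  # True
print(valid("syntactic_structure", 2))  # False: {0,1} allows at most one
print(valid("segment", 0))              # True: {0,*} allows zero
```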
Using a video data structure as described above, a video browser, as shown in Fig. 1, may be implemented. Namely, the characters in a video are displayed on the character screen 101 based upon the object DS 207. If 'character 1' is selected from the character screen 101, other characters related to 'character 1' and changes in the relations are displayed by a tree structure in the relation screen 102 based upon entity relation 212 data, relation 213 data, type 214 data, name 215 data, and Reference to Object 216 data under the event/object relation graph DS 208.
At this time, if 'relation 2' with 'character 2' is selected from the relation screen 102, scenes of events corresponding to relation 2 between 'character 1' and 'character 2' are shown by key frames in the main scene screen 103 based upon the event DS 206 data and the Reference to Event 217 data. If 'scene 10' is selected from the main scene screen 103, the video segment corresponding to 'scene 10' is displayed in the main screen 104 based upon the Reference to Segment 209 data, the segment DS 204 data, and the time DS 205 data.

Fig. 3 is another example screen of a video story browser which can be implemented using the present invention. The video
story browser in Fig. 3 is based upon object-place relations, wherein a plot of a video is summarized by event paragraphs and information of a video segment corresponding to a displayed event paragraph is expressed by a relation graph of objects. Referring to Fig. 3, a relation screen 301 displays a relation graph between characters and places for a video segment corresponding to an event paragraph displayed in a story screen 302, where the event paragraph was selected by a user from a plot of the video composed of a set of event paragraphs. Also, the contents of video segments corresponding to a relation selected from the relation screen 301 are displayed by key frames with a brief explanation in a text screen 303.
In the video browser of Fig. 3, characters are displayed in a left column while places are displayed in a right column of the relation screen 301, wherein lines connecting the characters with the places indicate that the connected characters and places have relations. Also, information on at least one event scene corresponding to a relation selected from the relation screen 301 is displayed in the text screen 303 by key frames and text.

Fig. 4 is another video indexing method in accordance with the present invention for a video browser, and can be used as a video data structure to implement a video browser based on an object-place relation graph as shown in Fig. 3.
Referring to Fig. 4, a video structure 401 is divided into a syntactic structure 402 and a semantic structure 403. The syntactic structure 402 is organized into a segment set 404
including actual video segments and, for each video segment 405, a corresponding time 406 data and shot 407, such that an actual video segment can be displayed. Here, a shot is information of a representative frame. The semantic structure 403 is organized into an event 408 information of events and an object 409 information of characters. The event 408 is divided into a Reference to Segment 410 including information for connecting an event segment to a corresponding actual video segment, and an annotation 411 including information which summarizes each corresponding event. The object 409 is divided into a Reference to Segment 412 including information for connecting characters to corresponding events if the object is a character, an object type 413 including information which describes the type of each corresponding object, and an annotation 414 including information which summarizes each corresponding object in relation to the Reference to Segment 412 and the object type 413.
In the above video data structure, a return structure of the segment set 404, the event 408, and the object 409 enables a browser to display relations between objects by a tree structure.
Also, the object type 413 is data information which distinguishes whether an object is a character or a place, and to implement a video browser as shown in Fig. 3, the Reference to Segment 410, 412 including event and object information is used to connect objects such as characters and places to actual video segments corresponding to event segments. Namely, when an
object and a place refer to the same event segment, the object and the place have a relation. Therefore, if a relation between an object and a place is selected, a video segment corresponding to an event segment showing the selected object-place relation would be displayed using the information in segments 404, 405, 406 and 407 connected by the Reference to Segment 412 in the object 409 and the Reference to Segment 410 in the event 408.
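The implicit relation rule of Fig. 4 — a character and a place are related exactly when their Reference to Segment entries point at a common event segment — can be sketched as follows. The object names and event identifiers are hypothetical.

```python
# Sketch of the implicit object-place relation of Fig. 4: a character and a
# place have a relation when both refer to the same event segment through
# their Reference to Segment entries. Data and names are made up.

objects = {
    "character 1": {"type": "character", "segments": {"ev-1", "ev-3"}},
    "character 2": {"type": "character", "segments": {"ev-2"}},
    "place A":     {"type": "place",     "segments": {"ev-1"}},
}

def related(obj: str, place: str) -> set:
    """Event segments that both the object and the place refer to."""
    return objects[obj]["segments"] & objects[place]["segments"]

print(related("character 1", "place A"))  # {'ev-1'}: a relation exists
print(related("character 2", "place A"))  # set(): no relation
```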
Fig. 5 is another video indexing method in accordance with the present invention for a video browser based on an object-place relation graph as shown in Fig. 3. The video data structure of Fig. 5 includes a video structure 501 and is divided into a syntactic structure 502 and a semantic structure 503 as in the video data structure described with reference to Fig. 4.
Namely, the syntactic structure 502 is organized into a segment set 504, a video segment 505, time 506, and shot 507. The semantic structure 503 is organized into an event 508 and an object 509, where the event 508 is divided into a Reference to Segment 510 and an annotation 511, and the object 509 is divided into a Reference to Segment 512, an object type 513, and an annotation 514. Moreover, as in Fig. 4, the object type 513 is data for discriminating whether an object is a character or a place, and the segment set 504, the event 508, and the object 509 allow a tree structure display with information of different degrees of detail using a return structure. However, the video data structure of Fig. 5 further includes an additional event/object relation graph DS 515 for a more
efficient connection relation between objects and places. Particularly, the event/object relation graph DS 515 is organized into an entity relation 516 including information which connects related characters by a Reference to Object 517 and information which connects events to corresponding relations by a Reference to Event 518, thereby displaying connections between related characters, places, and events.
As described above, the connection relations between characters, places, and events are processed and displayed more efficiently than in the video data structure of Fig. 4. Also, the video data structure of Fig. 5 is more advantageous than the structure of Fig. 4 because the names of a relation between an object and a place can be identified. That is, if an object-place relation browsing is performed using the video data structure as shown in Fig. 5, and if an object and a place are selected, a video segment corresponding to an event which shows the selected object-place relation would be displayed using the information in segments 504, 505, 506 and 507. At this time, the Reference to Segment 510 and the Reference to Segment 512 connect with the actual video segments in segments 504-507 through the Reference to Object 517 and the Reference to Event 518.
However, since the event/object relation graph DS 515 is additionally included, the efficiency with respect to the storage space, i.e., data volume, for a video browser is lower than that of the structure of Fig. 4.
Fig. 6 is another video indexing method in accordance with
the present invention for a video browser which displays a plot of a video in event segments by event paragraphs. The video data structure of Fig. 6 includes a video structure 601 and is divided into a syntactic structure 602 and a semantic structure 603 as in the video data structure described with reference to Fig. 4. Accordingly, the syntactic structure 602 is organized into a segment set 604 including actual video segments and, for each video segment 605, a corresponding time 606 and shot 607, such that an actual video segment can be displayed. However, the semantic structure 603 is organized into an event 608 information which is divided into a Reference to Segment 609 including information for connecting an event segment to a corresponding actual video segment, and an annotation 610 including information which summarizes each corresponding event segment. In the above video data structure, the segment set 604 and the event 608 enable a tree structure display of object relations using a return structure. Particularly, the event 608 allows a tree structure display, wherein a set of event summaries in a top level composes a plot of a video and each of the event summaries, which is generally presented by one paragraph, is connected to a corresponding video segment.
Thus, in the video data structure as shown in Fig. 6, a paragraph selected from a plot of a video, and the video segment corresponding to the selected paragraph, can be displayed in various styles using an additional data structure 611. Such an additional data structure can be provided for video data browsing based on content as described with reference to the relation screen 301 and text screen 303 of Fig. 3.
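The event-paragraph plot of Fig. 6, in which the top-level event summaries together form the plot and each paragraph carries a Reference to Segment, might be sketched as follows. The paragraphs and segment identifiers are invented for illustration.

```python
# Sketch of the event-paragraph plot of Fig. 6: the set of top-level event
# summaries composes the plot, and each paragraph is connected to its video
# segment through a Reference to Segment. All content is illustrative.

plot = [
    {"paragraph": "The two families meet at the festival.", "segment": "seg-1"},
    {"paragraph": "A quarrel breaks the families apart.",   "segment": "seg-2"},
    {"paragraph": "The children reconcile the elders.",     "segment": "seg-3"},
]

def summary(plot) -> str:
    """The whole plot is the concatenation of the event paragraphs."""
    return " ".join(p["paragraph"] for p in plot)

def segment_for(plot, index: int) -> str:
    """Reference to Segment for the selected paragraph."""
    return plot[index]["segment"]  # handed to the segment/time data to play

print(summary(plot))
print(segment_for(plot, 1))  # seg-2
```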
Fig. 7 is another video indexing method in accordance with the present invention for a video browser, which represents the additional data structure 611 of Fig. 6 by a connection graph between related characters and places such that events corresponding to each relation can be viewed.
Referring to Fig. 7, a video DS 701 is divided into a syntactic structure DS 702 including actual segments of the video and a semantic structure DS 703 including additional data information describing each segment of the video.
The syntactic structure DS 702 is organized into actual video segments in a segment DS 704 and corresponding temporal positions of each video segment in a time DS 705.
The semantic structure DS 703 is organized into event information in an event DS 706; object information, such as characters or places, in an object DS 707; and relation information, such as constant relations between characters, variable relations between characters, or relations between characters and events, in an event/object relation graph DS 708.
The event DS 706 is divided into a Reference to Segment 709 including information necessary for displaying a video segment corresponding to an event when a user selects the event, and an annotation DS 710 including information which connects events with the actual positions of the events in the video and information
for explaining events in a video.
The segment DS 704 and the event DS 706 are organized by two levels, wherein level 0 includes a set of event segments which defines a video, and level 1 includes information of an event segment which corresponds to a selected relation between a character and place.
The object DS 707 is divided into a Reference to Segment 711 including information for connecting characters to corresponding event segments if the object is a character, an object type 712 including information which describes the type of each corresponding object, and an annotation DS 713 including information which summarizes each corresponding object in relation to the Reference to Segment 711 and the object type 712.
Also, the event/object relation graph DS 708 is organized into an entity relation 714 including information which connects related characters by a Reference to Object 715 and information which connects events to corresponding relations by a Reference to Event 716.
Therefore, according to the video data structure as shown in Fig. 7, if one of the paragraphs composing a plot of a video is selected and displayed on the story screen 302 of a video browser in Fig. 3, the event segments of level 0 corresponding to the selected paragraph would be selected. At this time, a graph of characters and places is displayed using the objects connected to the displayed event segments. If a relation is selected from the character-place graph, an event segment of level 1 corresponding to the selected relation would be selected. Here, a correspondence means an event segment of level 1 which is commonly connected with the character and the place.
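The two-level selection just described — a level 0 event segment per plot paragraph, and a level 1 event segment commonly connected with the selected character and place — might be sketched as follows. The level names, objects and event identifiers are invented for illustration.

```python
# Sketch of the two-level segment organization of Fig. 7, with made-up data:
# level 0 holds the event segments behind each plot paragraph; level 1 holds
# finer event segments, and the one commonly connected with a selected
# character and place is the segment to display.

level0 = {"paragraph 1": ["ev-a", "ev-b"]}   # plot paragraph -> level-0 events
level1_links = {                             # object -> level-1 event segments
    "character 1": {"ev-a-1", "ev-a-2"},
    "place A":     {"ev-a-1"},
}

def segment_for_relation(character: str, place: str):
    """Level-1 event segment commonly connected with the character and place."""
    common = level1_links[character] & level1_links[place]
    return next(iter(common), None)  # None when no common segment exists

print(segment_for_relation("character 1", "place A"))  # ev-a-1
```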
Fig. 8 is another video indexing method in accordance with the present invention for a video browser based on constant relations and changes in relations between characters according to the development of events. Also, objects and places are represented by an object-place relation graph, and event segments satisfying each of the object-place relations can be displayed. Furthermore, a plot of a video can be expressed by a set of event paragraphs, and video browsing can be performed based upon the content with relation to the video segments divided by the respective paragraphs.
The video data structure of Fig. 8 includes a visual DS 801 and is divided into a syntactic structure 802 and a semantic structure 803 as in the video data structure described with reference to Fig. 7.
Namely, the syntactic structure 802 is organized into a segment DS 804 and time DS 805. The semantic structure DS 803 is organized into an event DS 806, an object DS 807 and an event/object relation graph DS 808, where the event DS 806 is divided into a Reference to Segment 809 and an annotation 810 and the object DS 807 is divided into a Reference to Segment 811, an object type 812, and an annotation DS 813. Moreover, as in Fig. 7, the segment DS 804, the event DS
806, and the object DS 807 enable a tree structure display with information of different degrees of detail using a return structure, and the object type 812 is data for discriminating whether an object is a character or a place. Furthermore, a return of an entity relation 814 of the event/object relation graph DS 808 allows a tree structure display with subordinate relations. For example, constant relations between characters are placed in an upper level of a tree and changes in relations between characters are placed in a lower level of the tree.
However, in the video data structure of Fig. 8, the entity relation 814 is divided into a relation 815, a Reference to Object 818, and a Reference to Event 819. The relation 815 under the entity relation 814 is further divided into a type 816 including information on the nature of relations, and a name 817 including information on the titles of relations. For example, a nature of relation may be 'family' and a title of relation may be 'spouse,' as discussed above. As in Fig. 7, the Reference to Object 818 connects related characters, and the Reference to Event 819 connects characters with events corresponding to a particular relation.
As described above, according to the present invention, video browsing may be performed based upon character relations in a movie or a drama, such that a user may easily reproduce and watch a section related to a desired plot of the movie or the drama. Also, a video according to the present invention may be expressed by a relation graph between objects, places and events, such that a video indexing and browsing method based on the video content may be implemented. Moreover, a video according to the present invention may also be expressed by a set of event paragraphs, and video indexing and browsing based on content may be performed per segment represented by the respective paragraphs, such that clearer and more effective video indexing and browsing may be achieved.
The present invention may be applied to a VOD system in a broadcasting field for transmitting a particular portion to a user, such that only a selected video segment may be rapidly reproduced, thereby allowing an effective utilization of network resources. Finally, the present invention may be applied to domestic or broadcasting video reproducers for convenient video browsing of a desired segment of various video sources such as recorded movies, dramas, or sports games.
The foregoing embodiments are merely exemplary and are not to be construed as limiting the present invention. The present teachings can be readily applied to other types of apparatuses. The description of the present invention is intended to be illustrative, and not to limit the scope of the claims. Many alternatives, modifications, and variations will be apparent to those skilled in the art.