KR101658002B1 - Video annotation system and video annotation method - Google Patents

Video annotation system and video annotation method

Info

Publication number
KR101658002B1
Authority
KR
South Korea
Prior art keywords
video
annotation
sentence
paragraph
unit
Prior art date
Application number
KR1020150177008A
Other languages
Korean (ko)
Inventor
낭종호
최지수
최기석
Original Assignee
서강대학교산학협력단
Priority date
Filing date
Publication date
Application filed by 서강대학교산학협력단 filed Critical 서강대학교산학협력단
Priority to KR1020150177008A priority Critical patent/KR101658002B1/en
Application granted granted Critical
Publication of KR101658002B1 publication Critical patent/KR101658002B1/en

Classifications

    • G06F17/3082
    • G06F17/30843
    • G06F17/30849
    • G06F17/30858

Abstract

The present invention relates to a video annotation system. The video annotation system defines the video sentence and the video paragraph as new video indexing units and performs indexing in units of video sentences and video paragraphs. A video sentence is a video section in which one speaker delivers a voice description of one sentence across one or several shots. A video paragraph, composed of one or more video sentences, is a section in which one or more speakers discuss a single sub-topic; its boundary extends until the discussion changes to another topic. Because the indexing method according to the present invention has a hierarchical structure, one episode can be effectively searched by sub-topic, such as a particular medical explanation.

Description

Video annotation system and video annotation method

The present invention relates to a video annotation system, and more particularly, to a video annotation system in which the video sentence and the video paragraph are set as new indexing units of a moving picture, the moving picture is indexed into video sentences and video paragraphs, and a user is thereby enabled to search the video effectively.

Recently, owing to the multi-channelization of broadcasting, the variety and volume of broadcast content have become enormous, and users' access to broadcast content keeps increasing. In particular, interest in broadcast content is expected to grow day by day as content tailored to viewers' requirements, such as themes matching a viewer's interests or content aimed at information acquisition, is continuously produced. The spread of digital devices and supporting software has played a major role in this shift. Instead of 'passive broadcast viewing' according to a broadcast schedule, viewers now practice 'active broadcast viewing': through applications or web pages provided on devices such as smartphones and tablets, they can watch exactly the part they want. As high-quality broadcast content suited to viewers' requirements accumulates, big data is formed. If effective software on digital devices analyzes and processes broadcast content so that users can actively acquire information, the future consumption outlook for broadcast content is expected to be bright.

Meanwhile, broadcast content spans various genres, and among them the documentary is a genre whose purpose is information transmission. A documentary aims to convey information as realistically as possible while communicating actual events or scientific experiments and their results to viewers. Within the documentary genre there are various sub-genres depending on the theme: a human documentary deals with people's lives and events, while an animal documentary conveys information about animals by showing their habitats, food, and behavior. Although the composition varies with each theme and purpose, the main purpose of documentaries is to provide useful information to viewers. Historical documentaries, for example, convey indirect experience or knowledge to viewers through credible information such as documentary records or interviews with historians. Such a documentary genre can serve as a specialized knowledge archive if it is well organized so that it can be used when necessary. There is, however, a problem with getting information from documentaries: to obtain the necessary information, a viewer must watch passively, following the flow of the content from the beginning. That is, episodes can be retrieved by title alone, but the details within an episode are not indexed, annotated, or classified. Some documentary programs have been broadcast hundreds of times thanks to sustained popularity; even though such a program is broadcast as a short feature yet contains quite useful information, the difficulty of searching for information means that a systematic library of that information, built through indexing and annotation, is still lacking.

The documentary genre is useful content in that it conveys objective and realistic information. However, the information within one episode is not systematically classified, making it difficult to find information effectively.

Hereinafter, some of the various video annotation systems proposed to date will be described in detail.

'Ontology Building for Broadcasting Contents in IPTV Environment' (Hyung-Joo Kim, Jong-Duk Kim, and Dong-Sik Lee, 2008) designed an annotation model for a broadcast content retrieval system, subdividing a program into a library and writing annotation information according to the users' search environment. The paper addresses animal-themed documentaries among various broadcast contents, and its main design can be explained through its indexing method and its annotation model. First, the indexing units used were chapters and scenes. A scene, as defined in the paper, is usually 30 to 60 seconds long, while a chapter has a length of about 300 seconds. Scene boundaries are subjective and depend on the viewer, but since the actor in the annotation system is a content manager, a subjective yet more specialized segmentation can be expected from a person who knows how the program is constructed.

In the annotation model designed in the paper, the annotation information fields use only the most important information among the five or six principles used in news tagging: objects, behaviors, and sites. These three pieces of information can serve as a subject, a verb, and a place, and together can form a natural sentence. For example, the 'lion' in an animal documentary becomes the object, 'eating' becomes the behavior, and 'in the jungle' or 'on the tree' becomes the place. A scene thus carries information amounting to one sentence, and the user's search scenario is to find the scene through these three pieces of information. Among them, the object and behavior fields are organized as thesauri. A thesaurus is used in the object field because the content manager can annotate quickly using pre-classified codes, and because free text could introduce divergent entries for the same meaning, the thesaurus enables unified annotation. The object thesaurus has a maximum depth of 6 in the system; it can be switched to a more detailed thesaurus when targeting experts, with the flexibility to use a shallower thesaurus for general users. The behavior field likewise has a thesaurus of depth 2, grouping similar behaviors with different meanings so that users can search hierarchically. The data corresponding to each thesaurus form an ontology through mutual relations; that is, they form a multidimensional network structure among the data of several subjects, verbs, and objects. This ontology structure can also provide recommendations and related data based on user preferences.
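For illustration only (this sketch is not from the cited paper, and all identifiers are hypothetical), the object-behavior-site triple and a depth-limited object thesaurus could be modeled as follows:

```python
from dataclasses import dataclass

# A toy object thesaurus; the cited paper allows up to 6 levels of depth.
OBJECT_THESAURUS = {
    "animal": {
        "mammal": {"lion": {}, "elephant": {}},
        "bird": {"eagle": {}},
    }
}

def thesaurus_depth(tree, term, depth=1):
    """Return the depth of `term` in the thesaurus, or None if absent."""
    for key, sub in tree.items():
        if key == term:
            return depth
        found = thesaurus_depth(sub, term, depth + 1)
        if found:
            return found
    return None

@dataclass
class SceneAnnotation:
    obj: str       # subject, e.g. "lion"
    behavior: str  # verb, e.g. "eating"
    site: str      # place, e.g. "in the jungle"

scene = SceneAnnotation(obj="lion", behavior="eating", site="in the jungle")
assert thesaurus_depth(OBJECT_THESAURUS, "lion") == 3
```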

'Semantic Annotation and Retrieval of Documentary Media Objects' (D. Kanellopoulos, The Electronic Library, Vol. 30, No. 5, pp. 721-747, 2012) annotates semantic elements for effective search regardless of the documentary's sub-genre, and shows that documentary content can use a new index rather than the existing scene.

The paper uses the new indexes Documentary, DocumentaryClip, and DocumentaryPiece as its indexing units. Each index has its own annotation fields; the annotation boundaries follow the annotator's judgment of events, and the hierarchical structure gives readability to the metadata of the entire documentary. DocumentaryPiece, the minimum index unit, is a set of Actions performed by the Documentarist appearing in the documentary; lists of actions belonging to similar events become one DocumentaryPiece. A DocumentaryClip is a set of DocumentaryPieces that carries information about the nature of the Documentarist appearing in them, together with hierarchical information such as the contextual meaning relative to the preceding and following DocumentaryClips and genre characteristics. As the highest index, a Documentary is a set of DocumentaryClips and carries information such as the title, format, and duration of the content. This hierarchical structure classifies information at each level and has the advantage of allowing both a detailed view and an overview of one documentary in search.
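A minimal sketch of this three-level index, assuming illustrative field names (the paper defines its own metadata fields):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DocumentaryPiece:
    actions: List[str]             # actions of the documentarist/subjects

@dataclass
class DocumentaryClip:
    pieces: List[DocumentaryPiece]
    context: str                   # contextual meaning relative to neighbors

@dataclass
class Documentary:
    title: str
    format: str
    duration_sec: int
    clips: List[DocumentaryClip] = field(default_factory=list)
```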

The paper is freer in using the new indexes and representing semantic information, and its metadata deliberately does not conform to the MPEG-7 standard in order to carry more information; instead, it defines a DTD for the system and follows an XML Schema. Although this is convenient for recording annotation data from the commentator's point of view, it cannot be standardized, so scalability is a problem when applying it to other content or other retrieval systems. It is also necessary to define the relationships between metadata items through an ontology so that more relevant information can be returned to users.

"Ontologies for the Metadata Annotation of Stories" (V. Lombardo and A. Pizzo, "Ontologies for the Metadata Annotation of Stories," Proceedings of the Digital Heritage International Congress , pp. 153-160, ) Is effectively annotated and digital heritage is aimed at. Especially, it aims to include contents such as drama, opera, and theater which have original works in text, as well as object as well as semantic information by scene.

The indexing method in the annotation system of that paper is as follows. The index unit is the scene in which an event takes place. Because the content belongs to story-driven genres, event transitions, marked by changes of place or of the characters in a conversation, are clearer than in the documentary genre. Next, the annotation model defines the objects and semantic elements of the content using an ontology, and each field of the ontology forms relations with other fields, which makes the meanings of and relationships between fields visible. And because the annotation centers on the characters who appear, it contains, unlike a documentary, semantic information such as a character's emotions and psychological state. One can see that the field composition is adapted to express story-centered content information effectively.

The paper additionally mentions visualization of the information returned to the user. Because temporal change is important in stories, event plans are composed for each character so that users can understand the annotation information over time. As this shows, the structure of the information returned from a search also matters for user convenience and for the characteristics of the content.

Hereinafter, the conventional indexing method using shots and scenes will be described with respect to hierarchical index design.

An episode of <The Secrets of Life History>, a medical broadcast content, must be divided into parts containing its main scenes so that users can search those parts and obtain information. That is, the indexing process for the entire video must be performed first, so that partial videos can be annotated and the annotation data can be searched. For video content, the shot is basically used as the minimum segmentation unit.

In video content, a shot usually refers to the interval from when the camera starts recording until it stops. Shots therefore mark the boundaries of screen transitions and can serve as the basic unit for dividing video content. However, indexing the content of <The Secrets of Life History> with shots has some disadvantages.

FIG. 1 is a conceptual diagram illustrating a problem of using a conventional shot index. Referring to FIG. 1, a loss of context in the voice data occurs. ① In <The Secrets of Life History>, content frequently develops through a narrator: when explaining a disease or introducing the performers, various shots are used to convey the content. In other words, since the narrator uses various shots while delivering a voice description about one sentence long, indexing such scenes in shot units splits the voice information, making it hard to know what content is being conveyed. ② Also, when the annotator ignores the voice information and annotates only from the video information, incorrect annotation data is generated. Therefore, in <The Secrets of Life History>, not only the video information but also the voice information is important, and the content should be indexed in units that can contain both.

Next, scene-unit indexing can be considered. A scene is a group of consecutive shots, physically larger than a shot, and users can search more effectively if content is hierarchically indexed using both shots and scenes. Semantically, one scene can be defined by changes in the place or event being filmed. Taking a change of place as the scene boundary may be more precise, whereas taking a change of event as the boundary can be subjective, depending on the person who delimits the scenes.

FIG. 2 is a conceptual diagram explaining the problem when event transitions are taken as scene boundaries under the conventional indexing method. As shown in FIG. 2, when different annotators index <The Secrets of Life History>, the subjective nature of event-based scene boundaries means that one episode may be divided into segments of different meaning and size. If scene boundaries are instead drawn by place then, as described for shots above, certain scenes become ambiguous, because the content continuously uses voice-supported material that is not visually unified and can cross places. Therefore, when indexing content scene by scene, the rules must be agreed among annotators so that users can search effectively.

Accordingly, the present invention proposes a video annotation system that prevents context loss in the audio information of a moving picture and enables annotators to annotate more clearly defined content.

Korean Patent Publication No. 10-2002-0009757
Korean Patent Laid-Open Publication No. 10-2014-0051412
Korean Patent Registration No. 10-1398700

An object of the present invention is to provide a video annotation system and a video annotation method capable of effectively indexing and annotating the information contained in content.

It is another object of the present invention to provide a video annotation system and a video annotation method that set video sentences and video paragraphs as new units for indexing and annotating moving pictures, index a moving picture in units of video sentences and video paragraphs, and annotate it accordingly.

According to a first aspect of the present invention, there is provided a video annotation system that divides a video into video sentences and video paragraphs, indexes the video in video sentence units and video paragraph units, and annotates the indexed video sentences and video paragraphs, wherein the video sentence is set to a video section consisting of one or more shots that form a single sentence in a voice description by a single speaker, the video paragraph is set to a video section composed of one or more video sentences on a common subject, and the annotation is annotation data having a taxonomy-based structure designed in consideration of the specialized information of the moving picture's content.

In the moving picture annotation system according to the first aspect, the moving picture annotation system may include: a video control unit for loading a moving picture and generating a shot index file for the moving picture; a script control unit for analyzing the voice data of the moving picture to generate a script file composed of text data; a video reproducing unit for reproducing shots in shot-sequence order using the shot index file; a video sentence indexing unit for forming and indexing video sentences using the shot index file and the script file; a video paragraph indexing unit for grouping the video sentences to form and index video paragraphs; a video sentence annotation unit for annotating the indexed video sentences and generating metadata for the video sentences; and a video paragraph annotation unit for annotating the video paragraphs and generating metadata for the video paragraphs, so as to index and annotate the moving picture into video sentences and video paragraphs.

In the video annotation system according to the first aspect, the video control unit preferably loads the moving picture to be indexed, performs shot boundary detection on the loaded moving picture, segments the moving picture by shot boundaries, and generates and stores the shot index file. The shot index file may include time information and frame information for the boundary of each shot.

In the video annotation system according to the first aspect, it is preferable that the script control unit fetches a caption file for the moving picture and stores the caption file in the script file.

In the video annotation system according to the first aspect, it is preferable that the video sentence indexing unit sets a video sentence by grouping shots based on a sentence unit of a script file, and indexes the set video sentences.

In the video annotation system according to the first aspect, it is preferable that the video paragraph indexing unit groups the video sentences according to a common subject to set a video paragraph.

In the video annotation system according to the first aspect, it is preferable that the video sentence annotation unit writes the video sentence type, the specialized scene annotation information, and the specialized frame annotation information for each video sentence.

In the moving picture annotation system according to the first aspect, the video paragraph annotation unit preferably writes a video paragraph type and video paragraph keyword information for each video paragraph.

In the video annotation system according to the first aspect, the video annotation system may further comprise a user interface unit allowing a user to access each unit of the video annotation system, wherein the user interface unit comprises: a video control interface unit connected to the video control unit; a script control interface unit connected to the script control unit; a video player interface unit connected to the video playback unit; a video sentence indexing interface unit connected to the video sentence indexing unit; a video paragraph indexing interface unit connected to the video paragraph indexing unit; a video sentence annotation interface unit connected to the video sentence annotation unit; and a video paragraph annotation interface unit connected to the video paragraph annotation unit.

A video annotation method according to a second aspect of the present invention includes the steps of: (a) loading a moving picture and generating a shot index file for the moving picture; (b) analyzing the audio data of the moving picture to generate a script file composed of text data; (c) reproducing shots in shot-sequence order using the shot index file; (d) forming and indexing video sentences using the shot index file and the script file; (e) grouping the video sentences to form and index video paragraphs; (f) annotating the indexed video sentences and generating metadata for the video sentences; and (g) annotating the video paragraphs and generating metadata for the video paragraphs, thereby indexing and annotating the video into video sentences and video paragraphs.

In the video annotation method according to the second aspect, the step (a) may include loading the moving picture to be indexed, performing shot boundary detection on the loaded moving picture, segmenting the moving picture by shot boundaries, and generating and storing the shot index file, and the shot index file may include time information and frame information for the boundaries of each shot.

In the moving picture annotating method according to the second aspect, it is preferable that the step (d) sets a video sentence by grouping the shots based on the sentence unit of the script file, and indexes the set video sentences.

In the moving picture annotation method according to the second aspect, it is preferable that the step (e) sets a video paragraph by grouping the video sentences according to a common theme.

The moving picture annotation system according to the present invention can index the audio information of the MC (Master of Ceremonies) or the narrator, which carries the objective explanation that is particularly important in documentary videos, without loss of context. In addition, the proposed index units, video sentences and video paragraphs, provide structural consistency because annotators can assign regular annotation data and clearly distinguish the boundaries between indexes.

In addition, the moving picture annotation system according to the present invention can achieve high accessibility by constructing the annotation model around the information users want, based on content analysis and user scenario analysis. The taxonomy structure used for annotation modeling not only delimits the range of annotation information clearly but also gives annotators and users high readability and usability through its hierarchical structure. Therefore, it can solve the problems of the existing 'passive broadcast viewing' and increase user accessibility through content consumption based on rich annotation information.

A video sentence, which carries both voice and video information, appears in several recurring forms in episodes of <The Secrets of Life History>. If these forms of video sentences can be categorized into common types, classification becomes easy and users can find information easily. In particular, since more than 100 video sentences occur in each episode, it is difficult for users to get an overview, and hard to find desired information, if they are not categorized. However, because video sentences have distinct forms and can be classified, all video sentences can be annotated and browsed without missing indexes in search. The reasons video sentences take distinguishable forms are as follows. First, <The Secrets of Life History> has health as its subject, so its purpose is clear and the composition of the content is limited. Second, considering user accessibility, content with high utilization value in search can be classified, so the categories can be bounded and video sentences can be classified by form.

FIG. 1 is a conceptual diagram illustrating a problem of using a conventional shot index.
FIG. 2 is a conceptual diagram for explaining a problem in the case where event switching is performed on the scene boundary according to the conventional indexing method.
FIG. 3 is a schematic diagram showing the structure of a video sentence according to the present invention.
FIG. 4 illustrates an advantage of using a video sentence index according to the present invention.
FIG. 5 is a schematic diagram showing the structure of a video paragraph according to the present invention.
FIG. 6 shows an example of the content of a video sentence and a video paragraph according to the present invention.
FIG. 7 is a block diagram of a moving picture annotation system according to a preferred embodiment of the present invention.
FIG. 8 is a photograph of a screen according to an example of a user interface unit in a video annotation system according to a preferred embodiment of the present invention, in order to explain a user interface unit.
FIG. 9 is a view illustrating a video sentence annotation interface unit of a user interface unit in a video annotation system according to a preferred embodiment of the present invention, which captures a screen according to an example of a video sentence annotation interface unit.
FIG. 10 is a view showing an example of a video document annotation interface unit in order to explain a video document annotation interface unit of a user interface unit in a video annotation system according to a preferred embodiment of the present invention.
FIG. 11 is a block diagram showing an entire annotation and search system for information search according to the present invention.
FIG. 12 is a graph exemplarily showing the video sentence type taxonomy for the broadcast content <The Secrets of Life History>, FIG. 13 is a graph exemplarily showing the video paragraph type taxonomy, and FIG. 14 is a graph exemplarily showing the video paragraph keyword taxonomy.
FIG. 15 is a table comparing the features of conventional broadcast content annotation methods and the annotation method according to the present invention.

The moving picture annotation system according to the present invention defines the video sentence and the video paragraph as new video indexing units and performs indexing in units of video sentences and video paragraphs. A video sentence is a video section in which one speaker delivers a voice description of one sentence across one or several shots. A video paragraph, composed of one or more video sentences, is a section in which one or more speakers discuss a single sub-topic; its boundary extends until the discussion changes to another topic. Because the indexing method according to the present invention has a hierarchical structure, one episode can be effectively searched by sub-topic, such as a particular medical explanation. In particular, the video annotation system according to the present invention defines a medical taxonomy in consideration of users' information access intent and content-specific information, and annotates on the basis of this taxonomy. That is, video sentences are annotated by classifying them into several video sentence types according to characters, subjects, and screen contents, and video paragraphs are annotated with a Video Paragraph Type and Video Paragraph Keywords. The video paragraph type and keyword structures can be continuously added to and extended by annotators.

In addition, in the moving picture annotation system according to the present invention, the annotation method is designed based on an analysis of the characteristics of the content, so that it concentrates on the information users actually want, and a taxonomy is used to classify and structure the annotation information. The present invention implements a new annotation tool specialized for the content through this indexing and annotation method, and defines the structure of the metadata generated by the annotation tool.

Hereinafter, a configuration and operation of a video annotation system according to a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.

First, the video sentence and the video paragraph, which the present invention newly defines as indexing units, will be described in detail. FIG. 3 is a schematic diagram showing the structure of a video sentence according to the present invention. Referring to FIG. 3, a video sentence conceptually refers to a section in which one speaker delivers a voice description of one sentence unit across one or several shots. A video sentence's boundary falls where the speech changes to another speaker; when the same speaker continues the voice description, the boundary falls where the speaker moves on to different content.

FIG. 4 illustrates an advantage of using a video sentence index according to the present invention. Referring to FIG. 4, when the video sentence is used as the indexing unit, the audio information is preserved and video and audio information can be reflected simultaneously.

Next, the video paragraph according to the present invention is defined to complement the shortcomings of the scene and to form a hierarchical structure with video sentences that induces effective retrieval. FIG. 5 is a schematic diagram showing the structure of a video paragraph according to the present invention. Referring to FIG. 5, a video paragraph is a segment obtained by dividing the content of a moving picture; it is composed of one or more consecutive video sentences, and a video paragraph is formed from what those video sentences have in common. Conceptually, a video paragraph is a section in which several speakers talk about one sub-topic, bounded where the discussion changes to another topic. The boundaries of the video paragraph index categorize sub-topics by considering frequently repeated patterns and the effective retrieval of the video sentences' contents, and annotators can perform more precise indexing than with subjective scenes by following the structure of this category.

On the other hand, the annotation model is composed of annotation data having a structure based on a taxonomy, which is designed in consideration of specialized information of the contents of a moving image.

FIG. 6 shows an example of the content of a video sentence and a video paragraph according to the present invention. Referring to FIG. 6, one episode of the medical documentary broadcast program <The Secrets of Life History> can be divided into a plurality of video paragraphs, each covering a sub-topic, and one video paragraph consists of video sentences that share a common sub-topic but have different content types. Because episodes can be classified into video paragraphs and video sentences, content can be structured for search at both a large and a small level of granularity. Video paragraphs also help the user get an overview of video sentences when video sentences are over-segmented, which shows the benefit of a hierarchical structure for search.

FIG. 7 is a block diagram of a moving picture annotation system according to a preferred embodiment of the present invention.

Referring to FIG. 7, the video annotation system 10 according to the present invention includes a user interface unit 100, a video control unit 120, a script control unit 130, a video playback unit 140, a video sentence indexing unit 150, a video paragraph indexing unit 160, a video sentence annotation unit 170, and a video paragraph annotation unit 180, and it indexes any video into video sentences and video paragraphs and writes annotations. Hereinafter, the operation of each unit constituting the video annotation system 10 will be described in more detail.

FIG. 8 is a screen capture of an example of the user interface unit in the video annotation system according to a preferred embodiment of the present invention. Referring to FIG. 8, the user interface unit 100 allows a user to access each unit of the video annotation system. The user interface unit 100 includes a video control interface unit 102 connected to the video control unit, a script control interface unit 103 connected to the script control unit, a video player interface unit 104 connected to the video playback unit, a video sentence indexing interface unit 105 connected to the video sentence indexing unit, a video paragraph indexing interface unit 106 connected to the video paragraph indexing unit, a video sentence annotation interface unit 107 connected to the video sentence annotation unit, and a video paragraph annotation interface unit 108 connected to the video paragraph annotation unit.

The video control unit 120 includes a video loader for loading the video file to be indexed and a shot boundary detection module. The shot boundary detection module detects the shot boundaries of the moving picture file loaded by the video loader, segments the moving picture file at those boundaries, and generates and stores a shot index file 122.

The shot index file includes time information and frame information for the boundary of each shot.
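A minimal sketch of how such a shot index file could be produced, assuming OpenCV histogram comparison for cut detection and a JSON output format (the patent does not prescribe a particular detection algorithm or file format):

```python
import cv2
import json

def build_shot_index(video_path, threshold=0.5, out_path="shots.json"):
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    # First shot starts at frame 0 / time 0.0.
    shots, prev_hist, frame_no = [{"frame": 0, "time": 0.0}], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Low correlation between consecutive histograms suggests a cut.
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                shots.append({"frame": frame_no, "time": frame_no / fps})
        prev_hist, frame_no = hist, frame_no + 1
    cap.release()
    # Each entry records the time and frame information of a shot boundary.
    with open(out_path, "w") as f:
        json.dump(shots, f, indent=2)
    return shots
```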

The script control unit 130 includes a PCM module and a script writer/loader, and generates and stores a script file 132. The PCM module analyzes the audio PCM data of the moving picture and generates and stores a script file 132 composed of text data. The script writer/loader may load a subtitle file for the moving picture and store it in the script file, and also allows the script to be checked and modified through a user window.
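As one possible sketch of the script writer/loader path, assuming the caption file is in SRT format (the patent does not specify the caption format):

```python
import re

def load_srt(path):
    """Parse an .srt caption file into (start_sec, end_sec, text) entries."""
    def to_sec(t):  # "HH:MM:SS,mmm" -> seconds
        h, m, s = t.replace(",", ".").split(":")
        return int(h) * 3600 + int(m) * 60 + float(s)
    entries = []
    with open(path, encoding="utf-8") as f:
        blocks = re.split(r"\n\s*\n", f.read().strip())
    for block in blocks:
        lines = block.strip().splitlines()
        if len(lines) < 3:  # expect: index, time line, text line(s)
            continue
        start, end = [to_sec(t.strip()) for t in lines[1].split("-->")]
        entries.append((start, end, " ".join(lines[2:])))
    return entries
```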

The video playback unit 140 plays back shots in shot-sequence order using the shot index file 122.

The video sentence indexing unit 150 forms and indexes video sentences using the shot index file and the script file. That is, the video sentence indexing unit 150 sets a video sentence by grouping shots based on a sentence unit of a script file, and indexes the set video sentences.
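A minimal sketch of this grouping rule, with assumed data shapes (sorted shot start times and timed script sentences); the patent does not fix the exact formats:

```python
def index_video_sentences(shot_times, script_entries):
    """shot_times: sorted shot start times in seconds;
    script_entries: (start, end, text) tuples, one per spoken sentence."""
    sentences = []
    for start, end, text in script_entries:
        # Shots that start while the sentence is being spoken.
        shots = [i for i, t in enumerate(shot_times) if start <= t < end]
        # Also include the shot already running when the sentence starts.
        running = max((i for i, t in enumerate(shot_times) if t <= start),
                      default=None)
        if running is not None and running not in shots:
            shots.insert(0, running)
        sentences.append({"start": start, "end": end,
                          "text": text, "shots": shots})
    return sentences

# Example: two narrator sentences spoken over five shots.
shot_starts = [0.0, 3.2, 7.5, 12.1, 18.0]
script = [(0.5, 8.0, "Narrator sentence one."),
          (8.0, 15.0, "Narrator sentence two.")]
print(index_video_sentences(shot_starts, script))
```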

The video paragraph indexing unit 160 groups the video sentences according to a common subject to form a video paragraph and index video paragraphs.
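A sketch of the paragraph-grouping step, under the assumption that a common-subject label is available per video sentence (e.g., supplied by the annotator); the patent leaves the grouping criterion at the common-topic rule described above:

```python
def index_video_paragraphs(sentences, subjects):
    """Group consecutive video sentences that share a subject label."""
    paragraphs, current = [], None
    for sent, subj in zip(sentences, subjects):
        if current is None or subj != current["subject"]:
            current = {"subject": subj, "sentences": []}
            paragraphs.append(current)
        current["sentences"].append(sent)
    return paragraphs

paras = index_video_paragraphs(["s1", "s2", "s3", "s4"],
                               ["diabetes", "diabetes", "diet", "diet"])
# -> two paragraphs: one on "diabetes" (s1, s2), one on "diet" (s3, s4)
```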

The video sentence annotation unit 170 writes annotations on the indexed video sentences and generates video sentence metadata based on the annotations. FIG. 9 is a screen capture of an example of the video sentence annotation interface unit of the user interface unit in the video annotation system according to a preferred embodiment of the present invention. Referring to FIG. 9, the annotations written by the video sentence annotation unit for each video sentence may include a video sentence type, specialized scene annotation information, and specialized frame annotation information.

The video paragraph annotation unit 180 writes annotations on the video paragraphs and generates video paragraph metadata based on the annotations. FIG. 10 is a screen capture of an example of the video paragraph annotation interface unit of the user interface unit in the video annotation system according to a preferred embodiment of the present invention. Referring to FIG. 10, the annotations written by the video paragraph annotation unit for each video paragraph include a video paragraph type and video paragraph keyword information.

Hereinafter, the application of the video annotation system according to the present invention, configured as described above, to the medical broadcast content <The Secrets of Life History> will be described.

First, analyzing the characteristics of the broadcast content <The Secrets of Life History>: the first characteristic is '① massive episode data', and the second is '② content on a health theme'. The third is '③ a patterned content structure': because the broadcaster intends to deliver health information in a form that a wide range of viewer ages can easily absorb, the program repeats content of a certain structure. The fourth is '④ diverse and useful information composition': graphic scenes that depict organs inside the body that are difficult to see with the naked eye and the structure and behavior of microscopic viruses, reusable data with reliable sources (graphs, tables, etc.), and interviews with experts.

After analyzing the characteristics of <The Secrets of Life History>, it is necessary to grasp the intent with which users and viewers access the content. The content analysis and the content access intent described above are combined to model a content descriptor, and the overall system is explained based on this content descriptor modeling.

FIG. 11 is a block diagram showing an entire annotation and search system for information search according to the present invention.

Referring to FIG. 11: ① an episode of <The Secrets of Life History> in the Contents Database is first input to the annotation system. ② The episode is segmented into shots. In the present system, shots are used as the smallest unit for indexing, not as the smallest unit for returning results in the search system. ③ Therefore, to produce the video search units claimed in the present invention, a grouping process builds the newly proposed units, video sentences and video paragraphs, from the shot information. ④ The annotator inputs information in accordance with the annotation structure. ⑤ The input information is stored in the Annotation Database through the annotation tool's metadata processing. The information in the Annotation Database is synchronized with the Contents Database and used together in the search system. Ⓐ The search system receives a search query, accesses the Annotation Database and the Contents Database, and displays the returned information to the user.

One episode of <The Secrets of Life History> must be divided into its main scene parts so that users can search those parts and obtain information. That is, the indexing process for the entire video must be performed first, so that partial videos can be annotated and the annotation data can be searched. Generally, the shot is used as the minimum segmentation unit for video content. However, when the search unit for an episode of <The Secrets of Life History> is the shot, it is difficult to understand the meaning of a scene, which is why this chapter newly defines and proposes the index units video sentence and video paragraph. The reason for claiming new units is that the content can be effectively divided into meaning units, the content structure is clear, and patterns repeat in the new unit forms, so users can search information effectively through video sentence and video paragraph unit search.

In <The Secrets of Life History>, loss of context in the voice data occurs when the content is divided into shots. In this program, content often develops through a narrator: when explaining a disease or introducing the performers in advance, various shots are used to convey the content. In other words, since the narrator uses various shots while delivering a voice description about one sentence long, indexing such scenes in shot units splits the voice information, making it hard to know what content is being conveyed. In addition, when the annotator ignores the voice information and annotates only from the video information, incorrect annotation data is generated. Therefore, in <The Secrets of Life History>, not only the video information but also the voice information is important, and the content should be indexed in units that can contain both. Moreover, when indexing content scene by scene, the rules must be agreed among annotators so that users can search effectively.

Taking these characteristics into consideration, the index units are composed of video sentences and video paragraphs, thereby preventing the loss of context in the voice information and helping annotators annotate more clearly. For example, an interview is mostly composed of one shot, and since an interview scene by itself is enough to reveal the intent of the scene, one shot can be one video sentence. Video sentences are formed as shown in FIG. 3, and video paragraphs as shown in FIG. 5. One episode of <The Secrets of Life History> can be divided into several video paragraphs, each with its own sub-topic, and one video paragraph can be divided into video sentences that share a common sub-topic but have different content types. Because episodes can be classified into video paragraphs and video sentences, content can be structured for search at both a large and a small level of granularity. Video paragraphs also help the user get an overview of video sentences when video sentences are over-segmented, which shows the benefit of a hierarchical structure for search.

Reflecting the characteristics of <The Secrets of Life History>, episodes are divided by defining and proposing the new index units, video sentences and video paragraphs. The divided indexes must now be filled with information through the annotation process so that they can be searched well. Since the video annotation system according to the present invention excludes automatic annotation based on extracted video or audio features and assumes manual annotation by annotators, the information input methods are ① keying in words or sentences freely and ② selecting data from a given category, which simplifies input. A model was designed that clearly reflects the advantages of ① and ②, so that extensible and content-specific information can be found. Constructing hierarchical indexes and classifying each index through the annotation model brings further advantages: massive index data can be systematically classified, and classification is easy when the character of the data is clear; minimal information annotations are possible on all indexes, so omissions in search can be avoided. To categorize these indexes, taxonomies form the core of the annotation and search system structures, and the taxonomy allows users to search effectively. However, useful information specialized to <The Secrets of Life History> appears in video sentences, and the composed taxonomy cannot always contain all of it. To solve this problem and to increase user accessibility to useful information, an additional annotation model was designed.

FIG. 12 is a graph exemplarily showing the video sentence type taxonomy for the broadcast content <The Secrets of Life History>, FIG. 13 is a graph exemplarily showing the video paragraph type taxonomy, and FIG. 14 is a graph exemplarily showing the video paragraph keyword taxonomy.

Next, for the special scenes whose specific information cannot be captured by the video sentence and video paragraph taxonomies alone, only the scenes with high retrieval value for users are selected. These include, for example, expert interviews, graphic explanations, and everyday health information scenes. (a) For an expert interview scene, information such as the expert's name, the hospital the expert belongs to, and the expert's medical department is annotated. (b) Since a graphic explanation scene usually refers to a body part and to a concept or object affecting it, a model capable of annotating body part names, concepts, and objects is designed. Finally, (c) for the everyday health information scene, a model is designed that can annotate food, exercise, and lifestyle as described above. Since the annotation elements in (a), (b), and (c) are too broad in scope to be fixed as taxonomies, they are all entered by annotator key-in, and the keyed-in list is designed so that it can be reused by annotators.
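A hedged sketch of these three key-in annotation models as data structures; the field names are assumptions for illustration, not the patent's identifiers:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ExpertInterview:                 # (a) expert interview scene
    expert_name: str
    hospital: str
    department: str

@dataclass
class GraphicExplanation:              # (b) graphic explanation scene
    body_parts: List[str]
    concepts: List[str] = field(default_factory=list)
    objects: List[str] = field(default_factory=list)

@dataclass
class LifestyleHealthInfo:             # (c) everyday health information scene
    foods: List[str] = field(default_factory=list)
    exercises: List[str] = field(default_factory=list)
    habits: List[str] = field(default_factory=list)
```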

Next, while specialized scenes are meaningful for users to acquire and use information through video, frames have their own distinct uses. The types of specialized frames include (a) medical device output photographs, (b) graphic images, and (c) statistical and academic data, which the user can employ as effective presentation material for persuasion or reference. (a) For a medical device output photograph, the annotated information is the name of the medical tool used and the subject of the output, reflecting the user's intent to reuse it. (b) For a graphic image, the annotation information is the body part revealed in the scene together with the concept and object. Finally, (c) statistical and academic data are annotated with the subject, source, year, and form of the data (graph, table, etc.).

The information required through the index structure and annotation model designed so far can now be summarized. The metadata, which is the output of the annotation system implemented through this design, is composed of class diagrams. First, an Episode class contains several VideoPara classes corresponding to video paragraphs, and each VideoPara class in turn contains VideoSen classes corresponding to several video sentences. The VideoPara class has an Enumeration class corresponding to the video paragraph type taxonomy and holds a list of video paragraph keyword taxonomy entries, each with its own Enumeration class. The VideoSen class likewise has an Enumeration class corresponding to the video sentence type taxonomy, and has classes for the content-specific scene (Char_) and frame annotation information (Achieve_).
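The class structure described above could be sketched as follows; the enum members stand in for the taxonomy entries of FIGS. 12 to 14 and are illustrative only:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List

class VideoParaType(Enum):       # video paragraph type taxonomy (illustrative)
    DISEASE_EXPLANATION = 1
    PATIENT_STORY = 2

class VideoSenType(Enum):        # video sentence type taxonomy (illustrative)
    NARRATION = 1
    EXPERT_INTERVIEW = 2
    GRAPHIC_EXPLANATION = 3

@dataclass
class VideoSen:
    sen_type: VideoSenType
    scene_annotation: dict = field(default_factory=dict)  # Char_ fields
    frame_annotation: dict = field(default_factory=dict)  # Achieve_ fields

@dataclass
class VideoPara:
    para_type: VideoParaType
    keywords: List[str] = field(default_factory=list)     # keyword taxonomy
    sentences: List[VideoSen] = field(default_factory=list)

@dataclass
class Episode:
    title: str
    paragraphs: List[VideoPara] = field(default_factory=list)
```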

FIGS. 8 to 10 show screen captures of the user interface unit, the video sentence annotation interface unit, and the video paragraph annotation interface unit based on the above description.

FIG. 15 is a table comparing the features of conventional broadcast content annotation methods and the annotation method according to the present invention.

Referring to FIG. 15, a new index is designed and used in the index utilization part. Existing video content mainly uses shots and scenes as video indexes, but shots cannot convey the proper meaning of a scene, causing loss of the narrator's contextual information in a documentary, and because the rules that scene boundaries can rely on are ambiguous, the size of the index can differ across multiple annotators. Method ① uses scenes for documentary genre indexing but does not exploit all the information in the documentary, because it uses only the acts of objects appearing in scenes as annotation information without considering the narrator's voice information. Method ②, by contrast, takes the nature of documentary content into account and uses semantic elements, annotating a new index, DocumentaryPiece, that contains the Actions of the Documentarist and commentators. The annotation method according to the present invention likewise confirms, through content analysis, the importance of the information delivered by the narrator in <The Secrets of Life History>, and takes the semantic transition of the voice information as the index unit. The video paragraph, the index above the video sentence, was defined to exploit the commonalities of video sentences and enable users to navigate the content in depth. Through the hierarchical structure of video paragraphs and video sentences, it is easy to access as much or as little information as the user wants, according to the characteristics of the information.

In the annotation modeling process, the annotation information structure follows the nature of the content and uses an extensible, changeable method. The method of using a field list for annotation data, ①, lets the annotator annotate quickly within a given range and prevents the problem of overly broad annotation information causing items to be missed in search. Its disadvantage, on the other hand, is that if the field list cannot vary, information can be distorted and detailed information cannot be captured. In the annotation method according to the present invention, to exploit the advantages of ① while overcoming its disadvantages, some annotation structures use field lists and others are extended through key-in. For the video sentence proposed in the present invention, the types that can be derived from the content are limited; moreover, if the types are over-divided, the annotation cost increases and information that users find hard to match can be generated. The video sentence types are therefore grouped by common meaning, expressed hierarchically, and composed as a taxonomy of no more than depth 4. With the built taxonomy, annotators can process annotations faster within more limited fields, and data can be generated for every video sentence without omission, so users can find information effectively. The video paragraph, on the other hand, composes its taxonomy from some restrictive fields and some extensible key-in fields. Content such as a food associated with a disease name cannot be restricted in advance, so a limited field cannot be set; instead, a structure extensible by annotators is used, because such content is important annotation information in <The Secrets of Life History>. These key-in forms are not only extensible but also practical, because, using the medical classification of disease names, they can be adapted professionally or popularly to the characteristics of the annotator group.

The metadata schema, like those of the other studies discussed, does not follow a metadata standard. Expressing the newly proposed indexes hierarchically increases the usability of the XML structure, and annotation model elements carrying information specific to <The Secrets of Life History>, which other research does not exploit, can be added; accordingly, the XML DTD of the system according to the present invention is defined without standardization. However, as other research points out, non-standardized metadata is difficult to use with other content utilization systems, and if additional annotation information is generated for the content, or if the annotation model using the existing taxonomy is changed, this can be costly.

Meanwhile, the moving picture annotation system according to the present invention can be implemented as a program executable on a computer or the like, and can be stored on a computer-readable recording medium as a program that a computer can read and execute.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it is to be understood that the invention is not limited to the disclosed embodiments; on the contrary, various changes and modifications may be made without departing from the spirit and scope of the invention, and the present invention may be embodied in many other specific forms without departing from its essential characteristics.

The moving picture annotation system according to the present invention can be used effectively in the video search field, because it can effectively support topic search over broadcast content files based on the structured annotation information generated through the system.

10: video annotation system
100: user interface unit
120: video control unit
130: script control unit
140: video playback unit
150: video sentence indexing unit
160: video paragraph indexing unit
170: video sentence annotation unit
180: video paragraph annotation unit

Claims (15)

A video controller for loading a moving image and generating a shot index file for the moving image;
A script control unit for analyzing the voice data of the moving picture to generate a script file composed of text data;
A video sentence indexing unit for forming and indexing video sentences using the shot index file and the script file;
A video paragraph indexing unit for grouping the video sentences to form a video paragraph and index video paragraphs;
A moving picture is divided into video sentences and video paragraphs and is indexed in video sentence units and video paragraph units,
Wherein the video sentence is set to a video section composed of one or more shots forming a single sentence in a voice description by a single speaker,
Wherein the video paragraph is set to a video section comprising one or more video sentences for a common subject.
2. The system of claim 1, wherein the video annotation system
A video reproducing unit for reproducing shots according to a shot sequence using the shot index file;
A video sentence annotation unit for annotating the indexed video sentences and generating metadata for the video sentences;
A video paragraph annotation unit for annotating the video paragraphs and generating metadata for the video paragraphs;
Wherein the video annotation system indexes and annotates a moving picture into video sentences and video paragraphs.
The video annotation system according to claim 1, wherein the annotation comprises annotation data having a structure based on a taxonomy designed in consideration of specialized information of the content of a moving picture.
The video annotation system according to claim 1, wherein the video control unit loads the moving picture to be indexed, performs shot boundary detection on the loaded moving picture, segments the moving picture by shot boundaries, and generates and stores a shot index file for the moving picture, wherein the shot index file includes time information and frame information for the boundary of each shot.
The video annotation system according to claim 1, wherein the script control unit fetches a caption file for the moving picture and stores the caption file in the script file.
The video annotation system according to claim 1, wherein the video sentence indexing unit groups shots based on the sentence units of the script file to set video sentences, and indexes the set video sentences.
The video annotation system according to claim 1, wherein the video paragraph indexing unit groups the video sentences according to a common theme to set a video paragraph.
The video annotation system according to claim 2, wherein the video sentence annotation unit writes a video sentence type, specialized scene annotation information, and specialized frame annotation information for each video sentence.
The video annotation system according to claim 2, wherein the video paragraph annotation unit writes a video paragraph type and video paragraph keyword information for each video paragraph.
The video annotation system according to claim 2, wherein the video annotation system further comprises a user interface unit allowing a user to access each unit of the video annotation system, and the user interface unit comprises:
A video control interface unit connected to the video control unit;
A script control interface unit connected to the script control unit;
A video player interface unit connected to the video player unit;
A video sentence indexing interface unit connected to the video sentence indexing unit;
A video paragraph indexing interface unit connected to the video paragraph indexing unit;
A video sentence annotation interface unit connected to the video sentence annotation unit;
and a video paragraph annotation interface unit connected to the video paragraph annotation unit.
A video annotation method comprising the steps of:
(a) loading a moving image and generating a shot index file for the moving image;
(b) analyzing the audio data of the moving picture to generate a script file composed of text data;
(c) reproducing shots according to a shot sequence using the shot index file;
(d) forming and indexing video sentences using the shot index file and the script file;
(e) grouping the video sentences to form a video paragraph and indexing the video paragraph;
wherein the moving picture is indexed into video sentences and video paragraphs.
12. The video annotation method according to claim 11, further comprising:
(f) annotating the indexed video sentences and generating metadata for the video sentences;
(g) annotating the video paragraphs and generating metadata for the video paragraphs;
Wherein the moving picture is indexed into a video sentence and a video paragraph, and annotations are written.
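Steps (f) and (g) require only that metadata be generated; the persistence format is left open. A minimal sketch, assuming the annotation dataclasses from the schema above and JSON as the container:

    import json
    from dataclasses import asdict

    def write_metadata(sentence_annotations: list, paragraph_annotations: list, path: str) -> None:
        """Persist sentence-level (f) and paragraph-level (g) metadata."""
        payload = {
            "video_sentences": [asdict(a) for a in sentence_annotations],
            "video_paragraphs": [asdict(a) for a in paragraph_annotations],
        }
        with open(path, "w", encoding="utf-8") as f:
            json.dump(payload, f, ensure_ascii=False, indent=2)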
13. The method of claim 11, wherein the step (a) includes loading a moving image to be indexed, performing shot boundary detection on the loaded moving image, segmenting the moving image into shots at the detected shot boundaries, and generating and storing a shot index file for the moving image,
Wherein the shot index file includes time information and frame information for boundaries of each shot.
14. The video annotation method according to claim 11, wherein the step (d) sets video sentences by grouping shots based on the sentence units of the script file, and indexes the set video sentences.
15. The video annotation method according to claim 11, wherein the step (e) sets a video paragraph by grouping the video sentences according to a common theme.

KR1020150177008A 2015-12-11 2015-12-11 Video annotation system and video annotation method KR101658002B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150177008A KR101658002B1 (en) 2015-12-11 2015-12-11 Video annotation system and video annotation method


Publications (1)

Publication Number Publication Date
KR101658002B1 true KR101658002B1 (en) 2016-09-21

Family

ID=57080240

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150177008A KR101658002B1 (en) 2015-12-11 2015-12-11 Video annotation system and video annotation method

Country Status (1)

Country Link
KR (1) KR101658002B1 (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20000007558A (en) * 1998-07-04 2000-02-07 구자홍 Moving picture searching system by summary to be written out as text
KR20000038290A (en) * 1998-12-05 2000-07-05 구자홍 Moving picture searching method and search data structure based on the case structure
KR20020009757A (en) 2000-07-26 2002-02-02 성주호 Digital Video Searching And Authoring Tool
KR20080112975A (en) * 2007-06-22 2008-12-26 서종훈 Method, system and recording medium storing a computer program for building moving picture search database and method for searching moving picture using the same
KR20140051412A (en) 2011-09-12 2014-04-30 인텔 코오퍼레이션 Annotation and/or recommendation of video content method and apparatus
KR101398700B1 (en) 2012-12-20 2014-05-30 인하대학교 산학협력단 Annotation system and method for video data
KR20150022088A (en) * 2013-08-22 2015-03-04 주식회사 엘지유플러스 Context-based VOD Search System And Method of VOD Search Using the Same


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180064013A (en) * 2016-12-05 2018-06-14 한국전자통신연구원 System and method for constructing of scene knowledge ontology based in domain knowledge ontology
KR101988601B1 (en) 2016-12-05 2019-06-13 한국전자통신연구원 System and method for constructing of scene knowledge ontology based in domain knowledge ontology
KR20190075177A (en) * 2017-01-04 2019-06-28 삼성전자주식회사 Context-based augmented ad
KR102319423B1 (en) * 2017-01-04 2021-10-29 삼성전자주식회사 Context-Based Augmented Advertising
US10433028B2 (en) 2017-01-26 2019-10-01 Electronics And Telecommunications Research Institute Apparatus and method for tracking temporal variation of video content context using dynamically generated metadata
KR20210083201A (en) * 2019-12-26 2021-07-06 주식회사 엠티이지 Apparatus and method for inferring surgery behavior based on image analysis of surgery video
KR102544629B1 (en) * 2019-12-26 2023-06-16 주식회사 엠티이지 Apparatus and method for inferring surgery behavior based on image analysis of surgery video
WO2022108299A1 (en) * 2020-11-17 2022-05-27 하대석 Image service providing device

Similar Documents

Publication Publication Date Title
Dasiopoulou et al. A survey of semantic image and video annotation tools
KR101658002B1 (en) Video annotation system and video annotation method
Bateman et al. Towards next-generation visual archives: i mage, film and discourse
Pereira et al. SAPTE: A multimedia information system to support the discourse analysis and information retrieval of television programs
Kurz et al. Semantic enhancement for media asset management systems: Integrating the Red Bull Content Pool in the Web of Data
Dunckley Multimedia databases: An object relational approach
Axenopoulos et al. I-search: a unified framework for multimodal search and retrieval
Goularte et al. M4Note: a multimodal tool for multimedia annotations
Lichtenstein et al. TIB's Portal for Audiovisual Media: Combining Manual and Automatic Indexing
Lee Taking context seriously: a framework for contextual information in digital collections
Farhadi et al. Creating a novel semantic video search engine through enrichment textual and temporal features of subtitled YouTube media fragments
de Campos et al. An integrated system for managing the andalusian parliament's digital library
Auffret et al. Managing full-indexed audiovisual documents: a new perspective for the humanities
Prié et al. AI-STRATA: A User-centered Model for Content-based description and Retrieval of Audiovisual Sequences
Geisler et al. Crowdsourcing the indexing of film and television media
La Barre et al. Film retrieval on the web: sharing, naming, access and discovery
Rehatschek et al. Vizard-an innovative tool for video navigation, retrieval, annotation and editing
Khan et al. On Annotation of Video Content for Multimedia Retrieval and Sharing
Manzato et al. An enhanced content selection mechanism for personalization of video news programmes
Sabol et al. Visualization Metaphors for Multi-modal Meeting Data.
Manzato et al. Supporting multimedia recommender systems with peer-level annotations
Fourati et al. A semiotic semi-automatic annotation for movie audiovisual document
Heggland OntoLog: Flexible management of semantic video content annotations
Khoja Thematic indexing in video databases
Badii et al. State-of-the-art for entity-centric repository and authoring environment for multimedia

Legal Events

Date Code Title Description
E701 Decision to grant or registration of patent right
GRNT Written decision to grant
FPAY Annual fee payment (payment date: 2019-07-01; year of fee payment: 4)