KR20140050535A - Apparatus and method for providing n screen service using group visual objects based on depth and providing contents service
- Publication number
- KR20140050535A (application number KR1020130113380A)
- Authority
- KR
- South Korea
- Prior art keywords
- group
- unit
- independent
- visual
- screen
- Prior art date
Classifications
- H04N21/42615 — Internal components of the client for processing the incoming bitstream, involving specific demultiplexing arrangements
- H04N21/6175 — Network physical structure or signal processing specially adapted to the upstream path of the transmission network, involving transmission via Internet
- H04N21/8146 — Monomedia components of content involving graphical data, e.g. 3D objects, 2D graphics
- H04N21/84 — Generation or processing of descriptive data, e.g. content descriptors
- H04N2213/005 — Aspects relating to the "3D+depth" image format
Abstract
Description
The present invention relates to an apparatus and method for providing a multimedia content service, and more particularly, to an apparatus and method for authoring visual objects into depth-based groups and providing the objects of each group through an N-screen service.
Today, not only 2D and 3D video and still images but also media such as 3D video games are serviced by real-time streaming or VOD download-and-play. Accordingly, application service technologies based on the extraction of media objects and object-unit coding under the MPEG-4 standard for image processing have been continuously developed.
Examples in this application service field based on the separation of media objects and object-based encoding under the MPEG-4 standard include MPEG-4 based content authoring technology (Sang-Wook Kim et al., Publication No. 2003-0037614, MPEG-4 content generation method and apparatus), image processing technology for object separation and extraction (Ko Jong-kuk et al., Publication No. 2012-0071226, Object extraction method and apparatus), and image processing methods for obtaining depth information (Park Ji-young et al., Publication No. 2012-0071219, 3D depth information acquisition device and method).
However, in the above-described conventional techniques, when visual objects such as a background, a person, and a car overlap in a 2D or 3D video or still-image scene, the viewer cannot see every object included in the scene completely. In other words, the visual objects hidden behind the overlapped area are not shown.
The present invention groups and authors independent visual (video or still image) objects in units of groups sharing a depth value, extracts each grouped visual object scene as a unit of interest that can interact with the user, and provides a service and an apparatus for viewing those scenes on various screens.
The present invention provides a method for providing an image service using two or more different types of screens in an N-screen service providing apparatus, the method comprising: separating and extracting independent visual objects having different depth values from an image; grouping the separated and extracted objects according to their depth values and authoring them as scenes for each group; and selectively reproducing one or more scenes authored for each group on two or more screens according to a user interaction event.
The present invention also provides an N-screen service providing apparatus using depth-based group visual objects, comprising: an independent visual object extraction unit for separating and extracting independent visual objects having different depth values from an image; a group unit visual object authoring unit for grouping the separated and extracted independent visual objects according to their depth values and authoring them as scenes for each group; and an N-screen unit for selectively reproducing one or more scenes authored for each group on two or more screens according to a user interaction event.
According to the present invention, in a digital signage service supporting a multi-screen environment such as multi-vision, the user can separate and extract only the objects of interest from a scene being played on one screen and view them again on an independent screen, so that a targeted advertising effect can be obtained.
FIG. 1 is a structural diagram of the MPEG-4 system reference model.
FIG. 2 is a configuration diagram of an N-screen service providing apparatus using depth-based group visual objects according to an exemplary embodiment.
FIG. 3 is a detailed configuration diagram of a group unit visual object authoring unit according to an exemplary embodiment.
FIG. 4 is a detailed configuration diagram of an N-screen unit according to an embodiment of the present invention.
FIGS. 5 to 7 are flowcharts illustrating a method for providing an N-screen service using depth-based group visual objects according to an embodiment of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the like elements throughout.
In the following description of the present invention, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present invention rather unclear.
The terms used throughout this specification are defined in consideration of their functions in the embodiments of the present invention and may vary according to the intention or practice of the user or operator; their definitions should therefore be based on the contents throughout this specification.
The present invention authors independent visual (video or still image) objects in group units, extracts one or more scenes of each grouped visual object as units of interest objects that can interact with the user, and provides a method and apparatus for viewing those scenes on respective screens. MPEG-4 is mainly used here as the international standard, since it provides a high compression rate through object-based coding of such visual objects as well as application services for digital video synthesis, manipulation, indexing, and retrieval.
FIG. 1 is a structural diagram of the MPEG-4 system reference model.
Referring to FIG. 1, the MPEG-4 system reference model synthesizes media objects, including their interaction functions, into a desired audiovisual scene, multiplexes the media data into a bitstream that guarantees quality of service (QoS), and transmits (2) the media content source (1) generated by synchronization to the receiving side. The receiving side then demultiplexes (3) the received media content source (1), composes (5) the decoded data into BIFS, video, audio, animation, text, and the like, and outputs it (7). The receiving side thus has a system structure in which the user can interact with the composed scene.
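As a rough illustration of the pipeline just described, the following Python sketch (hypothetical function and object names; the real MPEG-4 sync and delivery layers are far more involved) multiplexes media objects into a single stream on the sending side and demultiplexes and composes them on the receiving side:

```python
def transmit(media_objects):
    """Sender side: synthesize the media objects into one scene and
    multiplex them into a single tagged bitstream (a stand-in for the
    MPEG-4 sync/delivery layers)."""
    return [("stream", obj) for obj in media_objects]

def receive(bitstream):
    """Receiver side: demultiplex the bitstream and compose the
    recovered elementary streams into an output scene."""
    decoded = [payload for tag, payload in bitstream if tag == "stream"]
    return {"composed_scene": decoded}

# BIFS, video, audio, and text streams pass through the pipeline intact.
source = ["BIFS", "video", "audio", "text"]
print(receive(transmit(source)))
# → {'composed_scene': ['BIFS', 'video', 'audio', 'text']}
```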
In order to solve the object overlap problem in a system reference model such as MPEG-4, the present invention groups independent visual objects according to depth, authors each visual object scene in group units, lets the user interact with the grouped visual object scenes, and outputs them through various screens. In the present invention, for convenience of explanation, the system model is described using MPEG-4 as an example, but the present invention is not limited to MPEG-4.
FIG. 2 is a configuration diagram of an N-screen service providing apparatus using depth-based group visual objects according to an exemplary embodiment.
Referring to FIG. 2, an N-screen service providing apparatus using depth-based group visual objects according to the present invention includes an independent visual object extraction unit, a group unit visual object authoring unit, and an N-screen unit.
The independent visual object extraction unit separates and extracts independent visual objects having different depth values from an image such as a moving picture or a still image. According to an embodiment of the present invention, the independent visual object extraction unit assigns a depth value to each of the extracted independent visual objects.
The group unit visual object authoring unit groups the separated and extracted independent visual objects according to their depth values and authors them as scenes for each group. According to an embodiment, the independent visual objects may be grouped into as many groups as the number of screens.
The N-screen unit comprises two or more screens and selectively plays one or more scenes authored for each group according to a user interaction event. The scenes authored for each group may be streamed to the N-screen unit through a network.
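The depth-based grouping performed before authoring can be sketched in a few lines of Python. This is a minimal illustration with hypothetical names, assuming each extracted object already carries its designated depth value and that at most one group is produced per available screen:

```python
from collections import defaultdict

def group_objects_by_depth(objects, num_screens):
    """Group independent visual objects by their depth value.

    Each object is a (name, depth) pair; objects sharing a depth value
    form one group, and at most `num_screens` groups are kept so that
    each group scene can be played on its own screen.
    """
    groups = defaultdict(list)
    for name, depth in objects:
        groups[depth].append(name)
    # Keep the shallowest `num_screens` depth layers, one per screen.
    ordered = sorted(groups.items())[:num_screens]
    return {depth: names for depth, names in ordered}

objects = [("background", 3), ("car", 2), ("person", 1), ("tree", 2)]
print(group_objects_by_depth(objects, 3))
# → {1: ['person'], 2: ['car', 'tree'], 3: ['background']}
```

With only two screens, the same call would keep just the two shallowest groups, matching the idea of grouping by the number of screens.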
FIG. 3 is a detailed configuration diagram of a group unit visual object authoring unit according to an exemplary embodiment.
Referring to FIG. 3, in detail, the group unit visual object authoring unit includes an independent visual object setting unit, a group visual object setting unit, a scene composition tree manager, and a media file generation unit.
The independent visual object setting unit sets the space-time relationship information and the user interaction event information of one or more independent visual objects.
The group visual object setting unit groups one or more independent visual objects according to their depth values and sets the space-time relationship information and user interaction event information of the visual objects included in each group.
The scene composition tree manager generates a scene composition tree hierarchically composed of the set independent visual objects and the grouped visual objects.
The media file generation unit generates a media file by encoding the scene composition tree and the visual objects.
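The hierarchical scene composition tree can be pictured with a small sketch. The `SceneNode` class below is hypothetical; a real tree would also carry the space-time relationship and interaction event information set for each node:

```python
class SceneNode:
    """A node in the scene composition tree: a group node represents the
    objects grouped at one depth value, and its children are the
    independent visual object leaves belonging to that group."""
    def __init__(self, name, depth=None, children=None):
        self.name = name
        self.depth = depth
        self.children = children or []

def build_scene_tree(groups):
    """Build a two-level tree: root -> group nodes -> object leaves."""
    root = SceneNode("scene")
    for depth in sorted(groups):
        group = SceneNode(f"group_{depth}", depth=depth)
        group.children = [SceneNode(obj) for obj in groups[depth]]
        root.children.append(group)
    return root

tree = build_scene_tree({1: ["person"], 2: ["car", "tree"]})
print([g.name for g in tree.children])              # → ['group_1', 'group_2']
print([o.name for o in tree.children[1].children])  # → ['car', 'tree']
```

Encoding such a tree together with the objects themselves would then yield the media file mentioned above.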
FIG. 4 is a detailed configuration diagram of the N-screen unit according to an embodiment of the present invention.
Referring to FIG. 4, when a user action is given, the N-screen unit selectively plays one or more scenes authored for each group on the corresponding screens.
To this end, the N-screen unit receives the group object scenes streamed through the network, demultiplexes and decodes the received media, composes the decoded data into a scene, and renders and plays the composed scene on each screen.
FIG. 5 is a flowchart illustrating a method of providing depth-based group unit visual object scenes through an N-screen service according to an exemplary embodiment.
Referring to FIG. 5, in S510, one or more independent visual objects included in a moving image or a still image are automatically or semi-automatically extracted. In this case, according to an embodiment of the present invention, a depth value is designated for each independent visual object as it is extracted. For example, a depth value of '1' may be designated for an object located nearest to the viewer, and '2' may be designated for 'Object 2', which is located deeper.
In S520, the extracted independent visual objects are grouped according to depth, and the visual object scene is authored in group units. That is, one or more independent visual objects are grouped according to a specified depth value, and scenes are authored by setting the spatiotemporal relationship information and interaction events of the visual objects in group units. This will be described in detail with reference to FIG. 6 below.
Although not shown in the figure, the authored group objects are streamed to the N-screen unit through the network; the N-screen unit receives the streamed media objects, decodes them, composes the scene, and plays it through the N screens. In the present invention, the visual objects superimposed in the moving image or still image are grouped according to their depth values, and by unfolding them into the authored scenes through the N-screen service, the hidden visual objects can be clearly reproduced.
That is, as the visual object of interest is selected through interaction with the user in S530, the visual object of interest for each group may be played through the N screen in S540.
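The selection step can be sketched as follows — a hypothetical Python illustration in which choosing an object of interest routes its whole depth group to the next available screen, so the layers hidden behind it can be viewed separately:

```python
def play_on_screen(groups, selected_object, screens):
    """On a selection event, find the depth group containing the
    selected object and assign that group's scene to the next free
    screen. Returns None when no group contains the object."""
    for depth, names in groups.items():
        if selected_object in names:
            target = screens.pop(0)  # next available screen
            return {"screen": target, "depth": depth, "objects": names}
    return None

groups = {1: ["person"], 2: ["car", "tree"], 3: ["background"]}
print(play_on_screen(groups, "car", ["screen-2", "screen-3"]))
# → {'screen': 'screen-2', 'depth': 2, 'objects': ['car', 'tree']}
```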
Then, S510 and S520 will be described in more detail with reference to FIG. 6.
FIG. 6 is a flowchart illustrating a group unit visual object authoring step according to an exemplary embodiment.
Referring to FIG. 6, in S610, the group unit visual object authoring unit sets the space-time relationship information and user interaction event information of one or more independent visual objects.
In S620, the group unit visual object authoring unit groups the one or more independent visual objects according to their designated depth values.
In S630, the group unit visual object authoring unit sets the space-time relationship information and user interaction event information of the visual objects included in the current group.
In S640, the group unit visual object authoring unit determines whether the authoring of all groups is completed. As a result of the determination of S640, when the authoring of all the groups is not completed, the process proceeds to S650, in which the next group is selected before returning to S630.
However, when the authoring of all the groups is completed as a result of the determination in S640, in S660, the group unit visual object authoring unit generates a scene composition tree hierarchically composed of the set independent visual objects and the grouped visual objects.
In operation S670, the group unit visual object authoring unit encodes the scene composition tree and the visual objects.
In S680, the group unit visual object authoring unit generates a media file from the encoded scene composition tree and visual objects.
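The authoring loop above — group the objects, author each group in turn, then build and encode the scene — might be condensed as the following sketch. The field names and the sample interaction events (`select`, `move`) are hypothetical, and JSON stands in for the real media-file encoding:

```python
import json

def author_group_scenes(objects):
    """Group (name, depth) pairs by depth, attach per-group interaction
    event info for every group, and serialize the resulting scene
    description into a 'media file' payload."""
    groups = {}
    for name, depth in objects:                  # group by depth value
        groups.setdefault(depth, []).append(name)
    scene = []
    for depth in sorted(groups):                 # author each group in turn
        scene.append({"group": depth, "objects": groups[depth],
                      "events": ["select", "move"]})
    return json.dumps({"scene": scene})          # encode the scene tree

media = author_group_scenes([("person", 1), ("car", 2)])
print(media)  # one JSON document listing each depth group with its objects
```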
FIG. 7 is a flowchart illustrating a group unit visual object scene playing step according to an embodiment of the present invention.
Referring to FIG. 7, in S810, the N-screen unit plays a scene composed of group visual objects.
In S820, the N-screen unit determines whether a user interaction event for the scene composed of group visual objects has occurred.
As a result of the determination in S820, when there is a group visual object selection interaction, in S830 the N-screen unit moves the selected group visual object to an arbitrary N screen.
In S840, it is determined whether there is an independent visual object selection interaction. As a result of the determination in S840, when there is an independent visual object selection interaction, in S850 the user interaction event is applied to the selected independent visual object.
This allows the user to interact with only some of the grouped objects among several overlapping objects. In addition, when the service provider has intentionally hidden objects among the nested objects, a user action on a grouped object (such as dragging it with the mouse or moving it) can make the object hidden within it appear, after which the user's next event and action are processed again.
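The drag-to-reveal behavior can be sketched with a front-to-back stack of overlapped objects (a hypothetical model; real scenes would track positions and full event state): dragging the front object aside reveals the one hidden directly behind it, which can then receive the next event.

```python
def handle_drag(stack, dragged):
    """Overlapped objects are kept front-to-back; dragging the front
    object aside removes it from the stack and reveals the object
    hidden directly behind it. Dragging any other object is ignored,
    since only the front object responds to the pointer."""
    if not stack or stack[0] != dragged:
        return stack, None
    remaining = stack[1:]
    revealed = remaining[0] if remaining else None
    return remaining, revealed

stack = ["advert", "person", "background"]
stack, revealed = handle_drag(stack, "advert")
print(revealed)  # → person
```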
Claims (15)
Separating and extracting independent visual objects having different depth values from the image;
Authoring the separated and extracted independent visual objects into scenes by grouping the separated visual objects according to a depth value;
And selectively reproducing one or more scenes authored for each group to two or more screens according to a user interaction event.
And assigning a depth value to each of the independent visual objects.
And grouping the independent visual objects into as many groups as the number of screens.
And streaming the scenes authored for each group to the N-screen via a network.
Setting space-time relationship information and user interaction event information of one or more independent visual objects;
Grouping one or more independent visual objects according to depth values;
Setting space-time relationship information and user interaction event information of visual objects included in each group;
Generating a scene composition tree hierarchically configured with the set independent visual objects and grouped visual objects;
And generating a media file by encoding the scene configuration tree and the visual objects.
Determining whether a user interaction event for a scene composed of group visual objects has occurred from a user;
When a user interaction event for the scene composed of group visual objects occurs, moving the selected group visual object to an arbitrary N screen, in the N-screen service providing method using depth-based group visual objects.
When there is an independent visual object selection user interaction event, the method further comprises applying the user interaction event to the selected independent visual object.
A group unit visual object authoring unit which groups the separated and extracted independent visual objects according to a depth value and authors them as scenes for each group;
And an N screen unit comprising two or more screens for selectively playing one or more scenes authored by the group according to a user interaction event.
An N-screen service providing apparatus using depth-based group visual objects, characterized in that a depth value is assigned to each of the independent visual objects.
An N-screen service providing apparatus using depth-based group visual objects, characterized in that the independent visual objects are grouped into as many groups as the number of screens.
The apparatus for providing N screen service using a depth-based group visual object further comprises a streaming unit for streaming the scenes authored for each group to an N-screen through a network.
An apparatus for providing N screen service using a depth-based group visual object, characterized in that a session is established and a network channel is set by RTSP, a packet unit including sync headers is generated, and a media stream is transmitted through an IP network.
An independent visual object setting unit for setting space-time relationship information and user interaction event information of one or more independent visual objects;
A group visual object setting unit for grouping one or more independent visual objects according to a depth value, and setting space-time relationship information and user interaction event information of visual objects included in each group;
A scene composition tree manager configured to generate a scene composition tree hierarchically configured with the set independent visual objects and grouped visual objects;
And a media file generation unit for generating a media file by encoding the scene composition tree and the visual objects.
An apparatus for providing N screen service using a depth-based group visual object, characterized in that, when a user interaction event for a scene composed of group visual objects is generated from a user, the selected group visual object is moved to an arbitrary N screen.
When there is an independent visual object selection user interaction event, the apparatus for providing an N-screen service using depth-based group visual objects applies the user interaction event to the selected independent visual object.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/057,718 US20140115484A1 (en) | 2012-10-19 | 2013-10-18 | Apparatus and method for providing n-screen service using depth-based visual object groupings |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR20120116919 | 2012-10-19 | ||
KR1020120116919 | 2012-10-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20140050535A (en) | 2014-04-29 |
Family
ID=50655687
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020130113380A KR20140050535A (en) | 2012-10-19 | 2013-09-24 | Apparatus and method for providing n screen service using group visual objects based on depth and providing contents service |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20140050535A (en) |
- 2013-09-24: Application filed as KR1020130113380A (published as KR20140050535A); status not active — Application Discontinuation
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WITN | Withdrawal due to no request for examination |