KR20120134965A - Method for interaction using multi-modal input device - Google Patents


Info

Publication number
KR20120134965A
Authority
KR
South Korea
Prior art keywords
input
input device
multimedia
coordinate
image
Prior art date
Application number
KR1020110054233A
Other languages
Korean (ko)
Inventor
홍영표
Original Assignee
제노젠(주)
Priority date
Filing date
Publication date
Application filed by 제노젠(주)
Priority to KR1020110054233A
Publication of KR20120134965A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/40 Information retrieval of multimedia data, e.g. slideshows comprising image and additional audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Provided is a method for interacting with a communication broadcasting device in a communication broadcasting environment that distributes multimedia content.
The interaction method includes a data input step of inputting multimedia data, including text, images, moving pictures, and audio, with a single multi-modal input device 100. The multimedia data input in this step may be input as metadata of the multimedia content, as a search word for retrieving multimedia content, or as a search word for searching the metadata of the multimedia content.
The data input by the multi-modal input device 100 may include a hand sketch or a hand script, and may include a click command for controlling the functions and operation of an application executed on the communication broadcasting device.

Description

Interaction method using multi-modal input device {METHOD FOR INTERACTION USING MULTI-MODAL INPUT DEVICE}

The present invention relates to an interaction method for a communication broadcasting device using a multi-modal input device, and more specifically, to a method of searching, accessing, producing, modifying, editing, and playing multimedia documents by interacting with a communication broadcasting device using a multi-modal input device capable of inputting data such as text, images, moving pictures, voice, hand sketches, and hand scripts.

The digital communication/broadcasting convergence market, including DMB, IPTV, PCs, PMPs, and smartphones, is growing rapidly; the demand for metadata-based multimedia content is increasing, and content is shifting from text-oriented to audio-visual.

Unlike text-based content, for which text search terms sufficed, metadata-based audio-visual content cannot be searched satisfactorily using text search terms alone.

Accordingly, techniques for content access and retrieval using content feature descriptions, such as MPEG-7, are under active development. These technologies enhance search of and access to audio-visual content by allowing not only text but all the media that make up audio-visual content, such as images, moving pictures, voice, hand sketches, and hand scripts, to be used as search terms.

Meanwhile, in a communication/broadcasting convergence environment, users no longer merely receive and view content unilaterally; the ability for both parties to create, modify, and edit audio-visual content becomes especially important.

The applicant has patented inventions entitled "coordinate pattern and pattern sheet and coordinate recognition method using the same", "method of recognizing coordinates from the coordinate pattern on the pattern sheet", "digitizer and coordinate recognition method using a camera", and "camera-based input device with digitizer function". These inventions propose an input device built around a digital camera that can input the various media used in audio-visual content, such as text, images, moving pictures, voice, hand sketches, and hand scripts.

An object of the present invention is to provide a method for interacting with a communication broadcasting device using a multi-modal input device capable of inputting text, images, moving pictures, voice, hand sketches, hand scripts, and the like.

Another object of the present invention is to provide a method for searching multimedia documents on a communication broadcasting device using such a multi-modal input device.

Another object of the present invention is to provide a method for accessing multimedia documents, and specific content segments within a document, on a communication broadcasting device using such a multi-modal input device.

Another object of the present invention is to provide a method for producing multimedia documents on a communication broadcasting device using such a multi-modal input device.

Another object of the present invention is to provide a method for modifying multimedia documents on a communication broadcasting device using such a multi-modal input device.

Another object of the present invention is to provide a method for editing multimedia documents on a communication broadcasting device using such a multi-modal input device.

Another object of the present invention is to provide a method for playing multimedia documents on a communication broadcasting device using such a multi-modal input device.

According to the present invention, a method of interacting with a communication broadcasting device in a communication broadcasting environment for distributing multimedia contents is provided.

The interaction method includes a data input step of inputting multimedia data including text, images, moving pictures, and audio by a single multimodal input device.

Preferably, the multimedia content is metadata-based multimedia content, and the multimedia data input at the input step is input as metadata of the multimedia content.

Optionally, the multimedia data input in the input step may be input as a search word for searching for multimedia content.

Preferably, the multimedia content is metadata-based multimedia content, and the multimedia data input in the input step is input as a search word for searching metadata of the multimedia content.

Preferably, the multimedia data input in the input step includes a hand sketch or a hand script input by the multimodal input device.

Preferably, the multimedia data input in the input step includes a click command for controlling the function and operation of an application executed in the communication broadcasting device.

Using the interaction method according to the present invention, a single multi-modal input device can conveniently control a communication broadcasting device; easily create, modify, and edit metadata-based multimedia content; and easily search and access multimedia content, for which text search terms alone yield unsatisfactory results, using various media-based search terms such as text, images, audio, hand scripts, and hand sketches. The cursor can also be moved easily to a desired position within the content.

According to the present invention, the distribution and utilization of multimedia content can be greatly expanded, contributing to the development of the communication/broadcasting convergence industry.

FIG. 1 is a perspective view of a pen-type input device constructed using a digital camera according to a preferred embodiment of the present invention.
FIG. 2 is a view showing the input device of FIG. 1 with the left half of the outer shell and the front cap removed.
FIG. 3 is a view illustrating the front portion of the input device shown in FIG. 1.
FIG. 4 is a view showing the input device of FIG. 3 with the front cap removed.
FIG. 5 is a diagram illustrating the input device of FIG. 1 recognizing coordinate values while moving over a pattern sheet according to a preferred embodiment of the present invention, implementing its digitizer function.
FIG. 6 is a diagram illustrating an image describing one example of implementing the scanner function of the input device shown in FIG. 1.
FIG. 7 is a diagram illustrating an image describing another example of implementing the scanner function of the input device shown in FIG. 1.
FIG. 8 is a screen capture showing multimedia content being produced in an exemplary multimedia document creator implementing the communication broadcasting device interaction method according to a preferred embodiment of the present invention.
FIG. 9 is a screen capture showing the multimedia content produced in FIG. 8 opened in Internet Explorer.
FIG. 10 is a captured image of the main menu and toolbar of the exemplary multimedia document creator implementing the communication broadcasting device interaction method according to a preferred embodiment of the present invention.
FIG. 11A is a captured image of the pull-down menu that appears when the file menu is clicked in the main menu shown in FIG. 10.
FIG. 11B is a captured image of the toolbar associated with the pull-down menu of the file menu shown in FIG. 11A.
FIG. 12A is a captured image of the pull-down menu that appears when the edit menu is clicked in the main menu shown in FIG. 10.
FIG. 12B is a captured image of the toolbar associated with the pull-down menu of the edit menu shown in FIG. 12A.
FIG. 13 is a screen capture showing the operation of clicking the tag finder in the pull-down menu of the edit menu shown in FIG. 12A.
FIG. 14A is a captured image of the pull-down menu that appears when the insert menu is clicked in the main menu shown in FIG. 10.
FIG. 14B is a captured image of the toolbar associated with the pull-down menu of the insert menu shown in FIG. 14A.
FIG. 15 is a screen capture showing the operation of clicking horizontal line properties in the pull-down menu of the insert menu shown in FIG. 14A.
FIG. 16A is a captured image of the pull-down menu that appears when the format menu is clicked in the main menu shown in FIG. 10.
FIG. 16B is a captured image of the toolbar associated with the pull-down menu of the format menu shown in FIG. 16A.
FIG. 17 is a screen capture showing a user working after clicking hyperlink in the pull-down menu of the format menu shown in FIG. 16A.
FIG. 18A is a captured image of the pull-down menu that appears when the table menu is clicked in the main menu shown in FIG. 10.
FIG. 18B is a captured image of the toolbar associated with the pull-down menu of the table menu shown in FIG. 18A.
FIG. 19 is a screen capture showing the operation of clicking Insert Table in the pull-down menu of the table menu shown in FIG. 18A.
FIG. 20 is a screen capture showing the operation of clicking table properties in the pull-down menu of the table menu shown in FIG. 18A.
FIG. 21 is a screen capture showing a moving picture, an image, and audio being inserted into multimedia content in the exemplary multimedia document creator of FIG. 8 using the multi-modal input device according to a preferred embodiment of the present invention.
FIG. 22 is a screen capture showing audio being inserted into a moving picture of multimedia content in the document creator of FIG. 8 using the multi-modal input device according to a preferred embodiment of the present invention.
FIG. 23 is a screen capture showing a subtitle being inserted into a moving picture of multimedia content in the document creator of FIG. 8 using the multi-modal input device according to a preferred embodiment of the present invention.

Hereinafter, the configuration of a multi-modal input device according to a preferred embodiment of the present invention and an interaction method for a communication broadcasting device using such an input device will be described in detail.

1. Configuration of Input Device

First, the configuration of a multi-modal input device according to a preferred embodiment of the present invention will be described in detail.

FIG. 1 shows a perspective view of a pen-type input device 100 constructed using a digital camera 110 according to a preferred embodiment of the present invention; FIG. 2 shows the input device 100 of FIG. 1 with the left half 151 of the outer shell and the front cap 160 removed; FIG. 3 shows the front portion of the input device 100 shown in FIG. 1; and FIG. 4 shows the input device of FIG. 3 with the front cap 160 removed.

As shown in the drawings, the input device 100 according to the preferred embodiment of the present invention includes a digital camera 110 and a circuit device 120 for implementing the digitizer, scanner, and camera functions according to the present invention. A microphone (not shown) for inputting audio is attached to the circuit device 120.

The camera 110 and the circuit device 120 are embedded in an outer shell composed of a left half 151 and a right half 152, and the front of the outer shell is closed by the front cap 160.

The front cap 160 leaves the field of view of the camera 110 open or includes a transparent window.

The input device 100 according to this embodiment includes a pen shim 140 configured to keep the camera 110 at a constant distance from the writing surface and to photograph the surface in a stable posture.

In this embodiment, the central axis of the pen shim 140 is inclined with respect to the central axis of the stem 150, the outer shell held by the user, so that when the user holds the stem 150 in a comfortable posture and places the pen tip 141 on the writing surface, the central axis of the pen shim 140 is substantially perpendicular to the surface. If the optical axis of the camera 110 is arranged to coincide with the central axis of the pen shim 140, the subject on the surface can then be photographed with the optical axis substantially orthogonal to it, reducing distortion of the coordinate patterns to be recognized by the digitizer function and of the characters to be recognized by the scanner function.

The input device 100 according to this embodiment includes a switch 130 for mode selection and operation control. One or more switches 130 may be provided, with the function assigned to each switch invoked according to the manner of operation, that is, the duration and number of presses.
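As a sketch of how press duration and count might map to commands (the threshold and command names here are illustrative assumptions, not taken from the patent):

```python
# Sketch: decode switch presses into input-device commands.
# The threshold and command names are illustrative assumptions.

LONG_PRESS_MS = 800  # presses at least this long count as "long"

def decode_switch(presses):
    """presses: list of press durations in milliseconds."""
    if any(d >= LONG_PRESS_MS for d in presses):
        return "toggle_mode"      # e.g. cycle digitizer/scanner/camera
    if len(presses) == 1:
        return "click"            # a single short press acts as a click
    if len(presses) == 2:
        return "double_click"
    return "unknown"
```

A driver would accumulate press events within a short window and hand the batch to such a decoder.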

The input device 100 is a multi-modal input device capable of inputting various media data such as text, image, moving picture, voice, hand sketch, hand script, etc. by combining a digitizer function, a scanner function, a camera function, and a microphone function.

Next, the coordinate recognition method that implements the digitizer function of the input device 100 by recognizing the position of a point on the pattern sheet 200 as a coordinate value will be described in detail.

FIG. 5 illustrates the input device 100 shown in FIG. 1 recognizing coordinate values while moving over the pattern sheet 200 according to the preferred embodiment of the present invention.

Although the coordinate patterns on the pattern sheet 200 are arranged in a matrix aligned with the horizontal and vertical directions, in the image photographed by the digital camera 110 the matrix of coordinate patterns may fail to align with the horizontal and vertical directions and become distorted, owing to lens aberration and tilt of the camera's optical axis with respect to the subject.

To increase the recognition rate and accuracy of coordinate values, the coordinate patterns must be analyzed while correcting this distortion in the captured image, so that the matrix of coordinate patterns aligns with the horizontal and vertical directions as closely as possible.

According to this embodiment, the distortion state, that is, the inclination in the horizontal and vertical directions, is measured from the boundary lines separating adjacent coordinate patterns, and the distortion is corrected by moving the pixels constituting each coordinate pattern by the measured inclination.
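The pixel-shifting correction can be sketched as a row shear; the slope is assumed to have already been measured from the boundary lines, and the list-of-rows image layout is an illustrative simplification:

```python
# Sketch: correct coordinate-pattern distortion by shearing pixel rows.
# The slope is taken as given (measured elsewhere from boundary lines);
# images are lists of rows of pixel values, 0 meaning background.

def shear_rows(image, slope):
    """Shift row r left by round(slope * r) pixels, padding with 0."""
    h = len(image)
    w = len(image[0]) if h else 0
    out = []
    for r, row in enumerate(image):
        shift = round(slope * r)
        shifted = row[shift:] + [0] * shift if shift >= 0 \
            else [0] * (-shift) + row[:w + shift]
        out.append(shifted[:w])
    return out
```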

According to this embodiment, the coordinate patterns arranged on the pattern sheet 200 have a directionality, and the image is rotated so that the coordinate patterns in the image captured of the pattern sheet 200 by the camera 110 are oriented in the forward direction.

According to this embodiment, the coordinate pattern arranged on the pattern sheet 200 is divided into a first type coordinate pattern and a second type coordinate pattern.

The first type coordinate pattern is an X-axis coordinate pattern in which a direction identification pattern element is disposed at the center of the left end. The X-axis coordinate pattern means that a value composed of a combination of coordinate identification pattern elements represents an x coordinate value of a point where the corresponding coordinate pattern is disposed. The second type coordinate pattern is a Y-axis coordinate pattern in which a direction identification pattern element is disposed at the top center. The Y-axis coordinate pattern means that the value formed by the combination of the coordinate identification pattern elements represents the y coordinate value of the point where the corresponding coordinate pattern is disposed.

According to this embodiment, either the horizontal or the vertical position coordinate is recognized from each coordinate pattern, and the full coordinate value, comprising both horizontal and vertical position coordinates, is determined for the center of each coordinate pattern by referring to the position coordinates of adjacent coordinate patterns.
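One way to sketch this neighbor-referencing step, under an assumed record layout of (center_x, center_y, axis, value) per recognized pattern (the layout is an illustration, not the patent's data model):

```python
# Sketch: each recognized pattern supplies only one axis; the missing
# axis is taken from the nearest recognized pattern of the other type.
# Records are (cx, cy, axis, value) with axis "x" or "y".

def resolve_coordinates(patterns):
    xs = [p for p in patterns if p[2] == "x"]
    ys = [p for p in patterns if p[2] == "y"]
    resolved = []
    for cx, cy, axis, value in patterns:
        others = ys if axis == "x" else xs
        if not others:
            continue  # cannot resolve without the other axis
        # nearest pattern of the other type (squared pixel distance)
        ox, oy, _, ov = min(
            others, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
        coord = (value, ov) if axis == "x" else (ov, value)
        resolved.append(((cx, cy), coord))
    return resolved
```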

An image of the pattern sheet 200 photographed by the camera 110 may be tilted to the left or the right. This occurs when the user of the pen-type input device 100 photographs with the camera 110 not aligned to the forward direction of the pattern sheet 200, and it can occur almost constantly in normal use of the input device 100 according to the present invention, which employs a camera 110 with a lens 111.

In order to increase the recognition rate and accuracy of coordinate values, the matrix of coordinate patterns of the tilted image should be rotated to match the horizontal and vertical lines.

In addition, the image of the pattern sheet 200 photographed by the camera 110 may not only be tilted to the left or right but may also be distorted so that the coordinate patterns no longer form squares.

This occurs not only when the user of the pen-type input device 100 does not align the camera 110 with the forward direction of the pattern sheet 200, but also when the optical axis of the camera 110 is not orthogonal to the pattern sheet 200 and the image is captured at an angle; it too can occur almost constantly in normal use of the input device 100 according to the present invention.

In order to increase the recognition rate and accuracy of coordinate values, the distortion state is measured by a boundary line separating adjacent coordinate patterns, and the pixels constituting each coordinate pattern must be moved by the measured slope.

For this purpose, the coordinate pattern in the image is first made monochromatic: among the coordinate identification pattern elements, which are yellow and gray, the yellow elements are replaced with the same color as the gray elements. The image with the monochromatic coordinate pattern is then converted to black and white, and by inverting the black-and-white image, an image is obtained in which the boundary lines between adjacent coordinate patterns appear as black lines.

If the pixels constituting each coordinate pattern are replaced with black and the remaining pixels with white, the image must be inverted for the boundary lines between adjacent coordinate patterns to appear as black lines; if instead the pattern pixels are replaced with white and the remaining pixels with black, the boundary lines appear directly as black lines without inversion.
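The monochromatic, black-and-white, and inversion steps can be sketched on a flat list of RGB pixels; the exact RGB values and the threshold are illustrative assumptions:

```python
# Sketch: monochromatic processing replaces yellow pattern elements
# with the same gray as the gray elements; the result is thresholded
# to black and white and (optionally) inverted so that boundary lines
# appear black. RGB tuples and the threshold are assumptions.

YELLOW = (255, 255, 0)
GRAY = (128, 128, 128)

def monochromatize(pixels):
    """Replace yellow with gray so both element colors match."""
    return [GRAY if p == YELLOW else p for p in pixels]

def to_black_and_white(pixels, threshold=200):
    """Luminance threshold: dark pixels -> 0 (black), light -> 255."""
    return [0 if sum(p) / 3 < threshold else 255 for p in pixels]

def invert(bw):
    return [255 - v for v in bw]
```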

The horizontal tilt of the image is measured by rotating the image until the black boundary lines near the horizontal coincide with the horizontal line, and the vertical tilt is measured by rotating the image until the black boundary lines near the vertical coincide with the vertical line.
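This rotate-until-aligned measurement can be sketched as a search over candidate angles that minimizes the vertical spread of the boundary-line points; the angle range and step are illustrative choices:

```python
# Sketch: measure the tilt of a boundary line by rotating its points
# and finding the angle at which the line becomes horizontal (minimum
# spread of rotated y values). Angle range/step are assumptions.

import math

def measure_tilt(points, step_deg=0.5, max_deg=30):
    def y_spread(angle):
        a = math.radians(angle)
        ys = [x * math.sin(a) + y * math.cos(a) for x, y in points]
        return max(ys) - min(ys)
    n = int(max_deg / step_deg)
    angles = [i * step_deg for i in range(-n, n + 1)]
    return min(angles, key=y_spread)
```

Measuring the vertical tilt works the same way with x and y exchanged.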

The difference between the horizontal slope and the vertical slope of the coordinate pattern means a distortion state, that is, a degree of distortion of the coordinate pattern.

The horizontal tilt is corrected by rotating the image until it aligns with the horizontal line. In the image with the horizontal tilt corrected, the distortion of the coordinate pattern can then be corrected by moving the pixels of the coordinate pattern by the difference between the horizontal and vertical tilts. It is equally possible to align vertically first and then correct the distortion by moving the pixels of the coordinate pattern up and down.

Coordinate values are recognized by analyzing the pattern elements of one or more recognition-target coordinate patterns near the reference point, and the coordinate value corresponding to the reference point is obtained by interpolation over the distances between the reference point and those coordinate patterns.

Meanwhile, recognition of a coordinate value from some of the coordinate patterns may fail for various reasons: contamination or damage of part of the pattern, degradation of the captured image, or, in the case of a pen-type digitizer, part of the pattern being covered by the pen tip.

However, as long as coordinate values can be recognized from at least two coordinate patterns, namely one or more Y-axis patterns and one or more X-axis patterns, the coordinate value of the reference point can still be obtained even if recognition fails for the remaining recognition-target patterns.

By recognizing coordinate values on the pattern sheet 200 as described above, the input device 100 according to the present invention digitizes its movement over the pattern sheet 200 as coordinate values, that is, it implements the functions of a conventional digital pen. However, whereas a digitizer such as a digital pen captures the coordinate pattern with an image sensor of the kind used in optical mice, the input device 100 according to the present invention captures the coordinate pattern with the digital camera 110, so its image processing and analysis are configured differently.

This digitizer function allows input of data such as hand sketches, hand scripts, and the like, and can move the cursor.

The implementation of a scanner function of capturing characters using the input device 100 and converting them into text by optical character recognition technology will be described in detail.

FIG. 6 is a diagram illustrating an image for explaining an example of implementing a scanner function of the input apparatus 100 illustrated in FIG. 1.

As shown in FIG. 6, in this example the input device 100 shown in FIG. 1 is moved along a character string printed on the surface while a series of partial images is photographed; the partial images are then joined by image stitching to obtain the recognition-target image.
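A minimal sketch of the stitching idea, joining two grayscale strips at the overlap with the smallest pixel difference (the matching criterion and data layout are illustrative choices, not the patent's method):

```python
# Sketch: stitch two horizontally overlapping strips by trying every
# overlap width and keeping the one with the smallest mean pixel
# difference. Images are lists of rows of grayscale values.

def stitch_pair(left, right, min_overlap=1):
    w = len(left[0])
    def cost(ov):
        diffs = [abs(lrow[w - ov + i] - rrow[i])
                 for lrow, rrow in zip(left, right)
                 for i in range(ov)]
        return sum(diffs) / len(diffs)
    best = min(range(min_overlap, w + 1), key=cost)
    return [lrow + rrow[best:] for lrow, rrow in zip(left, right)]
```

A full scan would fold the strips together pairwise in capture order.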

The acquired recognition-target image undergoes monochromatic processing, black-and-white processing, and noise removal, after which the arrangement of the black pixels forming the target character string is analyzed and converted into text.

The monochromatic and black-and-white processing are the same as those used for the digitizer function.

FIG. 7 is a diagram illustrating an image for explaining another example of implementing a scanner function of the input apparatus 100 illustrated in FIG. 1.

As shown in FIG. 7, in this example the camera 110 of the input device 100 shown in FIG. 1 photographs the printed area containing the character string to be recognized in a single shot to obtain the recognition-target image.

As described above, the obtained recognition-target image undergoes monochromatic and black-and-white processing and noise removal, after which the arrangement of the black pixels forming the target character string is analyzed and converted into text.

To crop only the portion of the captured image containing the character string to be recognized, the position where the string is printed must be located in the image. Analysis of the captured image shows that, unlike other parts, the region where the character string is printed has very large contrast between the pixels forming the characters and the adjacent pixels. A region whose contrast between adjacent pixels is larger than in other regions can therefore be recognized as the region where a character string is printed.
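This contrast-based localization can be sketched per row of a grayscale image; the threshold value is an illustrative assumption:

```python
# Sketch: locate the rows containing printed text by local contrast.
# Rows whose mean adjacent-pixel difference exceeds the threshold are
# taken as the text band. The threshold is an assumption.

def text_rows(image, threshold=50):
    """image: list of rows of grayscale values; returns indices of
    rows whose mean adjacent-pixel contrast exceeds the threshold."""
    rows = []
    for r, row in enumerate(image):
        contrast = sum(abs(a - b) for a, b in zip(row, row[1:]))
        if contrast / max(len(row) - 1, 1) > threshold:
            rows.append(r)
    return rows
```

The same test applied column-wise would bound the string horizontally.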

When implementing the scanner function in the input device 100 shown in FIG. 1, recognition rate and accuracy can be increased by obtaining a high-quality image focused on the area where the character string is printed.

Therefore, the recognition image is captured after focusing on the area where the character string is printed using the autofocus function of the camera 110.

To focus on the area where the character string is printed, images are captured at several focal lengths over a few frames; the pixels of the printed area in each frame are analyzed, the contrasts are compared, and the focal length of the frame with the largest contrast is selected for the printed area.
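The frame-selection step can be sketched as picking the frame with the highest adjacent-pixel contrast, a simple stand-in for the sharpness comparison described above (the metric is an illustrative choice):

```python
# Sketch: pick the focal setting whose frame has the highest contrast,
# standing in for the camera's focus bracketing step. Frames are lists
# of grayscale rows; the sharpness metric is an assumption.

def sharpest_frame(frames):
    def sharpness(frame):
        return sum(abs(a - b)
                   for row in frame for a, b in zip(row, row[1:]))
    return max(range(len(frames)), key=lambda i: sharpness(frames[i]))
```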

Image processing and analysis for recognizing text, as described above, include monochromatic and black-and-white processing.

As described above, the input device 100 according to the present invention can implement a scanner function that captures a character recognition area and converts it into text.

Text recognized by the scanner function can be input to the communication broadcasting device.

Meanwhile, the input device 100 shown in FIG. 1, which includes a digital camera 110 consisting of a lens 111 and an image sensor (not shown), may include all the components of a typical webcam, and a microphone (not shown) is also attached.

The camera function of the input device 100 according to the present invention can be configured in the same way as a conventional camera using a digital image sensor, such as a webcam, and will not be described in detail here.

As described above, the input device 100 according to the present invention is a multi-modal input device capable of inputting various media data such as text, images, moving pictures, audio, hand sketches, and hand scripts.

2. Interaction Method for Communication Broadcasting Devices

Hereinafter, a method of searching, accessing, producing, modifying, editing, and playing multimedia content in a communication broadcasting device by interaction using the input apparatus 100 described above will be described in detail.

FIG. 8 is a screen capture showing multimedia content being produced in an exemplary multimedia document creator implementing the communication broadcasting device interaction method according to a preferred embodiment of the present invention.

The communication broadcasting device refers generically to any device, for example a DMB receiver, IPTV, PC, PMP, or smartphone, with a built-in operating system (OS) on which an application such as the document creator shown in FIG. 8 can run.

The document creator shown in FIG. 8 is an example for explaining the communication broadcasting device interaction method using a multi-modal input device according to a preferred embodiment of the present invention; the invention can be applied to any document creator capable of producing, editing, and playing metadata-based multimedia content.

FIG. 9 is a screen capture showing the multimedia content produced in FIG. 8 opened in Internet Explorer. This content can be distributed over the same channels as all multimedia content currently in circulation, and can be searched, accessed, and used in the same manner. At present, search and access are text-only; if search and access technologies based on various media search terms, such as images, audio, hand sketches, and hand scripts, are developed in the future, for example on the basis of MPEG-7, the invention will be applicable to those as well.

FIG. 10 is a captured image of the main menu and toolbar of an exemplary multimedia document creator implementing the communication broadcasting device interaction method according to a preferred embodiment of the present invention. As shown in FIG. 10, the main menus of the document creator are those of a typical document creator, so their configuration and functions are not described. According to the present invention, these menus can be executed by moving the cursor and inputting a click command with the input device 100 shown in FIG. 1.

FIG. 11A is a captured image of the pull-down menu that appears when the file menu is clicked in the main menu shown in FIG. 10, and FIG. 11B is a captured image of the toolbar associated with that pull-down menu. The pull-down menus of the file menu are also found in common web document creators, so their configuration and functions are not described. According to the present invention, these menus can be executed by moving the cursor and inputting a click command with the input device 100 shown in FIG. 1.

FIG. 12A is an image capturing the pull-down menu that appears when the Edit menu is clicked in the main menu shown in FIG. 10, and FIG. 12B is an image capturing the toolbar related to the pull-down menu of the Edit menu shown in FIG. 12A. The pull-down menu of the Edit menu also contains the menus found in typical web document writers. According to the present invention, these menus may be executed by moving the cursor and inputting a click command with the input device 100 shown in FIG. 1.

The pull-down menu of the Edit menu shown in FIG. 12A has a tag find menu along with a find menu. The find menu searches the display data of the content, that is, the data a user can view in a viewer such as a web browser. The tag find menu searches metadata, that is, data the user cannot view in the viewer but which defines attributes of the display data or describes or summarizes the content, such as a tag or a content descriptor describing attributes of a content segment.

FIG. 13 is a screen capture image showing a tag search operation performed using the pull-down menu of the Edit menu illustrated in FIG. 12A. When a search term is entered in the search box and the Find button is clicked, the search is performed on the metadata rather than the display data, and the content or content segments whose metadata includes the search term are displayed as search results. Clicking a result navigates to that content or content segment.

The text search term entered in the search box of FIG. 13 may be a character recognized using the scanner function of the input device 100 shown in FIG. 1, or a character recognized from a hand script entered using its digitizer function; it may also be typed directly. In addition, since search and access are currently performed only by text, only text search terms are illustrated here; however, if search and access technologies based on various media search terms such as images, audio, hand sketches, and hand scripts are developed in the future, the present invention will be applicable to them as well. Input of such images, audio, hand sketches, hand scripts, and the like is also performed by the input device 100 shown in FIG. 1.
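The distinction drawn above between searching display data (the find menu) and searching metadata (the tag find menu) can be sketched as follows. This is only an illustrative sketch: the `ContentSegment` structure and its field names are hypothetical and do not appear in the patent.

```python
from dataclasses import dataclass, field

@dataclass
class ContentSegment:
    # Display data: what the user sees in a viewer such as a web browser.
    display_text: str
    # Metadata: tags and descriptors invisible in the viewer but searchable.
    tags: list = field(default_factory=list)
    descriptor: str = ""

def find_display(segments, term):
    """'Find': search only the visible display data."""
    return [s for s in segments if term in s.display_text]

def find_metadata(segments, term):
    """'Tag find': search only tags and content descriptors."""
    return [s for s in segments
            if term in s.descriptor or any(term in t for t in s.tags)]

segments = [
    ContentSegment("Lecture video", tags=["physics", "week1"],
                   descriptor="Intro segment with hand-sketched diagrams"),
    ContentSegment("Quiz answers", tags=["week1"], descriptor="Text summary"),
]

print([s.display_text for s in find_display(segments, "Quiz")])     # ['Quiz answers']
print([s.display_text for s in find_metadata(segments, "physics")]) # ['Lecture video']
```

Note that "physics" matches nothing in the display data, so only the metadata search finds the first segment, mirroring how the tag find menu reaches data the viewer never shows.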

FIG. 14A is an image capturing the pull-down menu that appears when the Insert menu is clicked in the main menu illustrated in FIG. 10, and FIG. 14B is an image capturing the toolbar related to the pull-down menu of the Insert menu illustrated in FIG. 14A. Although only horizontal line, picture, and video insertion menus are illustrated in FIG. 14A, various media data such as audio, hand sketches, and hand scripts can also be inserted, and text data can be inserted directly at the cursor position on the desktop of the document. According to the present invention, these menus may be executed by moving the cursor and inputting a click command with the input device 100 shown in FIG. 1.

FIG. 15 is a screen capture image showing the operation of clicking the horizontal line properties item in the pull-down menu of the Insert menu shown in FIG. 14A. The horizontal line attributes, as metadata defining the attributes of the horizontal line (which is display data), may be a search target.

FIG. 16A is an image capturing the pull-down menu that appears when the Format menu is clicked in the main menu shown in FIG. 10, and FIG. 16B is an image capturing the toolbar related to the pull-down menu of the Format menu shown in FIG. 16A. The pull-down menus of the Format menu also include the menus found in typical web document writers. According to the present invention, these menus may be executed by moving the cursor and inputting a click command with the input device 100 shown in FIG. 1.

FIG. 17 is a screen capture image showing the operation of clicking Hyperlink in the pull-down menu of the Format menu shown in FIG. 16A. A hyperlink is also metadata; it may be a search target and may be input using the input device 100 shown in FIG. 1.

FIG. 18A is an image capturing the pull-down menu that appears when the Table menu is clicked in the main menu shown in FIG. 10, and FIG. 18B is an image capturing the toolbar related to the pull-down menu of the Table menu shown in FIG. 18A. The pull-down menu of the Table menu also includes the menus found in typical web document writers. According to the present invention, these menus may be executed by moving the cursor and inputting a click command with the input device 100 shown in FIG. 1.

FIG. 19 is a screen capture image showing the operation of clicking Insert Table in the pull-down menu of the Table menu shown in FIG. 18A. In a typical web document, a table is used to maintain the structure of the document by defining the spatial positions of content segments on it, and the table insertion operation can likewise be executed by moving the cursor and inputting a click command with the input device 100 shown in FIG. 1.

FIG. 20 is a screen capture image showing the operation of clicking the table attributes item in the pull-down menu of the Table menu shown in FIG. 18A. Table attributes can also be searched as metadata.

FIG. 21 is a screen capture image showing a moving picture, an image, and audio inserted into multimedia content in the exemplary multimedia document writer illustrated in FIG. 8, using the multi-modal input device according to a preferred embodiment of the present invention.

The illustrated work shows that the moving picture, image, and audio are all input using an input device labeled "XENOGEN MultiPen", that is, the input device 100 shown in FIG. 1. However, the present invention is not limited to this example: the media may be input using an input device other than the input device 100 shown in FIG. 1, such as a built-in or external camera or microphone of the communication broadcasting device, or inserted from a previously created and stored file.
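The three input routes named above (the multi-modal pen, a built-in or external camera or microphone of the device, or a previously stored file) can be sketched as a simple dispatch. The `capture_media` function, the source names, and the record layout are hypothetical, chosen only to make the alternatives concrete.

```python
def capture_media(source: str, media_type: str) -> dict:
    """Return a media record describing where the data came from.

    Hypothetical sources, mirroring the alternatives in the text:
      'multipen' - the multi-modal input device 100 (camera/scanner/digitizer)
      'device'   - a built-in or external camera or microphone
      'file'     - a previously created and stored file
    """
    if source not in ("multipen", "device", "file"):
        raise ValueError(f"unknown input source: {source}")
    return {"type": media_type, "source": source}

def insert_into_content(content: list, media: dict) -> list:
    # Append the media record to the document as a new content segment.
    content.append(media)
    return content

content = []
insert_into_content(content, capture_media("multipen", "image"))
insert_into_content(content, capture_media("file", "audio"))
print(content)
```

The point of the dispatch is that the document writer treats every source uniformly once the media record exists, which is why the text can claim the invention is not limited to the pen device.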

FIG. 22 is a screen capture image showing audio being inserted onto a moving picture of multimedia content in the exemplary multimedia document writer illustrated in FIG. 8, using the multi-modal input device according to the preferred embodiment of the present invention. While the task shown in FIG. 21 inserts a content segment composed of moving picture, image, and audio data into the multimedia content, the task illustrated in FIG. 22 additionally inserts audio into a moving picture content segment of the multimedia content.

FIG. 23 is a screen capture image showing a subtitle being inserted into a moving picture of multimedia content in the exemplary multimedia document writer shown in FIG. 8, using the multi-modal input device according to the preferred embodiment of the present invention. In the operation shown in FIG. 23, a caption, that is, text data, is applied to the moving picture content segment of the multimedia content; however, an image, a hand sketch, a hand script, and the like may also be applied to the moving picture content segment. Such data may be generated and inserted using the input device 100 shown in FIG. 1; it may be generated by a built-in or external device of the communication broadcasting device and inserted by inputting a click command with the input device 100; or it may be inserted from a previously generated and stored file by inputting a click command with the input device 100.
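Attaching additional data (a caption, audio, a hand sketch, and so on) to an existing moving picture content segment, as in FIGS. 22 and 23, can be sketched as follows. The `VideoSegment` and `Overlay` structures and the timing fields are hypothetical; the patent does not specify how overlays are stored.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Overlay:
    kind: str       # "caption", "audio", "hand_sketch", "hand_script", ...
    data: str       # payload or a reference to a stored file
    start_s: float  # position in the video where the overlay begins
    end_s: float    # position where it ends

@dataclass
class VideoSegment:
    title: str
    overlays: List[Overlay] = field(default_factory=list)

    def attach(self, kind, data, start_s, end_s):
        # Reject degenerate intervals before storing the overlay.
        if end_s <= start_s:
            raise ValueError("overlay must span a positive interval")
        self.overlays.append(Overlay(kind, data, start_s, end_s))

    def active_at(self, t):
        """Overlays a viewer should render at playback time t."""
        return [o for o in self.overlays if o.start_s <= t < o.end_s]

clip = VideoSegment("demo clip")
clip.attach("caption", "Hello, world", 0.0, 4.0)
clip.attach("audio", "narration.wav", 2.0, 10.0)
print([o.kind for o in clip.active_at(3.0)])  # ['caption', 'audio']
```

Under this model the FIG. 22 operation is an `attach("audio", ...)` call and the FIG. 23 operation is an `attach("caption", ...)` call on the same segment, regardless of which input route produced the data.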

As described above, in the multimedia document writer for producing and editing metadata-based multimedia content, text, images, moving pictures, audio, hand sketches, hand scripts, and the like can be input as display data or metadata using the multi-modal input device 100 shown in FIG. 1 while multimedia content is produced, modified, and edited. Such content can be distributed through various distribution channels in a communication broadcasting environment, searched and accessed by text, images, audio, hand sketches, hand scripts, and the like input through the multi-modal input device 100, and played in a viewer such as a web browser.

It will be apparent to those skilled in the art that modifications, variations, and substitutions of the construction of the present invention according to the preferred embodiments described above are possible. Such modifications, changes, and substitutions that do not depart from the spirit and scope of the present invention are intended to fall within the protection scope of the present invention.

100: input device
110: camera
111: Lens
120: circuit device
130: switch
140: pen shim
141: Tips
150: stem
151: left half
152: right half
160: front cap
200: pattern sheet

Claims (6)

A method of interacting with a communication broadcasting device in a communication broadcasting environment that distributes multimedia content, the method comprising:
a data input step of inputting multimedia data including text, images, moving pictures, and audio with a single multi-modal input device.
The method according to claim 1,
wherein the multimedia content is metadata-based multimedia content,
and the multimedia data input in the input step is input as metadata of the multimedia content.
The method according to claim 1,
wherein the multimedia data input in the input step is input as a search word for searching the multimedia content.
The method according to claim 3,
wherein the multimedia content is metadata-based multimedia content,
and the multimedia data input in the input step is input as a search word for searching the metadata of the multimedia content.
The method according to any one of claims 1 to 4,
wherein the multimedia data input in the input step further includes a hand sketch or a hand script input by the multi-modal input device.
The method according to claim 5,
wherein the input step further includes inputting a click command for controlling the functions and operation of an application executed on the communication broadcasting device.
KR1020110054233A 2011-06-04 2011-06-04 Method for interaction using multi-modal input device KR20120134965A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020110054233A KR20120134965A (en) 2011-06-04 2011-06-04 Method for interaction using multi-modal input device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020110054233A KR20120134965A (en) 2011-06-04 2011-06-04 Method for interaction using multi-modal input device

Publications (1)

Publication Number Publication Date
KR20120134965A true KR20120134965A (en) 2012-12-12

Family

ID=47903079

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020110054233A KR20120134965A (en) 2011-06-04 2011-06-04 Method for interaction using multi-modal input device

Country Status (1)

Country Link
KR (1) KR20120134965A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102279797B1 (en) * 2021-03-05 2021-07-21 전남대학교산학협력단 Multimodal data fusion system and method


Legal Events

Date Code Title Description
WITN Withdrawal due to no request for examination