WO2012032537A2 - Method and system for providing a content adaptive and legibility retentive display of a lecture video on a miniature video device - Google Patents

Method and system for providing a content adaptive and legibility retentive display of a lecture video on a miniature video device

Info

Publication number
WO2012032537A2
Authority
WO
WIPO (PCT)
Prior art keywords
textual
frames
newly added
frame
added data
Prior art date
Application number
PCT/IN2011/000597
Other languages
English (en)
Other versions
WO2012032537A3 (fr)
Inventor
Subhasis Chaudhuri
A. Ranjith Ram
Original Assignee
Indian Institute Of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Indian Institute Of Technology filed Critical Indian Institute Of Technology
Publication of WO2012032537A2 publication Critical patent/WO2012032537A2/fr
Publication of WO2012032537A3 publication Critical patent/WO2012032537A3/fr

Classifications

    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 - Electrically-operated educational appliances
    • G09B5/06 - Electrically-operated educational appliances with both visual and audible presentation of the material to be studied

Definitions

  • This invention relates to a method and system for providing a content adaptive and legibility retentive display of a lecture video on a miniature video device. More specifically, the invention relates to providing a legible display of a lecture video on the miniature video device.
  • miniature video devices include, but are not limited to, mobile terminals, Personal Digital Assistants (PDAs), hand-held computers, cell phones, tablet PCs, etc.
  • the video content intended for large screens like television and computer monitors is resized to fit on smaller screens of such miniature devices.
  • some information might be inherently lost either by spatial sub-sampling or by marginal cropping while attempting to fit it on the smaller screen.
  • the original video when played on miniature video devices suffers from overhead in memory usage and poor resolution.
  • An educational medium such as a lecture video mainly comprises a lecturer writing content on a board/writing pad or explaining slide shows.
  • lecture videos are largely useful in distance education, community outreach programs and video-on-demand applications.
  • the textual content becomes too small to be legible. Therefore, there is a need to efficiently overcome the degradation of the legibility of the displayed textual content of a lecture video, due to the limitations on the screen size.
  • WO2009042340 discloses processing video data for automatically selecting at least one key-frame based on a set of criteria and then encoding the video data with an identifier.
  • it does not provide legibility retention of the video frames, which is crucial in the case of lecture videos.
  • the method and system should facilitate legible display of textual content of a lecture video on a portable device without any loss in instructional value. Further, the legible display of the lecture video should be highly compressed, streamable, and synchronized with respect to the audio channel of the original video.
  • An object of the invention is to provide a content adaptive and legibility retentive display of a lecture video on a miniature video device.
  • Another object of the invention is to provide legible display of textual content of a lecture video on a miniature video device. Yet another object of the invention is to provide a legible display of the lecture video which is highly compressed, streamable, and synchronized with respect to the audio channel of the original video.
  • a method for providing a content adaptive and legibility retentive display, on a miniature video device, of a lecture video comprising a sequence of textual and non-textual frames along with associated audio.
  • the method comprises the steps of: creating metadata that indicates the location of newly added data points in textual frames temporally spaced apart by a predefined time interval, by computing horizontal and vertical projection profiles of ink pixels in said textual frames and detecting the x-y positions of newly added data points thereof; and sequentially displaying key-frames extracted from the textual and non-textual frames in accordance with the metadata, by panning textual key-frames with a selection window having an aspect ratio and size in accordance with a display screen of the miniature video device and a center point at the x-y position of the newly added data point in the respective textual frame.
  • detecting the y-position of newly added data point in a current textual frame comprises comparing Horizontal Projection Profile (HPP) of ink pixels of the current textual frame with that of a previous textual frame, and setting the y-position as the point where the amount of differential ink pixels in the HPPs reaches a threshold value.
  • detecting the x-position of the newly added data point in the current textual frame comprises comparing Vertical Projection Profiles (VPPs) of ink pixels contained in regions around the detected y-position of the current and previous textual frames and setting the x-position as the point where the amount of differential ink pixels in the VPPs is maximum.
  • the location of the newly added data point in an intermediate frame occurring in the time period between two textual frames temporally spaced apart by the predefined time interval is detected by temporally interpolating the x-y positions of the newly added data points of said two textual frames.
  • the size of the selection window is equal to the size of the display screen of the miniature video device, and the size of the selection window is varied based on the difference between the locations of newly added data points of temporally adjacent textual frames.
  • the lecture video has a frame rate of 25 frames per second and a sub-sampling rate of 50.
  • the HPP and VPP of textual frames spaced apart by 2 seconds are computed, leading to detection of newly added data points at regular intervals of 50 frames.
  • a system for providing a content adaptive and legibility retentive display, on a miniature video device, of a lecture video comprising a sequence of textual and non-textual frames along with associated audio.
  • the system comprises a metadata creation module configured to create metadata that indicates the location of newly added data points in textual frames temporally spaced apart by a predefined time interval, by computing horizontal and vertical projection profiles of ink pixels in said textual frames and detecting the x-y positions of newly added data points thereof.
  • the system further comprises a media re-creation module configured to receive the metadata, associated audio, and key-frames extracted from the textual and non-textual frames, and to sequentially display the key-frames in accordance with the metadata by panning the textual key-frames with a selection window having an aspect ratio and size in accordance with the miniature video device and a center point at the x-y position of the newly added data point of the respective textual frame.
  • the metadata creation module is configured to detect the y-position of a newly added data point in a current textual frame by comparing the Horizontal Projection Profile (HPP) of ink pixels of the current textual frame with that of a previous textual frame, and setting the y-position as the point where the amount of differential ink pixels in the HPPs reaches a threshold value; and to detect the x-position of the newly added data point in the current textual frame by comparing Vertical Projection Profiles (VPPs) of ink pixels contained in the regions around the detected y-position of the current and previous textual frames and setting the x-position as the point where the amount of differential ink pixels in the VPPs is maximum.
  • the media recreation module is configured to vary the size of selection window based on the difference between locations of newly added data points of temporally adjacent textual frames.
  • the media recreation module is configured to provide a manual over-riding control to the viewer for selecting their region of interest in the lecture video and displaying said region with full or appropriate resolution.
  • the media recreation module is also configured to receive the metadata, associated audio and key-frames from a Server device as streaming media.
  • Fig. 1 is a block diagram illustrating a system for providing a content adaptive and legibility retentive display of a lecture video on a miniature video device;
  • Fig. 2 is a functional block diagram representing the steps involved in a method for creating a meta data indicating location of newly added data point in two textual frames temporally spaced apart by a pre-defined time interval;
  • Fig. 3 is a plot of x-y positions of newly added data points in temporally adjacent textual frames
  • Fig. 4 is a block diagram illustrating a lecture video, textual and non-textual key-frames, associated audio, and the recreated lecture video using the metadata;
  • Fig. 5 is a graph illustrating horizontal projection profiles of ink pixels of two temporally adjacent textual frames
  • Fig. 6 is a graph illustrating local sum of differential ink pixels of horizontal projection profiles of two temporally adjacent textual frames
  • Figs. 7a and 7b are graphs illustrating vertical projection profiles of ink pixels in cropped regions of two temporally adjacent textual frames
  • Fig. 8 is a graph illustrating local sum of differential ink pixels of vertical projection profiles of cropped regions of two temporally adjacent textual frames.
  • the system 100 includes a media splitter 101, a gray scale conversion module 102, a segmentation and shot recognition module 103, a non-textual key-frame extraction module 104, a textual key-frame extraction module 105, a metadata creation module 106, a metadata interpolation module 107, and a media re-creation module 108.
  • Each module performs a specific function, each function being a contributory step in providing a content adaptive and legibility retentive display of a lecture video on miniature video devices such as mobile terminals, Personal Digital Assistants (PDAs), hand-held computers and tablet PCs.
  • a typical lecture video comprises a lecturer explaining a topic by writing on a board or using slide shows.
  • Such lecture video may include scenes of various instructional activities such as a talking head activity scene, a class room activity scene, a writing hand activity scene or a slide show activity scene.
  • typical video data is made up of unique consecutive images referred to as frames. The frames are displayed to the user at a rate expressed in frames per second.
  • the media splitter 101 is configured to receive an original lecture video and split it into video and audio data.
  • the gray-scale conversion module 102 and the segmentation and shot recognition module 103 execute temporal segmentation of the video data to detect scene changes/breaks therein and then detect the activities in it.
  • a histogram difference is measured between two consecutive frames of the video data. If the sum of absolute differences of the histograms of two consecutive frames crosses a threshold, the frames are declared shot boundary frames, thereby determining a scene break.
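  • By way of illustration, a minimal sketch of such a histogram-difference test in Python/NumPy follows; the function name, bin count and threshold policy are illustrative assumptions, not values taken from the disclosure:

        import numpy as np

        def is_shot_boundary(prev_frame, curr_frame, bins=64, ratio=0.3):
            # Gray-level histograms of two consecutive gray-scale frames.
            h1, _ = np.histogram(prev_frame, bins=bins, range=(0, 256))
            h2, _ = np.histogram(curr_frame, bins=bins, range=(0, 256))
            # Sum of absolute histogram differences, compared against a
            # threshold proportional to the number of pixels in a frame.
            sad = np.abs(h1 - h2).sum()
            return sad > ratio * prev_frame.size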
  • the activity detection of scenes is HMM (Hidden Markov Model) based and is carried out in two phases i.e. a training and a testing phase.
  • in the training phase, the HMM parameters are learned, based on which a scene is classified into one of the above-mentioned activities. For example, to classify a scene as a talking head activity scene, a writing hand activity scene or a slide show activity scene, the motion within the scene is taken into account. Motion in a talking head activity scene is greater than in a writing hand activity scene, and is least in a slide show activity scene. Therefore, the energy of the temporal derivative in intensity space is used as a relevant feature.
  • the gray-level histogram gives the distribution of the image pixels over different intensity values. It is very sparse for a slide show activity scene, moderately sparse for a writing hand activity scene and dense for a talking head activity scene.
  • Histogram entropy is a direct measure of the variation of pixel intensity in an image. If there is a high variation of intensity, the entropy will be high and vice versa.
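  • The two features can be sketched as follows; this is a hedged illustration, since the exact feature definitions and any normalization used in the HMM front end are not spelled out here:

        import numpy as np

        def motion_energy(prev_frame, curr_frame):
            # Energy of the temporal derivative in intensity space:
            # highest for talking head scenes, lowest for slide shows.
            diff = curr_frame.astype(np.float64) - prev_frame.astype(np.float64)
            return float(np.mean(diff ** 2))

        def histogram_entropy(frame, bins=256):
            # Entropy of the gray-level histogram: low for the sparse
            # histograms of slide shows, high for talking head frames.
            hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
            p = hist / hist.sum()
            p = p[p > 0]            # drop empty bins to avoid log(0)
            return float(-np.sum(p * np.log2(p)))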
  • the content of the video data may be classified into textual and non-textual content.
  • the talking head activity and class room activity scenes are non-textual content.
  • the writing hand activity and slide show activity scenes are textual content, as in some way or other, these scenes display textual content to the user.
  • the frames of the lecture video which include textual content are hereinafter referred to as textual frames, whereas the frames which include non-textual content are hereinafter referred to as non-textual frames.
  • a typical lecture video comprises a sequence of textual and non-textual frames along with associated audio.
  • the textual key-frame extraction module 105 and the non-textual key-frame extraction module 104 are configured to extract representative key-frames from the textual and non-textual frames respectively, such that a set of key-frames represents a summarized semantic content of an entire scene for a particular duration.
  • the textual key-frame extraction module 105 performs ink-pixel based extraction of key-frames from textual frames
  • the non-textual key-frame extraction module 104 performs a visual quality based extraction of key-frames from non-textual frames.
  • the meta-data creation module 106 is configured to create a metadata which indicates location in the textual frames where the lecturer is currently writing or the text is appearing in a slide show.
  • the location in a textual frame which is currently being scribbled is represented by an x-y position which varies in accordance with the writing advancement from frame to frame.
  • the region in the textual frame where the lecturer is currently writing or the text is appearing is the region of interest for a viewer as during the display of video data, the viewer will usually focus on that portion of the video where text is appearing or the lecturer is writing.
  • the metadata creation module 106 creates metadata that indicates location of newly added data points in textual frames temporally spaced apart by a pre-defined time interval by computing horizontal and vertical projection profiles of ink pixels in said textual frames and detecting x-y positions of newly added data points thereof.
  • a set of textual frames temporally spaced apart by the pre-defined time interval is obtained by sub-sampling the textual frames at a pre-defined rate, the pre-defined rate being equal to the product of the pre-defined time interval and the frame rate of the lecture video.
  • the frame rate of the lecture video may be represented by f and the temporal sub-sampling rate by k, where f may take values such as 25 fps or 30 fps, and k may take values such as 10, 20, 40, 50, 60 or even higher; for example, a 2 second interval at 25 fps gives k = 2 × 25 = 50.
  • the y-position of newly added data point in a current textual frame is detected by comparing the Horizontal Projection Profile (HPP) of ink pixels of the current textual frame with that of a previous textual frame, and setting the y-position as the point where the amount of differential ink pixels in the HPPs reaches a threshold value.
  • the corresponding x-position is detected by comparing Vertical Projection Profiles (VPPs) of ink pixels contained in the regions around the detected y-position of the current and previous textual frames and setting the x-position as the point where the amount of differential ink pixels in the VPPs is maximum.
  • the x-y position of newly added data point in a textual frame may also be referred as a track point of the textual frame.
  • the track points of textual frames spaced apart by the predefined time interval constitute the metadata, which may be an array of size 2 × L, where L is the total number of textual frames derived from the video data. L may be higher than the count of non-textual frames, since the writing operation is a much slower activity compared to the standard video frame rate.
  • the metadata interpolation module 107 is configured to interpolate the location of a newly added data point of an intermediate frame occurring in the time period between two textual frames temporally spaced apart by a predefined time interval, using the locations of the newly added data points in said two textual frames.
  • the media re-creation module 108 is configured to recreate the lecture video on the miniature video device. Essentially, the media re-creation module 108 is configured to receive the metadata, associated audio and key-frames extracted from the textual and non-textual frames, and to sequentially display the key-frames in accordance with the metadata by panning the textual key-frames with a selection window having an aspect ratio and size in accordance with the display screen of the miniature video device and a center point at the x-y position of the newly added data point of the respective textual frame. In the recreated lecture video, the non-textual key-frames are resized to fit onto the screen of the miniature video device, whereas the textual key-frames are cropped to display the region of interest with maximum resolution.
  • the metadata drives the selection window to scan the entire content in the textual key-frame for the time for which that textual key-frame is intended for display.
  • the media re-creation module 108 displays a cropped region of the textual key-frame to the viewer, the cropped region being the region of interest and having size equal to the display screen of the miniature video device.
  • the cropped region of a textual key-frame may also be referred to as key-hole image of respective key-frame.
  • the cropped region may further be referred to as a child frame of the parent key-frame.
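  • A minimal sketch of extracting such a child (key-hole) frame; the clamping of the window to the parent frame borders is an assumption made for illustration:

        import numpy as np

        def keyhole_crop(keyframe, track_point, win_h, win_w):
            # Child ("key-hole") frame: a crop of the parent key-frame,
            # centered on the track point and clamped to the borders.
            M, N = keyframe.shape[:2]
            x, y = track_point
            top = int(np.clip(y - win_h // 2, 0, max(M - win_h, 0)))
            left = int(np.clip(x - win_w // 2, 0, max(N - win_w, 0)))
            return keyframe[top:top + win_h, left:left + win_w]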
  • the system 100 may be deployed using a client-server configuration.
  • the Server device will have the lecture video, the media splitter 101, gray scale conversion module 102, segmentation and shot recognition module 103, textual and non-textual key- frame extraction modules 104 and 105, metadata creation module 106.
  • the client device will include the media re-creation module 108.
  • the media re-creation module 108 is an instructional media player of the client device.
  • the metadata interpolation module 107 may either be present at the client device or the Server device.
  • the client device may be a miniature video device and may request a lecture video from the Server device.
  • the Server device may send the metadata, key-frames and associated audio of the lecture video to the client device.
  • the client device may either download the metadata, key-frames and associated audio from the Server device or receive them as streaming data.
  • the data received at the media re-creation module 108 in response to a request for a lecture video will include (a) a few key-frames in any image file format, (b) an audio file in a suitable file format and (c) an XML or equivalent text file containing the temporal markings for the placement of the key-frames and the metadata.
  • the media re-creation module 108 receives the key- frames, metadata and audio.
  • since the temporally long-lasting video shots are replaced by the corresponding static key-frames, the data received at the client device is highly compressed with respect to the original lecture video.
  • the original lecture video is thus received at the client device in a highly reduced form, better suited to the target display and storage, with minimal information loss.
  • the Horizontal Projection Profile (HPP) of ink pixels of a current frame, i.e. the p-th frame, is computed by projecting its ink pixels onto the y-axis.
  • similarly, the HPP of ink pixels of a previous frame, i.e. the (p-k)-th frame, is computed.
  • the HPP of the (p-k)-th frame is subtracted from the HPP of the p-th frame to detect the point where the local amount of differential ink pixels in the HPPs reaches a threshold value. Said point is then set as the y-position of the newly added data point of the p-th frame.
  • next, the Vertical Projection Profile (VPP) of ink pixels of a cropped p-th frame is computed.
  • the cropped p-th frame is a horizontal strip image of the p-th frame including ink pixels contained in the regions around the detected y-position.
  • similarly, the VPP of ink pixels of a cropped (p-k)-th frame is computed.
  • the cropped (p-k)-th frame is a horizontal strip image of the (p-k)-th frame including ink pixels contained in the regions around the detected y-position.
  • the VPP of the (p-k)-th frame is subtracted from the VPP of the p-th frame to detect the point where the local amount of differential ink pixels in the VPPs is maximum. Said point is then set as the x-position of the newly added data point of the p-th frame.
  • the detected x-y position of the p-th frame is then set as the track point of the p-th textual frame.
  • Let $B_t(m, n)$, $0 \le m < M$, $0 \le n < N$, be the textual frame at time t, where M and N are the numbers of rows and columns of said frame respectively.
  • the HPP of an image is a vector in which each element is obtained by summing up the pixel values along a row. Therefore, the HPP of the textual frame at time t is an array $P_t$ of length M in which each element stands for the total number of ink pixels in a row of the processed frame. Since the pixels in each frame of the content scene are converted into ink and paper pixels corresponding to gray levels 0 and 255 respectively, the array $P_t$ is represented by $P_t(m) = \sum_{n=0}^{N-1} \left(1 - B_t(m, n)/255\right)$, $0 \le m < M$.
  • the local absolute summation $SD(j)$ of the difference HPP array $D(m)$ is performed to detect any considerable variation in ink pixels, where $D(m)$ is the element-wise difference between the HPPs of the current and previous textual frames and $SD(j) = \sum_{m=j-W_d}^{j+W_d} |D(m)|$.
  • $W_d$ is the localization parameter for the summation and $\tau$ is the threshold for differential ink detection. If $SD(j) > \tau$, it may be assumed that there is a scribbling activity or that a new data point has been introduced in the j-th line, since the local sum of the absolute HPP difference yields a high value as a result of the introduction of extra ink pixels in that line. Therefore, with the help of the HPPs of two temporally adjacent frames, the y-position of the newly added data point in a textual frame is detected.
  • the x-position of the newly added data point of the textual frame B t (m, n) is detected using the detected y-position.
  • the horizontal strip region around the detected y-position is cropped from both frames and the VPPs of the cropped images are computed.
  • the point at which the local sum of the absolute differences of the VPPs yields maximum value is taken as the x-position of the newly added data point of the textual frame B t (m, n).
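  • Putting the HPP and VPP steps together, a compact sketch of the track-point detection follows, assuming binarized frames (ink = 0, paper = 255) and illustrative values for the localization parameter, the threshold and the strip height:

        import numpy as np

        def detect_track_point(prev_frame, curr_frame, wd=10, tau=50, strip=40):
            # Binary ink masks of the two temporally adjacent frames.
            ink_prev = (prev_frame == 0).astype(np.int64)
            ink_curr = (curr_frame == 0).astype(np.int64)

            # HPP difference D(m) and its local absolute sum SD(j).
            d = ink_curr.sum(axis=1) - ink_prev.sum(axis=1)
            sd = np.convolve(np.abs(d), np.ones(2 * wd + 1), mode="same")
            rows = np.flatnonzero(sd > tau)
            if rows.size == 0:
                return None          # no new scribbling detected
            y = int(rows[0])         # first row where SD(j) reaches tau

            # VPPs of a horizontal strip around the detected y-position.
            lo, hi = max(0, y - strip), min(curr_frame.shape[0], y + strip)
            dv = ink_curr[lo:hi].sum(axis=0) - ink_prev[lo:hi].sum(axis=0)
            sv = np.convolve(np.abs(dv), np.ones(2 * wd + 1), mode="same")
            x = int(np.argmax(sv))   # column of maximum differential ink
            return x, y

  • Running such a routine over textual frames sampled k frames apart would yield the track points that make up the 2 × L metadata array described above.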
  • the HPP and VPP of textual frames temporally spaced apart by 2 seconds are computed, leading to detection of the x-y positions of newly added data points in the lecture video at regular intervals of k = 50 frames.
  • the time interval of 2 seconds for the computation of track points in textual frames is taken on the assumption that there would not be much variation of textual content within a 2 second duration of a lecture video.
  • the writing pace of the lecturer is usually very slow compared to the video frame rate; therefore, the increment in textual content (extra ink pixels introduced) from frame to frame is so small that no noticeable difference appears in the HPPs of consecutive textual frames.
  • the computation of x-y positions of newly added data points at interval of 50 frames not only increases the efficiency but also reduces unnecessary computations.
  • the media-recreation module 108 displays the textual key-frames by panning them with a selection window in accordance with the metadata.
  • the metadata derived at regular intervals of k frames for driving the selection window may result in jittered panning of the textual key-frames.
  • therefore, the track points of intermediate frames occurring in the time period between the p-th and (p+k)-th frame positions are detected.
  • Let $(x(t_1), y(t_1))$ and $(x(t_2), y(t_2))$ be the track points of frames occurring at times $t_1$ and $t_2$ respectively.
  • the track point of an intermediate frame occurring at a time t between $t_1$ and $t_2$ is interpolated using the track points of the frames at $t_1$ and $t_2$.
  • the x-position of the track point of such an intermediate frame at time t can be calculated as $x(t) = x(t_1) + \frac{x(t_2) - x(t_1)}{t_2 - t_1}\,(t - t_1)$.
  • similarly, the y-position of the track point of the intermediate frame at time t is computed using the y-positions of the track points of the frames occurring at $t_1$ and $t_2$.
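  • As a sketch, the linear interpolation of both coordinates (the names are illustrative):

        def interpolate_track_point(p1, p2, t1, t2, t):
            # Linear interpolation between track points p1 = (x(t1), y(t1))
            # and p2 = (x(t2), y(t2)) for an intermediate time t1 <= t <= t2.
            a = (t - t1) / (t2 - t1)
            return (p1[0] + a * (p2[0] - p1[0]),
                    p1[1] + a * (p2[1] - p1[1]))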
  • a plot of track points of temporally spaced textual frames is illustrated.
  • the track points are plotted for textual frames occurring in the time period from $t_1$ to $t_8$.
  • the plot of track points essentially provides an idea of the movement of the lecturer's pen on the writing board. When there is a considerable deviation between two consecutive track points, say the track points at $t_4$ and $t_5$, it implies that during the writing activity the lecturer suddenly toggled from one point to another.
  • the size of the selection window for panning a key-frame is equal to the size of the display screen of the miniature video device so as to display the textual content with maximum resolution.
  • a considerable deviation between two track points has to be taken into account, as it may be beyond both the spatial span and the movement range of the preferable selection window. Such deviations are detected by computing the distance between the consecutive track points at $t_4$ and $t_5$ as $d = \sqrt{(x(t_5) - x(t_4))^2 + (y(t_5) - y(t_4))^2}$,
  • where $(x(t_4), y(t_4))$ and $(x(t_5), y(t_5))$ are the x-y positions of the consecutive track points. If d is greater than a predefined value, say 100, the size of the selection window is increased between $t_4$ and $t_5$ to make it large enough to include the track points at $t_4$ and $t_5$ totally inside the selection window.
  • the size of the selection window is increased for wider visibility, so that the viewer does not miss any region of interest, and to provide a sense of spatial context. Under these circumstances, the text does not appear at its full resolution to the viewer, since scaling is required to fit the selection window onto the screen.
  • the size of the selection window is then decreased back to the preferable size in the time interval from $t_5$ to $t_6$, since there is no considerable deviation between the track points at $t_5$ and $t_6$.
  • the feature of varying the size of the selection window based on the difference between the track points of temporally adjacent textual frames gives the viewer a feel of automatic zoom in/out and may be referred to as multi-resolution local visual delivery.
  • the display is a content adaptive one.
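  • A sketch of this multi-resolution behaviour; the deviation threshold (100 pixels, as in the example above) and the linear growth policy are assumptions for illustration:

        import math

        def selection_window_size(pt_a, pt_b, screen_h, screen_w, d_max=100):
            # Distance between two consecutive track points.
            d = math.hypot(pt_b[0] - pt_a[0], pt_b[1] - pt_a[1])
            if d <= d_max:
                # Preferable case: the window equals the display screen,
                # so the text is shown at full resolution.
                return screen_h, screen_w
            # Grow the window so that both track points fit inside; the
            # enlarged window is later scaled down to fit the screen,
            # producing the automatic zoom-out effect.
            scale = d / d_max
            return int(screen_h * scale), int(screen_w * scale)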
  • a viewer may be provided with an optional manual over-riding control over the region of interest during display of the recreated lecture video on the miniature video device.
  • the viewer can use the touch screen technology or the selection key in their miniature devices for selecting their region of interest and displaying it with full or appropriate resolution.
  • a user is thus allowed to choose their own way of viewing the content.
  • for lecture videos containing printed slides, the creation of metadata may be slightly different from the manner in which the metadata is created for video containing hand-written slides.
  • the position of the current line is detected by using the HPP of the ink pixels of consecutive textual frames.
  • the position of the current line serves as the metadata required by the media re-creation module 108 for the vertical positioning of the selection window, after which the window is linearly swept horizontally on a timeline slow enough for the content to be read. This process is repeated until a new line is introduced below it.
  • the vertical step size can be calculated from the ratio of the vertical dimensions of the display size of the server and client devices.
  • the time interval for sweeping along a particular line can be calculated from the total time duration required for displaying that key-frame and the vertical step size.
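  • A sketch of these two calculations; the function name and the rounding policy are assumptions:

        def sweep_parameters(server_h, client_h, display_duration):
            # Vertical step count from the ratio of the vertical display
            # dimensions of the server and client devices.
            steps = max(1, round(server_h / client_h))
            # Time available for sweeping along one line: the key-frame's
            # total display duration divided by the number of steps.
            return steps, display_duration / steps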
  • the total time duration of the original lecture video 401 is 16 minutes, which includes a talking head activity scene for 4 minutes, a writing hand activity scene for 8 minutes, again a talking head activity scene for 4 minutes and associated audio 407.
  • the first talking head activity scene includes a total of 6000 non-textual frames,
  • the writing hand activity scene comprises a total of 12000 textual frames, and
  • the second talking head activity scene includes a total of 6000 non-textual frames (4, 8 and 4 minutes respectively at 25 frames per second).
  • a metadata indicating the writing advancement in the textual frames is created using the HPPs and VPPs of textual frames temporally spaced apart by a predefined time interval, i.e. 2 seconds.
  • the non-textual key-frames 402, 405 and textual key-frames 403, 404 are extracted from the non-textual and textual frames respectively using known methods. Based on the metadata, audio 407 and key-frames 402-405, the lecture video is recreated on the miniature video device. In the recreated lecture video 406, the non-textual key-frames 402 and 405 are resized to fit onto the miniature device screen, whereas the textual key-frames 403 and 404 are panned with the selection window to display child frames 408 and 409, so as to provide the region of interest to the viewer with maximum resolution.
  • a lecture video of 55-60 minutes duration having a frame rate of 25 frames per second is taken, where the content portion of the lecture video comprises hand-written slides.
  • the extraction of metadata is performed on the textual frames of the original video by computing HPPs of every two textual frames temporally spaced apart by two seconds.
  • HPPs of two such textual frames are plotted on the same graph as illustrated in Fig. 5.
  • the plot of local sum of the absolute difference of said HPPs is shown in Fig. 6.
  • the VPP based x-position detection method is illustrated in Figs. 7 and 8.
  • Figs. 7(a) and (b) respectively illustrate the VPPs of cropped horizontal strip regions around the detected y-position of the two frames.
  • the plot of local sum of the absolute difference of these VPPs is shown in Fig. 8.
  • the region where a high overshoot occurs essentially contains the newly written text.
  • the tracked point is (222, 400). This procedure is repeated for all content textual frames at intervals of 50 frames.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Studio Circuits (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

A method and system are disclosed for providing a content adaptive and legibility retentive display of a lecture video on a miniature video device, said lecture video comprising a sequence of textual and non-textual frames along with associated audio. The method comprises creating metadata that indicates the location of newly added data points in textual frames temporally spaced apart by a predefined time interval, by computing horizontal and vertical projection profiles of ink pixels in said textual frames and detecting the x-y positions of the newly added data points thereof; and sequentially displaying key-frames extracted from the textual and non-textual frames in accordance with the metadata, by panning the textual key-frames with a selection window having an aspect ratio and size in accordance with a display screen of the miniature video device and a center point at the x-y position of the newly added data point in the respective textual frame.
PCT/IN2011/000597 2010-09-06 2011-09-02 Method and system for providing a content adaptive and legibility retentive display of a lecture video on a miniature video device WO2012032537A2 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN2474MU2010 2010-09-06
IN2474/MUM/2010 2010-09-06

Publications (2)

Publication Number Publication Date
WO2012032537A2 true WO2012032537A2 (fr) 2012-03-15
WO2012032537A3 WO2012032537A3 (fr) 2012-06-21

Family

ID=45811027

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2011/000597 WO2012032537A2 (fr) Method and system for providing a content adaptive and legibility retentive display of a lecture video on a miniature video device

Country Status (1)

Country Link
WO (1) WO2012032537A2 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113316012A (zh) * 2021-05-26 2021-08-27 深圳市沃特沃德信息有限公司 Audio and video frame synchronization method and apparatus based on an ink-screen device, and computer device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6336124B1 (en) * 1998-10-01 2002-01-01 Bcl Computers, Inc. Conversion data representing a document to other formats for manipulation and display
US20040125877A1 (en) * 2000-07-17 2004-07-01 Shin-Fu Chang Method and system for indexing and content-based adaptive streaming of digital video content
US20040205513A1 (en) * 2002-06-21 2004-10-14 Jinlin Chen Web information presentation structure for web page authoring

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6336124B1 (en) * 1998-10-01 2002-01-01 Bcl Computers, Inc. Conversion data representing a document to other formats for manipulation and display
US20040125877A1 (en) * 2000-07-17 2004-07-01 Shin-Fu Chang Method and system for indexing and content-based adaptive streaming of digital video content
US20040205513A1 (en) * 2002-06-21 2004-10-14 Jinlin Chen Web information presentation structure for web page authoring

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113316012A (zh) * 2021-05-26 2021-08-27 深圳市沃特沃德信息有限公司 Audio and video frame synchronization method and apparatus based on an ink-screen device, and computer device
CN113316012B (zh) * 2021-05-26 2022-03-11 深圳市沃特沃德信息有限公司 Audio and video frame synchronization method and apparatus based on an ink-screen device, and computer device
WO2022247014A1 (fr) * 2021-05-26 2022-12-01 深圳市沃特沃德信息有限公司 Audio and video frame synchronization method and apparatus based on an ink-screen device, and computer device

Also Published As

Publication number Publication date
WO2012032537A3 (fr) 2012-06-21

Similar Documents

Publication Publication Date Title
US11849196B2 (en) Automatic data extraction and conversion of video/images/sound information from a slide presentation into an editable notetaking resource with optional overlay of the presenter
  • CN107633241B (zh) Method and device for automatically labeling and tracking objects in panoramic video
US9167221B2 (en) Methods and systems for video retargeting using motion saliency
  • CA2761187C (fr) Systems and methods for autonomous production of videos from multi-sensed data
US8457469B2 (en) Display control device, display control method, and program
US10645344B2 (en) Video system with intelligent visual display
US8085302B2 (en) Combined digital and mechanical tracking of a person or object using a single video camera
US20120057775A1 (en) Information processing device, information processing method, and program
US20060044446A1 (en) Media handling system
US8515258B2 (en) Device and method for automatically recreating a content preserving and compression efficient lecture video
Carlier et al. Crowdsourced automatic zoom and scroll for video retargeting
Choudary et al. Summarization of visual content in instructional videos
JP2010503006A5 (fr)
  • KR20080078186A Method for extracting a region of interest for users of multimedia portable terminals
Hoshen et al. Wisdom of the crowd in egocentric video curation
US20110235859A1 (en) Signal processor
Xiong et al. Snap angle prediction for 360 panoramas
Tang et al. Exploring video streams using slit-tear visualizations
Miniakhmetova et al. An approach to personalized video summarization based on user preferences analysis
US20050198067A1 (en) Multi-resolution feature extraction for video abstraction
  • WO2012032537A2 (fr) Method and system for providing a content adaptive and legibility retentive display of a lecture video on a miniature video device
Shi et al. Consumer video retargeting: context assisted spatial-temporal grid optimization
Liao et al. An automatic lecture recording system using pan-tilt-zoom camera to track lecturer and handwritten data
Ram et al. Video Analysis and Repackaging for Distance Education
Ichimura Delivering chalk talks on the internet

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11823161

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11823161

Country of ref document: EP

Kind code of ref document: A2