CN1600032A - Methods for multimedia content repurposing - Google Patents

Methods for multimedia content repurposing

Info

Publication number
CN1600032A
CN1600032A · CNA028240332A · CN02824033A
Authority
CN
China
Prior art keywords
content
image
video
video content
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA028240332A
Other languages
Chinese (zh)
Inventor
R. S. Jasinschi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Koninklijke Philips NV
Original Assignee
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US10/011,883 external-priority patent/US20030105880A1/en
Application filed by Koninklijke Philips Electronics NV filed Critical Koninklijke Philips Electronics NV
Publication of CN1600032A publication Critical patent/CN1600032A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440236Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/4104Peripherals receiving signals from specially adapted client devices
    • H04N21/4126The peripheral being portable, e.g. PDAs or mobile phones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N19/21Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding with binary alpha-plane coding for video objects, e.g. context-based arithmetic encoding [CAE]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/40Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video transcoding, i.e. partial or full decoding of a coded input stream followed by re-encoding of the decoded output stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/436Interfacing a local distribution network, e.g. communicating with another STB or one or more peripheral devices inside the home
    • H04N21/4363Adapting the video stream to a specific local network, e.g. a Bluetooth® network
    • H04N21/43637Adapting the video stream to a specific local network, e.g. a Bluetooth® network involving a wireless protocol, e.g. Bluetooth, RF or wireless LAN [IEEE 802.11]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44227Monitoring of local network, e.g. connection or bandwidth variations; Detecting new devices in the local network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/45Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N21/462Content or additional data management, e.g. creating a master electronic program guide from data received from the Internet and a Head-end, controlling the complexity of a video stream by scaling the resolution or bit-rate based on the client capabilities
    • H04N21/4621Controlling the complexity of the content stream or additional data, e.g. lowering the resolution or bit-rate of the video stream for a mobile client with a small screen
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8146Monomedia components thereof involving graphical data, e.g. 3D object, 2D graphics

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Graphics (AREA)
  • Processing Or Creating Images (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

Comprehensive multimedia content re-purposing relating to wireless communications employs content constructs that are compact representations of the content information. For video content, the constructs are content operators that represent 2D image regions and/or 3D volumetric regions for objects within the sequence and characterized by various visual attributes, and are extracted from the video sequence by segmentation utilizing video processing techniques. The constructs are employed for intra- and inter-modality transformation to accommodate resource constraints of the mobile device.

Description

Methods for multimedia content repurposing
This application claims priority as a continuation-in-part of U.S. Patent Application Serial No. 10/011,883, entitled "Distributed processing, storage, and transmission of multimedia information," filed December 4, 2001, the contents of which are incorporated herein by reference.
The present invention is directed, in general, to transcoding of multimedia content and, more specifically, to intra-modality and inter-modality transcoding of multimedia content for use under the resource constraints of mobile devices.
Multimedia content can take the form of one of the three distinct modalities of audio, visual, and text, or the form of any combination of the three. Content "repurposing" generally refers, in principle, to reformatting, re-scaling, and/or transcoding content by changing the content representation within a modality, for example: in the visual domain, from video to video, from video to still graphic images, or from real images to cartoons; in the audio domain, from real to synthetic speech; and in the text domain, from full text to summaries. In addition, content may be repurposed by changing from one domain to another, for example from video to text or from audio to text.
One principal application of content repurposing is enabling the processing, storage, transmission, and display of multimedia information on mobile (e.g., wireless) devices. Such devices typically have severe constraints on processing, storage, transmission/reception, and display capabilities. Through content repurposing, mobile device users can have variable-quality access to multimedia information depending on the circumstances, using the best multimedia modality available.
Current content repurposing implementations consist mainly of speech-to-text conversion, for example in answering or response (dial-in) systems, where spoken sounds are analyzed and converted into vowel and consonant sounds for translation into the text that will be used. Summarization, which deals almost exclusively with textual information, is also employed.
There is, therefore, a need in the art for improved content repurposing techniques directed at more general applications.
To address the above-described deficiencies of the prior art, a primary object of the present invention is to provide comprehensive multimedia content repurposing for use in wireless communication systems, employing content constructs that are compact representations of the content information. For video content, the constructs are content operators that represent two-dimensional (2D) image regions and/or three-dimensional (3D) volumetric regions for objects within the sequence, characterized by various visual attributes, and are extracted from the video sequence by segmentation utilizing video processing techniques. The constructs are employed in intra-modality and inter-modality transformations to accommodate the resource constraints of mobile devices.
The foregoing has outlined rather broadly the features and technical advantages of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features and advantages of the invention that form the subject of the claims will be described hereinafter. Those skilled in the art will appreciate that they may readily use the conception and the specific embodiments disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art will also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.
Before undertaking the detailed description of the invention below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation; the term "or" is inclusive, meaning and/or; the phrases "associated with" and "associated therewith," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term "controller" means any device, system, or part thereof that controls at least one operation, whether such a device is implemented in hardware, firmware, software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document, and those of ordinary skill in the art will understand that such definitions apply in many, if not most, instances to prior as well as future uses of such defined words and phrases.
For a more complete understanding of the present invention and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which like numerals designate like objects, and in which:
Fig. 1 depicts a data processing system network employing content repurposing according to one embodiment of the present invention;
Figs. 2A through 2C illustrate intra-modality visual content repurposing according to one embodiment of the present invention; and
Fig. 3 illustrates inter-modality content repurposing employing compact information according to one embodiment of the present invention.
Figs. 1 through 3, discussed below, and the various embodiments used to describe the principles of the present invention in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the invention. Those skilled in the art will understand that the principles of the present invention may be implemented in any suitably arranged device.
Fig. 1 depicts a data processing system network employing content repurposing according to one embodiment of the present invention. Data processing system network 100 includes a server system 101 and a client 102. In the example shown, server 101 is wirelessly connected to, and interoperable with, client 102. Server 101 may be any system, for example a desktop personal computer (PC), a laptop, a "supercomputer," or any other system including a central processing unit (CPU), a local memory system, and a set of special-purpose chips for performing specific signal processing operations such as convolutions. Data processing system network 100 may include any type of wireless communications network, carrying video, data, voice/audio, or some combination thereof. The mobile (or fixed wireless) device 102 may be, for example, a telephone, a personal digital assistant (PDA), a computer, a satellite or terrestrial television and/or radio receiver, or a set-top box.
Those skilled in the art will recognize that the complete construction and operation of a data processing system network is not depicted in the drawings or described herein. Instead, for simplicity and clarity, only so much of the construction and operation of a data processing system as is unique to the present invention or necessary for an understanding of the present invention is depicted and described. The remainder of the system may be constructed and operate in accordance with convention.
Figs. 2A through 2C illustrate intra-modality visual content repurposing according to one embodiment of the present invention. In the exemplary embodiment, server 101 may perform repurposing of video sequence and/or still image content delivered to client 102.
For the video repurposing illustrated in Fig. 2A, a video sequence 201 is converted into constructs by a construct generator 202. The constructs describe the elements of a compact video sequence representation, allowing (a) access to the video sequence content information 203, (b) synthesis of the original input video sequence 204 (or creation of a new video sequence), and (c) compression of the video sequence 205. Each construct is a compact representation of the video content information, such that a long video sequence may be represented with a small number of constructs.
The use of constructs goes considerably beyond video compression and the like. In converting a video sequence into a set of constructs, the video is effectively re-architected into a new set of building blocks. In video coding, for example, a video sequence is represented by frames or fields in uncompressed format, or by a video stream in compressed format. In that representation the atomic units are pixels (for uncompressed frames) and packets (for the compressed format), and with respect to video content information the representation is free, i.e., unstructured.
Video content information is given at an intermediate level by "objects," which are, for example, two-dimensional (2D) image regions or three-dimensional (3D) volumetric regions characterized by various visual attributes (e.g., color, motion, shape). To generate video content information, the information must be segmented from the video sequence, which requires the use of various image processing and/or computer vision techniques. For the segmentation processing, for example, edge/shape segmentation, motion analysis (2D or 3D), or color segmentation may be employed. In addition, compact representation of the segmented video content information is equally important. Fig. 2B illustrates segmentation and compaction, wherein an input video sequence 201 is processed by segmentation and compaction units 206 and 207 to produce compact video content operators 208. The content operators 208 form part of the set of video content constructs.
Another video content construct, layered mosaics 209, is generated by the following steps: (i) determining the relative depth information between the different mosaics; and (ii) incrementally combining the relative depth information with the content operators and the individual frames of the input source, as shown in Fig. 2C.
The compact video content operators 208 and the layered mosaics 209 in Fig. 2C constitute the video constructs, which are represented in the construct generator 202 of Fig. 2A by the video content segmentation and compaction units 206 and 207.
In the example of construct generation given below, the visual conditions are assumed to be as follows: the 3D world (scene) is composed of rigid objects; the objects are distributed at different depth levels, with the background of the scene static (or at least changing very slowly) and the foreground comprising a collection of independently moving (rigid) objects; the objects have locally approximately planar surfaces; and the illumination of the entire scene is uniform.
Given two successive frames I_{k-1} and I_k obtained from the video sequence at times k-1 and k, respectively, the compact video content operators are generated as follows:
First, the images I_{k-1} and I_k are registered by comparing the image intensities at each pixel. If I_{k-1} = I_{k-1}(x_{k-1}, y_{k-1}) and I_k = I_k(x_k, y_k), where (x_{k-1}, y_{k-1}) and (x_k, y_k) denote the x and y image pixel coordinates at times k-1 and k, respectively, then I_{k-1} and I_k are typically registered by computing the nine elements of a 3×3 matrix R(·):

$$x_{k-1} = \frac{R(0,0)\,x_k + R(0,1)\,y_k + R(0,2)}{R(2,0)\,x_k + R(2,1)\,y_k + R(2,2)} \qquad (1)$$

$$y_{k-1} = \frac{R(1,0)\,x_k + R(1,1)\,y_k + R(1,2)}{R(2,0)\,x_k + R(2,1)\,y_k + R(2,2)} \qquad (2)$$
The matrix R(·) can be computed in different ways, for example using a (6-parameter) affine model with R(2,0) = R(2,1) = 0, R(2,2) = 1, R(0,0) = s_x, R(0,1) = r_x, R(1,0) = r_y, and R(1,1) = s_y, where s_x, s_y and r_x, r_y represent the x and y components of the (2D) image scaling and rotation vectors, respectively. Other suitable models include the 8-parameter perspective model. In any case, the result of registering image I_{k-1} to image I_k is the image I^R_{k-1}.
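As an illustrative sketch of the mapping in equations (1) and (2), the following Python fragment builds the 6-parameter affine special case of R (with R(2,0) = R(2,1) = 0 and R(2,2) = 1) and maps a frame-k pixel coordinate back to frame k-1. The function and parameter names are assumptions for illustration, not from the patent.

```python
def affine_matrix(sx, sy, rx, ry, tx=0.0, ty=0.0):
    """6-parameter affine special case of the 3x3 registration matrix R:
    R(2,0) = R(2,1) = 0 and R(2,2) = 1, with sx, sy the image scaling
    terms and rx, ry the rotation terms (tx, ty: translation)."""
    return [[sx, rx, tx],
            [ry, sy, ty],
            [0.0, 0.0, 1.0]]

def warp_point(R, xk, yk):
    """Map pixel coordinates (xk, yk) of frame k to the coordinates
    (x_{k-1}, y_{k-1}) of frame k-1 via equations (1) and (2)."""
    denom = R[2][0] * xk + R[2][1] * yk + R[2][2]
    x_prev = (R[0][0] * xk + R[0][1] * yk + R[0][2]) / denom
    y_prev = (R[1][0] * xk + R[1][1] * yk + R[1][2]) / denom
    return x_prev, y_prev

# Identity registration leaves coordinates unchanged.
print(warp_point(affine_matrix(1.0, 1.0, 0.0, 0.0), 5.0, 7.0))  # → (5.0, 7.0)
```

For the general 8-parameter perspective model the same `warp_point` applies; only the bottom row of R becomes nontrivial, which is why the denominator is kept explicit.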
Next, the velocity image between the registered images I_{k-1} and I_k is estimated, using any of a variety of techniques including energy-based and gradient-based methods. The resulting velocity image determines pixel velocities in regions associated with 3D rigid objects moving in a consistent manner, corresponding to foreground 3D objects and their associated 2D image regions.
Based on the estimated velocity image and other visual attributes, image regions are then segmented to determine the parts associated with foreground objects. The resulting image regions and associated alpha maps are suitably post-processed to fill in gaps.
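A minimal sketch of this region-extraction step, under stated assumptions: pixels whose already-estimated velocity magnitude exceeds a threshold are taken as candidate foreground, and the largest 4-connected component is kept as a binary alpha map. The thresholding rule and connectivity choice are illustrative; the patent does not prescribe a particular segmentation algorithm.

```python
from collections import deque

def alpha_map(speed, thresh):
    """Binary alpha map from a velocity-magnitude grid: 1 inside the
    largest 4-connected region of above-threshold motion, 0 elsewhere."""
    h, w = len(speed), len(speed[0])
    mask = [[speed[y][x] > thresh for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    best = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                comp, queue = [], deque([(y, x)])
                seen[y][x] = True
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    alpha = [[0] * w for _ in range(h)]
    for y, x in best:
        alpha[y][x] = 1
    return alpha
```

The gap-filling post-processing mentioned above would operate on this binary map (e.g., morphological closing), which is omitted here.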
A compact set of shape templates can be generated from these image regions by computational geometry techniques; one simple representation is an approximation by rectangular shapes. Mosaics, by contrast, encode the non-redundant information of images extended in the plane; they occur in layers according to the relative depth of the associated world regions, and are generated incrementally by a recursive algorithm. At each step of the algorithm, the most recently generated mosaic is compared with the current video sequence image, producing a new instance of the mosaic. Briefly, the layered generation of mosaics starts from a video sequence {I_1, ..., I_N} composed of N successive frames, each with a corresponding compact alpha map in {α_1, ..., α_N}. Each alpha map is obtained from the compact video content operators by filling in the interior of the mosaic region; the alpha map is a binary image that is 1 in the interior region and 0 elsewhere.
Assuming that information about relative depth (i.e., the relative ordering of each foreground object with respect to the background image, and the relative ordering of all foreground objects) has been employed and that it is possible to distinguish each mosaic plane at L levels, a set of L mosaics {Φ_1, ..., Φ_L} is generated, where the i-th mosaic Φ^i_r is computed at the initial step r = 1 as Φ^i_1 = α^i_1 I_1, and at subsequent steps r = 2, ..., N recursively, by combining the alpha map set {α^i_2, ..., α^i_N} with {I_1, ..., I_N}, thereby producing Φ^i_r for each step r.
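The recursion above (initial step Φ^i_1 = α^i_1 I_1, then combination with later frames) can be sketched as follows. The per-pixel copy-where-alpha-is-set rule stands in for the patent's unspecified combination step, and the helper names are hypothetical.

```python
def mosaic_step(mosaic, frame, alpha):
    """One recursive update: where the layer's alpha map is 1, take the
    current frame's pixel; elsewhere keep the accumulated mosaic value."""
    h, w = len(frame), len(frame[0])
    return [[frame[y][x] if alpha[y][x] else mosaic[y][x]
             for x in range(w)] for y in range(h)]

def build_layer_mosaic(frames, alphas):
    """Layered-mosaic recursion for one depth layer i: the initial step
    is alpha^i_1 * I_1 (zero outside the region), followed by recursive
    combination with frames 2..N."""
    h, w = len(frames[0]), len(frames[0][0])
    mosaic = [[frames[0][y][x] if alphas[0][y][x] else 0
               for x in range(w)] for y in range(h)]
    for frame, alpha in zip(frames[1:], alphas[1:]):
        mosaic = mosaic_step(mosaic, frame, alpha)
    return mosaic
```

Running this per depth layer, with one alpha-map sequence per layer, yields the set {Φ_1, ..., Φ_L} of the text.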
Finally, the supplementary information is determined, comprising any supplemental image regions and shape templates together with a description of the velocity images and other visual attributes required to represent the video content information. The result of video construct generation is a set of compact video content operators, a set of layered mosaics, and the supplementary information.
Image repurposing concentrates on reducing the complexity of an image. For example, an image may be converted into regions of smooth values in color, intensity, texture, motion, and so on. One possible general technique for this task is minimization of a cost function:
$$E(I,\Gamma) = \iint_R \big(I(x,y) - I_M(x,y)\big)^2\,dx\,dy + \iint_{R-\Gamma} \|\nabla I(x,y)\|^2\,dx\,dy + \nu\,|\Gamma| \qquad (3)$$
where I(·) denotes the estimated image over the region R, I_M(·) denotes the actual (original) image, and
$$\nabla I(x,y) = \left(\frac{\partial I(x,y)}{\partial x},\; \frac{\partial I(x,y)}{\partial y}\right) \qquad (4)$$
In fact, the image region R = ∪_i R_i + Γ, with the overall boundary Γ surrounding the full region R. The first term in equation (3) determines the "error" between the actual image and the smoothed image, the second term determines "smoothness," and the third term is proportional to the boundary length |Γ|, where ν is a constant. For actual computation, equation (3) should be suitably discretized, i.e., approximated by a sum over terms.
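One plausible discretization of equation (3), assuming the simplest choices: the integrals become sums over pixels, the gradient becomes forward differences, and the boundary length |Γ| is supplied as a pixel count. The smoothness term here is summed over the whole grid rather than strictly over R − Γ, so this is an assumed simplification, not the patent's discretization.

```python
def discrete_energy(I, I_M, boundary_length, nu):
    """Discretized cost of equation (3): squared data error, squared
    forward-difference gradient magnitude, plus nu times |Gamma|."""
    h, w = len(I), len(I[0])
    # Data term: sum of (I - I_M)^2 over all pixels.
    data = sum((I[y][x] - I_M[y][x]) ** 2
               for y in range(h) for x in range(w))
    # Smoothness term: squared horizontal and vertical forward differences.
    smooth = sum((I[y][x + 1] - I[y][x]) ** 2
                 for y in range(h) for x in range(w - 1))
    smooth += sum((I[y + 1][x] - I[y][x]) ** 2
                  for y in range(h - 1) for x in range(w))
    return data + smooth + nu * boundary_length
```

A smoothing scheme would then pick the I (and boundary Γ) minimizing this value, e.g., by gradient descent over the pixel values.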
In interpreting equation (3), it should be noted that I(·) and I_M(·) represent whichever visual attribute is being smoothed. For example, if the image velocity V(·) is smoothed, then I(·) = V(·), and so on.
Alternatively, the image may be converted into a cartoon image I_C by using a simplified version of equation (3) in which I(·) is restricted to piecewise constant values, I(·) → K. More precisely, for each region R_i, the value of I(·) is approximated by I_i(·) = K_i, with K_i a constant real value within region R_i. If μ is a constant, equation (3) can then be approximated by:
$$\mu^{-2} E(I,\Gamma) = \sum_i \iint_{R_i} (I - I_M)^2\,dx\,dy + \nu_0\,|\Gamma| \qquad (5)$$
where ν_0 = ν/μ². It can be seen that
$$K_i = \operatorname{mean}_{R_i}(I_M) = \frac{\iint_{R_i} I_M(x,y)\,dx\,dy}{\operatorname{area}(R_i)} \qquad (6)$$
The cartooning of I(·) thus creates constant-valued regions for the given attribute. When the region boundaries are marked with darkened lines, a full cartoon results. The cartoon image I_C is a greatly simplified version of the original image I that retains the principal features of the original.
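Equation (6) says that, within each region, the optimal constant is simply the mean of the original attribute over that region. A sketch under the assumption that the segmentation into regions R_i is already given as lists of pixel coordinates:

```python
def cartoon(image, regions):
    """Piecewise-constant ('cartoon') approximation: replace each region
    R_i by K_i, the mean of the original image I_M over R_i (eq. (6))."""
    out = [row[:] for row in image]
    for region in regions:
        # K_i = discrete mean of the original values over the region.
        k_i = sum(image[y][x] for y, x in region) / len(region)
        for y, x in region:
            out[y][x] = k_i
    return out
```

Marking the boundaries between regions with dark lines, as the text notes, would complete the cartoon rendering.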
Conversion from real to synthetic visual information is an important application of content repurposing. Real 3D objects may be converted into synthetic 3D objects using 3D meshes; real 3D objects may be converted into synthetic 2D objects using a combination of 2D meshes with perspective and projective transformations; and real 2D objects may be converted into synthetic 2D objects using 2D meshes and computational geometry tools.
Audio repurposing includes speech-to-text conversion performed according to known techniques, generating phonemes by speech recognition and then converting from phonemes to text. In the present invention, the phonemes are regarded as a compact set of basic elements from which, with the use of a dictionary, textual information is generated, as described in further detail below.
Inter-modality content repurposing corresponds to repurposing multimedia information between different modalities. In general, the framework for inter-modality content repurposing includes (i) segmentation of the multimedia content, (ii) template/model matching, and (iii) use of cross-modality dictionaries for translation. In dealing with multimedia information, the three component modalities (visual, audio, and text) exist within an overall hierarchy of complexity, as follows:
visual (video) → visual (still pictures) → audio → text    (7)

Transformations across these different modalities should therefore follow the flow defined in (7). While not necessarily prescriptive as a hierarchy of content, this ordering holds for the number of bits required to represent content in the various modalities.
One common technique for content repurposing following the flow defined in (7) is to convert all visual and audio information into textual descriptions. Video-to-still-image conversion is usually performed by subsampling frames of the video sequence; conversion of the content information in terms of viewpoint (or perspective) is less common.
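Frame subsampling for video-to-still conversion is straightforward; a sketch, assuming a uniform temporal step (a parameter the patent leaves open):

```python
def subsample_frames(frames, step):
    """Keep every step-th frame of a sequence, converting video content
    into a set of still images."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return frames[::step]

print(subsample_frames(list(range(10)), 3))  # → [0, 3, 6, 9]
```

Content-aware variants would instead pick frames where the compact content operators change, but that selection rule is not specified here.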
In the present invention, when converting video to text, a description of the compact video content (the video constructs) is given in the text domain. Similarly, compact image content is converted into a textual description. In video-to-image conversion, the compact image content operators are used to access specific regions (information) of the video constructs and to apply information to those regions.
Fig. 3 illustrates inter-modality content repurposing employing compact information according to one embodiment of the present invention. In general, content repurposing across multimedia modalities is performed in the present invention by using compact information (e.g., video constructs, image cartoons). Representing the transformations between the compact elements of given modalities employs a compact information format, which is important in conversions from video frames/fields to still frames or to text.
In system 300, separate video, audio, and text inputs 301-303 are employed, with an additional input 304 for still images either from a separate input or subsampled from the video input 301. As described above, compact constructs 305-308 are generated, and the inter-modality content repurposing employs a set of dictionaries (not separately depicted) that translate information between the sets of compact content elements of the different modalities. A cross-modality dictionary defines how compact content information is described in a given modality; the description may be text and/or metadata based on a proprietary format or on a unified standard (e.g., MPEG-7, TV-Anytime, and/or SMPTE). These descriptions, which are particularly suited to video-to-image conversion, should be used to perform translation between the elements of different modalities. When video, images, or audio are converted into text, the descriptions are rendered as explanations realizable at different levels of detail. The structure and function of such dictionaries is described in further detail in the above cross-referenced application incorporated herein by reference.
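One way to picture a cross-modality dictionary is as a lookup from compact content operators to textual descriptions at a chosen level of detail. Everything in this sketch (the attribute tuples, the phrasing, the two detail levels) is an invented illustration, not the patent's dictionary format or an MPEG-7 schema:

```python
# Hypothetical cross-modality dictionary: a compact content operator,
# summarized as (object_class, dominant_color, motion), maps to a
# textual description at two levels of detail.
CROSS_MODALITY_DICT = {
    ("person", "red", "moving"): {
        "brief": "a person in red",
        "full": "a person wearing red moving across the scene",
    },
    ("vehicle", "blue", "static"): {
        "brief": "a blue vehicle",
        "full": "a stationary blue vehicle in the background",
    },
}

def describe(operators, detail="brief"):
    """Translate a list of compact operators into a textual description,
    skipping operators with no dictionary entry."""
    parts = [CROSS_MODALITY_DICT[op][detail]
             for op in operators if op in CROSS_MODALITY_DICT]
    return "; ".join(parts)

print(describe([("person", "red", "moving")]))  # → a person in red
```

The `detail` parameter mirrors the text's point that descriptions are realizable at different levels of detail when converting video, images, or audio to text.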
The present invention may be implemented in a content server with connected access to content in a database, in order to repurpose that content for mobile access. The content may be repurposed prior to any request for the content from a mobile device (e.g., when the content is loaded for access from the server), or in response to a device-specific request, to customize the content to the resources available within the requesting mobile device. In particular, the present invention may be advantageously employed in wireless communications utilizing transport protocols such as TCP or RTP to provide customized Internet access to PDAs, mini-laptops, and the like.
It should be noted in particular that, while the present invention has been described in the context of a fully functional system, those skilled in the art will appreciate that at least portions of the mechanism of the present invention are capable of being distributed in the form of a machine-usable medium containing instructions in a variety of forms, and that the present invention applies equally regardless of the particular type of signal-bearing medium actually employed to carry out the distribution. Examples of machine-usable media include: nonvolatile, hard-coded media such as read-only memories (ROMs) or electrically erasable programmable read-only memories (EEPROMs); recordable media such as floppy disks, hard disk drives, and compact disc read-only memories (CD-ROMs) or digital versatile discs (DVDs); and transmission-type media such as digital and analog communication links.
While the present invention has been described in detail, those skilled in the art will understand that various changes, substitutions, variations, enhancements, nuances, gradations, lesser forms, alterations, revisions, improvements, and omissions of the invention disclosed herein may be made without departing from the spirit and scope of the invention in its broadest form.

Claims (16)

1. A system (100) for multimedia content repurposing, the system comprising a controller (200) that generates a content structure (208, 300), the content structure being a compact representation of content information for video content and including content operators, wherein the content structure (208) is adapted for converting the content within a modality or between modalities.
2. the system as claimed in claim 1 100, wherein, form the vision content operator 306 of video content, estimate the speed image of document image by the consecutive image in the records series, the image-region segmentation with the identification foreground object, and is produced shape template.
3. the system as claimed in claim 1 100, and wherein, content structure 208 comprises that the layering of video content inlays 209, and it is from recursively deriving the Alpha mapping of the consecutive image of combination by cliping and pasting operation.
4. the system as claimed in claim 1 100, and wherein, the content structure 208 of video content is used to the level and smooth value region of the image transitions in this video content to one or more colors, brightness, texture and motion.
5. the system as claimed in claim 1 100, and wherein, it is the image of similar animation that the content structure 208 of video content is used to the image transitions in this video content.
6. A system (100) for multimedia content repurposing, comprising:
- a mobile device (102) capable of selectively accessing multimedia content; and
- a server (101) containing multimedia content to be transmitted to the mobile device, the server (101) including a controller (200) that generates a content structure (208), the content structure being a compact representation of content information for video content and including content operators, wherein the content structure (208) is adapted for converting the content within a modality or between modalities.
7. A method of multimedia content repurposing, comprising generating a content structure (208), the content structure being a compact representation of content information for video content and including content operators, wherein the content structure (208) is adapted for converting the content within a modality or between modalities.
8. The method as claimed in claim 7, wherein generating the content operators for the video content further comprises:
- registering consecutive images within a sequence;
- estimating a velocity map for the registered images; and
- segmenting image regions to identify foreground objects and generate shape templates.
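The three steps recited in this claim can be sketched in a toy form. This is not the patented implementation: a real system would use block matching or optical flow for velocity estimation, whereas here a simple temporal difference stands in for it, and the segmentation threshold is an arbitrary assumption.

```python
import numpy as np


def velocity_map(prev: np.ndarray, curr: np.ndarray) -> np.ndarray:
    """Stand-in for velocity estimation between two registered frames.
    A real system would use block matching or optical flow; here the
    absolute temporal difference simply flags how much each pixel changed."""
    return np.abs(curr.astype(float) - prev.astype(float))


def foreground_template(prev: np.ndarray, curr: np.ndarray,
                        thresh: float = 10.0) -> np.ndarray:
    """Segment moving regions into a binary shape template by
    thresholding the velocity map (the threshold is an arbitrary choice)."""
    return velocity_map(prev, curr) > thresh
```

Given two registered frames where only a small block changes, the resulting binary template marks exactly that block as foreground.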
9. The method as claimed in claim 7, wherein generating the content structure (208) for the video content further comprises recursively combining alpha maps of consecutive images through cut-and-paste operations to form a layered mosaic (209).
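Recursive combination of alpha maps, as recited above, resembles the standard "over" compositing operator folded across a frame sequence. The sketch below is a minimal illustration under that assumption, not the patent's actual mosaic algorithm.

```python
import numpy as np


def over(fg, fg_alpha, bg, bg_alpha):
    """Composite a foreground layer over a background layer using the
    standard 'over' operator on per-pixel alpha maps."""
    out_alpha = fg_alpha + bg_alpha * (1.0 - fg_alpha)
    safe = np.where(out_alpha == 0.0, 1.0, out_alpha)  # avoid 0/0
    out = (fg * fg_alpha + bg * bg_alpha * (1.0 - fg_alpha)) / safe
    return out, out_alpha


def layered_mosaic(frames, alphas):
    """Recursively fold a sequence of (image, alpha-map) pairs into one
    mosaic by compositing each new frame over the accumulated result."""
    img, acc = frames[0], alphas[0]
    for f, fa in zip(frames[1:], alphas[1:]):
        img, acc = over(f, fa, img, acc)
    return img, acc
```

Compositing a half-transparent frame over an opaque one blends their values and leaves the accumulated mosaic fully opaque.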
10. The method as claimed in claim 7, further comprising employing the content structure (208) for the video content to convert images within the video content into smooth value regions of one or more of color, brightness, texture, and motion.
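Conversion of an image into smooth value regions can be approximated by quantizing pixel intensities into a few flat levels, one common way to produce such regions. The function below is a hypothetical single-channel sketch of that idea, not the claimed method itself.

```python
import numpy as np


def smooth_value_regions(image: np.ndarray, levels: int = 4) -> np.ndarray:
    """Quantize single-channel intensities into a few flat levels,
    approximating conversion of an image into smooth value regions."""
    lo, hi = float(image.min()), float(image.max())
    if hi == lo:
        return image.copy()                           # flat image already
    norm = (image - lo) / (hi - lo)                   # scale to 0..1
    q = np.floor(norm * levels).clip(0, levels - 1)   # bin index per pixel
    return lo + (q + 0.5) * (hi - lo) / levels        # replace by midpoints
```

With four levels, the extremes of an image are mapped to the midpoints of the lowest and highest bins, collapsing fine gradations into flat regions.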
11. The method as claimed in claim 7, further comprising employing the content structure (208) for the video content to convert images within the video content into cartoon-like images.
12. A signal containing multimedia information generated from a content structure (208), the content structure being a compact representation of content information for video content and including content operators, wherein the content structure (208) is adapted for converting the content within a modality or between modalities.
13. The signal as claimed in claim 12, wherein the content operators for the video content are formed by registering consecutive images within a sequence, estimating a velocity map for the registered images, and segmenting image regions to identify foreground objects and generate shape templates.
14. The signal as claimed in claim 12, wherein the content structure (208) comprises a layered mosaic (209) of the video content, derived by recursively combining alpha maps of consecutive images through cut-and-paste operations.
15. The signal as claimed in claim 12, wherein the content structure (208) for the video content is employed to convert images within the video content into smooth value regions of one or more of color, brightness, texture, and motion.
16. The signal as claimed in claim 12, wherein the content structure (208) for the video content is employed to convert images within the video content into cartoon-like images.
CNA028240332A 2001-12-04 2002-12-02 Methods for multimedia content repurposing Pending CN1600032A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US10/011,883 US20030105880A1 (en) 2001-12-04 2001-12-04 Distributed processing, storage, and transmision of multimedia information
US10/011,883 2001-12-04
US10/265,582 2002-10-07
US10/265,582 US7305618B2 (en) 2001-12-04 2002-10-07 Methods for multimedia content repurposing

Publications (1)

Publication Number Publication Date
CN1600032A true CN1600032A (en) 2005-03-23

Family

ID=26682897

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA028240332A Pending CN1600032A (en) 2001-12-04 2002-12-02 Methods for multimedia content repurposing

Country Status (5)

Country Link
EP (1) EP1459552A2 (en)
JP (1) JP2005512215A (en)
CN (1) CN1600032A (en)
AU (1) AU2002351088A1 (en)
WO (1) WO2003049450A2 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2048887A1 (en) * 2007-10-12 2009-04-15 Thomson Licensing Encoding method and device for cartoonizing natural video, corresponding video signal comprising cartoonized natural video and decoding method and device therefore

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2205704B (en) * 1987-04-01 1991-06-19 Univ Essex Reduced bandwidth video transmission based on two-component model
US6061462A (en) * 1997-03-07 2000-05-09 Phoenix Licensing, Inc. Digital cartoon and animation process

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11496539B2 (en) 2016-10-12 2022-11-08 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Spatially unequal streaming
US11496541B2 (en) 2016-10-12 2022-11-08 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Spatially unequal streaming
US11283850B2 (en) 2016-10-12 2022-03-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Spatially unequal streaming
CN114928737B (en) * 2016-10-12 2023-10-27 弗劳恩霍夫应用研究促进协会 Spatially unequal streaming
CN114928737A (en) * 2016-10-12 2022-08-19 弗劳恩霍夫应用研究促进协会 Spatially unequal streaming
US11489900B2 (en) 2016-10-12 2022-11-01 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Spatially unequal streaming
US11218530B2 (en) 2016-10-12 2022-01-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Spatially unequal streaming
US11496540B2 (en) 2016-10-12 2022-11-08 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Spatially unequal streaming
US11516273B2 (en) 2016-10-12 2022-11-29 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Spatially unequal streaming
CN110089118A (en) * 2016-10-12 2019-08-02 弗劳恩霍夫应用研究促进协会 The unequal streaming in space
US11496538B2 (en) 2016-10-12 2022-11-08 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E. V. Spatially unequal streaming
US11539778B2 (en) 2016-10-12 2022-12-27 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Spatially unequal streaming
US11546404B2 (en) 2016-10-12 2023-01-03 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Spatially unequal streaming
WO2022156482A1 (en) * 2021-01-25 2022-07-28 中兴通讯股份有限公司 Volumetric media processing method and apparatus, and storage medium and electronic apparatus

Also Published As

Publication number Publication date
WO2003049450A2 (en) 2003-06-12
JP2005512215A (en) 2005-04-28
EP1459552A2 (en) 2004-09-22
WO2003049450A3 (en) 2003-11-06
AU2002351088A1 (en) 2003-06-17

Similar Documents

Publication Publication Date Title
US20210150793A1 (en) Matching mouth shape and movement in digital video to alternative audio
CN113194348B (en) Virtual human lecture video generation method, system, device and storage medium
CN113408471B (en) Non-green-curtain portrait real-time matting algorithm based on multitask deep learning
US7305618B2 (en) Methods for multimedia content repurposing
Yu et al. Three-dimensional model analysis and processing
JP2004177965A (en) System and method for coding data
CN107231566A (en) A kind of video transcoding method, device and system
Ferilli Automatic digital document processing and management: Problems, algorithms and techniques
CN114549574A (en) Interactive video matting system based on mask propagation network
CN113961736A (en) Method and device for generating image by text, computer equipment and storage medium
CN114443899A (en) Video classification method, device, equipment and medium
CN110852980A (en) Interactive image filling method and system, server, device and medium
CN115270184A (en) Video desensitization method, vehicle video desensitization method and vehicle-mounted processing system
CN112966676B (en) Document key information extraction method based on zero sample learning
CN112069877B (en) Face information identification method based on edge information and attention mechanism
CN1600032A (en) Methods for multimedia content repurposing
CN115797171A (en) Method and device for generating composite image, electronic device and storage medium
CN114979705A (en) Automatic editing method based on deep learning, self-attention mechanism and symbolic reasoning
CN117475443B (en) Image segmentation and recombination system based on AIGC
Ji et al. Deep Learning Based Video Compression
Sri Geetha et al. Enhanced video articulation (eva)—a lip-reading tool
JP2004357062A (en) Processing apparatus and processing method for information signal, creating apparatus and creating method for code book, and program for implementing each method
JP2010267034A (en) Information processing apparatus and information processing method
WO2024112910A1 (en) Visual transformers with sparse application of video kernels
Berkner et al. Resolution-sensitive document image analysis for document repurposing

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication