CN103997687B - Method and apparatus for adding interaction features to video - Google Patents

Method and apparatus for adding interaction features to video

Info

Publication number
CN103997687B
CN103997687B (application CN201410055610.1A)
Authority
CN
China
Prior art keywords
frame
objects
video
media
framing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410055610.1A
Other languages
Chinese (zh)
Other versions
CN103997687A (en)
Inventor
D. C. Middleton
O. Nestares
L. B. Ainsworth
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 14/106,136 (US9330718B2)
Application filed by Intel Corp
Publication of CN103997687A
Application granted
Publication of CN103997687B
Legal status: Active (current)
Anticipated expiration

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

Techniques are disclosed for adding interaction features to video, enabling a user to create new media using a dynamic blend of motion and still imagery. The interaction techniques may include allowing a user to change the start time of one or more objects within a given video frame, or to animate/play only a portion of a given frame's scene. The techniques may include segmenting each frame of the video to identify one or more objects within each frame, selecting (or receiving a selection of) one or more objects in a given frame's scene, tracking the selected object(s) from frame to frame, and alpha-matting to play/animate only the selected object(s). In some instances, pixel depth information (e.g., using a depth map) can be used to improve and/or enhance the segmentation, selection, and/or tracking.

Description

Method and apparatus for adding interaction features to video
Technical field
The present invention relates to techniques for adding interaction features to video.
Background
Still images and video each have benefits and limitations in how they describe events over time. They also have defined limitations with respect to interacting with the media. Generally, they are engaging for the creator but lack participation for the viewer. For example, after a video has been created, a user can typically only navigate the frames of the video passively (e.g., play, rewind, fast-forward, pause, and stop), as originally intended by the creator; the user has no opportunity to interact with the video. Similar limitations apply to still images. In this sense, video and still images do not invite user input.
Summary of the Invention
According to a first aspect of the invention, a video processing method includes: segmenting each frame of a video into its semantic components to identify one or more objects within each frame's scene based on corresponding groups of pixels, wherein the video is part of media; receiving a selection of one or more objects in a given frame's scene; tracking the one or more objects from frame to frame of the video to identify the corresponding groups of pixels, wherein the corresponding groups of pixels include the one or more objects in each frame; and alpha-matting the media to isolate the one or more selected objects on a frame-by-frame basis.
According to a second aspect of the invention, a video processing apparatus includes: a module for segmenting each frame of a video into its semantic components to identify one or more objects within each frame's scene based on corresponding groups of pixels, wherein the video is part of media; a module for receiving a selection of one or more objects in a given frame's scene; a module for tracking the one or more objects from frame to frame of the video to identify the corresponding groups of pixels, wherein the corresponding groups of pixels include the one or more objects in each frame; and a module for alpha-matting the media to isolate the one or more selected objects on a frame-by-frame basis.
According to a third aspect of the invention, a computing device includes: a processor; a memory accessible by the processor; and an application stored on the memory and executable by the processor, the application being configured to: segment each frame of a video into its semantic components to identify one or more objects within each frame's scene based on corresponding groups of pixels, wherein the video is part of media; receive a selection of one or more objects in a given frame's scene; track the one or more objects from frame to frame of the video to identify the corresponding groups of pixels, wherein the corresponding groups of pixels include the one or more objects in each frame; and alpha-matte the media to isolate the one or more selected objects on a frame-by-frame basis.
According to a fourth aspect of the invention, a video processing method includes: segmenting each frame of a video into its semantic components to identify one or more objects within each frame's scene based on corresponding groups of pixels, wherein the video is part of media; receiving a selection of one or more objects in a given frame's scene; tracking the one or more objects from frame to frame of the video to identify the corresponding groups of pixels, wherein the corresponding groups of pixels include the one or more objects in each frame; alpha-matting the media to isolate the one or more selected objects on a frame-by-frame basis; and generating new media using the one or more selected objects.
According to a fifth aspect of the invention, a video processing apparatus includes: a module for segmenting each frame of a video into its semantic components to identify one or more objects within each frame's scene based on corresponding groups of pixels, wherein the video is part of media; a module for receiving a selection of one or more objects in a given frame's scene; a module for tracking the one or more objects from frame to frame of the video to identify the corresponding groups of pixels, wherein the corresponding groups of pixels include the one or more objects in each frame; a module for alpha-matting the media to isolate the one or more selected objects on a frame-by-frame basis; and a module for generating new media using the one or more selected objects.
Brief Description of the Drawings
Figs. 1a-c illustrate three methods showing techniques for adding interaction features to video, in accordance with one or more embodiments of the present disclosure.
Figs. 2a-g' show example images, in accordance with some embodiments, illustrating the techniques of Figs. 1a-c.
Figs. 3a-b show screenshots, in accordance with one or more embodiments, illustrating an example user interface for interacting with media that includes interaction features as described herein.
Fig. 4 shows an example system, in accordance with one or more embodiments, that can implement the techniques for adding interaction features to video described herein.
Fig. 5 shows an embodiment of a small form factor device in which the system of Fig. 4 may be embodied.
Detailed Description
Techniques are disclosed for adding interaction features to video, enabling a user to create new media using a dynamic blend of motion and still imagery. The interaction techniques may include allowing a user to change the start time of one or more objects within a given video frame, or to animate/play only a portion of a given frame's scene. The techniques may include segmenting each frame of the video to identify one or more objects within each frame, selecting (or receiving a selection of) one or more objects in a given frame's scene, tracking the selected object(s) from frame to frame, and alpha-matting to play/animate only the selected object(s). In some instances, pixel depth information (e.g., using a depth map) can be used to improve and/or enhance the segmentation, selection, and/or tracking. Numerous variations will be apparent in light of this disclosure.
Overview
As previously explained, still images and video have defined limitations: they are generally engaging for the creator but lack participation for the viewer. Currently, watching a video typically includes only the ability to play, rewind, fast-forward, pause, and stop all of the visual content at once. There are currently no simple and intuitive techniques for interacting with a video so as to play only part of a video scene at a time, or to change the time/position of part of the video, thereby allowing the creation of new visual media in which part of the scene is out of sequence with the remainder of the scene.
Thus, and in accordance with one or more embodiments of the present disclosure, techniques are disclosed for adding interaction features to video. Video as referred to herein includes a sequence of at least two still images/frames, e.g., a film or a group of photos taken in burst mode. The entirety of a single frame is referred to herein as a "scene," and an object or area of interest within a frame's scene (e.g., a person, an animal, various items, the background or a portion of the background, etc.) is referred to herein as an "object." The interaction features produced by the techniques described herein include the ability to create the following new media from a video: 1) new still images having one or more video objects from a moment in time (or frame) different from the remainder of the scene; 2) new video artifacts having one or more objects that start out of sequence; and 3) new visual media artifacts in which one or more objects play but the remainder of the frame's scene remains still (similar to a cinemagraph). Thus, in one or more embodiments, the interaction features include generating a dynamic blend of motion and still imagery within the playing scene. The new media can be saved and/or shared in a dynamic form (e.g., where further interaction is possible) or a static form (e.g., where further interaction is not possible), as discussed in more detail below.
In some embodiments, the techniques described herein for adding interaction features may include at least the following: segmentation, selection, tracking, and alpha-matting. As will be appreciated in light of this disclosure, the order of these functions may vary. Segmentation may include dividing each frame of the video into its semantic components, e.g., using an unsupervised graph cut method or another suitable method, to identify one or more objects within each frame's scene based on corresponding groups of pixels. In some instances, segmentation can be performed fully automatically; in other instances, segmentation may be semi-automatic or performed manually. Selection may include clicking (e.g., in the case of a mouse input) or touching/tapping (e.g., in the case of a touch-sensitive input) one or more objects in a presented frame of the video. In some embodiments, pixel depth information (e.g., a depth map) for each frame of the video can be used to improve segmentation, selection, and/or tracking. In some such embodiments, a stereo or array camera may be used to produce the depth information, as discussed in more detail below. Note that in some embodiments, selection may occur before segmentation is performed, which can help improve and/or refine the segmentation process.
Tracking may include tracking the selected object(s) from frame to frame of the video to identify the respective groups of pixels that include the selected object(s) in each frame. Alpha-matting may be performed using a variety of methods. One such example method includes forming a transparent matte matching the shape of the one or more selected objects in a given frame's scene, to allow the video to be played through the one or more holes created by the transparent matte, wherein the shape of the one or more holes in the given scene is updated for each frame of the video to match the shape of the one or more selected objects in the frame being played. Another example method includes forming a transparent matte around the one or more selected objects in each frame, to allow the video to be played by copying the one or more selected objects in the frame being played onto the given frame's scene. Other suitable alpha-matting methods will be apparent in light of this disclosure.
As previously described, the interaction features added to a video using the techniques described herein can be used to create new visual media artifacts in which one or more objects play but the remainder of the frame's scene remains still. In animating only a portion of a given frame's scene while keeping the rest of the given frame's scene constant and still, this example new media type is similar to a cinemagraph. However, adding interaction features to a video using the techniques described herein provides multiple benefits over conventional cinemagraph creation methods. First, the interaction features described herein allow dynamic changes to the scene, whereas a cinemagraph is a non-interactive, constant video loop. Second, interaction features as described herein can be added using fully or semi-automated techniques, whereas cinemagraph creation is a predominantly manual process. Third, cinemagraphs use imprecise boundaries, resulting in undesired visual artifacts, which can be avoided or eliminated using the segmentation, tracking, and alpha-matting techniques described herein. Other benefits over conventional cinemagraph creation methods will be apparent in light of this disclosure.
According to some embodiments, use of the disclosed techniques may be detected, for example, by visual inspection/evaluation of media that includes interaction features as described herein (e.g., the ability to play only a portion of a video). Use of the presently disclosed techniques may also be detected based on the resulting visual media produced. For example, the techniques variously described herein for adding interaction features to video can be used to produce images in which only a portion of the scene is animated, or videos in which objects start out of sequence. Numerous variations and configurations will be apparent in light of this disclosure.
Methodology and Example Applications
Figs. 1a-c illustrate methods 100a-c, respectively, showing techniques for adding interaction features to video in accordance with one or more embodiments of the present disclosure. Figs. 2a-g' show example images, in accordance with some embodiments, that illustrate the techniques of Figs. 1a-c. As previously described, the techniques are primarily discussed herein in the context of adding interaction features to a video having multiple frames; however, the techniques need not be so limited. For example, the techniques shown in methods 100a-c can be used to add interaction features to a group of still images or to other visual media that includes a sequence of at least two still images/frames, as will be apparent in light of this disclosure. Methods 100a-c each include segmentation 102, selection 104, tracking 106, and alpha-matting 108, each of which is discussed in more detail below.
Fig. 2 a show image/frame 210 of the people 214 before waterfall 216 according to the station of exemplary embodiment.As can be seen , in this example frame, people 214 makes waving motion.Prospect 212 and sky 218 are also show in frame 210.Fig. 2 b show The example images gone out after the segmentation 102 to recognize people 214 is performed.Segmentation 102 can include splitting the frame of video For its semantic component, to recognize one or more objects of each frame in.Any of dividing method can be used to perform Segmentation 102, such as figure split plot design, clustering procedure, threshold value setting, the inspection of the method based on compression, the method based on block diagram, edge Survey, region-growing method, segmentation-fusion method, the method based on partial differential equation (PDE), watershed (watershed) method, or It is any other obvious suitable method according to present disclosure.In one exemplary embodiment, using without supervision Figure cutting method performs segmentation 102.Depending on structure or method used, segmentation 102 can be it is full automatic, automanual or Manual.
In some embodiments, one or more objects may be segmented based on their corresponding groups of pixels. For example, Fig. 2b shows the pixels representing the person 214 (in frame 210) carrying a backpack and waving, outlined by shape 215, and the pixels representing the sky 218, outlined by shape 219. Note that according to an example segmentation 102 process, such as an unsupervised (automatic) graph cut method, objects including only the person 214 or the sky 218 are identified. Other objects in frame 210 may include the waterfall area 216, the foreground 212, or any other suitable object or area of interest. As previously described, depending on the segmentation 102 process used, the one or more objects may be identified automatically, semi-automatically, or manually.
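The patent does not disclose its segmentation implementation; as a minimal, hedged sketch of the idea of grouping a frame into pixel-group objects, the toy below labels 4-connected regions of equal pixel value with a flood fill. A real system would use an unsupervised graph cut or one of the other methods named above; all names and the miniature "frame" here are illustrative.

```python
import numpy as np
from collections import deque

def segment_frame(frame):
    """Label 4-connected regions of equal pixel value (a toy stand-in for
    the semantic segmentation 102 described above)."""
    h, w = frame.shape
    labels = np.full((h, w), -1, dtype=int)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue
            value = frame[sy, sx]
            q = deque([(sy, sx)])
            labels[sy, sx] = next_label
            while q:  # breadth-first flood fill of one region
                y, x = q.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny, nx] == -1
                            and frame[ny, nx] == value):
                        labels[ny, nx] = next_label
                        q.append((ny, nx))
            next_label += 1
    return labels, next_label

# A tiny hypothetical "frame 210": 0 = sky, 1 = person, 2 = foreground
frame = np.array([[0, 0, 0, 0, 0, 0],
                  [0, 0, 1, 1, 0, 0],
                  [2, 2, 1, 1, 2, 2],
                  [2, 2, 2, 2, 2, 2]])
labels, n = segment_frame(frame)
print(n)  # 3 regions: sky, person, foreground
```

The label map plays the role of shapes 215/219: each label is one candidate object whose pixel group can later be selected and tracked.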
In some embodiments, depth information for the frames of the video can be used to improve or enhance segmentation 102. For example, a depth map of the frames can be used to provide or produce depth data. In some instances, each pixel may include RGB-D data, where RGB (the red, green, blue color model) relates to the color of each pixel and D relates to the depth information of each pixel. In some embodiments, the depth information may be collected by the particular device capturing the video for the techniques described herein. Such devices may include various stereo cameras, array cameras, light-field cameras, or other depth sensors or depth-sensing technologies. In a particular example, under low-light conditions, an infrared projector and a monochrome complementary metal-oxide-semiconductor (CMOS) sensor (for example, as used in …) can be used to capture three-dimensional video data. In some embodiments, depth information can be estimated for pre-existing video. For example, in some instances, motion information of the pre-existing video can be used to estimate depth information. In some cases, spatial and temporal information from adjacent frames of a monoscopic video can be used to estimate depth information. Depending on the configuration and methods used, automatic, semi-automatic, or manual techniques may be used to estimate or produce the depth maps.
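One simple way the D channel of RGB-D data could enhance segmentation or selection, sketched below under the assumption of a dense per-pixel depth map: grow a selection from a seed pixel to all pixels at a similar depth. The function name, tolerance value, and tiny depth map are all illustrative, not from the patent.

```python
import numpy as np

def depth_select(depth, seed, tol=0.5):
    """Select every pixel whose depth is within `tol` of the seed pixel's
    depth -- a simple depth-based enhancement of selection/segmentation."""
    return np.abs(depth - depth[seed]) <= tol

# Hypothetical depth map: ~5.0 = distant waterfall/sky, ~1.0 = near person
depth = np.array([[5.0, 5.0, 1.0],
                  [5.0, 1.2, 1.1],
                  [5.0, 5.0, 1.0]])
mask = depth_select(depth, (1, 1))  # seed tap lands on the near object
```

Because foreground and background usually occupy distinct depth ranges, a mask like this can seed or constrain a graph cut, or disambiguate a user tap near an object boundary.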
Fig. 2 c show the example Sexual behavior mode 104 of object in the frame of video according to an embodiment.More specifically, showing hand People 214 in 250 selection frames 210.Selection 104 can include one or more objects of the given frame in of selection.In this way, example Such as, method 100a-c can be configured as receiving selection 104 from user.Various input equipments can be used to perform selection 104, Expected object such as is clicked on using mouse or tracking plate, expected object is touched (such as with touch-control using touch-sensitive device Rapping for suitably placing is used in the equipment of screen), or by any other suitable method, the posture such as made by people, Or sound or spoken word from people.Fig. 2 d show the example of the frame 210 after it have selected people 214.As can be seen, make For the result of Fig. 2 c selection, highlighted people 214.Note, in this embodiment, the shape 215 of people is via splitting 102 mistakes Journey is identified.However, in other embodiments, can be without segmentation 102, after having been carried out selection 104, such as originally What text as will be discussed in greater detail.In certain embodiments, pixel depth information can be used for automatically selecting object (for example, automatic choosing Select to the foreground and background in framing scene), or enhancing selection is (for example, the pixel groups of same or similar depth are shared in enhancing User selection).
Fig. 2 e-f show the example that 106 are followed the trail of between the first frame 210 and the second frame 220 according to embodiment.Follow the trail of 106 can follow the trail of selected objects including frame by frame from the frame of video.In the exemplified embodiment, the first frame 210 and Two frames 220 are the series of frames from video.Fig. 2 e show the first frame 210, include the profile 215 of people 214 and its segmentation, such as Described in preceding.Fig. 2 f show the second frame 220, with the numbering corresponding to those in the first frame 210 (for example, for the first frame In sky 218 and 228 etc. for the sky in the second frame).The second frame 220 in the exemplified embodiment includes With the identical people 224 of the first frame 210, in addition to his hand position is moved, because as can be seen, people 214,224 is waving. Segmentation contour 225 shows the result of new pixel groups, and expression its left hand in brandishing changes the people 224 of position.Follow the trail of 106 It can include being tracked from frame to frame selected objects, to recognize the correct pixel groups in each frame.For example, performing segmentation 102 Afterwards, can from first the 210 to the second frame of frame follow the trail of as pixel groups 215 and 225 know others 214,224.In some embodiments In, pixel depth information can be used for enhancing and follow the trail of (for example, being increased the effect of frame by frame identification object using this depth information Power).
As can be seen in Figs. 1a-c, segmentation 102 can be performed before or after selection 104. In example method 100a, segmentation 102 is performed, followed by selection 104 of one or more objects, followed by tracking 106 of the selected objects. In this embodiment, performing segmentation 102 before selection 104 can reduce latency between selection 104 and media playback. In another example method 100b, selection 104 of one or more objects is performed, followed by segmentation 102, followed by tracking 106. In this embodiment, the selection 104 information (e.g., selection coordinates) can be added to the segmentation 102 process as a refinement (e.g., adding the selection coordinates to an unsupervised graph cut algorithm). In yet another example method 100c, segmentation 102 is performed, followed by tracking 106, followed by selection 104 of one or more objects. In this embodiment, performing segmentation 102 and tracking 106 before selection 104 can reduce latency between selection 104 and media playback. In example methods 100a-c, alpha-matting 108 is performed after segmentation 102, selection 104, and tracking 106 are complete; however, this need not be the case, as will be apparent in light of this disclosure. In other example embodiments, a method may include multiple segmentation 102 and selection 104 passes before performing tracking 106 and alpha-matting 108. For example, in such an embodiment, a method may include an automatic segmentation pass, a user selection, and then re-segmentation based on the selection input. Such an example sequence may be repeated until the user achieves the desired degree of fidelity.
Fig. 2 g-g ' show the example of Alpha-masking-out processing 108 of the frame 210 according to some embodiments.Alpha- Masking-out processing 108 can include frame by frame and isolate selected objects, only to make selected objects animation.For example, at Alpha-masking-out Reason 108 can include:1) form the transparent masking-out with the form fit to the selected objects in framing scene, with allow by by Video is played in the hole that transparent masking-out is created, here, the shape of given scenario mesopore is updated for each frame of video, to match just The shape of selected objects in the frame of broadcasting;Or 2) surround each frame in selected objects form transparent masking-out, to allow to pass through Selected objects in the frame just played are copied to playing video on framing scene.In other words, exemplary Alpha- Masking-out processing 108 during, video it is initial/to framing in cut one or more holes, its represent selected objects shape, (tool is porose) initial frame is stacked on each subsequent frame of video, to play video by hole, here, connecing a frame in a frame On the basis of update hole in initial frame, to match the shape of selected objects in the current frame just played.It is exemplary at another It is initial/to framing be starting point again during Alpha-masking-out processing 108, except in this example process, from every One subsequent frame isolates one or more selected objects (for example, by cutting off, removing the remaining scene of each subsequent frame or make Its is transparent), then on the basis of a frame connects a frame, by the selected objects from the frame currently just played copy to initial frame it On, to play the video.
Fig. 2 g show that Alpha-masking-out is handled into 108 methods is used for primitive frame 210 (for example, making a choice at 104 Frame) produced by image example.As can be seen, cut off with people 214 that (he is pair that is uniquely pre-selected from primitive frame 210 As) the hole 217 that matches of shape.Then video can be played by hole 217, in each subsequent frame, reset primitive frame 210, and create in primitive frame new hole, the selected objects (being people 214 in the case) of itself and present frame match.With working as The original image in the new hole of the excision that previous frame matches can then be covered on present frame to play the frame.Can be each Individual successive frames proceed hole excision-overwrite procedure (because following the trail of 106 this information) to play video.
Fig. 2 g ' alternatively show handles 108 methods for produced by primitive frame 210 by another Alpha-masking-out Example images.In this interchangeable Alpha-masking-out handles 108 methods, cut off from subsequent frame 220 around people 224 scene 230 (profile for depicting frame 220 for illustrative purposes).It can then be surrounded by that will cut off Remaining copying image is to playing video on primitive frame 210 after the scene 230 of the shape of people 224.In this way, only by institute Object (for example, being people 224 in the case) is selected to copy on primitive frame, to make selected objects animation when playing video Change.The scene that can proceed for each successive frames around object cut off-copy on primitive frame (because follow the trail of 106 this Kind of information) to play video.Note, although only used in the example that these Alpha-masking-out handle 108 processes one it is right As (people 214,224), but multiple objects can be used.For example, if selection sky 218 is as extra animation object, Sky 218 will be cut off in Fig. 2 g, sky 218 can be also shown in Fig. 2 g '.It is also noted that in certain embodiments, excision is selected One or more objects (or around scene of selected one or more objects) may be constructed selected one or more objects (or around scene of selected one or more objects) is set as transparent.
Example Media Creation
According to one or more embodiments, adding interaction features to a video (using the techniques described herein) can be used to create multiple types of media. The media may include: 1) new still images having one or more objects from a moment of the video (or from a frame) different from the remainder of the scene; 2) new video artifacts having one or more objects that start out of sequence; and 3) new visual media artifacts in which one or more objects play but the remainder of the frame's scene remains still (similar to a cinemagraph). These three examples are provided for illustrative purposes and are not intended to limit the present disclosure; they are described in more detail below.
The first example new media obtainable using the interaction features added to a video with the techniques described herein includes creating new still images having one or more objects of the video from a moment in time (or from a frame) different from the remainder of the scene. This can be achieved by selecting one or more objects in a given frame to animate or play those objects while keeping the remaining scene of the given frame constant. In some embodiments, the interaction features may allow one or more objects in a given frame to be animated/played and then stopped at a different frame. In some such embodiments, the interaction features may then allow the user to animate/play and then stop a different object or objects, such that at least two objects are at frame positions different from the rest of the given frame's scene. Thus, in such embodiments, three different video times/frame positions can be present in a single still image.
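The multi-time still described above can be sketched as pasting each selected object, frozen at its own frame, onto the base still. This is an illustrative reduction of the idea, not the patent's implementation; function names and arrays are assumptions.

```python
import numpy as np

def multi_time_still(base, object_layers):
    """Freeze each selected object at its own moment: paste each
    (frame, mask) pair onto the base still, yielding one image that
    holds several different video times (example media type 1)."""
    out = base.copy()
    for frame, mask in object_layers:
        out[mask] = frame[mask]
    return out

base = np.zeros((2, 4), dtype=int)        # base scene at the original time
frame_a = np.full((2, 4), 5, dtype=int)   # object 1 frozen at frame a
frame_b = np.full((2, 4), 7, dtype=int)   # object 2 frozen at frame b
mask_a = np.array([[True, False, False, False],
                   [True, False, False, False]])
mask_b = np.array([[False, False, False, True],
                   [False, False, False, True]])
still = multi_time_still(base, [(frame_a, mask_a), (frame_b, mask_b)])
```

Here the resulting image carries three video times at once: the base scene, object 1 at frame a, and object 2 at frame b, matching the "three different video times in a single still" case above.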
The second example new media obtainable using the interaction features added to a video with the techniques described herein includes creating new video artifacts having one or more objects that start out of sequence. This can be achieved by selecting one or more objects in a given frame to animate or play, and then causing the remainder of the scene to play. In some embodiments, the interaction features may allow one or more objects in a given frame to be animated or played and then stopped at a different frame. In some such embodiments, the interaction features may then allow the user to animate/play and then stop a different object or objects, so that at least two objects are at frame positions different from the rest of the given frame's scene. Thus, in such embodiments, the user can then play the entire media, wherein two or more objects are out of sequence with each other, and the two or more objects are out of sequence with the remainder of the frame.
The third example new media obtainable using the interaction features added to a video with the techniques described herein includes new visual media artifacts in which one or more objects play but the remainder of the frame's scene remains still. This can be achieved by selecting one or more objects in a given frame to animate or play while keeping the remainder of the given frame's scene constant. In some embodiments, the interaction features may allow one or more objects in a given frame to be animated/played and then stopped in sequence. In some such embodiments, the interaction features then allow the user to animate/play and then stop a different object or objects, also in sequence. Thus, in such embodiments, the user can then play media in which two or more objects are out of sequence with each other and with the remainder of the frame, while the remainder of the original frame stays constant and still.
In that only a portion of the given frame's scene is animated while the remainder of the given frame's scene is kept constant and still, the third example of new media is similar to a cinemagraph. However, the interactive features added to video using the techniques described herein provide multiple benefits over conventional cinemagraph creation methods. First, the interactive features described herein allow the scene to be changed dynamically, whereas a cinemagraph is a non-interactive, fixed video loop. Second, the interactive features described herein can be added using fully or semi-automated techniques, whereas cinemagraph creation is primarily a manual process. Third, cinemagraphs use imprecise boundaries, resulting in unintended visual artifacts, which the segmentation, tracking, and alpha-matting processing techniques described herein can avoid or eliminate. Other benefits over conventional cinemagraph creation methods will be apparent in light of this disclosure.
Figs. 3a-b show screenshots illustrating an example user interface for interacting with media that includes interactive features as described herein, in accordance with one or more embodiments. As can be seen in Fig. 3a, a first screenshot 310 presents the user with a frame of video similar to frame 210 described herein. For example, the three previously described objects are shown: person 314, waterfall 316, and sky 318. Person 314 is shown with a dot-dashed outline, waterfall 316 is shown with a dashed outline, and sky 318 is shown with a dotted outline. In the illustrated embodiment, the three objects 314, 316, and 318 have been segmented, selected, tracked, and alpha matted, allowing the user to select one or more of them to play/animate the selected object(s) relative to the remainder of the frame shown in first screenshot 310. Instructions 311 are included in this example UI to inform the user to "Select the object(s) you want to play/animate", "Press and hold to select multiple objects", and "Select an animated object again to stop it". This example UI and its corresponding instructions are provided for illustrative purposes and are not intended to limit the present disclosure.
Fig. 3b shows the new media created after selecting person 314 and sky 318 to play/animate those two objects relative to the scene around them. As can be seen in a second screenshot 320, animating person 314 causes his hand to wave to the new position shown, and animating the sky causes cloud 328 to appear. The entire scene shown in second screenshot 320 has been stopped for ease of discussion; this may be performed by individually selecting objects 314 and 318 after they have been animated, or by some other suitable command (e.g., using a stop-all button or tapping the space bar). However, objects may be animated and stopped by the user one at a time, allowing objects to move and/or stop out of order relative to other objects and to the remainder of the scene. A continue button 321 is provided to allow the user to continue playing/animating the previously selected objects. In some examples, the interactive features may be configured to allow the user to select one or more objects to stop, play, and/or reset before continuing the animation. Various features may be included and used to inform the user which objects are available for selection, which currently selected objects are playing/animating, which currently selected objects are stopped/not animating, which objects are out of order with the frame (e.g., an indicator showing from which frame an object is currently playing), or other information that can assist the user in using the interactive features variously described herein.
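The select-to-play, select-again-to-stop behavior of the example UI can be sketched as a simple toggle per object. This is a hypothetical illustration only (the class `InteractiveScene` and method `tap` are invented names, not from the patent), showing the state the UI would need to track in order to report which objects are currently playing:

```python
# Hypothetical sketch of the example UI's toggling behavior:
# selecting an object starts playing it; selecting it again stops it.
class InteractiveScene:
    def __init__(self, object_ids):
        self.playing = {obj: False for obj in object_ids}

    def tap(self, obj_id):
        """Toggle an object between playing and stopped; return the new state."""
        self.playing[obj_id] = not self.playing[obj_id]
        return self.playing[obj_id]

    def playing_objects(self):
        return sorted(obj for obj, on in self.playing.items() if on)

scene = InteractiveScene(["person", "waterfall", "sky"])
scene.tap("person")
scene.tap("sky")
print(scene.playing_objects())  # -> ['person', 'sky']
scene.tap("person")             # selecting an animated object again stops it
print(scene.playing_objects())  # -> ['sky']
```

A fuller version of this state could also record each stopped object's current frame, which is what the indicator features described above would surface to the user.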
In some embodiments, the new media created using the interactive features variously described herein for addition to video can be saved and/or shared (e.g., exported, emailed, uploaded) in a dynamic or static format. Dynamic sharing may include sharing the particular media type, whether a still image, a created video artifact, or a cinemagraph-like artifact, in a manner that allows the recipient or later viewer of the media to further interact with the media (e.g., by changing the starting sequence of one or more objects). Static sharing may include sharing the media as created. For example, a still image representing different naturally occurring moments in a video may be shared as a Joint Photographic Experts Group (JPEG) file or a Portable Network Graphics (PNG) file, to name two common formats. In the example case of creating a video in which portions are out of order, the new media may be shared as a Moving Picture Experts Group (MPEG) file or an Audio Video Interleave (AVI) file, to name two common formats. In the example case of creating a new visual media artifact in which only a portion of the frame is animated/played, the new media may be shared as a Graphics Interchange Format (GIF) file, to name one common format. In the example shown in Fig. 3b, the new media may be saved as a dynamic or static file, or it may be shared (e.g., exported, emailed, uploaded) as a dynamic or static file by selecting the corresponding button 323, 325.
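The static-sharing cases named above map each media type to one or more container formats. The following is a hypothetical sketch of that mapping (the dictionary `FORMATS`, the type keys, and `export_filename` are invented for illustration; the patent does not prescribe an API), using only the formats the text itself lists:

```python
# Hypothetical mapping of the static-sharing cases to the common container
# formats named in the text (JPEG/PNG stills, MPEG/AVI video, GIF partial animation).
FORMATS = {
    "still_image": ["jpeg", "png"],
    "reordered_video": ["mpeg", "avi"],
    "partial_animation": ["gif"],
}

def export_filename(basename, media_type, prefer=None):
    """Pick a container for the media type, honoring a preferred extension if valid."""
    options = FORMATS[media_type]
    ext = prefer if prefer in options else options[0]
    return f"{basename}.{ext}"

print(export_filename("waterfall_scene", "still_image"))         # -> waterfall_scene.jpeg
print(export_filename("waterfall_scene", "partial_animation"))   # -> waterfall_scene.gif
print(export_filename("waterfall_scene", "still_image", "png"))  # -> waterfall_scene.png
```

Dynamic sharing, by contrast, would require a richer payload than a container file alone, since the recipient must be able to re-interact with the per-object timelines.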
Example system
Fig. 4 shows an example system 400 that can carry out the techniques for adding interactive features to video as described herein, in accordance with one or more embodiments. In some embodiments, system 400 may be a media system, although system 400 is not limited to this context. For example, system 400 may be incorporated into a personal computer (PC), laptop computer, ultrabook computer, tablet computer, touchpad, portable computer, handheld computer, palmtop computer, personal digital assistant (PDA), cellular telephone, combination cellular telephone/PDA, television, smart device (e.g., smartphone, smart tablet, or smart television), mobile internet device (MID), messaging device, data communication device, set-top box, game console, or other such computing environment capable of performing graphics rendering operations.
In some embodiments, system 400 includes a platform 402 coupled to a display 420. Platform 402 may receive content from a content device, such as content services device(s) 430 or content delivery device(s) 440 or other similar content sources. A navigation controller 450 including one or more navigation features may be used to interact with, for example, platform 402 and/or display 420. Each of these example components is described in more detail below.
In some embodiments, platform 402 may include a chipset 405, processor 410, memory 412, storage 414, graphics subsystem 415, applications 416, and/or radio 418. Chipset 405 may provide intercommunication among processor 410, memory 412, storage 414, graphics subsystem 415, applications 416, and/or radio 418. For example, chipset 405 may include a storage adapter (not shown) capable of providing intercommunication with storage 414.
Processor 410 may be implemented, for example, as a Complex Instruction Set Computer (CISC) or Reduced Instruction Set Computer (RISC) processor, an x86 instruction set compatible processor, a multi-core processor, or any other microprocessor or central processing unit (CPU). In some embodiments, processor 410 may comprise dual-core processor(s), dual-core mobile processor(s), and so forth. Memory 412 may be implemented, for example, as a volatile memory device such as, but not limited to, a Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), or Static RAM (SRAM). Storage 414 may be implemented, for example, as a non-volatile storage device such as, but not limited to, a magnetic disk drive, optical disk drive, tape drive, an internal storage device, an attached storage device, flash memory, battery backed-up SDRAM (synchronous DRAM), and/or a network accessible storage device. In some embodiments, storage 414 may comprise technology to increase the storage performance or enhanced protection for valuable digital media when multiple hard drives are included, for example.
Graphics subsystem 415 may perform processing of images such as still images or video for display. Graphics subsystem 415 may be, for example, a graphics processing unit (GPU) or a visual processing unit (VPU). An analog or digital interface may be used to communicatively couple graphics subsystem 415 and display 420. For example, the interface may be any of a High-Definition Multimedia Interface (HDMI), DisplayPort, wireless HDMI, and/or wireless HD compliant techniques. Graphics subsystem 415 could be integrated into processor 410 or chipset 405, or could be a stand-alone card communicatively coupled to chipset 405. The techniques for adding interactive features to video as variously described herein may be implemented in various hardware architectures. For example, segmentation 102, selection 104, tracking 106, and alpha matting 108 may all be performed by or received at a single module (e.g., a CPU), while in other examples such processing may be performed by separate modules (e.g., segmentation 102 performed in a cloud-based manner, selection 104 received from a touch screen input, and tracking 106 and alpha matting 108 performed locally on the user's computer, or some other variation apparent in light of this disclosure). In some embodiments, the techniques for adding interactive features to video may be implemented by a discrete processor dedicated for that purpose, or by one or more general purpose processors (including multi-core processors) that can access and execute software embodying the techniques. In addition, in some embodiments, segmentation 102, selection 104, tracking 106, and alpha matting 108 may be stored in one or more modules, such as memory 412, storage 414, and/or applications 416. In one such example case, the techniques are coded into an image processing application included in applications 416, where the application can be executed on processor 410. Note that the image processing application may be loaded directly and locally onto the user's computing system 400. Alternatively, the image processing application may be provided to the user's computing system 400 from a remote server via a network 460 (e.g., a local area network and the Internet), where the remote server is configured as a host that contains or otherwise serves the image processing application provided herein. In some such embodiments, some portions of the image processing application may execute on the server, and other portions may be provided to the user's computing system 400 as browser-executable modules executed via processor 410, as will be apparent in light of this disclosure.
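The split of the four processing stages across backends described above (e.g., cloud segmentation, touch-screen selection, local tracking and alpha matting) can be sketched as a pipeline with pluggable stage executors. This is a hypothetical illustration (the function `run_pipeline`, the `backends` mapping, and the stand-in lambdas are invented for this example, not from the patent):

```python
# Hypothetical sketch: the four processing stages run in a fixed order, but
# each stage's executor can live anywhere (cloud, touch screen, local CPU).
def run_pipeline(frames, backends):
    """backends maps stage name -> callable taking the accumulated state."""
    state = {"frames": frames}
    for stage in ("segmentation", "selection", "tracking", "alpha_matting"):
        state[stage] = backends[stage](state)
    return state

# Stand-in backends: each returns a tag naming where the stage "ran".
backends = {
    "segmentation":  lambda s: ("cloud", len(s["frames"])),
    "selection":     lambda s: ("touchscreen", ["person"]),
    "tracking":      lambda s: ("local", len(s["frames"])),
    "alpha_matting": lambda s: ("local", s["selection"][1]),
}
result = run_pipeline(frames=[0, 1, 2], backends=backends)
print(result["alpha_matting"])  # -> ('local', ['person'])
```

The same fixed stage order supports either arrangement described in the text: a single-module implementation simply supplies four local callables, while a client-server split supplies remote ones for the stages hosted on the server.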
Radio 418 may include one or more radios capable of transmitting and receiving signals using various suitable wireless communications techniques. Such techniques may involve communications across one or more wireless networks (e.g., included in network 460). Example wireless networks include, but are not limited to, wireless local area networks (WLANs), wireless personal area networks (WPANs), wireless metropolitan area networks (WMANs), cellular networks, and satellite networks. In communicating across such networks, radio 418 may operate in accordance with one or more applicable standards in any version.
In some embodiments, display 420 may include any television- or computer-type monitor or display. Display 420 may include, for example, a liquid crystal display (LCD) screen, electrophoretic display (EPD) or liquid paper display, flat panel display, touch screen display, television-like device, and/or a television. Display 420 may be digital and/or analog. In some embodiments, display 420 may be a holographic or three-dimensional display. Also, display 420 may be a transparent surface that may receive a visual projection. Such projections may convey various forms of information, images, and/or objects. For example, such projections may be a visual overlay for a mobile augmented reality (MAR) application. Under the control of one or more software applications 416, platform 402 may display a user interface 422 on display 420.
In some embodiments, content services device(s) 430 may be hosted by any national, international, and/or independent service (e.g., one or more remote servers configured to provide content such as video, still images, and/or an image processing application having the functionality provided herein) and thus may be accessible to platform 402 via the Internet and/or other network 460, for example. Content services device(s) 430 may be coupled to platform 402 and/or to display 420. Platform 402 and/or content services device(s) 430 may be coupled to network 460 to communicate (e.g., send and/or receive) media information to and from network 460. Content delivery device(s) 440 also may be coupled to platform 402 and/or to display 420. In some embodiments, content services device(s) 430 may include a cable television box, personal computer, network, telephone, an Internet-enabled device or appliance capable of delivering digital information and/or content, and any other similar device capable of unidirectionally or bidirectionally communicating content between content providers and platform 402 and/or display 420, via network 460 or directly. It will be appreciated that the content may be communicated unidirectionally and/or bidirectionally to and from any one of the components in system 400 and a content provider via network 460. Examples of content may include any media information including, for example, video, music, graphics, text, medical and gaming content, and so forth.
Content services device(s) 430 receive content such as cable television programming including media information, digital information, and/or other online content (e.g., videos, still image sequences, etc.). Examples of content providers may include any cable or satellite television or radio or Internet content providers. In one such example embodiment, the user's computing system 400 may acquire an image processing application or service configured as provided herein from an Internet content provider accessible via network 460. As previously explained, such a service may provide execution of the image processing application on the server side based on input received from the so-called client side (the user's computing system 400) (e.g., a selection 104 or any other input that engages the service). Alternatively, the service may provide executable code to the client-side computing system 400 that comprises the entire image processing application. For example, the service may provide one or more web pages to a browser application, the pages having a suitable user interface and code encoded therein, where the browser application runs on computing system 400 and is configured to effectively execute that code in conjunction with processor 410. The browser may be included in applications 416, for example. In other embodiments, some portions of the image processing application may execute on the server side and other portions may execute on the client side. Numerous such client-server architectures will be apparent. The examples provided are not intended to limit the present disclosure. In some embodiments, platform 402 may receive control signals from a navigation controller 450 having one or more navigation features. The navigation features of controller 450 may be used to interact with user interface 422, for example. In some embodiments, navigation controller 450 may be a pointing device, which may be a computer hardware component (specifically, a human interface device) that allows the user to input spatial (e.g., continuous and multi-dimensional) data into a computer. Many systems such as graphical user interfaces (GUIs), televisions, and monitors allow the user to control, and provide data to, the computer or television using physical gestures or sound or voice commands.
Movements of the navigation features of controller 450 may be reflected on a display (e.g., display 420) by movements of a pointer, cursor, focus ring, or other visual indicators displayed on the display. For example, under the control of software applications 416, the navigation features located on navigation controller 450 may be mapped to virtual navigation features displayed on user interface 422, for example. In some embodiments, controller 450 may not be a separate component but may instead be integrated into platform 402 and/or display 420. However, it will be appreciated that the embodiments are not limited to the elements or the context shown or described herein.
In some embodiments, drivers (not shown) may include technology to enable users to instantly turn platform 402 on and off, like a television, with the touch of a button after initial boot-up, for example. Program logic may allow platform 402 to stream content to media adaptors or other content services device(s) 430 or content delivery device(s) 440 even when the platform is turned "off". In addition, chipset 405 may include hardware and/or software support for 5.1 surround sound audio and/or high definition 7.1 surround sound audio, for example. Drivers may include a graphics driver for integrated graphics platforms. In some embodiments, the graphics driver may comprise a peripheral component interconnect (PCI) express graphics card.
In various embodiments, any one or more of the components shown in system 400 may be integrated. For example, platform 402 and content services device(s) 430 may be integrated, or platform 402 and content delivery device(s) 440 may be integrated, or platform 402, content services device(s) 430, and content delivery device(s) 440 may be integrated. In various embodiments, platform 402 and display 420 may be an integrated unit. Display 420 and content services device(s) 430 may be integrated, or display 420 and content delivery device(s) 440 may be integrated, for example. These examples are not meant to limit the present disclosure.
In various embodiments, system 400 may be implemented as a wireless system, a wired system, or a combination of both. When implemented as a wireless system, system 400 may include components and interfaces suitable for communicating over a wireless shared medium, such as one or more antennas 404, transmitters, receivers, transceivers, amplifiers, filters, control logic, and so forth. An example of a wireless shared medium may include portions of a wireless spectrum, such as the RF spectrum and so forth. When implemented as a wired system, system 400 may include components and interfaces suitable for communicating over wired communications media, such as input/output (I/O) adapters, physical connectors to connect the I/O adapter with a corresponding wired communications medium, a network interface card (NIC), disc controller, video controller, audio controller, and so forth. Examples of wired communications media may include a wire, cable, metal leads, printed circuit board (PCB), backplane, switch fabric, semiconductor material, twisted-pair wire, co-axial cable, fiber optics, and so forth.
Platform 402 may establish one or more logical or physical channels to communicate information. The information may include media information and control information. Media information may refer to any data representing content meant for a user. Examples of content may include, for example, data from a voice conversation, videoconference, streaming video, electronic mail or text messages, voice mail messages, alphanumeric symbols, graphics, images, video, text, and so forth. Control information may refer to any data representing commands, instructions, or control words meant for an automated system. For example, control information may be used to route media information through a system, or to instruct a node to process the media information in a predetermined manner (e.g., using the interactive features for video as described herein). The embodiments, however, are not limited to the elements or context shown or described in Fig. 4.
As described above, system 400 may be embodied in varying physical styles or form factors. Fig. 5 illustrates embodiments of a small form factor device 500 in which system 400 may be embodied. In some embodiments, for example, device 500 may be implemented as a mobile computing device having wireless capabilities. A mobile computing device may refer to any device having a processing system and a mobile power source or supply, such as one or more batteries, for example.
As previously described, examples of a mobile computing device may include a personal computer (PC), laptop computer, ultrabook computer, tablet computer, touchpad, portable computer, handheld computer, palmtop computer, personal digital assistant (PDA), cellular telephone, combination cellular telephone/PDA, television, smart device (e.g., smartphone, smart tablet, or smart television), mobile internet device (MID), messaging device, data communication device, and so forth.
Examples of a mobile computing device also may include computers that are arranged to be worn by a person, such as a wrist computer, finger computer, ring computer, eyeglass computer, belt-clip computer, arm-band computer, shoe computer, clothing computer, and other wearable computers. In some embodiments, for example, a mobile computing device may be implemented as a smartphone capable of executing computer applications, as well as voice communications and/or data communications. Although some embodiments may be described with a mobile computing device implemented as a smartphone by way of example, it will be appreciated that other embodiments may be implemented using other wireless mobile computing devices as well. The embodiments are not limited in this context.
As shown in Fig. 5, device 500 may include a housing 502, a display 504, an input/output (I/O) device 506, and an antenna 508. Device 500 also may include navigation features 512. Display 504 may include any suitable display unit for displaying information appropriate for a mobile computing device. I/O device 506 may include any suitable I/O device for entering information into a mobile computing device. Examples of I/O device 506 may include an alphanumeric keyboard, a numeric keypad, a touchpad, input keys, buttons, switches, rocker switches, microphones, speakers, voice recognition devices and software, and so forth. Information also may be entered into device 500 by way of a microphone. Such information may be digitized by a voice recognition device. The embodiments are not limited in this context.
Various embodiments may be implemented using hardware elements, software elements, or a combination of both. Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASICs), programmable logic devices (PLDs), digital signal processors (DSPs), field programmable gate arrays (FPGAs), logic gates, registers, semiconductor devices, chips, microchips, chipsets, and so forth. Examples of software may include software components, programs, applications, computer programs, application programs, system programs, machine programs, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (APIs), instruction sets, computing code, computer code, code segments, computer code segments, words, values, symbols, or any combination thereof. Whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds, and other design or performance constraints.
Some embodiments may be implemented, for example, using a machine-readable medium or article or computer program product which may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with an embodiment of the present disclosure. Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and software. The machine-readable medium or article or computer program product may include, for example, any suitable type of non-transitory memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium, and/or storage unit, for example, memory, removable or non-removable media, erasable or non-erasable media, writeable or re-writeable media, digital or analog media, hard disk, floppy disk, Compact Disk Read Only Memory (CD-ROM), Compact Disk Recordable (CD-R), Compact Disk Rewriteable (CD-RW), optical disk, magnetic media, magneto-optical media, removable memory cards or disks, various types of Digital Versatile Disks (DVDs), a tape, a cassette, and so forth. The instructions may include any suitable type of executable code implemented using any suitable high-level, low-level, object-oriented, visual, compiled, and/or interpreted programming language. Some embodiments may be implemented in a computer program product that incorporates the functionality of the techniques for adding interactive features to video as variously disclosed herein, and such a computer program product may include one or more machine-readable media.
Unless specifically stated otherwise, it will be appreciated that terms such as "processing," "computing," "calculating," "determining," or the like refer to the action and/or process of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical quantities (e.g., electronic) within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers, or other such information storage, transmission, or display devices. The embodiments are not limited in this context.
Further example embodiments
The following examples pertain to further embodiments, from which numerous permutations and configurations will be apparent.
Example 1 is a method comprising: segmenting each frame of a video into its semantic components to identify one or more objects within the scene of each frame based on corresponding groups of pixels, wherein the video is a portion of media; receiving a selection of one or more objects within a given frame scene; tracking the one or more objects from frame to frame of the video to identify the corresponding groups of pixels comprising the one or more objects in each frame; and alpha matting the media to isolate the one or more selected objects on a frame to frame basis.
Example 2 includes the subject matter of Example 1, wherein alpha matting the media comprises one of: forming a transparent mask matching the shape of the one or more selected objects from the given frame scene, to allow the video to be played through one or more holes created by the transparent mask, wherein the shape of the one or more holes in the given scene is updated for each frame of video to match the shape of the one or more selected objects in the frame being played; or forming a transparent mask around the one or more selected objects of each frame, to allow the video to be played by copying the one or more selected objects in the frame being played onto the given frame scene.
Example 3 includes the subject matter of Example 1 or 2, wherein segmentation of each frame of the video is performed using an unsupervised graph-cut method.
Example 4 includes the subject matter of any of the preceding examples, further comprising using pixel depth information to improve the segmentation for identifying the one or more objects within the scene of each frame.
Example 5 includes the subject matter of Example 4, further comprising generating the pixel depth information using a stereo or array camera.
Example 6 includes the subject matter of any of the preceding examples, further comprising receiving the selection of the one or more objects from a user.
Example 7 includes the subject matter of Example 6, further comprising receiving the user selection from a click or tap input performed on the one or more objects within the given frame.
Example 8 includes the subject matter of any of Examples 1-7, further comprising receiving the selection of the one or more objects before segmenting each frame, wherein only the selected one or more objects are segmented.
Example 9 includes the subject matter of any of Examples 1-7, further comprising tracking the one or more objects before receiving the selection of the one or more tracked objects.
Example 10 includes the subject matter of any of Examples 1-9, further comprising generating a still image, wherein the one or more selected objects come from a frame different than the given frame.
Example 11 includes the subject matter of any of Examples 1-9, further comprising generating a video, wherein the one or more selected objects start out of sequence relative to the given frame.
Example 12 includes the subject matter of any of Examples 1-9, further comprising generating visual media, wherein only the one or more selected objects are played while the remainder of the given frame is held still.
Example 13 includes the subject matter of any of Examples 1-9, further comprising generating visual media, wherein one or more objects in a particular frame of the video can be selected to cause the selected one or more objects to animate relative to the remainder of that particular frame.
Example 14 is a mobile computing system configured to perform the method of any of the preceding examples.
Example 15 is a computing device comprising: a processor; a memory accessible by the processor; and an application stored on the memory and executable by the processor, the application configured to: segment each frame of a video into its semantic components to identify one or more objects within the scene of each frame based on corresponding groups of pixels, wherein the video is a portion of media; receive a selection of one or more objects within a given frame scene; track the one or more objects from frame to frame of the video to identify the corresponding groups of pixels comprising the one or more objects in each frame; and alpha matte the media to isolate the one or more selected objects on a frame to frame basis.
Example 16 includes the subject matter of Example 15, wherein alpha matting the media comprises one of: forming a transparent mask matching the shape of the one or more selected objects from the given frame scene, to allow the video to be played through one or more holes created by the transparent mask, wherein the shape of the one or more holes in the given scene is updated for each frame of video to match the shape of the one or more selected objects in the frame being played; or forming a transparent mask around the one or more selected objects of each frame, to allow the video to be played by copying the one or more selected objects in the frame being played onto the given frame scene.
Example 17 includes the subject matter of Example 15 or 16, further comprising: a display operatively coupled to the processor; and at least one input device operatively coupled to the processor, wherein a user can select the one or more objects in the given frame's scene using the at least one input device.
Example 18 includes the subject matter of Example 15 or 16, further comprising a touchscreen display coupled to the processor, wherein the touchscreen is configured to receive, as input from a user, the selection of the one or more objects.
Example 19 is at least one computer program product encoded with instructions that, when executed by one or more processors, cause a process for adding interactive features to a video to be carried out, the process comprising: dividing each frame of the video into its semantic components to identify, based on corresponding pixel groups, one or more objects in each frame's scene, wherein the video is part of a media item; receiving a selection of one or more objects in a given frame's scene; tracking the one or more objects frame by frame through the frames of the video to identify the corresponding pixel groups that contain the one or more objects in each frame; and alpha-masking the media to isolate the one or more selected objects frame by frame.
Example 20 includes the subject matter of Example 19, wherein alpha-masking the media comprises: forming a transparent mask that matches the shape of the one or more selected objects in the given frame's scene, to allow the video to be played through one or more holes created by the transparent mask, wherein the shape of the one or more holes in the given scene is updated for each frame of the video to match the shape of the one or more selected objects in the frame being played; or forming a transparent mask around the one or more selected objects in each frame, to allow the video to be played by copying the one or more selected objects in the frame being played onto the given frame's scene.
Example 21 includes the subject matter of Example 19 or 20, wherein the segmentation of each frame of the video is performed using an unsupervised graph-cut method.
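Example 21 names an unsupervised graph-cut method for the per-frame segmentation. A faithful graph cut is beyond a short sketch, so the toy below substitutes flood-filled connected components of similar colour — explicitly not the graph-cut method the example recites, but it illustrates how a frame decomposes into the "corresponding pixel groups" that the examples treat as candidate objects:

```python
import numpy as np
from collections import deque

def segment_frame(frame, tol=20):
    """Group pixels into 4-connected components of similar colour via
    flood fill; returns an HxW label map (0..n-1). A toy stand-in for
    unsupervised segmentation -- not an actual graph cut."""
    h, w = frame.shape[:2]
    labels = np.full((h, w), -1, dtype=int)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue
            seed = frame[sy, sx].astype(int)  # colour of this group's seed pixel
            q = deque([(sy, sx)])
            labels[sy, sx] = next_label
            while q:
                y, x = q.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny, nx] == -1
                            and np.abs(frame[ny, nx].astype(int) - seed).sum() < tol):
                        labels[ny, nx] = next_label
                        q.append((ny, nx))
            next_label += 1
    return labels
```

Each resulting label corresponds to one pixel group; a user's tap on the frame would select the group under the tapped pixel.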
Example 22 includes the subject matter of any one of Examples 19-21, further comprising using pixel depth information to improve the segmentation that identifies the one or more objects in each frame.
Example 23 includes the subject matter of Example 22, further comprising generating the pixel depth information using a stereo or array camera.
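Examples 22 and 23 use pixel depth (e.g., a depth map produced by a stereo or array camera) to improve the segmentation. One simple, hypothetical refinement is to keep only pixels within a depth band and intersect that band with the colour-based pixel groups:

```python
import numpy as np

def depth_foreground_mask(depth_map, near, far):
    """Keep pixels whose depth lies in [near, far): a crude way to use
    a depth map to separate a foreground object from its background."""
    return (depth_map >= near) & (depth_map < far)
```

Intersecting this mask with a colour-based segment (logical AND) discards background pixels that happen to share the object's colour — the kind of improvement the examples attribute to depth information.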
Example 24 includes the subject matter of any one of Examples 19-23, further comprising receiving the selection of the one or more objects from a user.
Example 25 includes the subject matter of Example 24, further comprising receiving the user's selection from a click or tap input performed on the one or more objects in the given frame.
Example 26 includes the subject matter of any one of Examples 19-25, further comprising receiving the selection of the one or more objects before each frame is segmented, wherein only the selected one or more objects are segmented.
Example 27 includes the subject matter of any one of Examples 19-25, further comprising tracking the one or more objects before receiving the selection of the one or more tracked objects.
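The examples recite tracking the selected objects frame by frame without fixing a particular tracker. As a stand-in, the sketch below re-identifies the object's pixel group in each successive frame by similarity to the mean colour under the previous frame's mask — a deliberately naive colour model (any real implementation would use a proper tracker), but it shows how the per-frame masks that the alpha-masking step consumes can be propagated:

```python
import numpy as np

def track_object(frames, seed_mask, tol=30.0):
    """Propagate a selected object's boolean mask through a frame list
    by mean-colour matching. Returns one mask per frame."""
    masks = [seed_mask]
    mean = frames[0][seed_mask].mean(axis=0)  # mean colour of the selected group
    for frame in frames[1:]:
        dist = np.linalg.norm(frame.astype(float) - mean, axis=-1)
        mask = dist < tol                     # pixels close to the colour model
        masks.append(mask)
        if mask.any():
            mean = frame[mask].mean(axis=0)   # update the model as the object drifts
    return masks
```

The seed mask would come from the segmentation plus the user's tap; the returned masks identify the corresponding pixel groups in each frame, as the tracking step requires.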
Example 28 includes the subject matter of any one of Examples 19-27, further comprising producing a still image, wherein the one or more selected objects come from a frame different from the given frame.
Example 29 includes the subject matter of any one of Examples 19-27, further comprising producing a video, wherein the one or more selected objects do not start in sequence relative to the given frame.
Example 30 includes the subject matter of any one of Examples 19-27, further comprising producing visual media, wherein only the one or more selected objects are played while the remainder of the given frame is kept stationary.
Example 31 includes the subject matter of any one of Examples 19-27, further comprising producing visual media, wherein one or more objects in a particular frame of the video can be selected such that the selected one or more objects are animated relative to the remainder of that particular frame.
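Examples 28-31 describe the new media that can be produced once the selected objects are isolated: stills that mix frames, cinemagraph-style media in which only the object plays, and videos in which the object does not start in sequence. The sketch below implements the last of these, under the same assumption as the earlier sketches (per-frame boolean masks for the selected object):

```python
import numpy as np

def offset_object_playback(frames, masks, offset):
    """Replay the selected object shifted by `offset` frames while the
    background plays in its normal timeline (Example 29's
    out-of-sequence start), wrapping at the end of the clip."""
    n = len(frames)
    out = []
    for i in range(n):
        j = (i + offset) % n  # the object's shifted timeline
        out.append(np.where(masks[j][..., None], frames[j], frames[i]))
    return out
```

Because the object pixels and the background pixels are sampled from different frames of the same clip, this is exactly the dynamic mixing of motion and still content that the abstract describes.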
Example 32 is a mobile computing system configured to run the at least one computer program product of any one of Examples 18-31.
The foregoing description of example embodiments has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the present disclosure to the precise forms disclosed. Many modifications and variations are possible in light of this disclosure. It is intended that the scope of the disclosure be limited not by this detailed description, but rather by the claims appended hereto. Future-filed applications claiming priority to the present application may claim the disclosed subject matter in a different manner, and may generally include any set of one or more limitations as variously disclosed or otherwise demonstrated herein.

Claims (25)

1. A video processing method, comprising:
dividing each frame of a video into its semantic components to identify, based on corresponding pixel groups, one or more objects in each frame's scene, wherein the video is part of a media item;
receiving a selection of one or more objects in a given frame's scene;
tracking the one or more objects frame by frame through the frames of the video to identify the corresponding pixel groups, wherein the corresponding pixel groups contain the one or more objects in each frame; and
alpha-masking the media to isolate the one or more selected objects frame by frame.
2. The method of claim 1, wherein alpha-masking the media comprises:
forming a transparent mask that matches the shape of the one or more selected objects in the given frame's scene, to allow the video to be played through one or more holes created by the transparent mask, wherein the shape of the one or more holes in the given scene is updated for each frame of the video to match the shape of the one or more selected objects in the frame being played; or
forming a transparent mask around the one or more selected objects in each frame, to allow the video to be played by copying the one or more selected objects in the frame being played onto the given frame's scene.
3. The method of claim 1, wherein the segmentation of each frame of the video is performed using an unsupervised graph-cut method.
4. The method of claim 1, further comprising using pixel depth information to improve the segmentation that identifies the one or more objects in each frame.
5. The method of claim 4, further comprising generating the pixel depth information using a stereo camera or an array camera.
6. The method of claim 1, further comprising receiving the selection of the one or more objects from a user.
7. The method of claim 6, further comprising receiving the user's selection via a click or tap input performed on the one or more objects in the given frame.
8. The method of claim 1, further comprising receiving the selection of the one or more objects before each frame is segmented, wherein only the selected one or more objects are segmented.
9. The method of claim 1, further comprising tracking the one or more objects before receiving the selection of the one or more tracked objects.
10. The method of any one of claims 1-9, further comprising producing a still image, wherein the one or more selected objects come from a frame different from the given frame.
11. The method of any one of claims 1-9, further comprising producing a video, wherein the one or more selected objects do not start in sequence relative to the given frame.
12. The method of any one of claims 1-9, further comprising producing visual media, wherein only the one or more selected objects are played while the remainder of the given frame is kept stationary.
13. The method of any one of claims 1-9, further comprising producing visual media, wherein one or more objects in a particular frame of the video can be selected such that the selected one or more objects are animated relative to the remainder of that particular frame.
14. A video processing apparatus, comprising:
a module for dividing each frame of a video into its semantic components to identify, based on corresponding pixel groups, one or more objects in each frame's scene, wherein the video is part of a media item;
a module for receiving a selection of one or more objects in a given frame's scene;
a module for tracking the one or more objects frame by frame through the frames of the video to identify the corresponding pixel groups, wherein the corresponding pixel groups contain the one or more objects in each frame; and
a module for alpha-masking the media to isolate the one or more selected objects frame by frame.
15. A video processing device, comprising:
a processor;
a memory accessible by the processor; and
an application program stored in the memory and executable by the processor, the application program configured to:
divide each frame of a video into its semantic components to identify, based on corresponding pixel groups, one or more objects in each frame's scene, wherein the video is part of a media item;
receive a selection of one or more objects in a given frame's scene;
track the one or more objects frame by frame through the frames of the video to identify the corresponding pixel groups, wherein the corresponding pixel groups contain the one or more objects in each frame; and
alpha-mask the media to isolate the one or more selected objects frame by frame.
16. The device of claim 15, wherein alpha-masking the media comprises:
forming a transparent mask that matches the shape of the one or more selected objects in the given frame's scene, to allow the video to be played through one or more holes created by the transparent mask, wherein the shape of the one or more holes in the given scene is updated for each frame of the video to match the shape of the one or more selected objects in the frame being played; or
forming a transparent mask around the one or more selected objects in each frame, to allow the video to be played by copying the one or more selected objects in the frame being played onto the given frame's scene.
17. The device of claim 15 or 16, further comprising a display operatively coupled to the processor and at least one input device operatively coupled to the processor, wherein a user can select the one or more objects in the given frame's scene using the at least one input device.
18. The device of claim 15 or 16, further comprising a touchscreen display coupled to the processor, wherein the touchscreen is configured to receive, as input from a user, the selection of the one or more objects.
19. A video processing method, comprising:
dividing each frame of a video into its semantic components to identify, based on corresponding pixel groups, one or more objects in each frame's scene, wherein the video is part of a media item;
receiving a selection of one or more objects in a given frame's scene;
tracking the one or more objects frame by frame through the frames of the video to identify the corresponding pixel groups, wherein the corresponding pixel groups contain the one or more objects in each frame;
alpha-masking the media to isolate the one or more selected objects frame by frame; and
producing new media using the one or more selected objects.
20. The method of claim 19, wherein alpha-masking the media comprises:
forming a transparent mask that matches the shape of the one or more selected objects in the given frame's scene, to allow the video to be played through one or more holes created by the transparent mask, wherein the shape of the one or more holes in the given scene is updated for each frame of the video to match the shape of the one or more selected objects in the frame being played; or
forming a transparent mask around the one or more selected objects in each frame, to allow the video to be played by copying the one or more selected objects in the frame being played onto the given frame's scene.
21. The method of claim 19, wherein the new media comprise a still image, wherein the one or more selected objects come from a frame different from the given frame.
22. The method of claim 19, wherein the new media comprise a video, wherein the one or more selected objects do not start in sequence relative to the given frame.
23. The method of claim 19, wherein the new media comprise visual media, wherein only the one or more selected objects are played while the remainder of the given frame is kept stationary.
24. The method of claim 19, wherein the new media comprise visual media, wherein one or more objects in a particular frame of the video can be selected such that the selected one or more objects are animated relative to the remainder of that particular frame.
25. A video processing apparatus, comprising:
a module for dividing each frame of a video into its semantic components to identify, based on corresponding pixel groups, one or more objects in each frame's scene, wherein the video is part of a media item;
a module for receiving a selection of one or more objects in a given frame's scene;
a module for tracking the one or more objects frame by frame through the frames of the video to identify the corresponding pixel groups, wherein the corresponding pixel groups contain the one or more objects in each frame;
a module for alpha-masking the media to isolate the one or more selected objects frame by frame; and
a module for producing new media using the one or more selected objects.
CN201410055610.1A 2013-02-20 2014-02-19 Method and apparatus for adding interactive features to video Active CN103997687B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201361766827P 2013-02-20 2013-02-20
US61/766,827 2013-02-20
US14/106,136 US9330718B2 (en) 2013-02-20 2013-12-13 Techniques for adding interactive features to videos
US14/106,136 2013-12-13

Publications (2)

Publication Number Publication Date
CN103997687A CN103997687A (en) 2014-08-20
CN103997687B true CN103997687B (en) 2017-07-28

Family

ID=51311671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410055610.1A Active CN103997687B (en) Method and apparatus for adding interactive features to video

Country Status (1)

Country Link
CN (1) CN103997687B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3790284A1 (en) * 2014-10-22 2021-03-10 Huawei Technologies Co., Ltd. Interactive video generation
CN106559678A (en) * 2015-09-28 2017-04-05 北京视连通科技有限公司 A kind of method that structuring process is carried out to digital video
US10482681B2 (en) * 2016-02-09 2019-11-19 Intel Corporation Recognition-based object segmentation of a 3-dimensional image
CN106657776A (en) * 2016-11-29 2017-05-10 努比亚技术有限公司 Shooting post-processing method, shooting post-processing device and mobile terminal
CN109561240B (en) * 2017-09-24 2023-02-17 福希特公司 System and method for generating media assets
CN111787240B (en) * 2019-04-28 2023-05-02 北京京东尚科信息技术有限公司 Video generation method, apparatus and computer readable storage medium
WO2021102772A1 (en) * 2019-11-28 2021-06-03 Qualcomm Incorporated Methods and apparatus to smooth edge portions of an irregularly-shaped display
CN111368732B (en) * 2020-03-04 2023-09-01 阿波罗智联(北京)科技有限公司 Method and device for detecting lane lines
CN115225901A (en) * 2021-04-19 2022-10-21 华为技术有限公司 Method, device, equipment and storage medium for encoding and decoding dynamic image
CN113497973B (en) * 2021-09-06 2021-12-10 北京市商汤科技开发有限公司 Video processing method and device, computer readable storage medium and computer equipment
CN113490050B (en) * 2021-09-07 2021-12-17 北京市商汤科技开发有限公司 Video processing method and device, computer readable storage medium and computer equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7408591B2 (en) * 2005-07-29 2008-08-05 Mitsubishi Electric Research Laboratories, Inc. System and method for defocus difference matting
CN101430711A (en) * 2008-11-17 2009-05-13 中国科学技术大学 Method and apparatus for video data management
CN101501776A (en) * 2005-07-01 2009-08-05 微软公司 Video object cut and paste
CN102289796A (en) * 2010-07-21 2011-12-21 微软公司 Interactive image matting
CN102388391A (en) * 2009-02-10 2012-03-21 汤姆森特许公司 Video matting based on foreground-background constraint propagation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8320666B2 (en) * 2009-08-14 2012-11-27 Genesis Group Inc. Real-time image and video matting


Also Published As

Publication number Publication date
CN103997687A (en) 2014-08-20


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant