US10839573B2 - Apparatus, systems, and methods for integrating digital media content into other digital media content - Google Patents
- Publication number
- US10839573B2 (U.S. application Ser. No. 15/466,135)
- Authority
- US
- United States
- Prior art keywords
- digital content
- host region
- target digital
- host
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0276—Advertisement creation
-
- G06K9/00765—
-
- G06K9/4642—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/60—Editing figures and text; Combining figures or text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/194—Segmentation; Edge detection involving foreground-background segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/262—Analysis of motion using transform domain methods, e.g. Fourier domain methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/40—Analysis of texture
- G06T7/41—Analysis of texture based on statistical description of texture
- G06T7/44—Analysis of texture based on statistical description of texture using image operators, e.g. filters, edge density metrics or local histograms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/50—Depth or shape recovery
- G06T7/536—Depth or shape recovery from perspective effects, e.g. by using vanishing points
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30204—Marker
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30244—Camera pose
Definitions
- Disclosed apparatus, systems, and computerized methods relate generally to integrating source digital content with target digital content.
- the apparatus can include a processor configured to run a computer program stored in memory.
- the computer program is operable to cause the processor to receive source digital content, receive target digital content and host region defining data associated with the target digital content, wherein the host region defining data specifies a location of a host region within the target digital content for integrating source digital content into the target digital content, and integrate the source digital content into the host region within the target digital content identified by the host region defining data.
- Some embodiments of the disclosed subject matter include a computerized method performed by a processor in a computing system.
- the computerized method includes receiving source digital content, receiving target digital content and host region defining data associated with the target digital content, wherein the host region defining data specifies a location of a host region within the target digital content for integrating source digital content into the target digital content, and integrating the source digital content into the host region within the target digital content identified by the host region defining data.
- Some embodiments of the disclosed subject matter include a non-transitory computer readable medium having executable instructions.
- the executable instructions are operable to cause a processor to receive source digital content, receive target digital content and host region defining data associated with the target digital content, wherein the host region defining data specifies a location of a host region within the target digital content for integrating source digital content into the target digital content, and integrate the source digital content into the host region within the target digital content identified by the host region defining data.
- the host region defining data further comprises a transformation object that specifies a transformation for the source digital content.
- the transformation comprises one or more transformations for replicating a motion, pose, luminance, texture, and/or a level of blur of the host region.
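As a concrete illustration only, the host region defining data and its transformation object could be bundled in a small data structure like the Python sketch below; the field names (`corners`, `homography`, `luminance_offset`, `blur_sigma`, and so on) are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Transformation:
    """Hypothetical per-frame transformation replicating the host region's appearance."""
    homography: List[List[float]]      # 3x3 pose/motion transform applied to the source content
    luminance_offset: float = 0.0      # additive Lab L-channel adjustment
    blur_sigma: float = 0.0            # Gaussian blur level matching the host region
    texture_id: str = ""               # label of the detected surface texture

@dataclass
class HostRegionFrame:
    frame_index: int
    corners: List[Tuple[float, float]]  # polygon bounding the host region in this frame
    transform: Transformation = field(
        default_factory=lambda: Transformation([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))

@dataclass
class HostRegionDefiningData:
    target_content_id: str
    frames: List[HostRegionFrame] = field(default_factory=list)
```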
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to implement the transformation on the source digital content prior to integrating the source digital content into the host region.
- the method further includes implementing the transformation on the source digital content prior to integrating the source digital content into the host region.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to detect the host region from the target digital content. In some embodiments, the method further includes detecting the host region from the target digital content.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to detect the host region from the target digital content based on a texture of the target digital content. In some embodiments, the method further includes detecting the host region from the target digital content based on a texture of the target digital content.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to use a neural network to determine the texture of the target digital content.
- the method further includes using a neural network to determine the texture of the target digital content.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to detect the host region from the target digital content based on one or more of: (1) a level of variance in pixel values, (2) a background segmentation indicating that a pixel of the target digital content corresponds to a background of a scene, and/or (3) an object detected in the target digital content.
- the method further includes detecting the host region from the target digital content based on one or more of: (1) a level of variance in pixel values, (2) a background segmentation indicating that a pixel of the target digital content corresponds to a background of a scene, (3) an object detected in the target digital content, and/or (4) a neural network machine learning model trained on sample host regions.
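As an illustration of the first criterion (a level of variance in pixel values), the sketch below scans a grayscale frame with a sliding window and keeps windows whose variance falls below a threshold; the window size, stride, and threshold are arbitrary assumptions.

```python
import numpy as np

def low_variance_windows(gray_frame: np.ndarray, win: int = 64, stride: int = 32,
                         max_var: float = 50.0):
    """Return (x, y, win, win) windows whose pixel-value variance is below max_var."""
    h, w = gray_frame.shape
    candidates = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = gray_frame[y:y + win, x:x + win]
            if patch.var() < max_var:          # uniform patches are host region candidates
                candidates.append((x, y, win, win))
    return candidates
```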
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to parse the target digital content comprising a plurality of frames into a plurality of scenes, wherein each scene comprises one or more interrelated frames in the target digital content, identify a first host region within a first frame corresponding to the first one of the plurality of scenes, and based on a location of the first host region in the first frame, identify a second host region within a second frame corresponding to the first one of the plurality of scenes.
- the method further includes parsing the target digital content comprising a plurality of frames into a plurality of scenes, wherein each scene comprises one or more interrelated frames in the target digital content, identifying a first host region within a first frame corresponding to the first one of the plurality of scenes, and based on a location of the first host region in the first frame, identifying a second host region within a second frame corresponding to the first one of the plurality of scenes.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to group the plurality of scenes into a first set of scenes and a second set of scenes, wherein the first set of scenes comprises scenes whose camera motion includes a translation less than a fixed percentage of a height or a width of a frame, and wherein the second set of scenes comprises scenes whose camera motion includes a translation greater than the fixed percentage of the height or the width of the frame.
- the method further includes grouping the plurality of scenes into a first set of scenes and a second set of scenes, wherein the first set of scenes comprises scenes whose camera motion includes a translation less than a fixed percentage of a height or a width of a frame, and wherein the second set of scenes comprises scenes whose camera motion includes a translation greater than the fixed percentage of the height or the width of the frame.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to group the plurality of scenes into a first set of scenes and a second set of scenes, wherein the first set of scenes comprises scenes whose camera motion includes a rotation less than a fixed value of degrees, and wherein the second set of scenes comprises scenes whose camera motion includes a rotation greater than the fixed value of degrees.
- the method further includes grouping the plurality of scenes into a first set of scenes and a second set of scenes, wherein the first set of scenes comprises scenes whose camera motion includes a rotation less than a fixed value of degrees, and wherein the second set of scenes comprises scenes whose camera motion includes a rotation greater than the fixed value of degrees.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to detect the host region from the target digital content based on a marker identifying a preselected host region within the target digital content.
- the method further includes detecting the host region from the target digital content based on a marker identifying a preselected host region within the target digital content.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to detect the host region by creating a gradient image of a frame of the target digital content and identifying a window within the gradient image in which a summation of the hypotenuse of the gradient pixel values within the window is less than a predetermined threshold.
- the method further includes detecting the host region by creating a gradient image of a frame of the target digital content and identifying a window within the gradient image in which a summation of the hypotenuse of the gradient pixel values within the window is less than a predetermined threshold.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to detect the host region by recognizing an object within the target digital content that is predetermined to be a host region.
- the method further includes detecting the host region by recognizing an object within the target digital content that is predetermined to be a host region.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to detect occlusion in one or more frames in the target digital content by merging a foreground mask and a luminance mask.
- the method further includes detecting occlusion in one or more frames in the target digital content by merging a foreground mask and a luminance mask.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to track the host region detected in a first frame of the target digital content across a plurality of frames in the target digital content using optical flow.
- the method further includes tracking the host region detected in a first frame of the target digital content across a plurality of frames in the target digital content using optical flow.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to track the host region detected in a first frame of the target digital content across a plurality of frames in the target digital content by tracking features associated with the host region using optical flow.
- the method further includes tracking the host region detected in a first frame of the target digital content across a plurality of frames in the target digital content by tracking features associated with the host region using optical flow.
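One common way to realize this kind of tracking is sparse Lucas-Kanade optical flow on corner features found inside the host region; the OpenCV-based sketch below assumes the host region is an axis-aligned box in the first frame and is only an approximation of the described tracking step.

```python
import cv2
import numpy as np

def track_host_region(frames, box):
    """Track features inside `box` (x, y, w, h) from frames[0] through the remaining frames."""
    x, y, w, h = box
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    mask = np.zeros_like(prev)
    mask[y:y + h, x:x + w] = 255                      # restrict features to the host region
    pts = cv2.goodFeaturesToTrack(prev, maxCorners=50, qualityLevel=0.01,
                                  minDistance=5, mask=mask)
    tracks = [pts]
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev, gray, pts, None)
        pts = nxt[status.flatten() == 1].reshape(-1, 1, 2)   # keep successfully tracked points
        tracks.append(pts)
        prev = gray
    return tracks
```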
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to maintain a count of occluded pixels in the host region in each frame of the target digital content.
- the method further includes maintaining a count of occluded pixels in the host region in each frame of the target digital content.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to integrate the source digital content into the host region in real-time using a web-browser.
- the method further includes integrating the source digital content into the host region in real-time using a web-browser.
- Some embodiments of the disclosed subject matter include an apparatus.
- the apparatus includes a processor configured to run a computer program stored in memory.
- the computer program is operable to cause the processor to receive target digital content comprising a plurality of frames including a first frame and a second frame, wherein the plurality of frames is captured using an imaging device and the plurality of frames captures a surface, identify a relative motion between the imaging device and the surface over a duration of the plurality of frames based on an optical flow between the plurality of frames, determine a transformation to capture the relative motion between the first frame and the second frame, detect a first host region on the surface captured in the first frame based in part on a texture of the surface, identify a second host region in the second frame based in part on a location of the first host region in the first frame and the transformation, and create a host region defining data associated with the target digital content, wherein the host region defining data includes a first location of the first host region in the first frame and a second location of the second host region in the second frame
- Some embodiments of the disclosed subject matter include a computerized method performed by a processor in a computing system.
- the computerized method includes receiving target digital content comprising a plurality of frames including a first frame and a second frame, wherein the plurality of frames is captured using an imaging device and the plurality of frames captures a surface, identifying a relative motion between the imaging device and the surface over a duration of the plurality of frames based on an optical flow between the plurality of frames, determining a transformation to capture the relative motion between the first frame and the second frame, detecting a first host region on the surface captured in the first frame based in part on a texture of the surface, identifying a second host region in the second frame based in part on a location of the first host region in the first frame and the transformation, and creating a host region defining data associated with the target digital content, wherein the host region defining data includes a first location of the first host region in the first frame and a second location of the second host region in the second frame.
- Some embodiments of the disclosed subject matter include a non-transitory computer readable medium having executable instructions.
- the executable instructions are operable to cause a processor to receive target digital content comprising a plurality of frames including a first frame and a second frame, wherein the plurality of frames is captured using an imaging device and the plurality of frames captures a surface, identify a relative motion between the imaging device and the surface over a duration of the plurality of frames based on an optical flow between the plurality of frames, determine a transformation to capture the relative motion between the first frame and the second frame, detect a first host region on the surface captured in the first frame based in part on a texture of the surface, identify a second host region in the second frame based in part on a location of the first host region in the first frame and the transformation, and create a host region defining data associated with the target digital content, wherein the host region defining data includes a first location of the first host region in the first frame and a second location of the second host region in the second frame.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to determine an occlusion mask corresponding to the second frame, wherein a value of a pixel in the occlusion mask indicates that a corresponding pixel in the second frame is occluded.
- the method further includes determining an occlusion mask corresponding to the second frame, wherein a value of a pixel in the occlusion mask indicates that a corresponding pixel in the second frame is occluded.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to retrieve the target digital content and the host region defining data associated with the target digital content, receive source digital content, and integrate the source digital content into the first host region and the second host region within the target digital content identified by the host region defining data.
- the method further includes retrieving the target digital content and the host region defining data associated with the target digital content, receiving source digital content, and integrating the source digital content into the first host region and the second host region within the target digital content identified by the host region defining data.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to determine a depth and a surface normal of a surface associated with the host region to determine a pose transformation object for the host region.
- the method further includes determining a depth and a surface normal of a surface associated with the host region to determine a pose transformation object for the host region.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to use a neural network to predict the depth and the surface normal of the surface associated with the host region to identify a background region for host region identification.
- the method further includes using a neural network to predict the depth and the surface normal of the surface associated with the host region to identify a background region for host region identification.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to use a neural network to predict the depth and the surface normal of the surface associated with the host region to group scenes based on a camera positioning.
- the method further includes using a neural network to predict the depth and the surface normal of the surface associated with the host region to group scenes based on a camera positioning.
- the computer program in the apparatus and/or the executable instructions in the non-transitory computer readable medium are operable to cause a processor to determine a motion category for the target digital content and integrate the source digital content into the target digital content based on a process exclusively tailored to the motion category.
- FIG. 1 illustrates a content integration system in accordance with some embodiments.
- FIG. 2 illustrates an operation of a content integration system in accordance with some embodiments.
- FIGS. 3A-3U illustrate exemplary source digital content, target digital content, and integrated digital content integrated using the content integration system in accordance with some embodiments.
- FIG. 4 illustrates duplicate space and camera position recognition as performed by a host region identification module in accordance with some embodiments.
- FIG. 5 illustrates estimation of depth using a neural network in a scene recognition module in accordance with some embodiments.
- FIG. 6 illustrates use of a neural network model to predict the depth of pixels of one or more frames in accordance with some embodiments.
- FIG. 7 illustrates identification of texture using a neural network in accordance with some embodiments.
- FIG. 8 illustrates use of a neural network model to predict the texture of pixels of one or more frames in accordance with some embodiments.
- FIG. 9 illustrates reduction of time complexity of duplicate space and camera position recognition through use of a hash function in accordance with some embodiments.
- FIG. 10 illustrates duplicate space recognition as performed by a scene recognition module in accordance with some embodiments.
- FIG. 11 illustrates a system in which a camera motion classification module is co-located with a scene recognition module in accordance with some embodiments.
- FIG. 12 illustrates use of a machine learning classifier to achieve camera motion classification in accordance with some embodiments.
- FIGS. 13A-13B illustrate an embodiment in which the marker is a graphic in accordance with some embodiments.
- FIG. 14 illustrates a procedure for finding maximally sized rectangles in each frame of target digital content in accordance with some embodiments.
- FIG. 15 illustrates identification of host regions based on their absence of edges or texture as performed by a host region identification module in accordance with some embodiments.
- FIG. 16 illustrates identification of host regions by inputting frames of the target digital content through a neural network model in accordance with some embodiments.
- FIGS. 17A-17D illustrate an embodiment in which a host region is identified through selection in a graphical user interface in accordance with some embodiments.
- FIG. 18 illustrates a system of neural networks of varying coarseness that is designed to transform a source digital content using the depth map and normals in accordance with some embodiments.
- FIG. 19 illustrates determination of a foreground mask transformation object using background subtraction as performed by a host region identification module in accordance with some embodiments.
- FIG. 20 illustrates determination of a foreground mask transformation object using background subtraction, and, in parallel, creation of a luminance mask as performed by a host region identification module in accordance with some embodiments.
- FIG. 21 illustrates determination of a foreground mask transformation object using depth information as performed by a host region identification module in accordance with some embodiments.
- FIG. 22 illustrates improvement of a foreground mask transformation object by removing noise and outliers as performed by a host region identification module in accordance with some embodiments.
- FIG. 23 illustrates determination of a luminance transformation object by a host region identification module in accordance with some embodiments.
- FIG. 24 illustrates a system in which the source digital content is integrated into the target digital content using an overlay method in accordance with some embodiments.
- FIG. 25 illustrates a system in which the source digital content is integrated into the target digital content using an overlay method in accordance with some embodiments.
- FIG. 26 illustrates a system in which the source digital content is integrated into the target digital content using a versioning method in accordance with some embodiments.
- FIG. 27 outlines an embodiment where content integration is implemented using a versioning method in a content integration module in accordance with some embodiments.
- the content integration system is configured to retrieve a source digital content, retrieve a target digital content, identify a region within the target digital content for integrating the source digital content, and integrate the source digital content onto the identified region of the target digital content.
- the content integration system can retrieve an advertisement, retrieve one or more frames from a video, identify a region within one or more of those frames for integrating the advertisement, and integrate the advertisement onto the identified region.
- the content integration system can be configured to place the source digital content into the target digital content in an aesthetically-pleasing, unobtrusive, engaging, and/or otherwise favorable manner.
- the content integration system can be particularly useful for advertising or enhanced expression, entertainment, information, or communication.
- the source digital content and/or the target digital content include digital content designed for visual display.
- the source digital content and/or the target digital content include digital photographs, illustrations, one or more frames in a video (whether streaming or file and whether two dimensional, 360 degrees, or spherical), animations, video games, graphics displays, augmented reality, mixed reality, and/or virtual reality experiences.
- the content integration system can be configured to identify one or more regions—e.g., sets of one or more contiguous pixels—in or around the target digital content where source digital content can be placed (“host regions”).
- the host region includes one or more contiguous pixels that satisfy predetermined criteria.
- the predetermined criteria can be determined such that placing source digital content upon the corresponding contiguous pixels enhances, rather than detracts from, the viewer experience.
- the predetermined criteria can include, for example, a lack of variation in pixel values, an absence of edges or texture, an indication that these pixels are for the background rather than the foreground of the scene inside the target digital content, an indication that these pixels occupy an area of low visual saliency, or an indication, by a machine learning model trained on past examples of preferable regions, that the pixels represent a preferable host region.
- the content integration system can include a host region identification module that is configured to identify a host region based on one or more predetermined criteria.
- the host region identification module can be configured to identify, as host regions, (1) regions having a predetermined level of uniformity, (2) regions that represent the background (as opposed to the foreground), (3) regions that represent particular objects, textures, materials, shapes, places, spaces, or areas, or (4) regions that a machine learning model, trained on example host regions, classifies as host regions.
- the host region identification module is configured to identify a host region in target digital content by detecting a predetermined marker.
- the predetermined marker can be a graphical representation indicative of a preselected host region.
- the predetermined marker can be inserted into the target digital content by, for example, a user.
- the host region identification module can be configured to enable a user to select a host region within a target digital content.
- the host region identification module can be configured to receive a selection of a sub-section of a target digital content identified by a graphics tool.
- the host region identification module can be configured to assist the selection of a host region within a target digital content.
- the content integration system can provide host region candidates from which a host region can be selected.
- the content integration system is configured to (1) parse a digital content, such as a video, into scenes, (2) classify each scene based on the type and/or level of a camera motion corresponding to the scenes, and (3) find the host regions using different approaches based on the type of camera motion in each scene.
- the content integration system includes a scene recognition module that is configured to automatically parse a target digital content into scenes so that host region identification can be performed for each of these scenes.
- a scene can include, for example, a series of interrelated and/or consecutive frames, a continuous action in time, and/or a contiguous physical space.
- the scene recognition module can be configured to automatically classify the scenes that compose the target digital content according to their type or level of camera motion. For example, the scene recognition module can classify a scene as lacking camera motion, as having "minimal camera motion" (translation of no more than 20% of the height or width of the frame and rotation of no more than 5°), or as having "maximal camera motion" (translation of more than 20% of the height or width of the frame or rotation of more than 5°). Subsequently, the scene recognition module can provide the classification information to the host region identification module so that the host region identification module can detect one or more host regions based on the classification information. A sketch of this classification appears below.
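A rough approximation of this classification can be built from sparse optical flow between a scene's first and last frames, estimating a similarity transform and comparing its translation and rotation against the thresholds above; the sketch below is an assumption about one possible implementation, not the patent's classifier.

```python
import cv2
import numpy as np

def classify_scene_motion(first_frame, last_frame, trans_frac=0.20, max_deg=5.0):
    """Classify a scene as having 'none', 'minimal', or 'maximal' camera motion."""
    g0 = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(last_frame, cv2.COLOR_BGR2GRAY)
    p0 = cv2.goodFeaturesToTrack(g0, maxCorners=200, qualityLevel=0.01, minDistance=7)
    p1, st, _ = cv2.calcOpticalFlowPyrLK(g0, g1, p0, None)
    good0, good1 = p0[st.flatten() == 1], p1[st.flatten() == 1]
    m, _ = cv2.estimateAffinePartial2D(good0, good1)   # 2x3 similarity transform
    if m is None:
        return "maximal"
    dx, dy = m[0, 2], m[1, 2]
    angle = np.degrees(np.arctan2(m[1, 0], m[0, 0]))   # rotation implied by the transform
    h, w = g0.shape
    translation = max(abs(dx) / w, abs(dy) / h)
    if translation < 0.01 and abs(angle) < 0.5:
        return "none"
    if translation <= trans_frac and abs(angle) <= max_deg:
        return "minimal"
    return "maximal"
```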
- the host region identification module can include sub-modules that are specialized for different types or levels of camera motion.
- the scene recognition module can provide scene classification information corresponding to a particular scene to a particular sub-module associated with the particular scene.
- the scene recognition module provides scene classification information corresponding to scenes without camera motion to a host region identification sub-module whose process is tailored for that type of camera motion.
- the scene recognition module provides scene classification information corresponding to scenes with maximal camera motion to a sub-module whose process is tailored for scenes with that type of motion.
- the scene recognition module is configured to use a machine learning model, trained on samples of digital content labelled according to their type and/or level of camera motion scenes, in order to perform the classification of the type or level of camera motion in a given scene of the target digital content.
- a host region identification module in the content integration system can be configured to search for one or more host regions in part based on texture of a region.
- the texture of a region can be measured, in part, based on a uniformity of pixel values (e.g., the absence of edges) in that region for one or more frames of the target digital content.
- the host region identification module is configured to identify host regions through the use of a machine learning system.
- the host region identification module can include, for example, one or more machine learning-based classifiers, such as a convolutional neural network, support vector machine, or random forest classifier, that are configured to determine whether a texture of a region in a target digital content is sufficiently bland and/or uniform so that the region could be classified as a host region.
- the machine learning system can be trained using a training set of samples of digital content reflecting textures which are deemed as suitable for hosting a source digital content.
- samples of digital content can include samples of digital content reflecting brick walls, painted walls, and/or sky textures.
- This dataset can be collected by manually collecting samples of digital content that feature these textures and then manually demarcating the location of the texture in the content (either by cropping the digital content to those regions or capturing the location—e.g., to yield the best result, with the coordinates of a polygon bounding the location—as a feature).
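As a simplified stand-in for the classifiers mentioned above (a convolutional neural network, support vector machine, or random forest), the sketch below trains a support vector machine on a few hand-crafted texture statistics; the feature choice and labels are illustrative assumptions rather than the patent's training procedure.

```python
import numpy as np
import cv2
from sklearn.svm import SVC

def texture_features(patch_gray: np.ndarray) -> np.ndarray:
    """Simple texture descriptor: mean, variance, and mean gradient magnitude."""
    gx = cv2.Sobel(patch_gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(patch_gray, cv2.CV_32F, 0, 1)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    return np.array([patch_gray.mean(), patch_gray.var(), mag.mean()])

def train_texture_classifier(patches, labels):
    """patches: grayscale crops; labels: 1 = suitable host texture (e.g., plain wall), 0 = not."""
    X = np.vstack([texture_features(p) for p in patches])
    return SVC(kernel="rbf").fit(X, labels)
```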
- the texture of a region can be modeled in part based on a gradient distribution within a region.
- the host region identification module can be configured to search for a region, within a target digital content, with a similar gradient distribution.
- the host region identification module can be configured to search for a maximal contiguous region within which the gradient distribution is uniform.
- the host region identification module can be configured to search for a maximal contiguous region within which none of the pixels has a gradient magnitude greater than a predetermined threshold.
- the host region identification module is configured to use a seed-growing or a region-growing image segmentation technique to identify a region with similar gradient characteristics.
- the gradient at a pixel can be computed by convolving a gradient filter with the pixel (and its neighboring pixels, based on the size of the filter).
- the host region identification module can be configured to search for one or more host regions in part by (1) creating a gradient image of one or more frames of the target digital content and (2) finding a window within that image whose diagonal pixel values (i.e., those along the hypotenuse), once summed, fall below a predetermined threshold. A sum of diagonal values that falls below a predetermined threshold suggests that the window lacks edges (and thus exhibits uniformity of pixel values).
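Read literally, that search can be sketched as follows: build a gradient-magnitude image (here with a Sobel filter, one possible gradient filter), slide a window over it, and keep windows whose gradient values summed along the diagonal stay below a threshold. The window size, stride, and threshold are assumptions for illustration.

```python
import cv2
import numpy as np

def diagonal_gradient_windows(gray, win=100, stride=50, threshold=500.0):
    """Find windows whose summed gradient magnitude along the diagonal is below threshold."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    grad = cv2.magnitude(gx, gy)                      # gradient image of the frame
    h, w = grad.shape
    hits = []
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            diag = np.array([grad[y + i, x + i] for i in range(win)])
            if diag.sum() < threshold:                # few edges crossed along the hypotenuse
                hits.append((x, y, win, win))
    return hits
```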
- the host region identification module can be configured to identify host regions by searching digital content such as an image for one or more background regions (e.g., sets of one or more pixels representing the background, rather than the foreground, of the scene depicted by the content). This is based on an empirical observation that background regions are commonly of less interest to viewers than foreground regions and thus represent a preferred region for hosting source digital content.
- the host region identification module can maintain a machine learning system that is configured to determine whether a region corresponds to a background or not.
- the content integration system is configured to represent a host region using a predetermined data structure or an object.
- the content integration system is configured to represent a host region using a data structure including a predetermined dimension (e.g., a height dimension, a width dimension).
- the content integration system is configured to determine a surface orientation (e.g., a normal vector of a surface) of the host region in the target digital content. Subsequently, the content integration system can use the surface orientation information to transform (e.g., morph) the source digital content to have the same surface orientation. Then the content integration system can integrate the transformed source digital content into the target digital content to reduce visual artifacts.
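Assuming the four corners of the host region have already been located in a frame, one way to apply such an orientation-matching transform is a homography warp of the source content onto that quadrilateral; the OpenCV sketch below is one possible realization, not the patent's specific morphing step.

```python
import cv2
import numpy as np

def warp_source_to_region(source, frame, region_corners):
    """Warp `source` onto the quadrilateral `region_corners` (4 clockwise x,y points) in `frame`."""
    h_src, w_src = source.shape[:2]
    src_pts = np.float32([[0, 0], [w_src, 0], [w_src, h_src], [0, h_src]])
    dst_pts = np.float32(region_corners)
    H = cv2.getPerspectiveTransform(src_pts, dst_pts)
    warped = cv2.warpPerspective(source, H, (frame.shape[1], frame.shape[0]))
    mask = cv2.warpPerspective(np.full(source.shape[:2], 255, np.uint8), H,
                               (frame.shape[1], frame.shape[0]))
    out = frame.copy()
    out[mask > 0] = warped[mask > 0]                  # paste the re-oriented source content
    return out
```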
- the content integration system is configured to recognize one or more objects in target digital content and save the recognition result as a host region defining data.
- the host region defining data can indicate, for example, that a particular type of object has been recognized in the target digital content and, optionally, the location (e.g., coordinate) of the recognized object in the target digital content.
- the content integration system can also maintain an association between the host region defining data and source digital content that can be placed upon the object associated with the host region defining data.
- the host region defining data corresponding to a wall can be associated with source digital content corresponding to a company logo.
- the content integration system is configured to maintain the association using a table and/or a database.
- the content integration system includes a source digital content selection module that is configured to select the source digital content. In some embodiments, the content integration system also includes a content integration module that is configured to integrate the source digital content into the target digital content. In one example, the content integration module is configured to integrate the source digital content into the target digital content by placing the source digital content in a host region of the target digital content. In another example, the content integration module is configured to integrate the source digital content into the target digital content by overlaying the source digital content over a host region of the target digital content. In another example, the content integration module is configured to integrate the source digital content into the target digital content by blending the source digital content into a host region of the target digital content.
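For the blending variant in particular, Poisson-style seamless cloning is one readily available option; the short sketch below (an assumption about one possible implementation, not the patent's specified method) blends the source content into the frame around the host region's center.

```python
import cv2
import numpy as np

def blend_into_region(source, frame, center_xy):
    """Blend `source` into `frame` around the point `center_xy` using OpenCV seamless cloning."""
    mask = np.full(source.shape[:2], 255, np.uint8)   # blend the whole source rectangle
    return cv2.seamlessClone(source, frame, mask, center_xy, cv2.NORMAL_CLONE)
```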
- the source digital content selection module and the content integration module can operate in real time.
- the source digital content selection module is configured to select the source digital content and provide it to a content integration module including a web browser so that the browser can integrate the source digital content into the target digital content in real time, while it is being viewed.
- the content integration module is configured to detect occlusion in the target digital content.
- the content integration module is configured to create a foreground mask that is enhanced by a luminance mask.
- the content integration module is configured to (1) choose, as a background and luminance model, a cropped instance of the host region from one frame (e.g. the first frame) of the target digital content, (2) apply a bilateral filter to the pixel values of this model to remove noise from that region, and (3) convert the model into both RGB and Lab color spaces.
- the content integration module is configured to create background and luminance masks for cropped instances of the host region.
- the content integration module is configured to (1) apply a bilateral filter to the pixel values to remove noise, (2) convert the values into both RGB and Lab color spaces, (3) compute the absolute value of the difference of the resulting values and the corresponding values of the model and, (4) where this absolute value exceeds a predetermined threshold that has been set for each mask, add the value indicating either occlusion or a luminance change to the respective mask. Then, the content integration module is configured to combine these two masks (e.g., using an AND operator), creating a merged mask that captures occluding objects but not mere luminance changes.
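The sketch below follows those steps loosely: bilateral filtering of the model and the current crop, a color-difference mask and a chroma-based mask, and an AND merge. Treating Lab chroma as roughly illumination-invariant is this sketch's own interpretation of how the merged mask can ignore mere luminance changes, and the thresholds are assumptions.

```python
import cv2
import numpy as np

def occlusion_mask(model_bgr, crop_bgr, rgb_thresh=40, chroma_thresh=15):
    """Flag occluding pixels: large color difference AND large chroma difference from the model."""
    model = cv2.bilateralFilter(model_bgr, 9, 75, 75)   # denoise the background/luminance model
    crop = cv2.bilateralFilter(crop_bgr, 9, 75, 75)     # denoise the current host region crop
    # Color-space (BGR) difference mask.
    rgb_diff = cv2.absdiff(crop, model).max(axis=2)
    fg_mask = (rgb_diff > rgb_thresh).astype(np.uint8)
    # Chroma (Lab a, b channels) difference mask; a pure lighting change leaves chroma stable.
    model_lab = cv2.cvtColor(model, cv2.COLOR_BGR2LAB)
    crop_lab = cv2.cvtColor(crop, cv2.COLOR_BGR2LAB)
    chroma_diff = cv2.absdiff(crop_lab[..., 1:], model_lab[..., 1:]).max(axis=2)
    chroma_mask = (chroma_diff > chroma_thresh).astype(np.uint8)
    # AND the masks so that mere luminance changes are not treated as occlusion.
    return cv2.bitwise_and(fg_mask, chroma_mask)
```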
- the content integration system is configured to use a modified version of the flood fill process to improve foreground mask or combined foreground and luminance mask transformation objects.
- the content integration system is configured to recreate the luminance (and thus the texture and luminance changes) of the target digital content in the source digital content or its placement by converting the pixel values of the host region in each frame of the target digital content to Lab, subtracting each of these values from 255, and then adding the resulting number to each corresponding frame of the source digital content.
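The arithmetic described above is specific to the patent; as a loosely related illustration, the sketch below transfers the host region's per-pixel luminance variation onto the source content in Lab space, which is one reasonable realization rather than a literal transcription of the subtraction from 255.

```python
import cv2
import numpy as np

def transfer_luminance(host_crop_bgr, source_bgr):
    """Modulate the source content's L channel with the host region's luminance variation."""
    host = cv2.resize(host_crop_bgr, (source_bgr.shape[1], source_bgr.shape[0]))
    host_lab = cv2.cvtColor(host, cv2.COLOR_BGR2LAB).astype(np.float32)
    src_lab = cv2.cvtColor(source_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    # Per-pixel deviation of the host region's luminance from its own mean.
    lum_variation = host_lab[..., 0] - host_lab[..., 0].mean()
    src_lab[..., 0] = np.clip(src_lab[..., 0] + lum_variation, 0, 255)
    return cv2.cvtColor(src_lab.astype(np.uint8), cv2.COLOR_LAB2BGR)
```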
- the disclosed content integration system can provide a scalable computational mechanism to automatically and artfully enhance creativity, expression, or utility in target digital content.
- the content integration system can also be useful in advertising related applications.
- the content integration system can provide a computational mechanism to place advertisements into a target digital content in an unobtrusive, seamless manner. This is a way to advertise on target digital content that is impervious to avoidance via time-shifting and/or, depending on the implementation of the placement, avoidance via ad-blocking technology.
- this method creates entirely new advertising space inside new or vintage digital content at a time when such advertising space is in high demand and, further, makes that space available, when so desired, for the targeted and programmatic serving of ads at high scale, even in real time.
- the benefit of the disclosed content integration system is the seamless integration of digital contents, such as advertisements, text, or other augmentation, into unobtrusive regions inside a target digital content in an automated or semi-automated fashion and, potentially, in a standardized fashion.
- This system satisfies that demand by providing a method for integrating source digital content into target digital content in an unobtrusive way. Further, by removing most or all human involvement from the process and by standardizing the resulting advertisements or augmentations, this method allows for their placement at high scale and programmatically.
- FIG. 1 illustrates a content integration system 100 in accordance with some embodiments.
- the content integration system 100 can include one or more processors 102 , a memory device 104 , a scene recognition module 106 , a camera motion classification module 108 , a host region identification module 110 , a host region approval module 112 , a distribution module 114 , a storage module 116 , a source digital content selection module 118 , a content integration module 120 , and an interface 122 .
- the one or more processors 102 can execute machine executable instructions.
- the one or more processors 102 can be implemented in hardware using an application specific integrated circuit (ASIC), programmable logic array (PLA), digital signal processor (DSP), field programmable gate array (FPGA), or any other integrated circuit.
- the processors 102 suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, digital signal processors, and any one or more processors of any kind of digital computer.
- the one or more processors 102 receive instructions and data from a read-only memory or a random access memory or both.
- the memory device 104 can store instructions and/or data.
- the one or more memory modules in the memory device 104 can be a non-transitory computer readable medium, such as a dynamic random access memory (DRAM), a static random access memory (SRAM), flash memory, a magnetic disk drive, an optical drive, a programmable read-only memory (PROM), a read-only memory (ROM), and/or any other memory or combination of memories.
- the memory device 104 can be used to temporarily store data.
- the memory device 104 can also be used for long-term data storage.
- the one or more processors 102 and the memory device 104 can be supplemented by and/or incorporated into special purpose logic circuitry.
- the scene recognition module 106 is configured to identify, in an automated fashion, the scenes that compose a digital content. For example, the scene recognition module 106 is configured to accept as input the target digital content. Then, using various methods, the scene recognition module 106 is configured to automatically identify one or more scenes inside that input content as well as those scenes that reflect the same physical space and/or camera positioning. The scene recognition module 106 is configured to output a list of frames composing the scenes that, in turn, compose the target digital content, possibly grouped by those that reflect the same physical space and/or camera positioning (e.g., the physical space depicted by the scene—though not necessarily the objects and people inside that physical space—as well as the positioning of the camera or point of view inside that space are identical across the scenes).
- the host region identification module 110 is configured to detect a host region from input digital content, such as the target digital content.
- the host region identification module 110 is configured to accept as input the target digital content.
- a computerized search for regions that reflect some level of uniformity (e.g., a smooth region)
- a computerized search for regions that represent the background as opposed to the foreground
- a computerized search for regions that represent particular objects, textures, materials, shapes, places, spaces, or areas, or the prediction of host regions through use of machine learning models trained on a training set
- the host region identification module 110 is configured to identify, in the target digital content, host regions for source digital content.
- the host region identification module 110 includes a plurality of host region identification sub-modules.
- Each sub-module can be dedicated towards identifying host regions in scenes with different types of camera motion (e.g., sub-modules for no, minimal, and maximal camera motion), or can be associated with a particular type of host region identification criteria.
- a host region identification sub-module can be configured to detect only host regions with a smooth surface.
- a host region identification sub-module can be configured to detect only host regions with a highly-textured surface. In this way, the search for host regions can be distributed and parallelized among specialized sub-modules.
- the host region identification module 110 can be configured to provide host region defining data that defines the dimension and/or location of a host region in the target digital content, and/or transformation objects that define the transformations that can take place for the source digital content to seamlessly integrate with the host region in the target digital content.
- the transformations can include a geometric transformation that morphs the source digital content appropriately onto a surface in the target digital content. The transformation can take into account a relationship between the surface normal of the source digital content and the surface normal of the host region within the target digital content.
- the host region identification module 110 can track the host region across the duration of the source digital content (e.g., video frames across a video stream), and create one or more transformation objects associated with the host region, including but not limited to one or more transformations that enable the eventual content integration to reflect the location, motion, pose, occlusion, lighting change, texture, and blur that affect the host region in the target digital content.
- the transformation objects can include, for example, masks, filters, kernels, homography or other matrices, images, arrays, lists of coordinates or other objects or data structures that enable a placement of the source digital content to emulate the location, motion, pose, luminance, texture, and/or level of blur of the surface, texture, material, plane, object, place, space, location, or area which is associated with the host region and, thus, to appear more immersed in the target digital content, improving viewer experience.
- the camera motion classification module 108 is configured to classify, in an automated fashion, the scenes that compose the target digital content according to their level of camera motion.
- the camera motion classification module 108 is configured to accept as input the frames representing one or more of the scenes of the target digital content that have been identified by the scene recognition module 106 .
- the camera motion classification module 108 is configured to use a machine learning model, trained on samples of content with different types and levels of camera motion in digital content, to predict the type or level of camera motion in a given scene of the target digital content.
- the camera motion classification module 108 can use the classification information to distribute the host region identification operation to two or more host region identification sub-modules that are specifically designed to handle target digital content or scenes corresponding to a particular type or level of camera motion (e.g., scenes without camera motion are distributed to a host region identification sub-module tailored for that level of motion, while scenes with maximal camera motion are delivered to a sub-module tailored for that level of motion, and so on).
- the distribution module 114 is configured to accept the source digital content as an input and, from thereon, to coordinate the communication of some or all of various modules in the content integration system.
- the distribution module 114 is configured to accept as input the target digital content and coordinate the communication between the scene recognition module 106 , the camera motion classification module 108 , the host region identification module 110 , the storage module 116 , and/or the content integration module 120 .
- the distribution module 114 is configured to relay the target digital content to the scene recognition module 106 .
- the distribution module 114 can coordinate the storage of the resulting host region defining data, host region objects (e.g., one or more data structures or objects specific to the host region), the target digital content, and/or metadata attached to the target digital content (including, for example, the duration of the target digital content, pixel value histogram, mean or average pixel values, audio transcription and/or text, optical character recognition-derived text, creator/publisher (e.g., name, audience size, history of source digital content placements, past target digital content subject matter, and preferred advertisers), display channel, platform, or device (e.g., name, audience size, display size), current or predicted number of views (or other indications of popularity), subject matter, setting, and/or the objects, people, textures, materials, shapes, locations, and activities that it depicts) until such time that source digital content to be placed upon the host region in the target digital content is selected.
- Such storage may be local to the distribution module 114 and, in the case where the content integration module 120 is co-located with or belongs to a digital media or video hosting website or social network, may be in the source code for digital content and/or web pages that the distribution module 114 delivers to users who request the target digital content and/or web pages.
- the storage module 116 is configured to store the host region defining data, transformation objects, host region object, the target digital content, and/or metadata about the target digital content until such time that source digital content—to be placed upon the host region in the target digital content—is selected.
- the storage module 116 is configured to receive the host region defining data, transformation objects, host region object, the target digital content, and/or metadata about the target digital content from the distribution module 114 and/or the host region identification module 110 and store it until such time that a request for the target digital content to be viewed is made.
- the storage module 116 is configured to transmit, to a content integration module 120 , a message that includes the host region defining data, transformation objects, host region object, the target digital content, and/or metadata about the target digital content.
- the source digital content selection module 118 is configured to select the source digital content or receive the selection of the source digital content to be placed upon the host region in the target digital content.
- the source digital content selection module 118 is configured to select or enable the selection of source digital content to place upon the host region in the target digital content relying on methods including but not limited to receiving a selection message from a user, selection via buying, ordering, or bidding in a marketplace, and computerized or programmatic selection based on the host region defining data, host region object, the target digital content, and/or metadata about the target digital content.
- the source digital content selection module 118 is configured to deliver, to either the distribution module 114 or the content integration module 120, a message containing the source digital content that it has selected, data about the source digital content, or some other indication of the source digital content selection that has been made.
- the content integration module 120 is configured to integrate a source digital content into a host region in the target digital content.
- the content integration module 120 is configured to accept as input the target digital content, the source digital content, host region-defining data that defines the dimension and location of the host region in the target digital content, and/or transformation objects that define the transformations that can take place for the source digital content to seamlessly integrate with the host region in the target digital content. Subsequently, the content integration module 120 is configured to integrate the source digital content into the target digital content.
- the content integration module 120 is configured to integrate the source digital content into the target digital content using one or more assorted methods. For example, the content integration module 120 can be configured to create a new version of the target digital content.
- the content integration module 120 can be configured to overlay the source digital content over the target digital content during the display of the target digital content to a viewer, doing so by relying on the guidance provided by the host region-defining data that defines the dimension and location of host region in the target digital content, and/or transformation objects that define the transformations for the target digital content to seamlessly integrate with the host region in the target digital content.
- the scene recognition module 106 , the camera motion classification module 108 , the host region identification module 110 , the host region approval module 112 , the distribution module 114 , the storage module 116 , the source digital content selection module 118 , and/or the content integration module 120 can be implemented in software.
- the software can run on a processor 102 capable of executing computer instructions or computer code.
- the scene recognition module 106 , the camera motion classification module 108 , the host region identification module 110 , the host region approval module 112 , the distribution module 114 , the storage module 116 , the source digital content selection module 118 , and/or the content integration module 120 can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them.
- the implementation can be as a computer program product, e.g., a computer program tangibly embodied in a machine-readable storage device, for execution by, or to control the operation of, a data processing apparatus, e.g., a programmable processor, a computer, and/or multiple computers.
- a computer program can be written in any form of computer or programming language, including source code, compiled code, interpreted code and/or machine code, and the computer program can be deployed in any form, including as a stand-alone program or as a subroutine, element, or other unit suitable for use in a computing environment.
- a computer program can be deployed to be executed on one computer or on multiple computers at one or more sites.
- two or more modules 106 - 120 can be implemented on the same integrated circuit, such as ASIC, PLA, DSP, or FPGA, thereby forming a system on chip.
- Subroutines can refer to portions of the computer program and/or the processor/special circuitry that implement one or more functions.
- the interface 122 is configured to provide communication between the content integration system 100 and other computing devices in a communications network.
- the interface 122 can be implemented in hardware to send and receive signals in a variety of mediums, such as optical, copper, and/or wireless interfaces, and in a number of different protocols, some of which may be non-transient.
- the content integration system 100 can be operatively coupled to external equipment or to a communications network in order to receive instructions and/or data from the equipment or network and/or to transfer instructions and/or data to the equipment or network.
- Computer-readable storage devices suitable for embodying computer program instructions and data include all forms of volatile and non-volatile memory, including by way of example semiconductor memory devices, e.g., DRAM, SRAM, EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and optical disks, e.g., CD, DVD, HD-DVD, and Blu-ray disks.
- the content integration system 100 can include user equipment.
- the user equipment can communicate with one or more radio access networks and with wired communication networks.
- the user equipment can be a cellular phone.
- the user equipment can also be a smart phone providing services such as word processing, web browsing, gaming, e-book capabilities, an operating system, and a full keyboard.
- the user equipment can also be a tablet computer providing network access and most of the services provided by a smart phone.
- the user equipment operates using an operating system such as Symbian OS, iPhone OS, RIM's Blackberry, Windows Mobile, Linux, HP WebOS, and Android.
- the screen might be a touch screen that is used to input data to the mobile device, in which case the screen can be used instead of the full keyboard.
- the user equipment can also keep global positioning coordinates, profile information, or other location information.
- the content integration system 100 can include a server.
- the server can operate using operating system (OS) software.
- the OS software is based on a Linux software kernel and runs specific applications in the server such as monitoring tasks and providing protocol stacks.
- the OS software allows server resources to be allocated separately for control and data paths. For example, certain packet accelerator cards and packet services cards are dedicated to performing routing or security control functions, while other packet accelerator cards/packet services cards are dedicated to processing user session traffic. As network requirements change, hardware resources can be dynamically deployed to meet the requirements in some embodiments.
- FIG. 2 illustrates an operation of a content integration system 100 in accordance with some embodiments.
- the content integration system 100 can receive the target digital content and optionally use the scene recognition module 106 to recognize one or more scenes in the target digital content.
- the content integration system 100 can optionally use a camera motion classification module 108 to classify the scene(s) detected in step 202 into one of a plurality of scene categories.
- the content integration system 100 can use a host region identification module 110 to identify a host region within the target digital content.
- the host region identification module 110 can use a variety of features to identify a host region, including, for example, the texture information.
- the host region identification module 110 can include a plurality of host region identification sub-modules, each of which is dedicated to detecting a host region within a particular type of scene.
- the host region identification module 110 can be configured to track the host region, detected in one of the frames, across multiple frames. For example, when the host region identification module 110 detects a first host region in a first frame, the host region identification module 110 can be configured to track the first host region across frames to detect a second host region in a second frame. Also, optionally, the host region identification module 110 can be configured to estimate a pose (e.g., a surface normal) of the detected host region. In step 210 , the host region identification module 110 can handle the occlusion, luminance, texture, and/or the blur within the target digital content.
- the content integration system 100 can receive a source digital content, and, optionally, select a portion of the received source digital content for integration into the target digital content.
- the content integration system 100 can use a content integration module 120 to integrate the source digital content (or a portion thereof) into the target digital content.
- the content integration module 120 can overlay or place the source digital content onto the detected host region of the target digital content.
- FIGS. 3A-3U illustrate exemplary source digital content, target digital content, and integrated digital content integrated using the content integration system in accordance with some embodiments.
- FIG. 3A illustrates a first frame of target digital content including a video.
- FIG. 3B illustrates a second frame of the target digital content, this one occurring sometime after the first frame illustrated in FIG. 3A in the sequence of frames comprising the video.
- FIG. 3C illustrates two host regions, demarcated by rectilinear bounding boxes, as identified in the first frame.
- FIG. 3D illustrates two host regions, defined and demarcated by rectilinear bounding boxes, as identified in the second frame.
- FIGS. 3E-3G illustrate single frames of source digital content including Portable Network Graphics (PNG) raster graphics files, with 3 E and 3 F depicting advertisement images and 3 G depicting a non-advertising image.
- FIGS. 3H-3J illustrate the placement of each instance of the source digital content upon one of the two host regions in the first frame, after the source digital content has been transformed so as to reflect the motion, positioning, pose, occlusion, luminance, texture, and blur of the host regions as they existed in the target digital content.
- FIGS. 3K-3M illustrate the placement of each instance of the source digital content upon one of the two host regions in the second frame, after the source digital content has been transformed so as to reflect the occlusion, luminance, texture, and blur of the host regions as they existed in the target digital content.
- FIG. 3N illustrates a single frame of target digital content including a video in which a graphical marker has been placed.
- FIG. 3O illustrates a host region, defined and demarcated by a rectilinear bounding box, that has been identified in the first frame by detecting the marker.
- FIG. 3P illustrates source digital content including Portable Network Graphics (PNG) raster graphics files depicting an advertisement image.
- FIG. 3Q illustrates the placement of the source digital content upon the host region in the first frame, after the source digital content has been transformed so as to reflect the occlusion, luminance, texture, and blur of the host region as it existed in the target digital content.
- FIG. 3R illustrates a first frame of target digital content including three-dimensional virtual reality content.
- FIG. 3S illustrates two host regions, defined and demarcated by 3D bounding boxes, that have been identified in that first frame using non-marker-based methods.
- FIG. 3T illustrates a single frame of source digital content including a 3D illustration of a product.
- FIG. 3U illustrates the placement of the source digital content upon one of the two host regions in the target digital content, after the source digital content has been transformed so as to reflect the occlusion, luminance, texture, and blur of the host regions as they existed in the target digital content.
- the present disclosed subject matter is exemplified in or may be practiced by any digital content intended for visual display, including but not limited to digital photographs, illustrations, videos (whether streaming or file and whether two dimensional, 360 degrees, or spherical), animations, video games, graphics displays, augmented reality, mixed reality, and virtual reality experiences (the “target digital content”).
- the original digital content will exist as a file, data, or some other discrete or continuous and streaming entity (“file”), itself composed of one or more frames (e.g., still images, to be displayed in succession) or other states to be displayed at given points in time (collectively, “frames”).
- Each of its frames can further be composed of individual pixels, dots, image points, or other smallest addressable elements (collectively, “pixels”).
- the content integration system can be configured to extract certain features or attributes.
- These features or attributes can include, but are not limited to, the following examples: (A) pixel values including but not limited to color, brightness, luminance, hue, radiance, lightness, colorfulness, chroma, intensity, saturation, or depth, as well as localized histograms or other aggregations of the same (collectively, "pixel values"); (B) values derived from said pixel values, including but not limited to approximations of the magnitude of the gradients of the image intensity function ("gradient") as extracted through the convolution of the image using a kernel, including but not limited to the Sobel operator, as described in Sobel & Feldman, Isotropic 3×3 Image Gradient Operator, SAIL (1968), herein incorporated by reference in the entirety, or the Prewitt
- Patent application 2009238460 herein incorporated by reference in the entirety, GLOH, including but not limited to the method described in Mikolajczyk & Schmid, A Performance Evaluation of Local Descriptors , TPAMI (2005), herein incorporated by reference in the entirety, HOG, including but not limited to the method described in Dalal & Triggs, Histograms of Oriented Gradients for Human Detection , CVPR (2005), herein incorporated by reference in the entirety, or ORB, including but not limited to the method described in Rublee et al., Orb: An Efficient Alternative to SIFT or SURF , ICCV (2011), herein incorporated by reference in the entirety; (F) edge features derived through: (i) Canny Edge Detection as described in Canny, A Computational Approach To Edge Detection , TPAMI (1986), herein incorporated by reference in the entirety; (ii) Deriche edge detection as described in Deriche, Using Canny's Criteria
- contour feature data extracted by methods including but not limited to the method described in Deguchi, Multi - scale Curvatures for Contour Feature Extraction , ICPR (1988);
- line feature data extracted by methods including but not limited to the method described in Heijden, Edge and Line Feature Extraction Based on Covariance Models, IEEE Trans. Pattern Anal. Mach. Intell. (1995); (O) any combinations of these or other available features.
- the target digital content can be created at a target digital content source, and can be found in the procedure, function, thread, process, application, memory, cache, disk or other storage, database, computer, device, or network on which it was created, recorded, edited, handled, or stored.
- the target digital content or its components are transferred, over the internet or any other network, from the target digital content source to the content integration system.
- the content integration system can maintain the target digital content in a memory device 104 .
- the content integration system can receive the target digital content over the interface from the target digital content source, and store the target digital content in the memory device 104 .
- the content integration system 100 can maintain the target digital content in a distribution module 114 dedicated to hosting or serving digital content and including but not limited to digital content distribution websites or applications or social media networks.
- the content integration system 100 is configured to receive or retrieve the target digital content and to use the scene recognition module 106 to parse the target digital content into scenes, shots, or cuts (“scenes”), where each scene represents a series of interrelated and/or consecutive frames, a continuous action in time, or a contiguous physical space.
- the scene recognition module 106 can include open source software such as PySceneDetect.
- the scene recognition module 106 is co-located with the target digital content source. This allows for scene recognition to occur up-front, such that later steps in the content integration process can be distributed or parallelized according to scene.
- the scene recognition module 106 is not co-located with the source of the target digital content, but is the first point of contact for this source, meaning that the target digital content is transmitted directly from the target digital content source to the scene recognition module 106 , without intermediary modules. This allows for scene recognition to occur relatively up-front, such that later steps in the content integration process can be distributed or parallelized according to the scene, yet allows for scene recognition to occur on specialized resources that are impractical to contain on target digital content sources.
- the distribution module 114 (which is configured to accept the source digital content as an input and, from thereon, coordinate the communication of some or all of the various modules in the content integration system) controls the messages and transmission of data between the two.
- the host region identification module 110 can be configured to identify and group those scenes that, while not sequential in the target digital content, nonetheless represent the same physical space and same camera positioning ("duplicate space and camera position recognition"). This allows these scenes to be treated the same during the host region identification process—e.g., host regions identified in one scene can be assumed to be present in the others, barring their occlusion by objects which, inside the scene depicted by the target digital content, pass between the camera or viewer perspective and the surface, texture, material, plane, object, place, space, location, or area which is associated with the host region.
- This allows the host region identification module 110 to efficiently identify host regions across multiple frames by assuming that host regions which appear in one scene appear in other scenes with duplicate space and camera position recognition.
- the host region defining data for a host region identified in one scene should carry over to the other scenes with duplicate space and camera position recognition.
- the host region identification module 110 is configured to perform the duplicate space and camera position recognition by finding the relative distance between pixels in frames corresponding to the scenes.
- FIG. 4 illustrates the duplicate space and camera position recognition as performed by the host region identification module 110 in accordance with some embodiments.
- the host region identification module 110 is configured to load the pixel values representing the first frame of a first scene onto a first frame buffer or a first memory region.
- the host region identification module 110 is configured to load the pixel values representing the second frame of a second scene onto a second frame buffer or a second memory region.
- the host region identification module 110 is configured to perform memory operations to subtract, find the Euclidean distance, or otherwise find the distance between the values of the first and second frames' pixels.
- In step 408, when multiple frames are grouped together for the first scene and the second scene, the host region identification module 110 is configured to repeat steps 402-406 to compute the distance between corresponding frames from the first scene and the second scene, and maintain an average distance between the corresponding frames from the first scene and the second scene.
- In step 410, when the (average) distance between the corresponding frames from the first scene and the second scene is less than a predetermined threshold, the host region identification module 110 is configured to determine that the first scene and the second scene correspond to the same space and camera position.
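- As an illustration of steps 402-410, a small NumPy sketch is given below; the distance metric (mean per-pixel Euclidean distance) and the threshold value are assumptions.

```python
import numpy as np

def same_space_and_camera(scene_a_frames, scene_b_frames, threshold=10.0):
    """Compare corresponding frames of two scenes by average pixel distance."""
    distances = []
    for frame_a, frame_b in zip(scene_a_frames, scene_b_frames):
        diff = frame_a.astype(np.float32) - frame_b.astype(np.float32)
        # Per-pixel Euclidean distance over the color channels, averaged.
        distances.append(np.mean(np.linalg.norm(diff, axis=-1)))
    # The scenes are deemed to share space and camera position when the
    # average distance across corresponding frame pairs is small enough.
    return float(np.mean(distances)) < threshold
```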
- the host region identification module 110 is configured to perform the duplicate space and camera position recognition by performing background modeling (“background modeling”) to determine which pixels include the background of each frame of the target digital content. Once the background pixels are identified in one or more frames of identified scenes of the target digital content, the values of those pixels may be compared in order to predict whether or not the scenes reflect the same physical space and the same camera positioning.
- Such background modeling can be achieved by: (A) basic background modeling, where a model of the background is derived by taking: (i) the average, across the frames of the digital content, of all pixel values at each location, as described in Lee & Hedley, Background Estimation for Video Surveillance , IVCNZ (2002), herein incorporated by reference in the entirety; (ii) the median, across the frames of the digital content, of all pixel values at each location, as described in McFarlane & Schofield, Segmentation and Tracking of Piglets in Images , BMVA (1995), herein incorporated by reference in the entirety; or the (iii) finding the mode of the histogram of the pixel value series over time, including but not limited to the method described in Zheng et al., Extracting Roadway Background Image: A Mode Based Approach , Journal of Transportation Research Report, 1944 (2006), herein incorporated by reference in the entirety; (B) statistical background modeling, where pixels are classified as foreground or background based on statistical variables including: (i
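- A minimal sketch of the basic background modeling options listed above (per-pixel mean or median across the frames of a scene); the choice of NumPy and the function shape are assumptions.

```python
import numpy as np

def background_model(frames, method="median"):
    """Collapse a stack of frames into one per-pixel background estimate."""
    stack = np.stack([f.astype(np.float32) for f in frames], axis=0)
    if method == "mean":
        return stack.mean(axis=0)       # (i) average across frames
    return np.median(stack, axis=0)     # (ii) median across frames
```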
- FIG. 5 illustrates the estimation of a depth and a normal of a surface by the scene recognition module 106 using a neural network in accordance with some embodiments.
- a neural network in the scene recognition module 106 is trained on a depth map dataset.
- the depth map dataset includes a set of RGB images and their corresponding depth maps.
- the neural network is trained using a loss function that compares the predicted log depth map D to the ground-truth log depth map D*.
- the loss function can be defined as follows:
- L_depth(D, D*) = (1/n) Σ_i d_i² − (1/(2n²)) (Σ_i d_i)² + (1/n) Σ_i [ (∇_x d_i)² + (∇_y d_i)² ], where d_i = D_i − D*_i is the per-pixel difference between the predicted and ground-truth log depths, n is the number of pixels, and ∇_x d_i, ∇_y d_i are the horizontal and vertical differences of d at pixel i.
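- A NumPy sketch of this loss as reconstructed above; treating every pixel as valid and using simple forward differences for the gradient terms are assumptions.

```python
import numpy as np

def depth_loss(pred_log_depth, gt_log_depth):
    """Scale-invariant log-depth loss with first-order gradient matching."""
    d = pred_log_depth - gt_log_depth
    n = d.size
    dx = np.diff(d, axis=1)             # horizontal differences of the error
    dy = np.diff(d, axis=0)             # vertical differences of the error
    return (np.sum(d ** 2) / n
            - (np.sum(d) ** 2) / (2 * n ** 2)
            + (np.sum(dx ** 2) + np.sum(dy ** 2)) / n)
```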
- a training image is processed through the neural network by:
- the trained neural network model in the scene recognition module 106 is configured to predict the depth of target digital content by resizing, if necessary, an input frame from one scene of the target digital content, and inputting it into the trained neural network model to obtain the depth map and normals.
- FIG. 6 illustrates the use, in accordance with FIG. 5 , of a neural network model to predict the depth of pixels of one or more frames in accordance with some embodiments.
- the neural network model in the scene recognition module 106 is used to predict the depth of pixels of one or more frames from different scenes of target digital content in order to model and compare the background of those scenes for the purpose of determining if they represent duplicate space and camera position recognition. For example:
- FIG. 7 illustrates the identification of texture using a neural network in accordance with some embodiments.
- Step 702 illustrates the training of a neural network on a dataset.
- the texture dataset includes a set of images labelled by the objects they depict.
- the neural network is trained by using, as an overall training architecture, stochastic gradient descent with Softmax as a loss function, a batch size of 128, a dropout rate of 0.5, a momentum of 0.9, and a base learning rate of 10^-3. For example, for each image in the training set, the training of the neural network proceeds as follows:
- Each image is input into the first input layer of the neural net, such as an 11×11 convolutional layer with a ReLU activation function, a learning rate of 0.001, a stride of 4, and a 2×2 pooling filter with max pooling, where the number of channels in the output is 48 channels.
- the scene recognition module 106 is configured to input the output of the previous layer into the second (hidden) layer of the neural net, such as a 5×5 convolutional layer with a ReLU activation function, a learning rate of 0.001, and a 2×2 pooling filter layer with max pooling, where the number of channels in the output is 128.
- The output of the previous layer is input into the third (hidden) layer of the neural net, such as a 3×3 convolutional layer with a ReLU activation function and a learning rate of 0.001, where the number of channels in the output can be 192.
- The output of the previous layer is input into the fourth (hidden) layer of the neural net, such as a 3×3 convolutional layer with a ReLU activation function and a learning rate of 0.001, where the number of channels in the output can be 192.
- The output of the previous layer is input into the fifth (hidden) layer of the neural net, such as a 3-pixel×3-pixel convolutional layer with a ReLU activation function, a learning rate of 0.001, and a 2×2 pooling filter with max pooling.
- the number of channels in the output can be 128 channels.
- The output of the previous layer is input into a (first) fully connected layer with a ReLU activation function and a learning rate of 0.001, where the number of channels in the output can be 2048 channels.
- The output of that layer is input into a (second) fully connected layer, such as a fully connected layer with a ReLU activation function and a learning rate of 0.001, where the number of channels in the output can be 2048 channels.
- The output of the final fully connected layer is input into an output layer, where the number of channels in the output can be 1000 channels.
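- For concreteness, a hedged PyTorch sketch of an architecture along the lines described above is shown below; the input resolution, the padding choices, and the use of LazyLinear for the first fully connected layer are assumptions, and the layer-wise learning rates mentioned above are collapsed into a single optimizer setting.

```python
import torch
import torch.nn as nn

class TextureNet(nn.Module):
    """AlexNet-style network matching the layer sizes sketched above."""
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11, stride=4), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(48, 128, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 192, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(192, 192, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(192, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.LazyLinear(2048), nn.ReLU(),
            nn.Dropout(0.5), nn.Linear(2048, 2048), nn.ReLU(),
            nn.Linear(2048, num_classes),
        )

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))

# Training setup described above: SGD, momentum 0.9, base learning rate 1e-3,
# batch size 128, softmax cross-entropy loss, dropout 0.5.
model = TextureNet()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()
```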
- Step 702 illustrates the re-training of the resulting model on images labelled by the materials (e.g., textures) they depict.
- the re-training includes using, as an overall training architecture, stochastic gradient descent with Softmax as a loss function, a batch size of 128, a dropout rate of 0.5, a momentum of 0.9, and a base learning rate of 10^-3.
- Each of the input images from the dataset (which are likely to be untraditional sizes given that they capture textures) is resized to 3 different scales: 1/√2, 1, and √2.
- the model is retrained by, for each of the three versions:
- Step 704 illustrates prediction of the texture of pixels in target digital content by:
- f_i = [ p_i^x/(θ_p·d), p_i^y/(θ_p·d), I_i^L/θ_L, I_i^a/θ_ab, I_i^b/θ_ab ], where p_i^x and p_i^y are the coordinates of pixel i, d is a normalizing image dimension (e.g., the image diagonal), I_i^L, I_i^a, and I_i^b are the pixel's values in the Lab color space, and θ_p, θ_L, and θ_ab are normalization constants.
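- A small NumPy sketch of how this per-pixel feature vector might be assembled for a frame already converted to Lab; the θ values used here are placeholders, not values from the disclosure.

```python
import numpy as np

def crf_pairwise_features(lab_image, theta_p=0.1, theta_L=10.0, theta_ab=5.0):
    """Build f_i = [p_x/(θ_p·d), p_y/(θ_p·d), L/θ_L, a/θ_ab, b/θ_ab] per pixel."""
    h, w, _ = lab_image.shape
    d = float(np.hypot(h, w))                     # normalizing image diagonal
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    L, a, b = (lab_image[:, :, c].astype(np.float32) for c in range(3))
    return np.stack([xs / (theta_p * d), ys / (theta_p * d),
                     L / theta_L, a / theta_ab, b / theta_ab], axis=-1)
```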
- FIG. 8 illustrates the use, in accordance with FIG. 7 , of a neural network model to predict the texture of pixels of one or more frames from different scenes of target digital content in accordance with some embodiments.
- the neural network model is configured to model and compare the background of those scenes for the purpose of determining if they represent duplicate space and camera position recognition.
- Step 802 illustrates the inputting, after identifying the separate scenes in the target digital content, of one or more frames from an identified scene into the texture prediction neural network and CRF described in FIG. 7 .
- Step 804 illustrates the inputting of the output of Step 802 through a linear layer that is responsible for transforming the multiple labels to binary labels that reflect a confidence score for each pixel in the frame, that score being based on whether or not the region is likely a quality host region.
- Step 806 illustrates the resizing, if necessary, of one or more frames from another identified scene and then input them into the texture prediction neural network.
- Step 808 illustrates the loading of the positively labelled pixels from each frame onto a frame buffer or memory area.
- Step 810 illustrates the performing of memory operations to subtract, find the Euclidean distance, or otherwise find the distance between the values of the frames' positively labelled pixels.
- Step 812 illustrates the deeming of the two scenes, where the difference is sufficiently close to zero, to represent duplicate space and camera position recognition. In some embodiments, duplicate space and camera position recognition is repeated for all pairs of scenes.
- FIG. 9 illustrates the reduction of the time complexity of duplicate space and camera position recognition through the use of a hash function in accordance with some embodiments.
- Step 902 illustrates the creation of a hash function that converts the background pixels from each scene into strings ("background hashes").
- Step 904 illustrates the comparison of the background hashes of scenes and, where they are sufficiently similar, deeming the two scenes to represent duplicate space and camera position recognition.
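- One possible realization of steps 902-904 in Python: the background pixels are averaged over a coarse grid and quantized before hashing, so that two scenes with near-identical backgrounds produce identical strings. The grid size, the quantization step, and the use of MD5 are assumptions; a perceptual hash could be substituted if approximate rather than exact matching of the hashes is desired.

```python
import hashlib
import numpy as np

def background_hash(frame, background_mask, grid=(16, 16), quant=8):
    """Reduce a scene's background pixels to a short, comparable string."""
    masked = np.where(background_mask[..., None], frame, 0).astype(np.float32)
    h, w = masked.shape[:2]
    gh, gw = grid
    # Average over a coarse grid so small pixel-level differences vanish.
    coarse = (masked[:h - h % gh, :w - w % gw]
              .reshape(gh, h // gh, gw, w // gw, -1)
              .mean(axis=(1, 3)))
    return hashlib.md5((coarse // quant).astype(np.uint8).tobytes()).hexdigest()
```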
- the data generated during background modeling for the purpose of duplicate space and camera position recognition may be stored and accessed again during host region identification where host region identification relies on searching the target digital content or its components for those parts of the target digital content or its components which represent the background of a scene and, thus, may constitute a host region.
- a preliminary step in host region identification includes grouping scenes within the target digital content that, while neither sequential and nor representing the same camera positioning, nonetheless depict the same contiguous physical space (“duplicate space recognition”). This grouping can assist with the parallelization of the host region identification process. This can also assist 3D reconstruction-based methods of host region identification by expanding the amount of data about a given physical space.
- FIG. 10 illustrates duplicate space recognition as performed by the scene recognition module 106 in accordance with some embodiments.
- the scene recognition module 106 is configured to find the average or normalized average values at each pixel in a scene, whether in each frame or at specific pixel locations across all frames, or to create a histogram of the pixel values or the normalized pixel values across the scenes.
- the scene recognition module 106 is configured to create a histogram of pixel values for each scene and to cluster those histograms using a clustering algorithm such as k-means.
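- A hedged sketch of this histogram-and-clustering variant using OpenCV and scikit-learn; the bin count and the number of clusters are assumptions.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def group_scenes_by_space(scene_frames, n_groups=4):
    """One normalized color histogram per scene, clustered with k-means."""
    histograms = []
    for frames in scene_frames:                        # one list of frames per scene
        hist = np.concatenate([
            cv2.calcHist(frames, [c], None, [32], [0, 256]).ravel()
            for c in range(3)                          # one histogram per channel
        ])
        histograms.append(hist / hist.sum())
    # Scenes that land in the same cluster are grouped as depicting the
    # same contiguous physical space.
    return KMeans(n_clusters=n_groups, n_init=10).fit_predict(np.array(histograms))
```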
- the content integration system can use a camera motion classification module 108 to classify the target digital content's scenes or other components according to the presence and/or degree of camera motion (“camera motion classification”). This can improve the distribution or parallelization of host region identification and/or other processes.
- camera motion classification is performed using a dedicated and specialized procedure, function, thread, process, application, memory, cache, disk or other storage, database, computer, device, or network sitting in the communication network (“camera motion classification module”).
- the camera motion classification module 108 is co-located with both the scene recognition module 106 and the target digital content source. This allows for both scene recognition and camera motion classification to occur up-front, such that later steps in the content integration process can be distributed or parallelized according to the type of camera motion in each scene.
- the camera motion classification module 108 is co-located with the scene recognition module 106 , but not the target digital content source, and the co-located camera motion classification module 108 and scene recognition module 106 are the first point of contact for the target digital content source, meaning that the target digital content is transmitted directly from the target digital content source to them, without intermediary modules.
- This allows for scene recognition and camera motion classification to occur relatively up-front, such that later steps in the content integration process can be distributed or parallelized accordingly, yet allows for scene recognition and camera motion classification to occur on specialized processes or equipment (e.g., GPUs) that are impractical to contain on the target digital content source.
- FIG. 11 illustrates a system in which the camera motion classification module 108 is co-located with the scene recognition module 106 in accordance with some embodiments.
- the camera motion classification module 108 is not co-located with either the target digital content source or the scene recognition module, but is the first point of contact in the network for the scene recognition module. This allows for camera motion classification to occur relatively up-front, such that later steps in the content integration process can be distributed or parallelized accordingly, yet on specialized processes or equipment that are impractical to contain on either the target digital content source or the scene recognition module.
- the distribution module 114 controls the messages and transmission of data between the modules.
- camera motion classification is achieved using a machine learning classifier that has been trained on examples of target digital content or its components that have potentially been labelled according to their degree of camera motion.
- FIG. 12 illustrates the use of a machine learning classifier to achieve camera motion classification in accordance with some embodiments.
- Step 1202 illustrates taking a sampling of pairs of successive frames from the target digital content.
- Step 1204 illustrates the division of each of these frames into multiple, equally-sized sections.
- Step 1206 illustrates the calculation, for corresponding sections within these pairs, of the intensity of the optical flow of any features that appear in both frames and sections.
- Step 1208 illustrates the storage of these intensities in equal-sized histograms, one per section, where each bar represents a certain intensity level or range and the height represents the number of features falling into that level or range.
- Step 1210 illustrates inputting these histograms as features into a binary or multiclass SVM classifier that has been trained on sets of histograms from sample pairs of frames that have been marked as exhibiting one of two or more categories or types of camera motion (no motion, minor motion, major motion, etc.).
- Step 1212 illustrates the determination, once all pairs of frames have been classified, based on a threshold, of which type of camera motion the scene falls into.
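- A sketch of the per-pair feature extraction in steps 1202-1208 is given below, using dense Farneback optical flow as a stand-in for the per-feature flow described above; the grid size, bin count, and magnitude range are assumptions. The resulting vectors would then be fed to an SVM classifier (steps 1210-1212), e.g., sklearn.svm.SVC, trained on labelled sample pairs.

```python
import cv2
import numpy as np

def flow_histograms(prev_gray, next_gray, grid=(4, 4), bins=8, max_mag=20.0):
    """Per-section histograms of optical-flow magnitude for one frame pair."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=-1)
    h, w = mag.shape
    feats = []
    for gy in range(grid[0]):
        for gx in range(grid[1]):
            section = mag[gy * h // grid[0]:(gy + 1) * h // grid[0],
                          gx * w // grid[1]:(gx + 1) * w // grid[1]]
            hist, _ = np.histogram(section, bins=bins, range=(0.0, max_mag))
            feats.append(hist)
    return np.concatenate(feats).astype(np.float32)
```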
- the target digital content or its components are delivered to one or more procedures, functions, processes, threads, applications, memories, caches, disks or other storage, databases, computers, devices, or networks sitting in the network (e.g., host region identification module 110 ) dedicated to host region identification.
- the target digital content or its components, after camera motion classification, are delivered to one or more host region identification sub-modules, each of which is specialized toward a type of camera motion. Since the type of camera motion can affect the length of the host region identification process, this architecture allows for parallelizing and load-balancing by dedicating comparatively more host region identification modules or more computation resources to scenes whose type of camera motion requires more processing.
- the host region identification module 110 is co-located with the target digital content source, the scene recognition module 106, and the camera motion classification module 108. This allows for host region identification, scene recognition, and camera motion classification to occur up-front, such that later steps in the content integration process can be distributed or parallelized according to host region.
- the host region identification module(s) 110 is not co-located with the target digital content source, but is co-located with the scene recognition 106 and camera motion classification module 108 . This allows for scene recognition, camera motion classification, and host region identification, to occur relatively up-front, such that later steps in the content integration process can be distributed or parallelized accordingly, yet on specialized processes or equipment that are impractical to contain on target digital content sources (e.g., GPUs) but would be advantageous for host region identification as well as scene recognition and/or camera motion classification.
- the host region identification module(s) 110 are not co-located with the scene recognition modules, camera motion classification module, or target digital content source, but are the first point of contact for the camera motion classification module. This allows for scene recognition to occur relatively up-front, such that later steps in the content integration process can be distributed or parallelized accordingly, yet on specialized resources or in specialized modules that are impractical to contain on target digital content sources, scene recognition modules, or camera motion classification modules and, further, may be specialized according to type of camera motion.
- the distribution module 114 may control the messages and transmission of data between them.
- host region identification relies on computerized identification using so-called “markers”, e.g., graphical identifiers placed inside the real or illustrated scene portrayed by the target digital content (“marker-based computerized host region identification”), computerized identification using non-marker-based techniques (“non-marker-based computerized host region identification”), manual identification by users (“manual host region identification”), or some combination of these methods.
- an identified host region may be excluded from further consideration because it fails to satisfy some size or duration threshold or is otherwise deemed unable to favorably host source digital content.
- the host region identification module 110 is configured to search the target digital content or its components for representations of pre-selected markers. This is achieved by comparing the available features present in the target digital content or its components to a template or model representing corresponding features in the marker, as described in, for example, Köhler et al., Detection and Identification Techniques for Markers Used in Computer Vision , VLUDS (2010), herein incorporated by reference in the entirety.
- the target digital content's creator has placed the representation of the marker inside the target digital content in order to identify the surface, texture, material, plane, object, place, space, location, or area which is inside the scene portrayed by the target digital content and which is associated with the marker as a host region.
- the resulting host region is sold as an advertising space, with the price of that space based on the physical dimensions of the marker that was placed in the scene depicted by the target digital content (e.g. the dimensions of a billboard-sized marker placed in a sports stadium, regardless of how it appears in digital recordings of games from the stadium).
- the resulting host region is sold as an advertising space, with the price of that space based on the size of the resulting placement, in the target digital content, of source digital content upon the host region (e.g. the dimensions of the placement, inside digital recordings of games, of source digital content on the billboard-sized marker placed in a sports stadium).
- the marker is a special graphic that offers ample potential correspondences between the template or model and a representation of a marker placed in the scene depicted by the target digital content, including but not limited to an ArUco marker, described in S. Garrido-Jurado et al., Automatic generation and detection of highly reliable fiducial markers under occlusion , Pattern Recognition, 47 (6), 2280-92 (June 2014), which is herein incorporated by reference in the entirety.
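- For marker-based host region identification with ArUco markers, a minimal OpenCV sketch is shown below; it assumes the opencv-contrib ArUco module and a particular predefined dictionary, neither of which is specified by the disclosure.

```python
import cv2

def detect_marker_host_regions(frame_bgr):
    """Return (marker id, corner quadrilateral) pairs found in one frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    corners, ids, _rejected = cv2.aruco.detectMarkers(gray, aruco_dict)
    # Each detected marker's corners delimit a candidate host region; the same
    # correspondences can later feed pose estimation for the placement.
    return [] if ids is None else list(zip(ids.ravel().tolist(), corners))
```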
- the marker is a graphic or text with aesthetic, informational, or advertising value even when it is not serving as a marker and when source digital content is not being placed upon it.
- the marker can be one version of a company's logo.
- it has advertising value even when not used as a marker and when source digital content is not placed upon it.
- it also has value as a marker, enabling the placement of source digital content (such as other versions of the company's logo) upon it.
- the marker is a graphic, part of which possesses aesthetic, informational, or advertising value, even when not used as a marker and when source digital content is not placed upon it, and part of which includes, in its design, a special graphic that offers ample potential correspondences between the template or model and a representation of a marker placed in the scene depicted by the target digital content, including but not limited to an ArUco marker.
- FIGS. 13A-13B illustrate an embodiment in which the marker is a graphic in accordance with some embodiments.
- FIG. 13A illustrates a marker which is one version of a company's logo with an ArUco marker placed inside it. Thus, it has advertising value even when not used as a marker and when source digital content is not placed upon it. Yet it also has value as a marker, enabling the (more reliable) placement of source digital content (such as other versions of the company's logo) upon it, as depicted in FIG. 13B .
- a competitor's logo or graphic is used as a marker. In this way, a company or brand is able to identify and replace competitor logos and graphics with their own inside the target digital content.
- the host region identification module 110 can be configured to identify the host regions by searching the available attributes of the target digital content or its components for patterns or qualities that suggest the existence of a host region.
- Where host regions are identified by searching the available attributes of the target digital content or its components for patterns or qualities that suggest the existence of a host region, a single frame in a scene (for example, the first frame, a randomly chosen frame, or a frame with the median values, compared to all the frames, of some available attribute) can be selected as the starting point or, in the case of a scene without camera motion, the sole focal point for the search.
- Where host regions are identified by searching the available attributes of the target digital content or its components for patterns or qualities that suggest the existence of a host region, the present or most recent frame in a scene can be selected as the starting point or, in the case of a scene without camera motion, the sole focal point for the search.
- each individual pixel in one or more frames of the target digital content may be assigned a score or probability based on the likelihood that it represents a host region.
- FIG. 14 illustrates a procedure for finding the maximally sized rectangles in each frame of the target digital content in accordance with some embodiments.
- Individual pixels in one or more frames of the target digital content are assigned a score or probability based on the likelihood that they represent a host region, and the rectangles include pixels having a score greater than a predetermined threshold. This helps ensure, when using this approach to host region identification, that the size of the host regions being identified are as large as possible.
- Step 1402 illustrates starting at the first row of the frame, going in the vertical direction and for each element (e.g., pixel), counting the number of elements in the horizontal direction that satisfy the threshold and inserting this number into a histogram for the row.
- Step 1404 illustrates finding, once the histogram for the row is complete, the minimum value in the row, storing it for future use and then subdividing the row into any strings of non-zero histogram bars (sub-histograms), putting each histogram into an array representing the heights of the bars.
- the x coordinate can be the starting index (split point) of the sub-histogram plus the current index (i) of the max_rectangle;
- the y coordinate can be the current row of the sub-histogram we are using;
- the w coordinate can be the width of the max_rectangle;
- the h coordinate can be the height of the max_rectangle.
- Step 1410 illustrates advancing ahead a number of rows that is equal to the minimum height of the row (which was stored during an earlier step).
- Step 1412 illustrates sorting the list of rectangles by size.
- Step 1414 illustrates removing any rectangles that are below any minimum dimension thresholds for host regions.
- Step 1416 illustrates eliminating overlapping rectangles by checking, for each rectangle in the list, if any of the corner coordinates lies between any of the corner coordinates of another rectangle and, if so, removing the smaller of the two rectangles from consideration.
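- By way of illustration, the following is a minimal Python/NumPy sketch of this rectangle search, using the classic largest-rectangle-in-histogram technique as a stand-in for the stepped procedure of FIG. 14; the function names, thresholds, and minimum dimensions are illustrative, and the overlap-removal step (step 1416) is omitted for brevity.

```python
import numpy as np

def largest_rectangle_in_histogram(heights):
    """Return (area, left, height, width) of the largest rectangle under a histogram."""
    stack, best = [], (0, 0, 0, 0)
    for i, h in enumerate(list(heights) + [0]):   # trailing 0 flushes the stack
        start = i
        while stack and stack[-1][1] >= h:
            start, stacked_h = stack.pop()
            area = stacked_h * (i - start)
            if area > best[0]:
                best = (area, start, stacked_h, i - start)
        stack.append((start, h))
    return best

def maximal_host_rectangles(score_map, threshold=0.5, min_w=20, min_h=20):
    """Find large axis-aligned rectangles whose pixels all score above `threshold`."""
    mask = (score_map >= threshold).astype(np.int32)
    col_heights = np.zeros(mask.shape[1], dtype=np.int32)
    rects = []                                    # entries are (x, y, w, h)
    for y, row in enumerate(mask):
        col_heights = np.where(row > 0, col_heights + 1, 0)   # running per-column heights
        area, x, h, w = largest_rectangle_in_histogram(col_heights)
        if w >= min_w and h >= min_h:
            rects.append((x, y - h + 1, w, h))
        # overlap removal (step 1416) is not shown in this sketch
    rects.sort(key=lambda r: r[2] * r[3], reverse=True)       # sort by size (step 1412)
    return rects
```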
- host regions are identified by using background modeling to determine which regions of the target digital content or its components represent the background, rather than the foreground, of the scene depicted by the target digital content and, thus, may host source digital content (since a scene's background is usually of less interest to viewers).
- the quality of the resulting host regions may additionally be scored based on factors such as position in the frame, luminance value, color value, level of visual salience, etc.
- the determination that a region represents the background, rather than the foreground of the real or illustrated scene depicted by the target digital content, and is therefore a host region is based on using texture recognition to analyze whether the target digital content or its components possess a texture or material commonly associated with the background of images (including but not limited to brick, sky, or forest textures).
- where the determination that a region represents the background, rather than the foreground, of the real or illustrated scene depicted by the target digital content, and is therefore a host region, is based on using texture recognition to analyze whether the target digital content or its components possess a texture or material commonly associated with the background of images (including but not limited to brick, sky, or forest textures), that determination may be stored and re-used for other steps in the algorithm (such as the removal of host regions whose texture or material represents poor host regions).
- the results of texture recognition for the purpose of determining whether or not a region represents the background of an image and, thus, a host region may also be used to make a determination about whether or not a region represents a host region because it possesses a texture that is generally favorable or unfavorable for host regions (regardless of whether that region is background or foreground).
- the determination that a region represents the background, rather than the foreground of the real or illustrated scene depicted by the target digital content, and is therefore a host region is based on the comparison of the available attributes of the target digital content or its components and a template capturing the available attributes of materials that are commonly associated with the background of images.
- the determination that a region represents the background, rather than the foreground of the real or illustrated scene depicted by the target digital content, and is therefore a host region is based on the classification of the target digital content or its components as possessing a texture commonly associated with the background of images (including but not limited to brick, wood paneling, trees, sea, or sky textures), as determined by inputting the original digital content or its components into a neural network that has been trained on examples of original digital content or its components that have been labelled according to the texture they depict.
- the determination that a region represents the background, rather than the foreground of the real or illustrated scene depicted by the target digital content, and is therefore a host region is based on the fact that object detection has been used to search the target digital content or its components for representations of objects, textures, materials, shapes, places, spaces, or areas that commonly represent background objects (e.g., clouds, trees).
- the neural-network classification of the target digital content or its components according to texture for the purpose of determining that a region represents the background, rather than the foreground of the real or illustrated scene depicted by the target digital content, and is therefore a host region may run concurrently with the classification of target digital content or its components as representing materials or textures that are favorable or unfavorable as host regions (regardless of whether they are background or foreground).
- the classification of the target digital content may be stored and re-used for other steps in the algorithm (such as the removal of host regions whose texture or material represents poor host regions).
- the determination that a region represents the background, rather than the foreground of the real or illustrated scene depicted by the target digital content, and is therefore a host region is based on background modeling using depth information contained in the target digital content (e.g., a format such as RGB-D where each pixel in a frame is associated with a depth value, possibly via a two-dimensional “depth map” matrix associated with each frame).
- the determination that a region represents a host region is based on using texture recognition to identify the region as possessing a texture that is commonly favorable for a host region, including, for example, wall, wood paneling, or sky textures.
- the determination that a region represents a host region because it possesses a texture that is commonly favorable for a host region (including, for example, wall, wood paneling, or sky textures) relies on using a neural network to classify the textures of various regions.
- the quality of the resulting host regions may additionally be scored based on factors such as position in the frame, luminance value, color value, level of visual salience, etc.
- the determination that a region can be used as a host region can be performed using a neural network model.
- the neural network model can recognize texture to identify a region as possessing a texture that can accommodate a source digital content.
- the neural network model can assign to each pixel in a frame one or more pairs of probabilities and texture labels, with each reflecting the predicted probability of that pixel possessing the particular texture and the label, and then pass these probabilities and texture label pairs through a linear layer that is responsible for transforming them into binary labels that reflect whether the pixel is part of a host region or not.
- this linear layer can assign a positive binary label to any pixel whose highest probability belongs to a texture label that is listed as positive because it represents favorable host regions, where positive texture labels might include: ‘brick’, ‘carpet’, ‘metal’, ‘paper’, ‘plastic’, ‘polished stone’, ‘stone’, ‘tile’, ‘wallpaper’, ‘wood’, ‘painted’, and ‘sky.’
- this linear layer can assign a negative binary label to any pixel whose highest probability belongs to a texture label that is listed as negative because it represents unfavorable host regions, where negative labels might include ‘hair’, ‘skin’, ‘food’, ‘foliage’, ‘fabric’, ‘leather’, ‘glass’, ‘water’, and ‘mirror’.
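- As a non-limiting illustration, the linear layer described above can be approximated by a simple lookup that maps each pixel's most probable texture label to a binary host-region label; the array shapes below are assumptions of the sketch, and the label lists simply restate the examples given above.

```python
import numpy as np

# Illustrative positive (host-friendly) and negative texture label lists.
POSITIVE_TEXTURES = {'brick', 'carpet', 'metal', 'paper', 'plastic', 'polished stone',
                     'stone', 'tile', 'wallpaper', 'wood', 'painted', 'sky'}
NEGATIVE_TEXTURES = {'hair', 'skin', 'food', 'foliage', 'fabric', 'leather',
                     'glass', 'water', 'mirror'}

def texture_to_binary_labels(prob_maps, label_names):
    """prob_maps: (H, W, C) per-pixel texture probabilities; label_names: list of C names.
    Returns an (H, W) boolean map that is True where the most probable texture
    is on the positive (host-friendly) list."""
    winners = np.argmax(prob_maps, axis=-1)       # index of the highest-probability label per pixel
    positive_ids = [i for i, name in enumerate(label_names) if name in POSITIVE_TEXTURES]
    return np.isin(winners, positive_ids)
```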
- host regions are identified by searching the target digital content or its components for contiguous regions whose available attributes possess a level of uniformity across the region (e.g., white walls, patches of blue sky), a quality that often indicates the region can host source digital content in an aesthetically appealing and/or unobtrusive manner.
- the quality of the resulting host regions may additionally be scored based on factors such as position in the frame, luminance value, color value, level of visual salience, etc.
- host regions are identified by searching the target digital content or its components for regions whose available attributes possess a level of uniformity across the region because there is an absence of edges or texture.
- FIG. 15 illustrates the identification of host regions based on their absence of edges or texture as performed by the host region identification module 110 in accordance with some embodiments.
- the host region identification module 110 is configured to load the pixel values representing a frame of a scene onto a frame buffer or memory area.
- the host region identification module 110 is configured to convert the pixel values into grayscale and to deposit the resulting frame onto a frame buffer or memory area.
- the host region identification module 110 is configured to convolve the frame from the scene with a Prewitt, Sobel, combined Prewitt and Sobel, or other kernel in the horizontal (Gx) and vertical (Gy) directions and to deposit the resulting frames onto frame buffers or memory areas.
- the host region identification module 110 is configured to select, for each corresponding element in the frames representing the results of convolutions in the horizontal (Gx) and vertical (Gy) directions, the maximum value between the two and to deposit it in a frame (a map of the maximal gradients) that has been loaded onto a frame buffer or memory area.
- the host region identification module 110 is configured to normalize this frame from 0-1.
- the host region identification module 110 is configured to optionally map, onto this normalized map of maximal gradients, 0's in place of any pixels, bounding boxes, or other areas that have been determined to represent objects, people, textures, materials, shapes, locations, and activities (such as skin, hair, fur, water, etc.) which are deemed to constitute unwanted host regions, that determination being made by:
- in step 1514, the host region identification module 110 is configured to find, in the resulting map, the rectangles that represent likely host regions by setting a threshold (for a satisfactory host region score) somewhere between 0 and 1, and then relying on the algorithm in FIG. 14.
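- A minimal sketch of this gradient-based search, assuming OpenCV with Sobel kernels, is shown below; turning the normalized gradient map into a "flatness" score and the particular threshold are illustrative choices, and the rectangle search reuses the FIG. 14 sketch shown earlier.

```python
import cv2
import numpy as np

def flatness_score_map(frame_bgr, unwanted_mask=None):
    """Return a 0-1 map where high values mean "few edges / little texture"."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)           # horizontal gradients (Gx)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)           # vertical gradients (Gy)
    grad = np.maximum(np.abs(gx), np.abs(gy))                 # map of the maximal gradients
    if grad.max() > 0:
        grad = grad / grad.max()                              # normalize to 0-1
    score = 1.0 - grad                                        # flat regions score high
    if unwanted_mask is not None:                             # e.g. skin, hair, water regions
        score[unwanted_mask > 0] = 0.0
    return score

# score = flatness_score_map(frame)
# rects = maximal_host_rectangles(score, threshold=0.9)       # FIG. 14 sketch shown earlier
```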
- host regions are identified by searching the target digital content or its components for contiguous regions whose available attributes suggest a common texture across those regions (e.g., brick walls, patches of cloudy sky), a quality that indicates that the region might host source digital content in an aesthetically appealing and/or unobtrusive manner.
- the quality of the resulting host regions may additionally be scored based on factors such as position in the frame, luminance value, color value, level of visual salience, etc.
- host regions are identified by searching the target digital content or its components for contiguous regions whose available attributes suggest a common texture across those regions (e.g., brick walls, patches of cloudy sky), by using a filter, including but not limited to a Gabor filter, as described in Fogel & Sagi, Gabor filters as Texture Discriminator, Biological Cybernetics 61 (1989), herein incorporated by reference in the entirety, to compare textures of different parts of the frame.
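- The following sketch illustrates one way such a Gabor-filter comparison might be performed with OpenCV; the filter-bank parameters and the patch-comparison tolerance are assumptions of the sketch, not prescribed values.

```python
import cv2
import numpy as np

def gabor_signature(gray, ksize=21):
    """Stack of Gabor responses over four orientations; a crude per-pixel texture descriptor."""
    responses = []
    for theta in np.arange(0, np.pi, np.pi / 4):
        kern = cv2.getGaborKernel((ksize, ksize), 4.0, theta, 10.0, 0.5, 0, ktype=cv2.CV_32F)
        responses.append(cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kern))
    return np.stack(responses, axis=-1)                       # shape (H, W, 4)

def patches_share_texture(signature, patch_a, patch_b, tol=5.0):
    """Compare the mean Gabor responses of two (y, x, h, w) patches of the frame."""
    ya, xa, ha, wa = patch_a
    yb, xb, hb, wb = patch_b
    mean_a = signature[ya:ya + ha, xa:xa + wa].mean(axis=(0, 1))
    mean_b = signature[yb:yb + hb, xb:xb + wb].mean(axis=(0, 1))
    return bool(np.linalg.norm(mean_a - mean_b) < tol)
```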
- host regions are identified by using object recognition to search the target digital content or its components for representations of objects, textures, materials, shapes, places, spaces, or areas that can host source digital content, such as billboards, guitar bodies, stadium jumbotrons, brick texture, sky texture, quadrilaterals, etc.
- the quality of the resulting host regions may additionally be scored based on factors such as position in the frame, luminance value, color value, level of visual salience, etc.
- where host regions are identified by object recognition, the data about the particular objects, textures, or shapes identified (or not identified) is captured in a host region object as metadata about the host region (“host region defining data”).
- host regions are identified by comparing the available attributes of the target digital content or its components to the available attributes of pre-constructed object-, texture-, material-, shape-, place-, space-, or area-specific templates, considering changes in template position, and, where there is sufficient similarity, making a determination that the objects, textures, materials, shapes, places, spaces, or areas are or are not present in the target digital content or its components.
- where host regions are identified by comparing the available attributes of the target digital content or its components to the available attributes of object-, texture-, material-, shape-, place-, space-, or area-specific templates, or by using machine learning models to predict labels in order to determine whether or not the objects, textures, materials, shapes, places, spaces, or areas are represented in the target digital content or its components, metadata about the template (e.g., the template name or the name of the object, texture, material, shape, place, space, or area that it represents), the labels and/or probabilities predicted by the machine learning models, and/or data about the determination that those objects, textures, materials, shapes, places, spaces, or areas are or are not represented in the target digital content or its components is captured in a host region object as host region defining data.
- host regions are identified by inputting available attributes of the target digital content or its components into a classifier that has been trained on examples of target digital content or its components labelled by objects, textures, materials, shapes, places, spaces, or areas, and will classify the input as one or more objects, textures, materials, shapes, places, spaces, or areas.
- FIG. 16 illustrates the identification of host regions by inputting frames of the target digital content through a neural network model as described in FIG. 7 in accordance with some embodiments.
- Step 1602 illustrates the inputting of one or more frames from an identified scene into the texture prediction neural network and CRF as described in FIG. 7 .
- Step 1604 illustrates the passing, by host region identification module 110, of this resulting output through a linear layer that is responsible for transforming the multiple labels to binary labels that reflect a confidence score for each pixel in the frame, that score being based on whether or not the region is likely a quality host region. This is done by creating a list of positive and negative texture labels.
- Step 1606 illustrates the use of the process described in FIG. 14 to locate the maximal rectangular host regions inside the resulting map.
- where host region identification relies on classifiers that have been trained on examples of target digital content or its components labelled by the objects, textures, materials, shapes, places, spaces, or areas they represent, and that will classify the input as one or more objects, textures, materials, shapes, places, spaces, or areas, the resulting classification is captured as host region defining data.
- host regions are identified by accepting a message from a user wherein that message contains an indication that a particular object, texture, material, shape, place, space, or area is or is not represented in the target digital content or its components.
- host region locations in the target digital content are predicted using a machine learning classifier trained on the available attributes or other data from examples of host regions labelled as positive or negative examples, the target digital content or its components, and/or metadata about the target digital content.
- This machine learning classifier may take the form of, among other things, a: (A) linear classifier; (B) Fisher's linear discriminant; (C) logistic regression; (D) naive Bayes classifiers; (E) perceptron; (F) support vector machines; (G) least squares support vector machines; (H) quadratic classifiers; (I) kernel estimation; (J) k-nearest neighbors; (K) decision trees; (L) random forests; (M) conditional random fields; (N) neural networks including but not limited to: (i) convolutional neural networks, as described in Krizhevsky et al., ImageNet Classification with Deep Convolutional Neural Networks , NIPS (2012), herein incorporated by reference in the entirety, including the unique implementation where the neurons in the network are grouped in different layers, where each layer analyses windows of a frame, determining an output score for each pixel, where the highest score pixels are the ones in windows that match a region of that frame that is able to host, in an
- predictive models may be continually refined by retraining the models on the new examples of positive or negative host regions that are produced as users make manual selections, approvals, or customizations of predicted host regions.
- the quality of the resulting host regions may additionally be scored based on factors such as position in the frame, luminance value, color value, level of visual salience, etc.
- a convolutional neural network model is trained on frames from examples of target digital content whose labels are positive and/or negative examples of host region defining data (e.g., the coordinates of the corners of the bounding box of a host region in that particular frame, or a list of the pixels it includes). When new frames are inputted into the model, it will predict the host region defining data that defines the host region(s) in the inputted frames.
- the pixel scores are additionally weighted based on other available attributes that speak to the favorability of a host region (such as location in frame(s), luminance, color value, etc.).
- host regions are identified by identifying lines and then parallelograms in the frame using methods including but not limited to the method described in Tam, Shen, Liu & Tang, Quadrilateral Signboard Detection and Text Extraction , CIS ST (2003), herein incorporated by reference in the entirety.
- the frames of the target digital content are parsed into a grid. For each section of the grid, the values of the pixels are subtracted from subsequent, similar sections across the frames of the target digital content. Where the difference is zero or sufficiently close to it based on some predetermined threshold, that part of the frame is determined to be a non-active or non-dynamic one across the target digital content and, therefore, a host region.
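- A minimal sketch of this grid-differencing approach, with an assumed cell size and change tolerance, might look as follows.

```python
import numpy as np

def static_grid_cells(frames, cell=32, tol=2.0):
    """Mark grid cells whose pixel values barely change across the frames.
    frames: sequence of equally sized grayscale numpy arrays."""
    h, w = frames[0].shape
    static = np.ones((h // cell, w // cell), dtype=bool)
    for prev, curr in zip(frames, frames[1:]):
        diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
        for gy in range(h // cell):
            for gx in range(w // cell):
                patch = diff[gy * cell:(gy + 1) * cell, gx * cell:(gx + 1) * cell]
                if patch.mean() > tol:            # the cell changed, so it is not static
                    static[gy, gx] = False
    return static                                 # True cells are candidate host regions
```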
- the frames of the target digital content are searched for segments that lack gradient change or texture; this suggests that it is likely to be a flat or uniform surface and, thus, a host region.
- this can be achieved by calculating the gradients in each direction at each pixel and then passing sliding windows of various sizes over the derivatives that have been calculated (in x and y), calculating the covariance matrix of the gradient direction vectors within each window, calculating the sum of the values along each diagonal of that matrix (or the eigenvalues of that matrix) and, where both sums (or both eigenvalues) of a region are sufficiently close to 0, assuming that the region lacks edges and, thus, is a host region.
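- For illustration, this gradient-covariance test for a single window can be sketched as follows; the window size and the near-zero eigenvalue threshold are assumptions.

```python
import cv2
import numpy as np

def window_lacks_edges(gray, y, x, win=32, eps=1e-4):
    """Test one window for absence of edges via the covariance of its gradient vectors."""
    g = gray.astype(np.float32) / 255.0
    gx = cv2.Sobel(g, cv2.CV_32F, 1, 0, ksize=3)[y:y + win, x:x + win].ravel()
    gy = cv2.Sobel(g, cv2.CV_32F, 0, 1, ksize=3)[y:y + win, x:x + win].ravel()
    cov = np.cov(np.stack([gx, gy]))              # 2x2 covariance of the gradient direction vectors
    eigenvalues = np.linalg.eigvalsh(cov)
    return bool(np.all(eigenvalues < eps))        # both eigenvalues near zero: no edges in the window
```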
- the host region identification module 110 receives from a user a selection message over the communication network, where the selection message represents the host region defining data or host region objects for one or more host regions selected by a user.
- FIGS. 17A-17D illustrate an embodiment in which a host region is identified through selection in a graphical user interface in accordance with some embodiments.
- FIG. 17A illustrates a frame of target digital content including a video.
- FIG. 17B illustrates a host region, demarcated by rectilinear bounding boxes, as identified in the frame by a user inside a graphical user interface that communicates the selection to host region identification module 110 .
- FIG. 17C illustrates a single frame of source digital content including a Portable Network Graphics (PNG) raster graphics file, depicting an advertisement image.
- FIG. 17D illustrates the integration of the source digital content in FIG. 17C into the host region in the frame of the target digital content, after the source digital content has been transformed to accommodate the occlusion, luminance, texture, and blur of the host region.
- one or more servers, procedures, functions, processes, applications, computers or devices sitting in the communication network receive from a user a selection message over the communication network, where the selection message represents the host region defining data or host region objects for one or more host regions as selected by host region identification module 110 .
- the user selection message takes the form of a series of corners (in the case where the host region is a quadrilateral, polygon, or other shape that can be defined by its corner positions), parameters, an outline, a bounding box, list of pixels, or any other information that can be used to separate the host region from the rest of the content in one or more frames of the content.
- the host region approval module 112 is configured to receive a message over the communication network where the message includes one or more instances of host region defining data, host region objects, and/or the target digital content and its metadata.
- the host region approval module 112 receives, from the distribution module 114 , a message including one or more of host region defining data, host region objects, and/or the target digital content and its metadata.
- the host region approval module 112 receives, from the host region identification module 110 , a message including one or more of host region defining data, host region objects, and/or the target digital content and its metadata.
- a procedure, function, thread, process, application, memory, cache, disk or other storage, database, computer, device, or network sitting in the communication network receives, from the host region approval module 112 , a message over the communication network where the message includes an indication that a host region embodied in the host region defining data or host region objects it has received is approved or selected or where the message includes one or more new, customized instances of host region defining data or host region objects (with, potentially, customized transformation objects), possibly with new metadata added.
- the host region approval module 112 informs the user's approval, selection, or customization of host regions by providing data, guidelines, or feedback about: (1) which, if any, standard host region dimensions the approved, selected, or customized host regions can be associated with; (2) in the advertising use case, prior selling prices of the approved, selected, or customized host regions; (3) summary statistics on the approved, selected, or customized host regions.
- the host region approval module 112 is configured to transmit, to the distribution module 114, a message indicating that a host region is approved or selected, or a message that includes one or more customized instances of host region defining data, or host region objects (with, potentially, customized transformation objects), possibly with new metadata added.
- the host region approval module 112 is configured to transmit, to the storage module 116 , a message indicating that a host region is approved or selected, or a message that includes one or more customized instances of host region defining data, or host region objects (with, potentially, customized transformation objects), possibly with new metadata added.
- the host region approval module 112 is configured to transmit, to the host region identification module 110 , a message indicating that a host region is approved or selected, or a message that includes one or more customized instances of host region defining data, host region objects (with, potentially, customized transformation objects), or lightweight host region objects, possibly with new metadata added.
- the host region approval module 112 is part of a device or user account operated by the same user as that of the device or user account originating the target digital content.
- any time after or during host region identification the available attributes or other qualities of the host region or target digital content are used to create one or more masks, filters, kernels, homography or other matrices, images, arrays, lists of coordinates or other objects or data structures (“transformation objects”), that enable a placement of the source digital content to emulate the location, motion, pose, luminance, texture, and/or level of blur in the surface, texture, material, plane, object, place, space, location, or area which is associated with the host region and, thus, to appear more immersed in the target digital content, improving viewer experience.
- the creation of transformation objects is handled by the host region identification module(s) 110 since host region identification and transformation object creation processes may be interwoven and may share compatible resources (e.g. GPUs).
- a procedure, function, process, application, computer, or device that is sitting in the network and is dedicated to the selection of source digital content to be placed upon the host region (“source digital content selection module”) receives the host region defining data, host region object, the target digital content, and/or metadata about the target digital content, and the transformation object creation process does not commence until it receives a return message from the content integration module 120 containing the source digital content or an indication about whether there is appropriate source digital content to place upon the identified host region or interest in placing source digital content upon the identified host region. This saves the expense of creating transformation objects in case there is no appropriate source digital content or interest.
- host region defining data or host region objects are transmitted to one or more content integration modules 120 and the transformation object creation process waits for an indication from those content integration modules 120 that there is interest from interested parties (e.g. advertisers) in placing source digital content upon the host region before proceeding, those interested parties may express interest or bid on the host region without knowledge of its eventual level of occlusion, luminance, or blur, and may, retrospectively, after the determination of the transformation object, receive a settlement or reimbursement for any surplus in occlusion or blur or lack of luminance in the eventual placement.
- host region defining data or host region objects are transmitted to one or more source digital content selection modules 118 at the same time that the transformation object creation process begins. This aids parallelization by allowing the process of creating the transformation objects to start while source digital content is still being selected.
- the location or the pose of that surface, texture, material, plane, object, place, space, location, or area is tracked across those frames (collectively, “host region tracking”), so that the integration of the source digital content upon the host region can emulate that location and pose and, thus, achieve a more immersed and realistic feel, improving viewer experience.
- when the surface, texture, material, plane, object, place, space, location, or area that is associated with the host region lacks sufficient features to enable host region tracking (“trackable features”)—e.g., it is a blank wall—the host region is temporarily augmented with a “buffer” of additional pixels in order to capture, within the buffer, more features that can be used to track the surface, texture, material, plane, object, place, space, location, or area. After host region tracking is complete, the buffer can be subtracted from the host-region defining data.
- host region tracking is achieved using methods including but not limited to: (A) feature tracking (e.g., video tracking), including but not limited to methods based on (i) optical flow, including but not limited to the method described in Lucas & Kanade, An Iterative Image Registration Technique with an Application to Stereo Vision, IJCAI (1981), herein incorporated by reference in the entirety; (ii) kernel-based optical flow, including but not limited to the methods described in Weinzaepfel et al., DeepFlow: Large Displacement Optical Flow with Deep Matching, ICCV (2013), herein incorporated by reference in the entirety, Farneback, Two-Frame Motion Estimation Based on Polynomial Expansion, SCIA (2003), herein incorporated by reference in the entirety, or Comaniciu et al., Real-Time Tracking of Non-Rigid Objects Using Mean Shift, CVPR (2000), herein incorporated by reference in the entirety; (B) Kalman filters, including but not limited to the method
- host region tracking is achieved using optical flow, along with RANSAC to reduce the effect of outliers, and producing, for each frame in the content, a host region positioning and a homography matrix that describes how the host region in the starting frame may be transformed to approximate the positioning of the host region in each of the other frames in the content.
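- A minimal sketch of this embodiment, assuming OpenCV's Lucas-Kanade optical flow and RANSAC-based homography estimation, is shown below; the feature-detection parameters are illustrative.

```python
import cv2
import numpy as np

def track_host_region(prev_gray, curr_gray, region_corners):
    """Track a host region into the next frame with sparse optical flow plus RANSAC.
    region_corners: (4, 2) float32 array of the region's corners in prev_gray."""
    x, y, w, h = cv2.boundingRect(region_corners.astype(np.int32))
    mask = np.zeros_like(prev_gray)
    mask[y:y + h, x:x + w] = 255                  # only look for features inside the region
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01,
                                  minDistance=7, mask=mask)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    good_prev = pts[status.ravel() == 1].reshape(-1, 2)
    good_next = nxt[status.ravel() == 1].reshape(-1, 2)
    # RANSAC discards outlier tracks while estimating the frame-to-frame homography;
    # at least four good tracks are required (hence the "buffer" of extra pixels above).
    H, _ = cv2.findHomography(good_prev, good_next, cv2.RANSAC, 3.0)
    new_corners = cv2.perspectiveTransform(region_corners.reshape(-1, 1, 2), H)
    return new_corners.reshape(-1, 2), H
```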
- an identified host region may be excluded from further consideration because it fails to satisfy some size or duration threshold or is otherwise deemed unable to favorably host source digital content.
- the egomotion data may be saved as metadata, and, later, used to aid the process of determining the visual salience heat map of the particular target digital content or its components.
- FIG. 18 illustrates a system of neural networks of varying coarseness that is designed to transform a source digital content using the depth map and normals in accordance with some embodiments.
- the outputs of the process in FIG. 5 can be used in conjunction with the original image to fit planes to region and thus, to create transformation objects that emulate the pose of the host region.
- step 1802 the depth map and normals that have been predicted by the neural network are resized to the size of the input image.
- step 1804 the normal predictions (dimension H×W×1) from FIG. 5 are transformed into a heat map image, in which the normals with similar direction share similar colors (dimension H×W×3).
- the super pixels inside the heat map image are identified using a super pixel segmentation algorithm such as optical flow-based, edge detection, meanshift, graph based, blob-based, SLIC, watershed, quick shift, or neural network based superpixel segmentation.
- a graph cut algorithm is used to combine super pixels that share similar colors (normal). This is done by calculating a mean across the region of each super pixel and merging adjacent regions where the mean variation is below a certain threshold.
- step 1810 RANSAC is used on each of the 4 outputted regions (which represent the segmented regions inside the image where all points share a similar normal) to remove any outliers from the predicted normal, outputting the normal that fits the largest majority of the points (and using a threshold of 80%; if 80% of the points fit, the algorithm converges).
- a homography matrix is calculated using the surface normal/depth map to transform the points in the contour to a fronto-parallel pose (e.g., parallel to the camera). This allows the system to approximate the surface as if it was not seen from a perspective (e.g., an object being seen from the top).
- step 1814 the inverse of the homography matrix from the prior step (a transformation that will transform something from the fronto-parallel view to the same orientation as the region selected) is calculated for each region.
- step 1816 the prior output is used to transform or warp the source digital content into the orientation that has been calculated for the region.
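- By way of illustration, the warp of step 1816 can be sketched as follows when the host region is described by four corner points in the frame; the corner ordering is an assumption of the sketch.

```python
import cv2
import numpy as np

def warp_source_into_region(source_img, region_corners, frame_shape):
    """Warp the source digital content so that it takes on the pose of the host region.
    region_corners: (4, 2) array ordered top-left, top-right, bottom-right, bottom-left."""
    h_src, w_src = source_img.shape[:2]
    src_corners = np.float32([[0, 0], [w_src, 0], [w_src, h_src], [0, h_src]])
    # Homography mapping the fronto-parallel source onto the region as seen in the frame
    # (the inverse of the "to fronto-parallel" transformation described above).
    H = cv2.getPerspectiveTransform(src_corners, np.float32(region_corners))
    return cv2.warpPerspective(source_img, H, (frame_shape[1], frame_shape[0]))
```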
- any time after or during host region identification the available attributes or other qualities of the host region or target digital content are used to create a mask, filter, kernel, matrix, image, array, or other object or data structure (“occlusion transformation object”) that enables a placement of the source digital content to allow or appear to allow any representations of objects which, inside the scene depicted by the target digital content, pass between the camera or viewer perspective and the surface, texture, material, plane, object, place, space, location, or area which is associated with the host region to occlude the host region (e.g., block it from view) just as they would in the physical world.
- the host region identification module 110 is configured to keep a count, during the determination of the occlusion transformation object, of the number of pixels in the host region marked occluded in each frame in which the host region appears. This facilitates the pairing of the host region with source digital content (e.g., the selling of the host region for advertising purposes) by detailing how much of the eventual placement will actually be seen by a viewer.
- the occlusion transformation object is a set of images, lists, multidimensional arrays, or matrices, one for each frame of the content that the host region occupies in the target digital content or one for each frame of the content that is meant to host a placement of source digital content, with each element or pixel in each image, list, array, or a matrix being associated with a particular pixel in the host region and containing a binary indicator of whether or not the pixel is occluded.
- the occlusion transformation object is a lightweight data-interchange format, including but not limited to JSON or XML, or lightweight image format that includes a list of images, lists, multidimensional arrays, or matrices, one for each frame of the content that the host region occupies in the target digital content or one for each frame of the content that is meant to host a placement of source digital content, with each element or pixel in each image, list, array, or a matrix being associated with a particular pixel in the host region and containing a binary indicator of whether or not the pixel is occluded.
- the occlusion transformation object is a so-called “foreground mask” for either the host region or the frames it occupies—e.g., a binary image marking, for each frame in which the host region appears, background pixels (here, those belonging to the surface, texture, material, plane, object, place, space, location, or area that is associated with the host region) with one value and foreground pixels (here, those belonging to representations of objects which, inside the scene depicted by the target digital content, pass between the camera or viewer perspective and the surface, texture, material, plane, object, place, space, location, or area associated with the host region) with another value.
- this foreground mask acts as a guide, with its values dictating whether to expose the source digital content's pixels to the viewer (as is the case when a particular pixel in the mask holds the value for non-occlusion) or to expose the target digital content pixels to the viewer (as is the case when a particular pixel in the mask holds the value for occlusion).
- the occlusion transformation object is a so-called “foreground mask” with the same dimensions as the host region. This reduces computation time.
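- As a non-limiting illustration, applying such a foreground mask at placement time amounts to a per-pixel selection between the (already warped) source digital content and the target frame.

```python
import numpy as np

def composite_with_foreground_mask(target_frame, warped_source, foreground_mask):
    """Per-pixel selection: expose the source pixel where the mask marks non-occlusion,
    otherwise keep the target pixel. Frames are (H, W, 3); the mask is boolean (H, W)."""
    mask3 = foreground_mask[..., None]            # broadcast the mask over the color channels
    return np.where(mask3, warped_source, target_frame)
```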
- FIG. 19 illustrates the determination of a foreground mask transformation object using background subtraction as performed by the host region identification module 110 in accordance with some embodiments.
- the host region identification module 110 is configured to select an unoccluded instance of the host region from one frame of the target digital content to use as the “background model”, to crop the frame to that host region, and to fill a frame solely by the binary value representing non-occlusion (e.g., “true”), representing the foreground mask for this model frame.
- the host region identification module 110 is configured to create foreground masks for all other frames.
- the host region identification module 110 is configured to load the pixel values of the background model onto a frame buffer or memory area and to apply a bilateral filter, as described in Carlo Tomasi and Roberto Manduchi, “Bilateral Filtering for Gray and Color Images,” IEEE (1998), herein incorporated by reference in the entirety (“bilateral filter”) to those pixel values in order to smooth out noise while preserving edges.
- the host region identification module 110 is configured to load the pixel values of an instance of the host region from another frame, cropped to the host region area inside that frame, onto another frame buffer or memory area and to apply a bilateral filter to those pixel values.
- the host region identification module 110 is configured to perform memory operations to find the absolute value of the difference between the corresponding pixels values.
- the host region identification module 110 is configured, where the absolute value of the differences is sufficiently close to 0 (e.g., within a preset threshold, possibly varying according to video quality), to add the value for non-occlusion (e.g., “true”) to the foreground mask at the corresponding pixel.
- the host region identification module 110 is configured, where the absolute value of the differences is sufficiently greater than 0 (e.g., outside of a preset threshold, possibly varying according to video quality), to add the other binary value (e.g., “false”) to the foreground mask.
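- A minimal sketch of this procedure, assuming OpenCV's bilateral filter and an illustrative difference threshold, might look as follows.

```python
import cv2
import numpy as np

def foreground_mask(background_model, host_region_crop, diff_thresh=15):
    """Background subtraction for one cropped host region (sketch of FIG. 19).
    Returns True where the host region surface is visible (not occluded)."""
    bg = cv2.bilateralFilter(background_model, 9, 75, 75)     # smooth noise, preserve edges
    cur = cv2.bilateralFilter(host_region_crop, 9, 75, 75)
    diff = cv2.absdiff(bg, cur)
    if diff.ndim == 3:
        diff = diff.max(axis=2)                               # strongest per-channel difference
    return diff < diff_thresh                                 # near zero: unoccluded background
```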
- FIG. 20 illustrates the determination of a foreground mask transformation object using background subtraction and, in parallel, the determination of a luminance mask that enhances the foreground mask as performed by the host region identification module 110 in accordance with some embodiments.
- a foreground mask is created using so-called “background subtraction” supplemented by the determination of a “luminance mask” that enables the removal of shadows and other changes in luminance (which, in the physical world, do not occlude in the same way that objects do) from the foreground mask.
- an instance of the host region from one frame of the target digital content that is both unoccluded and unaffected by changes in luminance (e.g., represents the luminance status quo) is used as the “background model” for both the foreground mask and the luminance mask.
- the frame is converted to Lab, HSV, or any other color space with a luminance-related channel.
- both the foreground mask and the luminance mask are populated solely by the binary value representing non-occlusion and the absence of luminance changes (e.g., “true”).
- Foreground masks and luminance masks for all other frames are created by first applying bilateral filters to each of these versions of the frame representing the background model and luminance model and then:
- the host region identification module 110 is configured to convert, if necessary, each frame of the target digital content in which the host region appears to the RGB color space.
- the host region identification module 110 is configured to crop each frame to just those pixels composing the host region and then to apply a bilateral filter to smooth each cropped host region.
- the host region identification module 110 is configured to select the cropped host region from one frame (e.g., the first frame) as the background model, to load the original pixel values of the cropped host region representing the background model onto a frame buffer or memory area, to load, for every other cropped host region, the pixel values onto another frame buffer or memory area, to perform memory operations to find the absolute value of the difference between the corresponding pixels values of the background model and the other cropped host region, and, where the absolute value of the differences are sufficiently close to 0 (e.g., within a preset threshold, possibly varying according to video quality), to add the value for background (e.g., “true”) to the foreground mask at the corresponding pixel or, where the absolute value of the differences are sufficiently greater than 0 (e.g., outside of a preset threshold, possibly varying according to video quality), to add the other binary value (e.g., “false”) to the foreground mask.
- the host region identification module 110 is configured to load the pixel values of the cropped host region representing the luminance model (often, the same one representing the background model) onto a frame buffer or memory area, to convert the values to the HSV color space, to perform memory operations to find the absolute value of the difference between the corresponding H and S pixel values and to calculate the ratio of corresponding V pixel values and, if the V ratio is sufficiently far from 0 (e.g., beyond a preset threshold, possibly varying according to video quality) and the absolute value of the differences in the H and S values is sufficiently close to 0 (e.g., within a preset threshold, possibly varying according to video quality), to add the value indicating a change in luminance (e.g., “true”) to the luminance mask at the corresponding pixel or, where the thresholds are not met, to add the other binary value (e.g., “false”) to the luminance mask.
- the host region identification module 110 is configured to invert the values of the luminance mask for each cropped host region and to apply an AND operation to merge the inverted luminance mask and the foreground mask for each cropped host region, with the output being copied to a frame or memory buffer.
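- The following sketch illustrates the HSV-based luminance test described above; the hue/saturation tolerance and the value-ratio range are illustrative assumptions.

```python
import cv2
import numpy as np

def luminance_change_mask(luminance_model_bgr, crop_bgr,
                          hs_tol=10.0, v_ratio_range=(0.5, 0.95)):
    """Mark pixels that differ from the luminance model only in brightness (shadows or
    specular light), not in content. Hue wrap-around is ignored for brevity."""
    model = cv2.cvtColor(luminance_model_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    frame = cv2.cvtColor(crop_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    dh = np.abs(model[..., 0] - frame[..., 0])                # hue difference
    ds = np.abs(model[..., 1] - frame[..., 1])                # saturation difference
    v_ratio = frame[..., 2] / np.maximum(model[..., 2], 1.0)  # brightness ratio
    return (dh < hs_tol) & (ds < hs_tol) & \
           (v_ratio > v_ratio_range[0]) & (v_ratio < v_ratio_range[1])
```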
- the luminance mask is created by using a Gabor filter to capture, as described in Fogel & Sagi, Gabor filters as Texture Discriminator , Biological Cybernetics 61 (1989), herein incorporated by reference in the entirety, the texture of the host region from one frame of the target digital content that is both unoccluded and unaffected by changes in luminance (e.g., represents the luminance status quo) and is used as the “background model” for both the foreground mask and the luminance mask.
- a “luminance mask” can then be created by comparing the texture information from this background model to other frames, marking the pixels with similar texture as shadows or areas of luminance change (rather than as occluding objects, which would not have the same texture). After this is done for a particular frame, its values can be inverted and then an AND operation can be applied with the luminance mask and the foreground mask for each frame, with the output being copied to a frame or memory buffer.
- FIG. 21 illustrates the determination of a foreground mask transformation object using depth information as performed by the host region identification module 110 in accordance with some embodiments.
- the depth information can be either intrinsic to the target digital content (e.g., an RGB-D video) or calculated based on the available attributes of the target digital content or its components.
- the host region identification module 110 is configured to select an unoccluded instance of the host region from a frame of the target digital content to use as the background model, to crop that frame to the host region, and to load the depth pixel values of the background model onto a frame buffer or memory area.
- the host region identification module 110 is configured to load the depth pixel values of another cropped instance of the host region or another frame onto another frame buffer or memory area.
- the host region identification module 110 is configured to perform memory operations to subtract corresponding depth pixel values and, where the difference is equal to or less than 0, suggesting the pixel represents an object that is at a depth equal to or greater than the background model, to add the value for non-occlusion (e.g., “true”) to the foreground mask at the corresponding pixel.
- step 2108 the host region identification module 110 is configured, where the difference is greater than 0, to add the other binary value (e.g., “false”) to the foreground mask.
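- A minimal sketch of this depth comparison is shown below; it assumes per-pixel depth maps aligned to the cropped host region.

```python
import numpy as np

def foreground_mask_from_depth(background_depth, frame_depth, eps=0.0):
    """Sketch of FIG. 21: a pixel occludes the host region only when it lies closer to
    the camera (smaller depth) than the background model at that pixel."""
    diff = frame_depth.astype(np.float32) - background_depth.astype(np.float32)
    return diff >= -eps                           # True: at or behind the background, not occluding
```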
- FIG. 22 illustrates the improvement of a foreground mask or combined foreground and luminance mask transformation object by removing noise and outliers as performed by the host region identification module 110 in accordance with some embodiments.
- the host region identification module 110 is configured to set a threshold difference level.
- the host region identification module 110 is configured, for each point in the mask, to check if the difference between the background model and the new input image at that point is greater than the threshold, and, if so, to move onto the next point and, if not, to run the so-called “flood fill” algorithm starting at that point.
- the selection of the background model for the creation of a foreground mask or combined foreground and luminance mask involves selecting the version of the identified host region in the first frame of a given scene of the target digital content.
- the selection of the background model for the creation of a foreground mask or combined foreground and luminance mask involves selecting the version of the identified host region in the current or most recently generated frame.
- any time after or during host region identification the available attributes or other qualities of the host region or target digital content are used to create a mask, filter, kernel, matrix, image, array, or other object or data structure (“luminance transformation object”) that enables the placement of source digital content on a host region to reflect the luminance changes (e.g., from shadows or specular light) that affect the surface, texture, material, plane, object, place, space, location, or area which is depicted in the target digital content and which is associated with the host region.
- the luminance transformation object is a list of lists, multi-dimensional arrays, or matrices, each capturing values that (through multiplication, addition, or any other operator) are able to transform the pixels of the source digital content or its placement such that they reflect the luminance values possessed by the host region as it exists in the target digital content.
- the luminance transformation object is created by using a version of the host region (one that is devoid of luminance changes) from a particular frame as a “luminance model” and comparing that luminance model's host region luminance-related pixel values (e.g., the L channel in Lab) to those of all the other frames the host region occupies. Differences in luminance-related pixel values are captured as a luminance transformation object, each of whose elements represents a transformation value that, when applied to the source digital content, dictates how and how much to adjust (by addition, multiplication, or any other operation) its pixel values in order to match the luminance changes in the host region as it exists in the target digital content.
- FIG. 23 illustrates the determination of a luminance transformation object by the host region identification module 110 in accordance with some embodiments.
- the host region identification module 110 is configured, for all frames the host region occupies in the target digital content, to load the pixel values for the host region onto a frame buffer or memory area.
- the host region identification module 110 is configured, for all the frames the host region occupies in the target digital content, to convert those values to Lab.
- the host region identification module 110 is configured, for all frames the host region occupies in the target digital content, to isolate the L channel.
- the host region identification module 110 is configured, for all frames the host region occupies in the target digital content, to perform memory operations to subtract 255 from each pixel's L value.
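- By way of illustration, the model-comparison embodiment described earlier (capturing per-frame differences in the L channel relative to a luminance model) can be sketched as follows; using OpenCV's Lab conversion and a simple per-pixel additive offset is one illustrative choice.

```python
import cv2
import numpy as np

def luminance_transformation(luminance_model_bgr, host_region_crops_bgr):
    """Per-frame L-channel offsets relative to a luminance model. Each returned array
    can later be added to the source digital content's L channel so that its placement
    picks up the shadows and highlights seen in the host region."""
    model_L = cv2.cvtColor(luminance_model_bgr, cv2.COLOR_BGR2LAB)[..., 0].astype(np.float32)
    offsets = []
    for crop in host_region_crops_bgr:
        frame_L = cv2.cvtColor(crop, cv2.COLOR_BGR2LAB)[..., 0].astype(np.float32)
        offsets.append(frame_L - model_L)         # how much brighter or darker each pixel became
    return offsets
```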
- the source digital content possesses a different shape or geometry than the host region (as is the case, for example, when placing source digital content representing a 3D object such as a barber pole on a host region that represents a 2D surface such as a wall), a 3D reconstruction of the geometry of (i) the scene depicted by the original content; (ii) the light sources in that space; and (iii) the 3D object represented by the source digital content is used to predict how luminance changes will affect the placement of the source digital content. This gives the eventual placement of the source digital content a more immersed and realistic feel, improving viewer experience.
- the source digital content depicts a texture or material that is different than that of the host region (as is the case, e.g., when placing source digital content representing metallic lettering on a host region associated, in the scene depicted by the target digital content, with a wood wall), 3D reconstruction of the geometry of the scene depicted by the original content, data about the light sources (e.g., lamps or the sun) in that 3D reconstruction, and models that predict how certain textures or materials respond to luminance changes are used to model and/or modify the behavior of source digital content or its placement as it is subject to luminance changes that affected the host region. This gives the placement of the source digital content an immersive, realistic feel and improves the experience of the viewer.
- the luminance transformation object is a lightweight data-interchange format, including but not limited to JSON or XML.
- any time after or during host region identification the available attributes or other qualities of the host region or target digital content are used to create a mask, filter, kernel, matrix, image, array, or other object or data structure (“texture transformation object”) that enables a placement of source digital content on the host region to reflect the original texture(s) of the surface, texture, material, plane, object, place, space, location, or area in the target digital content which is associated with the host region.
- the texture transformation object is a list of lists, multi-dimensional arrays, or matrices, each capturing values that (through multiplication, addition, or any other operator) are able to transform the pixels of the source digital content or its placement such that they reflect the textures possessed by the host region as it exists in the target digital content.
- the texture transformation object is a lightweight data-interchange format, including but not limited to JSON or XML.
- any time after or during host region identification the available attributes or other qualities of the host region or the target digital content are used to create a mask, filter, kernel, matrix, image, array, or other object or data structure (“blur transformation object”) that enables any placement of source digital content on the host region to reflect the original level of blur of the surface, texture, material, plane, object, place, space, location, or area in the target digital content which is associated with the host region.
- the blur transformation object is a list of lists, multi-dimensional arrays, or matrices, each capturing values that (through multiplication, addition, or any other operator) are able to transform the pixels of the source digital content or its placement such that they reflect the level of blur possessed by the host region as it exists in the target digital content.
- the blur transformation object is created by using depth information gained during, for example, background modeling to predict the level of blur transformation that is necessary for the placement of source digital content to reflect the original level of blur of the surface, texture, material, plane, object, place, space, location, or area in the scene that is associated with the host region in the target digital content.
- the blur transformation object is created by using blur detection algorithms, such as OpenCV's blur detection tool, to detect and then replicate the original level of blur of the surface, texture, material, plane, object, place, space, location, or area in the scene that is associated with the host region in the target digital content.
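- As a non-limiting illustration, a commonly used OpenCV-based blur estimate is the variance of the Laplacian; the following sketch uses it to measure a region's level of blur and to blur the source digital content until it roughly matches, with an illustrative kernel-size step.

```python
import cv2

def blur_level(gray_region):
    """Estimate sharpness as the variance of the Laplacian; low values suggest blur."""
    return cv2.Laplacian(gray_region, cv2.CV_64F).var()

def match_blur(source_bgr, target_blur_level, step=2, max_ksize=31):
    """Blur the source until its measured sharpness roughly matches the host region's."""
    out = source_bgr.copy()
    k = 1
    gray = cv2.cvtColor(out, cv2.COLOR_BGR2GRAY)
    while blur_level(gray) > target_blur_level and k < max_ksize:
        k += step                                 # kernel size must stay odd
        out = cv2.GaussianBlur(source_bgr, (k, k), 0)
        gray = cv2.cvtColor(out, cv2.COLOR_BGR2GRAY)
    return out
```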
- the tracking of the host region leads to an understanding of the rate of motion of the surface, texture, material, plane, object, place, space, location, or area that is associated with the host region across the frames such that the eventual placement can be blurred accordingly in order to recreate the blur caused by the motion of pixels across time in the original content.
- the blur transformation object is a lightweight data-interchange format, including but not limited to JSON or XML.
- the transformation objects dedicated to different transformations are merged into one or more masks, filters, kernels, matrices, images, arrays, or other objects or data structures (“merged transformation object”).
- the merged transformation object is a set of matrices, images, or arrays, one for each frame of the content, whose values that transform (e.g. through multiplication, subtraction, or addition) the source digital content so that it reflects the occlusion, texture, luminance, and blur of the host region.
- the merged transformation object is a set of matrices, images, or arrays, one for each frame of the content, with each element including an 8-bit string, the first of which indicates whether or not the associated pixel is occluded or not (e.g., it acts as a foreground mask), with the remaining bits being used to contain one or more integers that indicate the transformation value that non-occluded pixels must be multiplied by, added to or subtracted from in order to reflect the occlusion, texture, luminance, and blur of the host region.
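- A small illustration of the 8-bit-per-element layout described above, assuming the high bit carries the occlusion flag and the remaining seven bits carry a small integer transformation value; the exact bit assignment and the value semantics are assumptions.

```python
# Illustrative packing/unpacking for an 8-bit-per-element merged transformation
# object: bit 7 flags occlusion, bits 0-6 hold a small integer value.
import numpy as np

def pack(occluded, value):
    """occluded: HxW bool mask; value: HxW uint8 with values in 0..127."""
    return (occluded.astype(np.uint8) << 7) | (value & 0x7F)

def unpack(merged):
    occluded = (merged >> 7).astype(bool)
    value = merged & 0x7F
    return occluded, value

# A non-occluded pixel would then be transformed, e.g. scaled by value / 64.0,
# while an occluded pixel keeps the target digital content's original pixel.
```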
- the merged transformation object is a lightweight data-interchange format, including but not limited to JSON or XML.
- the output of host region identification is data defining the location and/or duration of the host region in one or more frames of the target digital content, as well as each frame's necessary transformation objects (“host region defining data”).
- the host region defining data includes a list of triples, quadruples, quintuples, or septuples including a frame number, a list of the coordinates—e.g., a series of (x, y) coordinate pairs or (x, y, z) coordinate triples—that bound the host region in that frame, and the homography matrices or separate or merged transformation objects that can be applied to transform the source digital content in that frame.
- the host region defining data includes a list of triples, quadruples, quintuples, or septuples including a frame number, a list of the coordinates of the pixels that compose the host region in that frame, and the homography matrices or separate or merged transformation objects that can be applied to transform the source digital content in that frame.
- the host region defining data includes a list of triple, quadruple, quintuple, or septuple including a frame number, an array, image, or other data structure capturing the shape and location of the host region in that frame, and the homography matrices or separate or merged transformation objects that can be applied to transform the source digital content in that frame.
- the host region defining data includes a starting frame and, for each frame of the host region's duration, a tuple, triple, quadruple, or quintuple including list of the coordinates—e.g., a series of (x, y) coordinate pairs or (x, y, z) coordinate triples—that bound the host region in that frame, and the homography matrices or separate or merged transformation objects that can be applied to transform the source digital content in that frame.
- the host region defining data includes a starting frame and, for each frame of the host region's duration, a tuple, triple, quadruple, or quintuple including a list of the coordinates of the pixels that compose the host region in that frame and the homography matrices or separate or merged transformation objects that can be applied to transform the source digital content in that frame.
- the host region defining data includes a starting frame and, for each frame of the host region's duration, a tuple, triple, quadruple, or quintuple including an array, image, or other data structure capturing the shape and location of the host region in that frame and the homography matrices or separate or merged transformation objects that can be applied to transform the source digital content in that frame.
- the host region defining data includes a starting frame, a starting set of coordinates, and, for each frame of the host region's duration other than the first, a homography matrix that describes the necessary transformation between the starting set of coordinates and the coordinates that bound the host region in that frame.
- the host region defining data is a starting time or frame, an ending time or frame, a single list of coordinates (e.g., sequence of (x, y) coordinate pairs or (x, y, z) coordinate triples) bounding the host region, and a list, one for each frame, of all the homography matrices or separate or merged transformation objects that can be applied to transform the source digital content in that frame.
- the host region defining data is a starting time or frame, an ending time or frame, single list of the positions of the pixels comprising the host region and a list, one for each frame, of all the homography matrices or separate or merged transformation objects that can be applied to transform the source digital content in that frame.
- the host region defining data is a starting time or frame, an ending time or frame, a single array, image, or other data structure capturing the shape and location of the host region in that frame, and a list, one for each frame, of all the homography matrices or separate or merged transformation objects that can be applied to transform the source digital content in that frame.
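- For illustration, one possible serialization of host region defining data in the per-frame style described above might look like the following; the field names and values are hypothetical.

```python
import json

host_region_defining_data = [
    {
        "frame": 120,
        "bounds": [[410, 220], [890, 230], [885, 610], [405, 600]],  # (x, y) corners
        "homography": [[1.01, 0.02, 3.4],
                       [0.00, 0.99, -1.2],
                       [0.00, 0.00, 1.0]],
        "transforms": {"occlusion": "occl_120.png", "luminance": "lum_120.npy"},
    },
    # ... one entry per frame of the host region's duration
]

print(json.dumps(host_region_defining_data, indent=2))
```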
- host region defining data along with, potentially, metadata about the host region or the target digital content, is collected in one or more data structures or object specific to the host region (“host region object”).
- the target digital content may include more than one host region object.
- One or more host regions in the target digital content can be associated with a lightweight version of the host region object, providing a lightweight summary of the host region defining data and/or its metadata.
- one lightweight version of the host region object merely includes the duration of the host region (in frames), its dimensions, and/or its total number of pixels.
- the host region object is a lightweight data-interchange format, including but not limited to JavaScript Object Notation (“JSON”) or Extensible Markup Language (“XML”).
- those host regions' host region defining data or host region objects may be merged into “frame objects”, with each frame object containing the host region defining data or host region objects for all of the host regions that occupy that frame.
- those host regions' host region defining data or host region objects may be merged into “scene objects”, with each scene object containing the host region defining data or host region objects for all of the host regions that occupy that scene.
- frame objects may be regrouped into scene objects, where each scene object contains the frame objects for one or more of the frames comprising that scene.
- all of a video's scene objects are grouped together inside one “content object” that covers all of the scenes for the piece of target digital content.
- all of a video's frame objects are grouped together inside one “content object” that covers all of the frames for the piece of target digital content.
- the host region objects, frame objects, scene objects, or content objects are lightweight data-interchange formats, including but not limited to JSON or XML which contain, in a list, their respective subcomponent objects.
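- A hypothetical sketch of how host region, frame, scene, and content objects might nest when serialized as a lightweight data-interchange format; the structure and field names are illustrative only.

```python
content_object = {
    "content_id": "video-001",
    "scene_objects": [
        {
            "scene": 3,
            "frame_objects": [
                {
                    "frame": 120,
                    "host_region_objects": [
                        {"host_region_id": "hr-7", "duration_frames": 300,
                         "dimensions": [480, 380], "pixel_count": 182400},
                    ],
                },
            ],
        },
    ],
}
```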
- statistics regarding the host region are calculated and added to its metadata or the host region object in order to facilitate the pairing of the host region with source digital content for placement.
- the level of occlusion of the host region across the frames it occupies is calculated and added to its metadata or the host region object.
- the level of occlusion of the host region across the frames it occupies is calculated by counting the number of instances of the binary value representing occlusion in the foreground masks that constitute the occlusion objects, with the occlusion score reflecting those instances as a percentage of the total elements of the masks across all the frames.
- the level of occlusion of the host region across the frames it occupies is calculated and used to generate an “occlusion score” that is added as metadata to the host region object.
- the occlusion score is weighted to favor levels of occlusion that lie between 0% occlusion and 100% occlusion, with the lowest scores lying at each end of the spectrum and the highest score lying somewhere between those points (e.g., the scale has a normal or skewed parabolic shape).
- the advantage here is assigning the highest score to host regions with enough occlusion that the placement draws attention (since occlusion often indicates that the action, and thus the interesting part of the content, occurs in or around the host region), but not so much occlusion that the placement is kept from being noticed.
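- A minimal sketch of these occlusion statistics, counting occluded entries in the per-frame foreground masks and applying a parabolic weighting; the choice of a peak at 50% occlusion is an assumption consistent with the description above.

```python
import numpy as np

def occlusion_level(foreground_masks):
    """foreground_masks: list of HxW binary arrays, 1 = occluded."""
    total = sum(m.size for m in foreground_masks)
    occluded = sum(int(m.sum()) for m in foreground_masks)
    return occluded / total             # fraction in [0, 1]

def occlusion_score(level):
    return 4.0 * level * (1.0 - level)  # 0 at either extreme, 1 at 50% occlusion
```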
- the level of visual salience of the host region across the frames it occupies is calculated and added as metadata to the host region object.
- the level of visual salience of the host region across the frames it occupies is calculated and used to generate a “visual salience score” that is added as metadata to the host region object.
- the visual salience score of the host region is calculated by comparing the pixels contained in the host region to some baseline data or heat map that captures the level of salience of pixels based on their location in a frame and has been derived from pre-existing data about the portions of a frame with the highest visual salience.
- the target digital content's metadata related to subject matter, objects, people, or locations is used to select the most appropriate baseline heat map to use for the calculation of the visual salience score.
- target digital content that has metadata suggesting its subject matter is football may be paired with a baseline heat map that is specific to football content.
- the target digital content's metadata related to egomotion is used to select the most appropriate baseline heat map to use for the calculation of the visual salience score. For example, target digital content whose egomotion metadata suggests that its subject matter is tennis may be paired with a baseline heat map that is specific to tennis content.
- the level of visual salience is based on some baseline understanding of salience that is derived from real-time or near real-time data tracking the eyeball, head, or body motion of a particular user.
- the highest visual salience score lies between high salience and low salience, with the lowest scores lying at either end of the spectrum (e.g., the scale has a parabolic shape). This is because low visual salience means the host region will not be seen and high visual salience means the host region is likely too close to the focal point of the target digital content and will distract or annoy viewers.
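- A minimal sketch of such a visual salience score, averaging a baseline salience heat map over the host region's footprint and applying a parabolic weighting; the heat map format and the weighting function are assumptions.

```python
import numpy as np

def salience_score(host_region_mask, baseline_heatmap):
    """host_region_mask: HxW bool; baseline_heatmap: HxW float in [0, 1]."""
    raw = float(baseline_heatmap[host_region_mask].mean())  # average salience under the region
    return 4.0 * raw * (1.0 - raw)      # lowest at either extreme, highest in between
```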
- an identified host region may be excluded from further consideration because it fails to satisfy some size or duration threshold or is otherwise deemed unable to favorably host source digital content.
- an identified host region's host region defining data is compared to a set of standard host region sizes and durations (“standard host region dimensions”) (e.g., a 200-pixel by 100-pixel rectangle lasting for 100 frames of target digital content) and the host region is associated with the one whose dimensions are nearest to its host region defining data.
- Associating identified host regions with standard host regions in this way makes it more efficient for collaborators or advertisers (collectively, “interested parties”) to request, buy, bid on, and prepare source digital content for placement on host regions reliably and at scale, and for marketplaces, auctions, or exchanges (collectively, “marketplaces”) to efficiently offer identified host regions for order, sale, or bidding.
- the host region defining data is compared to any number of preset, standard host region dimensions and associated with the one whose dimensions are nearest to, but never larger than, its own. This ensures that interested parties are never delivered a smaller host region than they have requested, ordered, bought, or paid for.
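- A sketch of this matching rule, selecting the largest standard host region whose dimensions and duration never exceed those of the identified host region; the standards listed here are made up for illustration.

```python
STANDARDS = [
    {"name": "small",  "w": 200, "h": 100,  "frames": 100},
    {"name": "medium", "w": 300, "h": 600,  "frames": 375},
    {"name": "large",  "w": 500, "h": 1000, "frames": 500},
]

def nearest_standard(w, h, frames):
    fitting = [s for s in STANDARDS
               if s["w"] <= w and s["h"] <= h and s["frames"] <= frames]
    if not fitting:
        return None                     # below the smallest standard: exempt the region
    # "Nearest" here means the fitting standard covering the most pixel-frames.
    return max(fitting, key=lambda s: s["w"] * s["h"] * s["frames"])
```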
- where the host region defining data is compared to any number of preset, standard host region dimensions and it is determined that its dimensions are below those of the smallest standard, that host region is exempted from future steps in the process and is not paired with source digital content or offered to interested parties.
- the standard host region's dimensions, identification code, or other identifying information is added to the host metadata, and/or host region object.
- the standard host region dimensions, identification code, or other identifying information is added to the host metadata, and/or the host region object.
- the fact that a host region has been associated with a standard host region is captured as a message, data structure, or object that can be transmitted in a lightweight fashion to the source digital content selection module 118 .
- where a standard host region (or an indication of one) is delivered to the source digital content selection module 118 , once the host region is paired with source digital content, the placement conforms to the original host region's host region defining data (e.g., dimensions).
- where a standard host region (or an indication of one) is delivered to the source digital content selection module 118 , once the host region is paired with source digital content, the placement conforms to the standard host region's host region defining data (e.g., dimensions).
- a placement of 100 pixels across 10 frames would cost 1000× the per-pixel per-frame price.
- the order is filled by placing the buyer's source digital content (e.g., an advertiser's ad or “creative”) into newly identified host regions that have been associated with that standard, with the placement reflecting the dimensions of the standard host region. For example, if a newly identified 500-pixel × 1000-pixel rectangular host region lasting 20 seconds is associated with a standard 300-pixel × 600-pixel rectangular host region lasting 15 seconds and sold to a buyer who has placed an order for an instance of that standard host region, the resulting placement of the source digital content on the target digital content will represent a 300-pixel × 600-pixel rectangle that lasts 15 seconds.
- the order is filled by placing the buyer's source digital content (e.g., an advertiser's ad or “creative”) into newly identified host regions that have been associated with that standard, with the placement reflecting the dimensions of the identified host region. For example, if a newly identified 500-pixel × 1000-pixel rectangular host region lasting 20 seconds is associated with a standard 300-pixel × 600-pixel rectangular host region lasting 15 seconds and sold to a buyer who has placed an order for an instance of that standard host region, the resulting placement of the source digital content on the target digital content will represent a 500-pixel × 1000-pixel rectangle lasting 20 seconds.
- the host region is parsed and associated with two or more standard host regions.
- a storage module 116 is configured to store the host region defining data, transformation objects, host region object, the target digital content, and/or metadata about the target digital content.
- the storage module 116 can be part of the distribution module 114 . In other embodiments, the storage module 116 can be part of the host region identification module 110 or can be co-located with the host region identification module 110 .
- the storage module 116 can receive an indication that there has been a request for the target digital content to be viewed and, in response to the request, transmit, to a content integration module 120 , a message that includes the host region defining data, transformation objects, host region object, the target digital content, and/or metadata about the target digital content.
- the storage module 116 is configured to receive an indication that there has been a request for the target digital content to be viewed and, in response to the indication, transmit to the content integration module 120 a message that includes the host region defining data, transformation objects, host region object, the target digital content, and/or metadata about the target digital content.
- upon receiving the message that includes the host region defining data, transformation objects, host region object, the target digital content, and/or metadata about the target digital content from the storage module 116 , the content integration module 120 is configured to display the received information.
- the content integration module 120 can include a client-side web browser or other client-side digital content viewing application.
- where the content integration module 120 receiving the message that includes the host region defining data, transformation objects, host region object, the target digital content, and/or metadata about the target digital content from the storage module 116 is a client-side web browser or other client-side digital content viewing application, the host region defining data, transformation objects, host region object, the target digital content, and/or metadata about the target digital content is delivered as part of the web page source code that is delivered to the client-side web browser or other client-side digital content viewing application in response to its requests to view the target digital content (or the web page where it resides).
- where the content integration module 120 receiving the message from the storage module 116 is a client-side web browser or other client-side digital content viewing application and the host region defining data, transformation objects, host region object, the target digital content, and/or metadata about the target digital content is delivered as part of the web page source code in response to its initial request to view the target digital content (or the web page where it resides), the web page source code additionally includes a program, instructions, or code (e.g., a “plug-in”) that directs the client-side web browser or other client-side digital content viewing application to send, to the source digital content selection module 118 , a request for selection of source digital content that includes the host region defining data, transformation objects, host region object, and/or data about the particular impression or about the particular viewer of the target digital content.
- the content integration module 120 is co-located with or part of the distribution module 114 .
- a procedure, function, process, application, computer, or device that is sitting in the network and is dedicated to the selection of source digital content to be placed upon the host region receives the host region defining data, transformation objects, host region object, the target digital content, metadata about the target digital content, and/or impression data regarding one or more requested or anticipated views of the target digital content.
- a procedure, function, process, application, computer, or device that is sitting in the network receives from the source digital content selection module 118 a selection message containing the source digital content that it has selected to integrate with the target digital content for one or more impressions and/or data about that source digital content or its selection.
- the source digital content selection module 118 is a database that stores the host region defining data, transformation objects, host region object, the target digital content, and/or metadata about the target digital content and makes them searchable and selectable to users, who may select one or more host regions and the source digital content to integrate into them for one or more impressions, returning the selections in a selection message.
- the source digital content selection module 118 is a marketplace or exchange where the source digital content is selected based on an ordering, bidding, or purchasing process, with the returned selection message including other source digital content (e.g., where the marketplace is an advertising marketplace, the bidder or purchaser's advertisement) that the winner or purchaser wishes to integrate into the target digital content for one or more impressions.
- where the source digital content selection module 118 is a marketplace or exchange where the source digital content is selected based on an ordering, bidding, or purchasing process, the ordering, bidding, or purchase, and thus the selection, is automated and made based on inputs such as one or more parties' bid or offer price for one or more impressions of their provided source digital content integrated into host regions of certain dimensions, host regions satisfying a certain standard host region dimension, or host regions whose metadata or whose target digital content metadata satisfies particular preset criteria and/or is compatible with provided source digital content data.
- where the source digital content selection module 118 is a marketplace or exchange in which the source digital content is selected based on an ordering, bidding, or purchasing process and that process is automated, the orderer, bidder, or purchaser has an opportunity to consent to a suggested selection that has been automatically made for them before it proceeds to other steps in the content integration process.
- the source digital content selection module 118 is a marketplace where interested parties may purchase or bid on host regions that have been pre-standardized around a finite set of standards and/or segregated according to dimensions, duration, shape, or level of occlusion or visual salience, possibly without seeing the target digital content and solely based on indications that inventory that conforms with standard host regions exists, or possibly based on additional metadata about the content such as subject matter or publisher.
- the source digital content selection module 118 uses the target digital content's metadata to pair a host region of a target digital content with source digital content to be placed upon it.
- the source digital content selection module 118 uses the source digital content's metadata to pair a host region with source digital content to be placed upon it.
- the source digital content's metadata (“source digital content metadata”) can indicate, for example, the source digital content's duration, pixel value histogram, mean or average pixel values, audio transcription and/or text, optical character recognition-derived text, creator/publisher (e.g., name, audience size, history of source digital content placements, past target digital content subject matter, and preferred advertisers), subject matter, setting, or the objects, people, textures, materials, locations, or activities that it depicts, or the brand, advertiser, or product represented.
- the target digital content's metadata is generated or supplemented by a process that is run in parallel to host region identification, at the host region identification module 110 or any other servers, procedures, functions, processes, applications, computers or devices sitting in the communication network, and which searches the target digital content or its components for objects, textures, materials, shapes, people, places, spaces, areas, locations, settings, or activities.
- where host region identification relies on comparing the available attributes of the target digital content or its components to the available attributes of pre-constructed object-, texture-, material-, shape-, place-, space-, or area-specific templates and, where there is sufficient similarity, making a determination that the objects, textures, materials, shapes, places, spaces, or areas are or are not represented in the target digital content or its components, a determination that those objects, textures, materials, shapes, places, spaces, or areas are represented in the target digital content or its components is logged as host region defining data.
- the source digital content selection module 118 compares the target digital content's metadata with the source digital content's metadata, in order to make a determination about whether or not to pair the host region of the target digital content and the source digital content.
- the source digital content selection module 118 compares the target digital content's metadata with the source digital content's metadata with the existence of identical, similar, or compatible metadata used to accrue a score that is used to select or rank the source digital content that represents the best pairing with the host region.
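- A minimal sketch of such metadata-based scoring, treating each item's metadata as a set of terms and ranking candidate source digital content by overlap; the scoring formula and data shapes are assumptions.

```python
def pairing_score(target_metadata, source_metadata):
    """Both arguments: sets of lower-cased metadata terms (subjects, objects, brands...)."""
    union = target_metadata | source_metadata
    return len(target_metadata & source_metadata) / max(1, len(union))

def rank_candidates(target_metadata, candidates):
    """candidates: list of (source_id, metadata_set); returns best-first ranking."""
    return sorted(candidates,
                  key=lambda c: pairing_score(target_metadata, c[1]),
                  reverse=True)
```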
- the source digital content selection module 118 transmits, to a content integration module 120 and/or one or more servers, procedures, functions, processes, applications, computers or devices sitting in the communication network, one or more pairings of the host region and source digital content to place upon it, along with, potentially, the target digital content, the host region objects, and/or the source digital content.
- the source digital content selection module 118 receives, from one or more servers, procedures, functions, processes, applications, computers or devices sitting in the communication network, a selection message over the communication network, where the selection message represents the user's approval or ranking of one or more pairings of the host region and source digital content to place upon it.
- the results of comparing the target digital content's metadata with the source digital content metadata that the creators, owners, or publishers of the source digital content (e.g., advertisers) have appended to it are used to accrue or decrement a score that is used to select or rank the source digital content that represents the best pairing with the host region.
- Metadata can be parsed according to the arrival of new objects, people, or subject matter in the frame at different times.
- for example, when a new object or person (e.g., a celebrity) appears in the frame, a different type of digital content (e.g., an ad related to that celebrity) may be selected for placement.
- digital content creators associate target digital content and any potential host regions inside it with metadata that indicates what advertisers or collaborators are acceptable or favorable to them and, therefore, which source digital content may be placed upon their target digital content.
- where a host region is identified and this metadata matches or is compatible with the metadata of a particular piece of source digital content (e.g., it is a brand that exists in the metadata's list of favorable brands), this is deemed a favorable placement.
- where the metadata does not match, that source digital content may be downgraded or bypassed. In this way, creators of content can select the brands they would like to work with.
- the creator, owner, or publisher of the target digital content selects those advertisers or collaborators whose source digital content they will or will not allow to be placed on their target digital content, with those selections appended to one or more pieces of existing or future target digital content as metadata.
- one or more servers, procedures, functions, processes, applications, computers or devices sitting in the communication network (“content integration module”) may receive the file representing the source digital content or the individual frames which include it and, either immediately or upon request, implement or display the favorable placement of the source digital content within the original content.
- certain steps, in addition to the application of transformation objects, may be taken to transform the source digital content in order to improve its integration into the target digital content.
- the transformation can be performed in any one or more of the modules described in FIG. 1 .
- the source digital content is automatically resized.
- the source digital content is automatically centered inside the host region in order to occupy it in an aesthetically pleasing manner.
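- A minimal sketch of this automatic resizing and centering, scaling the source content to fit inside the host region while preserving its aspect ratio; the padding color is an assumption.

```python
import cv2
import numpy as np

def fit_and_center(source_bgr, region_w, region_h, fill=(0, 0, 0)):
    h, w = source_bgr.shape[:2]
    scale = min(region_w / w, region_h / h)
    resized = cv2.resize(source_bgr, (int(w * scale), int(h * scale)))
    canvas = np.full((region_h, region_w, 3), fill, dtype=np.uint8)
    y0 = (region_h - resized.shape[0]) // 2
    x0 = (region_w - resized.shape[1]) // 2
    canvas[y0:y0 + resized.shape[0], x0:x0 + resized.shape[1]] = resized
    return canvas
```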
- where the surface, texture, material, plane, object, place, space, location, or area associated with the host region has been identified as representing or possessing a particular shape, plane, contour, geometry, or object, the source digital content is automatically adjusted to match, cover, attach to, sit atop, surround, interact with, or otherwise engage the shape or object by selecting, from a library generated alongside the source digital content, an iteration of the source digital content whose particular pose or shape matches that particular shape, plane, contour, geometry, or object.
- the source digital content asset or model is automatically manipulated to match, cover, attach to, sit atop, or otherwise engage that shape, plane, contour, geometry, or object.
- all the pixels that lie between the periphery or border of the source digital content and its edges or textures nearest to that periphery or border are automatically converted into white, translucent, empty, or other colors. This can make the placement feel more immersed without reducing the integrity of the source digital content.
- the source digital content is automatically altered such that it fades in and fades out periodically, changes color periodically, raises or lowers luminance levels periodically, appears to sparkle or glimmer, or otherwise alters periodically in order to garner more attention after its placement.
- the average or median pixel values of the source digital content are compared to the average or median pixel values of the host region and, if they are sufficiently similar, they can be automatically decreased, increased, or otherwise changed from their original values in order to ensure they stand out against the region and/or are not camouflaged when placed.
- when any pixel values of the source digital content are sufficiently close to the colors of the host region, they can be inverted (e.g., turned from black to white or from a low value to its opposite) or shifted a number of luminance or color space values away from those of the host region to achieve this.
- different versions of the source digital content are supplied and the one with farthest difference from the pixel values of the surface, texture, material, plane, object, place, space, location, or area associated with the host region is selected for placement. In this way, it can be ensured that the source digital content stands out as much as possible when placed into the target digital content, yet still represents a pre-approved embodiment.
- the method of texture handling involves making the source digital content or its placement more transparent by converting it to RGBA or another color space with transparency as a channel and then raising the transparency level of that channel. This allows the texture of the host region to be visible underneath or through the source digital content in the placement.
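- A minimal sketch of this transparency-based texture handling using OpenCV, converting the source content to a four-channel image and lowering its opacity; the opacity value is an assumption.

```python
import cv2

def make_translucent(source_bgr, opacity=0.7):
    """opacity in [0, 1]; lower values let more of the host region's texture show through."""
    bgra = cv2.cvtColor(source_bgr, cv2.COLOR_BGR2BGRA)
    bgra[:, :, 3] = int(opacity * 255)
    return bgra
```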
- the content integration module 120 is configured to integrate a source digital content into a host region in the target digital content.
- the content integration module 120 is configured to accept, as input, the target digital content, the source digital content, data or output (e.g., metadata) that defines the dimension and/or the location of the host region in the target digital content (as well as, if appropriate, metadata surrounding the host region), and/or transformation objects that define the transformations that can be operated on the source digital content prior to the integration into the host region in the target digital content.
- the content integration module 120 is configured to implement the integration by altering and/or creating, re-encoding, or saving a new version of the target digital content with the pixel values inside the host region portion permanently replaced with the pixel values of the source digital content, possibly after their transformation by the various transformation objects. This operation is sometimes referred to as versioning.
- the content integration module 120 is configured to overlay the source digital content over the target digital content during the display of the target digital content to a viewer and, in doing so, rely on the guidance provided by the data or output that defines the dimension and location of host regions in the target digital content (as well as, if appropriate, metadata surrounding the host region) and/or transformation objects that define the transformations for the target digital content to seamlessly integrate with the host region in the target digital content. This operation is sometimes referred to as overlaying.
- the integration of the source digital content into the target digital content is smoothed or blended using smoothing algorithms such as Gaussian blur, Poisson blending, or the algorithm described in Perez et al., Poisson Image Editing , SIGGRAPH (2003), herein incorporated by reference in the entirety, which allows the luminance and color space values of the source digital content and the target digital content to equalize while maximally preserving the edges or gradients of the source digital content.
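- A rough sketch of the versioning path combined with such blending: the frame's homography warps the source content into place and OpenCV's seamlessClone performs a Poisson-style blend; the mask handling and variable names are simplifications, not the document's method.

```python
import cv2
import numpy as np

def integrate_frame(frame_bgr, source_bgr, homography):
    h, w = frame_bgr.shape[:2]
    warped = cv2.warpPerspective(source_bgr, homography, (w, h))
    mask = cv2.warpPerspective(
        np.full(source_bgr.shape[:2], 255, dtype=np.uint8), homography, (w, h))
    ys, xs = np.where(mask > 0)
    if len(xs) == 0:
        return frame_bgr                      # host region not visible in this frame
    center = ((int(xs.min()) + int(xs.max())) // 2,
              (int(ys.min()) + int(ys.max())) // 2)
    return cv2.seamlessClone(warped, frame_bgr, mask, center, cv2.NORMAL_CLONE)
```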
- the new version of target digital content may be transmitted to a procedure, function, thread, process, application, memory, cache, disk or other storage, database, computer, device, or network for the purpose of immediate or future display, publishing, storage, or sharing, or any other permanent or temporary use.
- the frames are streamed on a frame-by-frame basis rather than as one file.
- the content integration module 120 includes a procedure, function, thread, process, application, memory, cache, disk or other storage, database, computer, device, which allows for the sharing or publishing, whether automated or manual, of the new version of the target digital content to various media websites and social networks.
- the new version of target digital content is streamed directly from the content integration module 120 to a browser or a viewing application.
- new versions of the frames are streamed directly from the content integration module 120 to a browser or a viewing application on a frame-by-frame basis rather than as one file.
- the content integration module 120 from which the new version of the target digital content is also streamed is co-located with the host region identification module 110 .
- the new version of the target digital content may be streamed or transmitted to the same browser, application, or client from which the target digital content was initially transmitted in order to begin the process of placing source digital content upon it (e.g., a self-service application aimed at letting users upload target digital content and insert new digital content into it).
- the content integration module 120 is a server belonging to the digital content website or social network to which the target digital content was uploaded to begin with and, after integration, the new version of the target digital content is streamed to the browser of a viewer. This has the advantage of emulating the streaming process used for other digital content.
- One appealing method of placement involves creating an overlay, of the host region or of the entire frame, that is transparent in any areas meant to be unaffected by the integration of the source digital content and which, when played between the viewer and the target digital content in synchronization with it, gives the appearance that the source digital content (possibly after transformation by various transformation objects) has been embedded in or is part of the target digital content.
- the content integration module 120 that executes this overlay is inside the digital content viewing application (e.g., the browser, monitor, television, or Virtual Reality (VR) goggles), on the client (e.g., viewer) side, and takes as its inputs the host region defining data, host region object, and/or transformation objects, as well as the source digital content.
- the content integration inputs (e.g., the host region data, host region objects, transformation objects, and the source digital content) are gathered by requests (e.g., by Asynchronous JavaScript And XML or “AJAX” calls) to external web servers such as the source digital content selection module 118 , and, potentially, the host region identification module 110 , storage module 116 , and/or distribution module 114 , with the protocol or code for integrating the content by (1) making such requests; (2) using the inputs, once received, to transform the source digital content and create the overlay; and (3) synchronizing the overlay with the target digital content all being delivered to the browser or other digital content viewing application inside the source code of the target digital content, of the web page that envelopes it, or of the digital media site or social network hosting it, or inside a plug-in (e.g., Javascript plug-in) associated with it.
- a buffering mechanism is used to collect transformation objects from an external server in batches, using each batch to transform the source digital content and play it in synch with the target digital content, while also continuing to request and load new batches of transformation objects.
- the content integration module 120 that executes this overlay is inside the browser or other digital content viewing application
- an external request to stream that target digital content is made to a location other than the viewing application or browser (e.g., the same server or network that hosts the host region identification), whereupon that procedure, function, thread, process, application, memory, cache, disk or other storage, database, computer, device, or network gathers the host region data and/or transformation objects as well as the source digital content, possibly by making a request to external web servers, and then transmits or streams both the raw digital content along with the overlay, as separate objects, back to the viewing application or browser where it has been requested, possibly on a frame-by-frame basis, where they are displayed to the viewer as an overlay, that overlay being delivered by a protocol contained in the source code of the target digital content, the web page that envelopes it, or the digital media site or social network hosting it, and/or as a plug-in (e.g., a Javascript plug-in).
- placements of source digital content on host regions may be merged according to frames, scenes, or file.
- different placements may be merged in the same overlay prior to display, and different placements may result in different versions of the target digital content being created prior to display or storage.
- the content integration module 120 is configured to execute the integration of the source digital content and the target digital content.
- the content integration module 120 is configured to execute the integration by blending the source digital content into the target digital content, overlaying the target digital content with a mostly-transparent source digital content overlay (e.g., a video overlay such as an HTML Inline Frame Element or “iframe”), played at the same speed and in sync with the target digital content.
- the content integration module 120 exists in the web browser or other client-side digital content viewing application and, driven by the instructions or code in the source code of the target digital content or its web page or the plug-in, executes the integration by blending the source digital content into the target digital content: it creates an overlay using the host region data or transformation objects, which are contained in the code of the target digital content or its web page or requested from an intermediate storage module 116 point, in conjunction with the source digital content, which it can obtain via a request to a source digital content selection module 118 .
- This is appealing because it makes the placement undetectable to the viewer and because it involves relatively lightweight transfers of data and is thus scalable.
- the source digital content is associated with a web page or other information and either the entire overlay or any part of the overlay that is not transparent (e.g., the host region or the source digital content) is made clickable such that, if clicked, the browser is immediately directed to that web page, information, or destination.
- the source digital content is associated with a web page or other information and either the entire overlay or any part of the overlay that is not transparent (e.g., the host region or the source digital content) is made clickable such that, if clicked, it adds the web page, information, or destination to a list or “shopping cart” that is presented to the viewer after the end of the source digital content.
- the source digital content is associated with a web page or other information and either the entire overlay or any part of the overlay that is not transparent (e.g., the host region or the source digital content) is made hoverable such that, if hovered over by, for example, a mouse or other pointer, a web page, information, or a destination is revealed (e.g., in a “pop-up” window or bubble) to the viewer.
- the host region objects, including the occlusion, luminance, texture, blur, or merged transformation objects, which have been stored in the browser, are applied to transform the source digital content that is selected and used to create the overlay, all inside the browser, which is then played atop and in sync with the target digital content, creating the illusion that the source digital content is placed inside the target digital content.
- the source digital content is transferred to the content integration module 120 for placing the source digital content into the target digital content by replacing the pixel values in the part of the target digital content comprising the host region with the non-transparent pixel values in the source digital content, and then streaming them to the viewer's browser.
- This configuration is appealing in the advertising use case because the resulting digital content is not an advertisement that would be blockable by commercial ad block software such as AdBlock, which operates in the client browser.
- the content integration module 120 can be configured to remove a visible boundary between the source digital content and the target digital content by blending the source digital content and the target digital content using a Poisson image blending technique and/or a similar gradient-based blending technique.
- the content integration module 120 can be configured to modify the shading and/or the lighting of the source digital content to match the shading and/or the lighting of the target digital content.
- the content integration module 120 is configured to estimate a shading and/or a lighting of the target digital content, re-render the source digital content to match the estimated shading and/or the estimated lighting of the target digital content, and blend the re-rendered source digital content into the target digital content.
- the source digital content selection module 118 is also the streaming location point that both selects the source digital content and places it into the target digital content, possibly by replacing the pixel values in the part of the target digital content comprising the host region with the non-transparent pixel values in the source digital content and then streaming them to the viewer's browser (“combined source digital content selection and streaming point”). This is appealing because it is not blockable by commercial ad block software such as AdBlock.
- the source digital content selection module 118 is a marketplace where interested parties (e.g. advertisers) can, based on the impression data and/or host region data, buy or bid to have their source digital content (e.g., ad) placed on the host region for one or more impressions of the target digital content.
- the source digital content selection module 118 is a marketplace where interested parties purchase or bid on host regions whose host region data, along with their metadata, is stored in a database, with selections being made based on collaborative filtering and other machine learning techniques, when the input is the interested parties' metadata or campaign goals.
- the source digital content selection module 118 can be configured to receive an approval of a suggested selection.
- the approval can be provided by, for example, a user.
- the source digital content selection module 118 is a marketplace where interested parties can purchase or bid on host regions without seeing the target digital content and solely based on indications that inventory that conforms with standard host regions exists.
- the source digital content selection module 118 is a marketplace where interested parties can purchase or bid on host regions by viewing the original digital content in full with the host region identified or by viewing the part of the target digital content that contains the host region.
- the source digital content selection module 118 is a marketplace where interested parties may purchase or bid on host regions that have been pre-standardized around a finite set of standards and/or segregated according to dimensions, duration, shape, or level of occlusion or visual salience, possibly without seeing the target digital content and solely based on indications that inventory that conforms with standard host regions exists, or possibly based on additional metadata about the content such as subject matter or publisher.
- FIG. 24 illustrates a system in which the source digital content is integrated into the target digital content using an overlay method in accordance with some embodiments.
- FIG. 24 illustrates an embodiment that uses non-marker-based computerized methods to identify host regions in a two-dimensional target digital content (e.g., a video).
- Step 2401 represents the transmission, after or while the target digital content is created, of the target digital content or its components (e.g., its captured frames) from their target digital content source, which also contains the host region approval module 112 , to a distribution module 114 (e.g., a media website or a social network website dedicated to serving digital content), which may operate as or is co-located with the storage module 116 .
- the distribution module 114 transmits the target digital content to a host region identification module 110 , which may operate as or is co-located with the scene recognition module 106 and the camera motion classification module 108 .
- In step 2403, the host region identification module 110 returns one or more instances of host region data and transformation objects to the distribution module 114 .
- a notification of the identification of host regions or one or more instances of host region data are sent by the distribution module 114 to the target digital content source, which also contains the host region approval module 112 .
- In step 2405, a notification of approval or customizations of instances of host region data are sent from the target digital content source, which also contains the host region approval module 112 , to the distribution module 114 .
- In step 2406, the distribution module 114 transmits the notification of approval or customizations of instances of host region data to the host region identification module 110 for the preparation of the transformation object.
- In step 2407, the host region identification module 110 sends the transformation object to the distribution module 114 , where it is integrated into the source code for the web page or application in which the target digital content is viewed and transmitted to the storage module 116 , which stores it.
- a viewing application such as a browser, which also acts as the content integration module 120 , issues, to the distribution module 114 , a request for the source code of the web page or application in which the target digital content is viewed.
- In step 2409, the source code of the web page or application in which the target digital content is viewed, including the host region data and transformation objects, as well as instructions to the source digital content selection module 118 , is delivered from the distribution module 114 to the viewing application such as a browser.
- the viewing application such as a browser transmits the host region and impression data to the source digital content selection module 118 .
- the source digital content selection module 118 transmits selected source digital content to the viewing application such as a browser, which also acts as the content integration module 120 .
- a viewing application such as a browser, which also acts as the content integration module 120 , issues a request to the distribution module 114 for the target digital content.
- the target digital content is delivered from the distribution module 114 to the content integration module 120 , which is also a viewing application such as a browser.
- the content integration module 120 , which is also a viewing application such as a browser, integrates the source digital content into the target digital content by applying the transformation objects to the source digital content and then displaying it as an overlay on top of the target digital content.
- FIG. 25 illustrates a system in which the source digital content is integrated into the target digital content using an overlay method in accordance with some embodiments.
- FIG. 25 illustrates an embodiment that uses non-marker-based computerized methods to identify host regions in a two-dimensional target digital content (e.g., a video).
- Step 2501 represents the transmission, after or while the target digital content is created, of the target digital content or its components from their target digital content source, which also contains the host region approval module 112 , to a distribution module 114 (e.g., a media website or social network dedicated to serving digital content).
- the distribution module 114 transmits the target digital content to a host region identification module 110 , which may operate as or is co-located with the scene recognition module 106 , the camera motion classification module 108 , and the storage module 116 .
- In step 2503, the host region identification module 110 returns one or more instances of host region data to the distribution module 114 .
- In step 2504, a notification of the identification of host regions or one or more instances of host region data are sent by the distribution module 114 to the target digital content source, which also contains the host region approval module 112 .
- In step 2505, a notification of approval or customizations of instances of host region data are sent from the target digital content source, which also contains the host region approval module 112 , to the distribution module 114 .
- the distribution module 114 transmits the notification of approval or customizations of instances of host region data to the host region identification module 110 , which prepares the transformation objects and transmits them and the host region data to the storage module 116 , which stores them.
- a viewing application such as a browser, which also acts as the content integration module 120 , issues a request to the distribution module 114 , for the source code of the web page or application in which the target digital content is viewed.
- In step 2508, the web page source code, along with the host region data and the instructions or data for calling the host region identification module 110 and the source digital content selection module 118 , is delivered from the distribution module 114 to the content integration module 120 , which is also a viewing application such as a browser.
- a viewing application such as a browser, which also acts as the content integration module 120 , issues a request for the host region data to the host region identification module 110 .
- the host region identification module 110 returns the host region data, which it has obtained from the storage module 116 , to the content integration module 120 , which is also a viewing application such as a browser.
- a viewing application such as a browser issues a request for the source digital content, along with data about the host region and the impression, to the source digital content selection module 118 .
- In step 2512, the source digital content selection module 118 transmits the selected source digital content to the viewing application such as a browser.
- a viewing application such as a browser, which also acts as the content integration module 120 , issues a request to the distribution module 114 for the target digital content.
- the target digital content is delivered from the distribution module 114 to the content integration module 120 , which is also the viewing application such as a browser.
- the content integration module 120 , which is also the viewing application such as a browser, integrates the source digital content into the target digital content by applying the transformation objects to the source digital content and then displaying it as an overlay on top of the target digital content.
- FIG. 26 illustrates a system in which the source digital content is integrated into the target digital content using a versioning method in accordance with some embodiments.
- Step 2601 represents the transmission, after or while the target digital content is created, of the target digital content or its components from their target digital content source, which also contains the host region approval module 112 , to a distribution module 114 (e.g., a media website or social network dedicated to serving digital content).
- the distribution module 114 transmits the target digital content to a host region identification module 110, which may operate as or is co-located with the scene recognition module 106, the camera motion classification module 108, the storage module 116, and the content integration module 120.
- In step 2603, the host region identification module 110 returns one or more instances of host region data to the distribution module 114.
- In step 2604, a notification of the identification of host regions, or one or more instances of host region data, is sent by the distribution module 114 to the target digital content source, which also contains the host region approval module 112.
- In step 2605, a notification of approval or customizations of instances of host region data is sent from the target digital content source, which also contains the host region approval module 112, to the distribution module 114.
- In step 2606, a notification of approval or customizations of instances of host region data is relayed from the distribution module 114 to the host region identification module 110, which transmits them to the storage module 116, which stores them.
- a viewing application such as a browser issues a request to the distribution module 114 for the source code of the web page or application in which the target digital content is viewed.
- In step 2608, the web page source code, along with (1) the host region data, (2) instructions to send impression data to the host region identification module 110, and/or (3) instructions to route requests for the target digital content to the host region identification module 110, is delivered from the distribution module 114 to the viewing application such as a browser.
- In step 2609, the viewing application such as a browser transmits the impression data to the host region identification module 110.
- In step 2610, the host region identification module 110 transmits the host region and impression data to the source digital content selection module 118.
- the source digital content selection module 118 transmits selected source digital content to the host region identification module 110 , which also acts as the content integration module 120 .
- In step 2612, the host region identification module 110, which also acts as the content integration module 120, integrates the source digital content into the target digital content by creating a new version of it.
- a viewing application such as a browser issues a request to the host region identification module 110 for the target digital content.
- the (new version of the) target digital content is delivered from the distribution module 114 to the viewing application such as a browser.
- In step 2614, the viewing application such as a browser displays the new version of the target digital content.
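In contrast with the overlay method, the versioning method of FIG. 26 produces a new version of the target digital content server-side, which is then served like any ordinary media file. The following minimal sketch illustrates that idea under stated assumptions: a rectangular host region, simple RGB frames, and the helper names are inventions for illustration only, not the patent's implementation.

```python
# Illustrative sketch only: the versioning method, in which the selected source
# digital content is baked into the host region of each frame and the resulting
# frames are stored as a new version of the target digital content.
import numpy as np

def make_new_version(target_frames, source_rgb, host_region):
    """Return a new list of frames with the source content baked into the host region."""
    x, y, w, h = host_region
    patch = np.resize(source_rgb, (h, w, 3)).astype(np.uint8)  # crude resize, for illustration
    new_frames = []
    for frame in target_frames:
        out = frame.copy()
        out[y:y + h, x:x + w] = patch       # replace host-region pixels with the source content
        new_frames.append(out)
    return new_frames

if __name__ == "__main__":
    frames = [np.zeros((360, 640, 3), dtype=np.uint8) for _ in range(3)]   # stand-in target frames
    source = np.full((90, 160, 3), 200, dtype=np.uint8)                    # stand-in source content
    new_version = make_new_version(frames, source, host_region=(240, 135, 160, 90))
    print(len(new_version), new_version[0].shape)
```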
- FIG. 27 illustrates an embodiment where content integration is implemented using a versioning method in the content integration module 120 in accordance with some embodiments.
- the content integration module 120 is co-located with the distribution module 114 .
- Step 2701 represents the transmission, after or while the target digital content is created, of the target digital content or its components from the target digital content source, which also contains the host region approval module 112, to a distribution module 114 (e.g., a media website or social network dedicated to serving digital content), which also acts as the storage module 116 and content integration module 120.
- the distribution module 114 transmits the target digital content to the host region identification module 110 , which may operate as or is co-located with the scene recognition module 106 , and the camera motion classification module 108 .
- In step 2703, the host region identification module 110 returns one or more instances of host region data to the distribution module 114.
- a notification of the identification of host regions, or one or more instances of host region data, is sent by the distribution module 114 to the target digital content source, which also contains the host region approval module 112.
- In step 2705, a notification of approval or customizations of instances of host region data is sent from the target digital content source, which also contains the host region approval module 112, to the distribution module 114.
- In step 2706, a notification of approval or customizations of instances of host region data is relayed from the distribution module 114 to the host region identification module 110, which creates the transformation objects.
- the host region identification module 110 transmits the transformation objects to the distribution module 114, which transmits them, along with the host region data, to the storage module 116, which stores them.
- a viewing application such as a browser issues a request to the distribution module 114 for the source code of the web page or application in which the target digital content is viewed.
- In step 2709, the web page source code is delivered from the distribution module 114 to the viewing application such as a browser, along with instructions to return the impression data to the distribution module 114.
- In step 2710, the viewing application such as a browser forwards the impression data to the distribution module 114.
- In step 2711, the distribution module 114 transmits the impression data and the host region data, which it retrieves from the storage module 116, to a source digital content selection module 118.
- the source digital content selection module 118 transmits selected source digital content to the distribution module 114 , which also acts as the content integration module 120 .
- In step 2713, the distribution module 114, which also acts as the content integration module 120, integrates the source digital content into the target digital content by creating a new version of it.
- a viewing application such as a browser issues a request to the distribution module 114 for the target digital content.
- In step 2715, the target digital content is delivered from the distribution module 114 to the viewing application such as a browser.
- In step 2716, the viewing application such as a browser displays the new version of the target digital content.
- the software needed for implementing a process or a database may be written in a high-level procedural or object-oriented language such as C, C++, C#, Java, or Perl.
- the software may also be implemented in assembly language if desired.
- the language can be a compiled or an interpreted language.
- Packet processing implemented in a host server includes any processing determined by the context. For example, packet processing may involve high-level data link control (HDLC) framing, header compression, and/or encryption.
- the software is stored on a storage medium or device such as read-only memory (ROM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or a magnetic disk that is readable by a general or special purpose processing unit to perform the processes described in this document.
- the processors can include any microprocessor (single or multiple core), system on chip (SoC), microcontroller, digital signal processor (DSP), graphics processing unit (GPU), or any other integrated circuit capable of processing instructions such as an x86 microprocessor.
- the server groups in the host server can each be a logical module running on a single server.
Abstract
Description
-
- 1) Resizing the image to a uniform size and, if necessary, converting it to the RGB color space.
- 2) Processing the resized image using the global coarse scale neural network by:
- a) Inputting the resized image into the first input layer of the neural net, an 11×11 convolutional layer with a rectified linear unit (“ReLU”) activation function, a learning rate of 0.001, and a 2×2 pooling filter with a stride of 4 and max pooling, where the number of channels in the output is 96.
- b) Inputting the output of the previous layer through the second (hidden) layer of the neural net, a 5×5 convolutional layer with a ReLU activation function, a learning rate of 0.001, and a 2×2 pooling filter with a stride of 2 and max pooling, where the number of channels in the output is 256.
- c) Inputting the output of the previous layer into the third (hidden) layer of the neural net, a 3×3 convolutional layer with a ReLU activation function, a learning rate of 0.001, and a 2×2 pooling filter with max pooling, where the number of channels in the output is 384.
- d) Inputting the output of the previous layer into the fourth (hidden) layer of the neural net, a 3×3 convolutional layer with a ReLU activation function, a learning rate of 0.001, and a 2×2 pooling filter with max pooling, where the number of channels in the output is 384.
- e) Inputting the output of the previous layer into the fifth hidden layer of the neural net, a 3×3 convolutional layer with a ReLU activation function, a learning rate of 0.001, and a 2×2 pooling filter with max pooling, where the number of channels in the output is 256.
- f) Inputting the output of the previous layer into a fully connected layer with a ReLU activation function and a learning rate of 0.001, where the number of channels in the output is 4096.
- g) Inputting the output of the previous layer into a fully connected layer with a linear activation function, where the number of channels in the output is 64.
- h) Upsampling the output of the last layer by 4.
- 3) Inputting the output of the coarse scale network into a finer grained network that produces predictions at a mid-level resolution by:
- a) Resizing, if necessary, a frame of the target digital content to the same size image that the neural network has been trained on;
- b) Inputting this resized image into the first input layer of the neural net, a 9×9 convolutional layer with a ReLU activation function, a learning rate of 0.001, and a 2×2 pooling filter with a stride of 2 and max pooling, where the number of channels in the output is 96.
- c) Combining the output of the previous layer with the output of the coarse grained network by combining the channels of both outputs, resulting in a feature vector with 160 channels.
- d) Inputting the combined feature vector into the second (hidden) layer of the neural net, a 5×5 convolutional layer with a ReLU activation function and a learning rate of 0.01. The number of channels in the output is 64.
- e) Inputting the previous input into the third (hidden) layer of the neural net, a 5×5 convolutional layer with a ReLU activation function and a learning rate of 0.01. The number of channels in the output is 64.
- f) Inputting the previous input into the fourth (hidden) layer of the neural net, a 5×5 convolutional layer with a ReLU activation function and a learning rate of 0.01. The number of channels in the output is 64.
- g) Inputting the previous output through the final (hidden) layer, a 5×5 convolutional layer with a ReLU activation function and a learning rate of 0.001, where the number of channels in the output is 2 (depth map+normal prediction);
- h) Upsampling the previous output to ½ the size of the original input image.
- 4) Inputting the previous output into an even more fine-grained neural network that refines the predictions to higher resolution, by:
- a) Resizing, if necessary, the previous output to the same size image that the neural network has been trained on;
- b) Inputting this resized image into the first input layer of the neural net, a 9-pixel×9-pixel convolutional layer with a ReLU activation function, a learning rate of 0.001, and a 2×2 pooling filter with a stride of 2 and max pooling, where the number of channels in the output is 96.
- c) Combining the output of the previous network with the output of the fine grained network by combining the channels of both outputs, resulting in a feature vector with 97 channels (where one channel is the depth map outputted from the coarse grained network).
- d) Inputting the combined feature vector into the second (hidden) layer, a 5×5 convolutional layer with a ReLU activation function and a learning rate of 0.01, where the number of channels in the output is 64.
- e) Inputting the previous output into the third (hidden) layer, a 5×5 convolutional layer with a ReLU activation function and a learning rate of 0.01. The number of channels in the output is 64.
- f) Inputting the previous output into the final hidden layer, a 5×5 convolutional layer with a ReLU activation function and a learning rate of 0.001, where the number of channels in the output is 2 (depth map+normal).
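Assuming a PyTorch-style implementation (the patent does not name a machine learning framework), the coarse global network of step 2) above might be sketched as follows. Kernel sizes, channel counts, pooling, and the final 4× upsampling mirror the text; padding, the input resolution, and the reshape of the 64-unit output into a coarse map are illustrative assumptions.

```python
# Illustrative sketch only of the coarse global depth-prediction network in step 2) above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseDepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11), nn.ReLU(),    # step 2a: 11x11 conv, 96 channels
            nn.MaxPool2d(kernel_size=2, stride=4),          # 2x2 max pooling, stride 4
            nn.Conv2d(96, 256, kernel_size=5), nn.ReLU(),   # step 2b: 5x5 conv, 256 channels
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(256, 384, kernel_size=3), nn.ReLU(),  # step 2c: 3x3 conv, 384 channels
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(384, 384, kernel_size=3), nn.ReLU(),  # step 2d: 3x3 conv, 384 channels
            nn.MaxPool2d(kernel_size=2),
            nn.Conv2d(384, 256, kernel_size=3), nn.ReLU(),  # step 2e: 3x3 conv, 256 channels
            nn.MaxPool2d(kernel_size=2),
        )
        self.fc1 = nn.LazyLinear(4096)                      # step 2f: fully connected, ReLU
        self.fc2 = nn.Linear(4096, 64)                      # step 2g: fully connected, linear

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)                                     # 64-unit coarse prediction
        x = x.view(-1, 1, 8, 8)                             # assumed reshape into a coarse map
        return F.interpolate(x, scale_factor=4,             # step 2h: upsample by 4
                             mode="bilinear", align_corners=False)

if __name__ == "__main__":
    net = CoarseDepthNet()
    out = net(torch.randn(1, 3, 320, 240))                  # uniform resize assumed upstream
    print(out.shape)
```

The mid-level and higher-resolution refinement networks of steps 3) and 4) would concatenate this coarse output with their own first-layer features, as the text describes, before the remaining 5×5 convolutional layers.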
-
- 1) In step 602, the neural network model in the scene recognition module 106 is configured to identify the separate scenes in the target digital content; the scene recognition module 106 is configured to, if necessary, resize one or more frames from an identified scene and then input them into the depth prediction neural network.
- 2) In step 604, the neural network model in the scene recognition module 106 is configured to use a threshold background integer to determine which portions of the depth map of the inputted frames constitute the background and to load them onto a frame buffer or memory area.
- 3) In step 606, if necessary, the neural network model in the scene recognition module 106 is configured to resize one or more frames from another identified scene and then input them into the depth prediction neural network.
- 4) In step 608, the neural network model in the scene recognition module 106 is configured to use a threshold background integer to determine which portions of the depth map of these other inputted frames constitute the background and to load them onto a frame buffer or memory area.
- 5) In step 610, the neural network model in the scene recognition module 106 is configured to perform memory operations to find the distance (e.g., the Euclidean distance) between the values of the frames' pixels.
- 6) In step 612, when the difference is sufficiently close to zero, the neural network model in the scene recognition module 106 is configured to deem the two scenes to represent a duplicate space and camera position, as sketched below.
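The following NumPy sketch illustrates the background-comparison logic of steps 602 through 612 above under stated assumptions: thresholding the predicted depth maps into background regions, buffering the background pixel values, and treating a near-zero distance as a duplicate space and camera position. The threshold and tolerance values are placeholders, not values taken from the patent, and the depth maps here are stand-ins for the depth prediction network's output.

```python
# Illustrative sketch only: comparing the backgrounds of two scenes via their depth maps.
import numpy as np

BACKGROUND_THRESHOLD = 0.7   # assumed stand-in for the "threshold background integer"
DUPLICATE_EPSILON = 1e-2     # "sufficiently close to zero"

def background_values(frame, depth_map):
    """Keep only the pixels whose predicted depth marks them as background."""
    mask = depth_map >= BACKGROUND_THRESHOLD
    values = np.zeros_like(frame, dtype=np.float32)   # frame buffer / memory area
    values[mask] = frame[mask]
    return values

def scenes_share_space(frame_a, depth_a, frame_b, depth_b):
    """Steps 610-612: Euclidean distance between the background pixel values."""
    bg_a = background_values(frame_a, depth_a)
    bg_b = background_values(frame_b, depth_b)
    distance = np.linalg.norm(bg_a - bg_b) / bg_a.size
    return distance < DUPLICATE_EPSILON

if __name__ == "__main__":
    frame1 = np.random.rand(120, 160).astype(np.float32)
    frame2 = frame1.copy()                  # identical frames -> duplicate space expected
    depth = np.random.rand(120, 160).astype(np.float32)
    print(scenes_share_space(frame1, depth, frame2, depth))
```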
-
- 1) Inputting the image into the first input layer of the neural net, an 11×11 convolutional layer with a ReLU activation function, a learning rate of 0.001, a stride of 0.5, and a 2×2 pooling filter with max pooling, where the number of channels in the output is 48.
- 2) Inputting the output of the previous layer into the second (hidden) layer of the neural net, a 5×5 convolutional layer with a ReLU activation function, a learning rate of 0.001, and a 2×2 pooling filter layer with max pooling, where the number of channels in the output is 128.
- 3) Inputting the output of the previous layer into the third (hidden) layer of the neural net, a 3×3 convolutional layer with a ReLU activation function and a learning rate of 0.001, where the number of channels in the output is 192.
- 4) Inputting the output of the previous layer into the fourth (hidden) layer of the neural net, a 3×3 convolutional layer with a ReLU activation function and a learning rate of 0.001, where the number of channels in the output is 192.
- 5) Inputting the output of the previous layer into the fifth (hidden) layer of the neural net, a 3×3 convolutional layer with a ReLU activation function, a learning rate of 0.001, and a 2×2 pooling filter with max pooling, where the number of channels in the output is 128.
- 6) Inputting the output of the previous layer into a fully convolutional layer with a ReLU activation function and a learning rate of 0.001, where the number of channels in the output is 2048.
- 7) Inputting the output of the previous layer into a (second) fully convolutional layer with a ReLU activation function and a learning rate of 0.001, where the number of channels in the output is 2048.
- 8) Inputting the output of the previous layer into a 1000-way softmax function which produces a distribution over the 1000 class labels, where the number of channels in the output is 1000.
- 9) Returning the output to the original size of the input image by either upsampling or downsampling using bilinear interpolation;
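Steps 8) and 9) above, which produce a 1000-way softmax distribution and return it to the original image size with bilinear interpolation, can be sketched as follows under the same PyTorch-style assumption; the score tensor below is a stand-in for the output of the preceding convolutional layers.

```python
# Illustrative sketch only of steps 8)-9): per-pixel class probabilities resized
# back to the input resolution with bilinear interpolation.
import torch
import torch.nn.functional as F

def label_distribution(class_scores, original_size):
    """class_scores: (N, 1000, h, w) score map; returns (N, 1000, H, W) probabilities."""
    probs = F.softmax(class_scores, dim=1)                 # step 8: 1000-way softmax
    return F.interpolate(probs, size=original_size,        # step 9: back to input size
                         mode="bilinear", align_corners=False)

if __name__ == "__main__":
    scores = torch.randn(1, 1000, 30, 40)   # stand-in output of the final 1000-channel layer
    probs = label_distribution(scores, original_size=(240, 320))
    print(probs.shape, float(probs[0, :, 0, 0].sum()))     # sums to ~1.0 per pixel
```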
-
- 1) Resizing the input frame to 3 different scales: 1/√2, 1, and √2;
- 2) Creating a new image by averaging the values of the three output images;
- 3) In step BB38, the scene recognition module 106 is configured to convert, if necessary, the merged image to Lab color space; and
- 4) Predicting a label for each pixel in the merged image by passing the output image to a conditional random field (CRF), which uses an energy of the form E(x) = Σi ψi(xi) + Σi<j ψij(xi, xj), where ψi is the unary energy (e.g., the negative log of the aggregated softmax probabilities) and ψij is the single pairwise term connecting every pair of pixels in the image, with a Potts label compatibility term δ weighted by wp and a unit Gaussian kernel k. The Lab values (Ii L, Ii a, Ii b) are used along with the position (px, py) as the features for each pixel, with d being the smaller image dimension.
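A minimal sketch of the per-pixel features named above (Lab color plus position scaled by the smaller image dimension d) and of a single Potts-weighted Gaussian pairwise term is given below. The feature scaling and the weight wp are illustrative assumptions, and a full dense-CRF inference step is not shown.

```python
# Illustrative sketch only: CRF pairwise features and one Potts-weighted Gaussian term.
import numpy as np

def pixel_features(lab_image):
    """Return an (H, W, 5) array of [L, a, b, px/d, py/d] features per pixel."""
    h, w, _ = lab_image.shape
    d = float(min(h, w))                               # d: the smaller image dimension
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([xs / d, ys / d], axis=-1)
    return np.concatenate([lab_image, pos], axis=-1)

def pairwise_energy(label_i, label_j, f_i, f_j, w_p=1.0):
    """Potts compatibility (delta) weighted by w_p times a unit Gaussian kernel k."""
    if label_i == label_j:
        return 0.0                                     # Potts term: no penalty for equal labels
    k = np.exp(-0.5 * np.sum((f_i - f_j) ** 2))        # unit Gaussian kernel over the features
    return w_p * k

if __name__ == "__main__":
    lab = np.random.rand(60, 80, 3).astype(np.float32)  # stand-in Lab image
    feats = pixel_features(lab)
    e = pairwise_energy(2, 5, feats[10, 10], feats[10, 11])
    print(feats.shape, e)
```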
-
- i) Texture recognition;
- ii) Object recognition;
- iii) Using human facial recognition and detection algorithms such as OpenCV's Haar Wavelet-based face detection tool;
- iv) Deformable parts human segmentation algorithms, including but not limited to OpenCV's deformable parts model.
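For item iii) above, a minimal sketch using OpenCV's bundled Haar cascade face detector is shown below. The frame is a stand-in, and the idea of excluding detected face regions from candidate host regions is an illustrative assumption rather than something the patent specifies.

```python
# Illustrative sketch only: Haar cascade face detection on a frame of the target content.
import cv2
import numpy as np

def detect_faces(bgr_frame):
    gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    # Returns (x, y, w, h) boxes; such regions could, for example, be excluded
    # from candidate host regions.
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

if __name__ == "__main__":
    frame = np.zeros((240, 320, 3), dtype=np.uint8)   # stand-in frame from the target content
    print(detect_faces(frame))                        # no faces expected in a blank frame
```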
Claims (19)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/466,135 US10839573B2 (en) | 2016-03-22 | 2017-03-22 | Apparatus, systems, and methods for integrating digital media content into other digital media content |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201662311472P | 2016-03-22 | 2016-03-22 | |
US201662354053P | 2016-06-23 | 2016-06-23 | |
US201662419709P | 2016-11-09 | 2016-11-09 | |
US15/466,135 US10839573B2 (en) | 2016-03-22 | 2017-03-22 | Apparatus, systems, and methods for integrating digital media content into other digital media content |
Publications (2)
Publication Number | Publication Date |
---|---|
US20170278289A1 US20170278289A1 (en) | 2017-09-28 |
US10839573B2 true US10839573B2 (en) | 2020-11-17 |
Family
ID=58547808
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/466,135 Active 2037-05-16 US10839573B2 (en) | 2016-03-22 | 2017-03-22 | Apparatus, systems, and methods for integrating digital media content into other digital media content |
Country Status (3)
Country | Link |
---|---|
US (1) | US10839573B2 (en) |
EP (1) | EP3433816A1 (en) |
WO (1) | WO2017165538A1 (en) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US913454A (en) | 1908-03-16 | 1909-02-23 | George E Belcher | Tape for measuring lasts. |
-
2017
- 2017-03-22 EP EP17717555.1A patent/EP3433816A1/en not_active Withdrawn
- 2017-03-22 WO PCT/US2017/023616 patent/WO2017165538A1/en active Application Filing
- 2017-03-22 US US15/466,135 patent/US10839573B2/en active Active
Patent Citations (47)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5430808A (en) * | 1990-06-15 | 1995-07-04 | At&T Corp. | Image segmenting apparatus and methods |
US5680475A (en) * | 1992-09-16 | 1997-10-21 | U.S. Philips Corporation | System for processing textured images, texture analyser and texture synthesizer |
US5903317A (en) * | 1993-02-14 | 1999-05-11 | Orad Hi-Tech Systems Ltd. | Apparatus and method for detecting, identifying and incorporating advertisements in a video |
EP0750819A1 (en) | 1994-03-14 | 1997-01-02 | Scidel Technologies Ltd. | A system for implanting an image into a video stream |
GB2305051A (en) | 1995-09-08 | 1997-03-26 | Orad Hi Tec Systems Ltd | Automatic electronic replacement of billboards in a video image |
US5892554A (en) * | 1995-11-28 | 1999-04-06 | Princeton Video Image, Inc. | System and method for inserting static and dynamic images into a live video broadcast |
US6711293B1 (en) | 1999-03-08 | 2004-03-23 | The University Of British Columbia | Method and apparatus for identifying scale invariant features in an image and use of same for locating an object in an image |
US20020159636A1 (en) * | 2000-03-14 | 2002-10-31 | Lienhart Rainer W | Generalized text localization in images |
US7893963B2 (en) * | 2000-03-27 | 2011-02-22 | Eastman Kodak Company | Digital camera which estimates and corrects small camera rotations |
US20020136449A1 (en) * | 2001-01-20 | 2002-09-26 | Samsung Electronics Co., Ltd. | Apparatus and method for extracting object based on feature matching between segmented regions in images |
US20030012409A1 (en) * | 2001-07-10 | 2003-01-16 | Overton Kenneth J. | Method and system for measurement of the duration an area is included in an image stream |
US20040105583A1 (en) * | 2002-11-22 | 2004-06-03 | Jacobs Johannes W.M. | Segmenting a composite image via minimum areas |
US20040140992A1 (en) * | 2002-11-22 | 2004-07-22 | Marquering Henricus A. | Segmenting an image via a graph |
US7570811B2 (en) * | 2002-11-22 | 2009-08-04 | Oce Technologies B.V. | Segmenting an image via a graph |
US6912298B1 (en) * | 2003-08-11 | 2005-06-28 | Adobe Systems Incorporation | Object detection using dynamic probability scans |
US20080056563A1 (en) * | 2003-10-24 | 2008-03-06 | Adobe Systems Incorporated | Object Extraction Based on Color and Visual Texture |
US20060062430A1 (en) * | 2004-03-16 | 2006-03-23 | Vallone Robert P | Feed-customized processing of multiple video streams in a pipeline architecture |
US7667732B1 (en) * | 2004-03-16 | 2010-02-23 | 3Vr Security, Inc. | Event generation and camera cluster analysis of multiple video streams in a pipeline architecture |
US20070204310A1 (en) * | 2006-02-27 | 2007-08-30 | Microsoft Corporation | Automatically Inserting Advertisements into Source Video Content Playback Streams |
US20090238460A1 (en) | 2006-04-28 | 2009-09-24 | Ryuji Funayama | Robust interest point detector and descriptor |
US20080118107A1 (en) * | 2006-11-20 | 2008-05-22 | Rexee, Inc. | Method of Performing Motion-Based Object Extraction and Tracking in Video |
US20100054538A1 (en) * | 2007-01-23 | 2010-03-04 | Valeo Schalter Und Sensoren Gmbh | Method and system for universal lane boundary detection |
US20090037947A1 (en) * | 2007-07-30 | 2009-02-05 | Yahoo! Inc. | Textual and visual interactive advertisements in videos |
WO2009017983A2 (en) | 2007-07-30 | 2009-02-05 | Yahoo! Inc. | Textual and visual interactive advertisements in videos |
US20090076882A1 (en) * | 2007-09-14 | 2009-03-19 | Microsoft Corporation | Multi-modal relevancy matching |
US20090079871A1 (en) * | 2007-09-20 | 2009-03-26 | Microsoft Corporation | Advertisement insertion points detection for online video advertising |
US20090150210A1 (en) * | 2007-12-10 | 2009-06-11 | Athellina Athsani | Advertising associated with multimedia content |
US20090171787A1 (en) * | 2007-12-31 | 2009-07-02 | Microsoft Corporation | Impressionative Multimedia Advertising |
US20090324065A1 (en) * | 2008-06-26 | 2009-12-31 | Canon Kabushiki Kaisha | Image processing apparatus and method |
US8207989B2 (en) * | 2008-12-12 | 2012-06-26 | Microsoft Corporation | Multi-video synthesis |
US20110030002A1 (en) * | 2009-07-29 | 2011-02-03 | Telefonaktiebolaget Lm Ericsson (Publ) | Adm enabled oitf, supporting iptv infrastructure and associated methods |
US20110038452A1 (en) * | 2009-08-12 | 2011-02-17 | Kabushiki Kaisha Toshiba | Image domain based noise reduction for low dose computed tomography fluoroscopy |
US8730397B1 (en) * | 2009-08-31 | 2014-05-20 | Hewlett-Packard Development Company, L.P. | Providing a photobook of video frame images |
US8369686B2 (en) * | 2009-09-30 | 2013-02-05 | Microsoft Corporation | Intelligent overlay for video advertising |
US20110075992A1 (en) * | 2009-09-30 | 2011-03-31 | Microsoft Corporation | Intelligent overlay for video advertising |
US20120192226A1 (en) * | 2011-01-21 | 2012-07-26 | Impossible Software GmbH | Methods and Systems for Customized Video Modification |
US20130039534A1 (en) * | 2011-08-10 | 2013-02-14 | National Taipei University Of Technology | Motion detection method for complex scenes |
US20150003707A1 (en) * | 2012-01-19 | 2015-01-01 | Peter Amon | Pixel-Prediction for Compression of Visual Data |
US20150161773A1 (en) * | 2012-05-09 | 2015-06-11 | Hitachi Kokusai Electric Inc. | Image processing device and image processing method |
US20150078648A1 (en) * | 2013-09-13 | 2015-03-19 | National Cheng Kung University | Cell image segmentation method and a nuclear-to-cytoplasmic ratio evaluation method using the same |
US20160232425A1 (en) * | 2013-11-06 | 2016-08-11 | Lehigh University | Diagnostic system and method for biological tissue analysis |
US20150131851A1 (en) * | 2013-11-13 | 2015-05-14 | Xerox Corporation | System and method for using apparent size and orientation of an object to improve video-based tracking in regularized environments |
US20160042251A1 (en) * | 2014-07-03 | 2016-02-11 | Oim Squared Inc. | Interactive content generation |
US9704261B2 (en) * | 2014-11-14 | 2017-07-11 | Huawei Technologies Co., Ltd. | Image segmentation processing method and apparatus |
US20160286080A1 (en) * | 2015-03-20 | 2016-09-29 | Pfu Limited | Image processing apparatus, region detection method and computer-readable, non-transitory medium |
US20190116200A1 (en) | 2017-01-27 | 2019-04-18 | Oracle International Corporation | Method and system for placing a workload on one of a plurality of hosts |
US10522186B2 (en) | 2017-07-28 | 2019-12-31 | Adobe Inc. | Apparatus, systems, and methods for integrating digital media content |
Non-Patent Citations (29)
Title |
---|
Canny, A Computational Approach to Edge Detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 8, No. 6, Nov. 1986, pp. 679-698. |
Chang et al., Digital Image Translational and Rotational Motion Stabilization Using Optical Flow Technique, IEEE Transactions on Consumer Electronics, vol. 48, No. 1, 2002, pp. 108-115. * |
Dalal et al., Histograms of Oriented Gradients for Human Detection, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jun. 25, 2005, pp. 1-8. |
Deng et al., Principal Curvature-Based Region Detector for Object Recognition, 2007 IEEE Conference on Computer Vision and Pattern Recognition, Jun. 2007, 8 pages. |
Deriche, Using Canny's Criteria to Derive a Recursively Implemented Optimal Edge Detector, International Journal of Computer Vision, 1987, pp. 167-187. |
Duda et al., Use of the Hough Transformation to Detect Lines and Curves in Pictures, Communications of the ACM (Graphics and Image Processing), vol. 15, No. 1, Jan. 1972, pp. 11-15. |
Harris et al., A Combined Corner and Edge Detector, Plessey Research, 1988, pp. 147-151. |
International Application No. PCT/US2017/023616, International Preliminary Report on Patentability dated Oct. 4, 2018, 9 pages. |
International Search Report and Written Opinion dated Jun. 6, 2017 in related International Application No. PCT/US2017/023616 filed Mar. 22, 2017, 15 pages. |
Knutsson, Representing Local Structure Using Tensors, Scandinavian Conference on Image Analysis, 2011, pp. 1-8. |
Lindeberg et al., Segmentation and Classification of Edges Using Minimum Description Length Approximation and Complementary Junction Cues, Computer Vision and Image Understanding, vol. 67, No. 1, 1997, pp. 88-98. |
Lindeberg et al., Shape-Adapted Smoothing in Estimation of 3-D Depth Cues from Affine Distortions of Local 2-D Brightness Structure, CVAP, Dept. of Numerical Analysis and Computing Science, Jan. 1997, 12 pages. |
Lindeberg, Image Matching Using Generalized Scale-Space Interest Points, Journal of Mathematical Imaging and Vision, vol. 52, No. 1, May 2015, 34 pages. |
Matas et al., Robust Wide Baseline Stereo from Maximally Stable Extremal Regions, Image and Vision Computing, vol. 22, No. 10, Sep. 1, 2004, pp. 761-767. |
Meyer et al., Wavelets and Operators, Cambridge Studies in Advanced Mathematics, vol. 37, Cambridge University Press, Jan. 1996, 4 pages. |
Mikolajczyk et al., A Comparison of Affine Region Detectors, International Journal of Computer Vision, 2006, 30 pages. |
Mikolajczyk et al., A Performance Evaluation of Local Descriptors, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, No. 10, Oct. 2005, pp. 1615-1630. |
Mikolajczyk et al., An Affine Invariant Interest Point Detector, European Conference on Computer Vision (ECCV '02), May 2002, 15 pages. |
Panwar et al., Image Segmentation using K-means clustering and Thresholding, International Research Journal of Engineering and Technology (IRJET), vol. 3, No. 5, May 2016, pp. 1787-1793. |
Roberts, Machine Perception of Three-Dimensional Solids, Department of Electrical Engineering, Jul. 23, 1963, 82 pages. |
Rosten et al., Machine Learning for High-Speed Corner Detection, European Conference on Computer Vision, 2006, 14 pages. |
Rublee et al., ORB: An Efficient Alternative to SIFT or SURF, 2011 IEEE International Conference on Computer Vision (ICCV), Nov. 2011, pp. 1-8. |
Shi et al., Good Features to Track, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1994, pp. 593-600. |
Smith et al., SUSAN - A New Approach to Low Level Image Processing, International Journal of Computer Vision, vol. 23, No. 1, May 1997, 59 pages. |
U.S. Appl. No. 16/049,690, Notice of Allowance dated Apr. 26, 2019, 9 pages. |
U.S. Appl. No. 16/049,690, Notice of Allowance dated Aug. 20, 2019, 5 pages. |
Viola et al., Rapid Object Detection Using a Boosted Cascade of Simple Features, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), 2001, pp. I-511-I-518. |
Wang et al., Gray Level Corner Detection, IAPR Workshop on Machine Vision Applications, Nov. 1998, 4 pages. |
Zitnick et al., Consistent Segmentation for Optical Flow Estimation, Tenth IEEE International Conference on Computer Vision (ICCV'05), vol. 2, Oct. 2005, 8 pages. |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11049232B2 (en) * | 2017-02-10 | 2021-06-29 | Hangzhou Hikvision Digital Technology Co., Ltd. | Image fusion apparatus and image fusion method |
US11371835B2 (en) * | 2018-03-16 | 2022-06-28 | Nec Corporation | Object detection device, object detection system, object detection method, and non-transitory computer-readable medium storing program |
US11391840B2 (en) * | 2018-06-25 | 2022-07-19 | Ricoh Company, Ltd. | Distance-measuring apparatus, mobile object, distance-measuring method, and distance measuring system |
US11361448B2 (en) * | 2018-09-19 | 2022-06-14 | Canon Kabushiki Kaisha | Image processing apparatus, method of controlling image processing apparatus, and storage medium |
US20220051425A1 (en) * | 2019-04-30 | 2022-02-17 | Huawei Technologies Co., Ltd. | Scale-aware monocular localization and mapping |
US11294047B2 (en) * | 2019-12-23 | 2022-04-05 | Sensetime International Pte. Ltd. | Method, apparatus, and system for recognizing target object |
US11907838B2 (en) | 2020-05-22 | 2024-02-20 | Alibaba Group Holding Limited | Recognition method, apparatus, and device, and storage medium |
US11554324B2 (en) * | 2020-06-25 | 2023-01-17 | Sony Interactive Entertainment LLC | Selection of video template based on computer simulation metadata |
US20220172826A1 (en) * | 2020-11-30 | 2022-06-02 | Coreline Soft Co., Ltd. | Medical image reading assistant apparatus and method for adjusting threshold of diagnostic assistant information based on follow-up examination |
US11915822B2 (en) * | 2020-11-30 | 2024-02-27 | Coreline Soft Co., Ltd. | Medical image reading assistant apparatus and method for adjusting threshold of diagnostic assistant information based on follow-up examination |
WO2023283612A1 (en) * | 2021-07-08 | 2023-01-12 | Drake Alexander Technologies, Inc. | System and method for image-based parking determination using machine learning |
US11594258B2 (en) | 2021-07-19 | 2023-02-28 | Pes University | System for the automated, context sensitive, and non-intrusive insertion of consumer-adaptive content in video |
US11436708B1 (en) | 2021-08-19 | 2022-09-06 | Unity Technologies Sf | Removing moving objects from a video scene captured by a moving camera |
US11430132B1 (en) * | 2021-08-19 | 2022-08-30 | Unity Technologies Sf | Replacing moving objects with background information in a video scene |
Also Published As
Publication number | Publication date |
---|---|
US20170278289A1 (en) | 2017-09-28 |
EP3433816A1 (en) | 2019-01-30 |
WO2017165538A1 (en) | 2017-09-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10839573B2 (en) | Apparatus, systems, and methods for integrating digital media content into other digital media content | |
US10956967B2 (en) | Generating and providing augmented reality representations of recommended products based on style similarity in relation to real-world surroundings | |
Qian et al. | HTML: A parametric hand texture model for 3D hand reconstruction and personalization | |
US9317778B2 (en) | Interactive content generation | |
US8411932B2 (en) | Example-based two-dimensional to three-dimensional image conversion method, computer readable medium therefor, and system | |
Serra et al. | Hand segmentation for gesture recognition in ego-vision | |
Xiao et al. | Efficient shadow removal using subregion matching illumination transfer | |
CN106096542B (en) | Image video scene recognition method based on distance prediction information | |
Thasarathan et al. | Automatic temporally coherent video colorization | |
Mohanty et al. | Robust pose recognition using deep learning | |
Zhang et al. | Online video stream abstraction and stylization | |
Gawande et al. | SIRA: Scale illumination rotation affine invariant mask R-CNN for pedestrian detection | |
US11538140B2 (en) | Image inpainting based on multiple image transformations | |
Ommer et al. | Seeing the objects behind the dots: Recognition in videos from a moving camera | |
Coniglio et al. | People silhouette extraction from people detection bounding boxes in images | |
Direkoglu et al. | Player detection in field sports | |
Ding et al. | Personalizing human avatars based on realistic 3D facial reconstruction | |
Wang et al. | STV-based video feature processing for action recognition | |
Gao et al. | Layout-guided indoor panorama inpainting with plane-aware normalization | |
Chen et al. | Illumination-invariant video cut-out using octagon sensitive optimization | |
Wang et al. | A study on hand gesture recognition algorithm realized with the aid of efficient feature extraction method and convolution neural networks: design and its application to VR environment | |
Halder et al. | Perceptual conditional generative adversarial networks for end-to-end image colourization | |
Ewerth et al. | Estimating relative depth in single images via rankboost | |
CN117333495B (en) | Image detection method, device, equipment and storage medium | |
Cavalcanti et al. | A survey on automatic techniques for enhancement and analysis of digital photography |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: URU, INC., NEW YORK Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MARINO, WILLIAM L.;ATTORE, BRUNNO FIDEL MACIEL;ADAMI, JOHAN;SIGNING DATES FROM 20170320 TO 20170321;REEL/FRAME:041711/0947 |
|
AS | Assignment |
Owner name: ADOBE SYSTEMS INCORPORATED, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:URU, INC.;REEL/FRAME:046820/0758 Effective date: 20180427 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
AS | Assignment |
Owner name: ADOBE INC., CALIFORNIA Free format text: CHANGE OF NAME;ASSIGNOR:ADOBE SYSTEMS INCORPORATED;REEL/FRAME:048525/0042 Effective date: 20181008 |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED |
|
STPP | Information on status: patent application and granting procedure in general |
Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |