EP4189591A1 - System and method for preparing digital composites for incorporating into digital visual media - Google Patents
System and method for preparing digital composites for incorporating into digital visual media
- Publication number
- EP4189591A1 (application EP21759478.7A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- image
- interest
- asset
- shot
- insert
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/036—Insert-editing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23418—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/63—Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
- H04N21/643—Communication protocols
- H04N21/64322—IP
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/812—Monomedia components thereof involving advertisement data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
Definitions
- the present application relates in general to the field of digital media compositing.
- the present disclosure is directed to a system and method for generating media overlays and integrating said overlays into digital media.
- This digital media may be provided to consumers through various services, including over-the-top (OTT) delivery systems.
- OTT over-the-top
- OTT delivery, a method used to go "over" a cable box to give users access to media content, has been an increasingly popular implementation for digital media distribution over the internet.
- OTT provides high-bandwidth content over the internet
- many additional features may be added to delivery systems to enhance both the consumer experience and analytical applications.
- Many OTT systems have provided advanced data analytics features for tracking consumers and understanding the macro-habits thereof. For example, more consumer data may be gathered and further analyzed using the metadata provided by an OTT device (e.g., computer, mobile device) and information provided by the consumer (e.g., consumer interests, hobbies).
- Small features for enhancing the consumer experience exist; however, these features may prove difficult to scale.
- One such example of unscalable features includes compositing and further integrating personal digital media alterations onto
- SUBSTITUTE SHEET (RULE 26) between consumers and needs to be applied individually for each consumer.
- Each composite generally requires a visual effects artist to manually paint the brand onto the OTT content using a graphical user interface (GUI).
- GUI graphical user interface
- the system as provided in the present disclosure may include an automated identification module.
- This automated identification module may execute a custom Automated Placement Opportunity Identification (APOI) engine.
- APOI Automated Placement Opportunity Identification
- This APOI engine may be used to tag and/or label content based on visual features.
- the visual features being identified may include flat surfaces, locations, particular objects, scenery characteristics, etc.
- the APOI engine may incorporate one or more neural networks for detecting individual shots of a digital media set, generating labels associated with the visual features identified in each shot, and determining objects of interest that are mapped across the individual shots.
- the APOI engine and the one or more neural networks therein may be trained by analyzing labels generated in the past and confirmed as accurate.
- the system as provided in the present disclosure may further include a Placement Insertion Interface (PII) system that allows digital media clients to easily explore available placements for composites to be inserted throughout available digital media.
- PII Placement Insertion Interface
- This PII system may further include an upload tool for digital media clients to upload their own visual assets to be composited.
- the system as provided in the present disclosure may also include an automated compositing service, according to some embodiments.
- This automated compositing service may automate the integration of composites onto digital media in a programmatic manner.
- the automated compositing service may analyze digital media provided by a digital media client to identify areas of interest for inserting thereto a creative graphic.
- areas of interest may include flat surfaces, common objects, text, or other data
- the creative graphic may include a logo or product intended for insertion into the areas of interest of the digital media. Dimensions of the creative graphic or the features provided therein may be altered in order to fit, replace, or otherwise composite onto the area of interest as identified in the digital media, according to some embodiments.
- the automated compositing service as provided by the present disclosure may further include combining a base layer image and an insert layer image to form a composite image.
- the combining as performed by the automated compositing service may include adding one or more layers to the base layer image, such as a creative graphic layer, an alpha layer, a shadow layer, a reflection layer, among others.
- the automated compositing service may further include inserting or otherwise applying to the composite image one or more effects, such as adding motion blur to a video, adding depth of field blur to composites intended to be viewed out of focus, and color correction effects for creating the illusion that all of the composited layers appear genuine in the scene.
- the system as provided in the present disclosure may further include a preview system that allows digital media clients to quickly preview demo composites.
- the preview system may use standard media assets rather than the custom creative asset(s) of digital media clients, according to some embodiments.
- the preview system may use custom creative asset(s) of digital media clients, as well as other uploaded or otherwise provided assets.
- the preview system as provided in the present disclosure also allows digital media clients to push composites, whether predetermined or dynamically generated, onto digital content after approving said preview, according to some embodiments. Pushing composites may require additional steps before execution, including but not limited to, bidding by way of
- pushing composites may include generating a fully rendered composite into digital media assets as provided by the digital media clients.
- the present disclosure provides for a method of and a system for pre-processing digital media, the system executing the method comprising: receiving a digital media dataset; detecting, by way of one or more neural networks, one or more shots within the digital media dataset, wherein each shot is identified by way of boundary indicators; generating, by way of the one or more neural networks, contextual labels for each shot, wherein each contextual label correlates to a characteristic of each respective shot of the digital media dataset; extracting an array of images for each shot, wherein one or more images of the array comprise one or more objects of interest; detecting, by way of the one or more neural networks, objects of interest for each image of the array of images of each shot; determining, by way of the one or more neural networks, objects of interest to be mapped; mapping, by way of the one or more neural networks, an object of interest of a first image of the array of images of a first shot to an object of interest of a second image of the array of images of the first shot, wherein the object of interest of
- the boundary indicators may include shot-by-shot animations, including one or more of the following: black screens, rapid pixel deltas, dissolving animations, and fading animations.
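As an illustration of two of the boundary indicators named above (black screens and rapid pixel deltas), the following is a minimal sketch only. The disclosure detects shots with neural networks; this block swaps in a simple threshold heuristic to show what such indicators measure. The function names, the flattened grayscale frame representation, and both threshold values are assumptions made for the example.

```python
from typing import List, Sequence

# Assumption: each frame is a flattened sequence of grayscale pixels in [0, 1].
Frame = Sequence[float]


def mean_abs_delta(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b) or not a:
        raise ValueError("frames must be non-empty and equally sized")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def is_black(frame: Frame, level: float = 0.05) -> bool:
    """Treat a frame as a black screen when every pixel is at or below `level`."""
    return max(frame) <= level


def find_shot_boundaries(frames: List[Frame], delta_threshold: float = 0.4) -> List[int]:
    """Return indices i at which a new shot is assumed to start (frames[i]).

    A boundary is flagged on a rapid pixel delta between consecutive frames,
    or when the video leaves a black screen.
    """
    boundaries = []
    for i in range(1, len(frames)):
        rapid_change = mean_abs_delta(frames[i - 1], frames[i]) >= delta_threshold
        leaving_black = is_black(frames[i - 1]) and not is_black(frames[i])
        if rapid_change or leaving_black:
            boundaries.append(i)
    return boundaries
```

Dissolves and fades change gradually, so a per-frame threshold like this one would generally miss them; that gap is one reason a learned detector with temporal context may be preferred.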
- determining objects of interest to be mapped comprises: determining matching objects of interest between
- the present disclosure further comprises: generating, by way of the one or more neural networks, contextual labels for each shot, wherein each contextual label correlates to a characteristic of each respective shot of the digital media dataset.
- the generating contextual labels is handled in a prioritized order according to a first priority set of characteristics and a second priority set of characteristics.
- the first priority set of characteristics comprises visually flat surfaces.
- the second priority set of characteristics comprises one or more of the following: common objects visually present within the digital media dataset; text visually present within the digital media dataset; categorical data representative of the scene as presented in the digital media dataset; and audio data comprising recognizable speech provided in the digital media dataset.
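The prioritized ordering described above can be sketched as a stable sort over label kinds. This is a hedged illustration, not the disclosed implementation: the `ContextualLabel` type, the kind names such as `flat_surface`, and the choice to express priority as a sort key are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical kind names standing in for the two priority sets in the text.
FIRST_PRIORITY = {"flat_surface"}
SECOND_PRIORITY = {"common_object", "text", "scene_category", "speech"}


@dataclass(frozen=True)
class ContextualLabel:
    shot_id: int
    kind: str   # one of the hypothetical kind names above
    value: str


def priority_of(label: ContextualLabel) -> int:
    """Lower numbers are handled first; unknown kinds go last."""
    if label.kind in FIRST_PRIORITY:
        return 0
    if label.kind in SECOND_PRIORITY:
        return 1
    return 2


def order_labels(labels: List[ContextualLabel]) -> List[ContextualLabel]:
    """Order labels by priority set, keeping the input order within each set."""
    return sorted(labels, key=priority_of)  # sorted() is stable
```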
- the one or more neural networks are at least partially trained on data manually labelled by a human user.
- the present disclosure provides for a method of and a system for digital image composition, the system executing the method comprising: receiving a primary image asset comprising a plurality of areas of interest; automatically identifying first and second ones of the areas of interest to include in a composite image; receiving a secondary image asset comprising one or more features of interest; automatically identifying a first one of the features of interest to include in the composite image; and generating the composite image by combining at least a portion of the primary image asset that includes the first and second areas of interest with at least a portion of the secondary image asset that includes the first feature of interest, wherein the combining comprises compositing the at least a portion of the secondary image asset and the at least a portion of the primary image asset.
- automatically identifying the first area of interest comprises: extracting, by way of one or more neural networks, characteristics of the primary image asset, the characteristics representative of visually flat surfaces located at a particular location in the primary image asset; generating a confidence value for each of the characteristics of the primary image asset; determining a best characteristic, wherein the best characteristic comprises a highest one of the confidence values; and labelling, by way of the one or more neural networks, the particular location associated with the best characteristic as the first area of interest.
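The selection step above (score each extracted characteristic, then label the location of the highest-confidence one as the area of interest) can be sketched as follows. The `Characteristic` type and its fields are hypothetical; in the disclosure the characteristics and confidence values come from one or more neural networks, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Characteristic:
    description: str                      # e.g. "flat surface" (illustrative)
    location: Tuple[int, int, int, int]   # assumed (x, y, width, height) in pixels
    confidence: float                     # assumed to lie in [0, 1]


def select_area_of_interest(candidates: List[Characteristic]) -> Characteristic:
    """Return the candidate with the highest confidence value."""
    if not candidates:
        raise ValueError("no characteristics were extracted")
    return max(candidates, key=lambda c: c.confidence)
```

With ties, `max` returns the first candidate encountered, so the extraction order acts as the tie-breaker in this sketch.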
- automatically identifying the second area of interest comprises: extracting, by way of one or more neural networks, characteristics of the primary image asset, the characteristics indicative of a particular location in the primary image asset; generating a confidence value for each of the characteristics of the primary image asset; determining a best characteristic, wherein the best characteristic comprises a highest one of the confidence values; labelling, by way of the one or more neural networks, the particular location associated with the best characteristic as the first area of interest.
- the characteristics of the primary image asset include one or more of the following: common objects visually present within the primary image asset; text visually present within the primary image asset; categorical data representative of the scene as presented in the primary image asset; and audio data comprising recognizable words or speech provided with the primary image asset.
- automatically identifying the first feature of interest comprises automatic logo identification as provided by one or more neural networks.
- creating the composite image further comprises manipulating dimensions of the composited image assets to match a predetermined output dimension.
- the primary image asset is indicative of a digital video asset comprising a series of image assets, wherein the method is programmatically repeated for each image asset of the series.
- each series of image assets is extracted from a digital video asset by: receiving the digital video asset; processing, by way of one or more neural networks, pixels of the digital video asset; identifying, by the one or more neural networks, a first shot boundary of the digital video asset and a second shot boundary of the digital video asset; extracting one or more video frames located between the first shot boundary and the second shot boundary of the digital video asset; and generating a series of image assets from the one or more video frames as extracted.
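Once shot boundaries are known, extracting the frames that lie between consecutive boundaries amounts to slicing the frame sequence. The sketch below assumes boundaries are given as frame indices at which a new shot starts; that convention and the function name are illustrative and not taken from the disclosure.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_shots(frames: Sequence[T], boundaries: List[int]) -> List[List[T]]:
    """Split frames into per-shot series.

    Assumption: each boundary is the index of the first frame of a new shot.
    Out-of-range and duplicate boundaries are ignored.
    """
    inner = sorted({b for b in boundaries if 0 < b < len(frames)})
    cuts = [0] + inner + [len(frames)]
    return [list(frames[start:end]) for start, end in zip(cuts, cuts[1:]) if start < end]
```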
- the present disclosure provides for a method of and a system for digital image composition, the system executing the method comprising: receiving as input a base layer image and an insert image; identifying a base layer area in the base layer image for placing the insert image; creating an insert layer image having dimensions corresponding to dimensions of the base layer image, wherein the insert layer image comprises the insert image placed within an insert layer area in the insert layer image corresponding to the base layer area; and combining the base layer image and the insert layer image to form a composite image.
- the base layer image comprises a frame of a video.
- the base layer area comprises a surface of an object depicted in the base layer image.
- the insert layer image further comprises a transparent area surrounding the insert layer area.
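The insert layer described above (same dimensions as the base layer, insert image placed in the insert layer area, transparent elsewhere) and its combination with the base layer can be sketched with a standard alpha "over" operation. The nested-list image representation, the function names, and the use of straight (non-premultiplied) alpha are assumptions for this example; the disclosure does not specify a blending formula.

```python
from typing import List, Tuple

RGBA = Tuple[float, float, float, float]  # channels in [0, 1], straight alpha
Image = List[List[RGBA]]                  # rows of pixels

TRANSPARENT: RGBA = (0.0, 0.0, 0.0, 0.0)


def make_insert_layer(base: Image, insert: Image, top: int, left: int) -> Image:
    """Build a layer sized like `base`, transparent except where `insert` is placed."""
    height, width = len(base), len(base[0])
    if top < 0 or left < 0 or top + len(insert) > height or left + len(insert[0]) > width:
        raise ValueError("insert image does not fit inside the base layer")
    layer = [[TRANSPARENT for _ in range(width)] for _ in range(height)]
    for r, row in enumerate(insert):
        for c, pixel in enumerate(row):
            layer[top + r][left + c] = pixel
    return layer


def over(fg: RGBA, bg: RGBA) -> RGBA:
    """Composite one foreground pixel over one background pixel."""
    fa, ba = fg[3], bg[3]
    out_a = fa + ba * (1.0 - fa)
    if out_a == 0.0:
        return TRANSPARENT
    r, g, b = ((fg[i] * fa + bg[i] * ba * (1.0 - fa)) / out_a for i in range(3))
    return (r, g, b, out_a)


def combine(base: Image, layer: Image) -> Image:
    """Combine the base layer and the insert layer into a composite image."""
    return [[over(f, b) for f, b in zip(lrow, brow)] for lrow, brow in zip(layer, base)]
```

Keeping the insert layer at the base layer's dimensions means every later layer can be combined pixel for pixel without tracking offsets.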
- the present disclosure further comprises modifying the insert image to fit within the base layer area.
- the present disclosure further comprises creating an alpha layer image having dimensions corresponding to dimensions of the base layer image, wherein the alpha layer image comprises a cut-out or an application area corresponding to the insert layer area for applying additional effect layers thereto.
- the present disclosure further comprises determining that a first object depicted in the base layer image appears closer than a second object within the base layer area, and wherein the combining comprises depicting at least a portion of the first object in front of the insert image in the composite image.
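One way to picture the occlusion handling above is with a per-pixel depth comparison: wherever the base layer shows something closer than the surface receiving the insert, the base pixel stays in front. This is a sketch under strong assumptions. The disclosure does not say how relative closeness is determined, so the depth map, the single `surface_depth` value, and the grayscale images here are all hypothetical.

```python
from typing import List, Optional

Gray = List[List[float]]


def composite_with_occlusion(
    base: Gray,
    insert_layer: List[List[Optional[float]]],  # None marks transparent pixels
    depth: Gray,                                # assumed: smaller value = closer to camera
    surface_depth: float,                       # assumed depth of the placement surface
) -> Gray:
    """Place the insert on the surface while keeping closer objects in front."""
    out = []
    for base_row, insert_row, depth_row in zip(base, insert_layer, depth):
        row = []
        for base_px, insert_px, d in zip(base_row, insert_row, depth_row):
            occluded = d < surface_depth  # a closer object covers this pixel
            row.append(base_px if insert_px is None or occluded else insert_px)
        out.append(row)
    return out
```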
- the present disclosure further comprises creating a shadow layer image comprising one or more shadows of one or more objects depicted in the base layer image, wherein the one or more shadows are disposed within the insert layer area.
- the combining comprises blending the shadow layer image with the composite image.
- the present disclosure further comprises creating a reflection layer image comprising one or more reflections of one or more objects depicted in the base layer image, wherein the one or more reflections are disposed within the insert layer area.
- the combining comprises blending the reflection layer image with the composite image.
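The shadow and reflection blending steps above could take many forms. As a minimal sketch, the block below darkens the composite by a shadow strength map and mixes in a reflection layer linearly. Both blend formulas, the grayscale representation, and the `strength` parameter are assumptions chosen for simplicity, not the blending the disclosure prescribes.

```python
from typing import List, Optional

Gray = List[List[float]]


def blend_shadow(composite: Gray, shadow: Gray) -> Gray:
    """Darken each pixel by its shadow strength (0 = no shadow, 1 = fully dark)."""
    return [
        [px * (1.0 - s) for px, s in zip(prow, srow)]
        for prow, srow in zip(composite, shadow)
    ]


def blend_reflection(
    composite: Gray,
    reflection: List[List[Optional[float]]],  # None where there is no reflection
    strength: float = 0.2,
) -> Gray:
    """Linearly mix reflected content into the composite where it is present."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return [
        [px if r is None else (1.0 - strength) * px + strength * r
         for px, r in zip(prow, rrow)]
        for prow, rrow in zip(composite, reflection)
    ]
```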
- the present disclosure further comprises adding motion blur to the insert image within the insert layer area to simulate motion over a period of time.
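Motion blur is commonly approximated by averaging pixels along the direction of motion. The sketch below applies a horizontal box filter to a grayscale image. The horizontal direction, the box kernel, and the edge handling (averaging only the pixels that exist) are assumptions made for illustration rather than details from the disclosure.

```python
from typing import List

Gray = List[List[float]]


def motion_blur_row(row: List[float], length: int) -> List[float]:
    """Average each pixel with its horizontal neighbours over `length` pixels."""
    if length < 1 or length % 2 == 0:
        raise ValueError("length must be a positive odd number")
    half = length // 2
    blurred = []
    for i in range(len(row)):
        window = row[max(0, i - half): min(len(row), i + half + 1)]
        blurred.append(sum(window) / len(window))
    return blurred


def motion_blur(image: Gray, length: int) -> Gray:
    """Apply the horizontal blur to every row of the insert image."""
    return [motion_blur_row(row, length) for row in image]
```

A longer kernel simulates faster motion or a longer exposure; a length of 1 leaves the image unchanged.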
- the present disclosure further comprises adding depth of field blur to the insert image within the insert layer area to simulate a difference in focus.
- the present disclosure further comprises generating respective composite images for a sequence of base layer images corresponding to frames in a video using the insert image.
- merely intended to teach a person of skill in the art further details for practicing aspects of the present teachings and is not intended to limit the scope of the claims. Therefore, combinations of features disclosed above in the detailed description may not be necessary to practice the teachings in the broadest sense, and are instead taught merely to describe particularly representative examples of the present teachings.
- Figure 1 illustrates a flowchart of the main components of the present disclosure, according to some embodiments.
- Figure 2 illustrates an automated placement opportunity identification engine, according to some embodiments.
- Figure 3 illustrates GUI elements of a Placement Insertion Interface (PII) system, according to some embodiments.
- Figure 4 illustrates a flowchart detailing the methods performable by a placement video clip tool, according to some embodiments.
- Figure 5A illustrates various creative graphic fit placements, according to some embodiments.
- Figure 5B illustrates an exemplary area selection optimization procedure, according to some embodiments.
- Figure 5C illustrates an exemplary area selection optimization procedure, according to some embodiments.
- Figure 6 illustrates the events that precede and succeed an automated compositing service, according to some embodiments.
- Figure 7 illustrates an on-top composite logic process, according to some embodiments.
- Figure 8 illustrates an on-top composite logic, according to some embodiments.
- Figure 9 illustrates graphic insertion compositing logic, according to some embodiments.
- Figure 10 illustrates an exemplary insertion of a motion blur effect, according to some embodiments.
- Figure 11 illustrates an automated compositing service, according to some embodiments.
- machine learning or neural network systems may be beneficial to the implementation of automated placement opportunity identification engines.
- disclosed herein are exemplary embodiments of systems and methods for facilitating an automated placement opportunity identification engine using machine learning.
- the system may actively employ numerous machine learning methods, including neural networks, working in tandem to process input data and identify placement opportunities within digital media.
- a neural network may be used as a pre-processing mechanism for other neural networks.
- Figure 1 illustrates a flowchart of the main components of the present disclosure presented for demonstrative purposes only, according to some embodiments.
- the main components of the present disclosure may include content analysis for placement identification at 102.
- placement identification may include identifying a placement video for placement opportunities as described below.
- the main components of the present disclosure may further include selecting a graphic at 104.
- Graphic selection 104 may include selecting a pre-uploaded or previously available graphic for compositing into a placement video. Graphic selection 104 may further include uploading a new graphic by way of a Graphical User Interface displayed to a user. Graphic selection 104 allows a user to select which graphic is desired for compositing.
- the main components of the present disclosure may further include manipulating the desired graphic in order to best fit the placement video at 106.
- This process may include manipulation of the graphic by a programmatic process or manually adjusted in order to alter the rotation, skew, and/or color of said graphic to more closely resemble the placement video, according to some embodiments.
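Rotation and skew adjustments of the kind mentioned above are typically expressed as affine transforms applied to the graphic's corner points. The sketch below applies a horizontal shear followed by a rotation about the origin. The order of operations, the parameter names, and the choice of a horizontal-only skew are assumptions for this example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotate_and_skew(points: List[Point], angle_degrees: float, skew_x: float) -> List[Point]:
    """Shear points horizontally by `skew_x`, then rotate them about the origin."""
    theta = math.radians(angle_degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    transformed = []
    for x, y in points:
        sheared_x = x + skew_x * y  # horizontal shear; y is unchanged
        transformed.append((cos_t * sheared_x - sin_t * y,
                            sin_t * sheared_x + cos_t * y))
    return transformed
```

Transforming only the four corners of a graphic is enough to define where it lands; the pixels in between can then be resampled by an image library.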
- Some embodiments may further include manipulating the graphic using one or more compositing or combination procedures. These procedures may be used to generate a manipulated graphic based on a combination of graphics, logos, texts, or other creatives provided or otherwise indicated by the user. Alternatively, these procedures may generate the manipulated graphic according to instructions determined or otherwise calculated by the system without instruction from a user.
- the main components of the present disclosure may further include compositing the manipulated graphic onto a placement location of the placement video at 108.
- compositing procedure at 108 may include a predetermined, programmatic methodology or automated process as indicated in Figure 1.
- the main components of the present disclosure may further include displaying for the user a preview of the manipulated graphic, composited onto the placement location of the placement video at 110.
- Preview procedure at 110 may include generating a graphical user interface that displays for the user a generated preview, according to some embodiments.
- the main components of the present disclosure may further include delivering to the user a final output video comprising the manipulated graphic composited onto the placement location therein as shown at 112.
- Delivery procedure at 112 may include delivering the final output video by way of a communication protocol designed for file transfer, such as the IP protocol suite (e.g., TCP, UDP, FTP), or any other digital delivery method.
- Compositing images onto digital media may be implemented through numerous steps as provided by the present system.
- the first step in order to implement the present system involves an Automated Placement Opportunity Identification engine.
- the Automated Placement Opportunity Identification engine may use one or more machine learning algorithms to identify placement opportunities within digital media.
- placement opportunities may include flat surfaces such as billboards, walls, sides of buildings, tables and desks, counter tops and bars, screens (e.g., digital screens, computer screens, monitors, etc.), signage, and/or posters.
- FIG. 2 illustrates an automated placement opportunity identification engine, according to some embodiments.
- the Automated Placement Opportunity Identification engine may receive a digital media dataset at 202, according to some embodiments. In order to identify the boundary (e.g., cuts, dissolves, fades) of a single shot, the Automated Placement Opportunity Identification engine may rapidly preprocess the digital media using a
- the shot boundary detection mechanism may utilize a pretrained neural network model that receives as input the pixels of digital media and outputs final shot boundaries therefrom.
- This neural network may be fully convolutional in time, allowing it to use a large temporal context without continuously processing frames. More information regarding such a shot boundary detection mechanism is described in "Ridiculously Fast Shot Boundary Detection with Fully Convolutional Neural Networks" (Gygli, Michael, May 23, 2017), which is hereby incorporated by reference.
- One or more neural networks may also be used to label the context of each shot at
- the context recognition engine may implement one or more various neural networks (pre-trained or otherwise) to identify the context of a scene, environment, location, or other data used to describe the context of a particular media.
- the contextual labels may be used as input to one or more neural networks to identify objects of interest, including placement opportunities, at 208.
- Placement opportunities may include object type recognition (e.g., cars, computers, food, beverage, etc.), scene/contextual recognition (e.g., office, outdoors, mountains, home, kitchen, etc.), audio/speech recognition and categorization (e.g., subject of conversation/dialogue, keyword mapping, full transcriptions, etc.), and sensitive content (e.g., violence, nudity, alcohol, illicit drugs, etc.), according to some embodiments.
- the Automated Placement Opportunity Identification engine may be implemented using various techniques.
- The Automated Placement Opportunity Identification engine may incorporate pre-trained neural networks that have been trained using publicly available computer vision datasets (e.g., ImageNet). These neural networks may be trained using this public data to learn and identify different labels, each of which may be associated with a placement opportunity as described above.
- The Automated Placement Opportunity Identification engine may further incorporate a transformation of pre-trained neural networks to more accurately represent the intended models, according to some embodiments. Transforming pre-trained neural networks may include re-training, layer manipulation, progressive mutations, recurrent training, or any other alterations to a publicly available, pre-trained neural net model.
- the Automated Placement Opportunity Identification engine may implement a custom neural net model that is trained by humans using a manual computer vision annotation tool, according to some embodiments.
- The computer vision annotation tool may allow a user to gather images for annotation. The user may then annotate and assign labels (e.g., using bounding boxes) to areas of the gathered images that the user identifies as placement opportunities. These labels are then used to train a custom neural net model for label identification purposes.
- The Automated Placement Opportunity Identification engine may further implement an object tracking mechanism at 210, according to some embodiments.
- The object tracking mechanism may be used to match objects (e.g., placement opportunities) across various frames of a moving scene. The object tracking mechanism may further be used to identify objects across various camera angles of the same scene at 212. By estimating depth and 3D geometry from 2D frames, the object tracking mechanism may be able to identify placement opportunities, according to some embodiments.
- the Automated Placement Opportunity Identification engine may output, at 214, the digital media dataset, indications of placement opportunities, as well as the labels added thereto.
- Compositing images onto still images may require a trivial amount of work. Analyzing a still image to detect availability for placing a composite image requires analysis of only one frame of a single image. Expanding this service to media formats other than still images (e.g., video data) will benefit from further analysis and/or additional machine learning methods.
- the Automated Placement Opportunity Identification engine may use the above placement opportunity data points in at least two ways.
- A first way that the placement opportunity data points may be used is as an auditing tool for use by a human to evaluate the identified placement opportunity to determine whether or not to proceed with compositing. This may be used to reduce labor costs of analyzing video data for placement opportunities.
- A second way that the placement opportunity data points may be used is in a search query to filter through the digital media available for compositing.
- The implementation of this search query may be used to identify many aspects of a scene, including particular objects, scenery, dialogue category, presence of sensitive content, etc.
- This search query implementation may be integrated into a Placement Insertion Interface (PII) system as an inventory browsing tool.
- FIG. 3 illustrates a placement inventory browsing tool 300 of a PII system, according to some embodiments.
- a user may use the placement inventory browsing tool to browse the inventory of digital media available to receive composites.
- This inventory may be organized by highly specific, individual placements of composites.
- This inventory may also be browsed by context as identified by the Automated Placement Opportunity Identification engine.
- Some embodiments may be browsed using other features as identified by the Automated Placement Opportunity Identification engine, the features including keywords, genres, formats, etc.
- Placement inventory browsing tool 300 includes a graphical user interface that displays options for browsing through the inventory of digital media available to receive composites.
- placement inventory browsing tool 300 includes a search bar utility 302, a genre selection utility 304, and a format selection utility 306.
- Search bar utility 302 may receive keyword searches as shown in Figure 3.
- Search bar utility 302 may also be a drop-down list, radio button, or any other graphical user interface element used to receive input, according to some embodiments.
- Genre selection utility 304 may receive one or more user selections from a drop-down list as shown in Figure 3.
- Genre selection utility 304 may also be a search bar, radio button, or any other graphical user interface element used to receive input, according to some embodiments.
- Genre selection utility 304 may provide selections such as Comedy, Horror, Action, Reality, and many other genres. Further yet, according to some embodiments, format selection utility 306 may receive one or more user selections from a drop-down list as shown in Figure 3. Format selection utility 306 may also be a search bar, radio button, or any other graphical user interface element used to receive input, according to some embodiments. Format selection utility 306 may provide selections such as In-Action Six, Overlay, Brand Insertion, Product Insertion, and many other formats.
- the graphical user interface of placement inventory browsing tool 300 may further include a search button 308, shown as "GO" in Figure 3.
- button 308 may be used to activate a search query.
- button 308 may fetch the query terms as provided by the user by way of GUI elements displayed on screen, such as search bar utility 302, genre selection utility 304, and format selection utility 306, according to some embodiments.
- Activation of search button 308 may further return results 310 based on a user's selections.
- The search query as shown in Figure 3 includes a keyword search for "New York City" in search bar utility 302.
- The search query as shown in Figure 3 further includes comedy in the genre selection utility 304, and all formats in the format selection utility 306.
- the search query as shown in Figure 3 returns at least two results 310: result 310A ("Broad City") and result 310B ("Jimmy Kimmel").
- Each of the results 310 may include a preview of the clip, a placement ID number, a program title, and a supply source, according to some embodiments.
- Result 310A includes a placement ID number of 10124, a program title of "Broad City," and a supply source of "Viacom."
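- The query of Figure 3 (a keyword, a genre, and all formats) can be sketched as a simple filter over placement records. The record fields mirror the result fields listed above; the `Placement` class, the `search` function, the format assigned to placement 10124, and the two extra inventory rows are hypothetical and exist only to make the example run.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Placement:
    placement_id: int
    program_title: str
    supply_source: str
    genre: str
    fmt: str
    keywords: List[str]


def search(inventory: List[Placement], keyword: str = "",
           genre: Optional[str] = None, fmt: Optional[str] = None) -> List[Placement]:
    """Filter placements; genre=None or fmt=None means 'all'."""
    kw = keyword.lower()
    return [
        p for p in inventory
        if (not kw or any(kw in k.lower() for k in p.keywords))
        and (genre is None or p.genre == genre)
        and (fmt is None or p.fmt == fmt)
    ]


inventory = [
    # Placement ID, title, and source come from the text; the format is assumed.
    Placement(10124, "Broad City", "Viacom", "Comedy", "Brand Insertion",
              ["New York City", "street"]),
    # Invented rows, for illustration only.
    Placement(90001, "Hypothetical Drama", "Example Source", "Drama", "Overlay",
              ["New York City"]),
    Placement(90002, "Hypothetical Sitcom", "Example Source", "Comedy", "Overlay",
              ["kitchen"]),
]

# Keyword "New York City", genre Comedy, all formats.
results = search(inventory, keyword="New York City", genre="Comedy")
```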
- The graphical user interface of placement inventory browsing tool 300 may further include an upload button 312, shown as "Upload Asset" in Figure 3.
- upload button 312 may display for the user a graphical user interface whereby the user may upload his/her own digital media asset, according to some embodiments.
- the returned results in response to activation of a search button 308 may be fetched from a supply database of placements.
- the video clips as previewed in results 310 may be activated by a play button shown in the center of the video clip, according to some embodiments.
- this play button may activate a fetching protocol in which a preview video clip may be fetched from a server that hosts actual video assets of the placement video clips returned as results.
- activation of this play button may activate a preview of the video clip for the user's viewing, according to some embodiments.
- a preview of the video clip may provide completed composites previously rendered by other users. This may be done in order to show an example of how a composite looks when inserted into a particular video clip.
- FIG. 4 illustrates a placement video clip tool 400, which may allow users to set up new placement video clips (also known as pre-composited versions of a specific shot from a specific digital media content video).
- A user may be able to upload a new creative graphic, according to some embodiments.
- The user may then select a placement video clip to preview a creative graphic composited thereon as shown in Figure 4 at 404.
- a user may also be able to simply select a placement video clip to preview without uploading a new creative graphic, according to some embodiments.
- A user may upload a new creative graphic at 406 or, alternatively, select a previously uploaded creative graphic for the placement video clip at 407, such as the creative graphic uploaded at 402.
- a creative graphic may be uploaded either directly from a specific placement preview or from a distinct "upload" page. In either case, an uploaded creative can be inserted into any matching placements.
- the creative graphic may be programmatically adjusted to best fit the placement within the placement video clip.
- this programmatic adjustment may be accomplished through computer vision, permutationary rendering, or any other rendering technologies to provide one or more "best fit" options to be selected by the user.
- The user may then select one of the "best fit" options.
- the user may then edit the creative graphic by way of creative graphic editing tools for manual adjustments to more closely fit the placement at 408.
- a composited video clip will then be created and a preview rendering may be generated in order for the user to preview the composited video clip at 410.
- Figure 5A illustrates some "best fit" options that may be presented to a user as described above, according to some embodiments.
- the options may be presented in various ways by way of a user-interactive GUI, such as the GUI shown in Figure 5.
- Best fit options 500 may include a fill mode 502 and a fit mode 504, among others.
- Other best fit options may be presented to a user, such as "stretch to fit," "fit entirely," and even more advanced modes such as programmatic skewing to account for various angles presented in placement video clips.
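- One plausible reading of these modes is sketched below: "fit" scales the creative graphic uniformly until it lies entirely inside the placement, "fill" scales it uniformly until it covers the placement (cropping any overflow), and "stretch" ignores the aspect ratio. These interpretations follow common image-scaling conventions and are assumptions, as are the function name and the mode strings.

```python
from typing import Tuple


def scaled_size(src_w: float, src_h: float, box_w: float, box_h: float,
                mode: str) -> Tuple[float, float]:
    """Return the (width, height) of the creative after scaling into the placement box."""
    if mode == "stretch":
        # Ignore aspect ratio and match the box exactly.
        return float(box_w), float(box_h)
    sx, sy = box_w / src_w, box_h / src_h
    if mode == "fit":
        scale = min(sx, sy)  # whole graphic visible, possible empty margins
    elif mode == "fill":
        scale = max(sx, sy)  # box fully covered, overflow is cropped
    else:
        raise ValueError(f"unknown mode: {mode}")
    return src_w * scale, src_h * scale


# Hypothetical 400x200 creative placed into a 100x100 placement area.
fit_size = scaled_size(400, 200, 100, 100, "fit")
fill_size = scaled_size(400, 200, 100, 100, "fill")
```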
- the present technology may further recognize the best area of a creative graphic to display in a particular placement video clip.
- the present technology may include an area selection optimizer (ASO) engine, according to some embodiments.
- Area selection optimizer (ASO) engine may be used to programmatically recognize the optimal area of a creative graphic to display within the placement area of a placement video.
- ASO engine may be used to identify various features that typically indicate the focus of a graphic and may, according to some embodiments, extract such a feature for insertion into a placement video.
- ASO engine may further include logo identification, intelligent cropping, and optimal resizing.
- The ASO engine and APOI engine may implement a Gaussian, machine learning, or other computer vision algorithm to identify logos, faces, or other important features from a user's uploaded media or other media for use as a creative graphic.
- The ASO engine and APOI engine may use computer vision algorithms to analyze the pixel colors, brightness, and intensity to select a region that is a local minimum with respect to brightness, as well as large enough for placement of a creative graphic, such as a logo, text, or other overlay of interest.
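- A minimal sketch of the brightness-based region selection is given below. It slides a window of the required size over a brightness grid and keeps the window with the lowest mean brightness. For simplicity it returns the global minimum rather than every local minimum, and the function name and grid representation are assumptions.

```python
from typing import List, Optional, Tuple


def darkest_region(brightness: List[List[float]], win_h: int,
                   win_w: int) -> Optional[Tuple[int, int, float]]:
    """Return (row, col, mean) of the win_h x win_w window with the lowest mean brightness.

    Returns None if the window does not fit inside the grid.
    """
    rows, cols = len(brightness), len(brightness[0])
    best = None
    for r in range(rows - win_h + 1):
        for c in range(cols - win_w + 1):
            total = sum(
                brightness[r + dr][c + dc]
                for dr in range(win_h)
                for dc in range(win_w)
            )
            mean = total / (win_h * win_w)
            if best is None or mean < best[2]:
                best = (r, c, mean)
    return best


# Hypothetical 4x4 brightness grid with a dark 2x2 patch in the middle.
grid = [
    [200, 200, 200, 200],
    [200, 20, 30, 200],
    [200, 10, 40, 200],
    [200, 200, 200, 200],
]
region = darkest_region(grid, 2, 2)
```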
- a creative graphic provided by the user may be altered according to instructions determined by the ASO engine.
- the creative graphic may also include a combination of one or more creative graphics composited onto or otherwise combined with each other.
- ASO engine may be used to recognize the most important features or otherwise an optimal area of the creative graphic before editing (e.g., 408) the creative graphic.
- the editing process may be programmatically enabled to include the features as recognized by the ASO engine.
- ASO engine may perform analytics on a creative graphic without altering or otherwise permanently changing the creative graphic.
- the ASO engine may perform analytics on a copy of the creative graphic in order to preserve the original creative graphic file.
- A creative graphic may be repeatedly analyzed, copied, and/or manipulated for placement in an unlimited number of placement video clips. For example, if a user uploads a creative graphic for placement in a first placement video clip, the creative graphic may be copied, analyzed, and placed into the first placement video clip while a copy of the creative graphic is preserved.
- A user may then analyze and place the same creative graphic (or a copy thereof) preserved from the previous upload across any number of placement video clips in the future.
- the ASO engine may use one or more machine learning algorithms to identify important features or otherwise an optimal area of the creative graphic to include in a placement video clip. Similar to the Automated Placement Opportunity Identification engine, the machine learning algorithms as applied herein may be trained using training data provided by successful manipulation and placements of creative graphics, according to some embodiments.
- Some examples of important features identified by the ASO engine may include, but are not limited to, a face of an individual, faces of a group of individuals, a group of people more generally, a prominent object of interest provided in the creative graphic, multiple objects of interest as provided in the creative graphic, objects or people at the center of the frame or alternatively in focus as provided in the creative graphic, among others.
- Important features identifiable by the ASO engine may further include, according to some embodiments, logos, icons, emblems, marks, designs, logotype designs, or other unique symbols associated with a company, organization, group, or individual.
- FIG. 5B illustrates an exemplary area selection optimization procedure, according to some embodiments.
- Exemplary ASO procedure 508 may include receiving a creative graphic 510 to identify or otherwise extract an important feature therein.
- Creative graphic 510 may include therein one or more important features identifiable by an ASO engine.
- creative graphic 510 includes features such as buildings, street lights, and a group of people 512.
- ASO engine 514 may be trained using training data including other creative graphics with prelabeled important features.
- ASO engine 514 may receive creative graphic 510 to identify important features therein and label them accordingly. Labeling may include applying a bounding box or other notation to a portion of creative graphic 510 to indicate that an important feature may be located therein.
- ASO engine 514 may determine that group of people 512 is an important feature of creative graphic 510 and apply thereto a label 516. According to some embodiments, ASO engine 514 may extract important features from creative graphic 510 (or a copy thereof) in addition to or instead of labeling. For example, ASO engine 514 may extract an identifiable feature 520 from creative graphic 510 (or a copy thereof) by eliminating therefrom features not identified as important by ASO engine 514 (e.g., buildings and street lights), leaving only an extracted group of people as the identified important feature 520.
- Figure 5C illustrates an exemplary area selection optimization procedure, according to some embodiments.
- Exemplary ASO procedure 520 includes receiving one or more creative graphics to identify or otherwise extract a logo or icon therefrom.
- ASO procedure 520 demonstrates ASO engine 526 receiving two different creative graphics, such as bottle graphic 522 and automobile graphic 524, both of which have a logo contained
- ASO engine 526 may be the same as ASO engine 514 described in ASO procedure 508, trained using training data similar to that of ASO engine 514 along with additional training data. Alternatively, ASO engine 526 may be separate from ASO engine 514. According to some embodiments, ASO engine 526 may be trained using training data including other creative graphics with pre-labeled logos contained therein. For example, ASO engine 526 may receive bottle graphic 522 for analysis, identifying and further extracting an important feature, such as logo 528, therefrom.
- ASO engine 526 may receive a different graphic for analysis, such as automobile graphic 524, to identify and further extract an important feature, such as logo 528, therefrom.
- ASO engine 526 may extract important features (e.g., logo 528) from a creative graphic (e.g., bottle graphic 522, automobile graphic 524) irrespective of what the creative graphic displays.
- FIG. 6 illustrates the events that precede and succeed an automated compositing service, according to some embodiments.
- an HTTP request may be triggered at 602.
- This HTTP request at 602 may transmit information by way of a compositing service API.
- This information may include, but is not limited to the following data: placement ID, placement format number, creative asset ID, BG color, and video fit.
- Placement format number may include one or more of the following:
- HTTP request at 602 may be an automated scheduled job that continually checks for newly uploaded creative graphics.
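- The body of such a request might be serialized as shown in the sketch below. Only the five data items named above come from the text; the JSON encoding, the key names, the helper function, and the sample values (other than placement ID 10124) are assumptions made for illustration.

```python
import json


def build_compositing_request(placement_id: int, placement_format: int,
                              creative_asset_id: str, bg_color: str,
                              video_fit: str) -> str:
    """Serialize the fields named in the text into a JSON request body (hypothetical keys)."""
    payload = {
        "placement_id": placement_id,
        "placement_format": placement_format,
        "creative_asset_id": creative_asset_id,
        "bg_color": bg_color,
        "video_fit": video_fit,
    }
    return json.dumps(payload, sort_keys=True)


# Sample values are invented; the format number 3 has no meaning beyond this example.
body = build_compositing_request(10124, 3, "creative-001", "#000000", "fit")
```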
- the Compositing service API 604 as shown in Figure 6 may query database tables to gather more information and assets that the compositing job may need, such as those indicated or otherwise requested by HTTP post 602.
- compositing service API 604 may transmit a query request 606 to a first database table, OTT placements table 608.
- OTT placements table 608 may transmit a response 610 containing bounding box coordinates that specify the positions of video and creative assets in the composited output.
- the coordinates transmitted at response 610 may be static or otherwise dynamic for the duration of the placement video clip.
- compositing service API 604 may further transmit a query request 612 to a second database table, creative assets table 614.
- creative assets table 614 may transmit a response 616 containing a creative ID to get the public URLs of the actual creative graphics (e.g., images, GIFs, video), as well as a headline and caption.
- Compositing service API 604 may generate a compositing job using information received from responses 610 and 616, among other data. Compositing service API 604 may further transmit compositing job 618 as a queue request into queuing system 620. Compositing job 618 may contain data gathered by compositing service API 604, including one or more of: placement ID(s), format number(s), creative asset ID(s), video fit type(s), compositing variables, original content clips, and a combination thereof, among other data. Compositing variables may include, but are not limited to, bounding boxes and background colors, among others. According to some embodiments, queuing system 620 may transmit a response 622 to compositing service API 604, response 622 including a task ID and a queue time, among others.
- Data received by compositing service API 604 from responses 610, 616, and 622 may be transmitted to and stored in composite processes database table 626 for later reference or retrieval.
- Compositing job 618 may be stored at queuing system 620 until it is passed into compositing service 628, shown as a complex web of logic nodes. Compositing service 628 will be further described below.
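- The hand-off between the compositing service API and the queuing system can be sketched as an in-memory first-in, first-out queue that returns a task ID and a queue time on submission and holds each job until the compositing service requests it. The class, method names, and job fields are hypothetical; a production system would more likely use a dedicated message broker.

```python
import time
import uuid
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple


class CompositingQueue:
    """Minimal FIFO stand-in for the queuing system (illustrative only)."""

    def __init__(self) -> None:
        self._jobs: Deque[Tuple[str, Dict[str, Any]]] = deque()

    def enqueue(self, job: Dict[str, Any]) -> Dict[str, Any]:
        """Store the job and return a receipt with a task ID and a queue time."""
        task_id = uuid.uuid4().hex
        self._jobs.append((task_id, job))
        return {"task_id": task_id, "queue_time": time.time()}

    def next_job(self) -> Optional[Tuple[str, Dict[str, Any]]]:
        """Hand the oldest stored job to the compositing service, if any."""
        return self._jobs.popleft() if self._jobs else None


queue = CompositingQueue()
# Hypothetical job contents; only the field categories come from the text.
receipt = queue.enqueue({
    "placement_id": 10124,
    "format_number": 3,
    "creative_asset_id": "creative-001",
    "video_fit": "fit",
    "compositing_variables": {"bounding_box": (0, 0, 960, 540), "bg_color": "#000000"},
})
```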
- the output 630 of compositing service 628 may be a composited version of the original placement video clip, according to some embodiments.
- The name of output 630 may use a variety of naming conventions, including those based on the placement video ID.
- output 630 of compositing service 628 (e.g., composited version of the original placement video clip) may be uploaded to composite directory 632 and stored with a render ID for later use.
- The naming convention of output 630 may be used to generate an access URL 634, which may be stored at composite processes database table 626 for later reference and retrieval.
- Compositing service API 604 may query composite processes database table 626 to receive a response 632 containing access URL 634.
- compositing service API may use the access URL 634 to fetch and reuse the already composited output 630 from composite directory 632.
- the Automated Compositing service as described above may encompass generating at least four OTT formats:
- The In-Action Six format may be used to composite a second video into a small portion of the frame while a first video is shrunk into another small corner of the same frame.
- The Overlay format may be used to simply overlay a second video onto the corner of a first video.
- the Brand Insertion format may be used to realistically composite still images into a scene of a video.
- The Product Insertion format may be used to composite 3D objects into a scene of a video. Both the In-Action Six format and the Overlay format may be considered on-top compositing, while the Brand Insertion format and Product Insertion format may be considered compositing into the scene.
- Figure 7 illustrates an on-top composite process, specifically an in-action six compositing logic 700.
- a Super Bowl video stream could be used as the original content video clip 702 as shown in Figure 7.
- the original content video clip 702 may be shrunk into a smaller portion of the frame as shown as "Squeezing Back" at 704.
- the original content video clip 702 squeezes back from filling the full screen to at least a partial portion of the screen.
- original content video clip 702 is confined by the original content bounding box detailed by the compositing variables (e.g., bounding boxes, background colors, etc.) as described above.
- Creative content 708 may include one or more of the following: a creative video clip 710 and a headline & caption 712. Creative video clip 710 may be confined by a bounding box as shown in Figure 7. At 706, creative content 708 may fade onto the screen at various places and sizes, according to some embodiments. This fading process at 706 may be described as a sliding gradient from 0% opacity to 100% opacity within a predetermined time frame (e.g., 2 seconds).
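- The fade can be expressed as an opacity value that depends on elapsed time. The sketch below assumes a linear ramp over the 2-second example duration; the text does not specify the shape of the gradient, so the linear form and the function name are assumptions.

```python
def fade_opacity(t: float, start: float = 0.0, duration: float = 2.0) -> float:
    """Opacity in [0.0, 1.0] at time t (seconds) for a fade beginning at `start`."""
    if duration <= 0:
        return 1.0 if t >= start else 0.0
    # Linear ramp, clamped so opacity stays at 0% before and 100% after the fade.
    return min(1.0, max(0.0, (t - start) / duration))


# Opacity sampled at 0 s, 1 s, 2 s, and 5 s for a 2-second fade.
samples = [fade_opacity(t) for t in (0.0, 1.0, 2.0, 5.0)]
```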
- Creative content 708 may or may not be dynamic, according to some embodiments.
- Static and dynamic content may be displayed by creative content 708, according to some embodiments.
- original content video clip 702 scales back to 100% of the frame size.
- If creative content 708 is static, for example, the original content video clip 702 may scale back to 100% of the frame size after a predetermined period of time (e.g., 6 seconds).
- The compositing process used to accomplish such a scaling effect of the original content video clip 702 and the insertion of creative content 708 may be described as the on-top compositing logic. According to some embodiments, this on-top compositing logic may utilize the following elements:
- Bounding box (x, y, w, h) for a creative video clip such as creative video clip 710;
- Bounding box (x, y, w, h) for creative content such as creative content 708;
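- The "squeezing back" motion can be sketched as an interpolation between two (x, y, w, h) bounding boxes: the full frame and the target box for the original content. The linear interpolation, the 1920x1080 frame size, and the target box values are assumptions chosen for the example.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def lerp_box(start: Box, end: Box, t: float) -> Box:
    """Linearly interpolate between two bounding boxes; t is clamped to [0, 1]."""
    t = min(1.0, max(0.0, t))
    x, y, w, h = (a + (b - a) * t for a, b in zip(start, end))
    return (x, y, w, h)


# Hypothetical values: a full HD frame squeezing back to its top-left quarter.
full_frame: Box = (0.0, 0.0, 1920.0, 1080.0)
squeezed: Box = (0.0, 0.0, 960.0, 540.0)
halfway = lerp_box(full_frame, squeezed, 0.5)
```

- Running the interpolation in reverse (from `squeezed` to `full_frame`) gives the scale back to 100% of the frame size.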
- another format that uses on-top composite logic may include an overlay format.
- Figure 8 illustrates an on-top composite process, such as overlay logic 800.
- a shark week video stream could be used as the original content video clip 802 as shown in Figure 8.
- the creative content 804 is confined by the original content bounding box detailed by the compositing variables (e.g., bounding boxes, background colors, etc.) as described above.
- Creative content 804 may include a dynamic or static creative video clip (e.g., a creative GIF, video, or static image). Creative content 804 may be confined by a bounding box as shown in Figure 8.
- creative content 804 may further comprise a pre-specified background color, headlines and captions, among other information, according to some embodiments.
- Figure 8 shows creative content 804 in the lower third of the original content video clip 802. According to some embodiments, the creative content 804 fades in from 0% opacity to 100% opacity on top of the original content video clip 802.
- Overlay format logic contains the creative content composited onto the original content video clip to a single bounding box in at least some portion of the screen;
- Overlay format logic may not necessarily display all of the creative content on the screen at the same time. For example, a first creative content may dissolve in and then dissolve out. Then, after the first creative content is dissolved out, a second creative content, such as a logo, may dissolve in.
- Bounding box (x, y, w, h) for creative content such as creative content 804;
- In-scene compositing logic may be used to accomplish graphic insertion formats and product insertion formats in order to generate their respective outputs.
- Figure 9 illustrates an in-scene composite process, specifically using graphic insertion compositing logic 900.
- Graphic insertion formats may consist of compositing messaging or graphics (e.g., creative graphics) onto flat surfaces within a scene in order to create the illusion that the messaging is part of the scene that was previously filmed.
- The graphic insertion compositing service may use one or more of the following as inputs: collection of base layer images, creative graphics for inserting, collection of coordinate values for creative graphic positioning, collection of alpha layer images, collection of shadow layer images, collection of reflect layer images, collection of motion blur values, collection of depth of field blur values, among others. It is worth noting that the size of each "collection" may be directly correlated to the total number of video frames (images) in the placement video (e.g., one image/value per video frame). Each of the layered inputs is described further below.
- The first layer of compositing service 900 as executed by graphic insertion formats is base layer 902.
- Base layer 902 includes the original content for the placement video to be used as a background image, according to some embodiments.
- The second layer of compositing service 900 as executed by graphic insertion formats may be creative graphic layer 904.
- the coordinates for inserting a creative graphic may be identified through a computer vision process, such as the Automated Placement Opportunity Identification engine as described above.
- A third layer of compositing service 900 as executed by graphic insertion formats may be alpha layer 906.
- Alpha layer 906 closely resembles the original base layer 902; however, alpha layer 906 contains a "cut-out" or an application area in which a creative graphic may be inserted. The cut-out or application area may be added on top of the creative graphic layer 904 in order to generate shadows, objects, or any elements in the scene that may cover up the creative graphic.
- this layer handles characters blocking the creative graphic and illustrates the motion thereof.
- Generating an alpha layer may further include identifying measurements on a z-axis for objects displayed within the application area to determine which items or graphics are displayed by the alpha layer 906.
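- The effect of the alpha layer on a single pixel can be sketched as a mask-weighted mix of the creative graphic and the original scene. In this sketch a mask value of 1.0 means the pixel lies in the unobstructed application area, 0.0 means a scene element fully covers the creative graphic, and values in between give soft edges. The mask convention, the function name, and the pixel values are assumptions.

```python
from typing import Tuple

Pixel = Tuple[float, float, float]


def composite_pixel(base: Pixel, creative: Pixel, mask: float) -> Pixel:
    """Mix one creative-graphic pixel with the scene pixel according to the mask."""
    r, g, b = (mask * c + (1.0 - mask) * s for s, c in zip(base, creative))
    return (r, g, b)


scene = (100.0, 100.0, 100.0)      # hypothetical scene (or occluder) pixel
graphic = (200.0, 0.0, 50.0)       # hypothetical creative-graphic pixel
visible = composite_pixel(scene, graphic, 1.0)   # inside the cut-out
occluded = composite_pixel(scene, graphic, 0.0)  # covered by a scene element
soft_edge = composite_pixel(scene, graphic, 0.5)
```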
- a fourth layer of compositing service 900 as executed by graphic insertion formats may be shadow layer 908.
- Shadow layer 908 may generate realistic shadows blended into the environment of the scene. According to various embodiments, these shadows may be realistically inserted by using a multiply blend mode.
- a fifth layer of compositing service 900 as executed by graphic insertion formats may be reflect layer 910.
- Reflect layer 910 may generate reflections over the layers as described above in order to match the environment of the scene. According to various embodiments, these reflections may be realistically inserted by using a screen blend mode.
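- The multiply and screen blend modes named for the shadow and reflect layers have standard per-channel definitions when values are normalized to the range 0 to 1: multiply darkens (a white layer leaves the base unchanged), while screen lightens (a black layer leaves the base unchanged). The sketch below shows those standard formulas; the function names and sample values are illustrative.

```python
def multiply_blend(base: float, layer: float) -> float:
    """Multiply blend for one normalized channel; darkens, suited to shadows."""
    return base * layer


def screen_blend(base: float, layer: float) -> float:
    """Screen blend for one normalized channel; lightens, suited to reflections."""
    return 1.0 - (1.0 - base) * (1.0 - layer)


# A mid-gray shadow pixel halves the underlying value.
shadowed = multiply_blend(0.8, 0.5)
# A mid-gray reflection pixel lifts the underlying value toward white.
reflected = screen_blend(0.5, 0.5)
```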
- the layers as described above are combined or otherwise composited together for a single frame of the entire placement video. The layering and compositing process are performed repeatedly for each frame of a placement video clip. For example, if a 1 minute video clip has a frame rate of 30 frames per second, this layering and compositing process may be performed once per frame for a total of about 1,800 times.
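- The arithmetic in this example is 60 seconds multiplied by 30 frames per second, which gives 1,800 frames. The sketch below computes that count and shows the per-frame loop in outline; `composite_frame` is a hypothetical placeholder for the layer compositing described above, not an actual implementation of it.

```python
from typing import Any, Callable, Dict, List


def frame_count(duration_seconds: float, fps: float) -> int:
    """Number of frames in a clip of the given duration and frame rate."""
    return round(duration_seconds * fps)


def composite_clip(per_frame_layers: List[Dict[str, Any]],
                   composite_frame: Callable[[Dict[str, Any]], Any]) -> List[Any]:
    """Apply the compositing function once per frame, in order."""
    return [composite_frame(layers) for layers in per_frame_layers]


n_frames = frame_count(60, 30)

# Placeholder inputs: one dict of layers per frame; the stand-in compositor
# simply returns the frame index.
per_frame = [{"frame": i} for i in range(n_frames)]
outputs = composite_clip(per_frame, lambda layers: layers["frame"])
```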
- the compositing process may further include numerous other compositing steps.
- other compositing steps 912 may be executed on the graphic after the other layers or alterations (e.g., 902-910) have been finalized.
- Some embodiments provide for applying other compositing steps 912 before the other layers or graphics (e.g., 902-910) have been finalized or otherwise generated for application to the creative graphic.
- Other compositing steps 912 may include, but are not limited to, one or more of the following steps: motion blur effects 914, depth of field blur 916, and color correction 918.
- other compositing steps 912 may be applied during the generation or otherwise application of the layers and graphics as demonstrated through 902-910.
- one of the other compositing steps 912 may include motion blur effects 914.
- Motion blur effects 914 may be used to generate artificial camera motion blur and composites such a blur onto the creative graphic inserted at creative graphic layer 904, along with the layers composited thereabove (e.g., alpha layer 906, shadow layer 908, reflect layer 910), according to some embodiments.
- the amount of motion blur can be described as an integer or index that represents the number of samples to average together in between frames.
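- Averaging a number of samples taken between frames can be sketched on a single row of pixels. Each sample is the same row shifted by the amount the graphic would have moved at that sub-frame instant, and the blurred result is the mean of the samples. The one-dimensional setup, the whole-pixel shifts, and the function names are simplifying assumptions.

```python
from typing import List


def shift_right(row: List[float], k: int, fill: float = 0.0) -> List[float]:
    """Shift a row of pixel values right by k pixels, padding with `fill`."""
    k = max(0, min(k, len(row)))
    return [fill] * k + row[: len(row) - k]


def average_samples(samples: List[List[float]]) -> List[float]:
    """Average equally sized sample rows pixel by pixel."""
    n = len(samples)
    return [sum(values) / n for values in zip(*samples)]


# A single bright pixel moving right; 3 samples are averaged between frames.
row = [0.0, 1.0, 0.0, 0.0, 0.0]
num_samples = 3
samples = [shift_right(row, k) for k in range(num_samples)]
blurred = average_samples(samples)
```

- A larger sample count spreads the bright pixel over a longer streak, which corresponds to a stronger motion blur.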
- Another type of blur that can be generated onto the above layers is depth of field blur, which is unrelated to the motion of the camera.
- A scalar representation may be used to estimate the amount of depth of field blur that can be used to artificially blur the creative graphic layer 904, along with the layers composited thereabove.
- Depth of field blur effects 916 may be used to generate artificial depth of field blur and composite such a blur onto the creative graphic inserted at creative graphic layer 904, along with the layers composited thereabove (e.g., alpha layer 906, shadow layer 908, reflect layer 910).
- The amount of depth of field blur can be changed throughout a scene as the camera changes its focus as the scene plays out. As the depth of field changes throughout the video clip, the artificial depth of field blur will change as well. Therefore, the depth of field blur may be represented by a collection of values (one value for each video frame) rather than a single scalar for the entire video clip.
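- The per-frame collection of blur values can be sketched as a list of blur radii, one per frame, each applied to that frame's pixels. The box blur used here is a simple stand-in for whatever blur kernel an implementation would choose, and the one-dimensional rows, radii, and function name are assumptions.

```python
from typing import List


def box_blur(row: List[float], radius: int) -> List[float]:
    """Blur a row by averaging each pixel with neighbors within `radius` (edges truncated)."""
    if radius <= 0:
        return list(row)
    out = []
    for i in range(len(row)):
        lo, hi = max(0, i - radius), min(len(row), i + radius + 1)
        out.append(sum(row[lo:hi]) / (hi - lo))
    return out


# One blur value per video frame: the camera gradually loses focus.
dof_radius_per_frame = [0, 1, 2]
frames = [[0.0, 0.0, 9.0, 0.0, 0.0] for _ in dof_radius_per_frame]
blurred_frames = [box_blur(f, r) for f, r in zip(frames, dof_radius_per_frame)]
```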
- One of the compositing steps 912 includes color correction 918, according to some embodiments.
- the creative graphic can be color corrected to match the color of the scene. This correction may include color hue adjustments to any of the RGB channels, adjustments to the alpha channel, brightness adjustments, contrast adjustments, or the addition of noise or grain. These adjustments may be made uniformly across the entire creative asset or non-uniformly based on the specific color and lighting conditions of the placement.
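One simple way to illustrate a uniform color correction is to scale each RGB channel of the creative graphic so that its mean matches the mean of pixels sampled from the scene. The sketch below assumes RGB values in the range 0 to 1; the mean-matching technique and the names `channel_means` and `match_channel_means` are illustrative assumptions, not the method described in the specification.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # RGB, each channel assumed to lie in [0, 1]


def channel_means(pixels: Sequence[Pixel]) -> Tuple[float, float, float]:
    """Return the mean of each RGB channel over a non-empty set of pixels."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def match_channel_means(graphic: Sequence[Pixel],
                        scene_sample: Sequence[Pixel]) -> List[Pixel]:
    """Uniformly scale each channel of the graphic toward the scene's channel mean."""
    g = channel_means(graphic)
    s = channel_means(scene_sample)
    # A channel with zero mean cannot be rescaled, so it is left unchanged.
    gains = [s[c] / g[c] if g[c] > 0 else 1.0 for c in range(3)]
    return [
        tuple(min(1.0, max(0.0, p[c] * gains[c])) for c in range(3))
        for p in graphic
    ]
```

A non-uniform correction could apply different gains to different regions of the graphic; that variant is not shown here.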
- Figure 10 illustrates an exemplary insertion of a motion blur effect, according to some embodiments.
- a motion blur effect procedure 1000 may analyze an image before applying motion blur to a creative graphic.
- the creative graphic 1002 shows what a creative graphic may look like before a motion blur effect is applied using compositing logic.
- Creative graphic 1004 demonstrates what creative graphic 1002 would look like with motion blur effects applied using compositing logic as described above.
- motion blur effects may be determined or otherwise generated by Fast Fourier Transform calculations, variance of Laplacian kernels, focus-measure operators (e.g., gradient-based operators, Laplacian-based operators, wavelet-based operators, statistics-based operators, DCT-based operators), or Gaussian kernels, among others.
- the motion blur effect may be generated from a blur analysis of the pixels surrounding the placement.
- the blur applied to the advertisement at 1004 may be equivalent to the blur identified by a blur analysis of the pixels that make up the vehicle on which the advertisement may be placed.
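A blur analysis of the pixels surrounding a placement could, for instance, use the variance of the Laplacian, one of the focus measures listed above. The following sketch assumes a grayscale image stored as a 2-D list and a 4-neighbour Laplacian kernel; the function name `laplacian_variance` is hypothetical.

```python
from typing import Sequence


def laplacian_variance(image: Sequence[Sequence[float]]) -> float:
    """Return the variance of the 4-neighbour Laplacian over interior pixels.

    Lower values suggest a blurrier region; higher values suggest sharper edges.
    """
    h, w = len(image), len(image[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (image[y - 1][x] + image[y + 1][x]
                   + image[y][x - 1] + image[y][x + 1]
                   - 4 * image[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)
```

The measured value for the region around the placement (for example, the vehicle) could then be mapped to a blur strength for the creative graphic; that mapping is not specified in the text and is left out of the sketch.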
- depth of field blur may be generated and otherwise applied in a similar manner, wherein the pixels surrounding the placement may be analyzed for depth of field blur in order to generate a blur to apply to the creative graphic.
- the system may track the x, y coordinates of the advertisement on the vehicle for inserting the creative graphic at such x, y coordinates. Similarly, this tracking may also be used to calculate the pixel deltas between frames of a video in order to determine the apparent speed of the vehicle or otherwise the motion blur applicable to the advertisement placed thereon.
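The pixel-delta calculation can be sketched as follows, assuming the tracker yields one (x, y) position per frame. The `shutter_fraction` parameter (the fraction of the frame interval during which the shutter is open) is an assumption introduced for illustration, as are the function names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pixel_deltas(track: Sequence[Point]) -> List[Point]:
    """Return the (dx, dy) displacement between each pair of consecutive frames."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]


def apparent_speeds(track: Sequence[Point]) -> List[float]:
    """Return the apparent speed in pixels per frame for each frame transition."""
    return [math.hypot(dx, dy) for dx, dy in pixel_deltas(track)]


def blur_lengths(track: Sequence[Point], shutter_fraction: float = 0.5) -> List[float]:
    """Estimate a motion blur length in pixels for each frame transition.

    Assumes blur length is proportional to speed times the open-shutter fraction.
    """
    return [speed * shutter_fraction for speed in apparent_speeds(track)]
```

A track of `[(0, 0), (3, 4), (3, 4)]` describes a placement that moves 5 pixels and then stops, so the estimated blur falls to zero on the second transition.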
- the generation and application of the motion blur as demonstrated in motion blur procedure 1000 may be accomplished through one or more steps as described in compositing service 900 (e.g., creative graphic 904, motion blur effects 914, among others).
- FIG. 11 illustrates an automated compositing service, according to some embodiments.
- the automated compositing service 1100 receives as input at least a base image 1102 and a creative graphic 1110.
- base image 1102 may be one or more of a media dataset, such as an image, a single video frame, or multiple video frames.
- Automated compositing service 1100 may include one or more neural networks, such as a computer vision neural network 1104 and a compositing neural network 1108.
- Base image 1102 may be analyzed by computer vision neural network 1104 to determine scene parameters 1106.
- Scene parameters 1106 may include various characteristics of base image 1102, including, but not limited to, camera data, objects in the scene, context of the scene, transformations performed on the scene, light data of the scene, materials in the scene, geometry data of the scene, among other data related to base image 1102.
- the output of computer vision neural network 1104 may be used as input for compositing neural network 1108.
- compositing neural network 1108 may receive as input base image 1102, scene parameters 1106, as well as creative graphic 1110.
- Creative graphic 1110 may include an image, a logo, or a product, among other data, to be composited into the scene of base image 1102.
- Compositing neural network 1108 may generate as output a composited image 1112, which may include a copy of base image 1102 with the creative graphic 1110 composited therein.
- Composited image 1112 may be in the same format as base image 1102. For example, if base image 1102 is a series of video frames from a particular scene in a television show, composited image 1112 may include the same series of video frames with the creative graphic altered onto each frame.
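The data flow of the automated compositing service described above can be sketched with two callables standing in for the two neural networks. Everything here is a placeholder: `SceneParameters`, `composite_clip`, and the stub models are hypothetical names, and the stubs do no real image analysis.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Sequence

Frame = Any  # stand-in for an image or video frame


@dataclass
class SceneParameters:
    """A small subset of the scene characteristics named in the text."""
    camera: Dict[str, Any] = field(default_factory=dict)
    objects: List[str] = field(default_factory=list)
    lighting: Dict[str, Any] = field(default_factory=dict)


VisionModel = Callable[[Sequence[Frame]], SceneParameters]
CompositingModel = Callable[[Sequence[Frame], SceneParameters, Any], List[Frame]]


def composite_clip(base_frames: Sequence[Frame], creative_graphic: Any,
                   vision_model: VisionModel,
                   compositing_model: CompositingModel) -> List[Frame]:
    """Run the vision model, then feed its output to the compositing model."""
    scene = vision_model(base_frames)
    out = compositing_model(base_frames, scene, creative_graphic)
    # The output is expected to keep the same format as the input.
    if len(out) != len(base_frames):
        raise ValueError("composited output must have one frame per input frame")
    return out


def stub_vision_model(frames: Sequence[Frame]) -> SceneParameters:
    """Placeholder that reports a fixed object list."""
    return SceneParameters(objects=["vehicle"])


def stub_compositing_model(frames: Sequence[Frame], scene: SceneParameters,
                           graphic: Any) -> List[Frame]:
    """Placeholder that tags each frame with the graphic's name."""
    return [f"{frame}+{graphic}" for frame in frames]
```

Calling `composite_clip(["f0", "f1"], "logo", stub_vision_model, stub_compositing_model)` returns one output per input frame, mirroring the statement that the composited image keeps the format of the base image.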
- the output of compositing neural network 1108 may be used to update compositing neural network 1108.
- composited image 1112 may be compared to another composited image in order to retrain or otherwise identify improvements to be made through a reinforcement learning module 1114.
- Composited image 1112 may be analyzed using a loss/reward function of compositing neural network 1108 implemented by reinforcement learning module 1114 to identify the differences between a professionally composited image 1116 and composited image 1112.
- Professionally composited image 1116 may be generated by a human visual effects artist or otherwise previously identified as a good composite.
- Reinforcement learning module 1114 may then provide to compositing neural network 1108 instructions to shift node values using backpropagation methods in order for compositing neural network 1108 to generate an output more similar to a professionally composited image (e.g., professionally composited image 1116). Reinforcement learning module 1114 may directly instruct the backpropagation of compositing neural network 1108 or may alternatively provide the data for compositing neural network 1108 to perform its own backpropagation, according to some embodiments.
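The idea of comparing an output against a reference composite and shifting parameters to reduce the difference can be shown with a deliberately tiny example. The sketch below is plain gradient descent on a mean squared error loss with a single learnable parameter (an opacity); it is a toy stand-in and not the neural network or the reinforcement learning module described above. All names are hypothetical.

```python
from typing import List, Sequence


def composite_pixels(base: Sequence[float], graphic: Sequence[float],
                     opacity: float) -> List[float]:
    """Blend the graphic over the base with a single opacity parameter."""
    return [b * (1.0 - opacity) + g * opacity for b, g in zip(base, graphic)]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def grad_opacity(base: Sequence[float], graphic: Sequence[float],
                 reference: Sequence[float], opacity: float) -> float:
    """Analytic gradient of the MSE loss with respect to opacity."""
    out = composite_pixels(base, graphic, opacity)
    # d(out_i)/d(opacity) = graphic_i - base_i
    return sum(2.0 * (o - r) * (g - b)
               for o, r, g, b in zip(out, reference, graphic, base)) / len(base)


def fit_opacity(base: Sequence[float], graphic: Sequence[float],
                reference: Sequence[float], steps: int = 200,
                lr: float = 0.5, opacity: float = 0.0) -> float:
    """Repeatedly step the parameter against the gradient to approach the reference."""
    for _ in range(steps):
        opacity -= lr * grad_opacity(base, graphic, reference, opacity)
    return opacity
```

Here the reference plays the role of the professionally composited image: after fitting, `composite_pixels` produces an output close to it. A real network would have many parameters and would compute gradients by backpropagation rather than by a hand-written formula.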
- some or all of the processing described above can be carried out on a personal computing device, on one or more centralized computing devices, or via cloud-based processing by one or more servers.
- some types of processing occur on one device and other types of processing occur on another device.
- some or all of the data described above can be stored on a personal computing device, in data storage hosted on one or more centralized computing devices, or via cloud-based storage.
- some data are stored in one location and other data are stored in another location.
- quantum computing can be used.
- functional programming languages can be used.
- electrical memory, such as flash-based memory, can be used.
- General-purpose computers, network appliances, mobile devices, or other electronic systems may also be included in an example system implementing the processes described herein.
- a system can include a processor, a memory, a storage device, and an input/output device. Each of the components may be interconnected, for example, using a system bus.
- the processor is capable of processing instructions for execution within the system.
- the processor is a single-threaded processor.
- the processor is a multi-threaded processor.
- the processor is capable of processing instructions stored in the memory or on the storage device.
- the memory stores information within the system.
- the memory is a non-transitory computer-readable medium.
- the memory is a volatile memory unit.
- the memory is a non-volatile memory unit.
- the storage device is capable of providing mass storage for the system.
- the storage device is a non-transitory computer-readable medium.
- the storage device may include, for example, a hard disk device, an optical disk device, a solid-state drive, a flash drive, or some other large capacity storage device.
- the storage device may store long-term data (e.g., database data, file system data, etc.).
- the input/output device provides input/output operations for the system.
- the input/output device may include one or more of a network interface device, e.g., an Ethernet card, a serial communication device, e.g., an RS-232 port, and/or a wireless interface device, e.g., an 802.11 card, a 3G wireless modem, or a 4G wireless modem.
- the input/output device may include driver devices configured to receive input data and send output data to other input/output devices, e.g., keyboard, printer and display devices.
- mobile computing devices, mobile communication devices, and other devices may be used.
- At least a portion of the approaches described above may be realized by instructions that upon execution cause one or more processing devices to carry out the processes and functions described above.
- Such instructions may include, for example, interpreted instructions such as script instructions, or executable code, or other instructions stored in a non-transitory computer readable medium.
- the storage device may be implemented in a distributed way over a network, such as a server farm or a set of widely-distributed servers, or may be implemented in a single computing device.
- the program instructions can be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.
- the computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
- system may encompass all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers.
- a processing system may include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
- a processing system may include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
- a computer program (which may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
- a computer program may, but need not, correspond to a file in a file system.
- a program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code).
- a computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
- Computers suitable for the execution of a computer program can include, by way of example, general or special purpose microprocessors or both, or any other kind of central processing unit.
- a central processing unit will receive instructions and data from a read-only memory or a random access memory or both.
- a computer generally includes a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.
- a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks.
- a computer need not have such devices.
- a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few.
- Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD-ROM and DVD-ROM disks.
- the processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
- a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer.
- Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
- a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages
- Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back end, middleware, or front end components.
- the components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.
- the computing system can include clients and servers.
- a client and server are generally remote from each other and typically interact through a communication network.
- The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
- a reference to "A and/or B", when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
- “or” or “and/or” shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as “only one of” or “exactly one of,” or, when used in the claims, “consisting of,” will refer to the inclusion of exactly one element of a number or list of elements. In general, the term “or” as used shall only be interpreted as indicating exclusive alternatives (i.e.
- the phrase “at least one,” in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements.
- This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase "at least one" refers, whether related or unrelated to those elements specifically identified.
- At least one of A and B can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Marketing (AREA)
- Computer Security & Cryptography (AREA)
- Computing Systems (AREA)
- Processing Or Creating Images (AREA)
- Image Analysis (AREA)
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/983,064 US11301715B2 (en) | 2020-08-03 | 2020-08-03 | System and method for preparing digital composites for incorporating into digital visual media |
US16/984,608 US11625874B2 (en) | 2020-08-04 | 2020-08-04 | System and method for intelligently generating digital composites from user-provided graphics |
US16/986,617 US10984572B1 (en) | 2020-08-06 | 2020-08-06 | System and method for integrating realistic effects onto digital composites of digital visual media |
PCT/US2021/044374 WO2022031723A1 (en) | 2020-08-03 | 2021-08-03 | System and method for preparing digital composites for incorporating into digital visual media |
Publications (1)
Publication Number | Publication Date |
---|---|
EP4189591A1 (en) | 2023-06-07 |
Family
ID=77499939
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP21759478.7A Pending EP4189591A1 (en) | 2020-08-03 | 2021-08-03 | System and method for preparing digital composites for incorporating into digital visual media |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP4189591A1 (en) |
WO (1) | WO2022031723A1 (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9294822B2 (en) * | 2006-11-23 | 2016-03-22 | Mirriad Advertising Limited | Processing and apparatus for advertising component placement utilizing an online catalog |
US9467750B2 (en) * | 2013-05-31 | 2016-10-11 | Adobe Systems Incorporated | Placing unobtrusive overlays in video content |
US9911223B2 (en) * | 2016-05-13 | 2018-03-06 | Yahoo Holdings, Inc. | Automatic video segment selection method and apparatus |
-
2021
- 2021-08-03 WO PCT/US2021/044374 patent/WO2022031723A1/en active Application Filing
- 2021-08-03 EP EP21759478.7A patent/EP4189591A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
WO2022031723A1 (en) | 2022-02-10 |
Legal Events
Code | Title | Description |
---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
17P | Request for examination filed | Effective date: 20230223 |
AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
P01 | Opt-out of the competence of the unified patent court (upc) registered | Effective date: 20230621 |
DAV | Request for validation of the european patent (deleted) | |
DAX | Request for extension of the european patent (deleted) | |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
17Q | First examination report despatched | Effective date: 20240111 |