WO2022201151A1

WO2022201151A1 - System and method for measuring advertisements exposure in 3d computer games

Info

Publication number: WO2022201151A1
Application number: PCT/IL2022/050316
Authority: WO
Inventors: Jihad El-Sana
Original assignee: Mirage Dynamics Ltd
Priority date: 2021-03-21
Filing date: 2022-03-21
Publication date: 2022-09-29

Abstract

A system for measuring the level of exposure of players to advertisements displayed in computer games, comprising a workstation comprising at least one processor, for executing the computer game; a software module installed on the workstation being adapted to copy, at a predetermined rate, frames to be analyzed, from a frame buffer of the workstation, into a memory, such as shared memory; a computerized device comprising at least one processor, for processing and analyzing the copied frames by an independent deep learning application that runs in parallel to the game application. The deep learning application is adapted to extract features from each analyzed frame using a Convolutional Neural Network (CNN) model and localize the detection region of advertisements within each analyzed frame using a Recurrent Neural Network (RNN) model.

Description

SYSTEM AND METHOD FOR MEASURING ADVERTISEMENTS EXPOSURE IN 3D COMPUTER GAMES

Field of the Invention

The present invention relates to the field of digital advertisements using computer graphics. More particularly, the invention relates to a system and method for measuring the exposure of players to advertisements displayed in computer games (such as 3D computer games).

Background of the Invention

Computer games have been attracting the interest of many people across ages. Over the recent years, 3D photorealistic computer games have been leading the market of computer games, in terms of revenue. Such development motivated advertisement agencies to address this market.

Recently, computer games have been leveraging the opportunity to monetize their game space by injecting advertisements into the game field. For this purpose, game designers mark various spots, which are used to place personalized advertisements to be displayed to the player, within the game arena. Measuring the exposure of these advertisements has a great value for publishers. Currently, these advertisements are placed on the world coordinates of the game field, which are rendered into the screen, based on the view position and direction of the players. Players determine camera parameters, such as position and view direction, in real-time. As a result, measuring the exposure of the placed advertisements becomes a challenging task.

One conventional approach to measure the exposure is to intercept the rendering pipeline (a conceptual model that describes what steps a graphics system needs to perform to render a 3D scene to a 2D screen) and explore the view frustum to search for advertisements. This approach could be implemented either in software or in hardware. The software implementation must be integrated within the game engine and dramatically reduces the rendering speed of the 3D game, since at every frame the software should compute the view frustum, determine if there are any advertisements within that frame, and determine whether or not they are visible to the player (e.g., by identifying which polygon is the closest to the player's view frustum and its visibility).

The hardware implementation is performed by exploring the z-buffer (a type of data buffer used in computer graphics to represent depth information of objects in 3D space from a particular perspective) to determine the existence and visibility of injected advertisements. In this implementation, at each frame, the z-buffer is read to a memory and scanned to search for advertisement polygons, while halting the game for the time required to determine whether or not an examined polygon that has been identified as an advertisement is visible to the player.

However, both the software and the hardware implementations require changes within the game engine or the game software and therefore require the intervention of the game designer, to provide a specific API that allows measuring the exposure of the player to injected advertisements. They also impose additional processing time at each frame, which eventually reduces the game speed and deteriorates the game flow.

It is therefore an object of the present invention to provide a method for measuring the exposure of players to advertisements displayed in 3D computer games without altering the game engine or the game software.

It is another object of the present invention to provide a method for measuring the exposure of players to advertisements displayed in 3D computer games, which does not reduce the game speed. It is a further object of the present invention to provide a method for measuring the exposure of players to advertisements displayed in 3D computer games, which considers the time, size, and quality of the exposure.

Other objects and advantages of the invention will become apparent as the description proceeds.

Summary of the Invention

A method for measuring the level of exposure of players to advertisements displayed in computer games, such as 3D games, comprising the steps of: a) copying, at a predetermined rate, frames to be analyzed, from a frame buffer of a workstation executing the computer game, into a memory, such as a shared memory; b) processing and analyzing the copied frames by an independent deep learning application that runs in parallel to the game application, the deep learning application being adapted to: c) extract features from each analyzed frame using a Convolutional Neural Network (CNN) model; and d) localize the detection region of advertisements within each analyzed frame using a Recurrent Neural Network (RNN) model.

The deep learning application may be further adapted to: a) compute the boundaries of each detected advertisement using a quadrilateral regression model; and b) improve the accuracy of the computed boundaries using a refinement model. The level of exposure may be determined by the exposure time, the view angle and the number of pixels occupied by each advertisement in the screen space of the computer game.

The rate of copying the frame buffer into the memory may depend on the rate of the game application.

The copied frames from the frame buffer may be stored in the memory at lower resolution, compared to the game resolution.

The deep learning models may be pre-trained on an appropriate advertisement dataset and are updated regularly, using a cloud-based training process, during which additional new advertisements are continuously added to the dataset.

The method may further comprise the step of measuring changes in the orientation of the advertisement by computing a homography transformation among the detected objects across consecutive frames.

The method may further comprise the step of using a tracking model to reduce the time required for ads detection, where the tracking model includes: a) an adaptive correlation module for detecting the displacement of an advertisement in a current analyzed frame, relative to the preceding analyzed frame; b) a key-point correspondence module for calculating the location of key-points in a current frame with respect to the preceding frame; and c) a homography transformation calculation module for determining the orientation of the detected advertisement, based on the results of the adaptive correlation module and the key-point correspondence module. The method may further comprise the steps of: a) assigning a unique ID to each advertisement; and b) measuring, by a measurement module, exposure parameters that correspond to the advertisement.

The exposure parameters may include one or more of the following: the number of frames showing a particular advertisement; the screen size of the particular advertisement and its orientation.

The exposure parameters may be computed by the measurement module on the workstation that runs the game.

The exposure parameters may be computed by sending the frames to be analyzed to a remote server, which receives a frame or a set of frames that are compressed or down-sampled by the measurement module, and performs the frame analysis externally to the workstation.

A system for measuring the level of exposure of players to advertisements displayed in computer games, comprising: a) a workstation comprising at least one processor, for executing the computer game; b) a software module installed on the workstation being adapted to copy, at a predetermined rate, frames to be analyzed, from a frame buffer of the workstation, into a memory, such as shared memory; c) a computerized device comprising at least one processor, for: processing and analyzing the copied frames by an independent deep learning application that runs in parallel to the game application, the deep learning application being adapted to: extract features from each analyzed frame using a Convolutional Neural Network (CNN) model; and localize the detection region of advertisements within each analyzed frame using a Recurrent Neural Network (RNN) model.

Brief Description of the Drawings

The above and other characteristics and advantages of the invention will be better understood through the following illustrative and non-limitative detailed description of preferred embodiments thereof, with reference to the appended drawings, wherein:

Fig. 1 illustrates a flowchart of the processing steps performed by the deep learning system, implementing a measurement module, according to an embodiment of the invention; and

Fig. 2 illustrates the process carried out by the tracking module, which allows detecting how an advertisement moves from frame to frame and what the change in orientation was, according to an embodiment of the invention.

Detailed Description of the Invention

The present invention provides a method for measuring the exposure of players to advertisements displayed in 3D computer games without altering the game engine or the game software and without reducing the game speed. The performed measurement considers the time and quality of the exposure to the game player. Since the proposed method does not require any changes to the game engine or the game software itself, it can be implemented once for all games, as a service on top of (or within) the operating system level. Current rendering pipelines that include a hardware implementation store the rendering results in a frame buffer (a portion of random-access memory containing data representing all the pixels in a complete video frame), which is accessible within Microsoft DirectX (DirectX is an application program interface (API) for creating and managing graphic images and multimedia effects in applications such as games or active Web pages that will run in Microsoft's Windows operating systems) and OpenGL (Open Graphics Library is the computer industry's standard application program interface (API) for defining 2-D and 3-D graphic images) implementations. Therefore, it is possible to copy this frame buffer into the main memory and possibly accumulate the grabbed frames into a video segment.

The method proposed by the present invention utilizes computer vision to detect, locate, and track advertisements in each frame or selected frames of the 3D game. In this method, not only the exposure time is measured, but also the view angle and the number of pixels occupied by each advertisement in the screen space.
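By way of a non-limiting illustration, the number of pixels occupied by an advertisement in the screen space may be approximated from the corners of its detected quadrilateral. The following minimal sketch uses the shoelace formula; the function names `polygon_area` and `screen_coverage`, the corner coordinates and the 1920x1080 screen size are assumptions made for the example only, and occlusion by other objects is ignored.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(corners: Sequence[Point]) -> float:
    """Area, in pixels, of a simple polygon given by its screen-space corners."""
    twice_area = 0.0
    for i, (x1, y1) in enumerate(corners):
        x2, y2 = corners[(i + 1) % len(corners)]  # next corner, wrapping around
        twice_area += x1 * y2 - x2 * y1           # shoelace cross term
    return abs(twice_area) / 2.0


def screen_coverage(corners: Sequence[Point], width: int, height: int) -> float:
    """Fraction of the screen covered by the polygon (occlusion is ignored)."""
    return polygon_area(corners) / float(width * height)


# Hypothetical advertisement quadrilateral seen at an angle (a trapezoid).
ad_quad = [(100, 100), (300, 120), (300, 220), (100, 240)]
area = polygon_area(ad_quad)
coverage = screen_coverage(ad_quad, 1920, 1080)
print(area, coverage)
```

A narrow, strongly foreshortened quadrilateral yields a small area even when the advertisement is close to the camera, which is why the pixel count is reported alongside the exposure time rather than instead of it.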

Accordingly, Dynamic Link Library (DLL is a collection of small programs that larger programs can load when needed to complete specific tasks) injection techniques are utilized to intercept calls to the Direct3D (Direct3D is the Microsoft 3D application programming interface (API) component of the DirectX API package) or OpenGL APIs and copy the frame buffer into a shared memory that enables an independent application to process and analyze the copied frames. The rate of copying the frame buffer into the shared memory depends on the rate of the game.

The code intercepts each call to the SwapBuffers function (the SwapBuffers function is used to copy the contents of an off-screen buffer to an on-screen buffer; the back buffer is off-screen, and the front buffer is on-screen) and copies the previous framebuffer before the swap using, for example, the glReadPixels function (the glReadPixels function reads a block of pixels from the framebuffer). Of course, other functions may be used to read pixels from the frame buffer to the shared memory.
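The interception logic described above may be sketched, purely for illustration, as a wrapper that copies the back buffer before forwarding the call to the original swap routine. The sketch below simulates the graphics API in plain Python: `FakeRenderer` and `FrameGrabber` are hypothetical names, `read_pixels` stands in for a call such as glReadPixels, and the `every_n` parameter is an assumed way of expressing the predetermined copying rate. No actual DLL injection is performed.

```python
from typing import Callable, List


class FrameGrabber:
    """Wraps a buffer-swap call and copies the back buffer before the swap."""

    def __init__(self, read_pixels: Callable[[], bytes],
                 swap_buffers: Callable[[], None], every_n: int = 1) -> None:
        self.read_pixels = read_pixels    # stand-in for e.g. glReadPixels
        self.swap_buffers = swap_buffers  # the original, un-hooked swap call
        self.every_n = every_n            # copy one frame out of every_n
        self.frame_index = 0
        self.captured: List[bytes] = []   # stand-in for the shared memory

    def hooked_swap(self) -> None:
        if self.frame_index % self.every_n == 0:
            self.captured.append(bytes(self.read_pixels()))
        self.frame_index += 1
        self.swap_buffers()               # always forward to the real swap


class FakeRenderer:
    """Toy double-buffered renderer used only to exercise the hook."""

    def __init__(self) -> None:
        self.back = bytearray(4)
        self.front = bytearray(4)
        self.frame = 0

    def render(self) -> None:
        self.frame += 1
        self.back = bytearray([self.frame % 256] * 4)  # 4 "pixels" per frame

    def read_pixels(self) -> bytes:
        return bytes(self.back)

    def swap(self) -> None:
        self.front, self.back = self.back, self.front


renderer = FakeRenderer()
grabber = FrameGrabber(renderer.read_pixels, renderer.swap, every_n=3)
for _ in range(7):
    renderer.render()
    grabber.hooked_swap()
print(len(grabber.captured))
```

Because the hook always forwards to the original swap, the game still presents every frame; only the copying is thinned out by `every_n`.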

The intercepting process, which runs in parallel to the game application, is used to copy the frame buffer to a shared memory. Another process analyzes the intercepted images on the local workstation, or sends them to another machine for further processing, to detect and localize advertisements using deep learning (deep learning is a type of machine learning technique that teaches computers to do what comes naturally to humans: learn by example. In deep learning, a computer model learns to perform classification tasks directly from images, text, or sound. Deep learning models are trained by using a large set of labeled data and neural network architectures that contain many layers, and can achieve state-of-the-art accuracy).
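As a hedged illustration of the hand-off between the intercepting process and the analyzing process, the sketch below uses Python's standard `multiprocessing.shared_memory` module. The frame dimensions and the helper names `publish_frame` and `read_frame` are assumptions for the example; for brevity the writer and the reader run in the same process, whereas in practice the reader would attach by name from a separate process and some synchronization between the two would be needed.

```python
from multiprocessing import shared_memory

# Hypothetical tiny frame: 4x2 pixels, 3 bytes (RGB) per pixel.
WIDTH, HEIGHT, CHANNELS = 4, 2, 3
FRAME_BYTES = WIDTH * HEIGHT * CHANNELS


def publish_frame(shm: shared_memory.SharedMemory, pixels: bytes) -> None:
    """Writer side: copy one frame into the shared block."""
    shm.buf[:len(pixels)] = pixels


def read_frame(name: str, size: int) -> bytes:
    """Reader side: attach to the block by name and copy the frame out."""
    reader = shared_memory.SharedMemory(name=name)
    try:
        # The block may be larger than requested, so slice to the frame size.
        return bytes(reader.buf[:size])
    finally:
        reader.close()


writer = shared_memory.SharedMemory(create=True, size=FRAME_BYTES)
try:
    frame = bytes(range(FRAME_BYTES))      # stand-in for grabbed pixels
    publish_frame(writer, frame)
    received = read_frame(writer.name, FRAME_BYTES)
finally:
    writer.close()
    writer.unlink()                        # release the block when done

print(received == frame)
```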

In order to save computational resources such as memory and processing time (which may or may not be required, depending on the game resolution, the available memory, and processing power), the copied frames from the frame buffer are stored in the shared memory using substantially lower resolution, compared to the game resolution. Then deep learning is applied to the lower resolution frames, in order to detect advertisements.
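One possible way of storing a copied frame at lower resolution is block averaging, sketched below under simplifying assumptions: the frame is a grayscale image held as a list of rows, and the hypothetical `downsample` function and its `factor` parameter are introduced for illustration only. Other down-sampling schemes may equally be used.

```python
from typing import List

Frame = List[List[int]]


def downsample(frame: Frame, factor: int) -> Frame:
    """Reduce resolution by averaging each factor x factor block of pixels."""
    height, width = len(frame), len(frame[0])
    out: Frame = []
    # Rows and columns that do not fill a whole block are dropped.
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [frame[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


full = [
    [0, 2, 10, 10],
    [2, 4, 10, 10],
    [8, 8, 1, 3],
    [8, 8, 5, 7],
]
small = downsample(full, 2)
print(small)  # prints [[2, 10], [8, 4]]
```

With a factor of 2 the stored frame holds a quarter of the original pixels, which reduces both the shared memory footprint and the per-frame analysis cost.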

According to the present invention, the deep learning models are pre-trained on an appropriate advertisement dataset and are updated regularly, using a cloud-based training process, during which additional new advertisements are continuously added to the dataset.

In one embodiment, the deep learning system of the present invention comprises the following models:

A Convolutional Neural Network (CNN) model, for extracting features from an analyzed frame.

A Recurrent Neural Network (RNN) or a Transformers-based model (a transformer is a deep learning model that adopts the mechanism of self-attention, differentially weighting the significance of each part of the input data), for localizing the detection region within the analyzed frame.

A regression model, for determining (computing) the boundaries of each advertisement.

A refinement model, for improving the accuracy of the computed boundaries.

CNN model

CNNs are powerful image-processing artificial intelligence (AI) models that use deep learning to perform both generative and descriptive tasks, often using machine vision that includes image and video recognition, along with recommender systems and Natural Language Processing (NLP). This neural network computational model uses a variation of multilayer perceptrons (a perceptron is a simple model of a biological neuron in an artificial neural network) and contains one or more convolutional layers that can be either entirely connected or pooled. These convolutional layers create feature maps that record a region of the image, which is ultimately broken into rectangles and sent out for nonlinear processing. CNNs have their "neurons" arranged more like those of the visual cortex, the area responsible for processing visual stimuli in humans and other animals. The layers of neurons are arranged in such a way as to cover the entire visual field, avoiding the piecemeal image processing problem of traditional neural networks. A CNN uses a system much like a multilayer perceptron that has been designed for reduced processing requirements. The layers of a CNN consist of an input layer, an output layer and hidden layers that include multiple convolutional layers, pooling layers, fully connected layers and normalization layers. CNNs achieve very high accuracy in image recognition problems and can automatically detect the important features without any human supervision. Therefore, the CNN model is very effective for extracting features from an analyzed frame.

Generally, feature extraction is a part of the dimensionality reduction process, in which, an initial set of the raw data is divided and reduced to more manageable groups. The most important characteristic of these large data sets is that they have a large number of variables. These variables require a lot of computing resources to process. So, feature extraction helps to get the best feature from those big data sets by selecting and combining variables into features, thereby, effectively reducing the amount of data. These features are easy to process, but still able to describe the actual data set with accuracy and originality. The technique of extracting the features is useful when there is a large data set and it is required to reduce the number of resources without losing any important or relevant information. Feature extraction helps to reduce the amount of redundant data from the data set. The reduction of the data helps to build the model with less machine effort and also increases the speed of learning and generalization steps in the machine learning process.
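To make the notion of a feature map concrete, the following minimal sketch applies a single convolution, a ReLU non-linearity and a max-pooling step to a small grayscale image. It is an illustration only: the kernel is hand-picked (a vertical edge detector) rather than learned, the functions `conv2d`, `relu` and `max_pool` are hypothetical helpers written for the example, and, as is customary in CNNs, the "convolution" is computed without flipping the kernel.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid (no padding) sliding-window product of the kernel over the image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def relu(feature_map: Matrix) -> Matrix:
    """Non-linearity: negative responses are set to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map: Matrix, size: int = 2) -> Matrix:
    """Keep the strongest response in each size x size block."""
    return [[max(feature_map[y + i][x + j]
                 for i in range(size) for j in range(size))
             for x in range(0, len(feature_map[0]) - size + 1, size)]
            for y in range(0, len(feature_map) - size + 1, size)]


# 5x5 image with a dark-to-bright vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = relu(conv2d(image, edge_kernel))
pooled = max_pool(feature_map)
print(feature_map, pooled)
```

The feature map responds strongly where the window straddles the edge and not at all over the uniform region, and pooling then condenses those responses into a smaller representation.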

RNN model

The RNN model is used for localizing the detection region within the analyzed frame. A Recurrent Neural Network (RNN) is a type of artificial neural network which uses sequential data or time series data. These deep learning algorithms are commonly used for ordinal or temporal problems, such as language translation, natural language processing (NLP), speech recognition, and image captioning; they are incorporated into popular applications such as Siri, voice search, and Google Translate. Like feedforward and Convolutional Neural Networks (CNNs), Recurrent Neural Networks utilize training data to learn. They are distinguished by their "memory" as they take information from prior inputs to influence the current input and output. While traditional deep neural networks assume that inputs and outputs are independent of each other, the output of a recurrent neural network depends on the prior elements within the sequence. While future events would also be helpful in determining the output of a given sequence, unidirectional recurrent neural networks cannot account for these events in their predictions. An RNN saves the output of processing nodes and feeds the result back into the model, thereby learning to predict the outcome of a layer. Each node in the RNN model acts as a memory cell, continuing the computation and implementation of operations. If the network's prediction is incorrect, then the system self-learns and continues working towards the correct prediction during backpropagation. An RNN retains information through time. It is useful in time series prediction precisely because of this ability to remember previous inputs. Long Short-Term Memory (LSTM) networks are a widely used RNN variant designed to retain such information over long sequences. RNNs are even used with convolutional layers to extend the effective pixel neighborhood. By doing so, the system proposed by the present invention actually performs continuous training.
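The "memory" of a recurrent network may be illustrated by a single recurrent cell whose hidden state is fed back at each step. The sketch below is a generic, untrained example and not the localization model of the invention; the weights, the input sequence and the names `rnn_step` and `run_rnn` are assumptions chosen so that the effect of a prior input remains visible.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rnn_step(x: Vector, h_prev: Vector, w_x: Matrix, w_h: Matrix, b: Vector) -> Vector:
    """One recurrent update: h_t = tanh(W_x x_t + W_h h_(t-1) + b)."""
    return [math.tanh(sum(w_x[i][j] * x[j] for j in range(len(x)))
                      + sum(w_h[i][k] * h_prev[k] for k in range(len(h_prev)))
                      + b[i])
            for i in range(len(b))]


def run_rnn(sequence: List[Vector], w_x: Matrix, w_h: Matrix, b: Vector) -> List[Vector]:
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = [0.0] * len(b)
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        states.append(h)
    return states


# Hypothetical 1-input, 1-hidden-unit cell with hand-picked weights.
states = run_rnn([[1.0], [0.0]], w_x=[[1.0]], w_h=[[0.5]], b=[0.0])
print(states)
```

Although the second input is zero, the second hidden state is non-zero because it still carries a trace of the first input through the recurrent weight.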

Regression Model

A regression model is a statistical model that estimates the relationship between one dependent variable and one or more independent variables using a line (or a plane in the case of two or more independent variables). A regression model can be used when the dependent variable is quantitative, except in the case of logistic regression, where the dependent variable is binary. The present invention uses, for example, a quadrilateral regression model (or other polygonal shapes), for computing the boundaries of each advertisement.

Refinement model

A refinement model is used for improving the accuracy of the boundary computed by the regression model, by using a gradient search algorithm for detecting the maximal visible area polygon that corresponds to the boundaries that have been computed by the quadrilateral regression model.

Fig. 1 illustrates a flowchart of the processing steps performed by the deep learning system, implementing a measurement module, according to an embodiment of the invention. At the first step, a frame to be analyzed 101 is read from the frame buffer and fed into the CNN model 102. At the next step, the CNN model extracts features from the analyzed frame 101. At the next step, the features of the analyzed frame 101 (that have been extracted by the CNN model) are fed into a Recurrent Neural Network (RNN) model 103 that localizes the detection region within the analyzed frame 101. At the next step, the boundaries of each advertisement in the analyzed frame 101 are computed by a regression model 104, based on the data of the localized detection region. At the next step, the accuracy of the computed boundaries is improved using a refinement model. At the next step, the computed boundaries are obtained. In this example, four advertisement polygons 106a-106d (marked by solid red lines) were detected in the analyzed frame 101.
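The order in which the stages are chained may be sketched as follows. This is only a structural illustration: the four stage functions are trivial stand-ins (a bounding box of non-zero pixels replaces the trained CNN, RNN, regression and refinement models), and every name in the sketch is hypothetical.

```python
from typing import Callable, List, Tuple

Frame = List[List[int]]
Region = Tuple[int, int, int, int]        # top, left, bottom, right
Quad = Tuple[Tuple[int, int], ...]        # four (x, y) corners


def measure_frame(frame: Frame, extract: Callable, localize: Callable,
                  regress: Callable, refine: Callable) -> List[Quad]:
    features = extract(frame)             # feature extraction stage
    regions = localize(features)          # localization stage
    quads = regress(features, regions)    # boundary regression stage
    return [refine(frame, q) for q in quads]  # boundary refinement stage


# Trivial stand-ins, used only so the pipeline can be executed end to end.
def extract_features(frame: Frame) -> Frame:
    return frame


def localize(features: Frame) -> List[Region]:
    rows = [y for y, row in enumerate(features) if any(row)]
    cols = [x for row in features for x, v in enumerate(row) if v]
    return [(min(rows), min(cols), max(rows), max(cols))] if rows else []


def regress_boundaries(features: Frame, regions: List[Region]) -> List[Quad]:
    return [((l, t), (r, t), (r, b), (l, b)) for (t, l, b, r) in regions]


def refine(frame: Frame, quad: Quad) -> Quad:
    h, w = len(frame), len(frame[0])      # clamp corners to the frame
    return tuple((min(max(x, 0), w - 1), min(max(y, 0), h - 1)) for x, y in quad)


frame = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 7, 7, 7, 0],
    [0, 0, 7, 7, 7, 0],
    [0, 0, 0, 0, 0, 0],
]
detections = measure_frame(frame, extract_features, localize,
                           regress_boundaries, refine)
print(detections)
```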

In order to accelerate the detection process and compute the homography transformation (planar homography is a transformation that occurs between two planes, i.e., a mapping between two planar projections of an image. An element in one image has its projection to the other image in a homogeneous coordinate plane, retaining the same information but in a transformed perspective) among the detected objects across the frames, temporal coherency among the consecutive frames is utilized. Pixel-level accuracy may not be required for many measurements, and in this case, applying the two main components, CNN and RNN, is enough to provide such information.
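For illustration, a homography between the corners of an advertisement in two frames may be estimated from four point correspondences by solving an 8x8 linear system (the bottom-right entry of the 3x3 matrix is fixed to 1). The sketch below is a plain-Python example under assumed corner coordinates; `estimate_homography`, `apply_homography` and `solve_linear` are hypothetical helpers, and a practical system would normally use more correspondences and a robust estimator.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def estimate_homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 matrix H mapping each src point onto its dst point (four pairs)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H: List[List[float]], p: Point) -> Point:
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]   # homogeneous scale factor
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical corners of one advertisement in two consecutive analyzed frames.
src = [(0, 0), (100, 0), (100, 50), (0, 50)]
dst = [(10, 20), (110, 30), (100, 80), (5, 70)]
H = estimate_homography(src, dst)
max_error = max(abs(c - d)
                for s, t in zip(src, dst)
                for c, d in zip(apply_homography(H, s), t))
print(max_error)
```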

In order to reduce the time required for ads detection, a tracking model may be applied, which includes the following components:

1. An adaptive correlation module - is used for detecting the displacement of an advertisement in a current analyzed frame, relative to the preceding analyzed frame. This eliminates the need to perform a new search.

2. A key-point correspondence module - is used to calculate the location of key-points in a current frame with respect to the preceding frame.

3. A homography transformation calculation module - is used for determining the orientation of the detected advertisement (and changes in that orientation), based on the results of the adaptive correlation module and the key-point correspondence module.

The process carried out by the tracking module allows detecting how an advertisement moves from frame to frame and what the change in orientation was, as shown in Fig. 2. At the first step 201, the adaptive correlation module (which consists of correlation filters) detects the displacement of an advertisement in a current analyzed frame, relative to the preceding analyzed frame. At the next step 202, the key-point correspondence module calculates the location of key-points in a current frame with respect to the preceding frame, based on the results calculated in step 201. At the next step 203, the homography transformation calculation module determines the orientation of the detected advertisement, based on the results calculated in step 202. For simple measurement purposes, the tracking process may be eliminated. However, the tracking process is used to model orientation and estimate view direction, when needed.
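The displacement detection of step 201 may be illustrated, in a simplified form, by searching a small window around the previous location for the patch that correlates best with the advertisement as it appeared in the preceding frame. It should be noted that the sketch below swaps in an exhaustive normalized cross-correlation search for the adaptive correlation filters mentioned above; the function names, the search radius and the toy frames are assumptions made for the example.

```python
import math
from typing import List, Tuple

Frame = List[List[float]]


def crop(frame: Frame, top: int, left: int, h: int, w: int) -> Frame:
    return [row[left:left + w] for row in frame[top:top + h]]


def ncc(patch: Frame, template: Frame) -> float:
    """Normalized cross-correlation of two equally sized patches."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    return num / den if den else 0.0   # flat patches carry no information


def find_displacement(prev: Frame, cur: Frame, box: Tuple[int, int, int, int],
                      search: int = 2) -> Tuple[int, int]:
    """Return (dy, dx) of the best match for the boxed region of prev in cur."""
    top, left, h, w = box
    template = crop(prev, top, left, h, w)
    best = (-2.0, 0, 0)                # any real score is above -2
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > len(cur) or x + w > len(cur[0]):
                continue               # candidate window leaves the frame
            score = ncc(crop(cur, y, x, h, w), template)
            if score > best[0]:
                best = (score, dy, dx)
    return best[1], best[2]


def frame_with_ad(top: int, left: int, size: int = 8) -> Frame:
    frame = [[0.0] * size for _ in range(size)]
    for i, row in enumerate([[9.0, 5.0], [3.0, 7.0]]):   # toy 2x2 "advertisement"
        for j, v in enumerate(row):
            frame[top + i][left + j] = v
    return frame


prev_frame = frame_with_ad(2, 2)
cur_frame = frame_with_ad(3, 4)
displacement = find_displacement(prev_frame, cur_frame, box=(2, 2, 2, 2))
print(displacement)
```

Restricting the search to a small window around the previous location is what saves time relative to running a full detection on every analyzed frame.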

In a typical scenario, the Ad server sends a set of advertisements to the workstation, which embeds them within the game. A unique ID is assigned by the Ad server to each advertisement and the measurement module transfers exposure parameters, such as the number of frames showing a particular advertisement, the screen size of that particular advertisement and its duration, to the Ad server. The exposure duration is measured as the time that elapsed between the first and the last frames in which that particular advertisement has been displayed. Computing the exposure parameters may be performed by the measurement module on the workstation that runs the game. Alternatively, the measurement module sends the frames to be analyzed to a remote cloud server, which receives a frame or a set of frames that are compressed or down-sampled (down-sampling is the process of reducing the sampling rate of a signal, to thereby reduce the data rate or the size of the data) by the measurement module, and then performs the frame analysis externally to the workstation.
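The bookkeeping of exposure parameters per advertisement ID may be sketched as follows. The class and field names (`ExposureMeter`, `ExposureRecord`, `max_pixels`) and the sample observations are hypothetical; the duration is computed, as described above, as the time elapsed between the first and the last frames in which the advertisement was detected.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ExposureRecord:
    ad_id: str
    frames_seen: int = 0
    first_seen: Optional[float] = None   # timestamp of first detection (s)
    last_seen: Optional[float] = None    # timestamp of latest detection (s)
    max_pixels: float = 0.0              # largest on-screen size observed

    @property
    def duration(self) -> float:
        if self.first_seen is None or self.last_seen is None:
            return 0.0
        return self.last_seen - self.first_seen


class ExposureMeter:
    """Accumulates exposure parameters per advertisement ID."""

    def __init__(self) -> None:
        self.records: Dict[str, ExposureRecord] = {}

    def observe(self, ad_id: str, timestamp: float, pixels: float) -> None:
        rec = self.records.setdefault(ad_id, ExposureRecord(ad_id))
        rec.frames_seen += 1
        if rec.first_seen is None:
            rec.first_seen = timestamp
        rec.last_seen = timestamp
        rec.max_pixels = max(rec.max_pixels, pixels)

    def report(self) -> Dict[str, dict]:
        """Summary that could be transferred to the Ad server."""
        return {ad_id: {"frames": r.frames_seen,
                        "duration": r.duration,
                        "max_pixels": r.max_pixels}
                for ad_id, r in self.records.items()}


meter = ExposureMeter()
meter.observe("ad-1", timestamp=0.0, pixels=1000)
meter.observe("ad-1", timestamp=0.5, pixels=1500)
meter.observe("ad-2", timestamp=0.5, pixels=300)
meter.observe("ad-1", timestamp=2.0, pixels=1200)
report = meter.report()
print(report)
```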

The above examples and description have of course been provided only for the purpose of illustrations, and are not intended to limit the invention in any way. As will be appreciated by the skilled person, the invention can be carried out in a great variety of ways, employing more than one technique from those described above, all without exceeding the scope of the invention.

Claims

1. A method for measuring the level of exposure of players to advertisements displayed in computer games, comprising: a) copying, at a predetermined rate, frames, to be analyzed from a frame buffer of a workstation executing said computer game, into a memory; b) processing and analyzing the copied frames by an independent deep learning application that runs in parallel to the game application, said deep learning application being adapted to: c) extract features from each analyzed frame using a Convolutional Neural Network (CNN) model; and d) localize the detection region of advertisements within said each analyzed frame using a Recurrent Neural Network (RNN) model.

2. A method according to claim 1, wherein the deep learning application is further adapted to: a) compute the boundaries of each detected advertisement using a quadrilateral regression model; b) improve the accuracy of the computed boundaries using a refinement model.

3. A method according to claim 1, wherein the computer game is a 3D game.

4. A method according to claim 1, wherein the level of exposure is determined by the exposure time, the view angle and the number of pixels occupied by each advertisement in the screen space of the computer game.

5. A method according to claim 1, wherein the rate of copying the frame buffer into the memory depends on the rate of the game application.

6. A method according to claim 1, wherein the copied frames from the frame buffer may be stored in the memory at lower resolution, compared to the game resolution.

7. A method according to claim 1, wherein the deep learning models are pre-trained on an appropriate advertisement dataset and are updated regularly, using a cloud-based training process, during which additional new advertisements are continuously added to the dataset.

8. A method according to claim 1, further comprising measuring changes in the orientation of the advertisement by computing a homography transformation among the detected objects across consecutive frames.

9. A method according to claim 1, further comprising using a tracking model to reduce the time required for ads detection, said tracking model includes: a) an adaptive correlation module for detecting the displacement of an advertisement in a current analyzed frame, relative to the preceding analyzed frame; b) a Key-point correspondence module for calculating the location of key-points in a current frame with respect to the preceding frame; and c) a homography transformation calculation module for determining the orientation of the detected advertisement, based on the results of said adaptive correlation module and said key-point correspondence module.

10. A method according to claim 1, further comprising: a) assigning a unique ID to each advertisement; and b) measuring, by the measurements module, exposure parameters that correspond to said advertisement.

11. A method according to claim 1, wherein the exposure parameters include one or more of the following: the number of frames showing a particular advertisement; the screen size of said particular advertisement and its orientation.

12. A method according to claim 1, wherein the exposure parameters may be computed by the measurement module on the workstation that runs the game.

13. A method according to claim 1, wherein the exposure parameters may be computed by sending the frames to be analyzed to a remote server, which receives a frame or a set of frames that are compressed or down-sampled by the measurement module, and performs the frame analysis externally to the workstation.

14. A system for measuring the level of exposure of players to advertisements displayed in computer games, comprising: a) a workstation comprising at least one processor, for executing said computer game; b) a software module installed on said workstation being adapted to copy, at a predetermined rate, frames to be analyzed, from a frame buffer of said workstation, into a memory; c) a computerized device comprising at least one processor, for: processing and analyzing the copied frames by an independent deep learning application that runs in parallel to the game application, said deep learning application being adapted to: extract features from each analyzed frame using a Convolutional Neural Network (CNN) model; and localize the detection region of advertisements within said each analyzed frame using a Recurrent Neural Network (RNN) model.

15. A system according to claim 14, in which the deep learning application is further adapted to: a) compute the boundaries of each detected advertisement using a quadrilateral regression model; and b) improve the accuracy of the computed boundaries using a refinement model.

16. A system according to claim 14, in which the computer game is a 3D game.

17. A system according to claim 14, in which the level of exposure is determined by the exposure time, the view angle and the number of pixels occupied by each advertisement in the screen space of the computer game.

18. A system according to claim 14, in which the rate of copying the frame buffer into the memory depends on the rate of the game application.

19. A system according to claim 14, in which the copied frames from the frame buffer are stored in the memory at lower resolution, compared to the game resolution.

20. A system according to claim 14, in which the deep learning models are pre-trained on an appropriate advertisement dataset and are updated regularly, using a cloud-based training process, during which additional new advertisements are continuously added to the dataset.

21. A system according to claim 14, in which changes in the orientation of the advertisement are measured by computing a homography transformation among the detected objects across consecutive frames.

22. A system according to claim 14, in which a tracking model is used to reduce the time required for ads detection, said tracking model includes: a) an adaptive correlation module for detecting the displacement of an advertisement in a current analyzed frame, relative to the preceding analyzed frame; b) a key-point correspondence module for calculating the location of key-points in a current frame with respect to the preceding frame; and c) a homography transformation calculation module for determining the orientation of the detected advertisement, based on the results of said adaptive correlation module and said key-point correspondence module.

A system according to claim 14, in which a unique ID is assigned to each advertisement and used by the measurements module to measure exposure parameters that correspond to said advertisement.

23. A system according to claim 14, in which the exposure parameters include one or more of the following: the number of frames showing a particular advertisement; the screen size of said particular advertisement and its orientation.

24. A system according to claim 14, in which the exposure parameters may be computed by the measurement module on the workstation that runs the game.

25. A system according to claim 14, in which the exposure parameters may be computed by sending the frames to be analyzed to a remote server, which receives a frame or a set of frames that are compressed or down-sampled by the measurement module, and performs the frame analysis externally to the workstation.

26. A system according to claim 14, in which the memory is a shared memory.