EP4183136A1 - Smart overlay: positioning of the graphics with respect to reference points

Smart overlay: positioning of the graphics with respect to reference points

Info

Publication number
EP4183136A1
Authority
EP
European Patent Office
Prior art keywords
image
degree
area
transparency
determining
Legal status: Pending
Application number
EP21758424.2A
Other languages
German (de)
French (fr)
Inventor
Ciro Gaglione
Luigi TROIANO
Current Assignee
Sky Italia Srl
Original Assignee
Sky Italia Srl
Application filed by Sky Italia Srl
Publication of EP4183136A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234345Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker

Definitions

  • One of the objects of the present invention resides in improving the known solutions or obviating one or more of the problems present in the known solutions.
  • This object is achieved by the independent claims.
  • Advantageous embodiments are defined by the dependent claims. Further examples are provided in this text for explanatory purposes as well.
  • determining (S20), on the basis of said computational analysis, an image reference portion comprising a portion of said image, wherein said portion comprises said at least one between said at least one object present in the image and said at least one perception zone; determining (S30), on the basis of said reference portion, the positioning indication respecting a constraint indicating that the graphic element is overlapping with said image reference portion to an extent not exceeding an overlap factor.
  • Method according to the explanatory example E1 wherein performing a computational analysis on said image to determine at least one object present in the image comprises using a neural network to determine said at least one object preferably comprised in a predefined set of objects.
  • Method according to any one of the preceding explanatory examples wherein performing a computational analysis on said image to determine at least one perception zone comprises performing a computational analysis of visual attention.
  • performing a computational analysis of visual attention comprises determining a saliency map, wherein said at least one perception zone preferably comprises a section of the image at which the saliency map indicates a probability of visual perception of a user exceeding a perception probability threshold.
  • Entity according to the explanatory example E3 or E4, wherein the computational analysis of visual attention comprises determining said perception zone on the basis of characteristics of the pixels comprised in said portion.
  • determining (S30) the positioning indication comprises determining a plurality of non-reference portions represented by regions of the image not comprising the reference portion, and determining the positioning indication as a position indication in one of said non-reference portions.
  • said image reference portion comprises a portion of the image containing at least a part of said at least one determined object and at least a part of said at least one determined perception zone.
  • a processing unit (320) configured to determine, on the basis of said computational analysis, an image reference portion comprising a portion of said image, wherein said portion comprises said at least one between said at least one object present in the image and said at least one perception zone; a positioning determination unit (330) configured to determine, on the basis of said reference portion, the positioning indication respecting a constraint indicating that the graphic element is overlapping with said image reference portion to an extent not exceeding an overlap factor.
  • the processing unit (320) is configured to perform a computational analysis on said image to determine at least one object present in the image using a neural network to determine said at least one object preferably comprised in a predefined set of objects.
  • the processing unit (320) is configured to perform a computational analysis on said image to determine at least one perception zone by performing a computational analysis of visual attention.
  • the computational analysis of visual attention comprises determining a saliency map, wherein said at least one perception zone preferably comprises a section of the image at which the saliency map indicates a probability of a user's visual perception exceeding a perception probability threshold.
  • the positioning determination unit (330) is further configured to determine a plurality of non-reference portions represented by regions of the image not comprising the reference portion, and determine the positioning indication as a position indication in one of said non-reference portions.
  • E16 System comprising an entity according to any one of the explanatory examples E9 to E15, and a user device configured to display a video stream with said graphic element overlaid.
  • E17 Method according to any one of the explanatory examples E1 to E7, wherein the graphic element (Gi) comprises a first transparent element (Zi) and a second non-transparent element (Ti) which is overlapped on the first transparent element (Zi), the method comprising also the steps of: - determining (S10) an index of photometric values (IL,j) at at least one area (Aj) of an image (Ii) of said video stream;
  • Method according to one of the explanatory examples F1 to F3 wherein the video stream comprises a plurality of images, and wherein the overlaying step is performed on each of two or more images comprised in said plurality using the same degree of transparency.
  • Method according to one of the explanatory examples F1 or F4, wherein the video stream comprises a plurality of images, and wherein the step of determining an index is performed on at least two images and the overlaying step is performed on at least two images, and wherein at least one image between the at least two images related to the step of determining an index is the same as an image between the at least two images of the overlaying step, or the at least two images related to the step of determining an index are the same at least two images related to the overlaying step.
  • Method according to one of the explanatory examples F1 to F5 wherein the video stream comprises a first image and a second image following the first image within said video stream, and wherein the step of determining a degree of transparency is performed upon determining a photometric value for said second image which deviates no less than a predefined amount with respect to a photometric value determined for said first image.
  • - determining an index of photometric values comprises determining a respective index of photometric values for each of the plurality of areas
  • determining the degree of transparency of said graphic element comprises determining the degree of transparency of said first transparent element on the basis of each respective index and of the constraint, wherein the constraint comprises a constraint indicating that a degree of contrast of each respective resulting area is not lower than a predetermined contrast threshold, wherein each respective resulting area corresponds to one of said plurality of areas to which said graphic element having said degree of transparency is overlapped.
  • determining an index of photometric values at at least one area comprises determining an index of photometric values for at least one point of said area, wherein preferably said at least one point is a point representative of said area.
  • Computer program comprising instructions set up to perform, when said program is run on a computer, all the steps according to any one of the explanatory examples of method F1 to F.
  • a transparency degree determination unit configured to determine a degree of transparency of said first transparent element on the basis of said index and of a constraint, the constraint indicating that a degree of contrast of at least one resulting area is not lower than a predetermined contrast threshold, wherein the at least one resulting area corresponds to said image area to which said graphic element having said degree of transparency is overlapped;
  • an overlay unit configured to overlap, to an image of the video stream, the graphic element by applying, to the first transparent element or to the second non-transparent element, the degree of transparency determined.
  • Entity (200) for the overlay according to any one of the explanatory examples F15 to F17, in which the video stream comprises a plurality of images, and in which the overlay unit (230) is configured to overlay the graphic element on each of two or more images comprised in said plurality using the same degree of transparency.
  • Entity (200) for the overlay according to any one of the explanatory examples F15 to F18, wherein the video stream comprises a plurality of images, the photometric value determination unit (210) is configured to determine an index on at least two images and the transparency degree determination unit (220) is configured to operate on at least two images, and in which
  • at least one image among the at least two images subjected to the photometric value determination unit (210) is the same as an image among the at least two images subjected to the transparency degree determination unit (220), or the at least two images subjected to the photometric value determination unit (210) are the same at least two images subjected to the transparency degree determination unit (220).
  • Entity (200) for the overlay according to any one of the explanatory examples from F15 to F19, in which the video stream comprises a first image and a second image following the first image within said video stream, and in which the transparency degree determination unit (220) is configured to determine the degree of transparency upon determining a photometric value for said second image which deviates no less than a predefined amount with respect to a photometric value determined for said first image.
  • the photometric value determination unit (210) is configured to determine a respective index of photometric values for each of the plurality of areas;
  • the transparency degree determination unit (220) is configured to determine the transparency degree of said first transparent element on the basis of each respective index and of the constraint, wherein the constraint comprises a constraint indicating that a degree of contrast of each respective resulting area is not lower than a predetermined contrast threshold, in which each respective resulting area corresponds to one of said plurality of areas on which said graphic element having said degree of transparency is overlapped.
  • Entity (200) for the overlay according to any one of the explanatory examples F15 to F22, wherein determining an index of photometric values at at least one area comprises determining an index of photometric values for at least one point of said area, wherein preferably said at least one point is a representative point of said area.
  • Entity (200) for the overlay according to any one of the explanatory examples from F15 to F22, in which said at least one area is a portion of the image on which to overlap at least a part of the second non-transparent element.
  • Entity (200) for the overlay according to any one of the explanatory examples F15 to F26, wherein the transparency degree determination unit (220) is configured to determine the degree of transparency at a predetermined interval of frames, preferably equal to a submultiple of the transmission frequency of the video stream.
  • a photometric value determination unit (210) configured to determine an index of photometric values at at least one area of an image of said video stream;
  • a transparency degree determination unit configured to determine a degree of transparency of said first transparent element on the basis of said index and of a constraint, the constraint indicating that a degree of contrast of at least one resulting area is not lower than a predetermined contrast threshold, wherein the at least one resulting area corresponds to said image area to which said graphic element having said degree of transparency is overlapped;
  • a transmission unit configured to send, to an entity for overlaying at least one graphic element (Gi) on a video stream, the degree of transparency determined.
  • F29. System comprising an entity according to any one of the explanatory examples F15 to F28, and a user device configured to display a video stream with said graphic element overlaid.
  • F30. Method for overlaying at least one graphic element onto a video stream comprising at least one image, wherein the graphic element (Gi) comprises a first transparent element (Zi) and a second non-transparent element (Ti) overlapped on the first transparent element (Zi).
  • Figure 1 is a flowchart representing a method according to an embodiment of the present invention
  • figure 2(a) reproduces a screenshot (still image) of an image of a video stream
  • figure 2(b) schematically illustrates objects detected in the image illustrated in figure 2(a)
  • figure 2(c) illustrates an example of the starting image of figure 2(a) after having undergone an analysis aimed at determining perception zones
  • figure 2(d) illustrates an image of the video stream in which the graphic element (or even the graphics) has been positioned avoiding overlapping the reference element (represented by the areas occupied by the three players in the example)
  • figure 3 illustrates a block diagram of an entity according to an embodiment of the present invention
  • figure 4 illustrates a block diagram of a computer adapted to run a program according to an embodiment of the present invention
  • figure 6A illustrates by way of example the result of a processing carried out on the image of figure 2(a), in which the processing comprises recognising objects
  • figure 6B illustrates by way of example the result of a visual attention (saliency) analysis carried out on the image of figure 2(a), to be combined with the object recognition of figure 6A
  • Figure 7 illustrates a flow chart according to an embodiment of the present invention
  • figure 8(a) illustrates by way of example a graphic element composed of transparent and non-transparent elements
  • figure 8(b) illustrates an area comprised in an image of the video stream
  • figure 8(c) illustrates a resulting area obtained by overlapping a graphic element within the area of figure 8(b)
  • figure 8(d) illustrates the image on which the graphic element is overlapped
  • figure 9 illustrates a flow chart according to a variant of an embodiment of the present invention
  • figure 10 illustrates a block diagram according to an embodiment of the present invention
  • figure 11 illustrates a block diagram of a computer adapted to run a program according to an embodiment of the present invention
  • figure 12 illustrates a screenshot illustrating the operation of the present invention.
  • graphic elements, possibly comprising textual parts, are overlaid on the images of the video (of a television channel, of a streaming service, etc.): think for example of overlaid titles (banners) in the lower part of the screen during the transmission of a news program, of the banners overlaid at sporting events in which, for example, the statistics of the event in progress are reported, or of other banners comprising logos, optionally together with parts of text.
  • Such a graphic element, completely or at least partially overlaid on the video, hides (unless it is transparent or partially transparent) the part of the images that it overlaps; this hinders the perception of the images, especially if the dimensions of the graphics are not negligible or if the graphics are not located at the edge of the image.
  • the graphic element can sometimes be positioned at a place on the screen that is considered unobtrusive, such as the left side; upon a movement of the scene, however, this position could become obtrusive and therefore no longer ideal, such that the scene is occluded and the enjoyment of the video is made difficult.
  • the graphics could be placed in an area of the image considered of little relevance (for example static compared to other zones), where the graphics would therefore not substantially hinder the correct enjoyment of the video; however, this positioning is not easy to obtain automatically.
  • This drawback is also present in view of the fact that the characteristics of the images are not easily predictable, and can also vary substantially within a video stream.
  • this solution is based on determining (the presence of) at least one object and/or a perception zone within an image of a video stream, and then positioning the graphic element in such a way as to completely or at least partially avoid overlapping the graphics with the determined object and/or with the determined perception zone.
  • This positioning indication indicates a position at which a graphic element is to be overlaid on a video stream comprising at least one image, and can be expressed in terms of coordinates of the image at which to position the graphics, and/or in terms of an area (for example a quadrant, or an area within a grid of image areas, etc.) in which to position the image.
  • the video stream can be broadcast via a classic broadcast channel (terrestrial, satellite, etc.), and/or in streaming (using for example protocols such as HLS, DASH, etc.).
  • step S10 a computational analysis is performed on the image of the stream in order to determine at least one object present in the image and/or at least one perception zone.
  • the object is preferably a predetermined object (comprising a predetermined and/or known type of object) whose presence and preferably the position within the image is to be determined.
  • the object can be represented by a car of a specific model, or in general by a car; the computational analysis therefore determines the presence of such a car and preferably also its position within the image.
  • the determination of the object can be carried out through an object recognition technique within the image, as also explained further on by way of example.
  • the at least one perception zone represents or comprises an image zone in which the visual perception of a user is estimated to be higher than a perception probability threshold.
  • the perception zone indicates an area of the image on which, at least probabilistically speaking, the user's vision is focused.
  • the image can be considered as a set of zones (or segments, where the size and/or shape of each zone/segment is irrelevant), in which not all of them are zones in which the human perception is focused.
  • due to the way in which the human visual system is structured within an image there may be some areas on which the visual perception has a higher probability of focusing, and other areas on which the visual perception has lower probabilities (relative to the other zones just mentioned) of focusing.
  • step S10 determines those perception zones for which the probability that the vision will focus on them is higher than a certain threshold.
  • This threshold can be established empirically, for example on the basis of a panel of users specialised in viewing video images; in another example, this threshold can be chosen by attributing probability to each area, and thus setting a threshold that chooses only the first N areas, with N for example equal to 3.
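By way of illustration only, a minimal Python sketch of such a threshold- or top-N-based selection of perception zones follows; the grid representation of zones and the probability values are hypothetical assumptions, not taken from the patent text.

```python
# Hypothetical sketch: zones are grid cells keyed by (row, col); the values
# are estimated probabilities that the viewer's gaze falls on each zone.
def select_perception_zones(zone_probs, threshold=None, top_n=3):
    """Zones whose probability exceeds `threshold`, or the `top_n` most
    probable zones when no threshold is given (N = 3 as in the text)."""
    if threshold is not None:
        return [z for z, p in zone_probs.items() if p > threshold]
    return sorted(zone_probs, key=zone_probs.get, reverse=True)[:top_n]

# Example on a 3x3 grid of zones (probability values are made up).
probs = {(0, 0): 0.02, (0, 1): 0.05, (0, 2): 0.01,
         (1, 0): 0.10, (1, 1): 0.55, (1, 2): 0.15,
         (2, 0): 0.03, (2, 1): 0.06, (2, 2): 0.03}
print(select_perception_zones(probs))                 # [(1, 1), (1, 2), (1, 0)]
print(select_perception_zones(probs, threshold=0.1))  # [(1, 1), (1, 2)]
```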
  • step S20 on the basis of the computational analysis, an image reference portion comprising a portion of said image is determined.
  • This portion of the image comprises the at least one object present in the image and/or the at least one perception zone as obtained in step S10.
  • a portion of the image is determined as being a reference if at least one between the determined object or the perception zone determined in the computational analysis step falls within it.
  • the positioning indication is determined respecting a constraint indicating that the graphic element is overlapping with said image reference portion to an extent not exceeding an overlap factor.
  • the overlap factor preferably corresponds to a percentage of the area of the reference element on which the graphics are not to be overlapped; in one example, this can be zero, meaning no overlap must occur.
  • this overlap factor may indicate one third of the reference portion, thus indicating that no more than one third of the reference portion may be overlapped by the graphics. This was in fact found to be a good compromise, in which video viewing is still acceptable, and which could in fact be advantageous in the presence of large reference portions and/or of graphics with large dimensions and/or of a small reproduction screen (which may require larger graphics, for example).
  • a positioning indication for the graphics is determined such that the graphics overlap only partially, or do not overlap at all, with the reference portion of the image.
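The overlap-factor constraint described above can be sketched as follows; this is a hedged illustration in which the rectangular (x, y, width, height) representation of the portions is an assumption of the sketch, not of the patent.

```python
def overlap_fraction(graphic, reference):
    """Fraction of the reference portion covered by the graphics.
    Both are axis-aligned rectangles given as (x, y, width, height)."""
    gx, gy, gw, gh = graphic
    rx, ry, rw, rh = reference
    ix = max(0.0, min(gx + gw, rx + rw) - max(gx, rx))
    iy = max(0.0, min(gy + gh, ry + rh) - max(gy, ry))
    return (ix * iy) / float(rw * rh)

def placement_allowed(graphic, reference, overlap_factor=1/3):
    """The constraint of step S30: the graphics may cover at most
    `overlap_factor` of the reference portion (0 forbids any overlap)."""
    return overlap_fraction(graphic, reference) <= overlap_factor
```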
  • the graphics can be positioned in the immediately following image or in an image a few frames later, depending on the computing power available (in theory, the graphics can already be inserted in the same image on which the analyses were carried out, given huge computing resources, especially for live events). Furthermore, the graphics can be overlaid by the same device that performed the analysis on the video image, or by another device as illustrated below.
  • The above refers to an image reference portion which comprises a single image portion; however, it is conceivable to consider a reference portion that contains a plurality of portions, for each of which the presence of an object and/or a perception zone has been determined (this case can occur, for example, in the presence of objects and/or perception zones detected in different portions or segments of the image). In this case, therefore, the graphics will only partially overlap the areas comprised in such plurality, or will not overlap any of the areas of such plurality.
  • the weight for a certain portion can be determined on the basis of the determined object and/or of the perception zone determined for that portion.
  • a number M of portions having a higher weight can be chosen, in which case the graphics will be partially overlapped or not overlapping only with respect to these M portions.
  • the above can be illustrated by way of example by referring to an image taken from a football sporting event, see for example the screenshot (still image) reproduced in figure 2A in which three players indicated with 52A, 54A and 56A are visible.
  • the portions 52B-56B have a rectangular shape and are such that the identified object (the football player in the example) is entirely comprised in each portion; however, other forms are conceivable.
  • the portion size can be such that it comprises an area around the identified object.
  • the size and the shape of the portion may be such as to exactly follow the shape of the identified object.
  • the portion may be such as to enclose a substantial part (e.g. 75%, preferably 90%) of the shape of the identified object.
  • Figure 2C (6C) illustrates an example in which the whole shape displayed on the screen and detected is entirely comprised in the respective portion; however, this is not indispensable, since in fact it is possible to enclose even only a part of the shape in the portion.
  • Once the portions 52B, 54B and 56B, which are portions not to be covered or to be covered only partially, have been identified, it is possible to determine the remaining portions of the screen as portions in which to position the graphics (also simply called free portions).
  • a positioning indication is then automatically determined indicating where to position the graphics, in figure 2(D) represented by textual graphics 58 positioned in the upper right part of the image so as not to interfere with the display of the detected players.
  • Among the free portions, it is possible to choose one that is large enough to contain the graphics, and/or to use positioning rules such as giving priority to free areas located at the corners, or giving priority to free areas located on the sides (right and left, top and bottom, with or without priority between them), or accepting that one or more certain objects are partially occluded if there is no sufficiently large free area, etc.; a minimal selection sketch follows below. Since the whole is determined automatically, it is possible to position the graphics on the full screen without hindering the viewing of the video even when the visual content of the image changes, thus making full-screen enjoyment easily usable.
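A minimal sketch of such positioning rules, assuming rectangular free portions and a corner-priority rule (both assumptions made for illustration; the patent admits other rules, such as side priority):

```python
def fits(portion, graphic_size):
    """True when the (x, y, w, h) portion can contain the graphics."""
    _, _, pw, ph = portion
    gw, gh = graphic_size
    return pw >= gw and ph >= gh

def corner_distance(portion, image_size):
    """Distance of the portion's centre from the nearest image corner;
    smaller values are preferred (corner-priority rule)."""
    px, py, pw, ph = portion
    iw, ih = image_size
    cx, cy = px + pw / 2, py + ph / 2
    return min(cx, iw - cx) + min(cy, ih - cy)

def choose_free_portion(free_portions, graphic_size, image_size):
    """Among the free (non-reference) portions, pick one large enough for
    the graphics, preferring those closest to a corner; None means the
    caller should shrink the graphics or accept a partial overlap."""
    candidates = [p for p in free_portions if fits(p, graphic_size)]
    if not candidates:
        return None
    return min(candidates, key=lambda p: corner_distance(p, image_size))
```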
  • the computational analysis carried out in step S10 can optionally comprise recognizing and/or classifying one or more objects present in the image (in the example of the figures, the three objects) and, again optionally, determining a saliency map in parallel (the saliency map, if determined, does not necessarily have to be determined in parallel with the other operations; a parallel determination, however, would allow a more efficient overall operation).
  • the information described can then be used jointly in order to determine which areas are occupied by visual attention points of the scene. The joint use of this information makes it possible to identify with increased accuracy a reference portion of the image on which the graphics are not to be overlapped, or on which to partially overlap them.
  • the computational analysis determines at least one perception zone; referring below for further examples and explanations of this type of analysis, an example includes calculating the contrast between an image zone and areas immediately surrounding it. In fact, due to how the human visual perceptual system is physiologically structured, these zones represent zones that are likely to be focused by the user.
  • It is possible to obtain an attention map based on the contrast of an area; in other words, it is possible to obtain three different portions 52C, 54C and 56C (as shown in figure 2C), each characterized by a zone (in the example, the zone inside the players' contours) having a strong contrast with respect to immediately adjacent areas; the map (which in this example represents a strong contrast) can be established with respect to a contrast threshold, the value of which can be established empirically (through tests on panels of users trained in video viewing) or through known techniques, for example by considering the statistical distribution of the contrast values.
  • the attention map can be limited to the objects of interest recognized in the image (i.e. it is not necessary to determine it for the whole image, but only for one or more portions of the image).
  • the three players 52A, 54A and 56A (figure 6A) are recognized and classified.
  • the piece of information thus obtained can be combined with the attention map as shown in figure 6B.
  • the information related to the object recognition (as in the example of figure 6A) can be combined with information related to saliency (as shown in figure 6B) to determine a portion (or more reference portions).
  • the object recognition information is combined with saliency information to determine the portions of the image on which the graphics should not be overlaid (or should be overlaid only partially).
  • three portions 52C- 56C corresponding to the football players 52A-56A of figure 2A, respectively, are identified.
  • each of the methods shown can lead to a result of detecting the reference portion (and finally of positioning) which is the same or at least similar.
  • each of these techniques alone does not lead to optimal results, as recognized by the inventors by analysing different types of images.
  • the analysis on the objects detects the presence of the two characters.
  • the analysis of the perception zone could lead to determining, for example, three perception zones.
  • the portion of the screen containing both (at least) a detected object and (at least) a detected perception zone can be determined as the reference portion.
  • the reference portion comprises an image portion in which an area occupied by a detected object (e.g. determined in step S10) is at least partially (preferably completely) overlapped with a detected perception zone (e.g. determined in step S10).
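A hedged sketch of this combination, assuming that both the detected objects and the perception zones are available as (x, y, width, height) boxes (the box representation is an assumption of the sketch):

```python
def intersection(a, b):
    """Intersection of two (x, y, w, h) boxes, or None if disjoint."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2 = min(a[0] + a[2], b[0] + b[2])
    y2 = min(a[1] + a[3], b[1] + b[3])
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2 - x1, y2 - y1)

def reference_portions(object_boxes, perception_zones):
    """Keep only the detected objects that also overlap a perception zone,
    i.e. the objects the user is estimated to be actually looking at."""
    return [obj for obj in object_boxes
            if any(intersection(obj, z) for z in perception_zones)]
```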
  • an object present in the image is not always observed, that is, a detected object is not always the one on which the user is focusing his vision.
  • the combination of the (detected) object and saliency map helps identify those objects on which the vision is focusing, that is, that the user is actually looking at.
  • the graphics should not be overlaid on these.
  • the combination of information on the detected object and on the perception zones also allows establishing a priority among the zones not to be occluded with the graphics, wherein this priority can be useful when deciding where to position the graphics or how to size the graphics (for example so as not to occlude only the zone having higher priority, or only a certain number of zones with higher priority).
  • this reference portion is the one on which the user is focusing.
  • the graphics are then positioned in view of this reference portion, and it is therefore possible to position the graphics on the full screen without hindering the viewing of the video, in a particularly accurate way, even when the visual content of the image changes, thus making full-screen enjoyment easily usable with high accuracy in positioning the graphics.
  • the predefined set of objects contains a football, a football goal, a representation of a football player with a ball nearby, etc.
  • the set for example comprises the half court net, a racket, etc.
  • the set for example comprises cars, tools, furniture parts, etc. or even the face of some of the characters/actors (for example by defining a set of objects for each film, or a set of objects representing a certain number of actors or characters).
  • the set simply detects that there is at least one object whose presence and preferably the position within an image is to be determined.
  • the set can be omitted or made to coincide with the single object.
  • the step S10 of performing a computational analysis on an image to determine at least one object present in the image comprises using a neural network to determine the object.
  • the object is a predefined object, for example comprised in a predefined set of objects (in a set of predefined objects).
  • the neural network is preferably trained on the basis of a dataset comprising a number of images, for each of which at least one object is predefined; in other words, for each of these images it is known which object (comprising a specific object or type of object) is contained therein and preferably in which position or region of the image it is located (the same applies in the case where an image of the dataset contains a plurality of predefined objects).
  • the dataset refers to images comprising predetermined objects belonging to a set. Then the network is trained to recognize at least one object based on the images comprised in such a dataset.
  • the dataset is described as a collection of images by way of example, other representations of the dataset are possible as long as they allow the training described.
  • the neural network comprises a Fully Convolutional Neural Network (FCNN) type neural network. Other types of neural networks can be used, for example based on deep learning. It should be noted that the recognition of an object within an image does not necessarily have to be obtained by means of a neural network; in fact, other computational methods based on kernel and shape recognition are available (see for example Object Detection and Recognition in Digital Images: Theory and Practice, B. Cyganek, Wiley, 2013).
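The patent leaves the detector open (an FCNN is mentioned as one option among others). Purely as an illustrative sketch, a pretrained off-the-shelf detector restricted to a predefined set of objects could be used as follows; the choice of torchvision's Faster R-CNN and of the COCO label set are assumptions of this sketch, not the patent's method.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

# Assumed setup: COCO label 1 ("person") stands for the players of figure 2;
# any detector returning labelled boxes with scores would serve equally well.
PREDEFINED_SET = {1}
model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def detect_objects(pil_image, score_threshold=0.8):
    """Objects of the predefined set found in the image, as (x, y, w, h)."""
    with torch.no_grad():
        out = model([to_tensor(pil_image)])[0]
    boxes = []
    for box, label, score in zip(out["boxes"], out["labels"], out["scores"]):
        if float(score) >= score_threshold and int(label) in PREDEFINED_SET:
            x1, y1, x2, y2 = box.tolist()
            boxes.append((x1, y1, x2 - x1, y2 - y1))
    return boxes
```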
  • the computational analysis on said image to determine at least one perception zone comprises performing a computational analysis of visual attention.
  • the perception zone refers to a zone on which the visual system focuses due to the way the human visual apparatus is physiologically structured, and can be determined through computational attention techniques; a computational analysis of visual attention outputs a portion of an image such that, with a certain degree of probability, a human observer has focused on and observed that portion among the other portions of the same image (attention therefore refers to the fact that the user's gaze is estimated to be on that portion of the image).
  • the attention analysis comprises the use of a saliency map, as also discussed below, but is not limited to it; see e.g. "Computational attention systems" in "Computational Visual Attention", S. Frintrop et al.
  • the computational analysis of visual attention comprises determining a saliency map; in this case, the detected perception zone preferably comprises a section of the image at which the saliency map indicates a probability of a user's visual perception exceeding a perception probability threshold.
  • Figures 2C, 6B and 2D are examples of images processed to obtain saliency maps; in these figures, the saliency is visually represented by zones of strong contrast, specifically by (substantially) white zones with respect to neighbouring dark pixels.
  • a saliency map can be represented as the contrast values of the pixels of a certain area (or as the average value of the pixels of that area) with respect to neighbouring pixels; among these areas of the saliency map, those having a value exceeding a certain threshold can be chosen and established as representing the image reference portion, i.e. the one on which the user is most likely focusing.
  • the visual attention map represents the probability of interest for the eye, which is linked to the metric of the human visual system; in fact, the human visual system has limited capabilities in the perception of images, which is why it processes areas of an image with different priorities.
  • the computational analysis makes an estimate of the areas on which a human observer focuses his attention, that is, on which the human observer focuses with priority.
  • the analysis is based, for example, on the measurement of contrast between two neighbouring areas and/or on the colour of a certain area compared to neighbouring areas.
  • limited and uniform zones are considered to be perception zones.
  • a small, uniform, high-contrast zone with respect to the neighbouring ones, and with a certain colour is considered as an area that a human observer focuses with priority over other areas of the same image; a high value can therefore be assigned to this area, for example on a predetermined scale which represents that this area is probably an area on which the user's vision is focused.
  • Parameters considered for the determination of zones of attention comprise, for example, the contrast, the type of colour mostly perceived by the eye, etc.
  • the computational analysis of visual attention comprises determining the perception zone on the basis of characteristics of the pixels comprised in a determined region of the image, preferably of the pixels comprised in the portion itself.
  • the characteristics comprise, for example, for one or more pixels of the image, the contrast, the colour, the grey scale corresponding to the colour of a pixel, etc.
  • the computational analysis comprises determining a contrast between a sample region of the image (a set of one or more pixels chosen as a sample from the image) and a region immediately next to it (a set of one or more pixels adjacent to the pixels chosen).
  • this operation is repeated on a plurality of sample regions of the image; among these, the region having the highest contrast is determined as the perception zone and therefore comprised in the image reference portion.
  • two or more sample regions having a contrast higher than a certain contrast threshold are determined as attention zones and comprised in the image reference portion.
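A minimal sketch of this contrast-based determination, assuming a grey-level image stored as a 2-D NumPy array and square sample regions (both assumptions made for illustration; the contrast measure and threshold value are likewise illustrative):

```python
import numpy as np

def region_contrast(gray, x, y, size, ring=4):
    """Contrast between a square sample region and the pixels immediately
    surrounding it (`gray` is a 2-D luminance array in [0, 1])."""
    inner = gray[y:y + size, x:x + size]
    outer = gray[max(0, y - ring):y + size + ring,
                 max(0, x - ring):x + size + ring]
    denom = outer.size - inner.size
    if denom <= 0:
        return 0.0
    surround_mean = (outer.sum() - inner.sum()) / denom
    inner_mean = inner.mean()
    return abs(inner_mean - surround_mean) / max(inner_mean + surround_mean, 1e-6)

def perception_zones(gray, size=32, contrast_threshold=0.25):
    """Sample regions whose contrast with their surroundings exceeds the
    threshold; these are taken as zones likely to attract the gaze."""
    return [(x, y, size, size)
            for y in range(0, gray.shape[0] - size, size)
            for x in range(0, gray.shape[1] - size, size)
            if region_contrast(gray, x, y, size) > contrast_threshold]
```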
  • step S30 of determining the positioning indication comprises determining a plurality of non-reference portions represented by regions of the image not comprising the reference portion; in other words, the regions of the image not constrained by the non-overlap or partial overlap are determined.
  • These non-reference portions are also called free portions for simplicity's sake, since the graphics can be freely overlapped to them, see also what has been said above with reference to the example of figures 2(a) - (d).
  • step S30 may comprise determining the positioning indication as a position indication in one of said non-reference portions. In other words, the positioning indicator indicates in which free portion the graphics are to be overlapped.
  • This indicator can be represented by the coordinates of one or more free portions, wherein in the case of several free areas it is preferable that they are adjacent. If these free areas are not adjacent, it is possible to choose one large enough to contain the graphics; if this condition is not met (for example because no free area is large enough to contain the graphics), it is conceivable to partially overlap the graphics on the reference portion and/or to resize (shrink) the graphics. Furthermore, in the case of a plurality of free areas it is possible to position different graphic elements at two or more of these free areas.
  • the image reference portion can comprise a portion of the image containing at least a part of said at least one determined object and at least a part of said at least one determined perception zone.
  • step S10 of the computational analysis can produce as a result both at least one object detected in the image and a perception zone detected in the image.
  • the image reference portion will contain (at least) a part of this overlapping area. In this way it is possible to increase the accuracy in determining the reference portion and correspondingly the positioning indication.
  • step (S10) of performing a computational analysis can be performed at predetermined intervals, and/or upon a scene change, and/or upon a summary analysis indicating that one or more image characteristics have changed (for example, the average contrast level, the average colour of the image, etc. have varied between a first image and a second image that are successive or close to each other), and/or upon the availability of a new graphic element.
  • step (S20) of determining an image reference portion can be performed at predetermined intervals, and/or upon the determination of an object and/or a perception zone that is not present in a previous image, and/or upon the availability of a new graphic element.
  • a scene change can be detected at a scene discontinuity between a first image and a second image (preferably the one following the other, or separated by a small number of images, for example between 2 and 5).
  • the scene discontinuity can be described by means of an index which indicates the variation of the spatial and/or temporal continuity of an element and/or of a portion of an image; preferably, when there is a change beyond a predetermined threshold in the transition from one image to the following one, then it is determined that there has been a scene change.
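One possible instantiation of such a discontinuity index (an assumption of this sketch; the patent does not prescribe a specific index) is a normalised histogram distance between consecutive frames:

```python
import numpy as np

def discontinuity_index(frame_a, frame_b, bins=64):
    """Normalised histogram distance between two frames: 0 means identical
    grey-level distributions, 1 means completely disjoint ones."""
    ha, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    ha = ha / max(ha.sum(), 1)
    hb = hb / max(hb.sum(), 1)
    return 0.5 * float(np.abs(ha - hb).sum())

def scene_changed(frame_a, frame_b, threshold=0.3):
    """Scene change when the index exceeds the predetermined threshold."""
    return discontinuity_index(frame_a, frame_b) > threshold
```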
  • The terms "optionally" and/or "preferably" have been used to indicate optional variants of the method of the first embodiment. It is possible to combine two or more of these variants in any way.
  • the first embodiment is directed to a method. All of the above considerations and/or variants apply to devices and/or entities, such as an entity to determine a positioning indication for positioning a graphic element, as well as to systems comprising the entity for determining a positioning indication for positioning a graphic element, computer programs, computer program support, signals and other examples and embodiments as also illustrated hereinafter. Where different details are omitted for the sake of brevity, all the above remarks apply equally and/or correspondingly to what follows and vice versa.
  • the entity comprises an analysis unit (310), a processing unit (320) and a positioning determination unit (330). It should be noted that each of these units can be realized by means of any combination of software and/or hardware, distributed on several devices or localized on a single device. Furthermore, each unit comprises any processor capable of performing respective operations; moreover, although the three units are described separately, they can be realized within a single component, of which each unit represents the hardware and/or software resources necessary to implement them.
  • the video stream can be provided as input (see "IN” in figure 3) to the entity in any video format, whether compressed or not.
  • the analysis unit (310) performs a computational analysis on the image to determine at least one object present in the image and/or at least one perception zone.
  • the processing unit (320) determines, on the basis of the computational analysis (or, in other words, on the basis of the result or the output of the computational analysis), an image reference portion comprising a portion of the image, wherein said portion comprises the at least one object present in the image and/or the at least one perception zone as obtained by means of the analysis.
  • the positioning determination unit (330) determines, on the basis of the reference portion, the positioning indication respecting a constraint indicating that the graphic element is overlapping with said image reference portion to an extent not exceeding an overlap factor. In other words, the positioning determination unit (330) determines the positioning indication so that the graphic element does not overlap or only partially overlaps the image reference portion.
  • the positioning indication can be provided to another device (e.g. a server, or a device in the video stream distribution chain, etc.), which then takes care of positioning the graphic element on the video stream and of broadcasting this video stream with the overlaid graphics encoded in the video stream to one or more users (in the case of broadcasting or multicasting; similar considerations apply to video on demand streams, or sent to a single user who requests them, since the invention is equally applicable to such cases).
  • the positioning indication is sent to the user device (or to several user devices); the user device will then position the graphic element on the basis of this indication. This example refers to the case in which the graphic rendering can be performed directly by the user device, whereas the preceding case refers to the case in which a device remote from the user's device inserts the overlaid graphics into the broadcast video (i.e. the broadcast video, for example in MPEG format, comprises the graphics already overlaid; it is also provided for the video and the graphics to be broadcast separately as two video streams to be overlaid without rendering on the user side). It is also conceivable to perform the graphic rendering on the server side, and to send the video, the graphic element and the positioning indication in a transport stream (i.e. within a connection to the user); the user device will then overlay graphics and video on the basis of the positioning information.
  • the entity 300 may comprise further units (and/or in combination with the units 310-330) configured to perform any one or any combination of the steps illustrated above, or to implement what is discussed below.
  • a system comprising an entity according to the second embodiment (for example as illustrated in figure 3), and at least one user terminal connectable to this entity via a communication network.
  • This terminal (for example a TV, smartphone, computer, tablet, etc.) displays the video stream with the graphic element overlaid.
  • the entity is connectable or connected to a plurality of terminals to which the video stream is broadcast.
  • a computer program is provided which is set up to perform, when said program is run on a computer, any combination of the steps according to any one of the methods and/or examples of the invention, and/or as set forth in this description.
  • Figure 4 illustrates a block diagram exemplifying a computer (500) capable of running the aforesaid program.
  • the computer (500) comprises a memory (530) for storing the instructions of the program and/or data necessary for the performance thereof, a processor (520) for performing the instructions and an input/output interface (510).
  • Figure 4 is illustrative and non-limiting, because the computer can be realized either in a concentrated manner in one device or in a distributed manner on several interconnected devices. For this reason, the program can be run locally on a concentrated (local) device or on a distributed one.
  • a support for supporting a computer program set up to perform, when the program is run on a computer, a step or a combination of the steps according to the method described in the first embodiment.
  • Examples of such a medium are a static and/or dynamic memory, a fixed disk or any other medium such as a CD, DVD or Blu-ray disc.
  • The medium also comprises a means capable of carrying a signal constituting the instructions, including means of cable transmission (Ethernet, optical cable, etc.) or wireless transmission (cellular, satellite, digital terrestrial, etc.).
  • graphic elements possibly comprising textual parts are overlapped to the images of the video (of a television channel, of a streaming, etc.): think for example about the case of overlaid titles (banners) in the lower part of the screen during the transmission of a news program, about the overlaid banners at sporting events.
  • the textual part is typically overlapped to an area characterized by a uniform and non-transparent colour in order to improve the visual perception thereof.
  • The overlaid graphics completely hide the part of the images on which they overlap; this hinders the perception of the images, especially if the dimensions of the graphics are not negligible or if the graphics are not located at the edge of the image.
  • a solution has been conceived concerning a graphic element comprising a first transparent element and a second non-transparent element, in which the degree of transparency of the first element is determined in such a way that the contrast between the non-transparent element and the background, consisting of the overlap of the video image and the transparent element, is such as to make the non-transparent part of the graphic element easily perceptible to the human eye.
  • the transparent element will also be called the rest element to indicate that the non-transparent element rests thereon (without limitations on the shape and/or extension of the surface).
  • the non-transparent element comprises for example a textual element and/or a logo, and in general it is that part of the graphic element for which a high degree of perception is to be obtained.
  • the degree of transparency of the rest surface is preferably determined dynamically, in particular on the basis of the characteristics of the video images, in order to guarantee a high level of perceptibility of the non-transparent element even if the photometric characteristics of the video images vary over time (the dynamism therefore lies in the variation of the degree of transparency depending on variations of the photometric characteristics between two images of the video). In this way, a comfortable viewing of the video can be obtained, which tires the eyes less, making the non-transparent elements easily perceptible without excessively hindering the vision of the images on the screen.
  • the degree of transparency is determined in such a way as to satisfy a predetermined minimum contrast threshold so as to make the non-transparent part of the graphic element better visible (in the sense of best perceptible) on the video stream.
  • With reference to figure 7, a first embodiment will now be illustrated, related to a method for overlaying at least one graphic element on a video stream.
  • the graphic element comprises a first transparent element and a second non-transparent element.
  • the second non-transparent element is overlapped on the first transparent element.
  • the graphic element can be obtained by overlapping the first transparent element pixel by pixel to the second non-transparent element, and/or by combining vector elements representing the first and second element, etc. Therefore, the overlap indicates that the non-transparent element rests on the transparent element regardless of how the graphic element is generated.
  • the transparent element and the non-transparent element can be provided distinctly and overlaid separately to the video image (first the one and then the other one, or vice versa, the overlaying order being irrelevant).
  • It does not matter how the graphic element is obtained starting from the first transparent element and the second non-transparent element, or how these are overlaid, as long as the result is an image in which the graphic element is overlaid on the video.
  • the transparent element is for example represented by an area filled with one or more colours characterized by a degree of transparency such as not to hide the part of the image underlying this area.
  • the transparent element is also referred to in the following as a rest element or rest surface since the non-transparent element rests thereon.
  • the non-transparent element is represented for example by a logo and/or by a textual element, preferably but not necessarily having a uniform colour.
  • the non-transparent element is what is to be made easily perceptible to the user, depending on the background image on which it is to be overlapped, and regardless of the meaning possibly associated therewith.
  • Figure 8(a) shows, for illustrative purposes only, a graphic element Gi composed of a background Zi having a certain transparency p and a text Ti, representing an example of the non-transparent element.
  • the text comprises the three letters "abc", but the content or meaning of the element Ti is not relevant.
  • In step S10, an index of photometric values IL,j is determined at at least one area Aj of an image Ii of the video stream.
  • Figure 8(b) shows an example of an image Ii of a video stream (thus comprising other images {..., Ii-1, Ii, Ii+1, ...}), in which IL,j indicates the index of photometric values of that area.
  • the photometric values indicate the characteristic parameters of the light radiation emitted at one or more pixels of this area; such parameters comprise, for example, colourimetric, intensity, luminance and chrominance parameters, etc.
  • the area Aj for which the photometric values are determined can be predetermined, statically and/or dynamically chosen on the basis of certain criteria, or randomly chosen; however, it is preferable that this area represents, comprises or is part of an area on which the graphic element is to be overlapped.
  • the index of photometric values can directly represent the values themselves, and/or the values corrected by means of predetermined factors, and/or an index of a scale of photometric values, etc.
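Purely as an example of one such index, the mean relative luminance of the area could be used; the Rec. 709 weighting below is an assumption of this sketch, since the patent admits any photometric index.

```python
import numpy as np

def luminance_index(rgb_area):
    """Mean relative luminance of an area (H x W x 3 array, channels in
    [0, 1]), using Rec. 709 weights as the assumed photometric index."""
    r, g, b = rgb_area[..., 0], rgb_area[..., 1], rgb_area[..., 2]
    return float((0.2126 * r + 0.7152 * g + 0.0722 * b).mean())
```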
  • the area Aj can be provided as input to the system, and/or it can be determined in various ways as explained below in an example illustrated with reference to figure 12.
  • In step S20, a degree of transparency pi of the first transparent element Zi is determined on the basis of the aforesaid index and of a constraint, wherein the constraint indicates that a degree of contrast in at least one resulting area Aj' is not lower than a predetermined contrast threshold.
  • the resulting area Aj' represents an area obtained by overlapping, on the at least one image area Aj described above, the graphic element Gi (at least the transparent element Zi comprised in Gi) to which the degree of transparency is applied.
  • the degree of transparency of the first transparent element Zi is determined in such a way that it satisfies the condition that the contrast at the resulting area is not lower than a predetermined (minimum) contrast value.
  • the degree of transparency is determined so that there is a minimum level of contrast between the graphic element, and in particular the non-transparent element, and the image of the video on which the graphic element has been overlaid (at least a part of the image to which the transparent element has been overlaid).
  • the degree of transparency can be determined by solving equations representing the photometric properties of the areas discussed or at least some of the pixels contained therein.
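A hedged sketch of one such determination, assuming relative luminance as the photometric index and a WCAG-style contrast ratio as the degree of contrast (both assumptions; the patent does not prescribe a specific contrast metric or solution method):

```python
def blended_luminance(l_area, l_rest, transparency):
    """Luminance of the rest element Zi over the image area Aj: with full
    transparency (1.0) only the image shows, with 0.0 only the rest colour."""
    return transparency * l_area + (1 - transparency) * l_rest

def contrast_ratio(l1, l2):
    """WCAG-style contrast ratio on relative luminances in [0, 1]."""
    hi, lo = max(l1, l2), min(l1, l2)
    return (hi + 0.05) / (lo + 0.05)

def max_transparency(l_text, l_rest, l_area, min_contrast=4.5, step=0.01):
    """Largest transparency degree of Zi such that the text Ti keeps at
    least `min_contrast` against the blended background (the constraint of
    step S20); None if even an opaque rest element cannot satisfy it."""
    p = 1.0
    while p >= 0.0:
        if contrast_ratio(l_text, blended_luminance(l_area, l_rest, p)) >= min_contrast:
            return p
        p -= step
    return None
```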
  • Figure 8(c) schematically illustrates the resulting area Aj', dashed in the figure, to indicate that set of pixels that fall within the starting area Aj, but which have now been modified as a consequence of the overlay of the graphic element Gi (or at least modified by the addition of Zi).
  • the resulting area Aj' is indicated as an area comprised in the area Aj; however, the area Aj' can coincide with the whole area Aj.
• the area Aj comprises the non-transparent element; preferably, also the area Aj' comprises the non-transparent element.
• the area Aj' comprises or encloses the letter "a"; however, the area Aj' could enclose any other letter comprised in the string "abc" or any combination of letters comprised therein.
• in step S30 the graphic element Gi is placed as overlay (or more briefly, overlaid) on the image Ii of the video stream, applying to the transparent element Zi the degree of transparency pi determined at step S20.
  • Figure 8(d) schematically represents an example, in which the graphic element Gi is overlaid to the image Ii; by way of illustration, the resulting area is indicated in dashed line.
  • step S30 does not necessarily have to be performed by the same device or entity that performs step S20, for example.
  • step S20 can be performed by a server device, while step S30 by a user device which renders the graphics locally; in another example, step S30 is performed through a server-user device interaction; in another example, step S20 is performed by a first server device distinct from a second server device which instead performs step S30.
  • step S30 can be omitted, or replaced by a step in which the degree of transparency as determined in step S20 is provided as output.
• since the photometric values of the video image vary from image to image, or more generally between two different images of the video stream, and since the degree of transparency is determined taking these properties of the background image (and their variation) into account, it is possible to achieve a minimum contrast that makes the graphic element easily perceptible against the current image on which it is overlaid.
  • a dynamic overlay of a graphic element on a video stream can be obtained, in which the dynamic overlay is obtained by the dynamic determination of the degree of transparency depending on the variation of a photometric index between two different images of the stream. Consequently, the perception of the graphic element is facilitated without excessively occluding the vision of the underlying image.
  • the transparent element does not necessarily have to be represented by a single colour uniformly applied to the rest surface; in fact, the level of transparency can vary in the rest plane according to the photometric index of the underlying part and/or next to the non-transparent element.
• the degree of contrast indicates a degree of contrast, within the resulting area Aj', between the non-transparent element Ti and the overlay of the first transparent element (Zi) on the image Ii of the video.
  • the degree of contrast can be represented by a contrast index between the non-transparent element and the pixels next thereto, which are the result of the overlay of the transparent part of the graphic element Gi to the image Ii.
• the degree of contrast can be represented by a contrast index between the non-transparent element and a point representative of the area Aj, as explained further on with reference to an example (in which reference will be made to an area probe as an example of the area Aj and/or of the resulting area Aj').
• the at least one area Aj (and consequently also the at least one resulting area Aj' corresponding thereto) comprises at least a part of the non-transparent element.
  • the calculation of the degree of transparency with respect to the non-transparent element is optimized.
  • the contrast can be calculated by taking into account at least a part of the non-transparent element next to the area Aj, or by simply referring to a colour representative of the non-transparent element.
  • the video stream comprises two or more images, and the image Ii mentioned above represents each of said two or more images.
  • the method illustrated above can be performed at two or more images of the video stream.
  • an index of photometric values is determined for each of the two images (for each respective area of said two images), and when there is a variation between these indices (for example higher than a certain threshold), the transparency index is determined again.
  • the video stream comprises a plurality of images
  • the step of determining an index of photometric values is optionally performed on each of two or more images comprised in the video stream.
• These two images can be consecutive or separated, for example by a certain number of images, for example equal to Nskip according to the example of figure 9 described below.
  • the photometric index can be calculated for each video image, or for some of the video images spaced apart between them by a predetermined number of images or by a random number of images, or selected on the basis of other criteria.
  • the video stream comprises a plurality of images
  • the overlaying step is optionally performed on each of two or more images comprised in the video stream using the same degree of transparency.
  • this graphic element can be applied in the same way to a plurality of successive images.
  • the number of images to which the element is applied can be predetermined, or depend on other conditions; for example, when the photometric index of a new image deviates significantly from the photometric index of one of the previous images, the degree of transparency can be calculated again, and then applied.
• there is a minimum number of images to which the same degree of transparency is applied, wherein the minimum number may depend for example on the processing capacity necessary to determine the photometric parameters and to calculate the corresponding degree of transparency.
  • the step of determining an index is applied to two or more images comprised in the video stream, and the step of determining an index is performed on at least two images and the overlaying step is performed on at least two images; in this optional variant: (a) at least one image between the at least two images related to the step of determining an index is the same as an image between the at least two images of the overlaying step, and/or (b) the at least two images related to the step of determining an index are the same at least two images related to the overlaying step.
• the index is determined for a first image to which the correspondingly calculated degree of transparency is also applied, while for a second image the respective index is calculated without necessarily determining a new degree of transparency (for example because the photometric index has not changed substantially), in which case the previously applied overlay continues to apply.
• for each image, an index is determined, hence a respective degree of transparency, and the graphic element is then overlaid with the degree of transparency relative to the respective image.
  • the video stream comprises a first image and a second image following the first image within the video stream
  • the step of determining a degree of transparency is performed upon determining a photometric value for the second image which deviates no less than a predefined amount with respect to a photometric value determined for said first image.
  • the degree of transparency is then determined again and the graphic element again overlaid with the new degree of transparency.
• the steps S20 and/or S30 of figure 7 are performed (again) at a second image whose photometric index differs (taking into account the aforesaid threshold) from the photometric index of a first image preceding it; a minimal sketch of this check is given below.
  • the overlap of the graphic element to an image of the video stream comprises overlapping the graphic element to one or more images following said image.
• the one or more images can correspond to a predefined (preset) number, for example set on the basis of the available computing resources.
• the one or more images may correspond to a variable number N, wherein the number N depends on the occurrence of a variation (beyond a certain threshold, see above) in the index of the image Ii+N with respect to the index of the image Ii.
• the at least one area of the image Ii comprises a plurality of image areas, wherein determining an index of photometric values comprises determining a respective indication (index) of photometric values for each of the plurality of areas; determining the degree of transparency of the graphic element comprises determining the degree of transparency of the first transparent element on the basis of each respective indication and of the constraint (as described above, and applied for each of such areas); the constraint therefore comprises a constraint indicating that a degree of contrast of each respective resulting area is not lower than a predetermined contrast threshold; moreover, each respective resulting area corresponds to one of the plurality of areas to which the graphic element having said degree of transparency is overlapped.
  • a photometric index is calculated for each area (also called probe or area probe in some examples below), and thus the degree of transparency for each area taking into account that the contrast of each area must be not lower than the threshold predetermined as imposed by the constraint.
  • the transparency value to be applied can therefore be determined on the basis of the calculations carried out on each area, for example taking into account the minimum contrast value determined between all areas, or an average value, etc. Below is an example in this respect.
• the transparency level in an area Aj' enclosing the letter "a" can be different from the transparency level of another area Aj' enclosing the letter "c" (this area is not illustrated for simplicity's sake).
  • different areas are chosen and evaluated, and then a transparency index is applied to the entire rest plane.
  • determining an indication of photometric values at at least one area comprises determining an index of photometric values for at least one point of said area, wherein preferably said at least one point is a point representative of that area.
  • a few pixels can be chosen at random, or one pixel for each group of M pixels.
• the representative pixel can be the one having the average colour; several representative pixels can also be chosen, for example those having the maximum and minimum values of the three photometric components, etc. (see the sketch below).
  • the at least one area is a portion of the image on which to overlap at least a part of the second non-transparent element.
• the area for which the index is determined is an area on which the non-transparent element (text and/or logo) is overlapped.
  • the predetermined contrast threshold is within a predetermined interval and/or not outside a predetermined interval of contrasts, wherein preferably said predetermined interval is defined by the values 4.5:1 and 7:1. These values can be chosen on the basis of appropriate standards or recommendations of specialized study groups (as discussed below) or determined on the basis of a panel of users (for example, based on the response of a panel of users specialized in evaluating video images).
  • the predetermined contrast threshold is equal to 4.5:1; in this case, a degree of transparency will be chosen that produces a contrast as close as possible to 4.5:1, ideally 4.5:1 (but not less, in this example).
  • the determination of the degree of transparency is carried out at a predetermined interval of frames, preferably equal to a submultiple of the transmission frequency of the video stream.
• by transmission frequency it is meant the frequency of images broadcast per unit of time, for example expressed in frames per second (fps). Having a submultiple of the transmission frequency ensures that at each instant of time t there is always a frame to be processed. Otherwise, reference should be made for example to the immediately preceding frame.
• if, for example, the streaming occurs at 60 fps and 7 fps are chosen for the processing in question (not a submultiple), then after time 0 the first processing will take place at instant 0.143 sec: this falls between frame 8 (0.133 sec) and frame 9 (0.15 sec); in that case, frame 8 could be chosen.
• if instead a submultiple is chosen, for example 6 fps, frame 10, frame 20, etc. can be processed, noting that all such frames are available at the time in which the processing takes place (see the sketch below). As evident, this is beneficial but not essential.
• the term "optionally" is used to indicate optional variants of the method of the first embodiment. It is possible to combine two or more of these variants in any way.
• a graphic element Gi is provided to be overlapped to an image Ii.
• the graphic element Gi comprises a non-transparent element Ti and a transparent element (rest element) Zi.
  • a photometric index is determined at at least an area Aj of the image Ii. This index can be determined for example as discussed above with reference to step S10 of figure 7.
• in step S115, in the case where the absolute value of the difference between IL,i-k and IL,i is not lower than a threshold, it is determined that the photometric value of the image Ii subsequent to Ii-k has changed by an amount sufficient to justify the calculation of a new degree of transparency; in this case, the method proceeds to step S120.
• in step S120 the degree of transparency for the graphic element (or more precisely, for the rest element Zi) relative to the image Ii is determined.
  • the degree of transparency can be calculated as explained for example with reference to step S20 of figure 7.
  • step S130 the graphic element Gi is applied, overlaid to the image Ii, using the degree of transparency calculated in step S120.
• the first embodiment is directed to a method. All of the above considerations and variations apply to devices or entities, such as an entity for overlaying at least one graphic element on a video stream, as well as to systems comprising the overlaying entity and at least one terminal, computer programs, computer program supports, signals and other examples and embodiments as also illustrated hereinafter. Where certain details are omitted for the sake of brevity, all the above remarks apply equally and/or correspondingly to what follows and vice versa.
• the entity comprises a photometric value determination unit (210), a transparency degree determination unit (220), and an overlay unit (230).
  • the photometric value determination unit (210) is configured to determine an indication of photometric values at at least one area of an image of said video stream.
• the transparency degree determination unit (220) is configured to determine a degree of transparency of said first transparent element on the basis of the indication and of a constraint, the constraint indicating that a degree of contrast of at least one resulting area is not lower than a predetermined contrast threshold; the at least one resulting area corresponds to the area of the image on which the graphic element having said degree of transparency is overlapped.
• the overlay unit (230) is configured to overlap, to an image of the video stream, the graphic element by applying the determined degree of transparency to the first transparent element.
• the image Ii is provided as input to the entity 200, for example after being extracted from a video stream (for example in MPEG, AVC format, etc.) to be processed; after being processed, the image is provided as output (point OUT in the figure) and then entered into a video stream for distribution.
  • information related to the graphic element to overlay is provided at the output point OUT; this information comprises the graphic element, or parameters related to the graphic element (such as the degree of transparency; in fact, in one example, the degree of transparency is simply provided at the output point OUT, in which case the unit 230 can be omitted).
• such information related to the graphic element to be overlaid can be sent to another device which composes it onto the video stream to be distributed to one or more user devices; in another example, the information related to the graphic element to be overlaid is sent to the user device or devices that render the on-screen graphics locally. In another example, the on-screen graphics are generated through a client-server interaction that takes this information related to the graphic element into account.
  • the unit for determining the degree of transparency 220 is preferably provided with information related to the starting graphic element (i.e. the one for which the degree of transparency is to be determined).
  • the overlay entity 200 may comprise further units (and/or in combination with the units 210-230) configured to perform any one or any combination of the steps illustrated above, or to implement what is discussed below.
  • a system comprising an entity according to the second embodiment (for example as illustrated in figure 10), and at least one user terminal connectable to this entity via a communication network.
• This terminal is, for example, a TV, smartphone, computer, tablet, etc.
  • This terminal will then reproduce the video stream in which the graphic element is overlaid as according to the invention and/or as set forth in the present description.
  • a computer program is provided which is set up to perform, when said program is run on a computer, any combination of the steps according to any one of the methods and/or examples of the invention, and/or as set forth in this description.
  • Figure 11 illustrates a block diagram exemplifying a computer (500) capable of running the aforesaid program.
  • the computer (500) comprises a memory (530) for storing the instructions of the program and/or data necessary for the performance thereof, a processor (520) for performing the instructions and an input/output interface (510).
• figure 11 is illustrative and non-limiting, because the computer can be realized either in a concentrated manner in one device or in a distributed manner over several interconnected devices. For this reason, the program can be run on a concentrated (local) or distributed device.
• a support is provided for carrying a computer program set up to perform, when the program is run on a computer, a step or combination of steps according to the method described in the first embodiment.
• Examples of such a support are a static and/or dynamic memory, a fixed disk or any other medium such as a CD, DVD or Blu-ray.
• The support also comprises means capable of carrying a signal constituting the instructions, including cable transmission means (Ethernet, optical cable, etc.) or wireless transmission means (cellular, satellite, digital terrestrial transmission, etc.).
  • the contrast is improved on the basis of the colourimetric characteristics of the image and of the text in overlay using a surface (rest element) in transparency overlay on the video image, as shown in figure 12.
• the background is an example of the transparent element.
• the level of transparency is decided on the basis of the resulting level of contrast with respect to the text (an example of the non-transparent element), optionally having a fixed colour, such as white.
• the contrast between text and background is defined as

C = (L(P) + 0.05) / (L(Q) + 0.05)

where L(P) and L(Q) are the luminances of the two colour points P and Q (taking the lighter of the two as P), wherein P and Q are any two colours, for example the first assigned to the text and the second to the background, and whose contrast is to be calculated (below, the colour points P and Q are also called T and S(p)).
• the luminances of the two colour points P and Q are calculated as

L = 0.2126 R + 0.7152 G + 0.0722 B

wherein (R, G, B) are the three components of the colour, whose values are calculated as

R = RsRGB / 12.92 if RsRGB <= 0.03928, otherwise R = ((RsRGB + 0.055) / 1.055)^2.4

(and analogously for G and B), wherein RsRGB, GsRGB and BsRGB are the components in the sRGB colour space (noting that other colour spaces can be used).
  • the luminances L(P) and L(Q) are examples of the photometric index described above.
• the contrast is a function C(p) of the level of transparency p of the rest element, since the colour of the text T is given.
  • the contrast C(p) ranges between a minimum of 1:1 and a maximum of 21:1.
• this ratio must not be lower than a minimum threshold.
  • Typical threshold values are 4.5:1 and 7:1.
  • the colour of the text and the colour of the rest element are chosen in such a way as to ensure that this threshold is largely exceeded.
  • white text on a black background allows obtaining a maximum contrast ratio of 21:1. Therefore, an increase in contrast is obtained by reducing the level of transparency p of the rest element.
• Each of these reference areas, or probes, is an example of an area Aj. In other words, for each probe:
• the minimum (among the values determined for the probes) can be chosen, as this guarantees that the contrast level is respected for all probes.
• alternatively, statistical values can be used, such as the median, the truncated mean, a quartile or, in general, any other value representative of the probe.
  • the number, the arrangement, and the size of probes are system parameters.
• the value pP is determined in a similar way for dark text on a light background.
• This value is recalculated every F frames, where F corresponds to a submultiple of the video transmission frequency (see above with reference to the video transmission frequency, i.e. the number of frames per unit of time); a sketch of the per-probe determination of p is given below.
  • the algorithm adopted in this example is composed of the following steps:
  • a probe (or area Aj, or resulting area Aj') preferably contains a transparent element or is an area in which the transparent element can preferably or potentially be positioned.
  • the probe is a portion of the frame with respect to which the contrast ratio is calculated with the colour of the text and the transparency of the rest element.
  • the pixels that end up under the text are generally not of particular interest, as they will be covered by the text itself; however, this does not exclude the possibility of selecting a probe that will also be partially covered by the text, as also described below for illustrative purposes.
• the probes can be chosen at random. In another example, probes are chosen that are equally distributed in the space where the graphic element or at least the non-transparent element will be inserted. In another example, a probe is chosen if it also contains a non-transparent part of the element. In another example, a probe is chosen among other candidate areas if it has a photometric index higher than (or equal to) a certain threshold. In another example, an area is chosen as the probe if it is close to where the text will be positioned. Two or more of these examples can be combined. For each probe:
a. the representative average point of the probe is calculated;
b. the value of the parameter p is determined so as to realize the alpha-blending with the colour of the transparent element that satisfies the minimum requirement of contrast with the text. The p value represents an example of the degree of transparency.
• among the p values determined for the probes, the one with the minimum value is chosen, as it guarantees that the contrast is satisfied in reference to each point; a sketch of these steps follows.
• the non-transparent part of the graphic element is thus made easily perceptible, given how the human visual perceptual system is physiologically structured.
• it is optionally possible to combine the first part of the present disclosure with the second part thereof.
  • the background on which the graphics must be positioned is different from the background on which the graphics were previously positioned or with respect to the background for which the graphics were originally conceived. Therefore, by combining the dynamism of transparency as described in the second part with the positioning dynamism as described in the first part, it is possible to improve the overall dynamism of the graphics.
• the graphics can be dynamically controlled in order to be positioned in a dynamically optimal position in which they are best perceptible.

Abstract

A method (and corresponding entity, system, computer program) is described for determining a positioning indication indicating a position at which a graphic element is to be overlaid on a video stream comprising an image. The method comprises the steps of: - performing (S10) a computational analysis on said image to determine at least one between at least one object present in the image and at least one perception zone; - determining (S20), on the basis of said computational analysis, an image reference portion comprising a portion of said image, wherein said portion comprises said at least one between said at least one object present in the image and said at least one perception zone; determining (S30), on the basis of said reference portion, the positioning indication respecting a constraint indicating that the graphic element is overlapping with said image reference portion to an extent not exceeding an overlap factor.

Description

DESCRIPTION of the invention having the title:
"Smart Overlay: positioning of the graphics with respect to reference points"
BACKGROUND OF THE INVENTION
In the transmission of television channels or streaming video it is increasingly common to associate graphic elements with the video in progress, such graphic elements sometimes comprising text. Regardless of the presence and/or possible textual content (in the sense of the meaning of the text) comprised in the graphic element, the problem arises on how to position the graphic element on the screen. One solution is to place the graphics alongside the video images, which usually involves reducing the size of the video to make room for the graphic element. In the case of a television program, for example, it is possible to reduce the size of the video in order to create a black L-shaped zone around the resized video, in which L-shaped zone there can be inserted one or more graphic elements. In other solutions, the graphic element is overlapped on the video, at least partially obstructing the vision of the video itself, and therefore hindering the vision thereof.
The known techniques therefore do not make the video images easily perceptible, and thus hinder the perception thereof.
SUMMARY OF THE INVENTION
One of the objects of the present invention resides in improving the known solutions or obviating one or more of the problems present in the known solutions. The object is reached by the independent claims. Advantageous embodiments are defined by the dependent claims. Further examples are provided in this text for explanatory purposes as well.
E1. Method for determining a positioning indication indicating a position at which a graphic element is to be overlaid on a video stream comprising an image, the method comprising the steps of:
- performing (S10) a computational analysis on said image to determine at least one between at least one object present in the image and at least one perception zone;
- determining (S20), on the basis of said computational analysis, an image reference portion comprising a portion of said image, wherein said portion comprises said at least one between said at least one object present in the image and said at least one perception zone; determining (S30), on the basis of said reference portion, the positioning indication respecting a constraint indicating that the graphic element is overlapping with said image reference portion to an extent not exceeding an overlap factor.
E2. Method according to the explanatory example E1, wherein performing a computational analysis on said image to determine at least one object present in the image comprises using a neural network to determine said at least one object preferably comprised in a predefined set of objects.
E3. Method according to any one of the preceding explanatory examples, wherein performing a computational analysis on said image to determine at least one perception zone comprises performing a computational analysis of visual attention.
E4. Method according to the explanatory example E3, wherein performing a computational analysis of visual attention comprises determining a saliency map, wherein said at least one perception zone preferably comprises a section of the image at which the saliency map indicates a probability of visual perception of a user exceeding a perception probability threshold.
E5. Method according to the explanatory example E3 or E4, wherein the computational analysis of visual attention comprises determining said perception zone on the basis of characteristics of the pixels comprised in said portion.
E6. Method according to any one of the preceding explanatory examples, wherein determining (S30) the positioning indication comprises determining a plurality of non-reference portions represented by regions of the image not comprising the reference portion, and determining the positioning indication as a position indication in one of said non-reference portions.
E7. Method according to any one of the preceding explanatory examples, wherein said image reference portion comprises a portion of the image containing at least a part of said at least one determined object and at least a part of said one determined perception zone.
E8. Computer program comprising instructions set up to perform, when said program is run on a computer, all the steps according to any one of the explanatory examples of method E1 to E7.
E9. Entity (300) for determining a positioning indication indicating a position at which a graphic element is to be overlaid on a video stream comprising an image, the entity comprising: an analysis unit (310) configured to perform a computational analysis on said image to determine at least one between at least one object present in the image and at least one perception zone;
- a processing unit (320) configured to determine, on the basis of said computational analysis, an image reference portion comprising a portion of said image, wherein said portion comprises said at least one between said at least one object present in the image and said at least one perception zone; a positioning determination unit (330) configured to determine, on the basis of said reference portion, the positioning indication respecting a constraint indicating that the graphic element is overlapping with said image reference portion to an extent not exceeding an overlap factor.
E10. Entity according to the explanatory example E9, in which the analysis unit (310) is configured to perform a computational analysis on said image to determine at least one object present in the image using a neural network to determine said at least one object preferably comprised in a predefined set of objects.
E11. Entity according to the explanatory example E9 or E10, wherein the analysis unit (310) is configured to perform a computational analysis on said image to determine at least one perception zone by performing a computational analysis of visual attention.
E12. Entity according to the explanatory example E11, in which the computational analysis of visual attention comprises determining a saliency map, in which said at least one perception zone preferably comprises a section of the image at which the saliency map indicates a probability of a user's visual perception exceeding a perception probability threshold.
E13. Entity according to the explanatory example E11 or E12, wherein the computational analysis of visual attention comprises determining said perception zone on the basis of characteristics of the pixels comprised in said portion.
E14. Entity according to any one of the explanatory examples E9 to E13, wherein the positioning determination unit (330) is further configured to determine a plurality of non-reference portions represented by regions of the image not comprising the reference portion, and determine the positioning indication as a position indication in one of said non-reference portions.
E15. Entity according to any one of the explanatory examples E9 to E13, wherein said image reference portion comprises a portion of the image containing at least a part of said at least one determined object and at least a part of said one determined perception zone.
E16. System comprising an entity according to any one of the explanatory examples E9 to E15, and a user device configured to display a video stream with said graphic element overlaid.
E17. Method according to any one of the explanatory examples E1 to E7, wherein the graphic element (Gi) comprises a first transparent element (Zi) and a second non-transparent element (Ti) which is overlapped on the first transparent element (Zi), the method also comprising the steps of:
- determining (S10) an index of photometric values (IL,j) at at least one area (Aj) of said one image (Ii) of said video stream;
- determining (S20) a degree of transparency of said first transparent element (Zi) on the basis of a constraint, the constraint indicating that a degree of contrast in at least one resulting area (Aj') is not less than a predetermined contrast threshold, wherein the at least one resulting area (Aj') corresponds to an area obtained by overlaying the graphic element (Gi) having said degree of transparency to said at least one area of said one image;
- overlaying (S30), to an image of the video stream, the graphic element by applying, to the first transparent element, the degree of transparency determined.
F1. Method for overlaying at least one graphic element onto a video stream comprising at least one image, wherein the graphic element (Gi) comprises a first transparent element
(Zi) and a second non-transparent element (Ti) which is overlapped to the first transparent element (Zi), the method comprising the steps of:
- determining (S10) an index of photometric values (IL,j) at at least one area (Aj) of said one image (Ii) of said video stream;
- determining (S20) a degree of transparency of said first transparent element (Zi) on the basis of a constraint, the constraint indicating that a degree of contrast in at least one resulting area (Aj') is not less than a predetermined contrast threshold, wherein the at least one resulting area (Aj') corresponds to an area obtained by overlaying the graphic element (Gi) having said degree of transparency to said at least one area of said one image;
- overlaying (S30), to an image of the video stream, the graphic element by applying, to the first transparent element or to the second non-transparent element, the degree of transparency determined.
F2. Method according to the explanatory example F1, wherein the degree of contrast indicates a degree of contrast, within said at least one resulting area, between the second non-transparent element (Ti) and an overlay of the first transparent element (Zi) to said image (Ii).
F3. Method according to one of the explanatory examples F1 or F2, wherein the video stream comprises a plurality of images, and wherein the step of determining an index of photometric values is performed on each of two or more images comprised in said plurality.
F4. Method according to one of the explanatory examples F1 to F3, wherein the video stream comprises a plurality of images, and wherein the overlaying step is performed on each of two or more images comprised in said plurality using the same degree of transparency.
F5. Method according to one of the explanatory examples F1 to F4, wherein the video stream comprises a plurality of images, and wherein the step of determining an index is performed on at least two images and the overlaying step is performed on at least two images, and wherein at least one image between the at least two images related to the step of determining an index is the same as an image between the at least two images of the overlaying step, or the at least two images related to the step of determining an index are the same at least two images related to the overlaying step.
F6. Method according to one of the explanatory examples F1 to F5, wherein the video stream comprises a first image and a second image following the first image within said video stream, and wherein the step of determining a degree of transparency is performed upon determining a photometric value for said second image which deviates no less than a predefined amount with respect to a photometric value determined for said first image.
F7. Method according to any one of the explanatory examples F1 to F6, wherein overlaying the graphic element on an image of the video stream comprises overlapping the graphic element on one or more images following said image.
F8. Method according to any one of the preceding explanatory examples, wherein the at least one area of said image comprises a plurality of areas of said image, and wherein
- determining an index of photometric values comprises determining a respective index of photometric values for each of the plurality of areas;
- determining the degree of transparency of said graphic element comprises determining the degree of transparency of said first transparent element on the basis of each respective index and of the constraint, wherein the constraint comprises a constraint indicating that a degree of contrast of each respective resulting area is not lower than a predetermined contrast threshold, wherein each respective resulting area corresponds to one of said plurality of areas to which said graphic element having said degree of transparency is overlapped.
F9. Method according to any one of the explanatory examples F1 to F8, wherein determining an index of photometric values at at least one area comprises determining an index of photometric values for at least one point of said area, wherein preferably said at least one point is a point representative of said area.
F10. Method according to any one of the explanatory examples F1 to F9, wherein said at least one area is a portion of the image on which to overlap at least a part of the second non-transparent element.
F11. Method according to any one of the explanatory examples F1 to F10, wherein the predetermined contrast threshold is comprised in a predetermined interval, wherein preferably said predetermined interval is defined by the values 4.5:1 and 7:1.
F12. Method according to any of the explanatory examples F1 to F11, wherein the predetermined contrast threshold is equal to 4.5:1.
F13. Method according to any one of the examples F1 to F12, wherein the determination of the degree of transparency is carried out at a predetermined interval of frames, preferably equal to a submultiple of the transmission frequency of the video stream.
F14. Computer program comprising instructions set up to perform, when said program is run on a computer, all the steps according to any one of the explanatory examples of method F1 to F13.
F15. Entity (200) for overlaying at least one graphic element (Gi) on a video stream, wherein the graphic element (Gi) comprises a first transparent element (Zi) and a second non-transparent element (Ti) which is overlapped on the first transparent element, the entity (200) comprising: - a photometric value determination unit (210) configured to determine an index of photometric values at at least one area of an image of said video stream;
- a transparency degree determination unit (220) configured to determine a degree of transparency of said first transparent element on the basis of said index and of a constraint, the constraint indicating that a degree of contrast of at least one resulting area is not lower than a predetermined contrast threshold, wherein the at least one resulting area corresponds to said image area to which said graphic element having said degree of transparency is overlapped;
- an overlay unit (230) configured to overlap, to an image of the video stream, the graphic element by applying, to the first transparent element or to the second non-transparent element, the degree of transparency determined.
F16. Entity (200) for the overlay according to the explanatory example F15, in which the degree of contrast indicates a degree of contrast, within said at least one resulting area, between the second non-transparent element (Ti) and an overlay of the first transparent element (Zi) to said image (Ii).
F17. Entity (200) for the overlay according to the explanatory example F15 or F16, in which the video stream comprises a plurality of images, and in which the photometric value determination unit (210) is further configured to determine an index of photometric values on each of two or more images comprised in said plurality.
F18. Entity (200) for the overlay according to any one of the explanatory examples F15 to F17, in which the video stream comprises a plurality of images, and in which the overlay unit (230) is configured to overlay on each of two or more images comprised in said plurality using the same degree of transparency.
F19. Entity (200) for the overlay according to any one of the explanatory examples F15 to F18, wherein the video stream comprises a plurality of images, the photometric value determination unit (210) is configured to determine an index on at least two images and a transparency degree determination unit (220) is configured to operate on at least two images, and in which
- at least one image between the at least two images subjected to the photometric value determination unit (210) is the same as an image between the at least two images subjected to the transparency degree determination unit (220), or the at least two images subjected to the unit for determining the photometric values (210) are the same at least two images subjected to the unit for determining the degree of transparency (220).
F20. Entity (200) for the overlay according to any one of the explanatory examples from F15 to F19, in which the video stream comprises a first image and a second image following the first image within said video stream, and in which the transparency degree determination unit (220) is configured to determine the degree of transparency upon determining a photometric value for said second image which deviates no less than a predefined amount with respect to a photometric value determined for said first image.
F21. Entity (200) for the overlay according to any one of the explanatory examples F15 to F20, wherein overlapping the graphic element on an image of the video stream comprises overlapping the graphic element on one or more images following said image.
F22. Entity (200) for the overlay according to any one of the explanatory examples F15 to F21, wherein the at least one area of said image comprises a plurality of areas of said image, and wherein
- the photometric value determination unit (210) is configured to determine a respective index of photometric values for each of the plurality of areas; the transparency degree determination unit (220) is configured to determine the transparency degree of said first transparent element on the basis of each respective index and of the constraint, wherein the constraint comprises a constraint indicating that a degree of contrast of each respective resulting area is not lower than a predetermined contrast threshold, in which each respective resulting area corresponds to one of said plurality of areas on which said graphic element having said degree of transparency is overlapped.
F23. Entity (200) for the overlay according to any one of the explanatory examples F15 to F22, wherein determining an index of photometric values at at least one area comprises determining an index of photometric values for at least one point of said area, wherein preferably said at least one point is a representative point of said area.
F24. Entity (200) for the overlay according to any one of the explanatory examples from F15 to F22, in which said at least one area is a portion of the image on which to overlap at least a part of the second non-transparent element.
F25. Entity (200) for the overlay according to any one of the explanatory examples F15 to F24, wherein the predetermined contrast threshold is comprised in a predetermined interval, wherein preferably said predetermined interval is defined by the values 4.5:1 and 7:1.
F26. Entity (200) for the overlay according to any of the explanatory examples F15 to F25, where the predetermined contrast threshold is equal to 4.5:1.
F27. Entity (200) for the overlay according to any one of the explanatory examples F15 to F26, wherein the transparency degree determination unit (220) is configured to determine the degree of transparency at a predetermined interval of frames, preferably equal to a submultiple of the transmission frequency of the video stream.
F28. Entity (200) for determining a degree of transparency to be applied to at least one graphic element (Gi) on a video stream, wherein the graphic element (Gi) comprises a first transparent element (Zi) and a second non-transparent element (Ti) overlapped on the first transparent element, the entity (200) comprising:
- a photometric value determination unit (210) configured to determine an index of photometric values at at least one area of an image of said video stream;
- a transparency degree determination unit (220) configured to determine a degree of transparency of said first transparent element on the basis of said index and of a constraint, the constraint indicating that a degree of contrast of at least one resulting area is not lower than a predetermined contrast threshold, wherein the at least one resulting area corresponds to said image area to which said graphic element having said degree of transparency is overlapped;
- a transmission unit configured to send, to an entity for overlaying at least one graphic element (Gi) on a video stream, the degree of transparency determined.
F29. System comprising an entity according to any one of the explanatory examples F15 to F28, and a user device configured to display a video stream with said graphic element overlaid.
F30. Method for overlaying at least one graphic element onto a video stream comprising at least one image, wherein the graphic element (Gi) comprises a first transparent element
(Zi) and a second non-transparent element (Ti) which is overlapped to the first transparent element (Zi), the method comprising the steps of:
- determining (S10) an index of photometric values (IL,j) at at least one area (Aj) of said one image (Ii) of said video stream;
- determining (S20) a degree of transparency of said first transparent element (Zi) on the basis of a constraint, the constraint indicating that a degree of contrast in at least one resulting area (Aj') is not less than a predetermined contrast threshold, wherein the at least one resulting area (Aj') corresponds to an area obtained by overlaying the graphic element (Gi) having said degree of transparency to said at least one area of said one image;
- providing (S30) a device for overlaying the at least one graphic element with the degree of transparency determined. To example F30, one or more of the examples F2 to F13 in any combination can apply.
It is noted that any one of the examples F2 to F14 (or any combination thereof) can be applied to example E17.
LIST OF FIGURES
Figure 1 is a flowchart representing a method according to an embodiment of the present invention; figure 2(a) reproduces a screenshot (still image) of an image of a video stream; figure 2(b) schematically illustrates objects detected in the image illustrated in figure 2(a); figure 2(c) illustrates an example of the starting image 2 (a) after having undergone an analysis aimed at determining perception zones; figure 2(d) illustrates an image of the video stream in which the graphic element (or even the graphics) has been positioned avoiding overlapping the reference element (represented by the areas occupied by the three players in the example); figure 3 illustrates a block diagram of an entity according to an embodiment of the present invention; figure 4 illustrates a block diagram of a computer adapted to run a program according to an embodiment of the present invention; figure 6A illustrates by way of example the result of a processing carried out on the image of figure 2(a), in which the processing comprises recognising objects; figure 6B illustrates by way of example the result of a processing carried out on the image of figure 2(a), in which the processing comprises obtaining a saliency map; figure 6C illustrates, by way of example, the result of a processing carried out on the image of figure 2(a), in which the processing comprises obtaining the shapes of recognised objects and/or of the portions that enclose (at least partially) these shapes.
Figure 7 illustrates a flow chart according to an embodiment of the present invention; figure 8(a) illustrates by way of example a graphic element composed of transparent and non-transparent elements; figure 8(b) illustrates an area comprised in an image of the video stream; figure 8(c) illustrates a resulting area obtained by overlapping a graphic element within the area of figure 8(b); figure 8(d) illustrates the image on which the graphic element is overlapped; figure 9 illustrates a flow chart according to a variant of an embodiment of the present invention; figure 10 illustrates a block diagram according to an embodiment of the present invention; figure 11 illustrates a block diagram of a computer adapted to run a program according to an embodiment of the present invention; figure 12 illustrates a screenshot illustrating the operation of the present invention.
DETAILED DESCRIPTION
As mentioned, graphic elements possibly comprising textual parts are overlapped to the images of the video (of a television channel, of a streaming, etc.): think for example about the case of overlaid titles (banners) in the lower part of the screen during the transmission of a news program, about the overlaid banners at sporting events in which, for example, the statistics of the event in progress are reported, or about other banners comprising logos optionally together with parts of text. Such a graphic element, completely or at least partially overlaid on the video, hides (or, if it is transparent or partially transparent, partially obscures) the part of the images that it overlaps; this hinders the perception of the images, especially if the dimensions of the graphics are not negligible or if the graphics are not located in a position on the edge of the image. The graphic element can sometimes be positioned in a position on the screen that is considered not cumbersome, such as the left side; upon a movement of the scene, however, this position could become cumbersome and therefore no longer ideal, such that the scene is occluded and the use of the video is made difficult. Other times the graphics could be placed in an area of the image considered of little relevance (for example static compared to other zones) and where therefore the graphics would not hinder, at least substantially, the correct use of the video; however, this positioning is not easy to obtain automatically. This drawback is also present in view of the fact that the characteristics of the images are not easily predictable, and can also vary substantially within a video stream. Having recognized this problem, the inventors have devised a system for automatically positioning a graphic element on the screen without hindering, or hindering to a minimum, the viewing and use of the video. In general, this solution is based on determining (the presence of) at least one object and/or a perception zone within an image of a video stream, and then positioning the graphic element in such a way as to completely or at least partially avoid overlapping the graphics with the determined object and/or with the determined perception zone.
With reference to figure 1, a first embodiment related to a method for determining a positioning indication will now be illustrated. This positioning indication indicates a position at which a graphic element is to be overlaid on a video stream comprising at least one image, and can be expressed in terms of coordinates of the image at which to position the graphics, and/or in terms of an area (for example a quadrant, or an area within a grid of image areas, etc.) in which to position the image. The video stream can be broadcast via a classic broadcast channel (terrestrial, satellite, etc.), and/or in streaming (using for example protocols such as HLS, DASH, etc.).
In step S10, a computational analysis is performed on the image of the stream in order to determine at least one object present in the image and/or at least one perception zone.
The object is preferably a predetermined object (comprising a predetermined and/or known type of object) whose presence, and preferably whose position within the image, is to be determined. For example, the object can be represented by a car of a specific model, or in general by a car; the computational analysis therefore determines the presence of such a car and preferably also its position within the image.
The determination of the object can be carried out through an object recognition technique within the image, as also explained further on by way of example.
The at least one perception zone represents or comprises an image zone in which the visual perception of a user is estimated to be higher than a perception probability threshold. In other words, the perception zone indicates an area of the image on which, at least probabilistically speaking, the user's vision is focused. In fact, the image can be considered as a set of zones (or segments, where the size and/or shape of each zone/segment is irrelevant), not all of which are zones on which the human perception is focused. In fact, due to the way in which the human visual system is structured, within an image there may be some areas on which the visual perception has a higher probability of focusing, and other areas on which the visual perception has lower probabilities (relative to the other zones just mentioned) of focusing. The analysis of step S10 therefore determines those perception zones for which the probability that the vision will focus on them is higher than a certain threshold. This threshold can be established empirically, for example on the basis of a panel of users specialised in viewing video images; in another example, this threshold can be chosen by attributing a probability to each zone, and then setting a threshold that selects only the first N zones, with N for example equal to 3.
In step S20, on the basis of the computational analysis, an image reference portion comprising a portion of said image is determined. This portion of the image comprises the at least one object present in the image and/or the at least one perception zone as obtained in step S10. In other words, a portion of the image is determined as being a reference if at least one between the object determined and the perception zone determined in the computational analysis step falls within it.
In step S30, on the basis of the reference portion, the positioning indication is determined respecting a constraint indicating that the graphic element is overlapping with said image reference portion to an extent not exceeding an overlap factor. The overlap factor preferably corresponds to a percentage of the area of the reference element on which the graphics are not to be overlapped; in one example, this can be zero, meaning no overlap must occur. In another example, this overlap factor may indicate one third of the reference portion, thus indicating that no more than one third of the reference portion may be overlapped by the graphics. This was in fact found as a good compromise, in which video viewing is still acceptable, and which could in fact be advantageous in the presence of large reference portions and/or of graphics with large dimensions and/or small reproduction screen (which may require larger graphics, for example). In other words, in step S30 a positioning indication for the graphics is determined such that the graphics overlap only partially, or do not overlap at all, with the reference portion of the image.
As also described below, the graphics can be positioned in the image immediately following or some image after, depending on the computing power available (in theory, the graphics can already be inserted in the same image on which the analyses were carried out, in case of huge computing resources especially for live events). Furthermore, the graphics can be overlaid by the same device that performed the analysis on the video image, or by another device as illustrated below.
In the description above, an image reference portion has been considered which comprises a single image portion; however, it is conceivable to consider a reference portion that contains a plurality of portions, for each of which portions the presence of an object and/or a perception zone has been determined (this case can be obtained, for example, in the presence of objects and/or perception zones detected in different portions or segments of the image). In this case, therefore, the graphics will only partially overlap the areas comprised in such plurality, or will not overlap any of the areas of such plurality. In the case of a plurality of portions for each of which an object and/or a perception zone has been detected, it is possible to assign a weight to each of them; preferably, the weight for a certain portion can be determined on the basis of the determined object and/or of the perception zone determined for that portion. Preferably, once the weights have been assigned, a number M of portions having a higher weight can be chosen, in which case the graphics will be partially overlapping or not overlapping only with respect to these M portions.
The above can be illustrated by way of example by referring to an image taken from a football sporting event, see for example the screenshot (still image) reproduced in figure 2A in which three players indicated with 52A, 54A and 56A are visible. By performing a computational analysis on this image, it is possible to determine, for example, the presence of the three players, and at these players to determine three portions of the image marked in figure 2B with 52B, 54B and 56B, each containing the players 52A, 54A and 56A, respectively. In figure 2B, the portions 52B-56B have a rectangular shape and are such that the identified object (the football player in the example) is entirely comprised in each portion; however, other shapes are conceivable. Furthermore, the portion size can be such that it comprises an area around the identified object. In one example, the size and the shape of the portion may be such as to exactly follow the shape of the identified object. In another example, the portion may be such as to enclose a substantial part (e.g. 75%, preferably 90%) of the shape of the identified object. Figure 2C (6C) illustrates an example in which the whole shape displayed on the screen and detected is entirely comprised in the respective portion; however, this is not indispensable, since it is in fact possible to enclose even only a part of the shape in the portion. In any case, once the portions 52B, 54B and 56B, which are portions not to be covered or to be covered only partially, have been identified, it is possible to determine the remaining portions of the screen as portions in which to position the graphics (also simply called free portions). On the basis of the free portions, a positioning indication is then automatically determined indicating where to position the graphics, in figure 2(D) represented by textual graphics 58 positioned in the upper right part of the image so as not to interfere with the display of the detected players. In one example, once the size of the graphics to be overlaid is known, it is possible to choose from the free portions one that is large enough to contain it, and/or to use positioning rules such as giving priority to free areas located at the corners, or giving priority to free areas located on the sides (right and left, top and bottom, without priority or with priority between them), or accepting that one or more certain objects are partially occluded if there is no sufficiently large free area, etc., as in the sketch below. Since the whole is determined automatically, it is possible to position the graphics on the full screen without hindering the viewing of the video even when the visual content of the image changes, thus making full-screen use readily practicable.
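As referenced above, the corner-priority rule could be sketched as follows; the box representation, the margin and the function names are illustrative assumptions:

```python
def intersects(a, b):
    """True if the two (x0, y0, x1, y1) rectangles overlap."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def pick_position(frame_w, frame_h, occupied, gfx_w, gfx_h, margin=10):
    """Try corner anchors first; return the top-left corner of the first
    placement whose rectangle intersects no occupied (reference) box."""
    candidates = [
        (frame_w - gfx_w - margin, margin),                    # top right (as in fig. 2D)
        (margin, margin),                                      # top left
        (margin, frame_h - gfx_h - margin),                    # bottom left
        (frame_w - gfx_w - margin, frame_h - gfx_h - margin),  # bottom right
    ]
    for x, y in candidates:
        box = (x, y, x + gfx_w, y + gfx_h)
        if all(not intersects(box, occ) for occ in occupied):
            return x, y
    return None  # fall back: shrink the graphics or accept a partial overlap
```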
The computational analysis carried out in step S10 can optionally comprise recognizing and/or classifying one or more objects present in the image (in the example of the figures, the three objects) and, again optionally, determining a saliency map in parallel (the saliency map, if determined, does not necessarily have to be determined in parallel with the other operations; a parallel determination, however, allows a more efficient overall operation). The information described can then be used jointly in order to determine which areas are occupied by visual attention points of the scene. The joint use of this information makes it possible to identify with increased accuracy a reference portion of the image on which the graphics are not to be overlapped, or on which they are to be overlapped only partially.
In another example applied to the image of figure 2A, the computational analysis determines at least one perception zone; referring below for further examples and explanations of this type of analysis, an example includes calculating the contrast between an image zone and the areas immediately surrounding it. In fact, due to how the human visual perceptual system is physiologically structured, these zones represent zones on which the user's vision is likely to focus. In the example under examination, as shown in figure 2C, it is possible to construct an attention map based on the contrast of an area; in other words, it is possible to obtain three different portions 52C, 54C and 56C (as shown in figure 2C), each characterized by a zone (in the example, the zone inside the players' contours) having a strong contrast with respect to the immediately adjacent areas; the map (which in this example represents a strong contrast) can be established with respect to a contrast threshold, the value of which can be established empirically (through tests on panels of users trained in video viewing), or through known techniques, for example by considering the statistical distribution of the contrast values. In one example, it is possible to consider binary maps as shown in figure 2D, in which the white points are those associated with a contrast value exceeding the threshold, and those below the threshold are shown in black.
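A possible sketch of such a binary contrast map follows, assuming a grey-level image and a local-mean contrast estimate (one plausible choice among several; the window size and the statistical fallback threshold are illustrative):

```python
import numpy as np
from scipy import ndimage

def binary_attention_map(gray, win=15, thresh=None):
    """gray: 2-D float array (luminance). Local contrast is estimated as the
    absolute difference between each pixel and the mean of its surroundings."""
    local_mean = ndimage.uniform_filter(gray, size=win)
    contrast = np.abs(gray - local_mean)
    if thresh is None:
        # fallback: derive the threshold from the statistical distribution
        thresh = contrast.mean() + 2 * contrast.std()
    return contrast > thresh  # True (white) = above threshold, False = black
```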
The attention map can be limited to the objects of interest recognized in the image (i.e. it is not necessary to determine it for the whole image, but only for one or more portions of the image). In the example shown in figure 6A, the three players 52-6A, 54-6A, 56-6A are recognized and classified. The piece of information thus obtained can be combined with the attention map as shown in figure 6B. In other words, the information related to the object recognition (as in the example of figure 6A) can be combined with information related to saliency (as shown in figure 6B) to determine a reference portion (or several reference portions). In other words, the object recognition information is combined with saliency information to determine the portions of the image on which the graphics should not be overlaid (or overlaid only partially). In the example of figure 2C, three portions 52C-56C corresponding to the football players 52A-56A of figure 2A, respectively, are identified. The considerations made above with regard to figures 2B and 2D apply correspondingly.
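Building on the previous sketch, the combination of detected-object boxes with the attention map could, for example, be expressed as follows (the minimum salient fraction is an illustrative parameter):

```python
def reference_portions(boxes, attention_map, min_salient_fraction=0.05):
    """Keep only detected-object boxes whose area contains a non-negligible
    amount of salient pixels; these become the reference portions.
    boxes: integer pixel rectangles (x0, y0, x1, y1);
    attention_map: boolean 2-D array as returned by binary_attention_map."""
    kept = []
    for (x0, y0, x1, y1) in boxes:
        patch = attention_map[y0:y1, x0:x1]
        if patch.size and patch.mean() >= min_salient_fraction:
            kept.append((x0, y0, x1, y1))
    return kept
```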
In the example shown above, reference was made to three portions that form the reference portion on which the graphics are not to be positioned, or positioned only partially. However, it is possible to prioritize the three portions (based on the type of object and/or on rules such as the position of the object, etc.) and consider only some of them. In other examples, only one portion will be detected, i.e. it is not necessary to detect a plurality of portions.
It is noted that in the examples of figures 2A to 2D each of the methods shown can lead to a result of detecting the reference portion (and finally of positioning) which is the same or at least similar. However, there may be cases in which each of these techniques alone does not lead to optimal results, as recognized by the inventors by analysing different types of images. Consider, for example, the hypothetical case of a scene in a film in which there are two characters in two distinct regions of the image and in which the image presents different regions with contrasts that are very different between them. Let us assume, in this example, that the analysis on the objects detects the presence of the two characters. However, the analysis of the perception zone could lead to determining for example three perception zones (i.e. three zones in which the contrast with respect to the respectively neighbouring pixels is higher than a certain threshold), but in which only one of these zones coincides or partially overlaps with a portion comprising one of the two characters detected. In this case, the portion of the screen containing both (at least) a detected object and (at least) a detected perception zone can be determined as the reference portion. Preferably, therefore, the reference portion comprises an image portion in which an area occupied by a detected object (e.g. determined in step S10) at least partially (preferably completely) overlaps with a detected perception zone (e.g. determined in step S10). In fact, as recognized by the inventors, an object present in the image is not always observed, that is, a detected object is not always the one on which the user is focusing his vision. It is therefore possible to insert graphics on such an object. The combination of the (detected) object and saliency map helps identify those objects on which the vision is focusing, that is, that the user is actually looking at. The graphics should not be overlaid on these. In other words, it is possible to determine as the image reference portion a portion of the image which comprises both the detected object and the determined perception zone (for example, determined to be higher than a certain threshold); this reference portion reflects more accurately a zone of the screen not to be occluded, since there is a high probability, especially with respect to other portions of the image, that the user is actually looking at and focusing on this reference area. The combination of information on the detected object and on the perception zones also allows establishing a priority among the zones not to be occluded with the graphics, wherein this priority can be useful when deciding where to position the graphics or how to size the graphics (for example so as to avoid occluding only the zone having the highest priority, or only a certain number of zones with higher priority).
In this way, it can be more accurately determined that this reference portion is the one on which the user is focusing. The graphics are then positioned in view of this reference portion, and it is therefore possible to position the graphics on the full screen without hindering, in a particularly accurate way, the viewing of the video even when the visual content of the image changes, thus making full-screen use readily practicable with high accuracy in positioning the graphics.
Preferably, depending for example on the type of video content comprised in the video stream or television program associated with the video stream, it is possible to define a respective set of objects. For example, for a football sporting event, the predefined set of objects contains a football, a football goal, a representation of a football player with a ball nearby, etc.; for a tennis sporting event the set for example comprises the net at half court, a racket, etc.; for a film, the set for example comprises cars, tools, furniture parts, etc., or even the faces of some of the characters/actors (for example by defining a set of objects for each film, or a set of objects representing a certain number of actors or characters). Regardless of how and on the basis of which criteria the objects comprised in the set are selected, what matters is simply that there is at least one object whose presence, and preferably position, within an image is to be determined. In the case of a single object, the set can be omitted or made to coincide with the single object.
Optionally, the step S10 of performing a computational analysis on an image to determine at least one object present in the image comprises using a neural network to determine the object. Preferably, the object is a predefined object, for example comprised in a predefined set of objects (a set of predefined objects). The neural network is preferably trained on the basis of a dataset comprising a number of images, for each of which at least one object is predefined; in other words, for each of these images it is known which object (comprising a specific object or type of object) is contained therein and preferably in which position or region of the image it is located (the same applies in the case where an image of the dataset contains a plurality of predefined objects). Preferably the dataset refers to images comprising predetermined objects belonging to a set. The network is then trained to recognize at least one object based on the images comprised in such a dataset. The dataset is described as a collection of images by way of example; other representations of the dataset are possible as long as they allow the training described. In one example, the neural network comprises a Fully Convolutional Neural Network (FCNN) type neural network. Other types of neural networks can be used, for example based on deep learning. It should be noted that the recognition of an object within an image does not necessarily have to be obtained by means of a neural network; in fact, other computational methods based on kernels and shape recognition are available (see for example Object Detection and Recognition in Digital Images: Theory and Practice, B. Cyganek, Wiley, 2013, ISBN-13: 978-0470976371) or on hierarchical perceptual grouping (see e.g. Hierarchical Perceptual Grouping for Object Recognition: Theoretical Views and Gestalt-Law Applications (Advances in Computer Vision and Pattern Recognition), E. Michaelsen et al., Springer, 2019), etc. Although various methods are applicable, the inventors have found that the use of an FCNN optimizes the accuracy of the results with respect to the specific application.
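Purely as an illustration of the kind of output such a network provides (bounding boxes, classes and confidence scores), the following sketch uses an off-the-shelf detector from torchvision; this is not the FCNN preferred above, and the weights argument assumes a recent torchvision version:

```python
import torch
import torchvision

# Pretrained Faster R-CNN used only to illustrate the detection interface.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_objects(image_chw, score_thresh=0.8):
    """image_chw: float tensor (3, H, W) with values in [0, 1].
    Returns boxes (x0, y0, x1, y1) and class labels above the score threshold."""
    with torch.no_grad():
        out = model([image_chw])[0]
    keep = out["scores"] > score_thresh
    return out["boxes"][keep], out["labels"][keep]  # e.g. COCO label 1 = person
```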
Optionally, in step S10, the computational analysis on said image to determine at least one perception zone comprises performing a computational analysis of visual attention. As mentioned, the perception zone refers to a zone on which the visual system focuses due to the way the human visual apparatus is physiologically structured, and can be determined through computational attention techniques; a computational analysis of visual attention outputs a portion of an image, and in particular associates this portion with a certain degree of probability that a human observer has focused on and observed it among the other portions of the same image (attention therefore refers to the fact that the user's gaze is estimated to be on that portion of the image). The attention analysis comprises the use of a saliency map, as also discussed below, but is not limited to it; see e.g. "Computational attention systems" in "Computational Visual Attention", S. Frintrop et al.
Optionally, the computational analysis of visual attention comprises determining a saliency map; in this case, the detected perception zone preferably comprises a section of the image at which the saliency map indicates a probability of a user's visual perception exceeding a perception probability threshold. Figures 2C, 6B and 2D are examples of images processed to obtain saliency maps; in these figures, the saliency is visually represented by zones of strong contrast, specifically by (substantially) white zones with respect to neighbouring dark pixels. A saliency map can be represented as the contrast values of the pixels of a certain area (or as the average value of the pixels of that area) with respect to neighbouring pixels; among these areas of the saliency map, those having a value exceeding a certain threshold can be chosen and established as representing the image reference portion, i.e. the one on which the user is most likely focusing. It should be noted that the visual attention map represents the probability of interest for the eye, which is linked to the metrics of the human visual system; in fact, the human visual system has limited capabilities in the perception of images, which is why it processes areas of an image with different priorities. The computational analysis (aimed at determining an area of interest) makes an estimate of the areas on which a human observer focuses his attention, that is, on which the human observer focuses with priority. The analysis is based, for example, on the measurement of contrast between two neighbouring areas and/or on the colour of a certain area compared to neighbouring areas. For example, limited and uniform zones are considered to be perception zones. In one example, a small, uniform zone, with high contrast with respect to the neighbouring ones and with a certain colour, is considered as an area on which a human observer focuses with priority over other areas of the same image; a high value can therefore be assigned to this area, for example on a predetermined scale, representing that this area is probably an area on which the user's vision is focused. Parameters considered for the determination of zones of attention comprise, for example, the contrast, the type of colour mostly perceived by the eye, etc.
Preferably, the computational analysis of visual attention comprises determining the perception zone on the basis of characteristics of the pixels comprised in a determined region of the image, preferably of the pixels comprised in the portion itself. The characteristics comprise, for example, for one or more pixels of the image, the contrast, the colour, the grey scale corresponding to the colour of a pixel, etc. Preferably, the computational analysis comprises determining a contrast between a sample region of the image (a set of one or more pixels chosen as a sample from the image) and a region immediately next to it (a set of one or more pixels adjacent to the pixels chosen). Preferably, this operation is repeated on a plurality of sample regions of the image; among these, the region having the highest contrast is determined as the perception zone and therefore comprised in the image reference portion. In another example, two or more sample regions having a contrast higher than a certain contrast threshold are determined as attention zones and comprised in the image reference portion.
Optionally, step S30 of determining the positioning indication comprises determining a plurality of non-reference portions represented by regions of the image not comprising the reference portion; in other words, the regions of the image not constrained by the non-overlap or partial-overlap constraint are determined. These non-reference portions are also called free portions for simplicity's sake, since the graphics can be freely overlapped on them; see also what has been said above with reference to the example of figures 2(a)-(d). Further, step S30 may comprise determining the positioning indication as a position indication in one of said non-reference portions. In other words, the positioning indication indicates in which free portion the graphics are to be overlapped. This indication can be represented by the coordinates of one or more free portions, wherein in the case of several free areas it is preferable that they are adjacent. If these free areas are not adjacent, it is possible to choose one large enough to contain the graphics; if this condition is not met (for example because none of the free areas is large enough to contain the graphics), it is conceivable to partially overlap the graphics on the reference portion and/or to resize (shrink) the graphics. Furthermore, in the case of a plurality of free areas it is possible to position different graphic elements at two or more of these free areas.
Optionally, the image reference portion can comprise a portion of the image containing at least a part of said at least one determined object and at least a part of said determined perception zone. In other words, step S10 of the computational analysis can produce as a result both at least one object detected in the image and a perception zone detected in the image. In the event that an area occupied by the detected object at least partially overlaps the zone corresponding to the perception zone (this will be called the overlapping area), then the image reference portion will contain (at least) a part of this overlapping area. In this way it is possible to increase the accuracy in determining the reference portion and correspondingly the positioning indication. Reference is also made to what has been discussed above with reference to the examples of figures 2(a)-2(d).
Optionally, step (S10) of performing a computational analysis can be performed at predetermined intervals, and/or upon a scene change, and/or upon a summary analysis indicating that one or more image characteristics (for example, the average contrast level, the average colour of the image, etc.) have varied between a first image and a second image that are successive or close to each other, and/or upon the availability of a new graphic element.
Optionally, step (S20) of determining an image reference portion can be performed at predetermined intervals, and/or upon the determination of an object and/or a perception zone that is not present in a previous image, and/or upon the availability of a new graphic element.
A scene change can be detected at a scene discontinuity between a first image and a second image (preferably the one following the other, or separated by a small number of images, for example between 2 and 5). The scene discontinuity can be described by means of an index which indicates the variation of the spatial and/or temporal continuity of an element and/or of a portion of an image; preferably, when there is a change beyond a predetermined threshold in the transition from one image to the following one, then it is determined that there has been a scene change.
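Such a discontinuity index could, for example, be sketched as a histogram distance between the two images; the bin count and the decision threshold are illustrative assumptions:

```python
import numpy as np

def scene_changed(frame_a, frame_b, bins=32, thresh=0.25):
    """Discontinuity index between two frames (consecutive or 2-5 apart):
    total-variation distance between their normalised grey-level histograms.
    frame_a, frame_b: uint8 arrays of grey values in [0, 255]."""
    ha, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    index = 0.5 * np.abs(ha - hb).sum()  # in [0, 1]
    return index > thresh
```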
Above, the terms "optionally" and "preferably" have been used to indicate optional variants of the method of the first embodiment. It is possible to combine two or more of these variants in any way.
As illustrated above, the first embodiment is directed to a method. All of the above considerations and/or variants apply to devices and/or entities, such as an entity to determine a positioning indication for positioning a graphic element, as well as to systems comprising the entity for determining a positioning indication for positioning a graphic element, computer programs, computer program support, signals and other examples and embodiments as also illustrated hereinafter. Where different details are omitted for the sake of brevity, all the above remarks apply equally and/or correspondingly to what follows and vice versa.
With reference to figure 3, a second embodiment will now be illustrated related to an entity 300 for determining a positioning indication indicating a position at which a graphic element is to be overlaid to a video stream comprising an image. The entity comprises an analysis unit (310), a processing unit (320) and a positioning determination unit (330). It should be noted that each of these units can be realized by means of any combination of software and/or hardware, distributed on several devices or localized on a single device. Furthermore, each unit comprises any processor capable of performing respective operations; moreover, although the three units are described separately, they can be realized within a single component, of which each unit represents the hardware and/or software resources necessary to implement them. The video stream can be provided as input (see "IN" in figure 3) to the entity in any video format, whether compressed or not.
The analysis unit (310) performs a computational analysis on the image to determine at least one object present in the image and/or at least one perception zone. The processing unit (320) determines, on the basis of the computational analysis (or, in other words, on the basis of the result or the output of the computational analysis), an image reference portion comprising a portion of the image, wherein said portion comprises the at least one object present in the image and/or the at least one perception zone as obtained by means of the analysis. The positioning determination unit (330) determines, on the basis of the reference portion, the positioning indication respecting a constraint indicating that the graphic element is overlapping with said image reference portion to an extent not exceeding an overlap factor. In other words, the positioning determination unit (330) determines the positioning indication so that the graphic element does not overlap or only partially overlaps the image reference portion.
The positioning indication can be provided to another device (e.g. a server, or a device in the video stream distribution chain, etc.), which then takes care of positioning the graphic element on the video stream and of broadcasting this video stream, with the overlaid graphics encoded in it, to one or more users (in the case of broadcasting or multicasting; similar considerations apply to video-on-demand streams, or streams sent to a single user who requests them, since the invention is equally applicable to such cases). In another example, the positioning indication is sent to the user device (or to several user devices); the user device will then position the graphic element on the basis of this indication; this example refers to the case in which the graphic rendering can be performed directly by the user device, while the preceding case refers to the case in which a device remote from that of the user inserts the overlaid graphics on the broadcast video (i.e. the broadcast video, for example in MPEG format, comprises the graphics already overlaid; it is also provided for the video and the graphics to be broadcast separately as two video streams to be overlaid without rendering on the user side). It is also conceivable to perform the graphic rendering on the server side, and to send the video, the graphic element and the positioning indication in a transport stream (i.e. within a connection to the user); the user device will then overlay graphics and video on the basis of the positioning information.
Optionally, the entity 300 may comprise further units (and/or in combination with the units 310-330) configured to perform any one or any combination of the steps illustrated above, or to implement what is discussed below.
According to a further embodiment, a system is provided comprising an entity according to the second embodiment (for example as illustrated in figure 3), and at least one user terminal connectable to this entity via a communication network. This terminal (for example, TV, smartphone, computer, tablet, etc.) will then reproduce the video stream in which the graphic element is overlaid as according to the invention and/or as set forth in the present description. Preferably, the entity is connectable or connected to a plurality of terminals to which the video stream is broadcast.
According to a further embodiment, a computer program is provided which is set up to perform, when said program is run on a computer, any combination of the steps according to any one of the methods and/or examples of the invention, and/or as set forth in this description.
Figure 4 illustrates a block diagram exemplifying a computer (500) capable of running the aforesaid program. In particular, the computer (500) comprises a memory (530) for storing the instructions of the program and/or data necessary for the performance thereof, a processor (520) for performing the instructions and an input/output interface (510). In particular, figure 4 is illustrative and non-limiting, because the computer can be made either in a concentrated manner in one device or in a distributed manner on several interconnected devices. For this reason, the program can be run locally on a concentrated (local) device or on a distributed one.
According to a further embodiment, a support is provided for supporting a computer program set up to perform, when the program is run on a computer, a step or a combination of the steps according to the method described in the first embodiment. Examples of a medium are a static and/or dynamic memory, a fixed disk or any other medium such as a CD, DVD or Blu-ray. The medium also comprises a means capable of supporting a signal constituting the instructions, including a means of cable transmission (Ethernet, optical cable, etc.) or wireless transmission (cellular, satellite, digital terrestrial transmission, etc.).
Many of the embodiments and examples have been explained with reference to steps of methods or processes. Nevertheless, what has been described can also be implemented in a program to be run on a processing entity (also distributed) or on an entity the means of which are configured to perform the corresponding method steps. The entities described above, as well as the components thereof (for example the units thereof) can be implemented in a single device, via hardware, software, or a combination of these, or on multiple interconnected units or devices (also hardware, software, or a combination thereof). In other words, each or some of the entities described above, as well as each or some of the components thereof (for example the units thereof) can be implemented locally or in a distributed manner. Naturally, the above description of embodiments and examples applying the principles recognized by the inventors is given only by way of example of these principles and must therefore not be construed as a limitation of the patent scope claimed herein.
In addition to the above, which represents a first part of the present disclosure, a second part of the present disclosure is now disclosed which can be implemented separately from the first part or optionally and advantageously, as explained further on, in combination with the first part. The reference numbers used hereinafter refer, unless otherwise indicated, to figures 7-12.
As mentioned, graphic elements possibly comprising textual parts are overlapped to the images of the video (of a television channel, of a streaming, etc.): think for example about the case of overlaid titles (banners) in the lower part of the screen during the transmission of a news program, or about the overlaid banners at sporting events. In these graphic elements, the textual part is typically overlapped to an area characterized by a uniform and non-transparent colour in order to improve the visual perception thereof. However, such overlaid graphics completely hide the part of the images on which they overlap; this hinders the perception of the images, especially if the dimensions of the graphics are not negligible or if they are not located in a position on the edge of the image. It would be conceivable to eliminate the non-transparent background area (i.e., overlay the text directly on the image) so as to avoid hiding a part of the underlying image; however, the textual element comprised in the graphics would not be easily perceptible since, due to the way human vision is physiologically structured, it would not be possible to easily distinguish the textual element from the background image. On the basis of studies and considerations of the inventors, a solution has been conceived concerning a graphic element comprising a first transparent element and a second non-transparent element, in which the degree of transparency of the first element is determined in such a way that the contrast between the non-transparent element and the background, consisting of the overlap of the video image and the transparent element, is such as to make the non-transparent part of the graphic element easily perceptible to the human eye. In the following, the transparent element will also be called the rest element to indicate that the non-transparent element rests thereon (without limitations on the shape and/or extension of the surface). The non-transparent element comprises for example a textual element and/or a logo, and in general it is that part of the graphic element for which a high degree of perception is to be obtained. The degree of transparency of the rest surface is preferably determined dynamically, in particular on the basis of the characteristics of the video images, in order to guarantee a high level of perceptibility of the non-transparent element even if the photometric characteristics of the video images vary over time (the dynamism therefore lies in the variation of the degree of transparency depending on variations of the photometric characteristics between two images of the video). In this way, a comfortable viewing of the video can be obtained, which therefore tires the eyes out less, making the non-transparent elements easily perceptible without excessively hindering the vision of the images on the screen. The degree of transparency is determined in such a way as to satisfy a predetermined minimum contrast threshold so as to make the non-transparent part of the graphic element better visible (in the sense of best perceptible) on the video stream.
With reference to figure 7, a first embodiment will now be illustrated related to a method for overlaying at least one graphic element on a video stream. For explanatory purposes, reference will also be made to figure 8 illustrating an example of graphics and images to which the method can be applied. The graphic element comprises a first transparent element and a second non-transparent element. The second non-transparent element is overlapped to the first transparent element. The graphic element can be obtained by overlapping the first transparent element pixel by pixel to the second non-transparent element, and/or by combining vector elements representing the first and second element, etc. Therefore, the overlap indicates that the non-transparent element rests on the transparent element regardless of how the graphic element is generated.
In the operation of overlaying the graphic element on the video image, it is possible first to combine the transparent element and the non-transparent element, thus obtaining the graphic element to be overlaid on the video image; alternatively, the transparent element and the non-transparent element can be provided distinctly and overlaid separately to the video image (first the one and then the other, or vice versa, the overlaying order being irrelevant). In other words, it does not matter how the graphic element is obtained starting from the first transparent element and the second non-transparent element, and how these are overlaid, as long as the result is an image in which the graphic element is overlaid to the video. The transparent element is for example represented by an area filled with one or more colours characterized by a degree of transparency such as not to hide the part of the image underlying this area. The transparent element is also referred to in the following as a rest element or rest surface, since the non-transparent element rests thereon. The non-transparent element is represented for example by a logo and/or by a textual element, preferably but not necessarily having a uniform colour. The non-transparent element is the part that is to be made easily perceptible to the user depending on the background image on which it is to be overlapped, and regardless of the meaning possibly associated therewith. Figure 8(a) shows, for illustrative purposes only, a graphic element Gi composed of a background Zi having a certain transparency p and a text Ti, representing an example of the non-transparent element. In the example, the text comprises the three letters "abc", but the content or meaning of the element Ti is not relevant.
In step S10, an index of photometric values IL,j is determined at at least one area Aj of an image Ii of the video stream. Figure 8(b) shows an example of an image Ii of a video stream (thus comprising other images {... Ii-1, Ii, Ii+1, ...}). Within this image an area Aj is identified, in which IL,j indicates the index of photometric values of that area, it being observed that other areas (j=0, 1, ..., j-1, j, j+1, ...) can be identified. The photometric values indicate the characteristic parameters of the light radiation emitted at one or more pixels of this area, such parameters comprising for example colourimetric, intensity, luminance, chrominance parameters, etc. The area Aj for which the photometric values are determined can be predetermined, statically and/or dynamically chosen on the basis of certain criteria, or randomly chosen; however, it is preferable that this area represents, comprises or is part of an area on which the graphic element is to be overlapped. The index of photometric values can directly represent the values themselves, and/or the values corrected by means of predetermined factors, and/or an index on a scale of photometric values, etc. The area Aj can be provided as input to the system, and/or it can be determined in various ways as explained below in an example illustrated with reference to figure 12.
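One plausible concrete choice for the index IL,j, given by way of example only, is the mean luminance of the area (here with Rec. 709 weights; a WCAG-style relative luminance would additionally require linearising the sRGB values first):

```python
import numpy as np

def photometric_index(rgb_area):
    """One possible index I_L,j for an area A_j: mean luminance of its pixels
    (Rec. 709 weights). rgb_area: (H, W, 3) float array with values in [0, 1]."""
    r, g, b = rgb_area[..., 0], rgb_area[..., 1], rgb_area[..., 2]
    return float((0.2126 * r + 0.7152 * g + 0.0722 * b).mean())
```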
In step S20, a degree of transparency pi of the first transparent element Zi is determined on the basis of the aforesaid index and of a constraint, wherein the constraint indicates that a degree of contrast in at least one resulting area Aj' is not lower than a predetermined contrast threshold. The resulting area Aj' represents an area obtained by overlapping, on the at least one image area Aj described above, the graphic element Gi (at least the transparent element Zi comprised in Gi) to which the degree of transparency is applied. In other words, the degree of transparency of the first transparent element Zi is determined in such a way that it satisfies the condition that the contrast at the resulting area is not lower than a predetermined (minimum) contrast value. In other words, the degree of transparency is determined so that there is a minimum level of contrast between the graphic element, and in particular the non-transparent element, and the image of the video on which the graphic element has been overlaid (at least the part of the image to which the transparent element has been overlaid). As illustrated below by means of an example, the degree of transparency can be determined by solving equations representing the photometric properties of the areas discussed, or of at least some of the pixels contained therein. Figure 8(c) schematically illustrates the resulting area Aj', dashed in the figure, to indicate that set of pixels that fall within the starting area Aj, but which have now been modified as a consequence of the overlay of the graphic element Gi (or at least modified by the addition of Zi). In figure 8(d) the resulting area Aj' is indicated as an area comprised in the area Aj; however, the area Aj' can coincide with the whole area Aj. Preferably, the area Aj comprises the non-transparent element; preferably, also the area Aj' comprises the non-transparent element. Furthermore, in figure 8(d) the area Aj' comprises or encloses the letter "a"; however, the area Aj' could enclose any other letter comprised in the string "abc" or any combination of letters comprised therein.
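By way of example, such equations can be set up on luminances, with linear alpha blending of the rest element over the image area and a WCAG-style contrast ratio as the constraint; the blend model, the scan over opacity values and the convention that the opacity a equals 1 minus the degree of transparency are assumptions of this sketch:

```python
def min_opacity_for_contrast(l_image, l_rest, l_text, ratio=4.5):
    """Smallest opacity a (= 1 - degree of transparency) of the rest element
    Z_i such that the contrast between the text T_i and the blended background
    reaches `ratio`. Luminances are relative luminances in [0, 1]; blending:
        L_bg(a) = a * l_rest + (1 - a) * l_image
    Contrast is the WCAG-style ratio (L_hi + 0.05) / (L_lo + 0.05).
    A closed form exists, but a simple scan keeps the sketch readable."""
    for a in (i / 100 for i in range(101)):
        l_bg = a * l_rest + (1 - a) * l_image
        hi, lo = max(l_text, l_bg), min(l_text, l_bg)
        if (hi + 0.05) / (lo + 0.05) >= ratio:
            return a
    return None  # not reachable with this rest colour

# Example: white text (l=1.0) on a dark rest element (l=0.05)
# over a bright image area (l=0.8) -> roughly a = 0.83.
print(min_opacity_for_contrast(l_image=0.8, l_rest=0.05, l_text=1.0))
```

Returning the smallest opacity that satisfies the constraint keeps the rest element as transparent as possible, i.e. it occludes the underlying image as little as the contrast requirement allows.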
In step S30, the graphic element Gi is placed as an overlay (or more briefly, overlaid) on the image Ii of the video stream, applying to the transparent element Zi the degree of transparency pi determined at step S20. Figure 8(d) schematically represents an example in which the graphic element Gi is overlaid to the image Ii; by way of illustration, the resulting area is indicated with a dashed line.
It is noted that step S30 does not necessarily have to be performed by the same device or entity that performs step S20, for example. In fact, in one example, step S20 can be performed by a server device, while step S30 by a user device which renders the graphics locally; in another example, step S30 is performed through a server-user device interaction; in another example, step S20 is performed by a first server device distinct from a second server device which instead performs step S30. These examples can be combined. In this sense, step S30 can be omitted, or replaced by a step in which the degree of transparency as determined in step S20 is provided as output.
Since the photometric values of the video image vary from image to image, or more generally between two different images of the video stream, and since the degree of transparency is determined taking into account also these properties of the background image (and that these can vary), it is possible to achieve a minimum contrast that makes the graphic element easily perceptible compared to the current image on which it is overlaid. In this way, a dynamic overlay of a graphic element on a video stream can be obtained, in which the dynamic overlay is obtained by the dynamic determination of the degree of transparency depending on the variation of a photometric index between two different images of the stream. Consequently, the perception of the graphic element is facilitated without excessively occluding the vision of the underlying image.
It should be noted that the transparent element does not necessarily have to be represented by a single colour uniformly applied to the rest surface; in fact, the level of transparency can vary across the rest plane according to the photometric index of the image part underlying and/or next to the non-transparent element.
Optionally, the degree of contrast indicates a degree of contrast, within the resulting area Aj', between the non-transparent element Ti and the overlay of the first transparent element (Zi) on the image Ii of the video. In the example of a textual element, the degree of contrast can be represented by a contrast index between the non-transparent element and the pixels next thereto, which are the result of the overlay of the transparent part of the graphic element Gi on the image Ii. In another example, the degree of contrast can be represented by a contrast index between the non-transparent element and a point representative of the area Aj, as explained further on with reference to an example (in which reference will be made to an area probe as an example of the area Aj and/or of the resulting area Aj').
Optionally, the at least one area Aj (and consequently also the at least one resulting area Aj' corresponding thereto) comprises at least a part of the non-transparent element. In this case, the calculation of the degree of transparency with respect to the non-transparent element is optimized. In the event that the area Aj (and Aj') does not contain the non-transparent element, the contrast can be calculated by taking into account at least a part of the non-transparent element next to the area Aj, or by simply referring to a colour representative of the non-transparent element.
Optionally, the video stream comprises two or more images, and the image Ii mentioned above represents each of said two or more images. In other words, the method illustrated above can be performed at two or more images of the video stream. In this example, an index of photometric values is determined for each of the two images (for each respective area of said two images), and when there is a variation between these indices (for example higher than a certain threshold), the transparency index is determined again.
In one example, the video stream comprises a plurality of images, and the step of determining an index of photometric values is optionally performed on each of two or more images comprised in the video stream. These two images can be consecutive or separated, for example by a certain number of images, for example equal to Nskip1 according to the example of figure 9 described below. In other words, the photometric index can be calculated for each video image, or for some of the video images spaced apart by a predetermined number of images or by a random number of images, or selected on the basis of other criteria.
In one example, the video stream comprises a plurality of images, and the overlaying step is optionally performed on each of two or more images comprised in the video stream using the same degree of transparency. In other words, once the degree of transparency and therefore the properties of the graphic element have been calculated, this graphic element can be applied in the same way to a plurality of successive images. The number of images to which the element is applied can be predetermined, or depend on other conditions; for example, when the photometric index of a new image deviates significantly from the photometric index of one of the previous images, the degree of transparency can be calculated again, and then applied. In another example, there is a minimum number of images to which the same degree of transparency is applied, in which the minimum number may depend for example on the processing capacity necessary to perform the determination of the photometric parameters and to calculate the corresponding degree of transparency.
Optionally, in the method of the present embodiment, the step of determining an index is applied to two or more images comprised in the video stream, and the step of determining an index is performed on at least two images and the overlaying step is performed on at least two images; in this optional variant: (a) at least one image among the at least two images related to the step of determining an index is the same as an image among the at least two images of the overlaying step, and/or (b) the at least two images related to the step of determining an index are the same at least two images related to the overlaying step. In other words, in case (a) the index is determined for a first image to which the correspondingly calculated degree of transparency is also applied, while for a second image the respective index is calculated without, however, necessarily determining a new degree of transparency (for example because the photometric index has not changed substantially), in which case the previously applied overlap continues to apply. In case (b), on the other hand, for each of the first and second images an index is determined, and therefore a respective degree of transparency, and then the graphic element is overlapped with the transparency index relative to the respective image.
Optionally, in the method of the present embodiment, the video stream comprises a first image and a second image following the first image within the video stream, and the step of determining a degree of transparency is performed upon determining a photometric value for the second image which deviates by no less than a predefined amount with respect to a photometric value determined for said first image. In other words, when the index of the second image varies by an amount not lower than a certain threshold, the degree of transparency is then determined again and the graphic element is again overlaid with the new degree of transparency. In yet other words, the steps S20 and/or S30 of figure 7 are performed (again) at a second image that has a photometric index different (taking into account the aforesaid threshold) from the photometric index of a first image previous to it. Optionally, in the method of the present embodiment, the overlap of the graphic element to an image of the video stream comprises overlapping the graphic element to one or more images following said image. For example, the one or more images can correspond to a predefined and set (or preset) number, for example on the basis of the available computing resources. In another example, the one or more images may correspond to a variable number N, wherein the number N depends on the occurrence of a variation (beyond a certain threshold, see above) in the index of the image Ii+N with respect to the index of the image Ii.
Optionally, in the method of the present embodiment, the at least one image area Aj comprises a plurality of image areas, wherein determining an index of photometric values comprises determining a respective indication (index) of photometric values for each of the plurality of areas; determining the degree of transparency of the graphic element comprises determining the degree of transparency of the first transparent element on the basis of each respective indication and of the constraint (as described above, and applied for each of such areas); the constraint therefore comprises a constraint indicating that a degree of contrast of each respective resulting area is not lower than a predetermined contrast threshold; moreover, each respective resulting area corresponds to one of the plurality of areas on which the graphic element having said degree of transparency is overlapped. In other words, a photometric index is calculated for each area (also called probe or area probe in some examples below), and thus the degree of transparency for each area, taking into account that the contrast of each area must not be lower than the predetermined threshold imposed by the constraint. The transparency value to be applied can therefore be determined on the basis of the calculations carried out on each area, for example taking into account the minimum contrast value determined among all areas, or an average value, etc.; an example in this respect is given below. For example, with reference to figure 8(d), the transparency level in an area Aj' enclosing the letter "a" can be different from the transparency level of another area Aj' enclosing the letter "c" (this area is not illustrated for simplicity's sake). In another example illustrated in figure 12, different areas are chosen and evaluated, and then a single transparency index is applied to the entire rest plane.
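Continuing the previous sketch, with several area probes the single transparency value applied to the whole rest plane can, for instance, be taken as the most demanding one among the per-probe values:

```python
def opacity_for_all_probes(probe_luminances, l_rest, l_text, ratio=4.5):
    """With several area probes, the constraint must hold for each resulting
    area: take the largest (most opaque) of the per-probe minimum opacities.
    Reuses min_opacity_for_contrast from the sketch above."""
    alphas = [min_opacity_for_contrast(l, l_rest, l_text, ratio)
              for l in probe_luminances]
    if any(a is None for a in alphas):
        return None  # constraint unsatisfiable for at least one probe
    return max(alphas)
```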
Optionally, in the method of the present embodiment, determining an indication of photometric values at at least one area comprises determining an index of photometric values for at least one point of said area, wherein preferably said at least one point is a point representative of that area. In other words, it is not strictly necessary to calculate the index for all points in the area, in order to reduce the necessary computing resources. For example, a few pixels can be chosen at random, or one pixel for each group of M pixels. The representative pixel can be the one having an average colour; several representative pixels can also be chosen, for example those having maximum and minimum photometric values, etc.
Optionally, in the method of the present embodiment, the at least one area is a portion of the image on which to overlap at least a part of the second non-transparent element. In other words, it is preferable that the area for which the index is determined is an area in which the non-transparent element (textual and/or logo) overlaps.
Optionally, in the method of the present embodiment, the predetermined contrast threshold is within a predetermined interval and/or not outside a predetermined interval of contrasts, wherein preferably said predetermined interval is defined by the values 4.5:1 and 7:1. These values can be chosen on the basis of appropriate standards or recommendations of specialized study groups (as discussed below) or determined on the basis of a panel of users (for example, based on the response of a panel of users specialized in evaluating video images). Preferably, the predetermined contrast threshold is equal to 4.5:1; in this case, a degree of transparency will be chosen that produces a contrast as close as possible to 4.5:1, ideally 4.5:1 (but not less, in this example).
Optionally, in the method of the present embodiment, the determination of the degree of transparency is carried out at a predetermined interval of frames, preferably equal to a submultiple of the transmission frequency of the video stream. By transmission frequency it is meant the frequency of images broadcast per unit of time, expressed for example in frames per second (fps). Having a submultiple of the transmission frequency ensures that at each processing instant t there is always a frame to be processed. Otherwise, reference should be made for example to the immediately preceding frame. In one example, if the streaming occurs at 60 fps, and 7 fps are chosen for the processing in question, after the time 0 the first processing will take place at the instant 0.14 sec: this happens between frame 8 (0.133 sec) and frame 9 (0.15 sec); in that case, frame 8 could be chosen. If, on the other hand, the processing takes place with a cadence of 6 fps, then frame 10, frame 20, etc. can be processed, noting that all such frames are available at the time in which the processing takes place. As is evident, this is beneficial but not essential.
Above, the term "optionally" is used to indicate optional variants of the method of the first embodiment. It is possible to combine two or more of these variants in any way.
With reference to figure 9, a variant of the method for overlaying at least one graphic element on a video stream will now be described. The above can also be applied to this variant.
In step S100, a graphic element Gi is provided to be overlapped to an image Ii. The graphic element Gi comprises a non-transparent element Ti and a transparent element (rest element) Zi. In step S110 a photometric index is determined at at least one area Aj of the image Ii. This index can be determined for example as discussed above with reference to step S10 of figure 7.
In step S115 it is determined whether the photometric index IL,i of the image Ii deviates from the photometric index IL,i-k previously calculated for a previous image Ii-k (with k an integer and k>=1). In case the absolute value of the difference between IL,i-k and IL,i is lower than (or equal to) a threshold, it is then determined that there is no substantial difference between the photometric indices of the two images; in this case, the method proceeds to step S118, where a graphic element having the same degree of transparency used for the image Ii-k continues to be applied to the image Ii. In step S119, the counter i of the image Ii is incremented by a value equal to Nskip2, with Nskip2>=1, thanks to which it is possible to set with which image frequency the photometric index is to be recalculated. Following the increment of the counter, step S110 is again performed for the respective image determined at the incremented counter.
Returning to step S115, in the case where the absolute value of the difference between IL,i-k and IL,i is not lower than the threshold, it is determined that the photometric value of the image Ii subsequent to Ii-k has changed by an amount that is sufficient for the calculation of a new degree of transparency to be justified; in this case, the method proceeds to step S120.
At step S120 the degree of transparency for the graphic element (or, more precisely, for the rest element Zi) relative to the image Ii is determined. The degree of transparency can be calculated as explained for example with reference to step S20 of figure 7.
In step S130, the graphic element Gi is applied, overlaid to the image Ii, using the degree of transparency calculated in step S120.
In step S135 the value of the counter i of the image Ii is incremented by a value equal to Nskip1, with Nskip1>=1, thanks to which it is possible to set whether to perform step S110 (i.e. the determination of the index) for the immediately following image or only after a number of following images (in which case, therefore, the same degree of transparency will be applied to the one or more following images).
In figure 9 and in the illustration above, explanations related to the initialization of the method have been omitted; the person skilled in the art will recognize that, at the very first image on which the graphic element must be overlaid, the method of figure 7 can for example be carried out, following which the counter i is incremented by 1 or by a value greater than 1, so that the method of figure 9 can then be performed. Other initializations are possible.
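The loop of figure 9 can be sketched as follows; the callables stand for the steps described above and are passed in as parameters so that the sketch remains self-contained (an illustrative decomposition, not the only possible one):

```python
def overlay_loop(frames, compute_index, compute_alpha, render,
                 delta=0.05, nskip1=1, nskip2=1):
    """frames: list of images; compute_index, compute_alpha and render are
    callables implementing steps S110, S120 and S130/S118 respectively.
    delta is the threshold on the photometric-index deviation of step S115."""
    i, last_index, alpha = 0, None, None
    while i < len(frames):
        idx = compute_index(frames[i])                            # step S110
        if last_index is None or abs(idx - last_index) >= delta:  # step S115
            alpha = compute_alpha(frames[i])                      # step S120
            last_index = idx
            render(frames[i], alpha)                              # step S130
            i += nskip1                                           # step S135
        else:
            render(frames[i], alpha)              # step S118 (reuse old alpha)
            i += nskip2                                           # step S119
```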
The above variants can be combined with one another, as is obvious to the person skilled in the art. Furthermore, the first embodiment is directed to a method. All of the above considerations and variations apply to devices or entities, such as an entity for overlaying at least one graphic on a video stream, as well as systems comprising the overlaying entity and at least one terminal, processor programs, processor program support, signals and other examples and embodiments as also illustrated hereinafter. Where different details are omitted for the sake of brevity, all the above remarks apply equally and/or correspondingly to what follows and vice versa.
With reference to figure 10, a second embodiment will now be illustrated related to an entity for overlaying (200) at least one graphic element on a video stream, wherein the graphic element comprises a first transparent element and a second non-transparent element that is overlapped to the first transparent element. The entity comprises a photometric value determination unit (210), a transparency degree determination unit (220), an overlap unit (230).
The photometric value determination unit (210) is configured to determine an indication of photometric values at at least one area of an image of said video stream. The transparency degree determination unit (220) is configured to determine a degree of transparency of said first transparent element on the basis of the indication and of a constraint, the constraint indicating that a degree of contrast of at least one resulting area is not lower than a predetermined contrast threshold; the at least one resulting area corresponds to the area of the image on which the graphic element having said degree of transparency is overlapped. The overlay unit (230) is configured to overlap, on an image of the video stream, the graphic element by applying the determined degree of transparency to the first transparent element.
As illustrated in the figure, the image Ii is provided as input to the entity 200, for example after being extracted from a video stream (for example in MPEG, AVC format, etc.) to be processed; after being processed, the image is provided as output (point OUT in the figure) and then inserted into a video stream for distribution. In another example, information related to the graphic element to be overlaid is provided at the output point OUT; this information comprises the graphic element, or parameters related to the graphic element (such as the degree of transparency; in fact, in one example, the degree of transparency alone is provided at the output point OUT, in which case the unit 230 can be omitted). In one example, such information related to the graphic element to be overlaid can be sent to another device, which composes it onto the video stream to be distributed to one or more user devices; in another example, the information related to the graphic element to be overlaid is sent to the user device or devices, which render the on-screen graphics locally. In another example, the on-screen graphics are generated through a client-server interaction that takes this information related to the graphic element into account. The unit for determining the degree of transparency 220 is preferably provided with information related to the starting graphic element (i.e. the one for which the degree of transparency is to be determined).
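For illustration only, the structure of the entity 200 can be sketched as follows; the class and attribute names are hypothetical, and the three units are modelled as interchangeable callables, in line with the observation that unit 230 can be omitted when only the degree of transparency is provided at the output point OUT.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class OverlayEntity:
    """Sketch of entity 200 with its three units (210, 220, 230)."""
    determine_index: Callable           # unit 210: image, areas -> index
    determine_transparency: Callable    # unit 220: index, graphic -> p
    overlay: Optional[Callable] = None  # unit 230: image, graphic, p -> image

    def process(self, image, graphic, areas):
        index = self.determine_index(image, areas)        # unit 210
        p = self.determine_transparency(index, graphic)   # unit 220
        if self.overlay is None:   # only the degree of transparency
            return p               # is provided at the output point OUT
        return self.overlay(image, graphic, p)            # unit 230
```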
Optionally, the overlay entity 200 may comprise further units (alone or in combination with the units 210-230) configured to perform any one or any combination of the steps illustrated above, or to implement what is discussed below.
According to a further embodiment, a system is provided comprising an entity according to the second embodiment (for example as illustrated in figure 10), and at least one user terminal connectable to this entity via a communication network. This terminal (for example, TV, smartphone, computer, tablet, etc.) will then reproduce the video stream in which the graphic element is overlaid according to the invention and/or as set forth in the present description.
According to a further embodiment, a computer program is provided which is set up to perform, when said program is run on a computer, any combination of the steps according to any one of the methods and/or examples of the invention, and/or as set forth in this description.
Figure 11 illustrates a block diagram exemplifying a computer (500) capable of running the aforesaid program. In particular, the computer (500) comprises a memory (530) for storing the instructions of the program and/or the data necessary for its execution, a processor (520) for executing the instructions and an input/output interface (510). Figure 11 is illustrative and non-limiting, because the computer can be realized either in a concentrated manner in one device or in a distributed manner over several interconnected devices. For this reason, the program can be run on a concentrated (local) device or on a distributed one.
According to a further embodiment, a medium is provided supporting a computer program set up to perform, when the program is run on a computer, a step or combination of steps according to the method described in the first embodiment. Examples of a medium are a static and/or dynamic memory, a fixed disk or any other medium such as a CD, DVD or Blu-ray. The medium also comprises means capable of carrying a signal representing the instructions, including means of cable transmission (Ethernet, optical cable, etc.) or wireless transmission (cellular, satellite, digital terrestrial, etc.).
In the following, an example will be presented, for illustrative purposes, of a user interface of interactive TV applications that makes use of graphic elements in overlay, with partial overlap on the video content in the background. These elements must be positioned so as to minimize the occlusion of the scene, while remaining perfectly perceptible. The shape, colours and position of these elements are usually decided in a static way within the scene, without considering the dynamics that could suggest a different arrangement and colour adjustment. In this example, instead, the contrast is dynamically optimized with respect to the background image. The video streaming (the same applies to terrestrial, satellite, etc. video distribution) is analysed in the back-end, and events are created that remotely control the graphical interface. This poses problems of realizing the solution under real-time constraints and without assumptions about the development of the scene ahead (the one that follows). In the case of recorded broadcasts, these events can be scheduled and sent as metadata in the video stream. According to this example, the contrast is improved on the basis of the colourimetric characteristics of the image and of the text in overlay, using a surface (rest element) overlaid in transparency on the video image, as shown in figure 12. The background (an example of the transparent element) has a fixed colour, such as black. The level of transparency is decided on the basis of the resulting level of contrast with respect to the text (an example of the non-transparent element), also of a fixed colour, such as white. Analytically, the procedure is briefly described below:
Let us call p the level of transparency. Each image point I(x,y) underlying the rest element is altered by the presence of transparency through the alpha blending equation

S(p)(x,y) = p · I(x,y) + (1 − p) · Z

where (x,y) are the coordinates of the image point, I(x,y) is the RGB value of the image point, Z is the RGB colour of the rest element and S(p)(x,y) is the RGB value of the image point obtained by applying the rest element (so that p = 1 leaves the image unaltered and p = 0 replaces it with the colour Z). This example uses the WCAG definitions and recommendations. In this regard, it should be noted that other reference standards may exist depending on the colours. Therefore, if the conditions do not allow the WCAG requirements to be applied, or if those requirements are not optimal under certain circumstances, then other standards may be followed, or the conditions may be determined depending on the colours in question. It is also possible to test the contrast values on a panel, determine the perceptibility threshold for a specific graphic application and follow this threshold as an alternative or in addition to recommendations such as the WCAG ones.
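As a minimal numerical sketch of the alpha blending equation (assuming 8-bit RGB images held in NumPy arrays; the function name is illustrative):

```python
import numpy as np

def apply_rest_element(image, rest_colour, p):
    """S(p)(x, y) = p * I(x, y) + (1 - p) * Z, with p the level of
    transparency of the rest element (p = 1 fully transparent,
    p = 0 fully opaque) and Z the RGB colour of the rest element."""
    z = np.asarray(rest_colour, dtype=np.float64)
    blended = p * image.astype(np.float64) + (1.0 - p) * z
    return np.clip(blended, 0, 255).astype(np.uint8)

# A mid-grey frame with a black rest element at 40% transparency:
frame = np.full((4, 4, 3), 128, dtype=np.uint8)
print(apply_rest_element(frame, (0, 0, 0), 0.4)[0, 0])   # [51 51 51]
```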
Specifically, the contrast between text and background is defined as

C = (L(P) + 0.05) / (L(Q) + 0.05)

where L(P) and L(Q) are the luminances of the two colour points P and Q, P being the lighter and Q the darker of any two colours whose contrast is to be calculated, for example the first assigned to the text and the second to the background (below, the colour points P and Q are also called T and S(p)). The luminance of a colour point is calculated as

L = 0.2126 · R + 0.7152 · G + 0.0722 · B

wherein (R, G, B) are the three components of the colour, whose values are calculated from the components (Rs, Gs, Bs) in the sRGB colour space (noting that other colour spaces can be used) as

c = cs / 12.92 if cs <= 0.03928, and c = ((cs + 0.055) / 1.055)^2.4 otherwise, for each component c.
The luminances L(P) and L(Q) are examples of the photometric index described above.
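The WCAG definitions recalled above translate directly into code; the following sketch normalizes 8-bit channels to [0, 1] and returns contrast ratios in the range 1 to 21:

```python
def _linearize(c):
    # sRGB channel in [0, 1] -> linear-light value (WCAG 2.x definition)
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def luminance(rgb):
    """Relative luminance L of an 8-bit sRGB colour."""
    r, g, b = (_linearize(v / 255.0) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast(p_colour, q_colour):
    """WCAG contrast ratio between two colours (always >= 1:1)."""
    lp, lq = luminance(p_colour), luminance(q_colour)
    return (max(lp, lq) + 0.05) / (min(lp, lq) + 0.05)

print(round(contrast((255, 255, 255), (0, 0, 0)), 1))   # 21.0
```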
Therefore, comparing the colour of the text (T) with the colour of the background with rest element (S(p)), one obtains

C(p) = (L(T) + 0.05) / (L(S(p)) + 0.05) for light text on dark background,

C(p) = (L(S(p)) + 0.05) / (L(T) + 0.05) for dark text on light background.

In both cases the contrast is a function of the level of transparency p of the rest element, since the colour of the text T is given.
In this example, the contrast C(p) ranges between a minimum of 1:1 and a maximum of 21:1. To allow a good readability of the textual element on the video, this ratio must not fall below a minimum threshold. Typical threshold values are 4.5:1 and 7:1. The colour of the text and the colour of the rest element are chosen in such a way as to ensure that this threshold can be largely exceeded. For example, white text on a black background allows a maximum contrast ratio of 21:1 to be obtained. An increase in contrast is therefore obtained by reducing the level of transparency p of the rest element. However, an excessive increase in the contrast of the rest element tends to excessively "cover" the underlying video (what constitutes excessive covering can be determined by a panel of experts evaluating video images). So, the target contrast level CM being set, the invention identifies the maximum value pM for which C(pM) = CM.
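As a worked instance of these bounds: for white text, L(T) = 1, over a fully opaque black rest element, p = 0 and L(S(0)) = 0, so C(0) = (1 + 0.05) / (0 + 0.05) = 21:1, the maximum. Conversely, as p grows towards 1 the rest element disappears and the contrast tends to the ratio between the text and the unaltered underlying image, which may well fall below the 4.5:1 or 7:1 thresholds.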
To determine this value pM, in the example shown herein, a number K of reference areas (probes), with K >= 1, are used, each area/probe having a number H of points (or Hj, in case each probe j has a different number of points). Therefore, in an example it is possible to obtain a sample of points for the K reference areas (probes), each averaged within an area of H points, as shown in the figure. Each of these reference areas, or probes, is an example of an area Aj. In other words, for each probe:
1. the representative average value of the H points is determined;
2. with respect to this average value, the optimal value p of the transparency is calculated.
Among all the K values p thus obtained, the minimum can be chosen, as this guarantees that the contrast level is respected for all probes. In the example above, reference has been made to the average value for illustrative purposes only, as a representative colour of the entire probe. In general, however, it is possible to choose other values, also statistical ones, such as the median, the truncated mean, a quartile or, in general, any other value representative of the probe.
The number, the arrangement, and the size of probes are system parameters.
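As an illustrative sketch of the probe mechanism (the probe positions and sizes below are arbitrary, being system parameters as just stated):

```python
import numpy as np

def probe_mean(image, top, left, height, width):
    """Representative average RGB value of the H = height * width points
    of a rectangular probe; the median or another statistic representative
    of the probe could be used instead."""
    patch = image[top:top + height, left:left + width].astype(np.float64)
    return patch.reshape(-1, 3).mean(axis=0)

# K = 3 hypothetical probes, each given as (top, left, height, width):
probes = [(10, 10, 8, 8), (10, 40, 8, 8), (10, 70, 8, 8)]
```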
To a generic image point I(x,y) of the video there corresponds a contrast value

C(p; x, y) = (L(T) + 0.05) / (L(p · I(x,y) + (1 − p) · Z) + 0.05) (light text on dark background).

Since a probe is made up of H points, the contrast is referred to the average image point of the probe

Ī = (1/H) · Σh I(xh, yh)

so that

C(p) = (L(T) + 0.05) / (L(p · Ī + (1 − p) · Z) + 0.05) (light text on dark background)

C(p) = (L(p · Ī + (1 − p) · Z) + 0.05) / (L(T) + 0.05) (dark text on a light background).

For both equations there is an analytical solution identifying pP. In fact, treating the luminance of the blended point as the corresponding blend of the luminances, L(p · Ī + (1 − p) · Z) ≈ p · L(Ī) + (1 − p) · L(Z), from C(p) = CM one obtains, for light text on dark background,

LT / (p · L(Ī) + (1 − p) · LZ + 0.05) = CM

from which

pP = (LT / CM − 0.05 − LZ) / (L(Ī) − LZ)

where the constants LT = L(T) + 0.05 and LZ = L(Z) appear. The value pP is determined in a similar way for dark text on light background, so

(p · L(Ī) + (1 − p) · LZ + 0.05) / LT = CM

from which

pP = (CM · LT − 0.05 − LZ) / (L(Ī) − LZ).

The value that is chosen is therefore the minimum of the values pP over the K probes, as anticipated above.
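Transcribing the closed-form solution above into code (under the same linearized treatment of the blended luminance; the function name and the clamping of p to [0, 1] are illustrative additions):

```python
def transparency_for_probe(l_probe, l_text, l_rest, c_m, light_text=True):
    """p_P for one probe: l_probe = L(I_bar), l_text = L(T),
    l_rest = L(Z), c_m = target contrast level C_M."""
    l_t = l_text + 0.05                      # the constant L_T
    if abs(l_probe - l_rest) < 1e-9:
        return 0.0   # degenerate probe: fall back to an opaque rest element
    if light_text:   # light text on dark background
        num = l_t / c_m - 0.05 - l_rest
    else:            # dark text on light background
        num = c_m * l_t - 0.05 - l_rest
    return min(max(num / (l_probe - l_rest), 0.0), 1.0)
```

For example, with white text (L(T) = 1) on a black rest element (L(Z) = 0), a probe with L(Ī) = 0.5 and CM = 4.5 gives pP ≈ 0.367: blending at that level leaves a background luminance of about 0.183, and (1 + 0.05) / (0.183 + 0.05) ≈ 4.5, as required.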
The chosen value pM is recalculated with a frequency of once every F frames, where F is a submultiple of the video transmission frequency (see above with reference to the video transmission frequency, i.e. the number of frames per unit of time).
In summary, the algorithm adopted in this example is composed of the following steps:
1. The current frame, the graphic asset and the probes to be used are provided as input. A probe (or area Aj, or resulting area Aj') preferably contains a transparent element, or is an area in which the transparent element can preferably or potentially be positioned. In other words, the probe is a portion of the frame with respect to which the contrast ratio is calculated between the colour of the text and the rest element with its transparency. Preferably, one is interested in areas close to the text, as shown in figure 12. The pixels that end up under the text are generally not of particular interest, as they will be covered by the text itself; however, this does not exclude the possibility of selecting a probe that will also be partially covered by the text, as also described below for illustrative purposes. The probes (and correspondingly, each of the areas Aj or Aj') can be chosen at random. In another example, probes are chosen that are equally distributed in the space where the graphic element, or at least the non-transparent element, will be inserted. In another example, a probe is chosen if it also contains a non-transparent part of the element. In another example, a probe is chosen among other candidate areas if it has a photometric index higher than (or equal to) a certain threshold. In another example, an area is chosen as a probe if it is close to where the text will be positioned. Two or more of these examples can be combined.
2. For each probe (see the sketch after this list):
a. The representative average point of the probe is calculated.
b. The value of the parameter p is determined that realizes the alpha blending with the colour of the transparent element able to satisfy the minimum requirement of contrast with the text. The value p represents an example of the degree of transparency.
3. Among all the blending values obtained from the point-by-point comparison with the probes, the minimum is chosen, as it guarantees that the contrast requirement is satisfied with reference to each probe.
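Putting the pieces together, and reusing the luminance, probe_mean and transparency_for_probe sketches above, steps 1-3 could be orchestrated as follows (again a hedged sketch under the assumptions already stated, not the reference implementation):

```python
def degree_of_transparency(image, probes, text_colour, rest_colour,
                           c_m=4.5, light_text=True):
    """Steps 1-3: representative mean point per probe, closed-form p_P
    per probe, minimum over the K probes."""
    l_text, l_rest = luminance(text_colour), luminance(rest_colour)
    values = []
    for top, left, height, width in probes:
        mean_rgb = probe_mean(image, top, left, height, width)  # step 2a
        l_probe = luminance(mean_rgb)
        values.append(transparency_for_probe(l_probe, l_text,   # step 2b
                                             l_rest, c_m, light_text))
    return min(values)  # step 3: guarantees the contrast on every probe
```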
In this way, the non-transparent part of the graphic element is made easily perceptible, given how the human visual perceptual system is physiologically structured.
As anticipated above, and as also highlighted by example E17 reported in the summary of the invention, it is optionally possible to combine the first part of the present disclosure with the second part thereof. In particular, when determining the shift direction of the image and consequently shifting the graphics according to that direction, it may occur that the background on which the graphics must be positioned differs from the background on which the graphics were previously positioned, or from the background for which the graphics were originally conceived. Therefore, by combining the dynamism of transparency described in this second part with the dynamism of positioning described in the first part, it is possible to improve the overall dynamism of the graphics: the graphics can be dynamically controlled so as to be positioned in a position that is both dynamically optimal and dynamically perceptible.
Many of the embodiments and examples have been explained with reference to steps of methods or processes. Nevertheless, what has been described can also be implemented in a program to be run on a processing entity (also distributed) or on an entity the means of which are configured to perform the corresponding method steps. The entities described above, as well as the components thereof (for example the units thereof) can be implemented in a single device, via hardware, software, or a combination of these, or on multiple interconnected units or devices (also hardware, software, or a combination thereof). In other words, each or some of the entities described above, as well as each or some of the components thereof (for example the units thereof) can be implemented locally or in a distributed manner. Naturally, the above description of embodiments and examples applying the principles recognized by the inventors is given only by way of example of these principles and must therefore not be construed as a limitation of the patent scope claimed herein.

Claims

1. A method for determining a positioning indication indicating a position at which a graphic element is to be overlaid on a video stream comprising an image, the method comprising the steps of:
- performing (S10) a computational analysis on said image to determine at least one between at least one object present in the image and at least one perception zone;
- determining (S20), on the basis of said computational analysis, an image reference portion comprising a portion of said image, wherein said portion comprises said at least one between said at least one object present in the image and said at least one perception zone;
- determining (S30), on the basis of said reference portion, the positioning indication respecting a constraint indicating that the graphic element is overlapping with said image reference portion to an extent not exceeding an overlap factor.
2. The method according to claim 1, wherein performing a computational analysis on said image to determine at least one object present in the image comprises using a neural network to determine said at least one object preferably comprised in a predefined set of objects.
3. The method according to any one of the preceding claims, wherein performing a computational analysis on said image to determine at least one perception zone comprises performing a computational analysis of visual attention.
4. The method according to claim 3, wherein performing a computational analysis of visual attention comprises determining a saliency map, wherein said at least one perception zone preferably comprises a section of the image at which the saliency map indicates a probability of visual perception of a user exceeding a perception probability threshold.
5. The method according to claim 3 or 4, wherein performing a computational analysis of visual attention comprises determining said perception zone on the basis of characteristics of the pixels comprised in said portion.
6. The method according to any one of the preceding claims, wherein determining (S30) the positioning indication comprises determining a plurality of non-reference portions represented by regions of the image not comprising the reference portion, and determining the positioning indication as a position indication in one of said non-reference portions.
7. The method according to any one of the preceding claims, wherein said image reference portion comprises a portion of the image containing at least a part of said at least one determined object and at least a part of said at least one determined perception zone.
8. A computer program comprising instructions set up to perform, when said program is run on a computer, all the steps according to any one of the method claims 1 to 7.
9. An entity (300) for determining a positioning indication indicating a position at which a graphic element is to be overlaid on a video stream comprising an image, the entity comprising:
- an analysis unit (310) configured to perform a computational analysis on said image to determine at least one between at least one object present in the image and at least one perception zone;
- a processing unit (320) configured to determine, on the basis of said computational analysis, an image reference portion comprising a portion of said image, wherein said portion comprises said at least one between said at least one object present in the image and said at least one perception zone;
- a positioning determination unit (330) configured to determine, on the basis of said reference portion, the positioning indication respecting a constraint indicating that the graphic element is overlapping with said image reference portion to an extent not exceeding an overlap factor.
10. A system comprising an entity according to claim 9, and a user device configured to display a video stream with said overlaid graphic element.
11. A method for overlaying at least one graphic element onto a video stream comprising at least one image, wherein the graphic element (Gi) comprises a first transparent element (Zi) and a second non-transparent element (Ti) which is overlapped to the first transparent element (Zi), the method comprising the steps of:
- determining (S10) an index of photometric values (Lj) in correspondence of at least one area (Aj) of said one image (Ii) of said video stream;
- determining (S20) a degree of transparency of said first transparent element (Zi) on the basis of a constraint, the constraint indicating that a degree of contrast in at least one resultant area (Aj') is not less than a predetermined contrast threshold, wherein the at least one resultant area (Aj') corresponds to an area obtained by overlaying the graphic element (Gi) having said degree of transparency to said at least one area of said one image;
- overlaying (S30), to an image of the video stream, the graphic element by applying, to the first transparent element, the degree of transparency determined.
12. The method according to claim 11, wherein the degree of contrast indicates a degree of contrast, within said at least one resulting area, between the second non-transparent element (Ti) and an overlay of the first transparent element (Zi) to said image (Ii).
13. The method according to one of claims 11 or 12, wherein the video stream comprises a plurality of images, wherein the step of determining an index is performed on at least two images and the overlaying step is performed on at least two images, and wherein at least one image between the at least two images related to the step of determining an index is the same as an image between the at least two images of the overlaying step, or the at least two images related to the step of determining an index are the same at least two images related to the overlaying step.
14. The method according to one of claims 11 to 13, wherein the video stream comprises a first image and a second image following the first image within said video stream, and wherein the step of determining a degree of transparency is performed upon determining a photometric value for said second image which deviates no less than a predefined amount with respect to a photometric value determined for said first image.
15. The method according to any one of claims 11 to 14, wherein the at least one area of said image comprises a plurality of areas of said image, and wherein
- determining an index of photometric values comprises determining a respective index of photometric values for each of the plurality of areas;
- determining the degree of transparency of said graphic element comprises determining the degree of transparency of said first transparent element on the basis of each respective index and of the constraint, wherein the constraint comprises a constraint indicating that a degree of contrast of each respective resulting area is not lower than a predetermined contrast threshold, wherein each respective resulting area corresponds to one of said plurality of areas to which said graphic element having said degree of transparency is overlapped.
16. The method according to any one of claims 11 to 15, wherein the predetermined contrast threshold is comprised in a predetermined interval, wherein preferably said predetermined interval is defined by the values 4.5:1 and 7:1, and wherein the predetermined contrast threshold is preferably equal to 4.5:1.
17. A computer program comprising instructions set up to perform, when said program is run on a computer, all the steps according to any one of the method claims 11 to 16.
18. An entity (200) for overlaying at least one graphic element (Gi) on a video stream, wherein the graphic element (Gi) comprises a first transparent element (Zi) and a second non-transparent element (Ti) which is overlapped to the first transparent element, the entity (200) comprising:
- a photometric value determination unit (210) configured to determine an index of photometric values at at least one area of an image of said video stream;
- a transparency degree determination unit (220) configured to determine a degree of transparency of said first transparent element on the basis of said index and of a constraint, the constraint indicating that a degree of contrast of at least one resulting area is not lower than a predetermined contrast threshold, wherein the at least one resulting area corresponds to said image area to which said graphic element having said degree of transparency is overlapped;
- an overlap unit (230) configured to overlap, to an image of the video stream, the graphic element by applying the determined degree of transparency to the first transparent element.
19. An entity (200) for determining a degree of transparency to be applied to at least one graphic element (Gi) on a video stream, wherein the graphic element (Gi) comprises a first transparent element (Zi) and a second non-transparent element (Ti) overlapped to the first transparent element, the entity (200) comprising:
- a photometric value determination unit (210) configured to determine an index of photometric values at at least one area of an image of said video stream;
- a transparency degree determination unit (220) configured to determine a degree of transparency of said first transparent element on the basis of said index and of a constraint, the constraint indicating that a degree of contrast of at least one resulting area is not lower than a predetermined contrast threshold, wherein the at least one resulting area corresponds to said image area to which said graphic element having said degree of transparency is overlapped;
- a transmission unit configured to send, to an entity for overlaying at least one graphic element (Gi) on a video stream, the degree of transparency determined.
20. A method for overlaying at least one graphic element onto a video stream comprising at least one image, wherein the graphic element (Gi) comprises a first transparent element (Zi) and a second non-transparent element (Ti) which is overlapped to the first transparent element (Zi), the method comprising the steps of:
- determining (S10) an index of photometric values (Lj) at at least one area (Aj) of said one image (Ii) of said video stream;
- determining (S20) a degree of transparency of said first transparent element (Zi) on the basis of a constraint, the constraint indicating that a degree of contrast in at least one resultant area (Aj') is not less than a predetermined contrast threshold, wherein the at least one resultant area (Aj') corresponds to an area obtained by overlaying the graphic element (Gi) having said degree of transparency to said at least one area of said one image;
- providing (S30) a device for overlaying the at least one graphic element with the degree of transparency determined.
21. The method according to any one of claims 1 to 7, wherein the graphic element (Gi) comprises a first transparent element (Zi) and a second non-transparent element (Ti) which is overlapped to the first transparent element (Zi), the method further comprising the steps of:
- determining (S10) an index of photometric values (Lj) at at least one area (Aj) of said one image (Ii) of said video stream;
- determining (S20) a degree of transparency of said first transparent element (Zi) on the basis of a constraint, the constraint indicating that a degree of contrast in at least one resultant area (Aj') is not less than a predetermined contrast threshold, wherein the at least one resultant area (Aj') corresponds to an area obtained by overlaying the graphic element (Gi) having said degree of transparency to said at least one area of said one image;
- overlaying (S30), to an image of the video stream, the graphic element by applying, to the first transparent element, the degree of transparency determined.