WO2022018628A1 - Smart overlay: dynamic positioning of the graphics - Google Patents

Smart overlay: dynamic positioning of the graphics

Info

Publication number
WO2022018628A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
probe area
determining
area
probe
Prior art date
Application number
PCT/IB2021/056532
Other languages
French (fr)
Inventor
Ciro Gaglione
Luigi TROIANO
Original Assignee
Sky Italia S.R.L.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sky Italia S.R.L. filed Critical Sky Italia S.R.L.
Priority to EP21758423.4A priority Critical patent/EP4183135A1/en
Publication of WO2022018628A1 publication Critical patent/WO2022018628A1/en


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H04N21/4316 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/8146 Monomedia components thereof involving graphical data, e.g. 3D object, 2D graphics

Definitions

  • At least one of said probe area (S1,i) and said at least one corresponding probe area (S1,i+N) comprises an image or a portion of a reference image containing an element substantially fixed in the recording field (substantially immobile in the recording field).
  • determining said at least one probe area (S1,i) and said at least one corresponding probe area (S1,i+N) comprises determining as said at least one probe area (S1,i) and said at least one corresponding probe area (S1,i+N) a portion of the first image and, respectively, a portion of the second image each classified by means of a neural network as a probe area.
  • each of said portion of the first image and of said portion of the second image is classified as a probe area (S1,i, S1,i+N) when it corresponds to at least one portion of the reference image; preferably, said at least one portion of the reference image contains a substantially fixed (substantially immobile) element in a recording field.
  • determining said at least one probe area (S1,i) and said at least one corresponding probe area (S1,i+N) comprises determining said at least one probe area (S1,i) and said at least one corresponding probe area (S1,i+N) by means of image processing techniques adapted to determine that the content of such areas (S1,i, S1,i+N) is substantially not subject to variations as the image changes.
  • determining the positioning indication comprises determining the positioning indication at an image zone opposite to the shift direction.
  • images comprised in said video stream, said images comprising said first image and said second image, comprise a first sector and a second sector, the first sector being associated with the shift direction and the second sector being opposite to the first sector along the shift direction, wherein determining the positioning indication comprises determining the positioning indication at the second sector.
  • a first processing unit (22) configured to determine at least one probe area (S1,i) for the first image of said video stream, and at least one corresponding probe area (S1,i+N) for the second image of said video stream;
  • a second processing unit (24) configured to determine a shift direction indicating a direction according to which the at least one corresponding probe area of said second image has shifted with respect to the at least one probe area of said first image;
  • At least one of said probe area (S1,i) and said at least one corresponding probe area (S1,i+N) comprises an image or a portion of a reference image containing an element substantially fixed in the recording field (substantially immobile in the recording field).
  • said first processing unit (22) is configured to determine as said at least one probe area (S1,i) and said at least one corresponding probe area (S1,i+N) a portion of the first image and, respectively, a portion of the second image each classified by means of a neural network as a probe area.
  • said correspondence measurement is determined on the basis of a similarity measurement of one or more pixels of said at least one probe area (S1,i) and one or more pixels of said at least one corresponding probe area (S1,i+N), and/or a distance measurement between said at least one probe area (S1,i) and the visual content of said at least one corresponding probe area (S1,i+N), and/or a similarity measurement with at least one portion of a reference element.
  • a first processing unit (22) is configured to determine said at least one probe area (S1,i) and said at least one corresponding probe area (S1,i+N) by means of image processing techniques adapted to determine that the content of such areas (S1,i, S1,i+N) is substantially not subject to variations as the image changes.
  • Figure 1 is a flowchart representing a method according to an embodiment of the present invention
  • figures 2(a) and 2(b) schematically illustrate a first image and, respectively, a second image
  • figure 2(c) schematically illustrates an indication of relative shift between two probe areas
  • figure 2(d) schematically illustrates a positioning indication of the graphic element in response to the detection of a shift indication
  • figure 3 illustrates a block diagram of an entity according to an embodiment of the present invention
  • figure 4 illustrates a block diagram of a computer adapted to run a program according to an embodiment of the present invention
  • figure 5 schematically illustrates probe areas within an image
  • figures 6(a) and 6(b) schematically illustrate examples of indication of shift to the right and, respectively, to the left
  • figures 7, 8(a) and 8(b) illustrate screenshots of examples corresponding to figures 5, 6(a) and 6(b), respectively.
  • graphic elements, possibly comprising textual parts, are overlaid on the images of the video (of a television channel, of a streaming, etc.): think for example of the case of overlaid titles (banners) in the lower part of the screen during the transmission of a news program, of the overlaid banners at sporting events in which, for example, the statistics of the event in progress are reported, or of other banners comprising logos, optionally together with parts of text.
  • Such a graphic element, completely or at least partially overlaid on the video, hides (or, if it is transparent or partially transparent, partially hides) the part of the images that it overlaps; this hinders the perception of the images, especially if the dimensions of the graphics are not negligible or if the graphics are not located in a position at the edge of the image.
  • the graphic element can sometimes be positioned in a position on the screen that is considered not cumbersome, such as the left side; upon a movement of the scene, however, this position could become cumbersome and therefore no longer ideal, such that the scene is occluded and the enjoyment of the video is made difficult.
  • the inventors have devised a system for automatically positioning a graphic element on the screen without hindering, or hindering only minimally, the viewing and enjoyment of the video.
  • this solution is based on determining a shift direction of the image (for example as a consequence of the shift of the scene and/or of the framing), and positioning the graphic element in a different direction, preferably opposite, with respect to this direction.
  • one or more reference elements are taken into consideration, which are substantially fixed (substantially immobile) elements in the recording field, i.e. elements (for example, and in particular, static or substantially immobile objects such as, as explained further on, the lines of the field, the goals, the billboards, the stands) in the real world that are essentially immobile in the recording field, at least over a period of time during which a certain scene is recorded. Therefore, an element (or at least one portion thereof) can be in different positions between two different frames as a consequence of the movement of the camera or of the scene; however, tracking the shift of at least one portion of a reference element makes it possible to accurately determine the shift direction of the image.
  • the video stream comprises at least a first image and a second image, which for the illustration of the present method are preferably, but not necessarily, consecutive (if not consecutive, they will preferably be close together and separated by no more than a number N of images, for example N not exceeding 25, or a number corresponding to a time interval not exceeding a few seconds, preferably not exceeding 2 seconds, more preferably 1 second, still more preferably 0.5 seconds).
  • a probe area (or simply probe) is determined for the first image, and a probe area is determined for the second image corresponding to the probe area of the first image.
  • by probe area it is meant a portion of the image, in which the term probe refers to the fact that this area or portion of the image is taken into account in the process for positioning the graphic element.
  • the probe area represents an area or portion inside an image for which and/or on which to carry out certain processing in order to position the graphic element. It is noted that the probe area of the second image corresponds to the probe area of the first image; as explained below, the correspondence indicates that the visual content of each such probe area remains substantially the same as the image changes.
  • the probe area of the second image substantially represents the same visual content as the probe area of the first image, taking into account possible variations in brightness and/or geometry due to the fact that between the first and the second image, a variation in the framing (for example, there is a movement of the camera without the scene having changed), a variation in the scene (for example, with a fixed camera, there are movements in the scene) or a variation of both might have occurred.
  • Each probe area therefore comprises a respective number of pixels, which can be the same or different for the corresponding probe areas of two images, and which represents a visual content that does not vary (e.g. in intensity and/or geometry) substantially in the passage from a first to a second image.
  • Each probe area preferably comprises at least one portion of the same reference element, where by reference element it is meant an element (e.g. static or substantially immobile object, or in general any element of the scene, in which this element is static or substantially immobile at least for a certain duration of time, etc.) present in the recording field and which is fixed in the recording field.
  • a shift direction is determined which indicates a direction according to which the probe area of the second image has shifted with respect to the corresponding probe area of the first image (the shift direction is relative between the two probe areas).
  • the shift direction is relative to the "frame" that encloses one of the two images, and/or to the screen (or portion of the screen) in which the images are reproduced, and/or relative to a reference system that is common to both images (i.e. common to the first image and to the second image).
  • the shift direction of at least one probe area indicates in which direction the framing and/or the video scene has shifted. The choice of probe areas comprising at least one portion of a reference element facilitates this determination.
  • the shift direction can be represented by a shift vector; however, this is not necessary, since in fact it is sufficient to obtain only the direction of the shift and not necessarily also the amount of the shift.
  • the shift direction can be expressed as right, left, up, down or a combination thereof, or in any other mode (e.g. East, West, North, South or a combination thereof; angle of the shift vector, wherein the angle is measured relative to a reference system, etc.).
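By way of illustration only (the patent does not prescribe any implementation), a minimal Python sketch of this step follows; the function name and the image-coordinate convention are assumptions made here:

    def shift_direction(dx: float, dy: float) -> list[str]:
        """Return the coarse direction label(s) of a shift vector (dx, dy).

        Image convention assumed: x grows to the right, y grows downwards.
        Only the direction matters; the amount of the shift is ignored.
        """
        labels = []
        if dx > 0:
            labels.append("right")
        elif dx < 0:
            labels.append("left")
        if dy > 0:
            labels.append("down")
        elif dy < 0:
            labels.append("up")
        return labels  # e.g. ["right", "up"] for a diagonal shift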
  • in step S30 the indication for positioning the graphic element is determined on the basis of the shift direction.
  • the positioning indication is different from, and more preferably opposite to, the shift indication; for example, if the shift indication indicates left (thus indicating that the framing has shifted to the left and/or that the scene has shifted to the left), then the positioning indication indicates right to specify that the graphic element should be positioned in a right part of the screen and/or in a position to the right with respect to a position of the graphic element in a previous image. In this way, it is possible to ensure that the graphic element does not obstruct the vision of the video, or at least that the occlusion of the video is reduced, thus not hindering the vision.
  • the positioning indication indicates a position of the graphic element (to be overlaid) relative to the position of the graphic element in a previous image; in one example, the positioning indication includes the coordinates with which to overlay it to the video stream.
  • the positioning indication can be expressed by means of coordinates (in a reference system of the image and/or of the screen on which it is to be reproduced) indicating where to position the graphics. The exact location is not necessary, however, as the indication can simply provide a direction (or side, or part, or sector of the screen) in which to position the graphics relative to the shift direction.
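The following sketch, again purely illustrative, maps a shift direction to a positioning indication on the opposite side; the sector names and anchor coordinates are assumptions of this example, not values from the patent:

    OPPOSITE = {"left": "right", "right": "left", "up": "down", "down": "up"}

    def positioning_indication(shift: str, width: int, height: int):
        """Return (sector_label, (x, y)): the side opposite the shift and an
        illustrative anchor point for the graphic element in that sector."""
        side = OPPOSITE[shift]
        anchors = {
            "left":  (int(0.05 * width), int(0.80 * height)),
            "right": (int(0.70 * width), int(0.80 * height)),
            "up":    (int(0.40 * width), int(0.05 * height)),
            "down":  (int(0.40 * width), int(0.85 * height)),
        }
        return side, anchors[side]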
  • N is preferably a small number to indicate that the two images are sufficiently close together to represent a change in the scene and/or framing within the continuous stream of the video to which they belong.
  • in the image Ii there is a graphic element G positioned on the left, for example because that is considered, at that moment, a position that does not hinder, or hinders as little as possible, the enjoyment of the video (this position can be determined for example on the basis of image and/or video analysis techniques, or manually).
  • a probe area S1,i is selected (in the case of a plurality of probe areas 1...k...Ns, each will be named Sk,i for image i); in the figure it is drawn on the right side of the screen only for simplicity of illustration.
  • the probe area is such that the characteristics thereof (e.g. brightness and/or geometry of the content comprised therein) remain substantially stable over time, i.e. remain stable in the passage from a first image to a second image following the first one. For example, the characteristics remain stable since they refer to at least one portion of a reference element that is fixed in the recording field.
  • the probe area S1,i+N has shifted to the right, for example because the framing and/or scene has shifted correspondingly, by an amount indicated by the shift indication d as shown in figure 2(c).
  • a positioning indication p is determined as illustrated in figure 2(d); in the example, the positioning indication p is in the opposite direction to the shift indication d.
  • the graphic is then shifted in this example to the right as shown in figure 2(b), and in the direction indicated by p, as determined starting from d.
  • the graphic element is indicated with G', to underline that it does not necessarily have to be identical to the graphic element G, but that for example the textual (if any) and/or graphic content thereof could be different from that of the first image in figure 2(a); the invention is however independent of the possible content of the graphics or of the meaning of the graphic element, since in fact the invention is aimed at how and/or where to position said graphic element so as not to hinder viewing the video.
  • At least one of the probe area (S1,i) and the at least one corresponding probe area (S1,i+N) comprises an image or a portion of a reference image containing an element substantially fixed in the recording field (substantially immobile in the recording field).
  • the recording field is that zone (in the real world) that is recorded by one or more cameras.
  • the scene is a sequence of images in which there is typically continuity of space and/or time relative to the recording field as recorded by means of one or more cameras; therefore, within the recording field there are elements (objects, people, etc.) that move (for example characters from a film or a TV show, footballers, etc.) and elements that do not move (e.g. the lines of the field, the goals, the billboards, the stands).
  • the elements that do not move, or at least do not move substantially, represent the reference elements.
  • since the reference images or reference portions contain a fixed element, it is possible to determine the shift of the scene, i.e. of the camera, in an accurate and simple way.
  • the above is also valid in the case of virtual worlds (for example videos comprising scenes wholly or partially based on computer graphics, or scenes from animation programs), in which a reference element is a substantially fixed element in the virtual scene.
  • the probe area of the first image and the corresponding probe area of the second image are associated with each other on the basis of a correspondence indication which indicates a measurement of correspondence between a visual content of the probe area of the first image and a visual content of the corresponding probe area of the second image.
  • by visual content it is meant the portion of the image present in the respective probe area; the correspondence indication therefore indicates how visually similar the two probe areas are.
  • the indication is a measurement of correspondence between the image produced by the pixels of the probe in the first image and the image produced by the pixels of the probe in the second image.
  • the portion of the image represented by the probe area of the first image substantially corresponds to the portion of the image represented by the probe area of the second image.
  • the measurement of correspondence between the two probe areas is determined on the basis of a) a similarity measurement of one or more pixels of the probe area and one or more pixels of the corresponding probe area, and/or b) a distance measurement between the two probe areas, and/or c) a similarity measurement with at least one portion of a reference element.
  • the similarity therefore expresses how similar the photochromic values (for example the values of the colour model of the pixel: for example, RGB or CMYK; preferably also taking into account the luminance) of some or all the pixels of the respective areas are.
  • the similarity is determined when the difference between photochromic values is below a certain threshold; in particular, again by way of example, the similarity can be indicated by the difference (or the modulus thereof) of the RGB (or CMYK, depending on the colour model used to describe a pixel) values of a point representative of the respective probe areas; or it can be indicated by the average of these differences calculated for several pixels or for all the pixels of the respective probe areas, etc.
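As a concrete, non-authoritative illustration of such a pixel-based similarity, the snippet below compares two equally-sized probe areas by the mean absolute difference of their RGB values; the threshold value is an assumption chosen for the example:

    import numpy as np

    def pixel_similarity(area_a: np.ndarray, area_b: np.ndarray,
                         threshold: float = 10.0) -> bool:
        """True if two probe areas (H x W x 3 RGB arrays of equal shape)
        are similar, i.e. their mean absolute difference is below threshold."""
        a = area_a.astype(np.float32)
        b = area_b.astype(np.float32)
        mad = float(np.mean(np.abs(a - b)))  # averaged over pixels and channels
        return mad < threshold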
  • the similarity measurement with at least one portion of a reference element can be obtained, as also explained further on, for example through a neural network trained to recognize at least one portion of one or more reference elements, or image recognition techniques aimed at recognizing at least one portion of one or more reference elements (for example, techniques aimed at recognizing that both the first and the second probe area substantially contain at least one portion, preferably the same portion, of the same reference element).
  • the correspondence indication can be obtained by means of an algorithm or method which associates pairs of elements (the probe areas) belonging to two different sets (each set comprising a certain number of possible probe areas for a given image).
  • An example of such an algorithm is the Hungarian method or algorithm (see for example Assignment Problems: Revised Reprint, Rainer Burkard et al., 2012, ISBN: 978-1-61197-222-1).
  • by area it is meant, for example, an image portion; the algorithm associates, among all the possible combinations of areas of the two sets, pairs of areas that are similar, using known similarity criteria and/or criteria as illustrated below by way of example.
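A minimal sketch of such an assignment, using the Hungarian method as implemented by scipy; the cost weighting between visual similarity and centre-of-gravity distance is an assumption of this example, not a formula from the patent:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def associate_probe_areas(centroids_1: np.ndarray, centroids_2: np.ndarray,
                              similarity: np.ndarray,
                              w_dist: float = 1.0, w_sim: float = 1.0):
        """Associate candidate probe areas of two images.

        centroids_1: (N, 2) centres of gravity of areas in the first image.
        centroids_2: (M, 2) centres of gravity of areas in the second image.
        similarity: (N, M) visual similarity scores (higher = more similar).
        Returns (i, j) index pairs minimising distance and maximising similarity.
        """
        dist = np.linalg.norm(centroids_1[:, None, :] - centroids_2[None, :, :], axis=2)
        cost = w_dist * dist - w_sim * similarity  # Hungarian method minimises cost
        rows, cols = linear_sum_assignment(cost)
        return list(zip(rows, cols))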
  • a neural network is trained, through an appropriate dataset (a collection of data), to classify a portion of an image of a video stream as a probe area, i.e. as a portion of the image that tends to undergo limited variations in successive images of the video stream (which will also be called low propensity or low tendency to variation); thanks to these limited variations, the probe area represents a reference point in the scene depicted by the image.
  • This dataset may contain reference images (or portions of image) characterized by a low propensity for variation between two images of a video stream.
  • examples of such reference images (or portions) include those images (or portions) characterized by one or more lines and/or geometric shapes whose visual representation does not substantially vary with the variation of the recording distance and/or angle.
  • the reference images (or portions thereof) contain at least one reference element (or a portion of a reference element) which is fixed in the recording field, as also explained above.
  • reference elements or a portion thereof are represented by: in the case of a sporting event, lines and/or geometric shapes on the pitch (penalty area in football, semicircle on the basketball court, intersection of background lines, etc.), lines and/or geometric shapes representing frames and structures (e.g. football goal, and/or net on the tennis court, and/or advertising billboards on the sidelines, etc.); in the case of a musical event, instruments and/or structures such as the stage, etc.; in general, in the case of television programs including films, known objects such as cars, buildings, etc.
  • the dataset can be enriched, for example, on the basis of new videos and/or images/portions of images contained therein, or it can be created for a specific video (e.g. television program) to be broadcast.
  • the neural network trained with such a dataset is therefore able to classify, for an image provided as input, whether and which portions of that image can be considered as probe areas (for example, determining whether each of them contains at least one portion of a reference element). Therefore, by providing an image Ii of a video stream as input, it is possible to obtain one or more probe areas for that image Ii; the same process can be repeated with a following image Ii+N to identify one or more probe areas comprised in that following image Ii+N.
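Purely as an illustrative sketch (the patent does not specify any architecture), a small patch classifier of this kind could look as follows in PyTorch; the patch size, layer sizes and two-class output are assumptions:

    import torch
    import torch.nn as nn

    class ProbeAreaClassifier(nn.Module):
        """Classifies a 32x32 RGB patch as probe area / not probe area."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.head = nn.Sequential(
                nn.Flatten(),
                nn.Linear(32 * 8 * 8, 2),  # 32x32 input -> 8x8 after two poolings
            )

        def forward(self, patch: torch.Tensor) -> torch.Tensor:
            return self.head(self.features(patch))

    # usage: logits = ProbeAreaClassifier()(torch.randn(1, 3, 32, 32))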
  • determining the at least one probe area and the at least one corresponding probe area comprises determining as one of said at least one probe area and said at least one corresponding probe area a portion of the first image and, respectively, a portion of the second image each classified, by means of a neural network, as a probe area.
  • determining (step S10) said at least one probe area and said at least one corresponding probe area comprises classifying as probe area, by means of a neural network, a portion of the first image and, respectively, a portion of the second image, and determining the portions classified as respective said at least one probe area and said at least one corresponding probe area.
  • the neural network makes it possible to obtain (as a result of the classification) at least one probe area for each of the two images. Therefore, as a result of the classification as probe area, a probe area thus classified contains at least one portion of the reference element (in turn contained in the reference image).
  • one or more reference images contain one or more geometric shapes, preferably rectilinear (e.g., in the case of a football event, midfield lines, lines delimiting the penalty area, lines corresponding to posts and/or crosspieces of the football goal, etc.; in the case of a film, demarcation lines on the street, lines of furniture components, lines delimiting a building, etc.). In the presence of such geometric shapes, less processing resources are advantageously required since it is easier to process this type of pattern in the images.
  • a matrix is defined to measure the correspondence between two probe areas belonging to two images.
  • the matrix can consider two criteria: similarity and/or distance between the centres of gravity;
  • the optimal correspondence should preferably maximize the similarity and/or minimize the distance between the two probe areas
  • the algorithm proceeds with the assignment, between multiple probe areas detected in each of the two images, on the basis of the similarity and/or of the centres of gravity, preferably independently; preferably, the assignment is considered valid only if the same pair results from both assignments.
  • probe areas may appear very similar to each other but distant, for example because one or more probe areas enter the scene while one or more others exit; there is no useful correspondence for these probe areas.
  • the method proposed above therefore makes it possible to exclude such an association.
  • the optional variant of the Hungarian method described above can also be applied in combination with other methods, and it can be applied both to the case of image decomposition and to the case of the use of the neural network. Furthermore, in the event that only one probe area is identified for each image (or for one of the two images), the Hungarian method and/or the variant thereof above can be used to evaluate the degree of correspondence between these probe areas, in order to determine if they are suitable for determining the shift direction.
  • it is also possible to determine the probe area of the first image and the corresponding probe area of the second image using, for example, an image processing technique adapted to identify, within each of the first and second images, portions of the image that correspond to a reference image (preferably, see above, containing at least one portion of a reference element which is substantially fixed in the recording field), in which the reference image is chosen so that it does not substantially vary as the framing changes and/or the scene changes.
  • the image processing technique recognizes lines and/or geometric shapes that are estimated as similar to reference images present in a given scene (field lines, etc., see also examples above); the similarity can be determined mathematically according to the technique used.
  • the image processing technique recognizes as a probe area a portion of an image having characteristics that are not subject to variations as the image changes; for example, the recognition technique determines the presence within the image of at least one reference element (not prone to variations, see also above), and classifies an area that comprises it as a probe area.
  • the probe area of the first image and the corresponding probe area of the second image represent portions of the respective images whose content does not vary (for example, because they refer to at least one portion of a reference element which is fixed in the recording field).
  • by content it is meant a visual content represented by the pixels comprised in the probe area; in particular, each probe area represents a portion of the respective image which remains (visually) stable or substantially unchanged with the succession of the frames.
  • preferably, the number of pixels of an invariant element of the probe area is higher than a certain threshold; in other words, very small elements of the image, even if potentially invariant, are not considered as probe areas because their small size may not guarantee high recognition accuracy.
  • This threshold can be determined empirically, for example in a test and/or calibration step of the method described herein. For example, areas smaller than AxB pixels (e.g. 4x4) are not considered as a probe area.
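A trivial sketch of this size filter follows; the bounding-box representation is an assumption, and the 4x4 default merely mirrors the AxB example above:

    def filter_small_areas(candidates, min_w: int = 4, min_h: int = 4):
        """Discard candidate probe areas smaller than min_w x min_h pixels.

        `candidates` is assumed to be a list of (x, y, w, h) bounding boxes."""
        return [(x, y, w, h) for (x, y, w, h) in candidates
                if w >= min_w and h >= min_h]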
  • the positioning indication is determined at an image zone different from, and still preferably opposite to, the shift direction.
  • the positioning indication indicates that the graphic element is to be positioned in a zone of the screen preferably opposite to the shift direction, therefore (in the example in the figure) that it is to be positioned in the right zone of the screen.
  • rules can be set that define how and/or where to position the graphic element on the basis of the shift indication. One of these rules indicates that the graphic element is to be positioned in a zone opposite to the shift direction.
  • by first sector and second sector it is meant that each sector refers to a part of the image, for example arranged on one of the sides and/or representing a perimeter section of the surface represented by each of these images.
  • these sectors are obtained by dividing, for example, the image into two parts (not necessarily equal, but for example such that at least one of these parts is large enough to contain at least part of the graphics).
  • the first sector can be associated with the shift direction (i.e. the part of the screen towards which the framing and/or scene is moving) and the second sector is the one opposite to the first sector along the shift direction; the positioning indication is then determined at the second sector.
  • the sectors can be dynamically determined according to the shift direction (for example, if the shift is from top to bottom, then the sectors will be obtained by dividing the screen with respect to a horizontal line).
  • the first and second sector correspond to respective areas near the opposite geometric sides (right and left) of the rectangle.
  • the sector can also be indicated with a side (see for example figure 2: the graphics shift to the opposite side with respect to the movement of the scene); however, the side in this example is not strictly a geometric side of the image, but a portion (e.g. a quadrant, sector) of the image.
  • the graphic element will then be positioned, for example, within such a sector, or at it.
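By way of a hedged example, the division into sectors and the choice of the second sector could be computed as below; splitting the frame into two halves is only one of the possible divisions mentioned above:

    def second_sector(shift: str, width: int, height: int):
        """Return the bounding box (x, y, w, h) of the sector opposite the shift.

        The frame is split into two halves perpendicular to the shift
        direction; the half opposite the shift is the candidate zone for
        the graphic element."""
        if shift == "right":
            return (0, 0, width // 2, height)                     # left half
        if shift == "left":
            return (width // 2, 0, width - width // 2, height)    # right half
        if shift == "down":
            return (0, 0, width, height // 2)                     # top half
        if shift == "up":
            return (0, height // 2, width, height - height // 2)  # bottom half
        raise ValueError(f"unknown shift direction: {shift}")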
  • the terms "optionally" and/or "preferably" have been used to indicate optional variants of the method of the first embodiment. It is possible to combine two or more of these variants in any way.
  • the first embodiment is directed to a method. All of the above considerations and/or variants apply to devices and/or entities, such as an entity to determine a positioning indication for positioning a graphic element, as well as to systems comprising the entity for determining a positioning indication for positioning a graphic element, computer programs, computer program support, signals and other examples and embodiments as also illustrated hereinafter. Where different details are omitted for the sake of brevity, all the above remarks apply equally and/or correspondingly to what follows and vice versa.
  • the video stream comprises a first image and a second image.
  • the entity (20) comprises a first (22), a second (24) and a third (26) processing unit.
  • Each of the first, second and third processing unit can be realized through a hardware and/or software combination, can be localized in a device or distributed on different devices; moreover, two or more of these units can be combined together.
  • the first processing unit (22) configured to determine at least one probe area (Si,i) for the first image of the video stream, and at least one corresponding probe area (Si,i+ N ) for the second image of the same video stream.
  • the video stream comprising the first and second images can be provided as input to the device 20 (as indicated by "IN" in the figure); this video stream can be obtained from a camera, from a direction system, from a database (e.g. content repository) in which television programs and/or advertisements are stored, etc.
  • the second processing unit (24) configured to determine a shift direction indicating a direction according to which the at least one corresponding probe area of the second image has shifted with respect to the at least one probe area of the first image.
  • the third processing unit (26) is configured to determine the positioning indication based on the shift direction.
  • the shift indication can be provided to another device (e.g. a server or a device in the video stream distribution chain, etc.).
  • the positioning indication is sent to the user device (or to several user devices); the user device will then position the graphic element on the basis of this indication. This case refers, for example, to the case in which the graphic rendering can be performed directly by the user device, while the preceding case refers to the case in which a device remote from the user's device inserts the overlaid graphics into the broadcast video (i.e. the broadcast video, for example in MPEG format, comprises the graphics already overlaid). It is also conceivable to perform the graphic rendering on the server side, and to send the video, the graphic element and the positioning indication in a transport stream (i.e. within a connection to the user); the user device will then overlay graphics and video on the basis of the positioning information.
  • the entity 20 may comprise further units (and/or in combination with the units 22-26) configured to perform any one or any combination of the steps illustrated above, or to implement what is discussed below.
  • a system comprising an entity according to the second embodiment (for example as illustrated in figure 3), and at least one user terminal connectable to this entity via a communication network.
  • This terminal is, for example, a TV, a smartphone, a computer, a tablet, etc.
  • the entity is connectable or connected to a plurality of terminals to which the video stream is broadcast.
  • a computer program is provided which is set up to perform, when said program is run on a computer, any combination of the steps according to any one of the methods and/or examples of the invention, and/or as set forth in this description.
  • Figure 4 illustrates a block diagram exemplifying a computer (500) capable of running the aforesaid program.
  • the computer (500) comprises a memory (530) for storing the instructions of the program and/or data necessary for the performance thereof, a processor (520) for performing the instructions and an input/output interface (510).
  • Figure 4 is illustrative and non-limiting, because the computer can be realized either in a concentrated manner in one device or in a distributed manner on several interconnected devices. For this reason, the program can be run locally on a concentrated (local) device or on a distributed one.
  • a support for supporting a computer program set up to perform, when the program is run on a computer, a step or a combination of the steps according to the method described in the first embodiment.
  • Examples of such a medium are a static and/or dynamic memory, a fixed disk or any other medium such as a CD, DVD or Blu-ray.
  • The medium also comprises a means capable of carrying a signal constituting the instructions, including means of cable transmission (Ethernet, optical cable, etc.) or wireless transmission (cellular, satellite, digital terrestrial, etc.).
  • the neural network is trained to recognize images or portions of reference images; portions of reference images are indicated in the figure with 510, 520, 530, 540, noting that these portions of images represent probe areas.
  • the areas 510 and 520 represent intersections of the lines of the pitch; the area 530 represents the angle between the post and the crossbar of the goal; the area 540 instead represents an advertising billboard.
  • these portions of the image tend not to vary substantially as the scene changes, that is, they do not substantially vary between one image and the immediately following ones (or have a propensity or a tendency not to vary substantially).
  • each of these can therefore be taken as a reference point to determine the shift direction of the framing/scene, and can therefore be considered as a probe area since the shift direction will be calculated on the basis of this area.
  • This characteristic is due to the fact that the image portions 510, 520, 530 and 540 are substantially fixed in the recording field of the scene, and therefore represent reference elements (or portions of reference elements) fixed in the recording field.
  • the neural network is preferably trained with a dataset comprising a series of reference images, in the present case typical of a football pitch or of what is usually present when a football match is broadcast. Once trained, the neural network is able to detect or recognize each of the areas 510-540.
  • a first image Ii and a second image Ii+N, in which the second preferably immediately follows the first one (or is separated from it by a small number of images), will be input to the neural network, which will detect the respective probe areas for each of these images. Afterwards, the corresponding probe areas will be determined, i.e. for each probe area of the first image a corresponding probe area of the other image will be determined (or, vice versa, for each probe area of the second image a corresponding probe area of the first image is determined; by corresponding it is indicated that the two probe areas correspond to each other, regardless of how this correspondence is determined and regardless of the temporal order of the images).
  • the Hungarian algorithm and/or a variant thereof can be used as explained above.
  • a shift direction is determined by calculating the relative shift of a probe area with respect to the corresponding probe area, preferably relative to the screen. For example, a centre of gravity is calculated for each probe area, and the shift direction is calculated with reference to the respective centres of gravity.
  • more generally, a reference point is defined for the probe area, and the shift direction is calculated from that reference point.
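A short illustrative sketch of this computation; the array shapes and names are assumptions of the example:

    import numpy as np

    def centre_of_gravity(pixel_coords: np.ndarray) -> np.ndarray:
        """Centre of gravity of a probe area given its (K, 2) pixel coordinates."""
        return pixel_coords.mean(axis=0)

    def shift_vector(area_in_img1: np.ndarray, area_in_img2: np.ndarray) -> np.ndarray:
        """Relative shift of a probe area between two images, from the centroids."""
        return centre_of_gravity(area_in_img2) - centre_of_gravity(area_in_img1)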
  • Figure 6(a) illustrates the case in which a shift direction to the right is determined, for example because the framing has shifted to the right with respect to a previous image.
  • the graphic element is positioned to the left in response to determining that the shift is to the right.
  • in figure 6(b), a shift direction to the left is determined; as a consequence of, or in response to, this shift direction, the graphic element is overlaid in an opposite manner on the right side of the screen.
  • FIGs 7, 8(a) and 8(b) are screenshots that represent examples of what is shown with reference to figures 5, 6(a) and 6(b), respectively.
  • a probe area present in a first image may not be present in a second image following the first one, for example because it has disappeared from the framing: think of the case of a football goal that disappears because the framing is following the action that is moving away from that goal (situation X).
  • a corresponding area in the other image will not be identified.
  • the intersection between post and crosspiece in the first image may resemble the intersection of two pitch lines in the second image (situation Y).
  • the area association algorithm can advantageously resolve uncertainties or ambiguities as in said X and Y situations. For example, with reference to the variant of the Hungarian method described above, it is possible to weight the centre of gravity and/or the distance between two probe areas of two different images in the association matrix. In this way, two areas which resemble each other but which are very distant can be determined as probably not corresponding to each other (case Y), and therefore not associated; similarly, two areas having different centres of gravity and/or very distant from each other may not be associated.
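One possible (assumed, not prescribed) way to realize this rejection is to gate the Hungarian assignment by a maximum admissible cost, so that implausible pairs are left without a correspondence:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def associate_with_gating(cost: np.ndarray, max_cost: float):
        """Hungarian assignment that rejects implausible pairs (situations X, Y).

        Pairs whose weighted cost exceeds max_cost (e.g. similar-looking but
        far-apart areas, or areas entering/leaving the framing) are treated
        as having no correspondence."""
        rows, cols = linear_sum_assignment(cost)
        return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= max_cost]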

Abstract

A method (and corresponding entity, system, computer program) is provided for determining a positioning indication for positioning a graphic element, indicating a position at which a graphic element is to be overlaid to a video stream comprising a first image and a second image. The method comprises the steps of: - determining (S10) at least one probe area (S1,i) for the first image (Ii) of said video stream, and at least one corresponding probe area (S1,i+N) for the second image (Ii+N) of said video stream; - determining (S20) a shift direction indicating a direction according to which the at least one corresponding probe area (S1,i+N) of said second image (Ii+N) has shifted with respect to the at least one probe area (S1,i) of said first image (Ii); - determining (S30) the positioning indication (p) on the basis of said shift direction (d).

Description

DESCRIPTION of an invention having the title:
"Smart Overlay: dynamic positioning of the graphics"
BACKGROUND OF THE INVENTION
In the transmission of television channels or streaming video it is increasingly common to associate graphic elements with the video in progress, such graphic elements sometimes comprising text. Regardless of the presence and/or possible textual content (in the sense of the meaning of the text) comprised in the graphic element, the problem arises of how to position the graphic element on the screen. One solution is to place the graphics alongside the video images, which usually involves reducing the size of the video to make room for the graphic element. In the case of a television program, for example, it is possible to reduce the size of the video in order to create a black L-shaped zone around the resized video, in which L-shaped zone there can be inserted one or more graphic elements. In other solutions, the graphic element is overlapped on the video, at least partially obstructing the video itself and therefore hindering the vision thereof.
The known techniques therefore make the video images less easily perceptible, hindering the perception thereof.
SUMMARY OF THE INVENTION
One of the objects of the present invention resides in improving the known solutions or obviating one or more of the problems present in the known solutions. The object is reached by the independent claims. Advantageous embodiments are defined by the dependent claims. Further examples are provided for explanatory purposes as follows:
El. Method for determining a positioning indication for positioning a graphic element indicating a position at which a graphic element is to be overlaid to a video stream comprising a first image and a second image, the method comprising steps of:
- determining (S10) at least one probe area (S1,i) for the first image (Ii) of said video stream, and at least one corresponding probe area (S1,i+N) for the second image (Ii+N) of said video stream; determining (S20) a shift direction indicating a direction according to which the at least one corresponding probe area (S1,i+N) of said second image (Ii+N) has shifted with respect to the at least one probe area (S1,i) of said first image (Ii);
- determining (S30) the positioning indication (p) on the basis of said shift direction (d).
E2. Method according to the explanatory example E1, wherein at least one of said probe area (S1,i) and said at least one corresponding probe area (S1,i+N) comprises an image or a portion of a reference image containing an element substantially fixed in the recording field (substantially immobile in the recording field).
E3. Method according to the explanatory example E1 or E2, wherein said at least one probe area (S1,i) and said at least one corresponding probe area (S1,i+N) are associated on the basis of a correspondence indication indicating a measurement of correspondence between a visual content of said at least one probe area (S1,i) and a visual content of said at least one corresponding probe area (S1,i+N).
E4. Method according to any one of the explanatory examples E1 to E3, wherein said correspondence measurement is determined on the basis of a similarity measurement of one or more pixels of said at least one probe area (S1,i) and one or more pixels of said at least one corresponding probe area (S1,i+N), and/or a distance measurement between said at least one probe area (S1,i) and the visual content of said at least one corresponding probe area (S1,i+N).
E5. Method according to any one of the explanatory examples E1 to E4, wherein determining said at least one probe area (S1,i) and said at least one corresponding probe area (S1,i+N) comprises determining as said at least one probe area (S1,i) and said at least one corresponding probe area (S1,i+N) a portion of the first image and, respectively, a portion of the second image each classified by means of a neural network as a probe area.
E6. Method according to the explanatory example E5, wherein each of said portion of the first image and of said portion of the second image is classified as a probe area (S1,i, S1,i+N) when it corresponds to at least one portion of the reference image; preferably, said at least one portion of the reference image contains a substantially fixed (substantially immobile) element in a recording field.
E7. Method according to the explanatory example E5 or E6, wherein said neural network is trained to classify portions of an image on the basis of a dataset comprising reference image samples.
E8. Method according to any one of the preceding explanatory examples, wherein said at least one probe area (S1,i) of said first image and said at least one corresponding probe area (S1,i+N) of said second image represent portions of the respective images whose content does not vary; for example, the content does not change in the recording field, i.e. the content corresponds to one or more fixed elements in the recording field.
E9. Method according to one of the preceding explanatory examples, wherein determining said at least one probe area (S1,i) and said at least one corresponding probe area (S1,i+N) comprises determining said at least one probe area (S1,i) and said at least one corresponding probe area (S1,i+N) by means of image processing techniques adapted to determine that the content of such areas (S1,i, S1,i+N) is substantially not subject to variations as the image changes.
E10. Method according to one of the preceding explanatory examples, wherein determining the positioning indication comprises determining the positioning indication at an image zone opposite to the shift direction.
E11. Method according to one of the preceding explanatory examples, wherein images comprised in said video stream, said images comprising said first image and said second image, comprise a first sector and a second sector, the first sector being associated with the shift direction and the second sector being opposite to the first sector along the shift direction, wherein determining the positioning indication comprises determining the positioning indication at the second sector.
E12. Computer program comprising instructions set up to perform, when said program is run on a computer, all the steps according to any one of the explanatory examples of method E1 to E11.
E13. Entity for determining a positioning indication for positioning a graphic element indicating a position at which a graphic element is to be overlaid on a video stream comprising a first image and a second image, the entity comprising:
- a first processing unit (22) configured to determine at least one probe area (S1,i) for the first image of said video stream, and at least one corresponding probe area (S1,i+N) for the second image of said video stream;
- a second processing unit (24) configured to determine a shift direction indicating a direction according to which the at least one corresponding probe area of said second image has shifted with respect to the at least one probe area of said first image;
- a third processing unit (26) configured to determine the positioning indication on the basis of said shift direction.
E14. Entity according to the explanatory example E13, wherein at least one of said probe area (S1,i) and said at least one corresponding probe area (S1,i+N) comprises an image or a portion of a reference image containing an element substantially fixed in the recording field (substantially immobile in the recording field).
E15. Entity according to the explanatory example E13 or E14, wherein said at least one probe area (S1,i) and said at least one corresponding probe area (S1,i+N) are associated on the basis of a correspondence indication indicating a measurement of correspondence between a visual content of said at least one probe area (S1,i) and a visual content of said at least one corresponding probe area (S1,i+N).
E16. Entity according to any one of the explanatory examples E13 to E15, wherein said first processing unit (22) is configured to determine as said at least one probe area (S1,i) and said at least one corresponding probe area (S1,i+N) a portion of the first image and, respectively, a portion of the second image each classified by means of a neural network as a probe area.
E17. Entity according to any one of the explanatory examples E13 to E16, wherein said correspondence measurement is determined on the basis of a similarity measurement of one or more pixels of said at least one probe area (S1,i) and one or more pixels of said at least one corresponding probe area (S1,i+N), and/or a distance measurement between said at least one probe area (S1,i) and the visual content of said at least one corresponding probe area (S1,i+N), and/or a similarity measurement with at least one portion of a reference element.
E18. Entity according to any one of the explanatory examples E15 to E17, wherein each of said portion of the first image and of said portion of the second image is classified as a probe area (S1,i, S1,i+N) when it corresponds to at least one portion of the reference image.
E19. Entity according to one of the explanatory examples from E16 to E18, wherein said neural network is trained to classify portions of an image on the basis of a dataset comprising reference image samples.
E20. Entity according to any of the explanatory examples E13 to E19, wherein said at least one probe area (S1,i) of said first image and said at least one corresponding probe area (S1,i+N) of said second image represent portions of the respective images whose content does not vary.
E21. Entity according to any one of the explanatory examples E13 to E20, wherein a first processing unit (22) is configured to determine said at least one probe area (S1,i) and said at least one corresponding probe area (S1,i+N) by means of image processing techniques adapted to determine that the content of such areas (S1,i, S1,i+N) is substantially not subject to variations as the image changes.
E22. Entity according to any explanatory example from E13 to E21, wherein said third processing unit (26) is configured to determine the positioning indication at an image zone opposite to the shift direction.
E23. Entity according to one of the explanatory examples E13 to E22, wherein images comprised in said video stream, said images comprising said first image and said second image, comprise a first sector and a second sector, the first sector being associated with the shift direction and the second sector being opposite to the first sector along the shift direction, wherein said third processing unit (26) is configured to determine the positioning indication at the second sector.
E24. System comprising an entity according to any one of the explanatory examples E13 to E23, and a user device configured to display a video stream with said graphic element overlaid.
LIST OF FIGURES
Figure 1 is a flowchart representing a method according to an embodiment of the present invention; figures 2(a) and 2(b) schematically illustrate a first image and, respectively, a second image; figure 2(c) schematically illustrates an indication of relative shift between two probe areas; figure 2(d) schematically illustrates a positioning indication of the graphic element in response to the detection of a shift indication; figure 3 illustrates a block diagram of an entity according to an embodiment of the present invention; figure 4 illustrates a block diagram of a computer adapted to run a program according to an embodiment of the present invention; figure 5 schematically illustrates probe areas within an image; figures 6(a) and 6(b) schematically illustrate examples of indication of shift to the right and, respectively, to the left; figures 7, 8(a) and 8(b) illustrate screenshots of examples corresponding to figures 5, 6(a) and 6(b), respectively.
DETAILED DESCRIPTION
As mentioned, graphic elements, possibly comprising textual parts, are overlaid on the images of the video (of a television channel, of a streaming, etc.): think for example of the case of overlaid titles (banners) in the lower part of the screen during the transmission of a news program, of the overlaid banners at sporting events in which, for example, the statistics of the event in progress are reported, or of other banners comprising logos, optionally together with parts of text. Such a graphic element, completely or at least partially overlaid on the video, hides (or, if it is transparent or partially transparent, partially hides) the part of the images that it overlaps; this hinders the perception of the images, especially if the dimensions of the graphics are not negligible or if the graphics are not located in a position at the edge of the image. The graphic element can sometimes be positioned in a position on the screen that is considered not cumbersome, such as the left side; upon a movement of the scene, however, this position could become cumbersome and therefore no longer ideal, such that the scene is occluded and the enjoyment of the video is made difficult.
Having recognized this problem, the inventors have devised a system for automatically positioning a graphic element on the screen without hindering, or hindering only minimally, the viewing and enjoyment of the video. In general, this solution is based on determining a shift direction of the image (for example as a consequence of the shift of the scene and/or of the framing), and positioning the graphic element in a different direction, preferably opposite, with respect to this direction. In determining the shift direction of the image, one or more reference elements are taken into consideration, which are substantially fixed (substantially immobile) elements in the recording field, i.e. elements (for example, and in particular, static or substantially immobile objects such as, as explained further on, the lines of the field, the goals, the billboards, the stands) in the real world that are essentially immobile in the recording field, at least over a period of time during which a certain scene is recorded. Therefore, an element (or at least one portion thereof) can be in different positions between two different frames as a consequence of the movement of the camera or of the scene; however, tracking the shift of at least one portion of a reference element makes it possible to accurately determine the shift direction of the image.
With reference to figure 1, a first embodiment will now be illustrated related to a method for determining a positioning indication for positioning a graphic element, where in particular the positioning indication indicates a position at which the graphic element is to be overlaid to a video stream. The video stream comprises at least a first image and a second image, which for the illustration of the present method are preferably, but not necessarily, consecutive (if not consecutive, they will preferably be close together and separated by no more than a number N of images, for example N not exceeding 25, or a number corresponding to a time interval not exceeding a few seconds, preferably not exceeding 2 seconds, more preferably 1 second, still more preferably 0.5 seconds).
In step S10, a probe area is determined (probe area, or simply probe) for the first image, together with a probe area for the second image corresponding to the probe area of the first image. For simplicity's sake, reference is made to only one probe area per image, but the method can equally be applied to the case of two or more probe areas per image, wherein a probe area of the first image corresponds to a respective probe area of the second image (that is, there are pairs of corresponding probe areas, each member of the pair belonging respectively to the first and to the second image). By probe area it is meant a portion of the image, in which the term probe refers to the fact that this area or portion of the image is taken into account in the process for positioning the graphic element. In other words, the probe area (or sample area) represents an area or portion inside an image for which and/or on which certain processing is carried out in order to position the graphic element. It is noted that the probe area of the second image corresponds to the probe area of the first image; as explained below, the correspondence indicates that the visual content of each such probe area remains substantially the same as the image changes. It can be said that the probe area of the second image substantially represents the same visual content as the probe area of the first image, taking into account possible variations in brightness and/or geometry due to the fact that, between the first and the second image, a variation in the framing (for example, a movement of the camera without the scene having changed), a variation in the scene (for example, with a fixed camera, movements within the scene) or a variation of both might have occurred. Each probe area therefore comprises a respective number of pixels, which can be the same or different for the corresponding probe areas of two images, and which represents a visual content that does not vary substantially (e.g. in intensity and/or geometry) in the passage from the first to the second image. Each probe area preferably comprises at least one portion of the same reference element, where by reference element it is meant an element (e.g. a static or substantially immobile object, or in general any element of the scene that is static or substantially immobile at least for a certain duration of time) present in the recording field and fixed in the recording field.
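Purely by way of illustration (this sketch is not part of the patented method), a probe area can be modelled in code as a rectangular portion of an image. The ProbeArea class below, its (x, y, w, h) fields and the assumption that images are numpy arrays in (height, width, channels) layout are choices of this sketch, not of the patent; later sketches in this description reuse it.

```python
# Illustrative sketch: a minimal probe-area record. All names and the
# (x, y, w, h) convention are assumptions made for these examples.
from dataclasses import dataclass

import numpy as np


@dataclass
class ProbeArea:
    x: int  # left edge, in pixels
    y: int  # top edge, in pixels
    w: int  # width, in pixels
    h: int  # height, in pixels

    def pixels(self, image: np.ndarray) -> np.ndarray:
        """Visual content of this probe area within `image` (H, W, C array)."""
        return image[self.y:self.y + self.h, self.x:self.x + self.w]

    def centre(self) -> tuple[float, float]:
        """Centre of gravity of the area, used later to measure shifts."""
        return (self.x + self.w / 2.0, self.y + self.h / 2.0)
```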
In step S20, a shift direction is determined which indicates a direction according to which the probe area of the second image has shifted with respect to the corresponding probe area of the first image (the shift direction is relative between the two probe areas). Preferably, the shift direction is relative to the "frame" that encloses one of the two images, and/or to the screen (or portion of the screen) in which the images are reproduced, and/or to a reference system that is common to both images (i.e. common to the first image and to the second image). As recognized by the inventors, the shift direction of at least one probe area indicates in which direction the framing and/or the video scene has shifted. The choice of probe areas comprising at least one portion of a reference element facilitates this determination. Preferably, the shift direction can be represented by a shift vector; however, this is not necessary, since it is sufficient to obtain only the direction of the shift and not necessarily also the amount of the shift. The shift direction can be expressed as right, left, up, down or a combination thereof, or in any other manner (e.g. East, West, North, South or a combination thereof; the angle of the shift vector, wherein the angle is measured relative to a reference system; etc.).
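As a non-authoritative sketch of step S20, the helper below derives a shift direction from the centres of gravity of a pair of corresponding probe areas (reusing the hypothetical ProbeArea record above); the pixel threshold and the right/left/up/down vocabulary are illustrative assumptions.

```python
# Illustrative sketch of step S20: direction of the relative shift between
# two corresponding probe areas, expressed as right/left/up/down labels.
def shift_direction(area_first: ProbeArea, area_second: ProbeArea,
                    min_shift: float = 1.0) -> str:
    x1, y1 = area_first.centre()
    x2, y2 = area_second.centre()
    dx, dy = x2 - x1, y2 - y1  # shift vector; only its direction is needed
    parts = []
    if dx > min_shift:
        parts.append("right")
    elif dx < -min_shift:
        parts.append("left")
    if dy > min_shift:
        parts.append("down")   # image rows grow downwards
    elif dy < -min_shift:
        parts.append("up")
    return "+".join(parts) if parts else "none"
```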
In step S30, the indication for positioning the graphic element is determined on the basis of the shift direction. By positioning indication it is meant an indication of the position in which the graphic element is overlaid on the video stream; this overlay can be made in the image immediately following the second image or in an image a number M of images after the second image, depending on the computing power available to reposition the graphic element (hypothetically, as permitted by computing resources and/or if there is some latency in the video, the graphic element can already be (re)positioned in the second image on the basis of the positioning indication; hence M>=0). Preferably, the positioning indication is different from, and more preferably opposite to, the shift indication; for example, if the shift indication indicates left (thus indicating that the framing has shifted to the left and/or that the scene has shifted to the left), then the positioning indication indicates right, to specify that the graphic element should be positioned in a right part of the screen and/or in a position to the right with respect to the position of the graphic element in a previous image. In this way, it is possible to ensure that the graphic element does not obstruct the view, or at least to reduce the occlusion of the video, thus not hindering viewing. Preferably, the positioning indication indicates a position of the graphic element (to be overlaid) relative to the position of the graphic element in a previous image; in one example, the positioning indication includes the coordinates with which to overlay it on the video stream. It should be noted that the positioning indication can be expressed by means of coordinates (in a reference system of the image and/or of the screen on which it is to be reproduced) indicating where to position the graphics. The exact location is not necessary, however, as the indication can simply provide a direction (or side, or part, or sector of the screen) in which to position the graphics relative to the shift direction.
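A possible coding of step S30, under the preferred "opposite direction" rule described above, could then be the following; the string-based encoding of directions is an assumption carried over from the previous sketch.

```python
# Illustrative sketch of step S30: map the shift direction to a positioning
# indication on the opposite side, as preferred in the description.
OPPOSITE = {"left": "right", "right": "left", "up": "down", "down": "up"}


def positioning_indication(shift_dir: str) -> str:
    if shift_dir == "none":
        return "keep"  # no repositioning needed
    # e.g. "right+down" -> "left+up": place the graphics away from the motion
    return "+".join(OPPOSITE[p] for p in shift_dir.split("+"))
```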
The operation of what was conceived by the inventors can be exemplified with reference to figure 2, wherein a first image Ii is represented in figure 2(a) and a second image Ii+N in figure 2(b), wherein N>=1, and wherein N is preferably a small number to indicate that the two images are sufficiently close together to represent a change in the scene and/or framing within the continuous stream of the video to which they belong. In the image Ii there is a graphic element G positioned on the left, for example because that is considered, at that moment, a position that does not hinder, or hinders as little as possible, the enjoyment of the video (this position can be determined for example on the basis of image and/or video analysis techniques, or manually). In the image Ii, a probe area Si,i is selected (in the case of a plurality of probe areas 1...k...Ns, each will be named Sk,i for image i); in the figure it is drawn on the right side of the screen only for simplicity of illustration. The probe area is such that its characteristics (e.g. brightness and/or geometry of the content comprised therein) remain substantially stable over time, i.e. remain stable in the passage from a first image to a second image following the first one. For example, the characteristics remain stable because they refer to at least one portion of a reference element that is fixed in the recording field. In figure 2(b), the probe area Si,i+N has shifted to the right, for example because the framing and/or scene has shifted correspondingly, by an amount indicated by the shift indication d as shown in figure 2(c). From the shift indication d, a positioning indication p is determined as illustrated in figure 2(d); in the example, the positioning indication p is in the opposite direction to the shift indication d. The graphic is then shifted, in this example to the right as shown in figure 2(b), in the direction indicated by p, as determined starting from d. In this way, since the left part of the image Ii was considered the part of the video not to be occluded, it is possible to automatically shift the graphic element so that the corresponding part of the image Ii+N not to be occluded is preserved, thereby making the video easier to enjoy; it is significant that this is achieved not only automatically, but also with minimal computing resources, since the processing is concentrated on the probe area Si,i. Figure 2(a) shows a graphic element G; however, this may not be overlaid (yet), in which case the area G represents a portion of the screen where graphics could be positioned without disturbing viewing. In figure 2(b) the graphic element is indicated with G', to underline that it does not necessarily have to be identical to the graphic element G: for example, its textual (if any) and/or graphic content could be different from that of the first image in figure 2(a); the invention is however independent of the possible content or meaning of the graphic element, since the invention is in fact aimed at how and/or where to position said graphic element so as not to hinder viewing of the video.
Preferably, at least one of the probe area (Si,i) and the at least one corresponding probe area (Si,i+N) comprises a reference image, or a portion thereof, containing an element substantially fixed in the recording field (substantially immobile in the recording field). The recording field is that zone (in the real world) that is recorded by one or more cameras. The scene is a sequence of images in which there is typically continuity of space and/or time relative to the recording field as recorded by means of one or more cameras; therefore, within the recording field there are elements (objects, people, etc.) that move (for example characters of a film or a TV show, footballers, etc.) and elements that do not move (e.g. furniture of the television studio or set design; lines, advertising billboards and goal in the case of a football pitch, etc.). The elements that do not move, or at least not substantially, represent the reference elements. In the scene, i.e. in the sequence of recorded images, even the elements fixed in the recording field can move as a consequence of the shift of the camera; however, since the reference images (or reference portions) contain a fixed element, it is possible to determine the shift of the scene, i.e. of the camera, in an accurate and simple way. The above is also valid in the case of virtual worlds (for example scenes wholly or partially based on computer graphics, or scenes from animation programs), in which a reference element is a substantially fixed element in the virtual scene.
Preferably, the probe area of the first image and the corresponding probe area of the second image are associated with each other on the basis of a correspondence indication which indicates a measurement of correspondence between a visual content of the probe area of the first image and a visual content of the corresponding probe area of the second image. By visual content it is meant the portion of the image present in the respective probe area; the correspondence indication therefore indicates how visually similar the two probe areas are. Preferably, the indication is a measurement of correspondence between the image produced by the pixels of the probe in the first image and the image produced by the pixels of the probe in the second image. In other words, the portion of the image represented by the probe area of the first image substantially corresponds to the portion of the image represented by the probe area of the second image.
Preferably, the measurement of correspondence between the two probe areas is determined on the basis of a) a similarity measurement between one or more pixels (representing the respective visual content) of said at least one probe area (Si,i) and one or more pixels (representing the respective visual content) of said at least one corresponding probe area (Si,i+N), and/or b) a distance measurement between said at least one probe area (Si,i) and said at least one corresponding probe area (Si,i+N), and/or c) a similarity measurement with at least one portion of a reference element.
The similarity therefore expresses how similar the colour values (for example the values of the colour model of the pixel, such as RGB or CMYK, preferably also taking the luminance into account) of some or all of the pixels of the respective areas are. For example, similarity is established when the difference between colour values is below a certain threshold; in particular, again by way of example, the similarity can be indicated by the difference (or the modulus thereof) between the RGB (or CMYK, depending on the colour model used to describe a pixel) values of a point representative of the respective probe areas, or by the average of these differences calculated over several pixels or over all the pixels of the respective probe areas, etc. The similarity measurement with at least one portion of a reference element (as in point (c) above) can be obtained, as also explained further on, for example through a neural network trained to recognize at least one portion of one or more reference elements, or through image recognition techniques aimed at recognizing at least one portion of one or more reference elements (for example, techniques aimed at recognizing that both the first and the second probe area substantially contain at least one portion, preferably the same portion, of the same reference element).
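By way of example only, the average-of-differences criterion mentioned above could be sketched as follows; the threshold value of 12 (on a 0-255 scale) is an arbitrary assumption to be calibrated empirically.

```python
# Illustrative sketch: mean absolute difference between the colour values of
# two probe areas of equal pixel size, with a similarity threshold test.
import numpy as np


def mean_colour_difference(pixels_a: np.ndarray, pixels_b: np.ndarray) -> float:
    # Assumes both areas have the same dimensions (e.g. fixed-size patches).
    return float(np.mean(np.abs(pixels_a.astype(np.float32)
                                - pixels_b.astype(np.float32))))


def are_similar(pixels_a: np.ndarray, pixels_b: np.ndarray,
                threshold: float = 12.0) -> bool:
    return mean_colour_difference(pixels_a, pixels_b) < threshold
```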
Preferably, the correspondence indication can be obtained by means of an algorithm or method which associates pairs of elements (the probe areas) belonging to two different sets (each set comprising a certain number of possible probe areas for a given image). An example of such an algorithm is the Hungarian method or algorithm (see for example Assignment Problems: Revised Reprint, Rainer Burkard et al., 2012, ISBN 978-1-61197-222-1).
According to an example of application of the above (which will be called the case of decomposition of the image into a series of probe areas), it is conceivable to define an area (for example an image portion) having a predetermined size and to cut out from the whole image as many such areas as possible, even partially overlapping. By repeating the same decomposition into portions for the first and the second image, two respective sets are obtained, each containing a certain number of areas (the probe areas); the algorithm (for example the Hungarian one) associates, among all the possible combinations of areas of the two sets, pairs of areas that are similar, using known similarity criteria and/or criteria such as those illustrated below by way of example. In this example, once one or more pairs of probe areas having a degree of correspondence above a certain threshold have been extracted (in short, one or more pairs with a high degree of visual similarity), it is determined in which direction and, preferably, by how much one of these probe areas has shifted between the two images; this result is the shift indication.
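A minimal sketch of the decomposition step, assuming fixed-size, partially overlapping square areas and the hypothetical ProbeArea record above (the 64-pixel size and 32-pixel stride are illustrative):

```python
# Illustrative sketch: cut an image into partially overlapping candidate
# probe areas of a fixed size; applied to both images, it yields the two
# sets of areas that the assignment algorithm then pairs up.
def decompose(image, size: int = 64, stride: int = 32) -> list[ProbeArea]:
    h, w = image.shape[:2]
    return [ProbeArea(x, y, size, size)
            for y in range(0, h - size + 1, stride)
            for x in range(0, w - size + 1, stride)]
```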
Recognizing that the image-decomposition case above may not be very accurate and/or computationally efficient, the inventors also propose the case of using a neural network (as an alternative to, or in combination with, the above), which will now be explained. A neural network is trained, through an appropriate dataset (a collection of data), to classify a portion of an image of a video stream as a probe area, i.e. as a portion of the image that is expected to undergo limited variations in successive images of the video stream (which will also be called low propensity or low tendency to variation); thanks to these limited variations, the probe area represents a reference point in the scene depicted in the image. These variations can be quantified by referring to parameters such as brightness, colour, geometry (points, lines and/or geometric shapes, including perspective effects), etc., or any combination thereof. This dataset may contain reference images (or portions of images) characterized by a low propensity for variation between two images of a video stream. Examples of such reference images (or portions) include those characterized by one or more lines and/or geometric shapes whose visual representation does not substantially vary with the variation of the recording distance and/or angle. In particular, the reference images (or portions thereof) contain at least one reference element (or a portion of a reference element) which is fixed in the recording field, as also explained above. For example, reference elements, or portions thereof, are represented by: in the case of a sporting event, lines and/or geometric shapes on the pitch (penalty area in football, semicircle on the basketball court, intersection of background lines, etc.), and lines and/or geometric shapes representing frames and structures (e.g. the football goal, and/or the net on the tennis court, and/or advertising billboards on the sidelines, etc.); in the case of a musical event, instruments and/or structures such as the stage, etc.; in general, in the case of television programs including films, known objects such as cars, buildings, etc. The dataset can be enriched, for example, on the basis of new videos and/or of images or portions of images contained therein, or it can be created for a given video (e.g. television program) to be broadcast. The neural network trained with such a dataset is therefore able to classify, for an image provided as input, whether and which portions of that image can be considered as probe areas (for example, determining whether each of them contains at least one portion of a reference element). Therefore, by providing an image Ii of a video stream as input, it is possible to obtain one or more probe areas for that image Ii; the same process can be repeated with a following image Ii+N to identify one or more probe areas comprised in that following image Ii+N. Based on the above, it can therefore be said that determining the at least one probe area and the at least one corresponding probe area (step S10) comprises determining, as one of said at least one probe area and said at least one corresponding probe area, a portion of the first image and, respectively, a portion of the second image, each classified by means of a neural network as a probe area.
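The patent does not prescribe a network architecture; as a hedged illustration, a small binary patch classifier could look like the PyTorch sketch below, where the 64x64 RGB patch size, the layer sizes and the training procedure (omitted) are all assumptions.

```python
# Illustrative sketch: a small CNN that scores a fixed-size patch with the
# probability of being a probe area (i.e. low propensity to variation).
import torch
import torch.nn as nn


class ProbeAreaClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 1),  # 64x64 input -> 16x16 feature maps
            nn.Sigmoid(),
        )

    def forward(self, patch: torch.Tensor) -> torch.Tensor:
        # patch: (batch, 3, 64, 64) -> probability of being a probe area
        return self.head(self.features(patch))
```

Candidate portions of an image (for example those produced by the decompose sketch above) would be scored by such a classifier and kept as probe areas when the score exceeds a threshold, the network having been trained on a dataset of reference patches such as the one described above.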
In other words, determining (step S10) said at least one probe area and said at least one corresponding probe area comprises classifying as probe areas, by means of a neural network, a portion of the first image and, respectively, a portion of the second image, and determining the portions thus classified as, respectively, said at least one probe area and said at least one corresponding probe area. The neural network therefore makes it possible to obtain (as a result of the classification) at least one probe area for each of the two images. As a result of the classification, a probe area thus classified contains at least one portion of the reference element (in turn contained in the reference image). Preferably, one or more reference images contain one or more geometric shapes, preferably rectilinear (e.g., in the case of a football event, midfield lines, lines delimiting the penalty area, lines corresponding to the posts and/or crossbar of the football goal, etc.; in the case of a film, demarcation lines on the street, lines of furniture components, lines delimiting a building, etc.). In the presence of such geometric shapes, fewer processing resources are advantageously required, since it is easier to process this type of pattern in the images.
Once at least one probe area has been identified for each image Ii and Ii+N, it is possible to determine a correspondence between a probe area of the image Ii and a probe area of the image Ii+N, for example through an algorithm that associates pairs of elements (the probe areas) belonging to two different sets; in this context, the Hungarian algorithm can be applied.
Preferably, a variant of the Hungarian algorithm can be applied, which by way of illustration can be explained as follows:
- A matrix is defined to measure the correspondence between two probe areas belonging to the two images. In particular, the matrix can consider two criteria: similarity and/or distance between the centres of gravity;
- The optimal correspondence should preferably maximize the similarity and/or minimize the distance between the two probe areas;
- The algorithm proceeds with the assignment, among the multiple probe areas detected in each of the two images, on the basis of the similarity and/or of the centres of gravity, preferably independently. Preferably, an assignment is considered valid only if the same pair is obtained in both assignments (the similarity-based one and the distance-based one).
The above makes it possible to overcome some problems that could occur in the case under examination of probe areas belonging to two distinct images. For example, probe areas may appear very similar to each other and yet be distant, for example because one or more probe areas enter the scene while one or more exit it; for these probe areas there is no useful correspondence. The method proposed above therefore makes it possible to exclude such an association.
It should be noted that the optional variant of the Hungarian method described above can also be applied in combination with other methods, and that it can be applied both to the case of image decomposition and to the case of the use of the neural network. Furthermore, in the event that only one probe area is identified for each image (or for one of the two images), the Hungarian method and/or the variant thereof above can be used to evaluate the degree of correspondence between these probe areas, in order to determine whether they are suitable for determining the shift direction.
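As a sketch of the variant described above, under the assumptions of the earlier snippets, the two independent assignments can be computed with scipy's implementation of the Hungarian algorithm and then intersected; the cost definitions are illustrative choices, not values prescribed by the patent.

```python
# Illustrative sketch of the Hungarian-method variant: one assignment on a
# colour-difference cost (low cost = high similarity) and one on a centre-
# of-gravity distance cost; only pairs selected by both runs are kept.
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_probe_areas(areas_1, areas_2, img_1, img_2):
    n1, n2 = len(areas_1), len(areas_2)
    sim_cost = np.zeros((n1, n2))
    dist_cost = np.zeros((n1, n2))
    for i, a in enumerate(areas_1):
        for j, b in enumerate(areas_2):
            sim_cost[i, j] = mean_colour_difference(a.pixels(img_1),
                                                    b.pixels(img_2))
            ax, ay = a.centre()
            bx, by = b.centre()
            dist_cost[i, j] = np.hypot(bx - ax, by - ay)
    rows_s, cols_s = linear_sum_assignment(sim_cost)   # maximizes similarity
    rows_d, cols_d = linear_sum_assignment(dist_cost)  # minimizes distance
    by_similarity = set(zip(rows_s, cols_s))
    by_distance = set(zip(rows_d, cols_d))
    # An assignment is valid only if both runs agree on the pair, which
    # excludes look-alike areas that are far apart (see the discussion of
    # situations X and Y further on).
    return sorted(by_similarity & by_distance)
```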
The cases of image decomposition and of the use of a neural network were set out above. In another example, it is possible to determine the probe area of the first image and the corresponding probe area of the second image using an image processing technique adapted to identify, within each of the first and second images, portions of the image that correspond to a reference image (preferably, see above, containing at least one portion of a reference element which is substantially fixed in the recording field), in which the reference image is chosen so that it does not substantially vary as the framing and/or the scene changes. For example, the image processing technique recognizes lines and/or geometric shapes that are estimated to be similar to reference images present in a given scene (pitch lines, etc.; see also the examples above); the similarity can be determined mathematically according to the technique used. Alternatively or in combination, the image processing technique recognizes as a probe area a portion of an image having characteristics that are not subject to variations as the image changes; for example, the recognition technique determines the presence within the image of at least one reference element (not prone to variations, see also above), and classifies an area that comprises it as a probe area.
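As one concrete (and purely illustrative) instance of such an image processing technique, classical edge and line detection can be used to propose candidate probe areas around detected line segments; the OpenCV parameters below are assumptions, not values from the patent.

```python
# Illustrative sketch: propose candidate probe areas around detected line
# segments (e.g. pitch markings), using Canny edges + probabilistic Hough.
import cv2
import numpy as np


def propose_probe_areas_from_lines(image: np.ndarray,
                                   size: int = 64) -> list[ProbeArea]:
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                            minLineLength=40, maxLineGap=5)
    proposals = []
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            # Centre a candidate area on each segment endpoint (endpoints of
            # pitch lines often coincide with intersections and goal posts).
            for px, py in ((x1, y1), (x2, y2)):
                proposals.append(ProbeArea(max(0, px - size // 2),
                                           max(0, py - size // 2),
                                           size, size))
    return proposals
```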
Preferably, the probe area of the first image and the corresponding probe area of the second image represent portions of the respective images whose content does not vary (for example, because they refer to at least one portion of a reference element which is fixed in the recording field). By content it is meant the visual content represented by the pixels comprised in the probe area; in particular, each probe area represents a portion of the respective image which remains (visually) stable or substantially unchanged with the succession of the frames. An example is a set of pixels that can be associated with geometric or regular parts, and which in general remains substantially unchanged (apart from variations due to a change of perspective, etc.) without disappearing or being covered in the following frame. Furthermore, and preferably, the number of pixels of an invariant element of the probe area (and therefore its size) is such as to be above a certain threshold; in other words, very small elements of the image, even if potentially invariant, are not considered as probe areas because their small size may not guarantee high recognition accuracy. This threshold can be determined empirically, for example in a test and/or calibration step of the method described herein. For example, areas smaller than AxB pixels (e.g. 4x4) are not considered as probe areas.
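The size filter mentioned above could be sketched as follows (the 4x4 floor mirrors the example in the text; real thresholds would be calibrated empirically):

```python
# Illustrative sketch: discard candidate areas too small to be tracked
# reliably, per the AxB-pixel threshold discussed above.
def large_enough(area: ProbeArea, min_w: int = 4, min_h: int = 4) -> bool:
    return area.w >= min_w and area.h >= min_h
```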
Preferably, in the previously described method, the positioning indication is determined at an image zone different from, and more preferably opposite to, the shift direction. In the example of figure 2, if the shift direction indicates left (i.e. that the framing and/or the scene have shifted correspondingly), the positioning indication indicates that the graphic element is to be positioned in a zone of the screen preferably opposite to the shift direction, therefore (in the example in the figure) in the right zone of the screen. Preferably, rules can be set that define how and/or where to position the graphic element on the basis of the shift indication. One of these rules indicates that the graphic element is to be positioned in a zone opposite to the shift direction.
Preferably, according to another rule associating the shift indication with the positioning indication, it is possible to define, for each of the first image and the second image, a first sector and a second sector (a sector being a part of the image, for example arranged on one of the sides and/or representing a perimeter section of the surface represented by each of these images), these sectors being obtained for example by dividing the image into two parts (not necessarily equal, but for example such that at least one of these parts is large enough to contain at least part of the graphics). The first sector can be associated with the shift direction (i.e. the part of the screen towards which the framing and/or scene is moving), and the second sector is the one opposite to the first sector along the shift direction; the positioning indication is then determined at the second sector. The sectors can be dynamically determined according to the shift direction (for example, if the shift is from top to bottom, then the sectors will be obtained by dividing the screen with respect to a horizontal line). Considering the image as a rectangle, for example, the first and second sectors correspond to respective areas near the opposite geometric sides (right and left) of the rectangle. The sector can also be indicated by a side (see for example figure 2: the graphics shift to the opposite side with respect to the movement of the scene); however, the side in this example is not strictly a geometric side of the image, but a portion (e.g. a quadrant, a sector) of the image. The graphic element will then be positioned, for example, within such a sector, or at it.
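A hedged sketch of this sector rule follows; returning a single anchor point inside the opposite sector, and the 20-pixel margin, are simplifying assumptions of the example.

```python
# Illustrative sketch of the sector rule: anchor the graphics inside the
# sector opposite to the motion, along the axis (or axes) of the shift.
def position_in_opposite_sector(frame_w: int, frame_h: int,
                                shift_dir: str, margin: int = 20):
    x, y = frame_w // 2, frame_h - margin  # default: bottom centre
    if "right" in shift_dir:               # motion to the right -> left sector
        x = margin
    elif "left" in shift_dir:              # motion to the left -> right sector
        x = frame_w - margin
    if "down" in shift_dir:                # downward motion -> upper sector
        y = margin
    elif "up" in shift_dir:
        y = frame_h - margin
    return (x, y)                          # where to anchor the graphic element
```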
Above, the terms "optionally" and/or "preferably" have been used to indicate optional variants of the method of the first embodiment. It is possible to combine two or more of these variants in any way.
As illustrated above, the first embodiment is directed to a method. All of the above considerations and/or variants apply to devices and/or entities, such as an entity to determine a positioning indication for positioning a graphic element, as well as to systems comprising the entity for determining a positioning indication for positioning a graphic element, computer programs, computer program support, signals and other examples and embodiments as also illustrated hereinafter. Where different details are omitted for the sake of brevity, all the above remarks apply equally and/or correspondingly to what follows and vice versa.
With reference to figure 3, a second embodiment will now be illustrated, related to an entity (20) for determining a positioning indication for positioning a graphic element, the positioning indication indicating a position at which a graphic element is to be overlaid on a video stream. The video stream comprises a first image and a second image. The entity (20) comprises a first (22), a second (24) and a third (26) processing unit. Each of the first, second and third processing units can be realized through a hardware and/or software combination, and can be localized in one device or distributed over different devices; moreover, two or more of these units can be combined together. The first processing unit (22) is configured to determine at least one probe area (Si,i) for the first image of the video stream, and at least one corresponding probe area (Si,i+N) for the second image of the same video stream. The video stream comprising the first and second images can be provided as input to the device 20 (as indicated by "IN" in the figure); this video stream can be obtained from a camera, from a direction system, from a database (e.g. a content repository) in which television programs and/or advertisements are stored, etc. The first unit
22 is therefore able to determine the probe areas. In case a neural network is used, this neural network will preferably be comprised in the unit 22 (in one example, the unit 22 is a neural network). The second processing unit (24) is configured to determine a shift direction indicating a direction according to which the at least one corresponding probe area of the second image has shifted with respect to the at least one probe area of the first image. The third processing unit (26) is configured to determine the positioning indication on the basis of the shift direction. The positioning indication can be provided to another device (e.g. a server, or a device in the video stream distribution chain, etc.), which then takes care of positioning the graphic element on the video stream and of broadcasting this video stream, with the overlaid graphics encoded in the video stream, to one or more users (in the case of broadcasting or multicasting; similar considerations apply to video-on-demand streams, or to streams sent to a single user who requests them, since the invention is equally applicable to such cases). In another example, the positioning indication is sent to the user device (or to several user devices); the user device will then position the graphic element on the basis of this indication. This latter case refers, for example, to the case in which the graphic rendering can be performed directly by the user device, while the preceding case refers to the case in which a device remote from that of the user inserts the overlaid graphics into the broadcast video (i.e. the broadcast video, for example in MPEG format, comprises the graphics already overlaid). It is also conceivable to perform the graphic rendering on the server side, and to send the video, the graphic element and the positioning indication in a transport stream (i.e. within a connection to the user); the user device will then overlay graphics and video on the basis of the positioning information.
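Purely as illustrative glue, the three units 22, 24 and 26 could be chained as below, reusing the sketches given earlier; the class name, thresholds and control flow are assumptions of this example, not the patent's implementation.

```python
# Illustrative sketch: entity 20 as a pipeline over the earlier helpers.
import torch


class PositioningEntity:
    def __init__(self, classifier: ProbeAreaClassifier, threshold: float = 0.5):
        self.classifier = classifier  # unit 22: probe-area detection
        self.threshold = threshold

    def _is_probe(self, area: ProbeArea, image) -> bool:
        # Convert the patch from (H, W, C) uint8 to a (1, C, H, W) float batch.
        patch = torch.from_numpy(area.pixels(image)).permute(2, 0, 1).float()
        with torch.no_grad():
            score = self.classifier(patch.unsqueeze(0) / 255.0).item()
        return score > self.threshold

    def process(self, img_first, img_second) -> str:
        areas_1 = [a for a in decompose(img_first) if self._is_probe(a, img_first)]
        areas_2 = [a for a in decompose(img_second) if self._is_probe(a, img_second)]
        pairs = match_probe_areas(areas_1, areas_2, img_first, img_second)
        if not pairs:
            return "keep"  # no reliable reference element found
        i, j = pairs[0]
        d = shift_direction(areas_1[i], areas_2[j])  # unit 24
        return positioning_indication(d)             # unit 26
```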
Optionally, the entity 20 may comprise further units (and/or in combination with the units 22-26) configured to perform any one or any combination of the steps illustrated above, or to implement what is discussed below.
According to a further embodiment, a system is provided comprising an entity according to the second embodiment (for example as illustrated in figure 3), and at least one user terminal connectable to this entity via a communication network. This terminal (for example a TV, smartphone, computer, tablet, etc.) will then reproduce the video stream in which the graphic element is overlaid according to the invention and/or as set forth in the present description. Preferably, the entity is connectable or connected to a plurality of terminals to which the video stream is broadcast.
According to a further embodiment, a computer program is provided which is set up to perform, when said program is run on a computer, any combination of the steps according to any one of the methods and/or examples of the invention, and/or as set forth in this description.
Figure 4 illustrates a block diagram exemplifying a computer (500) capable of running the aforesaid program. In particular, the computer (500) comprises a memory (530) for storing the instructions of the program and/or the data necessary for its execution, a processor (520) for executing the instructions, and an input/output interface (510). Figure 4 is illustrative and non-limiting, because the computer can be realized either in a concentrated manner in one device or in a distributed manner over several interconnected devices. For this reason, the program can be run locally on a concentrated (local) device or on a distributed one.
According to a further embodiment, a support is provided for supporting a computer program set up to perform, when the program is run on a computer, a step or a combination of the steps according to the method described in the first embodiment. Examples of such a medium are a static and/or dynamic memory, a fixed disk, or any other medium such as a CD, DVD or Blu-ray. The medium also comprises means capable of carrying a signal constituting the instructions, including cable transmission means (Ethernet, optical cable, etc.) or wireless transmission means (cellular, satellite, digital terrestrial transmission, etc.).
With reference to figure 5, an example of the use case of a neural network applied to sporting events, in particular to a football match, will now be illustrated. The neural network is trained to recognize reference images or portions of reference images; portions of reference images are indicated in the figure with 510, 520, 530, 540, noting that these portions of images represent probe areas. In detail, the areas 510 and 520 represent intersections of the lines of the pitch; the area 530 represents the angle between the post and the crossbar of the goal; the area 540 instead represents an advertising billboard. As is evident, these portions of the image tend not to vary substantially as the scene changes, that is, they do not substantially vary between one image and the immediately following ones (or have a propensity or a tendency not to vary substantially). In fact, upon a movement of the camera, even if the perspective and the dimensions can change, their overall appearance does not change substantially; each of them can therefore be taken as a reference point to determine the shift direction of the framing/scene, and can therefore be considered as a probe area, since the shift direction will be calculated on the basis of this area. This characteristic is due to the fact that the image portions 510, 520, 530 and 540 are substantially fixed in the recording field of the scene, and therefore represent reference elements (or portions of reference elements) fixed in the recording field. In another example, it is possible to consider the contours of the penalty area, or the sidelines, the midfield line, etc. The neural network is preferably trained with a dataset comprising a series of reference images, in the present case typical of a football pitch or of what is usually present when a football match is broadcast. Once trained, the neural network is able to detect or recognize each of the areas 510-540. A first image Ii and a second image Ii+N, in which the second preferably immediately follows the first (or is separated from it by a small number of images), will be input to the neural network, which will detect the respective probe areas for each of these images. Afterwards, the corresponding probe areas will be determined, i.e. for each probe area of the first image a corresponding probe area of the other image will be determined (or, vice versa, for each probe area of the second image a corresponding probe area of the first image is determined; by corresponding it is indicated that the two probe areas correspond to each other, regardless of how this correspondence is determined and regardless of the temporal order of the images). For this purpose, the Hungarian algorithm and/or a variant thereof can be used, as explained above. Once at least one pair of probe area and corresponding probe area has been determined, a shift direction is determined by calculating the relative shift of a probe area with respect to the corresponding probe area, preferably relative to the screen. For example, a centre of gravity is calculated for each probe area, and the shift direction is calculated with reference to the respective centres of gravity. In another example, a reference point is defined for the probe area, and the shift direction is calculated from that reference point. In the case of several pairs of probe areas, it is possible to perform an average over the various probe areas, or it is possible to choose one pair as the most representative and calculate its shift, etc.
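For the multi-pair case mentioned at the end of the paragraph, a hedged sketch of the averaging option is given below (the choice of a plain mean is an assumption; a robust statistic could equally be used):

```python
# Illustrative sketch: average the centre-of-gravity shift over the matched
# pairs before reading off a left/right/up/down direction.
import numpy as np


def average_shift(pairs, areas_1, areas_2):
    if not pairs:
        return 0.0, 0.0
    deltas = [np.subtract(areas_2[j].centre(), areas_1[i].centre())
              for i, j in pairs]
    dx, dy = np.mean(deltas, axis=0)
    return float(dx), float(dy)  # feed into the same thresholds as above
```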
Figure 6(a) illustrates the case in which a shift direction to the right is determined, for example because the framing has shifted to the right with respect to a previous image. In this case, the graphic element is positioned to the left in response to determining that the shift is to the right. Similarly, in figure 6(b), a shift direction to the left is determined; as a consequence of, or in response to, this shift direction, the graphic element is overlaid on the opposite, right side of the screen. Therefore, it is possible to ensure that the viewing of the video is not hindered, or is hindered only to a minor extent. The same advantage can be found with other types of television programs, since - once the position of a graphic element has been decided for a certain image Ii so that it does not substantially hinder viewing - it is possible to make sure that the graphics are adaptively positioned also in successive images while preserving the same characteristic of not hindering viewing.
Figures 7, 8(a) and 8(b) are screenshots that represent examples of what is shown with reference to figures 5, 6(a) and 6(b), respectively.
It is noted that a probe area present in a first image may not be present in a second image following the first one, for example because it has disappeared from the framing: think of the case of a football goal that disappears because the framing is following the action moving away from that goal (situation X). In this case, a corresponding area in the other image will not be identified. Furthermore, the intersection between post and crossbar in the first image may resemble the intersection of two pitch lines in the second image (situation Y). The area association algorithm can advantageously resolve uncertainties or ambiguities such as those in situations X and Y. For example, with reference to the variant of the Hungarian method described above, it is possible to weight the centres of gravity and/or the distance between two probe areas of two different images in the association matrix. In this way, two areas which resemble each other but are very distant can be determined as probably not corresponding to each other (situation Y), and therefore not associated; similarly, two areas having different and/or very distant centres of gravity may not be associated.
Many of the embodiments and examples have been explained with reference to steps of methods or processes. Nevertheless, what has been described can also be implemented in a program to be run on a processing entity (also distributed) or on an entity the means of which are configured to perform the corresponding method steps. The entities described above, as well as the components thereof (for example the units thereof) can be implemented in a single device, via hardware, software, or a combination of these, or on multiple interconnected units or devices (also hardware, software, or a combination thereof). In other words, each or some of the entities described above, as well as each or some of the components thereof (for example the units thereof) can be implemented locally or in a distributed manner. Naturally, the above description of embodiments and examples applying the principles recognized by the inventors is given only by way of example of these principles and must therefore not be construed as a limitation of the patent scope claimed herein.

Claims

1. Method for determining a positioning indication for positioning a graphic element, the positioning indication indicating a position at which a graphic element is to be overlaid on a video stream comprising a first image and a second image, the method comprising the steps of:
- determining (S10) at least one probe area (Si,i) for the first image (Ii) of said video stream, and at least one corresponding probe area (Si,i+N) for the second image (Ii+N) of said video stream;
- determining (S20) a shift direction indicating a direction according to which the at least one corresponding probe area (Si,i+N) of said second image (Ii+N) has shifted with respect to the at least one probe area (Si,i) of said first image (Ii);
- determining (S30) the positioning indication (p) on the basis of said shift direction (d).
2. The method according to claim 1, wherein said at least one probe area (Si,i) and said at least one corresponding probe area (Si,i+N) are associated on the basis of a correspondence indication indicating a measurement of correspondence between a visual content of said at least one probe area (Si,i) and a visual content of said at least one corresponding probe area (Si,i+N).
3. The method according to claim 1 or 2, wherein determining said at least one probe area (Si,i) and said at least one corresponding probe area (Si,i+N) comprises determining as said at least one probe area (Si,i) and said at least one corresponding probe area (Si,i+N) a portion of the first image and, respectively, a portion of the second image each classified by means of a neural network as a probe area.
4. The method according to claim 3, wherein each of said portion of the first image and of said portion of the second image is classified as a probe area (Si,i, Si,i+N) when it corresponds to at least one reference image portion.
5. The method according to claim 3 or 4, wherein said neural network is trained to classify portions of an image on the basis of a dataset comprising reference image samples.
6. The method according to any one of the preceding claims, wherein said at least one probe area (Si,i) of said first image and said at least one corresponding probe area (Si,i+N) of said second image represent portions of the respective images whose content does not vary.
7. The method according to one of the preceding claims, wherein determining said at least one probe area (Si,i) and said at least one corresponding probe area (Si,i+N) comprises determining said at least one probe area (Si,i) and said at least one corresponding probe area (Si,i+N) by means of image processing techniques adapted to determine that the content of such areas (Si,i, Si,i+N) is substantially not subject to variations as the image changes.
8. The method according to one of the preceding claims, wherein determining the positioning indication comprises determining the positioning indication at an image zone opposite to the shift direction.
9. A computer program comprising instructions set up to perform, when said program is run on a computer, all the steps according to any one of the method claims 1 to 8.
10. An entity for determining a positioning indication for positioning a graphic element, the positioning indication indicating a position at which a graphic element is to be overlaid on a video stream comprising a first image and a second image, the entity comprising:
- a first processing unit (22) configured to determine at least one probe area (Si,i) for the first image of said video stream, and at least one corresponding probe area (Si,i+N) for the second image of said video stream;
- a second processing unit (24) configured to determine a shift direction indicating a direction according to which the at least one corresponding probe area of said second image has shifted with respect to the at least one probe area of said first image;
- a third processing unit (26) configured to determine the positioning indication on the basis of said shift direction.