EP1053533A1 - Method and use for the spatial segmentation of images into visual objects - Google Patents

Method and use for the spatial segmentation of images into visual objects

Info

Publication number
EP1053533A1
Authority
EP
European Patent Office
Prior art keywords
regions
objects
segmentation
space
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP99901651A
Other languages
English (en)
French (fr)
Inventor
Pascal Faudemay
Gwenaël Durand
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Universite Pierre et Marie Curie Paris 6
Original Assignee
Universite Pierre et Marie Curie Paris 6
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Universite Pierre et Marie Curie Paris 6 filed Critical Universite Pierre et Marie Curie Paris 6
Publication of EP1053533A1 publication Critical patent/EP1053533A1/de
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/187Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows

Definitions

  • the invention relates to the field of analysis of the information contained in color images, in particular from multimedia documents, and especially videos. This analysis is intended more particularly, but not exclusively, to enable the indexing of audiovisual documents.
  • the problem consists in particular in making a temporal division of video sequences into scenes which constitute narrative units of these sequences, with a view to the storage and the selective visualization of certain scenes by the users.
  • Such a level of understanding is not directly accessible by known segmentation methods. Methods have been developed for segmenting grayscale images or for segmenting moving objects in the images of a video. But none of the color image segmentation algorithms provides sufficient results when the images to be analyzed are taken from "real life".
  • the main methods can be grouped into pixel type, edge type, and region type methods.
  • a region is defined as a connected component of a set of pixels specified by a membership function, possibly fuzzy, in a color class of the Color Space (abbreviated EdC, from the French "Espace des Couleurs").
  • These methods are mainly differentiated by the way they define the color classes and the membership functions of these classes.
  • the simplest of these methods is a fixed quantization of the EdC, as described in the articles by C. Carson, S. Belongie, et al., "Region Based Image Querying", Proc.
  • Another pixel-type method is histogram thresholding, in which the peaks and valleys appearing in one or more histograms corresponding to the different dimensions of the EdC are sought. The detected valleys are then used as boundaries between the color classes (as described for example in the article by R. Hayasaka, J. Zhao and Y. Matsushita, "Outstanding Object-Oriented Color Image Segmentation Using Fuzzy Logic", Proc. SPIE'97 Multimedia Storage and Archiving Systems II, Vol. 3229, 303-314, 1997).
  • the clustering methods of the EdC are multidimensional extensions of the previous thresholding techniques and apply classification algorithms such as nearest-neighbor search algorithms (see the article by F. Ferri and E. Vidal, "Colour Image Segmentation and Labeling Through Multiediting and Condensing", Pattern Recognition Letters, Vol. 13, No. 8, pp. 561-568, 1992), the K-means algorithm or the fuzzy C-means algorithm (see the article by Y. W. Lim and S. U. Lee, "On the Color Image Segmentation Algorithm Based on the Thresholding and the Fuzzy c-Means Techniques", Pattern Recognition, Vol. 23, No. 9, pp. 935-952, 1990). These algorithms make it possible to search for potential clusters of colors in the images. Finally, certain methods seek the EdC allowing an optimal representation of the images, using techniques such as principal component analysis or the Karhunen-Loève transform, such as that described in the article by S. E. Umbaugh et al.
  • the first drawback of these methods is that a robust search for peaks in the histograms or clusters in the EdC is not easy, in particular in the case of low-contrast images, and can be costly in computation time.
  • these methods implicitly consider that if two pixels, that is to say two elementary image points (whether or not the support is a video image), belong to the same region of the image, then their respective colors belong to the same color class or are close in the EdC. This holds only for clip-art or cartoon images, but generally not for real, complex images. When extracting regions, these methods must therefore analyze the immediate neighborhood of the pixels to determine to which region each pixel should be attached.
  • contours are detected and used to determine the limits of the regions.
  • edge extraction methods are not segmentation techniques by themselves and must be combined with at least one of the other methods.
  • contours obtained in the case of poorly contrasted or highly textured images are difficult to use since they are generally not closed.
  • a region is defined as a set of connected pixels satisfying a given homogeneity criterion, for example a zone in which a single color is present in 95% of the pixels of the region.
  • the so-called region-growing techniques are region-type methods in which a certain number of initial zones, used as growth seeds, are first sought. The neighboring pixels and regions are iteratively included in these initial zones until a stopping criterion is satisfied, for example when the number of regions obtained is less than a threshold.
  • a known example of this category of methods is the so-called "watershed" (topographic basins) algorithm, in which an image is considered as a topographic relief, where the altitude of each point can, for example, be proportional to its light intensity. The bottom of the deepest basins is pierced and the relief is immersed in water; the basins gradually fill up, delimiting the main regions. This method is very sensitive to noise and costly in computation time.
  • Recursive Shortest Spanning Trees (RSST).
  • RSST (see the article by O. J. Morris et al., "Graph theory for image analysis: an approach based on the shortest spanning tree", IEE Proceedings, Vol. 133, 146-152, 1986) considers each pixel as an initial region. The regions with the closest average colors are merged recursively, favoring the merging of small regions. Most of the above methods operate at the pixel scale, which makes them particularly sensitive to local variations in intensity and therefore to textures.
  • This sensitivity is necessary for artificial vision or pattern recognition applications for which the extraction of exact contours is essential, but it is penalizing when looking for large semantically significant regions.
  • the invention aims to overcome these drawbacks by proposing a segmentation into regions of sizes such that the regions thus segmented still have a semantic meaning in the context of the image.
  • by semantic or semantically significant object is meant an object corresponding to the real world, for example a face, a sky, etc.
  • several semantic objects can compose another semantic object (for example hair, a face and a jacket constitute a person), hereinafter called a composite semantic object.
  • the semantic objects composing a composite semantic object can also themselves be composite semantic objects (for example the face is composed, inter alia, of a nose, a mouth and eyes).
  • segmentation of images into objects with significant semantic value is a key step in the process of analyzing and understanding the content of multimedia documents, in particular video documents.
  • the invention makes it possible to segment the images into significant objects while neglecting the details.
  • the invention thus aims to obtain a robust segmentation in the presence of possibly very textured images, and insensitive to insignificant details which could lead to unnecessary over-segmentation of large homogeneous regions, for example a black cord on a white wall.
  • the segmentation method according to the invention, which is related to region-type techniques, operates from the outset at the scale of the region, starting from initial blocks of reduced size that are assumed to be homogeneous, so as to allow the segmentation of larger objects.
  • the invention therefore relates to a method of spatial segmentation of an image into visual objects, characterized in that, in order to obtain objects having a semantic meaning, it comprises the following steps: - in a first phase, a partition of the image into regions according to a predetermined tiling, a fusion of neighboring regions whose similarity, according to a first similarity function, is less than a first threshold, and the obtaining of enlarged regions,
  • the method according to the invention also comprises a third phase of fusion of the regions obtained at the end of the second phase and which are similar according to a third similarity function.
  • the similarity functions used are different in at least two of the phases.
  • the subject of the invention is a method of spatial segmentation of an image into visual objects which, in order to obtain objects having a semantic meaning, comprises the following steps:
  • a representation of each of the regions obtained by a cloud of points in a representation space formed of at least one basic dimension characterizing an electromagnetic signal originating from this region and of a dimension characterizing the pixels corresponding to the values considered in the other dimensions, with
  • the neighboring regions which can be merged in each of the phases can be initial regions, regions resulting from a fusion of initial regions, or enlarged regions resulting from previous mergers; the different types of regions thus defined can be merged with each other from the moment when the same similarity function is applied to them, and until all the similarities according to the applied function are greater than the chosen threshold.
  • when a new similarity function is applied, the merging of the regions obtained in a previous phase by application of a lower-threshold similarity function is then possible, until no more merging is possible.
  • the process can thus be repeated by applying a new similarity function with a higher threshold than the previous one.
  • the regions to be merged always remain those that have not merged at the lower threshold of the previous function, including initial regions.
  • the similarity function between two regions can be defined by the position of the centroids of the two regions and of the ends of the two curve segments representing these two regions.
  • the electromagnetic signal appearing in at least two images can be transformed in order to extract from it at least two components of movement between the two images to represent it, such as magnitude and orientation, the other process steps applying to this representation.
  • the points of a cloud describing a region are distributed in a space of which three basic dimensions are three distinct linear or non-linear combinations of the three primary colors of the additive synthesis, and another dimension in this space being the number of pixels according to this distribution.
  • the points of a cloud describing a region are distributed in a space of which three basic dimensions are the hue, the saturation and the intensity of the color, another dimension in this space being the number of pixels according to this distribution.
  • a second segmentation on a finer scale, which may be that of the pixel, is carried out so as to obtain the precise contours of the limits of the objects, as well as their internal structure; then a fusion of the two segmentations is carried out in order to obtain both semantically significant objects and precise contours for these objects.
  • the invention also relates to a method of fine segmentation of images into semantically significant objects, consisting of:
  • a superposition of the two preceding segmentations provides regions corresponding to the objects of the image of the first step with the precise contours of the second step, as well as an internal structure representative of these objects.
  • the image is represented as a tree of objects, each higher-level object being able to include one or more lower-level objects.
  • the last two steps can naturally be carried out at several consecutive resolutions so as to obtain a hierarchical description of the structure of composite semantic objects.
  • the similarity calculation between regions can be performed:
  • the predetermined thresholds can be chosen to keep the number of regions in an interval in which over-segmentation and under-segmentation do not appear, and to keep the distribution of region sizes in a defined interval, to avoid over- and under-segmentation.
  • the threshold, polynomial degree and similarity function parameters can be chosen adaptively by a predetermined learning method, according to the over- and under-segmentation thresholds to be avoided and a predetermined evaluation calculation of these.
  • the similarity function applied at the end of any of the steps of the method includes the threshold parameter beyond which the fusion is not carried out.
  • the invention is first described as an algorithm for segmenting large regions. Fine segmentation is a more particular mode that can be achieved by the same algorithm. An algorithm using two resolutions, and combining the two segmentations obtained, is described below.
  • the image is first cut into a grid of so-called initial blocks of suitable size, equal to 16×16 pixels in the exemplary embodiment.
  • RGB (Red, Green, Blue).
  • each region is evaluated by calculating the distance between its histogram and those of the adjacent regions.
  • the distance calculations are carried out according to the order-1 norm (L1), equal to the sum of the absolute values of the pairwise differences between the histogram values; the order-n norms (Ln) are the Minkowski norms, equal to the 1/n power of the sum of the absolute values of the same differences raised to the power n.
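  • As an illustration, a minimal Python sketch of these distances; the function name and the representation of histograms as equal-length lists of bin values are assumptions made for the example, not taken from the text:

      def minkowski_distance(h1, h2, n=1):
          """Order-n (Ln) norm between two histograms given as equal-length
          sequences of bin values; n=1 gives the L1 norm used here."""
          assert len(h1) == len(h2), "histograms must share the same bins"
          return sum(abs(a - b) ** n for a, b in zip(h1, h2)) ** (1.0 / n)

      # Two 4-bin histograms of regions with the same pixel count:
      print(minkowski_distance([10, 0, 5, 1], [8, 2, 5, 1]))       # L1 = 4.0
      print(minkowski_distance([10, 0, 5, 1], [8, 2, 5, 1], n=2))  # L2 ~ 2.83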
  • the current region is merged with the one whose histogram is closest to its own, but only if the distance between their histograms is less than a high threshold which, in this embodiment, is set at 50% of the maximum possible distance.
  • the threshold controls the degree of similarity required of the merged regions: the lower the threshold, the more similar two regions must be in order to merge.
  • the merge is repeated until all the distances between adjacent regions are greater than this threshold.
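  • A minimal sketch of this merging loop, using for example the L1 distance above; the data structures (a dict of region histograms keyed by integer id and a dict of neighbor sets) are assumptions, and the same loop, restricted to small regions with the threshold removed, can serve for the second phase described below:

      def merge_phase(histograms, neighbors, distance, threshold):
          """Repeatedly merge the closest pair of adjacent regions while
          their histogram distance stays below `threshold`; `histograms`
          maps a region id to its histogram and `neighbors` maps a region
          id to the set of adjacent region ids (hypothetical layouts)."""
          while True:
              best = None
              for r, adj in neighbors.items():
                  for s in adj:
                      if r < s:
                          d = distance(histograms[r], histograms[s])
                          if d < threshold and (best is None or d < best[0]):
                              best = (d, r, s)
              if best is None:
                  return histograms, neighbors  # all distances >= threshold
              _, r, s = best
              # Merge region s into region r: sum histograms, rewire adjacency.
              histograms[r] = [a + b for a, b in zip(histograms[r], histograms[s])]
              for t in neighbors.pop(s):
                  neighbors[t].discard(s)
                  if t != r:
                      neighbors[t].add(r)
                      neighbors[r].add(t)
              del histograms[s]

      # Four tiled blocks in a row: 0, 1 and 2 merge; block 3 stays separate.
      h = {0: [9, 1], 1: [8, 2], 2: [9, 1], 3: [0, 10]}
      n = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
      l1 = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
      merge_phase(h, n, l1, 12)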
  • the remaining regions are either small regions, that is to say corresponding to details (size less than 1% in the exemplary embodiment), or larger, homogeneous regions, due to the high merging threshold (50% in the exemplary embodiment). These large regions can possibly be very
  • the first phase is followed by a second phase of merging only the small regions, of size less than 1% in the exemplary embodiment.
  • These "details" are automatically merged with their closest neighbor by removing the 50% merging threshold used in the first phase, so that all the details are integrated into their surrounding region or their closest neighbor.
  • the entry E of the mound, being smaller than the 1% threshold in the exemplary embodiment, was segmented during the first phase and then merged during the second phase since, evaluated as a detail, it could not be considered a significant semantic object by the present algorithm.
  • Objects like the sky in figure 1 can be over-segmented, the regions that compose them remaining "perceptually" similar.
  • the third phase allows us to go beyond this stage.
  • the over-segmentation of the sky C illustrates the limits of the use of color histograms: they are sensitive to optical effects such as illumination, lighting variations or gradients (as described in the article by M. Stricker and M. Orengo, "Similarity of Color Images", Proc. SPIE'95, Storage & Retrieval for Image and Video Databases III, 1995).
  • their corresponding histograms are "hollow" (that is to say, have a large proportion of values close to or equal to 0) and therefore cannot be effectively compared using distance measures such as L1 or L2.
  • each region is represented by the statistical mean of the values of its histogram, forming its mean color and corresponding to the centroid C of each point cloud N1 to N6 (the color distributions of the regions), and by a polynomial fit P over an interval, which gives an estimate of the color variations within the region.
  • the clouds of the regions obtained are subsets, or sub-regions, of the final regions R1 to R3.
  • 6 sub-regions N1 to N6 have been extracted.
  • a polynomial fit is calculated for the final regions, as well as an interval over their domain of definition.
  • the polynomial P of a final region and those Pi of the sub-regions (obtained at the end of the second phase) which compose it are the same.
  • the interval of P is the union of the intervals of Pi.
  • the third phase of the algorithm merges regions with similar polynomial fits whose intervals of definition are similar, consecutive or overlapping.
  • a line is approximated using the classic method of linear regression.
  • the point clouds of the regions obtained (figure 2b) at the end of the second phase are then represented by a line segment obtained by linear fitting and by the centroid of the corresponding cloud N1 to N6, which is not necessarily the center of the segment.
  • Each of the regions is represented by its average color and by a segment S1 to S6 carried by the linear regression line of the corresponding point cloud (figure 2b).
  • the ends E1 and E2 of the representative line segment S are the projections onto the regression line D3 of the extreme points of the point cloud N. Under these conditions, the segment does not extend beyond the projection of the most extreme points of the cloud.
  • the Euclidean distance from these ends to the centroid of the cloud is preferably limited to a threshold in RGB space, equal in the exemplary embodiment to 1.5 times the standard deviation of the color distribution of the region considered.
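  • A sketch of this representation, using the principal axis of the cloud as the regression line; numpy, the helper name and the exact clamping rule are assumptions made for the example, not a prescription of the patent:

      import numpy as np

      def representative_segment(points, clamp=1.5):
          """Centroid plus a line segment along the best-fit axis of a
          region's RGB point cloud, its ends being the projections of the
          extreme points, limited to `clamp` standard deviations of the
          color distribution (1.5 in the exemplary embodiment)."""
          pts = np.asarray(points, dtype=float)
          centroid = pts.mean(axis=0)
          _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
          axis = vt[0]                       # direction of the fitted line
          t = (pts - centroid) @ axis        # signed projections on the axis
          sigma = np.sqrt(((pts - centroid) ** 2).sum(axis=1).mean())
          lo = max(t.min(), -clamp * sigma)  # clamp the segment ends
          hi = min(t.max(), clamp * sigma)
          return centroid + lo * axis, centroid + hi * axis, centroid

      # Elongated cloud of reddish pixels -> short segment along its axis.
      e1, e2, c = representative_segment(
          [[200, 40, 40], [210, 50, 45], [190, 35, 38], [220, 60, 50]])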
  • the average regression coefficients obtained are equal to 0.87 for keyframes and 0.84 for still images.
  • the regions are merged in the third phase of the algorithm by no longer comparing the color histograms, but the representative segments obtained in the second step.
  • the comparison of the segments is carried out in the Hue-Saturation-Intensity (HSI) space.
  • This space is perceptually uniform, since it provides a linear representation of the variations in the spectral frequency of a color, whereas the RGB space, which provides no such representation, is not suitable for such a comparison.
  • to restrict merging to perceptually similar regions, which therefore potentially belong to the same objects of the scene, only the regions whose differences in hue and saturation between the centroids are less than a given threshold are merged.
  • the maximum difference in hue is fixed at a threshold equal to 7.5 °, and the maximum difference in saturation at a threshold of 15%.
  • the neighboring regions R1, R2 and R3 satisfying these criteria are merged. These representative segments approximate the best polynomial fit, which can be obtained in the case of a fit of order greater than one.
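  • A minimal sketch of this merge criterion; Python's colorsys HSV conversion stands in for the HSI space named in the text, and the helper name and this substitution are assumptions:

      import colorsys

      def perceptually_similar(rgb1, rgb2, hue_thresh=7.5, sat_thresh=0.15):
          """Third-phase test: centroids whose hue differs by less than
          7.5 degrees and whose saturation differs by less than 15% are
          considered perceptually similar (HSV used as a stand-in for HSI)."""
          h1, s1, _ = colorsys.rgb_to_hsv(*(c / 255.0 for c in rgb1))
          h2, s2, _ = colorsys.rgb_to_hsv(*(c / 255.0 for c in rgb2))
          # Hue is circular: take the smaller of the two arc differences.
          dh = abs(h1 - h2) * 360.0
          dh = min(dh, 360.0 - dh)
          return dh < hue_thresh and abs(s1 - s2) < sat_thresh

      # Two close shades of sky blue pass; blue against orange does not.
      print(perceptually_similar((100, 150, 230), (105, 155, 235)))  # True
      print(perceptually_similar((100, 150, 230), (230, 150, 100)))  # False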
  • the comparison method mainly consists in comparing the average gray levels (i.e. the average intensity) and the variations in intensity (i.e. the textures).
  • Another embodiment concerns the fine segmentation of objects obtained using the previous fusion process, aimed at obtaining the fine outline of these objects as well as their internal structure.
  • the same algorithm is used with at least two different resolutions, one called fine and the other called broad. The finer resolution is obtained simply by using initial blocks of smaller size, for example 4×4 pixels.
  • the algorithm then performs the segmentation by cutting at the pixel scale, which makes it more sensitive to contours and textures.
  • Figures 8a and 8b respectively present a so-called broad segmentation, obtained with a resolution of 16×16 pixels, and a so-called fine segmentation, with a resolution of 4×4 pixels, on the same image.
  • the overlay retains the outlines of the fine regions included in the regions having semantic significance.
  • the fine regions may not be systematically included in the corresponding broad region, because they may correspond to details obtained thanks to the finer resolution, for example the bars 11 on the wall at the back of the scene in figure 9.
  • a zone is spatially included in a region and its representation is close (in the sense of the similarity measure used during the third phase of the basic algorithm) to that of this region; in this case, the zone is considered to be part of the region.
  • a zone can correspond to a detail of the image that has not been extracted by the coarse segmentation (e.g. the bars 11 on the wall behind the character of figure 9).
  • in that case, the distance between the representations of the zone and of the region is high, and the zone is considered not to be part of the region but to form a region by itself.
  • a zone may not be mainly included in one region (in practice, a threshold depending on the size of the zone is fixed) but extend over several regions (e.g. the collar 12 of the shirt of the character in figure 9). In this case, the zone is attached to the most similar region, or is considered as a region in its own right if none of the surrounding regions is similar enough. Examples of application of the method according to the invention are described below.
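  • Before turning to those examples, a minimal sketch of the zone-to-region assignment logic just described; the function shape, the overlap and similarity callbacks and the 0.8 inclusion threshold are assumptions (the text only states that the inclusion threshold depends on the size of the zone):

      def assign_zone(zone, wide_regions, overlap, similarity, sim_thresh,
                      inclusion_thresh=0.8):
          """Attach a fine-scale zone to a broad region or keep it as a
          region of its own.  `overlap(zone, r)` is the fraction of the
          zone's pixels inside region r; `similarity(zone, r)` is the
          third-phase similarity score (higher = more similar)."""
          fractions = {r: overlap(zone, r) for r in wide_regions}
          host = max(fractions, key=fractions.get)
          if fractions[host] >= inclusion_thresh:
              # Cases 1 and 2: the zone is spatially included in one region;
              # it joins it if similar, otherwise it is a detail on its own.
              return host if similarity(zone, host) >= sim_thresh else zone
          # Case 3: the zone straddles several regions; attach it to the
          # most similar one, or keep it separate if none is similar enough.
          best = max(wide_regions, key=lambda r: similarity(zone, r))
          return best if similarity(zone, best) >= sim_thresh else zone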
  • Example 1: Characterization of objects; classification. All the characteristics of the segmented objects are kept for the purposes of analyzing the document and/or indexing the images, to allow searching for shots by their content in terms of semantic objects and the actions of these objects.
  • the set of characteristics of each region (color, texture, size, position, shape index, movement, ...) is very compact. In the case of a linear fit, a summary of the preceding characteristics can be stored in fewer than 20 bytes (the position of the average color and the representative segment requiring 9). More complete representations may require a few tens of bytes per object.
  • An image can be represented in a form summarized by a list of descriptors of the main objects it contains, each descriptor including in particular the position and movement of the object.
  • the criteria for choosing the main objects can be for example the size, the difference in color with neighboring objects, movement, or semantic knowledge of the type "object X is important".
  • the representation of the image can be as compact as 80 bytes.
  • To characterize an object, it is useful to know not only the descriptor of this object but also the descriptors of neighboring objects, since an object can also be characterized by its context (e.g. an airplane in the sky).
  • the semantics of some of the objects segmented by the proposed method can be easily extracted using their visual characteristics in a certain number of simple cases (e.g. detection of daytime skies, lights, skin, ...).
  • the association of semantics with objects can also be based on the contribution of external knowledge. For example: "a sky is a blue or gray object with little texture, generally at the top of an image”.
  • the problem of characterizing a semantic object is a known problem of classification or clustering of points in a multidimensional space.
  • This classification can be done with or without learning, in supervised mode or not.
  • this classification is based on a compact representation of the object and, where appropriate, of the surrounding objects, in a multi-dimensional space.
  • Known classification methods that can be used are conventional data analysis methods, neural methods and methods using genetic algorithms.
  • the clouds of neighboring points are characterized as clusters and projected into an adequate smaller representation space.
  • the characterization of the cluster objects can then be done from the description by the user of a
  • the indexing system generalizes the characterization of one or more objects described by points of the cluster, to objects described by other points of this cluster.
  • the system learns a "classifier", which allows dividing the representation space into clusters, from a set of examples. Examples can be provided by one of the system's users during training or during use.
  • initial classifiers are each characterized by a similarity function, taken from a set of possible functions, and by thresholds. These classifiers are represented by a signature, which is a bit string.
  • the initial classifiers can be drawn at random or provided by users. The user or the system determines which classifiers have given an appropriate response. The classifiers that participated in the correct answers are hybridized by recombining the signatures of two of these classifiers. Random modifications of signatures, or "mutations", can also be applied when creating new classifiers. For certain classes of applications, this process converges towards a population of classifiers close to the optimum.
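  • A toy sketch of this evolutionary loop over bit-string signatures; the fitness function, population size, selection of the fitter half and the mutation rate are all illustrative assumptions:

      import random

      def evolve_classifiers(population, fitness, generations=50, mutation=0.01):
          """Hybridize the signatures of well-performing classifiers by
          one-point crossover and apply random bit "mutations"; the fitter
          half of the population stands in here for 'the classifiers that
          participated in the correct answers'."""
          for _ in range(generations):
              ranked = sorted(population, key=fitness, reverse=True)
              parents = ranked[:max(2, len(ranked) // 2)]
              children = []
              while len(parents) + len(children) < len(population):
                  a, b = random.sample(parents, 2)
                  cut = random.randrange(1, len(a))        # one-point crossover
                  child = "".join(
                      ("1" if bit == "0" else "0")         # random mutation
                      if random.random() < mutation else bit
                      for bit in a[:cut] + b[cut:])
                  children.append(child)
              population = parents + children
          return max(population, key=fitness)

      # Toy fitness: agreement with a hidden target signature.
      target = "1011010010110100"
      pool = ["".join(random.choice("01") for _ in target) for _ in range(20)]
      best = evolve_classifiers(pool, lambda s: sum(x == y for x, y in zip(s, target)))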
  • a fourth possible classification method based on the segmentation of the image into semantic objects is the search for visual objects similar to a set of given examples, based on a similarity in the characteristics of colors, shapes, etc.
  • the initial query uses a global similarity function, obtained by calculating a sum of similarity functions applied independently to different criteria, each being weighted by a value called its weight.
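  • For instance, a sketch of such a weighted global similarity; the descriptors, criteria and weights below are purely illustrative:

      def global_similarity(a, b, weighted_criteria):
          """Weighted sum of per-criterion similarity functions, each
          returning a score for the pair of object descriptors (a, b)."""
          return sum(w * f(a, b) for w, f in weighted_criteria)

      def hue_sim(a, b):
          # Circular hue difference mapped to a similarity in [0, 1].
          d = abs(a["hue"] - b["hue"])
          return 1.0 - min(d, 360 - d) / 180.0

      size_sim = lambda a, b: min(a["size"], b["size"]) / max(a["size"], b["size"])

      # Hypothetical query: color weighted above size.
      score = global_similarity({"hue": 30, "size": 100}, {"hue": 40, "size": 80},
                                [(0.7, hue_sim), (0.3, size_sim)])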
  • This initial request can be enriched in a known manner by allowing the user to specify which answers are satisfactory or not.
  • a generic technique for enriching a vector query from these responses is known. In some variations of this technique, it can be based on the estimation of desirable changes in the weights of the different similarity functions, using methods derived from Bayesian probabilities.
  • Vector similarity search is proposed by several video indexing projects, but these projects are not based on a spatial segmentation into semantic objects and a similarity measure according to our method.
  • the application of these methods is facilitated by the reduced size of the descriptor, and by the possibility for the user to indicate examples and counterexamples, and if necessary to indicate whether the answer is satisfactory or not.
  • a script describing the content of each shot of a video is aligned with the boundaries of the shots of the video by known methods.
  • This script describes each object of a shot and its actions.
  • By correlating the presence of an object in the script and in the video, it is possible to determine with a certain probability which object of the video corresponds to an object of the script and what its actions are. From this information, we have examples of this type of object, which make it possible to automatically build a classifier for this object.
  • Another use of these methods in our process is to use segmentation to annotate objects with objective or subjective characteristics. To recognize the presence of one of these characteristics in an object or part of a video, it is possible to automatically choose as examples the visual objects annotated with this characteristic, and then to proceed to the learning of one of the previous classifiers.
  • the recognition of the speaker by known methods of audio analysis makes it possible to choose as examples several instances of the same object, and then to proceed to learn the characteristics of this object according to one of the methods mentioned above.
  • Example 2: Temporal segmentation of a video into sequences
  • a video is most often structured in shots, separated by cuts or by special effects (fades, wipes).
  • a shot is a continuous series of images taken in a single take by a single camera. The segmentation of a video into shots is useful in particular for navigating the video from an interface called a "story-board", which represents each shot by a characteristic image.
  • a sequence is a series of shots describing the same environment and the same characters.
  • the sequence is a semantic unit suitable for description of content and navigation in the video.
  • Another method of segmentation into sequences is based on the detection of characteristic objects. For example, a change of sequence is often linked to a change of environment, for example
  • the detection of an object of the daytime-sky, night-sky or lighting type optionally makes it possible to characterize a shot as an outdoor-day or outdoor-night shot.
  • the segmentation into semantic objects, followed by the characterization of a certain number of objects by the methods of the preceding paragraph, makes it possible to detect sequence boundaries.
  • Shot groups have the same properties as sequences, but are not necessarily formed of contiguous shots.
  • subjects are series of sequences on the same theme. Detecting subjects is particularly interesting for characterizing time intervals in documentary or news videos.
  • the segmentation into subjects according to the present application is based on the segmentation into sequences according to the approach described above.
  • the detection of a subject boundary is done using one or more of the following methods:
  • the character is a composite semantic object composed of the helmet, the face, the jacket, the shirt collar, ... It is interesting to be able to find it through any of its regions, for example through the helmet, or to visualize and annotate the complete character and not just the face.
  • movement is calculated on each object or part of an object.
  • This distribution (for example the mean and the standard deviation of the motion vectors) is used to define composite objects from the common motion of their different parts (a sketch of such grouping follows this list).
  • the differences between the movement characteristics in different parts of a semantic object can also be used to describe a complex movement or an action of this object.
  • the movement of an arm of a character is not necessarily the average movement of the object.
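  • As a sketch of the grouping announced above, assuming each region carries a mean motion vector; the helper name, tolerance and data layout are assumptions, and the required adjacency test between regions is omitted for brevity:

      import math

      def group_by_common_motion(region_motion, tol=2.0):
          """Group regions whose mean motion vectors agree within `tol`
          pixels/frame into candidate composite objects; each group keeps
          the motion of its first member as reference."""
          groups = []
          for name, (vx, vy) in region_motion.items():
              for g in groups:
                  gvx, gvy = g["motion"]
                  if math.hypot(vx - gvx, vy - gvy) <= tol:
                      g["members"].append(name)
                      break
              else:
                  groups.append({"motion": (vx, vy), "members": [name]})
          return [g["members"] for g in groups]

      # Helmet, face and jacket share the character's motion; the wall does not.
      print(group_by_common_motion({"helmet": (3.1, 0.2), "face": (2.9, 0.1),
                                    "jacket": (3.0, 0.0), "wall": (0.0, 0.0)}))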
  • the movement information can be used in two ways:
  • a third method, based on the co-occurrence of the regions in the images, is proposed: if a combination of regions, for example helmet-face-jacket, appears regularly in sequences of shots, then these regions can be associated with a significant probability of co-occurrence.
  • co-occurrences of segmented semantic objects can be calculated, for example by known statistical methods used for indexing textual documents.
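  • A simple stand-in for such a calculation, counting how often labeled regions appear together across shots; the data layout and the normalization by the rarer label are assumptions:

      from collections import Counter
      from itertools import combinations

      def cooccurrence_probabilities(shots):
          """Estimate the probability that two labeled regions co-occur,
          given `shots`, a list of sets of region labels per shot."""
          pair_counts, label_counts = Counter(), Counter()
          for labels in shots:
              label_counts.update(labels)
              pair_counts.update(combinations(sorted(labels), 2))
          return {pair: count / min(label_counts[pair[0]], label_counts[pair[1]])
                  for pair, count in pair_counts.items()}

      # helmet/face/jacket recur together, so their pairwise scores are high.
      shots = [{"helmet", "face", "jacket"}, {"helmet", "face", "jacket", "sky"},
               {"sky", "tree"}, {"helmet", "face"}]
      probs = cooccurrence_probabilities(shots)
      print(probs[("face", "helmet")])  # 1.0 -> strong co-occurrence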
  • neighboring objects can be grouped into composite objects using their semantic value.
  • Example 4: Tracking an object through one or more scenes
  • the previous methods make it possible to find the same semantic object through several successive images of a scene in a video, or even through several scenes located in different passages of this video.
  • This characterization is done using the similarity methods between objects or visual regions described above, and taking the object's displacements into account: a similar object located in the same place in the following image is more likely to be the same object than if it is in an opposite part of the image.
  • Object tracking in a video is a known problem, which is the subject of several works by other authors. In our approach, the fact of having simple or composite semantic objects limits the number of objects to follow; moreover, we use a specific method of finding similarity between several occurrences of an object, as described previously.
  • tracking objects makes it possible to detect actions. For example, the fact that two objects move together and are then separated frequently reflects the fact that one of the objects deposited the other during the interval. The fact that these are semantic objects increases the quality of this action detection.
  • Example 5: Selecting objects for storage on a user's system
  • the parts of the video stored on this storage system are time intervals or sets of images characterized by the presence of sequence or subject descriptors satisfying a request from the user or the system, or by the presence of visual or audio objects satisfying such a request.
  • transition rules can be extracted from the usual associations made by a user, or from the transitions usually made by the user.
  • the purpose of the query used is to find an object (or a sequence or a subject) in which a set of content characteristics, present in the query or in a set of examples associated with the query, is found with a more or less high degree of relevance.
  • the objects or time segments sought may be those for which either the user has expressed an interest, for example by consulting similar objects in previous sessions, or a similar user has expressed an interest.
  • Example 6: Use in a compression and composition system for video objects
  • within a set of objects, an object that interests a user more can be transmitted with a lower compression rate than another object (such as the background).
  • a video scene can be edited so as to juxtapose several objects from different scenes, or to delete certain objects.
  • having a segmentation into semantic objects available is useful.
  • the segmentation methods used give access to a tree structure of objects, from the time interval or the image, then from composite objects, down to the internal structure of these objects, as previously described.
  • This approach makes it possible to apply the methods of a video representation system by objects, in an efficient manner, and with a granularity which varies from the composite object to the fine structure.
  • the invention is not limited to the examples described and shown.
  • an audiovisual object comprising images, in a representation format describing in particular the position of the semantic objects contained in the audiovisual object, these semantic objects being characterized by a set of semantic characteristics;
  • an audiovisual object comprising images, in a representation format describing in particular the actions of the semantic objects contained in the audiovisual object; the use for selecting the objects of a stream of audiovisual objects, to be stored in the storage system of a user of this audiovisual stream for the purpose of subsequent access to these objects;

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
EP99901651A 1998-02-04 1999-01-28 Verfahren und verwendung zur räumlichen segmentierung von bildern in visuellen objekten Withdrawn EP1053533A1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
FR9801308A FR2774493B1 (fr) 1998-02-04 1998-02-04 Procede de segmentation spatiale d'une image en objets visuels et application
FR9801308 1998-02-04
PCT/FR1999/000176 WO1999040539A1 (fr) 1998-02-04 1999-01-28 Procede de segmentation spatiale d'une image en objets visuels et application

Publications (1)

Publication Number Publication Date
EP1053533A1 true EP1053533A1 (de) 2000-11-22

Family

ID=9522598

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99901651A Withdrawn EP1053533A1 (de) 1998-02-04 1999-01-28 Verfahren und verwendung zur räumlichen segmentierung von bildern in visuellen objekten

Country Status (3)

Country Link
EP (1) EP1053533A1 (de)
FR (1) FR2774493B1 (de)
WO (1) WO1999040539A1 (de)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10017551C2 (de) * 2000-04-08 2002-10-24 Carl Zeiss Vision Gmbh Verfahren zur zyklischen, interaktiven Bildanalyse sowie Computersystem und Computerprogramm zur Ausführung des Verfahrens
US6859554B2 (en) * 2001-04-04 2005-02-22 Mitsubishi Electric Research Laboratories, Inc. Method for segmenting multi-resolution video objects
FR2864300A1 (fr) * 2003-12-22 2005-06-24 France Telecom Procede de localisation et de segmentation floue d'une personne dans une image video
CN113963051A (zh) * 2021-09-15 2022-01-21 国网四川省电力公司 基于视觉信息和特征提取的目标直径自动测量方法和系统
CN113989359A (zh) * 2021-09-15 2022-01-28 国网四川省电力公司 一种基于视觉信息的目标直径自动测量方法和系统

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2894113B2 (ja) * 1992-11-04 1999-05-24 松下電器産業株式会社 画像クラスタリング装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See references of WO9940539A1 *

Also Published As

Publication number Publication date
WO1999040539A1 (fr) 1999-08-12
FR2774493A1 (fr) 1999-08-06
FR2774493B1 (fr) 2000-09-15

Similar Documents

Publication Publication Date Title
EP3707676B1 (de) Verfahren zur schätzung der installation einer kamera im referenzrahmen einer dreidimensionalen szene, vorrichtung, system mit erweiterter realität und zugehöriges computerprogramm
EP1316065B1 (de) Videobildsegmentierungsverfahren unter verwendung von elementären objekten
Eitz et al. Photosketcher: interactive sketch-based image synthesis
Truong et al. Scene extraction in motion pictures
WO2010059188A2 (en) Method for event-based semantic classification
Fei et al. Creating memorable video summaries that satisfy the user’s intention for taking the videos
Bartolini et al. Shiatsu: semantic-based hierarchical automatic tagging of videos by segmentation using cuts
Kim et al. Automatic color scheme extraction from movies
EP1053533A1 (de) Verfahren und verwendung zur räumlichen segmentierung von bildern in visuellen objekten
WO2004040472A2 (fr) Procede de selection de germes pour le regroupement d'images-cles
WO2004029833A2 (fr) Procede et dispositif de mesure de similarite entre images
Morikawa et al. Food region segmentation in meal images using touch points
Canini et al. Emotional identity of movies
Delezoide et al. Irim at trecvid 2011: Semantic indexing and instance search
Darji et al. A review of video classification techniques
Ciocca et al. Dynamic storyboards for video content summarization
Suran et al. Automatic aesthetic quality assessment of photographic images using deep convolutional neural network
EP1435054A2 (de) Verfahren für indexierung und vergleich von multimediadokumenten
Merler et al. Selecting the best faces to index presentation videos
Benini et al. Identifying video content consistency by vector quantization
Neuschmied et al. ÖWF-OD: A Dataset for Object Detection in Archival Film Content
Tapu et al. Salient object detection in video streams
Tapu et al. Multiresolution median filtering based video temporal segmentation
Wang et al. Identification and annotation of erotic film based on content analysis
Hu et al. Classification based on SVM of cultural relic videos' key frame

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20000714

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE ES FR GB IT

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20040803