WO2010089488A1 - Method for merging audiovisual programs, and corresponding device and computer program product

Info

Publication number: WO2010089488A1
Application number: PCT/FR2010/050104
Authority: WO
Inventors: Gael Manson; Sid Ahmed Berrani
Original assignee: France Telecom
Priority date: 2009-02-06
Filing date: 2010-01-25
Publication date: 2010-08-12
Also published as: EP2394246A1

Abstract

The invention relates to a method for merging segments of an audiovisual stream previously clipped into a plurality of program segments to be merged. According to the invention, such method includes, for at least one first and one second segment from said plurality of segments, a step of computing a set of descriptors, and includes a step of obtaining at least one piece of information representative of the fact that said at least one first and said at least one second segments belong to a same audiovisual program on the basis of data representative of said previously computed descriptors.

Description

A method of merging audiovisual program segments, device, and corresponding computer program product.

1. DOMAIN OF THE INVENTION

The present invention relates to the field of audiovisual content analysis.

The present invention relates more particularly to a method for merging previously segmented audiovisual content.

Today, television channels broadcast content continuously, and their number keeps growing. In France, for example, the archive of the National Audiovisual Institute (INA), which is responsible for archiving French broadcasts, grows by five hundred and forty thousand hours each year, and more than four million hours of programs are now available. In addition, a French viewer can currently choose from more than four hundred hours of content per day on the digital terrestrial television channels alone.

Faced with this huge volume of audiovisual data, new needs and services have emerged, such as the archiving of these data, carried out in France by the INA; the monitoring of broadcasts, in particular for the French broadcasting regulator, the Conseil supérieur de l'audiovisuel (CSA); advertising monitoring; and non-linear access to the desired content, that is to say access that is not constrained by the broadcast time. All these services rely on an indexing of audiovisual streams, which consists of segmenting the streams to extract the programs and inter-programs (advertising sequences in particular) that are broadcast continuously.

This processing is extremely expensive when done manually. Automatic techniques are needed to exploit the large number of audiovisual streams available. These automatic segmentation techniques either analyze the content of the audiovisual streams or use the program information provided by the television channels, which may take the form of electronic program guides. Many different methods have been proposed for segmenting audiovisual streams. The invention uses audiovisual streams that have already been segmented.

In what follows, a specific technical vocabulary is used. To avoid any ambiguity, it should be noted that:
- an audiovisual stream represents audio and video content broadcast continuously by a television channel or a broadcaster of this type;
- a program is a show broadcast in the audiovisual stream. It may consist of several parts separated by advertising breaks. A program can be a movie, an episode of a series, a game show, a news program, the weather forecast, a music video, a magazine show or another category;
- an inter-program is an element broadcast between two programs or in an advertising break. It can be an advertisement, a trailer for an upcoming program, an advertising "jingle" (the sequence that opens and closes a commercial break), a channel or broadcaster logo, or a sponsorship message preceding the beginning or following the end of a program.

Segmentation techniques have the particularity of segmenting a program into several segments. This poses a problem when one wishes to reconstitute the program in question for the needs of the aforementioned services.

2. PRIOR ART

In connection with FIG. 1, a general diagram of automatic segmentation techniques of an audiovisual stream is presented.

Segmentation techniques are generally based on the detection (step 101) of the inter-program areas 13, because inter-programs are short sequences that share many common properties. In particular, inter-programs are broadcast several times in the stream. These properties make inter-programs much easier to detect than the long programs (A, B and C), which are heterogeneous (series, films, shows, etc.) and generally do not share common properties. The portions of the stream (A, B, C) that separate the detected inter-program areas thus form segments that correspond to parts of programs, also referred to hereinafter as program segments. The audiovisual stream is then segmented (step 102) into three segments (A, B and C).

The problem is then to decide which segments of the audiovisual stream must be merged to form a single program, in order to recover the original structure of the broadcast audiovisual stream. In FIG. 1, for example, program segments B and C must be merged (they are two segments of the same program, in this case a movie), while program segments A and B must remain separate (segment A is a newscast).

A solution for automatic program reconstruction from segments has been proposed. This solution is based on the use of metadata associated with the audiovisual stream.

According to this method, when metadata on the programming schedule are available (the Electronic Program Guide (EPG) or the Event Information Table (EIT)), a matching between the times given in the EPG or the EIT and the detected times of the program segments can be used to label the segments and thus to reconstruct the programs. This approach has been used through a labeling based on a study of the overlap between the program segments of the stream and the program time slots listed in the EPG.

A more elaborate approach, which nevertheless follows the same general principle as the one mentioned above, was used in X. Naturel's thesis work ("Automatic structuring of television video streams", PhD thesis, University of Rennes I, 2007). It relies on a Dynamic Time Warping (DTW) procedure that looks for the best match between the stream segments and the information in the EPG (or the EIT). This is a global optimization that assigns a cost to the adjustments needed to match the segments with the EPG (or the EIT). The selected match is the one with the lowest cost.
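As a rough illustration of the kind of alignment described above, the following Python sketch computes a classic dynamic time warping cost between detected segment start times and scheduled start times. It is not the formulation used in the cited thesis: the function name `dtw_cost`, the representation of both sequences as start times in minutes, and the use of the absolute time difference as local cost are assumptions made for this example.

```python
from typing import List, Sequence


def dtw_cost(detected: Sequence[float], scheduled: Sequence[float]) -> float:
    """Minimal cumulative cost of aligning two sequences of start times.

    Hypothetical helper: the local cost is the absolute time difference,
    and one element may be matched to several elements of the other
    sequence, as in textbook DTW.
    """
    n, m = len(detected), len(scheduled)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i detected times
    # with the first j scheduled times.
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(detected[i - 1] - scheduled[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # repeat scheduled[j-1]
                                     cost[i][j - 1],      # repeat detected[i-1]
                                     cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

A lower returned cost means that fewer or smaller time adjustments are needed to match the detected segments with the schedule.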

The disadvantage of these approaches is their dependence on metadata. On the one hand, metadata are not always available; on the other hand, they are unreliable, with inaccuracies of the order of a few minutes. In particular, short programs are regularly missing from the EPG and are often offset by more than five minutes.

Moreover, when no metadata on the programming schedule are available, the problem is even more complex, and no technique has been proposed to identify and merge the different segments (for example consecutive segments) of the same program.

3. SUMMARY OF THE INVENTION

The invention does not have these disadvantages of the prior art. Indeed, the invention relates to a method for merging segments of an audiovisual stream previously cut into a plurality of program segments to be merged. According to the invention, such a method comprises, for at least one first and at least one second segment of said plurality of segments, a step of calculating a set of descriptors and a step of obtaining at least one piece of information representative of the membership of said at least one first and at least one second segment in the same audiovisual program, based on data representative of said previously calculated descriptors.

Thus, the invention makes it possible to solve problems that are not solved by the solutions of the prior art. Indeed, unlike the prior-art merging method, the invention does not use the data provided by the electronic program guide to decide whether to merge two segments of the audiovisual stream. Instead, the method of the invention calculates descriptors of the segments. From the descriptors extracted from the two segments, the method of the invention obtains the information representative of membership. It is thus no longer necessary to use the electronic program guide to merge two segments of an audiovisual program. Only the information contained in the segments is used to determine whether these segments belong to the same audiovisual program.

According to a particular characteristic of the invention, said at least one first and at least one second segment are consecutive segments.

According to a particular embodiment of the invention, said set of descriptors comprises:
- a first subset of at least one descriptor specific to said at least one first segment;
- a second subset of at least one descriptor specific to said at least one second segment.

Thus, the invention makes it possible to take into account the similarities of the segments and to maximize the probability of merging two segments of the same program. Based on the segment-specific descriptor sets, the invention makes it possible to determine particular characteristics of these segments. These characteristics can then be used to determine a difference between segments. A subset contains a defined number of descriptors, which correspond to a determined number of characteristic measurements of a segment.

According to a particular characteristic of the invention, said set of descriptors comprises a subset of descriptors calculated using data belonging to said at least one first segment and to said at least one second segment, called common descriptors.

Thus, the invention makes it possible to take into account the similarities of the segments. To this end, the invention introduces specific descriptors, called common descriptors, which result from a calculation carried out on the data of both the first and the second segment. By way of illustration, an example of a common descriptor is the number of images or shots common to the two segments. The probability of recognizing two segments belonging to the same program is thereby improved.

According to a particular embodiment of the invention, said method comprises at least one step of calculating a distance separating a descriptor of said first subset of own descriptors from a corresponding descriptor of the same type of said second subset of own descriptors, delivering a vector of at least one distance.

Thus, the invention makes it possible to create a set of distances between the descriptors of the same type of the first and second segments. These distances constitute a vector of distances. The smaller the distance between two descriptors, the more similar the two segments are with respect to the characteristic measured by this descriptor.

According to a particular characteristic of the invention, said descriptors are of different types, said types belonging to the group comprising:
- the ratio between the number of keyframes of a segment and the duration of this segment;
- a three-dimensional color histogram, in the RGB color space, of the average color over all the keyframes of a segment;
- a three-dimensional color histogram, in the RGB color space, of the intersection of the colors over all the keyframes of a segment;
- the ratio between the number of faces detected in a segment and the duration of this segment;
- the mean and standard deviation of the number of faces detected per keyframe of a segment;
- the maximum size of the faces detected over all the keyframes of a segment;
- the mean and standard deviation of the size of the faces detected per keyframe of a segment;
- the number of groups of similar keyframes in a segment;
- the number of groups of similar keyframes containing keyframes belonging to both said at least one first segment and said at least one second segment;
- the mean and standard deviation of the number of similar images in the groups of similar images.

According to a particular embodiment of the invention, said distances separating said descriptors belong to the group comprising:
- the absolute value of the difference;
- the Euclidean distance;
- the correlation distance according to the Pearson correlation coefficient;
- the Chi-square distance;
- the intersection distance, which is the sum of the respective minimums between the respective values of two distributions;
- the Bhattacharyya distance.

According to a particular embodiment of the invention, said method comprises, prior to the merger, a learning phase during which a classifier learns to differentiate different membership classes of audiovisual programs.

According to a particular characteristic of the invention, said obtaining step comprises:
- a step of transmitting said distance vector and/or said common descriptors to a previously trained classifier;
- a step of supervised classification of said at least one first and at least one second segment as a function of said distances of said distance vector and/or said common descriptors.

Thus, the invention makes it possible to merge the segments in an automated and simple manner while ensuring that the segments are correctly merged. In a specific embodiment of the invention, the classifier can be a binary classifier of the SVM type, which provides a decision on the membership of said segments in the same audiovisual program.

The invention also relates to a device for merging segments of an audiovisual stream previously cut into a plurality of program segments to be merged.

According to the invention, such a device comprises, for at least one first and at least one second segment of said plurality of segments, means for calculating a set of descriptors and means for obtaining at least one piece of information representative of the membership of said at least one first and at least one second segment in the same audiovisual program, based on data representative of said previously calculated descriptors.

According to another aspect, the invention also relates to a computer program product downloadable from a communication network and/or stored on a computer-readable medium and/or executable by a microprocessor, and comprising program code instructions for the execution of the merging method as described above.

4. LIST OF FIGURES

Other features and advantages of the invention will emerge more clearly on reading the following description of a preferred embodiment, given as a simple illustrative and non-limiting example, and from the appended drawings, in which:
- FIG. 1, already discussed, presents an overview of the general techniques for segmenting an audiovisual stream;
- FIG. 2 illustrates the merging method of the invention in general terms;
- FIG. 3 illustrates an implementation of the merging method of the invention for three consecutive segments;
- FIG. 4 illustrates another implementation of the merging method according to the invention;
- FIG. 5 illustrates another implementation of the merging method according to the invention;
- FIG. 6 describes a merging device according to the invention.

5. DETAILED DESCRIPTION OF THE INVENTION

Reminder of the principle of the invention

The invention proposes to merge the different segments forming a program using descriptors of these segments. In contrast to the solutions of the prior art, these descriptors do not depend on data external to the stream or on stream metadata, but on the audiovisual data that make up the stream. The descriptors can therefore relate both to the video content of the stream and to its audio content. Note that the invention does not exclude the use of metadata provided by the EPG or the EIT when such data exist. In such an embodiment, the invention combines fully with the techniques that use the EPG or the EIT, which significantly improves the accuracy of the merges and reduces the time required for merging.

The general principle of the invention thus relies on the calculation of descriptors for the segments that make up the stream, on the calculation of data associated with these descriptors, and on the provision of these data and descriptors to a particular component that gives an answer as to whether two segments belong to the same program.

With reference to FIG. 2, the steps of the method of the invention are presented. It is assumed that the audiovisual stream has been segmented beforehand using a suitable approach for detecting inter-program areas. Thus, the method of the invention uses a stream segmented into a plurality of program segments 20, consisting for example of segments A, B and so on. The method of the invention then merges the segments by:
- calculating 201 a set of descriptors 21. These descriptors 21 are calculated for at least two segments of the audiovisual stream, called the first and second segments. As explained later, the calculated descriptors are of different types;
- estimating 203 whether the first and second segments belong to the same program, using the data derived from these descriptors 21. This estimation step 203 can be performed using automatic classification means, such as classifiers. Other appropriate means can also be used to obtain an estimate of this membership.

The descriptors used in the context of the invention are of two kinds: own descriptors (specific to a single segment) and common descriptors.

An own descriptor is a value, or a data structure comprising several values, representing the result of a calculation carried out on a single segment: it can be, for example, the duration of the segment, the number of images in the segment, the sound volume of the segment, a number of shots, a spectral analysis of the segment, etc. It is segment-specific data. Own descriptors are therefore of different types. According to the invention, a determined number of own descriptors is calculated per segment, each own descriptor being of a particular type.

A common descriptor is a value, or a data structure comprising several values, representing the result of a calculation carried out on the two (or more) segments for which one wishes to know whether they belong to the same program. It is, for example, a number of identical images between the two segments, an estimate of whether a background sound is the same, etc.

The common descriptors are therefore also of different types. According to the invention, a determined number of common descriptors is calculated on the two (or more) segments for which one wishes to know whether they belong to the same program, each common descriptor being of a particular type.

In at least one embodiment of the invention, the own descriptors of each of the two segments whose membership in the same program is to be tested are then used to determine distances. These are distances between two descriptors belonging to two given segments, for example consecutive segments. These distances make it possible to establish how close the two segments are with respect to a given type of descriptor, such as a color distribution. These distances can be expressed as integer values, real values or vectors with several dimensions.

A certain number of distances are calculated. The number of distances calculated between two segments may be greater or less than the number of descriptors calculated for these two segments.

The distances separating the descriptors include:
- the absolute value of the difference;
- the Euclidean distance;
- the correlation distance according to the Pearson correlation coefficient (used for example between two color histograms);
- the Chi-square distance (used for example between two color histograms);
- the intersection distance, which is the sum of the respective minimums between the respective values of two distributions (used for example between two color histograms);
- the Bhattacharyya distance (used for example between two color histograms).

These distances are known and are applied to determine data representative of the segment descriptors that are subsequently used to determine whether segments belong to the same program or not.
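By way of illustration, the distances listed above and the construction of a distance vector between the own descriptors of two segments can be sketched as follows in Python. This is a minimal sketch under stated assumptions, not the implementation of the invention: histograms are flat lists of floats, the correlation distance is taken as 1 minus the Pearson coefficient, the Chi-square distance uses one common symmetric form, the Bhattacharyya distance uses a Hellinger-style variant that assumes histograms summing to 1, and all function and descriptor names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Histogram = Sequence[float]


def absolute_difference(x: float, y: float) -> float:
    return abs(x - y)


def euclidean(p: Histogram, q: Histogram) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def correlation_distance(p: Histogram, q: Histogram) -> float:
    # Assumption: distance = 1 - Pearson coefficient (0 means same shape).
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    num = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    den = math.sqrt(sum((a - mean_p) ** 2 for a in p)
                    * sum((b - mean_q) ** 2 for b in q))
    # A constant histogram has no defined correlation; treat it as r = 0.
    return 1.0 - num / den if den else 1.0


def chi_square(p: Histogram, q: Histogram) -> float:
    # One common symmetric form; bins empty in both histograms are skipped.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def intersection(p: Histogram, q: Histogram) -> float:
    # Sum of bin-wise minima, as in the text (larger means more similar).
    return sum(min(a, b) for a, b in zip(p, q))


def bhattacharyya(p: Histogram, q: Histogram) -> float:
    # Assumes both histograms sum to 1 (Hellinger-style variant).
    coefficient = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - coefficient))


def distance_vector(own_a: Dict[str, object],
                    own_b: Dict[str, object],
                    plan: List[Tuple[str, Callable[..., float]]]) -> List[float]:
    """Apply each (descriptor name, distance) pair of the plan to two segments."""
    return [distance(own_a[name], own_b[name]) for name, distance in plan]
```

Because the plan may list the same descriptor several times with different distances, or omit a descriptor, the number of distances can be larger or smaller than the number of descriptors, as noted above.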

FIG. 3 shows the implementation of the method of the invention for three segments of an audiovisual stream: segments A, B and C are extracted from the audiovisual stream by a segmentation method. Descriptors (Ds {A, B}, Ds {B, C}) are then calculated (steps 201, 202) for the segments: they can be descriptors specific to the segment (for example descriptors of A, B or C) or common descriptors (i.e., descriptors that use both A and B or A and C data).

The descriptors (Ds {A, B}, Ds {B, C}) are then provided to a classifier C1 which estimates (steps 203 and 204) the membership of the segments in the same program and decides on the separation (N) or the fusion (Y) of the two segments. Note that in Figure 3, the segments are consecutive and are compared in pairs, that is to say that the segment A is compared with the segment B (step 203) and the segment B with the segment C (step 204).
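The pairwise comparison of consecutive segments described above can be sketched as follows. This is a minimal Python sketch: the function name `merge_consecutive` and the `same_program` callback, which stands in for the decision (Y or N) returned by classifier C1, are assumptions made for the example.

```python
from typing import Callable, List, TypeVar

Segment = TypeVar("Segment")


def merge_consecutive(segments: List[Segment],
                      same_program: Callable[[Segment, Segment], bool]
                      ) -> List[List[Segment]]:
    """Group consecutive segments into programs.

    `same_program(previous, current)` is a hypothetical stand-in for the
    classifier decision: True means merge (Y), False means separate (N).
    """
    programs: List[List[Segment]] = []
    for segment in segments:
        if programs and same_program(programs[-1][-1], segment):
            programs[-1].append(segment)   # merge with the current program
        else:
            programs.append([segment])     # start a new program
    return programs
```

With the example of FIG. 1, a decision function that answers Y only for the pair (B, C) yields two programs: one made of segment A and one made of segments B and C.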

Of course, it is also possible to provide the classifier C1 with non-consecutive segment descriptors. For example, it would be quite relevant to provide the classifier C1 with data from the descriptors of A and C. If the classifier C1 concludes that A and C belong to the same program, then it will be easy to conclude that B also belongs to the same program as A and C. This reduces the calculation time needed to determine the membership of the segments to the programs.

The classifier C1 uses the data derived from the descriptors to estimate whether the two segments belong to the same program and to decide on the separation (N) or the merging (Y) of the two segments to which these data belong. Here again, the classifier need not be given data from only two segments. In some cases it is quite relevant to provide the classifier directly, in a single step, with data from several segments. Such a case, illustrated with reference to FIG. 4, can occur when there is a presumption that several segments belong to the same program, for example because of the use of data from an electronic program guide. Therefore, if it is assumed that the program may have been cut into several segments (three or four, for example), descriptors are calculated (step 201') for these three or four segments (Ds {A, B, C}) and provided together to the classifier. The classifier, for its part, uses (step 203') the data derived from the descriptors to decide on the separation (N) or the merging (Y) of the segments to which these data belong.

Similarly, as illustrated in FIG. 5, the two segments are not necessarily consecutive. The method of the invention is implemented in the same manner as above. Descriptors for segments A and C (Ds {A, C}) are calculated (step 201"), and the classifier, for its part, uses (step 203") the data derived from the descriptors to decide on the separation (N) or the merging (Y) of the two segments to which these data belong. If, in the case of FIG. 5, classifier C1 decides to merge segments A and C, then it can be concluded that segments A, B and C belong to the same program. Such an approach makes it possible, in certain cases, to reduce the number of calculations required and therefore to increase the processing speed.
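The inference described above, in which a merge decision on two non-consecutive segments also covers the segments lying between them, can be sketched as follows. This is a minimal Python sketch; the function name `group_segments` and the representation of segments by their index in the stream are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def group_segments(n_segments: int,
                   merged_pairs: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Group segment indices into programs from merge decisions.

    Each pair (i, j) is a hypothetical "merge" answer of the classifier for
    segments i and j; every segment between i and j joins the same program.
    """
    # joined[k] is True when segments k and k + 1 belong to the same program.
    joined = [False] * max(0, n_segments - 1)
    for i, j in merged_pairs:
        low, high = min(i, j), max(i, j)
        for k in range(low, high):
            joined[k] = True

    groups: List[List[int]] = [[0]] if n_segments else []
    for k in range(1, n_segments):
        if joined[k - 1]:
            groups[-1].append(k)
        else:
            groups.append([k])
    return groups
```

In the case of FIG. 5, a single decision on the pair (A, C), that is indices 0 and 2, is enough to place A, B and C in the same program without testing the pairs (A, B) and (B, C).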

Thus, the invention proposes a method for deciding whether two program segments, for example consecutive segments of an audiovisual stream, must or must not merge to form the same program. The method chooses to merge the segments by analyzing only the audiovisual content and the properties of the segments.

Among the advantages of the invention, it is noted more particularly that:
- when the programming metadata are available, merging the segments of the same program before the step of matching with the programming schedule significantly simplifies this matching and improves its performance. Indeed, the number of possible matches is reduced, and a simple overlap study can achieve good performance;
- when the programming metadata are not available, merging the segments of the same program makes it possible to extract all the long programs from the stream, which significantly reduces the cost of the manual intervention needed to feed a "TVoD" ("Television on Demand") catalog automatically.

In what follows, a case of segment merging is presented in which the stream descriptors used take into account the characteristics of at least some of the images that make up the segments. It is clear, however, that the invention is not limited to this particular implementation, but may also use descriptors that take into account the audio characteristics of the segments.

Description of an embodiment

In this embodiment, an implementation of the method of the invention is presented that uses several descriptors to determine whether two consecutive segments of the same audiovisual stream belong to the same program. In this embodiment of the invention, a binary classifier of the SVM ("Support Vector Machine") type is used. Any other type of classifier can, however, be used. The binary classifier has the advantage of being simple and well suited to decision-making in the context of the invention, since it returns a binary response.

A classifier is a mathematical function that assigns a membership class to input data. Training a classifier consists of estimating this mathematical function from a sample of examples of associations between input data and membership classes. A classifier is said to be binary when it produces a binary result (of the yes/no type).

In this embodiment of the invention, the binary classifier makes it possible, from the data derived from the descriptors, to determine whether the two segments being analyzed belong to the same audiovisual program. This determination is possible because, in a previous phase, using a set of segments for which the merging decision was made manually, the binary classifier was trained to determine, on the basis of the descriptors, whether or not two consecutive segments should be merged to form the same program. In one embodiment of the invention, it is also possible to use several classifiers. This type of approach may be of interest when there is a wide variety of program types that require differentiated analysis by classifiers trained differently.
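The training phase described above can be sketched as follows. This is a minimal Python sketch and not the SVM of this embodiment: to stay free of external libraries, a simple perceptron (a linear binary classifier) stands in for it. The function names, the toy distance vectors and the label convention (+1 for "merge", -1 for "separate") are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def train_perceptron(samples: Sequence[Sequence[float]],
                     labels: Sequence[int],
                     epochs: int = 20,
                     rate: float = 1.0) -> Tuple[List[float], float]:
    """Learn weights and a bias from manually labeled distance vectors.

    labels[k] is +1 if the pair of segments was manually merged, -1 otherwise.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * v for w, v in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified: move the boundary
                weights = [w + rate * y * v for w, v in zip(weights, x)]
                bias += rate * y
    return weights, bias


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return +1 (merge) or -1 (separate) for one distance vector."""
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else -1


# Toy training set (assumed values): small distances mean "same program".
TRAIN_X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
TRAIN_Y = [1, 1, -1, -1]
```

Once trained on `TRAIN_X` and `TRAIN_Y`, the resulting weights and bias can be passed to `predict` together with the distance vector of a new pair of segments to obtain the binary merge decision.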

As already mentioned, in order to merge the consecutive segments of the same program, the process is based on the study of the contents of the different program segments. Descriptors are thus calculated from segment data, and using a supervised classification technique, a decision is made for merging or separating two consecutive segments.

Own descriptors and common descriptors

In this embodiment of the invention, the descriptors considered for each segment are selected for their ability to characterize an audiovisual stream segment.

Own descriptors.

In this embodiment, the following own descriptors are used. As a first step, keyframes are identified for each segment using a keyframe detection method. A first descriptor is used for each segment: the number of keyframes of the segment divided by the duration of the segment.

The dominant colors of the video segments make it possible to roughly differentiate them. For example, parts of a dark film will differ from sporting events such as football matches, where the green of the pitch predominates.

In this embodiment of the invention, two color histograms are used to characterize the segments:
- a histogram of the average colors is calculated by accumulating all the colors of each keyframe of a segment and is then normalized by the duration of the segment. This is the second own descriptor;
- a color intersection histogram is calculated by determining the colors common to all the keyframes of a segment. It is also normalized by the duration of the segment. This is the third own descriptor.
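The two histograms above can be sketched as follows in Python. This is a minimal sketch under stated assumptions: each keyframe is given as a list of (R, G, B) pixels with 8-bit channels, the three-dimensional histogram is flattened into a list with `bins` bins per channel, and the intersection histogram is taken as the bin-wise minimum over the keyframes. The function names and the bin count are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B) values


def rgb_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Flattened three-dimensional RGB histogram of one keyframe."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[(ri * bins + gi) * bins + bi] += 1.0
    return hist


def segment_histograms(keyframes: Sequence[Sequence[Pixel]],
                       duration: float,
                       bins: int = 4) -> Tuple[List[float], List[float]]:
    """Return (average-color histogram, intersection histogram).

    Both are normalized by the segment duration, as described in the text.
    Assumes at least one keyframe and a strictly positive duration.
    """
    hists = [rgb_histogram(keyframe, bins) for keyframe in keyframes]
    accumulated = [sum(column) for column in zip(*hists)]  # all colors
    common = [min(column) for column in zip(*hists)]       # shared colors
    return ([v / duration for v in accumulated],
            [v / duration for v in common])
```

Two segments can then be compared by applying histogram distances to the corresponding histograms of each segment.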

In order to calculate the similarity of each histogram between two segments, in this embodiment of the invention, the histogram correlation distance, the Chi-square distance and the histogram intersection distance are used. Thus, from two descriptors, it is possible to determine three distances with different values.

The size and number of faces in a segment also make it possible to distinguish short segments, such as the weather forecast, which contains only one person, from longer segments, such as the news, which involves many people.

Similarly, the detection of faces makes it possible to distinguish a magazine show from a wildlife documentary. In this embodiment of the invention, the face detection technique presented in C. Garcia and M. Delakis, "Convolutional Face Finder: A Neural Architecture for Fast and Robust Face Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(11), 1408-1423, 2004, is used.

This detection is performed on the keyframes of the segment. For a keyframe of a segment, the result of this detection is a bounding rectangle for each detected face. A bounding rectangle is a part of an image. For a given image, the number, position and size of the bounding rectangles present in this image indicate the number, position and size of the detected faces.

The segments are then described by the following four descriptors:
- the total number of faces detected divided by the duration of the segment;
- the mean and standard deviation of the number of faces detected per keyframe of the segment;
- the maximum size of a face detected over all the keyframes of the segment, i.e. the largest face size in the keyframes of the segment;
- the mean and standard deviation of the maximum face size detected per keyframe of the segment.
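The four face descriptors above can be sketched as follows in Python. This is a minimal sketch under stated assumptions: the face detector itself is not reproduced, each keyframe is represented only by the list of (width, height) pairs of its bounding rectangles, the size of a face is taken to be the area of its rectangle, and the population standard deviation is used. The function name and dictionary keys are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int]  # (width, height) of one bounding rectangle


def face_descriptors(faces_per_keyframe: Sequence[Sequence[Box]],
                     duration: float) -> Dict[str, float]:
    """Compute the face descriptors of one segment.

    Assumes at least one keyframe and a strictly positive duration.
    """
    counts: List[int] = [len(boxes) for boxes in faces_per_keyframe]
    # Largest face area in each keyframe (0 when no face is detected).
    max_sizes: List[int] = [max((w * h for w, h in boxes), default=0)
                            for boxes in faces_per_keyframe]
    return {
        "face_rate": sum(counts) / duration,
        "count_mean": statistics.fmean(counts),
        "count_std": statistics.pstdev(counts),
        "max_size": float(max(max_sizes)),
        "max_size_mean": statistics.fmean(max_sizes),
        "max_size_std": statistics.pstdev(max_sizes),
    }
```

The scalar values returned for two segments can then be compared with the absolute value of the difference.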

Common descriptors

In this embodiment of the invention, an identification of common points in two segments is carried out. For example, the repetition of many nearly identical pieces of a segment in another segment characterizes important common points between two segments. For example, the repetition of the shots with the presenter characterizes the game shows. This embodiment of the invention uses the identification of these repetitions to provide additional data to the classifier.

For two segments A and B in which almost identical pieces are sought, keyframes are identified and each is described by a signature, for example of 64 bits. For this purpose, the same method is used as for the summary descriptors in the document "A non-supervised approach for repeated sequence detection in TV broadcast streams", Signal Processing: Image Communication, special issue on "Semantic Analysis for Interactive Multimedia Services", 2008, Volume 23, Number 7, pages 525-537.

Characterization of a segment

Then, from the 64-bit signatures, groups of keyframes of segments A and/or B are constructed, each group containing all the keyframes that lie within a Hamming distance d of at least one keyframe of the group.

The segments are described by the following values relating to the own and common descriptors:
- the total number of groups calculated on a segment;
- the average number of keyframes per group on a segment;
- the total number of groups containing images of both a first and a second segment;
- the average number of keyframes per group containing images of a first segment and a second segment.
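The grouping of keyframes and the resulting values can be sketched as follows in Python. This is a minimal sketch under stated assumptions: signatures are plain integers, two keyframes are linked when their Hamming distance is at most d, groups are obtained by single-linkage with a union-find structure, and only groups of at least two keyframes are counted as groups of similar keyframes. The function names and dictionary keys are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")


def group_keyframes(signatures: Sequence[int], d: int) -> List[List[int]]:
    """Link keyframes whose signatures differ by at most d bits.

    Returns lists of keyframe indices; isolated keyframes stay alone.
    """
    parent = list(range(len(signatures)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(signatures)):
        for j in range(i + 1, len(signatures)):
            if hamming(signatures[i], signatures[j]) <= d:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(signatures)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def group_descriptors(keyframes: Sequence[Tuple[str, int]],
                      d: int) -> Dict[str, float]:
    """keyframes: (segment label, signature) pairs of one or two segments."""
    all_groups = group_keyframes([sig for _, sig in keyframes], d)
    groups = [g for g in all_groups if len(g) >= 2]
    # Groups that contain keyframes of more than one segment.
    shared = [g for g in groups if len({keyframes[i][0] for i in g}) >= 2]

    def average_size(gs: List[List[int]]) -> float:
        return sum(len(g) for g in gs) / len(gs) if gs else 0.0

    return {
        "groups": float(len(groups)),
        "avg_keyframes_per_group": average_size(groups),
        "shared_groups": float(len(shared)),
        "avg_keyframes_per_shared_group": average_size(shared),
    }
```

Calling `group_descriptors` with the keyframes of a single segment gives the per-segment values, while calling it with the keyframes of both segments gives the values relating to groups shared by the two segments.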

These values are provided to the classifier in the form of a vector. On this basis, the classifier returns a response that validates, or not, the membership of the segments in the same program.

Alternative methods for the merging decision

The method of the invention has been presented in the context of the implementation of a single binary classifier, which makes it possible to determine whether segments belong to the same program. Other approaches are of course possible. They can be based on a general implementation of the perceptron, of which classifiers are a part. They can also be based on any other approach that makes it possible to obtain information relating to the membership of the segments in the same audiovisual program from the data of the previously calculated descriptors.

Other optional features and benefits

With reference to FIG. 6, an embodiment of a merging device according to the invention is presented.

Such a merging device comprises a memory 61 and a processing unit 62, equipped for example with a microprocessor and driven by the computer program 63, which implements the method according to the invention.

At initialization, the code instructions of the computer program 63 are, for example, loaded into a RAM memory before being executed by the processor of the processing unit 62. The processing unit 62 receives as input the audiovisual stream cut into several segments. The microprocessor of the processing unit 62 implements the steps of the merging method, according to the instructions of the computer program 63, to decide whether the different segments belong to the same program. For this purpose, the merging device comprises, in addition to the memory 61, for at least a first and at least a second segment of the plurality of segments, means for calculating a set of descriptors of different types and means for obtaining information representative of the membership of the segments in the same audiovisual program on the basis of data representative of said previously calculated descriptors. These means are controlled by the microprocessor of the processing unit 62.

Claims

1. A method for merging segments of an audiovisual stream previously cut into a plurality of program segments to be merged, characterized in that it comprises, for at least a first and at least a second segment of said plurality of segments, a step of calculating a set of descriptors; and in that it comprises a step of obtaining at least one piece of information representing a membership of said at least one first and at least one second segment in the same audiovisual program as a function of data representative of said previously calculated descriptors.

2. Method according to claim 1, characterized in that said set of descriptors comprises: a first subset of at least one descriptor specific to said at least one first segment; a second subset of at least one descriptor specific to said at least one second segment.

3. Method according to claim 1 or 2, characterized in that said set of descriptors comprises a subset of descriptors calculated using data belonging to said at least one first segment and said at least one second segment, called common descriptors.

4. Method according to claim 2 or 3, characterized in that it comprises at least one step of calculating a distance separating a descriptor of said first subset of specific descriptors and a corresponding descriptor of the same type from said second subset of specific descriptors, delivering a vector of at least one distance.

5. Method according to claim 1, characterized in that said descriptors are of different types, said types belonging to the group comprising: the ratio between the number of keyframes of a segment and the duration of this segment; a three-dimensional color histogram, in the RGB color space, of the average color over all keyframes of a segment; a three-dimensional color histogram, in the RGB color space, of the intersection of colors over all keyframes of a segment; the ratio between the number of faces detected in a segment and the duration of this segment; the average and standard deviation of the number of faces detected per keyframe of a segment; the maximum size of the faces detected over all the keyframes of a segment; the average and standard deviation of the size of the faces detected per keyframe of a segment; the number of groups of similar keyframes of a segment; the number of groups of similar keyframes containing keyframes belonging to said at least one first segment and to said at least one second segment; the average and standard deviation of the number of similar images in the groups of similar images.

6. Method according to claim 4, characterized in that said distances separating said descriptors belong to the group comprising: the absolute value of the difference; the Euclidean distance; the correlation distance according to the Pearson correlation coefficient; the chi-square distance; the intersection distance, which is the sum of the respective minimums between the respective values of two distributions; the Bhattacharyya distance.
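
By way of non-limiting illustration, the distances named above could be computed as sketched below. The text does not specify the exact formulas, so several choices are assumptions: the correlation distance is taken as one minus the Pearson coefficient, the chi-square distance uses the symmetric form with the sum of the two values as denominator, and the Bhattacharyya distance uses the form sqrt(1 - BC) on histograms assumed to be normalized to sum to one. All function names are hypothetical.

```python
import math
from typing import Sequence


def absolute_difference(a: float, b: float) -> float:
    """Absolute value of the difference between two scalar descriptors."""
    return abs(a - b)


def euclidean(p: Sequence[float], q: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def correlation_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """1 - Pearson correlation coefficient (assumed convention)."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((x - mean_p) * (y - mean_q) for x, y in zip(p, q))
    var_p = sum((x - mean_p) ** 2 for x in p)
    var_q = sum((y - mean_q) ** 2 for y in q)
    if var_p == 0 or var_q == 0:
        return 1.0  # assumption: treat an undefined correlation as zero
    return 1.0 - cov / math.sqrt(var_p * var_q)


def chi_square(p: Sequence[float], q: Sequence[float]) -> float:
    """Symmetric chi-square form (one of several variants in use)."""
    return sum((x - y) ** 2 / (x + y) for x, y in zip(p, q) if x + y > 0)


def intersection(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of the bin-wise minimums of two distributions."""
    return sum(min(x, y) for x, y in zip(p, q))


def bhattacharyya(p: Sequence[float], q: Sequence[float]) -> float:
    """sqrt(1 - BC), for histograms normalized to sum to 1 (assumption)."""
    bc = sum(math.sqrt(x * y) for x, y in zip(p, q))
    return math.sqrt(max(0.0, 1.0 - bc))
```

Note that the intersection, as defined above, grows with similarity, unlike the other measures; a classifier trained on the resulting vector can accommodate either orientation.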

7. Method according to claim 1, characterized in that it comprises, prior to the merging, a learning phase during which a classifier learns to differentiate different membership classes of audiovisual programs.

8. Method according to claim 4, characterized in that said obtaining step comprises: a step of transmitting said distance vector and/or said common descriptors to a previously trained classifier; a step of supervised classification of said at least one first and at least one second segment according to said distances of said distance vector and/or said common descriptors.

9. Device for merging segments of an audiovisual stream previously cut into a plurality of program segments to be merged, characterized in that it comprises, for at least a first and at least a second segment of said plurality of segments, means for calculating a set of descriptors; and in that it comprises means for obtaining at least one piece of information representing a membership of said at least one first and at least one second segment in the same audiovisual program as a function of data representative of said previously calculated descriptors.

10. Computer program characterized in that it comprises program code instructions for implementing the merging method according to claim 1, when this program is executed by a processor.