EP3289763A1

EP3289763A1 - Method for analysing a video sequence and equipment for implementing said method

Info

Publication number: EP3289763A1
Application number: EP16721199.4A
Authority: EP
Inventors: Pierre Larbier
Original assignee: Ateme SA
Current assignee: Ateme SA
Priority date: 2015-04-28
Filing date: 2016-04-20
Publication date: 2018-03-07
Also published as: WO2016174329A1; FR3035729B1; US10742987B2; FR3035729A1; US20180302627A1

Abstract

The invention relates to a method for analysing a set of images of a video sequence with a view to processing the sequence. The method comprises: determining, in the video sequence, a plurality of consecutive, disjoint sub-sequences, each consisting of one or more successive images, according to the type of processing to be carried out and according to the content of the video sequence; and analysing the images of each sub-sequence determined in the video sequence.

Description

METHOD FOR ANALYZING A VIDEO SEQUENCE AND EQUIPMENT FOR IMPLEMENTING SAID METHOD

The present invention relates to a method for analyzing video sequences and a device for implementing this method. It applies in particular to the analysis of a video sequence for processing (video coding, compression, denoising, etc.) to be performed on the sequence.

Video data are generally subjected to source coding, which compresses them in order to limit the resources required for their transmission and/or storage. Many coding standards, such as H.264/AVC, H.265/HEVC and MPEG-2, can be used for this purpose.

A video sequence comprising a set of images is considered. Many automatic processes applied to video sequences require prior analysis. This is the case, for example, of two-pass variable bit rate compression, in which a first pass, corresponding to a prior analysis phase, mainly serves to determine the complexity of the sequence before the sequence is actually encoded in a second pass, corresponding to a processing phase. The operations of the analysis phase are often as complex as those of the subsequent processing phase. As a result, the overall duration of processing with prior analysis is significantly greater than that of processing without prior analysis.

In the case of video encoding, the duration of the prior analysis phase thus plays a predominant role in determining the total time needed for encoding, in other words the speed at which the sequence is processed. It is therefore desirable to reduce the duration of the prior analysis phase as much as possible in order to reach processing speeds compatible with the requirements of processing video catalogs containing, for example, several thousand films.

Existing techniques for analyzing video sequences prior to processing generally consist of successively scanning the images that make up the video sequence to be processed. At each image, statistics are extracted, and at the end of the analysis phase, a summary containing synthetic information about the sequence is calculated. The treatment phase that follows takes as input the video sequence and the summary calculated previously.

In the case of two-pass video compression, the first-pass analysis may itself consist of a video encoding, which makes it possible to extract, for each image or subsequence of images, a relationship between the video quality (taking the compression distortion into account) and the bit rate obtained after compression. During the second pass, the encoder uses all these relationships to regulate the bit rate optimally.

In the case of a two-pass video denoiser, the prior analysis phase generally makes it possible to extract the characteristics of the noise in order to guide the denoising carried out during the second phase. One method of estimating the noise characteristics consists in removing the noise and then measuring statistics on the difference between the noisy signal and the signal obtained after removal, this difference characterizing the extracted noise. This operation generally has the complexity of a complete denoising and, as in the case of video compression, almost doubles the processing time compared with processing carried out without prior analysis.

Document US 2010/0027622 proposes reducing the spatial and/or temporal resolution of a video stream prior to a first encoding pass in order to reduce the computation time in the particular case of two-pass video encoding.

Reducing the spatial resolution involves decreasing the size of the analyzed images, linearly or by selecting a portion of each image. The statistics extracted in the first pass are then extrapolated to obtain an estimate of what would have been achieved if the entire images had been analyzed.

When the temporal resolution is changed, the image stream is thinned out in a regular or irregular manner and only the retained images are analyzed during the first pass. As before, the statistics of the images that are not analyzed are extrapolated.

Both methods can be combined, thus reducing the analysis time even further. Both are based on the idea that it is possible to extrapolate missing data from the analyzed data.

However, it is observed that a sampling operation combined with an extrapolation can yield low-quality statistics, which is why this method is not used in the vast majority of practical applications.

There is thus a need for an improved method for analyzing a video sequence that does not have the disadvantages set forth above.

There is also a need for an improved method for analyzing a video sequence that reduces the computation time of the analysis phase of the sequence in the context of multi-pass processing.

An object of the present invention is to provide an improved method of analysis of a video sequence as part of a multi-pass processing.

According to a first aspect, there is provided a method of analyzing a set of images of a video sequence for a processing to be performed on the sequence, the method comprising determining in the video sequence a plurality of consecutive, disjoint subsequences each consisting of one or more successive images, and analyzing the images of each subsequence determined in the video sequence, wherein the subsequences are determined according to the type of processing to be performed and according to the content of the video sequence.

It is found that conventional methods of analysis ignore several important aspects, which greatly impedes their implementation:

In the general case, images are not spatially homogeneous. Their center, for example, which is the preferred point of interest, does not have the same complexity as their edges. Analysis results, which may include statistical data extracted after spatial subsampling, often bear only a distant relationship to the results of an analysis performed on the original images. A typical example is noise, whose characteristics change when the size of the images is reduced.

Temporal subsampling also raises the problem of the homogeneity of the content, which makes the extrapolation of the statistics difficult. Indeed, all processing operations that rely on the temporal coherence of video sequences lose effectiveness, or even become inapplicable, when the sequences are temporally subsampled. This is the case, for example, of the motion estimation of a video compressor, which loses precision as the temporal distance between images increases.

Conventional methods can thus lead to inaccurate rate control, which is difficult to use without prior learning. This explains the limited use made of these methods in practical applications.

The proposed method has the advantage of favoring temporal sub-sampling, in order to avoid the pitfalls inherent in spatial subsampling mentioned above.

The proposed method also advantageously takes the content of the video sequence into account for its analysis, the inventors having identified the problem of content homogeneity mentioned above. The proposed method therefore facilitates the extrapolation of the statistics when temporal subsampling is implemented during a phase of analysis of a video sequence. The proposed method can thus, for example, take into account the type of content (film, sport, musical show, etc.) analyzed in the context of the first pass of a multi-pass processing.

In addition, unlike the conventional method of US 2010/0027622, which is an integral part of a multi-pass encoder and is therefore not of general use, the proposed method advantageously takes into account the type of processing (compression, denoising, etc.) to be performed on the video sequence during a processing phase that uses the analysis results generated during an analysis phase carried out according to the proposed method.

The proposed method therefore has the advantage that it can be adapted to the processing to be performed on a video sequence, since the analysis phase takes into account the type of processing (compression, filtering, etc.) subsequently carried out on the sequence.

The proposed method is particularly well suited, although not exclusively, to encoding or compressing a video sequence according to an H.264, H.265, H.262, MPEG-2, AVC, or HEVC scheme. It is also suitable for encoding images according to any two-pass video encoding scheme (an analysis pass and an encoding pass), or for any two-pass processing of a video sequence.

In a particular implementation of the proposed method, the respective sizes of the subsequences and the respective gaps between two neighboring subsequences are determined according to the type of processing to be performed and according to the content of the video sequence.

In a particular implementation of the proposed method, the subsequences may have an identical size, with the exception of the last subsequence of the plurality of consecutive subsequences.

Preferably, the size of the last sub-sequence will be chosen greater than or equal to the size of the other subsequences, whether the other subsequences have a single size or not.

In a particular implementation of the proposed method, the respective gaps between two neighboring subsequences may be chosen to be identical, with the exception of the gap between the last subsequence of the plurality of consecutive subsequences and the subsequence adjacent to it.

In a particular embodiment, the proposed method further comprises generating analysis results for the video sequence by extrapolating the analysis results of the subsequences of the video sequence.

In a particular implementation of the proposed method, at least one of the subsequences may contain only one image.

In a particular implementation of the proposed method, the subsequences are further determined as a function of the analysis speed or the accuracy of the analysis.

According to a second aspect, there is provided a device for analyzing a set of images of a video sequence for a processing to be performed on the sequence, comprising an input interface configured to receive the video sequence, and a sequence analysis unit, comprising a processor operatively coupled to a memory, configured to determine in the video sequence a plurality of consecutive, disjoint subsequences each consisting of one or more successive images, according to the type of processing to be performed and according to the content of the video sequence, and to analyze the images of each subsequence determined in the video sequence.

In another aspect, there is provided a computer program, loadable in a memory associated with a processor, and comprising portions of code for implementing the steps of the proposed method during the execution of said program by the processor, and a set of data representing, for example by compression or encoding, said computer program.

Another aspect relates to a non-transitory storage medium for a computer-executable program, comprising a data set representing one or more programs, said one or more programs including instructions which, when said one or more programs are executed by a computer comprising a processing unit operatively coupled to memory means and to an input/output interface module, cause the computer to analyze the images of a video sequence according to the proposed method.

Other features and advantages of the present invention will become apparent in the following description of nonlimiting exemplary embodiments, with reference to the appended drawings, in which:

FIG. 1 is a diagram illustrating the architecture of a video sequence analysis device according to an embodiment of the proposed method;

FIG. 2 is a diagram illustrating the architecture of a video sequence processing device according to an embodiment of the proposed method;

FIG. 3 is a diagram illustrating the proposed method according to one embodiment;

FIGS. 4a and 4b are diagrams illustrating subsampling of a video sequence according to an embodiment of the proposed method;

FIGS. 5a, 5b, 5c and 5d are diagrams illustrating subsampling of a video sequence according to an embodiment of the proposed method;

FIG. 6 shows an embodiment of a computer system for implementing the proposed method.

In the following detailed description of embodiments of the invention, many specific details are presented to provide a more complete understanding. Nevertheless, those skilled in the art may realize that embodiments can be practiced without these specific details. In other cases, well-known features are not described in detail to avoid unnecessarily complicating the description.

By "subsampling" is meant here any operation that extracts or selects subsequences within a video sequence, without limitation as to the particular method or to any particular subsampling parameter (period, recurrence, etc.), unless expressly stated otherwise. Subsampling is thus distinguished from decimation, which assumes regular extraction of subsequences (for example, extraction of one image every n images of the video sequence).

With reference to FIG. 1, the video sequence analysis device 100 receives on an input interface 102 an input video sequence 101 to be analyzed as part of a multi-pass processing. The analysis device 100 comprises a controller 103, operatively coupled to the input interface 102, which controls a subsampling unit 104 and an analysis unit 105. The data received on the input interface 102 are supplied as input to the subsampling unit 104.

The subsampling unit 104 subsamples the video sequence according to the proposed method by determining in the video sequence a plurality of consecutive, disjoint subsequences each consisting of one or more successive images. After subsampling, the unit 104 generates data representing the plurality of consecutive, disjoint subsequences of images thus determined. These data are processed by the controller 103, which supplies the images of the subsequences selected by the subsampling unit 104 as input to the analysis unit 105. After analyzing the images of the subsequences received as input, the analysis unit 105 generates statistical data 107 relating to the input video sequence 101, which the controller 103 provides on an output interface 106 of the analysis device 100.

In one or more embodiments of the proposed method, the generation of the statistical data 107 may include an extrapolation of the statistical data extracted from the analysis of the images of the subsequences received as input, in order to obtain analysis results for all the images of the video sequence and not only for those that have actually been analyzed. By "extrapolation" is meant here any operation that generates statistical data for the images that have not been analyzed (i.e., the images that were not selected during the subsampling of the sequence), using in particular the statistical data extracted for the analyzed images (i.e., the images of the plurality of subsequences of the video sequence). The analysis device 100 can thus output a set of statistics that does not reveal the division of the initial video sequence into subsequences. Subsequent processing can then be carried out using the results of the analysis phase generated by the analysis device 100 without needing to know how the analysis device 100 divided the video sequence.

The controller 103 is configured to drive the subsampling unit 104 and the analysis unit 105, and in particular the inputs / outputs of these units. The architecture of the analysis device 100 illustrated in FIG. 1 is however not limiting. For example, the input interface 102 of the analysis device 100 could be operably coupled to an input interface of the subsampling unit 104. Similarly, the subsampling unit 104 could include an output operatively coupled to an input of the analysis unit 105, and the analysis unit 105 could include an output operatively coupled to the output interface 106.

The analysis device 100 can be a computer, a computer network, an electronic component, or another apparatus comprising a processor operatively coupled to a memory and, depending on the embodiment selected, a data storage unit and other associated hardware elements such as a network interface and a media reader for reading from and writing to a non-transitory removable storage medium (not shown in the figure). The removable storage medium may be, for example, a compact disc (CD), a digital video/versatile disc (DVD), a flash disk, a USB key, etc.

Depending on the embodiment, the memory, the data storage unit, or the removable storage medium contains instructions that, when executed by the controller 103, cause the controller 103 to perform or control the input interface 102, subsampling 104, analysis 105 and/or output interface 106 portions of the exemplary embodiments of the proposed method described herein. The controller 103 may be a component implementing a processor or a computing unit for analyzing images according to the proposed method and for controlling the units 102, 104, 105 and 106 of the analysis device 100.

The analysis device 100 can thus be implemented in the form of software which, when loaded into a memory and executed by a processor, implements the analysis of a video sequence according to the proposed method.

In addition, the analysis device 100 can be implemented in software form, as described above, or in hardware form, as an application specific integrated circuit (ASIC), or in the form of a combination of hardware and software elements, such as, for example, a software program intended to be loaded and executed on an FPGA (Field Programmable Gate Array) type component.

FIG. 2 is a diagram illustrating a video sequence processing device. With reference to FIG. 2, the video sequence processing device 200 receives on an input interface 202 an input video sequence 201 to be processed as part of a multi-pass processing. The processing device 200 comprises a controller 203, operatively coupled to the input interface 202, which controls a processing unit 204 and an analysis unit 205. The data received on the input interface 202 are supplied as input to the analysis unit 205. The output data 207 of the processing device 200 are generated on an output interface 206.

The assembly formed by the controller 203, the analysis unit 205 and the input and output interfaces 202/206 forms an analysis unit that can correspond to the analysis device 100 described with reference to FIG. 1, and is configured to implement the proposed method.

After subsampling the video sequence according to the proposed method, the analysis unit 205 generates data representing the plurality of subsequences of images determined by the subsampling, which are processed by the controller 203, or by a controller of the analysis unit, in order to analyze the images of the subsequences. In one or more embodiments of the proposed method, the results of analyzing the images of the subsequences obtained by subsampling may be extrapolated to generate analysis results for all the images of the video sequence.

The controller 203 is configured to drive the analysis unit 205 and the processing unit 204, and in particular the inputs/outputs of these units. In one or more embodiments, the analysis results produced by the analysis unit 205 are provided, under the supervision of the controller 203, as input to the processing unit 204 for processing the input video sequence 201 as part of a multi-pass processing, the analysis performed by the analysis unit 205 corresponding to a first pass and the processing performed by the processing unit 204 corresponding to a second pass.

The architecture of the processing device 200 illustrated in FIG. 2 is however not limiting. For example, the input interface 202 of the processing device 200 could be operatively coupled to an input interface of the analysis unit 205 and to an input interface of the processing unit 204. Similarly, the analysis unit 205 could include an output operatively coupled to an input of the processing unit 204, and the processing unit 204 could include an output operatively coupled to the output interface 206 to produce data corresponding to the processed video sequence 201.

In addition, the processing device 200 may be a multi-pass video encoder, a video denoising device, or any other multi-pass video processing device in which at least one pass comprises an analysis of an input video sequence prior to its processing.

The processing device 200 may be a computer, a computer network, an electronic component, or another apparatus comprising a processor operatively coupled to a memory and, depending on the embodiment chosen, a data storage unit and other associated hardware elements such as a network interface and a media reader for reading from and writing to a removable storage medium (not shown in the figure). Depending on the embodiment, the memory, the data storage unit, or the removable storage medium contains instructions that, when executed by the controller 203, cause the controller 203 to perform or control the input interface 202, analysis 205, processing 204 and/or output interface 206 portions of the exemplary embodiments of the proposed method described herein. The controller 203 may be a component implementing a processor or a computing unit for image processing comprising an analysis according to the proposed method and for controlling the units 202, 204, 205 and 206 of the processing device 200.

In addition, the processing device 200 may be implemented in software form, as described above, or in hardware form, such as an application specific integrated circuit (ASIC), or in the form of a combination of hardware and software, such as a software program intended to be loaded and executed on a FPGA (Field Programmable Gate Array) type component.

Figure 3 shows a diagram illustrating the proposed method according to one embodiment.

One or more parameters relating to the type of processing to be performed on the input video sequence are input (301) so as to be taken into account during the subsampling phase. Similarly, one or more parameters relating to the content of the video sequence to be analyzed are input (302) so as to be taken into account during the subsampling phase. The video sequence to be analyzed is input (303) for analysis. This video sequence is then subsampled (304) to determine a plurality of consecutive, disjoint subsequences each consisting of one or more successive images, according to the type of processing to be performed and the content of the video sequence, on the basis of the parameter(s) relating to the type of processing and the parameter(s) relating to the content of the video sequence. The images of each subsequence thus determined are then analyzed (305) according to a predetermined analysis method corresponding to the processing to be performed on the sequence, to provide subsequence analysis results. In one or more embodiments, the analysis of the images of the subsequences can be followed by the extrapolation (306) of the results of this analysis to generate analysis results for all the images of the input video sequence.

Figures 4a and 4b illustrate subsampling of a video sequence according to an embodiment of the proposed method.

FIG. 4a schematically represents a video sequence (400) comprising a set of N images distributed over a duration D_s between the first image (401) of the sequence and the last image (402) of the sequence.

Figure 4b shows the video sequence (400) after subsampling. Subsampling the sequence (400) has determined consecutive, disjoint subsequences (403a, 403b, 403c, 403d) of the sequence, sometimes referred to in the present application as "chunks", "subsets" or "packets".

The subsequences (403a, 403b, 403c, 403d) determined are disjoint, in that two neighboring subsequences are respectively separated by "holes" (404a, 404b, 404c), each hole containing at least one image of the video sequence (400). These holes (404a, 404b, 404c) correspond to the groups of images of the initial sequence (400) which, according to the proposed method, will not be analyzed.

The subsequences (403a, 403b, 403c, 403d) determined are consecutive, in that they result from the sampling of a video sequence corresponding to a duration. The sampling of the video sequence may be performed following the temporal order of the sequence, to determine a series of subsequences in which a first subsequence corresponds to the beginning of the video sequence and a last subsequence corresponds to the end of the video sequence.

The subsequences determined by the subsampling are not necessarily equal in size, in that they do not, for example, all contain the same number of images. In the example illustrated by FIG. 4b, the subsequences 403a, 403b and 403c are of equal size, this size being less than or equal to that of the last subsequence (403d) of the sequence (400).

Thus, in one or more embodiments, the method performs a temporal division of the video sequence to be analyzed into "chunks". Chunks are subsequences of consecutive images that do not necessarily contain the same number of images. Images that are not in chunks are not analyzed.

The respective sizes of the video sequence and of the subsequences can be expressed as a number of images or as a duration, the two measures being linked by the number of images per second of the video sequence considered. The same applies to the gap between two neighboring subsequences, which can, depending on the implementation, be expressed as a number of images or as a duration.
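As an illustration of the link between the two measures, the following Python sketch converts between a number of images and a duration. The function names and the frame rate of 25 images per second used in the example are assumptions made for illustration; they are not taken from the described method.

```python
def frames_to_seconds(num_frames: int, fps: float) -> float:
    """Duration, in seconds, of `num_frames` images at `fps` images per second."""
    return num_frames / fps


def seconds_to_frames(duration_s: float, fps: float) -> int:
    """Number of images (rounded to the nearest integer) spanning `duration_s` seconds."""
    return round(duration_s * fps)


# Example with an assumed frame rate of 25 images per second.
chunk_duration = frames_to_seconds(3000, 25.0)   # a 3000-image chunk
gap_in_frames = seconds_to_frames(10.0, 25.0)    # a 10-second gap
```

With these assumed values, a 3000-image chunk lasts 120 seconds and a 10-second gap spans 250 images.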

Subsampling can be done on the basis of different parameters, depending on the implementation.

In one or more embodiments, the subsequences resulting from the subsampling can be determined as a function of a subsequence size, for example expressed as a number of images, which is identical for all subsequences, with the possible exception of one subsequence (preferably the last one), and of a subsampling frequency or a subsampling period defining the gap between two neighboring subsequences.

Alternatively, the subsequences can be determined according to a subsequence size and a subsampling rate. For example, subsequences can be determined by setting a single size (possibly with the exception of the last subsequence) equal to one image, and a subsampling rate of 1/6. The subsequences are then determined by selecting one image out of every 6 images of the video sequence to be processed for analysis. For a video sequence lasting one hour, the analysis is thus reduced to the equivalent of 10 minutes of video. Analysis results for the unanalyzed portions (i.e., those not belonging to any subsequence) may be inferred from the subsequence analysis results, for example by an extrapolation method.
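The regular case described above can be sketched in Python as follows. This is only a minimal illustration: the function name `regular_chunks`, the representation of a subsequence as a half-open `(start, end)` range of image indices, and the frame rate of 25 images per second are assumptions introduced for the example.

```python
def regular_chunks(num_frames: int, chunk_size: int, period: int):
    """Return consecutive, disjoint chunks as half-open (start, end) index ranges.

    One chunk of `chunk_size` images is taken at the start of every `period`
    images. Requiring chunk_size < period guarantees that each gap between
    two neighboring chunks contains at least one image.
    """
    if not 0 < chunk_size < period:
        raise ValueError("chunk_size must satisfy 0 < chunk_size < period")
    chunks = []
    for start in range(0, num_frames, period):
        end = min(start + chunk_size, num_frames)  # clip the last chunk
        chunks.append((start, end))
    return chunks


# One hour of video at an assumed 25 images per second, subsampling rate 1/6.
FPS = 25
total_frames = 60 * 60 * FPS                      # 90000 images
chunks = regular_chunks(total_frames, chunk_size=1, period=6)
analyzed_frames = sum(end - start for start, end in chunks)
analyzed_minutes = analyzed_frames / FPS / 60
```

Under these assumptions, 15000 of the 90000 images are selected, which corresponds to 10 minutes of video.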

In one or more embodiments, the subsequences resulting from the subsampling can be determined as a function of a subsequence size, for example expressed as a number of images, which is identical for all subsequences, with the possible exception of one subsequence, of a number of subsequences, and of the size of the sequence.

Depending on the implementation, the proposed method may determine subsampling parameters taking into account the type of processing to be performed and the content of the video sequence to be processed.

In one or more embodiments, the size of the last subsequence, corresponding to the end of the video sequence to be processed, may be chosen greater than or equal to those of the other subsequences. Thus, depending on the size of the sequence, the proposed method could merge the last two subsequences in the case where the size of the last subsequence would otherwise be less than a predetermined threshold, the size of the other subsequences if it is unique, or the size of at least one other subsequence.
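The merging of the last two subsequences can be sketched as follows. The function name `merge_short_tail` and the threshold parameter `min_last_size` are hypothetical, and the sketch assumes that the merged subsequence also covers the images of the gap that separated the two subsequences, which the text does not state explicitly.

```python
def merge_short_tail(chunks, min_last_size: int):
    """Merge the last two chunks when the last one is shorter than `min_last_size`.

    `chunks` is a list of half-open (start, end) index ranges in temporal order.
    Assumption: the merged chunk spans from the start of the second-to-last
    chunk to the end of the last chunk, gap included.
    """
    if len(chunks) < 2:
        return list(chunks)
    last_start, last_end = chunks[-1]
    if last_end - last_start >= min_last_size:
        return list(chunks)                     # last chunk is long enough
    prev_start, _ = chunks[-2]
    return list(chunks[:-2]) + [(prev_start, last_end)]


# The last chunk holds only 4 images, fewer than the 10 images of the others.
merged = merge_short_tail([(0, 10), (30, 40), (60, 64)], min_last_size=10)
```

Here the threshold is set to the common size of the other subsequences, one of the options mentioned above; after merging, the last subsequence covers images 30 to 63.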

In the case of regular subsampling of the sequence, the position of a chunk in a subsampling period may vary. Figure 5a illustrates the case of a subsampling period (501) containing a chunk (500) positioned at the beginning of the period.

Figure 5b illustrates the case of a subsampling period (503) containing a chunk (502) positioned at the end of the period.

As explained above, the size of the last chunk of the sequence can be chosen greater than or equal to that of the other chunks. As illustrated in Figure 5c, in the case of chunks positioned at the beginning of the sampling period, the last chunk (504) can be extended to cover the entire period (505), so as to ensure a more accurate analysis of the end of the video sequence.

The same applies to chunks positioned at the end of the period, or at any other position in the period, as illustrated in Figure 5d. The last chunk (506) can be extended to cover the entire period (507), so as to ensure a more accurate analysis of the end of the video sequence.
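The positioning of a chunk within its period and the extension of the last chunk can be sketched as follows. The function name `chunks_with_offset` and its parameters are hypothetical; `offset` denotes the assumed position of the chunk inside the period (0 for the beginning, `period - chunk_size` for the end), and the extended last chunk is taken to run from the start of its period to the end of the sequence.

```python
def chunks_with_offset(num_frames, chunk_size, period, offset=0, extend_last=True):
    """Regular chunks placed `offset` images after the start of each period.

    Chunks are half-open (start, end) index ranges. When `extend_last` is True,
    the last chunk is widened to start at the beginning of its period and to
    run to the end of the sequence.
    """
    if not 0 < chunk_size < period:
        raise ValueError("chunk_size must satisfy 0 < chunk_size < period")
    if not 0 <= offset <= period - chunk_size:
        raise ValueError("the chunk must fit inside the period")
    chunks = []
    for period_start in range(0, num_frames, period):
        start = period_start + offset
        if start >= num_frames:
            break                                # no room left for a chunk
        chunks.append((start, min(start + chunk_size, num_frames)))
    if extend_last and chunks:
        last_period_start = chunks[-1][0] - offset
        chunks[-1] = (last_period_start, num_frames)
    return chunks


at_start = chunks_with_offset(24, chunk_size=2, period=6, offset=0)
at_end = chunks_with_offset(24, chunk_size=2, period=6, offset=4)
```

Note that, in this sketch, when the chunks sit at the very end of their period, the extended last chunk begins exactly where the previous chunk ends.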

The distribution of the chunks as well as their size expressed in number of images makes it possible to deduce a speed gain of the analysis method. Indeed, if the computational load is proportional to the number of images analyzed (which is generally the case in most applications), the proportion of images constituting the chunks can directly provide an estimate of this gain.
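The estimate mentioned above can be written down directly. In the sketch below, the function name `analysis_speed_gain` and the chunk layout are illustrative assumptions, and the computation relies on the stated hypothesis that the computational load is proportional to the number of images analyzed.

```python
def analysis_speed_gain(chunks, num_frames: int):
    """Return (fraction of images analyzed, estimated speed gain).

    `chunks` is a list of half-open (start, end) index ranges. The gain is
    the ratio of the total number of images to the number of images analyzed,
    assuming a load proportional to the number of images analyzed.
    """
    analyzed = sum(end - start for start, end in chunks)
    if analyzed == 0:
        raise ValueError("at least one image must be analyzed")
    return analyzed / num_frames, num_frames / analyzed


# Two 1000-image chunks in an 8000-image sequence.
fraction, gain = analysis_speed_gain([(0, 1000), (4000, 5000)], num_frames=8000)
```

In this example a quarter of the images are analyzed, so the analysis is estimated to run about four times faster than a full analysis.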

In one or more embodiments, the size of the chunks and their distribution depend on the application (that is, on the processing performed on the sequence). If, depending on the application, the analysis method uses temporal information such as the movement of objects in the scene, the chunks will preferably consist of a significant number of consecutive images. For example, a chunk can be considered large when it includes several thousand consecutive images. For example, as part of an analysis for a video compression application, some chunks might have about 3000 images. Conversely, if the analysis processes the images independently, as for example in a brightness measurement, the chunks may be reduced in size, and for example be reduced to a single image.

In one or more embodiments, the analysis results, for example the statistical data generated by the analysis method, are extrapolated for the subsequences of images that are not analyzed. It is preferable in this case that the skipped images be of the same nature as the analyzed images from the point of view of the statistics returned. Thanks to this extrapolation, the analysis method can provide the next processing stage with a complete set of statistics, that is to say one that includes statistics corresponding to the images that have not been analyzed. Several extrapolation methods are possible, and it is advisable to choose those suited to the type of statistics to be measured. For example, statistics can be extrapolated linearly: their values are then assumed to lie on a straight line connecting the value at the last image of a chunk to the value at the first image of the next chunk.
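The linear case can be sketched as follows. The function name `fill_missing_linear` and the representation of the statistics as a dictionary mapping image indices to values are assumptions made for the example. Holding the nearest known value before the first and after the last analyzed image is also an assumption, since the text only describes what happens between two chunks.

```python
def fill_missing_linear(stats: dict, num_frames: int):
    """Return one value per image, filling unanalyzed images linearly.

    `stats` maps the index of each analyzed image to its measured value.
    Between the last image of a chunk and the first image of the next chunk,
    values are placed on the straight line joining the two measurements.
    """
    known = sorted(stats)
    if not known:
        raise ValueError("at least one analyzed image is required")
    filled = [0.0] * num_frames
    # Assumption: hold the nearest measurement outside the analyzed range.
    for i in range(0, known[0]):
        filled[i] = float(stats[known[0]])
    for i in range(known[-1], num_frames):
        filled[i] = float(stats[known[-1]])
    # Linear fill between each pair of neighboring analyzed images.
    for left, right in zip(known, known[1:]):
        v0, v1 = stats[left], stats[right]
        for i in range(left, right):
            t = (i - left) / (right - left)
            filled[i] = v0 + t * (v1 - v0)
    return filled


# Two chunks of two images each (indices 0-1 and 5-6) in an 8-image sequence.
measured = {0: 10.0, 1: 12.0, 5: 20.0, 6: 22.0}
complete = fill_missing_linear(measured, num_frames=8)
```

The values for images 2 to 4 lie on the line joining the measurement at image 1 to the measurement at image 5, while the measured values themselves are left unchanged.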

As indicated above, the size of the last chunk will preferably be chosen greater than or equal to that of the other chunks, and for example large enough to correct the analysis errors of the input video sequence. The size of this final analysis window can indeed influence the quality of analysis of all the content, and it is preferable to choose a size sufficient to compensate for the analysis errors at the beginning of the sequence that are attributable to the subsampling and to the size chosen for the analysis windows at the start of the sequence.

The sub-sampling of a video sequence, taking into account the type of processing to be performed and the content of the video sequence, is illustrated by the following two examples: the filtering of a video sequence in the context of video compression, and video compression in two passes.

Noise filtering to denoise video in the context of video compression

One of the important aspects of assessing the video quality of a compressed video sequence is its visual homogeneity. The video quality will be higher when the defects inherent in the compression are uniform over the compressed video sequence. When noise is to be removed before performing the compression task, it should preferably be removed homogeneously throughout the video sequence being denoised.

A denoising method consists in analyzing the whole video sequence beforehand so as to identify noise characteristics (for example statistical noise characteristics), and then, in a second step, in filtering the sequence using the characteristics collected in order to reduce the noise. This denoising method thus comprises a first phase of noise analysis, followed by a denoising treatment, for example by noise filtering, which uses the results of the preliminary analysis phase. Depending on the type of filtering performed, different characteristics can be acquired during the preliminary analysis phase, such as the noise energy and its spectral amplitude when the noise is considered to be additive white Gaussian noise.

In one or more embodiments of the proposed method, the content type of the video sequence to be denoised is taken into account in order to determine in the video sequence a plurality of consecutive disjoint subsequences of one or more successive images, during the analysis phase prior to denoising.

For example, in the case where the content of the video sequence to be processed is a cinema film, the noise to be removed will typically be film grain (originating from the film stock, or artificially added in the case of digital cinema). The characteristics of this noise can be considered homogeneous throughout the film. On the other hand, it is not added in a linear way: it can, for example, be stronger in dark scenes than in bright scenes. In this case, nonlinear parameters can be calculated. They can allow a complete identification of the noise, using for example the technique described in the US patent application US 2004/005365. For example, non-linear parameters such as those carried in the H.264 and HEVC video compression standards may be calculated. These can indeed be used to model multiplicative noise with coefficients depending on the light intensity.

In this particular non-limiting case of a video sequence whose content is a cinema film, it is possible to carry out, for the preliminary analysis of the sequence, measurements of statistical characteristics of the noise, preferably at regular intervals, using sub-sequences, or "chunks", of a few images. Subsampling of the video sequence is performed by determining in the video sequence a plurality of consecutive disjoint subsequences, the subsequences comprising one or more successive images. These subsequences are determined according to the type of processing to be performed (in this example, noise filtering) and according to the content of the video sequence (in this example, a cinema film). In particular, it is possible, according to the proposed method, to use sub-sequences reduced in size to a few images because, for the example of a cinema film, the characteristics of the noise can be considered homogeneous over the whole video sequence. In a particular example of implementation, the use of "chunks" of 4 consecutive images makes it possible to acquire precise characteristics of the noise of cinema films while avoiding very abrupt local variations in brightness, such as flashes.

Similarly, one can choose for this particular case a constant gap between two adjacent subsequences, as well as the duration of this gap. The average duration of scenes in cinema films is in fact of the order of two seconds. Consequently, performing a measurement every second, for example, makes it possible to acquire information on all the scenes of the film and thus to obtain an accurate profile of the characteristics of the noise over the entire video sequence.

As discussed above, the size (expressed in number of images or in units of time) of the last subsequence, as well as the gap between this last subsequence and the previous subsequence, may be chosen differently from those of the other subsequences determined by subsampling.

For example, it is possible to choose chunks of size $N_{c,1}$, possibly with the exception of the last chunk of the sequence, which can be chosen of size greater than or equal to $N_{c,2}$ in order to ensure an overall quality of analysis as discussed above. Sampling can be done every second, which makes it possible to calculate the total number of chunks from the total duration $D_s$ of the video sequence.

In one or more embodiments, an extrapolation of the chunk analysis results is performed. Different methods of extrapolating the acquired statistics can be used as needed. For example, one can consider that the statistics of the unanalyzed images are identical to those of the images at the ends of the adjacent chunks. This method is particularly valid when chunks are composed of few images, as is the case in the embodiment described above where they are limited to 4 images.
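A minimal sketch of this constant extrapolation is given below. The text does not say which adjacent chunk end supplies the value for a given skipped image, so the example assumes the nearest analyzed image is used (ties go to the earlier one); the function name `fill_nearest` and the input layout are likewise hypothetical.

```python
import bisect

def fill_nearest(stats, num_images):
    """Copy to each unanalyzed image the statistic of the nearest analyzed image.

    stats maps an analyzed image index to its measured statistic.
    Assumption: "nearest" is measured in image count, ties go to the earlier image.
    """
    known = sorted(stats)
    filled = []
    for i in range(num_images):
        pos = bisect.bisect_left(known, i)
        if pos == 0:
            nearest = known[0]
        elif pos == len(known):
            nearest = known[-1]
        else:
            before, after = known[pos - 1], known[pos]
            nearest = before if i - before <= after - i else after
        filled.append(stats[nearest])
    return filled
```

Because each skipped image simply inherits a measured value, this approach suits statistics that vary slowly relative to the gap between chunks, which is the situation described for film grain.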

In an exemplary implementation, for a video sequence at 24 Hz (24 frames per second) corresponding to a film, the analysis method may be configured to perform measurements on 4 consecutive images every 24 images, which represent one second of film (the chunks will therefore all have a size of 4 images, possibly with the exception of the last chunk). The analysis is thus carried out 6 times faster than if all the images of the sequence were analyzed. In this embodiment, the division of the sequence is adapted to the content as well as to the type of processing performed later. Indeed, the period of the chunks and their durations proposed above are well suited to film-type video sequences.
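The chunk layout of this example, and the resulting speed gain, can be illustrated with the short sketch below. It assumes the computational load is proportional to the number of images analyzed, and for simplicity it ignores the possibly larger last chunk; the function names `chunk_ranges` and `speed_gain` are hypothetical.

```python
def chunk_ranges(num_images, chunk_size=4, period=24):
    """Return half-open (start, end) image ranges, one chunk per period.

    Simplification: every chunk has the same size, clipped at the end of
    the sequence; the optional larger last chunk is not modelled here.
    """
    return [(start, min(start + chunk_size, num_images))
            for start in range(0, num_images, period)]

def speed_gain(num_images, ranges):
    """Estimate the gain as total images divided by analyzed images."""
    analyzed = sum(end - start for start, end in ranges)
    return num_images / analyzed

# One minute of film at 24 images per second.
ranges = chunk_ranges(24 * 60)
gain = speed_gain(24 * 60, ranges)
```

With these parameters, 4 images out of every 24 are analyzed, which gives the factor of 6 stated above.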

In the case of other types of content, such as televised sporting events, a longer period between chunks will be chosen because the editing of this type of video sequence is much slower. As a result, the invention makes it possible to optimize the analysis speed according to the type of content, which is an important advantage in the case of batch processing.

Two-pass video compression

Two-pass video compression usually aims to ensure consistent video quality over the entire compressed video sequence, while maintaining a specified total size. The first pass is an analysis phase, whose purpose is for example to determine the complexity of the sequence. The second pass is the actual compression phase. The latter uses the results of the analysis phase (in the above example, the complexities obtained) to optimize the bit rate locally and to maintain both a constant quality and the specified total size.

In one or more embodiments, the analysis phase may comprise determining, for each image, a relationship between a compression parameter (a quantizer) and a corresponding bit rate. This relationship is generally obtained by compressing the video sequence, so that the analysis phase proves to be almost as expensive as the processing phase (compression) itself.

Several relationships between a compression parameter (a quantizer) and a corresponding bit rate have been proposed. One can for example consider functions of the type: $Q_p = a \cdot \log(\text{Flow}) + b$, where $Q_p$ is a mean quantizer of the images of the video sequence to be compressed, Flow is the corresponding bit rate, and $a$ and $b$ are two parameters to be identified depending on the images of the video sequence to be encoded. Alternatively, one can also consider a relationship between a quantizer and a corresponding bit rate constructed by logarithmic extrapolation from measured points $(Q_p, \text{Flow})$.
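As an illustration of how the parameters a and b of such a relationship might be identified from measured points, the sketch below fits them by ordinary least squares on the logarithm of the bit rate. The fitting technique, the use of the natural logarithm, and the names `fit_qp_model` and `predict_qp` are assumptions made for this example; the text does not prescribe how a and b are obtained.

```python
import math

def fit_qp_model(points):
    """Fit Qp = a * log(Flow) + b to measured (qp, flow) pairs.

    Assumptions: natural logarithm, ordinary least squares, flow > 0.
    At least two distinct flow values are required.
    """
    xs = [math.log(flow) for _, flow in points]
    ys = [qp for qp, _ in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct flow values")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def predict_qp(a, b, flow):
    """Evaluate the fitted relationship for a given bit rate."""
    return a * math.log(flow) + b
```

Once a and b are known for an image, `predict_qp` gives the quantizer expected to produce a target bit rate, which is the kind of information a second pass can use for rate allocation.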

In one or more embodiments, the statistical summary produced at the end of the analysis phase consists, for each image, of a relationship between quantizer and bit rate ($Q_p$, Flow). According to the proposed method, a plurality of consecutive disjoint sub-sequences of one or more successive images is determined in the video sequence to be compressed. The images of each subsequence are analyzed to produce, for each image of each sub-sequence analyzed, a relationship between quantizer and bit rate.

In one or more embodiments, statistics for the unanalyzed portions may be generated by extrapolation using the subsequence analysis results. In the above example, several methods of generating analysis results for the parts of the sequence that have not been analyzed are possible. For example, analysis results can be generated by linearly interpolating the relationships between quantizer and bit rate ($Q_p$, Flow) obtained by analyzing the images of the sub-sequences. This amounts to linearly varying the parameters $a$ and $b$ of the relation $Q_p = a \cdot \log(\text{Flow}) + b$ described above.
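A minimal sketch of this linear variation of a and b for a skipped image is shown below. The function name `interpolate_params` and the convention of passing the parameters of the two bounding analyzed images as (a, b) tuples are assumptions made for the example.

```python
def interpolate_params(left_index, left_params, right_index, right_params, index):
    """Linearly interpolate (a, b) for an unanalyzed image.

    left_index / right_index: indices of the analyzed images bounding the gap
    (last image of one chunk, first image of the next chunk).
    left_params / right_params: (a, b) tuples measured at those images.
    index: index of the skipped image, with left_index <= index <= right_index.
    """
    t = (index - left_index) / (right_index - left_index)
    a = (1 - t) * left_params[0] + t * right_params[0]
    b = (1 - t) * left_params[1] + t * right_params[1]
    return a, b
```

Halfway between two chunks, for example, the interpolated parameters are the averages of the parameters measured at the two chunk boundaries.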

The proposed method makes it possible to considerably reduce the complexity of this analysis phase because, during the analysis phase, subsampling is carried out which takes into account the type of treatment to be performed (in this example a compression) and the contents of the video sequence. In one or more embodiments, subsampling parameters are selected taking into account the type of processing to be performed and the content of the video sequence to be processed.

For example, in the case where the video sequence to be compressed is a cinema film (for example in the context of a video-on-demand application), the relationship between the compression parameter(s) and the bit rate obtained varies little. In other words, films have substantially homogeneous characteristics, at least locally. Therefore, it is not necessary to perform the analysis on the entire film, because correctly distributed sub-sequences of a few seconds are enough to provide the information needed for the second pass.

The video sequence containing a film may for example be cut into subsequences of duration equal to ten seconds, the analysis being performed only once per minute of content, which determines the gap between two neighboring subsequences. Analysis results (for example complexity) for areas that are not analyzed can be obtained by extrapolation of the sub-sequence analysis results, which introduces an inaccuracy that can be compensated during the processing phase. In one or more embodiments, to facilitate this compensation and to obtain the required size, the very end of the sequence is completely analyzed, by choosing a size for the last subsequence larger than that of the other subsequences. For example, the last subsequence can be chosen to correspond to the last minutes of the sequence, in the case where its content is a film. This last subsequence can in particular be composed of the last two minutes of the sequence, i.e. about 3000 images.

In the case of a two-hour film, the choice of the sub-sampling parameters proposed above to take into account the content of the sequence as part of a compression process (sub-sequence sizes and gaps between two neighboring sub-sequences) leads to analyzing only about 22 minutes of the film, that is, only 18% of the images. The gain in terms of calculation time is therefore a factor of about 5.5.
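The figures above can be checked with a short calculation. The sketch below assumes one 10-second chunk at the start of each minute of content, except for the last two minutes, which are analyzed in full; the function name `analyzed_fraction` and its parameter names are hypothetical.

```python
import math

def analyzed_fraction(total_s, chunk_s=10.0, period_s=60.0, last_chunk_s=120.0):
    """Return the fraction of the sequence duration that is analyzed.

    Assumptions: one chunk of chunk_s seconds starts every period_s seconds
    over the part of the sequence preceding the last chunk, and the last
    last_chunk_s seconds are analyzed completely.
    """
    regular_span = max(0.0, total_s - last_chunk_s)
    num_regular = math.ceil(regular_span / period_s)
    analyzed = min(total_s, num_regular * chunk_s + last_chunk_s)
    return analyzed / total_s

# Two-hour film: 118 chunks of 10 s plus the last 120 s, i.e. 1300 s analyzed.
fraction = analyzed_fraction(2 * 3600)
minutes_analyzed = fraction * 2 * 60
gain = 1 / fraction
```

For a two-hour film this gives 1300 seconds analyzed (about 21.7 minutes, or roughly 18% of the content), consistent with a gain of about 5.5 if the computational load is proportional to the duration analyzed.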

Studies on hundreds of films have shown that this method of cutting yields a quality indistinguishable from that obtained with a complete analysis.

In one or more embodiments, an objective of analysis speed, an objective of analysis accuracy, or a criterion representing a compromise between the speed and the accuracy of the analysis is also taken into account. It is indeed possible to modify the distribution of the zones analyzed in order to further accelerate the analysis phase or to improve the accuracy of the analysis. For example, a compromise is made between the speed of the analysis and its accuracy, and the distribution of the subsequences determined according to the type of processing to be performed and according to the content of the video sequence to be processed is modified depending on whether the speed of the analysis or its quality is to be favored.

When the analysis speed is preferred, the respective durations of the subsequences determined according to the type of processing to be performed and according to the content of the video sequence to be processed can be reduced, for example by a predetermined factor corresponding to the gain in analysis speed sought. In a particular embodiment, this can be achieved by halving the duration of the sub-sequences analyzed, for example by reducing the duration of the subsequences from 10 s to 5 s.

Conversely, in the case where the accuracy of the analysis is preferred, which may be necessary when the content is very inhomogeneous (as is the case, for example, of television image sequences), the respective durations of the sub-sequences determined according to the type of processing to be performed and according to the content of the video sequence to be processed can be increased, for example by a predetermined factor corresponding to the gain in analysis accuracy sought. In a particular embodiment, this can be obtained by increasing the duration of the sub-sequences analyzed or by increasing the duration of the final sub-sequence.

Embodiments of the method of analyzing a video sequence may be, at least in part, implemented on virtually any type of computer, regardless of the platform used. For example, as shown in Fig. 6, a computer system (600), which may correspond to the video sequence analysis and video sequence processing units shown in Figs. 1 and 2, or be operably coupled to these elements, comprises a data processing unit (601), which comprises one or more processors (602), such as a central processing unit (CPU) or another hardware processor, an associated memory (603) (for example, a random access memory (RAM), a cache memory, a flash memory, etc.), a storage device (604) (for example a hard disk, an optical disk such as a CD or a DVD, a flash memory key, etc.), and many other elements and features typical of current computers (not shown).

The data processing unit (601) also comprises an input/output interface module (605) which controls the different interfaces between the unit (601) and input and/or output means of the system (600). The system (600) may indeed also include input means, such as a keyboard (606), a mouse (607), or a microphone (not shown). Additionally, the computer (600) may include output means, such as a monitor (608) (for example, a liquid crystal display (LCD) monitor, an LED display monitor, or a cathode-ray tube (CRT) monitor). The computer system (600) can be connected to a network (609) (for example, a local area network (LAN), a wide area network (WAN) such as the Internet, or any other similar type of network) via a network interface connection (not shown). One skilled in the art will appreciate that there are many different types of computer systems (for example, a desktop computer, a laptop, or any other computer system capable of executing computer-readable instructions), and that the aforementioned input and output means may take other forms, currently known or developed later.

In general, the computer system (600) comprises at least the minimal means of processing, input and/or output necessary to practice one or more embodiments of the proposed analysis method. For example, the processor (602) is adapted to be configured to execute a computer program including portions of code for implementing an analyzer, configured to perform the analysis of an input video sequence according to the different embodiments of the proposed analysis method. The storage device (604) will preferably be chosen to store the data corresponding to the results of the analysis and processing of the video sequence.

Those skilled in the art will appreciate that one or more elements of the aforementioned computer system (600) may be at a remote location and be connected to other elements over a network. In addition, one or more embodiments may be implemented on a distributed system having a plurality of nodes, where each portion of the implementation may be located on a different node within the distributed system. In one or more embodiments, the node corresponds to a computer system. Alternatively, the node may correspond to a processor with associated physical memory. The node may also correspond to a processor with shared memory and/or shared resources. In addition, software instructions for performing one or more embodiments may be stored on a computer-readable non-transitory medium such as a compact disc (CD), floppy disk, tape, or any other computer-readable storage device.

Depending on the embodiment chosen, certain acts, actions, events or functions of each of the methods described in this document may be performed or occur in a different order from that in which they were described, or may be added, merged, or not performed or not occur, as the case may be. In addition, in some embodiments, certain acts, actions or events are performed or occur concurrently and not successively.

Although described through a number of detailed exemplary embodiments, the proposed encoding method and equipment for implementing the method include various alternatives, modifications, and enhancements that will be apparent to those skilled in the art, it being understood that these various variants, modifications and improvements are within the scope of the invention, as defined by the following claims. In addition, various aspects and features described above may be implemented together, or separately, or substituted for each other, and all of the various combinations and sub-combinations of aspects and features are within the scope of the invention. In addition, some of the systems and equipment described above may not incorporate all the modules and features described for the preferred embodiments.

Claims

1. A method of analyzing a set of images of a video sequence for processing to be performed on the sequence, the method comprising: determining in the video sequence a plurality of consecutive disjoint subsequences of one or more successive images; analyzing the images of each subsequence determined in the video sequence; wherein the subsequences are determined according to the type of processing to be performed and the content of the video sequence.

2. Method according to claim 1, wherein the respective sizes of the subsequences and the respective gaps between two neighboring subsequences are determined according to the type of processing to be performed and according to the content of the video sequence.

3. The method of claim 2, wherein, except for the last subsequence of the plurality of consecutive subsequences, the subsequences have an identical size.

4. The method of claim 3, wherein the size of the last subsequence is chosen greater than or equal to the size of the other subsequences.

5. The method according to any one of claims 2 to 4, wherein, except for the gap between the last subsequence of the plurality of consecutive subsequences and the subsequence adjacent to the last subsequence, the respective gaps between two neighboring subsequences are identical.

6. The method of claim 1, further comprising: generating, by extrapolation of the sub-sequence analysis results of the video sequence, video sequence analysis results.

7. The method of any one of the preceding claims, wherein at least one of the subsequences contains a single image.

8. The method of any one of the preceding claims, wherein the subsequences are further determined based on the analysis speed or the accuracy of the analysis.

9. An apparatus for analyzing a set of images of a video sequence for processing to be performed on the sequence, comprising:

an input interface configured to receive the video sequence;

a sequence analysis unit, comprising a processor operatively coupled to a memory, configured to: determine in the video sequence a plurality of consecutive disjoint subsequences of one or more successive images, according to the type of processing to be performed and according to the content of the video sequence; and analyze the images of each subsequence determined in the video sequence.

10. Computer program, loadable in a memory associated with a processor, and including portions of code for the implementation of the steps of a method according to any one of claims 1 to 8 during the execution of said program by the processor.

11. Data set representing, for example by compression or encoding, a computer program according to claim 10.

12. A non-transitory storage medium of a computer executable program, comprising a data set representing one or more programs, said one or more programs including instructions for, when said one or more programs are executed by a computer comprising a processing unit operatively coupled to memory means and an input/output interface module, causing the computer to analyze the images of a video sequence according to the method of any one of claims 1 to 8.