EP4029252A1 - AI prediction for video compression

Info

Publication number: EP4029252A1
Application number: EP20785794.7A
Authority: EP
Inventors: Thomas Guionnet; Josselin COZANET
Original assignee: Ateme SA
Current assignee: Ateme SA
Priority date: 2019-09-11
Filing date: 2020-09-09
Publication date: 2022-07-20
Also published as: FR3100679A1; US20220345715A1; FR3100679B1; WO2021048498A1

Abstract

A method for encoding a first image within a first set of images is proposed, in which the first image is divided into blocks, each block being encoded according to one of a plurality of coding modes. The method comprises, for a current block of the first image: determining, on the basis of at least one second image distinct from the first image and previously encoded according to an encoding sequence of the images of the first set of images, a prediction of a characteristic of the current block in one or more third images of the first set of images, distinct from the first image and not yet encoded according to the encoding sequence; and using the prediction to encode the current block by minimizing a rate-distortion criterion.

Description

Title: IMAGE ENCODING PROCESS AND EQUIPMENT FOR IMPLEMENTING THE PROCESS

The present invention relates to an image encoding method and a device for implementing this method. It applies in particular to the encoding of images of a video stream.

[0002] Video data is generally subject to source coding aimed at compressing it in order to limit the resources required for its transmission and/or storage. There are many coding standards, such as H.264/AVC (for "Advanced Video Coding"), H.265/HEVC (for "High Efficiency Video Coding") and MPEG-2 (developed by the Moving Picture Experts Group), which can be used for this purpose.

We consider a video stream comprising a set of images. In conventional encoding schemes, the images of the video stream to be encoded are typically considered according to an encoding sequence, and each is divided into sets of pixels which are also processed sequentially, for example starting at the top left and ending at the bottom right of each image.

[0004] The encoding of an image of the stream is thus carried out by dividing a matrix of pixels corresponding to the image into several sets, for example blocks of fixed size 16 x 16, 32 x 32 or 64 x 64, and by encoding these blocks of pixels according to a given processing sequence. Some standards, such as H.264/AVC, provide for the possibility of breaking down blocks of size 16 x 16 (then called macroblocks) into sub-blocks, for example of size 8 x 8 or 4 x 4, in order to perform encoding processing with finer granularity.
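The block division described above can be illustrated by a minimal Python sketch. The function names `split_into_blocks` and `split_macroblock` are hypothetical and are not taken from any standard; the handling of image edges (smaller edge blocks) is an assumption of the sketch, since real encoders may pad or partition edges differently.

```python
from typing import Iterator, List, Tuple

# (top row, left column, pixel rows) of one block; a hypothetical layout.
Block = Tuple[int, int, List[List[int]]]


def split_into_blocks(image: List[List[int]], size: int = 16) -> Iterator[Block]:
    """Yield size x size blocks in raster order (top left to bottom right).

    Assumption of this sketch: blocks on the right and bottom edges are
    simply smaller when the image dimensions are not multiples of `size`.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(0, height, size):
        for left in range(0, width, size):
            rows = [row[left:left + size] for row in image[top:top + size]]
            yield top, left, rows


def split_macroblock(block: List[List[int]], sub: int = 8) -> List[List[List[int]]]:
    """Break one block into sub x sub sub-blocks, again in raster order."""
    return [rows for _, _, rows in split_into_blocks(block, sub)]
```

For a 32 x 32 image and 16 x 16 blocks, this yields four blocks, each of which can in turn be broken down into four 8 x 8 sub-blocks.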

[0005] The existing video compression techniques can be divided into two main categories: on the one hand, so-called “Intra” compression, in which the compression processing operations are carried out on the pixels of a single image or video frame, and, on the other hand, so-called “Inter” compression, in which the compression processing operations are carried out on several images or video frames. In Intra mode, the processing of a block (or set) of pixels typically comprises a prediction of the pixels of the block carried out using causal (previously encoded) pixels present in the image being encoded (called the “current image”), in which case one speaks of “Intra prediction”. In Inter mode, the processing of a block (or set) of pixels typically comprises a prediction of the pixels of the block performed using pixels from previously encoded images, in which case one speaks of “Inter prediction” or of “motion compensation”.

These two types of coding are used in existing video codecs (MPEG-2, H.264/AVC, HEVC) and are described for the HEVC codec in the article entitled "Overview of the High Efficiency Video Coding (HEVC) Standard", by Gary J. Sullivan et al., IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, No. 12, Dec. 2012.

This exploitation of spatial and/or temporal redundancies makes it possible to avoid transmitting or storing the value of the pixels of each block (or set) of pixels, by representing at least some of the blocks by a pixel residual representing the difference (or the distance) between the prediction values of the pixels of the block and the actual values of the pixels of the predicted block. The pixel residual information is present in the data generated by the encoder after transform (e.g., of DCT type) and quantization, in order to reduce the entropy of the data generated by the encoder.
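By way of illustration only, the residual, transform and quantization steps mentioned above can be sketched as follows. The sketch assumes square blocks, an orthonormal two-dimensional DCT-II and a uniform quantizer with a single step size; the function names and these choices are assumptions made for the example and do not reproduce the exact transforms or quantizers of any standard.

```python
import math
from typing import List

Matrix = List[List[float]]


def residual(actual: Matrix, predicted: Matrix) -> Matrix:
    """Pixel residual: actual values minus prediction values."""
    return [[a - p for a, p in zip(ra, rp)] for ra, rp in zip(actual, predicted)]


def dct_2d(block: Matrix) -> Matrix:
    """Orthonormal 2-D DCT-II of a square n x n block (direct O(n^4) form)."""
    n = len(block)

    def alpha(k: int) -> float:
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def quantize(coeffs: Matrix, step: float) -> List[List[int]]:
    """Uniform quantization: a larger step gives fewer distinct levels."""
    return [[round(c / step) for c in row] for row in coeffs]
```

When the prediction is close to the actual block, the residual is small and nearly flat, so that after transform and quantization most coefficients are zero, which is what reduces the entropy of the encoder output.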

[0008] It is desirable to reduce as much as possible the additional information generated by the pixel prediction and present at the output of the encoder in order to increase the efficiency of an encoding/compression scheme at a given level of distortion. Conversely, one can also seek to reduce this additional information to increase the efficiency of an encoding/compression scheme at a given encoder output bit rate.

[0009] A video encoder typically chooses an encoding mode, corresponding to a selection of encoding parameters for a processed set of pixels. This decision can be made by optimizing a rate and distortion metric, the encoding parameters selected by the encoder being those which minimize a rate-distortion criterion. The choice of encoding mode then has an impact on the performance of the encoder, both in terms of bit rate gain and visual quality.
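The mode decision described above can be sketched as follows. The sketch assumes the commonly used Lagrangian form of the criterion, J = D + λ·R, which the present description does not impose; the class `Candidate`, the functions `rd_cost` and `choose_mode`, and the numerical values in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    """One candidate encoding of a block (hypothetical container)."""
    mode: str          # e.g. "intra", "inter", "skip"
    distortion: float  # e.g. sum of squared errors after reconstruction
    rate_bits: float   # estimated number of bits to signal this choice


def rd_cost(candidate: Candidate, lam: float) -> float:
    """Assumed Lagrangian rate-distortion cost J = D + lambda * R."""
    return candidate.distortion + lam * candidate.rate_bits


def choose_mode(candidates: Iterable[Candidate], lam: float) -> Candidate:
    """Return the candidate minimizing the rate-distortion cost."""
    return min(candidates, key=lambda c: rd_cost(c, lam))
```

With candidates Intra (D = 100, R = 50), Inter (D = 120, R = 20) and Skip (D = 400, R = 1), a small λ such as 0.1 selects Intra, λ = 1 selects Inter, and a large λ such as 20 selects Skip: the larger λ is, the more the decision favors a low bit rate over low distortion.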

[0010] A video encoder can thus be designed so as to produce the highest possible quality while respecting a set of constraints corresponding to the use case considered. In the case of the broadcasting of television programs, the main constraints imposed on video encoders are the bit rate, the processing time, the latency (or delay), the characteristics of the video source, the energy consumption and the cost. Processing time is critical for real-time broadcasting. Combined with the other constraints, this means that a video encoder designed for real-time broadcasting implements a compromise between quality and computational resources. An increase in available computing resources improves quality. Likewise, the bit rate imposes a limit on the quality that can be achieved. Increasing the bit rate thus improves the quality.

The quality also depends on the latency. In order to maximize the encoding quality, some video encoders implement a technique called "lookahead", according to which images which enter the encoding process are temporarily stored in a buffer memory before actually being encoded. This storage of images before processing by the video encoder makes it possible to implement image analysis and processing tools from which the encoder can benefit, but introduces into the encoding process a latency induced by the buffering.
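The latency introduced by a lookahead buffer can be made concrete with a minimal sketch. The class `LookaheadBuffer` and its `push` method are hypothetical names chosen for the example, and the buffer depth is expressed in images; an actual encoder would organize its buffers differently.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class LookaheadBuffer:
    """Holds incoming images until `depth` future images are available."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self._frames: Deque[object] = deque()

    def push(self, frame: object) -> Optional[Tuple[object, List[object]]]:
        """Store an incoming image.

        Returns (image_to_encode, future_images) once `depth` images
        following the image to encode are buffered, otherwise None.
        """
        self._frames.append(frame)
        if len(self._frames) <= self.depth:
            return None  # still filling: this is the added latency
        current = self._frames.popleft()
        return current, list(self._frames)
```

With a depth of 2, the first image can only be encoded once the third image has arrived: the encoder output is delayed by as many images as the lookahead depth, which is the latency that the buffering induces.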

[0012] In systems where latency is allowed, encoding efficiency can be maximized by the use of image storage before encoder processing. Conversely, when latency is prohibited, coding efficiency is limited by the lack of memory. However, in the case of a "live" broadcast, it is desirable to have both low latency, low bit rate and high video quality.

[0013] There is thus a need for an image encoding method that addresses the drawbacks set out above.

Summary

The present disclosure improves the situation.

According to a first aspect, there is proposed a method of encoding a first image in a first set of images, in which the first image is divided into blocks, each block being encoded according to one of a plurality of encoding modes, the method comprising, for a current block of the first image: determining, on the basis of at least one second image distinct from the first image and previously encoded according to an encoding sequence of the images of the first set of images, a prediction of a characteristic of the current block in one or more third images of the first set of images distinct from the first image and not yet encoded according to the encoding sequence; and using the prediction for encoding the current block by minimizing a rate-distortion criterion.

Advantageously, the proposed method uses, for the encoding of a current block of a current image (during encoding), a prediction of a characteristic of the current block in one or more images not yet encoded, which makes it possible to avoid having recourse, in whole or in part, to the technique known as "lookahead" of buffering of images not yet encoded upstream of encoding, so as to perform an analysis using these images to calculate the characteristic. The latency generated by this storage can thus be reduced, or even completely eliminated.

[0017] In one or more embodiments, the prediction can be determined based on at least one image already encoded in the set of images. These embodiments advantageously make it possible to determine the prediction solely on the basis of images already encoded, without it being necessary to keep in memory images not yet encoded for the purposes of determining the prediction.

[0018] In one or more embodiments, the characteristic may include a cost of propagating the current block in one or more third images of the first set of images.
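Purely as an illustration, one way in which a predicted propagation cost could enter the rate-distortion decision is sketched below. The way the prediction influences the criterion is not specified at this point of the description: the scaling of the Lagrange multiplier used here, the `strength` parameter, the `Predictor` type and the function names are all assumptions made for the example.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical predictor: maps (current block, already encoded images) to an
# estimated propagation cost. Not an interface defined by the source.
Predictor = Callable[[Sequence[float], Sequence[Sequence[float]]], float]


def adjusted_lambda(base_lambda: float, predicted_cost: float,
                    strength: float = 0.5) -> float:
    """Assumed heuristic: a block expected to be referenced more by future
    images gets a smaller lambda, i.e. quality is favored over bit rate."""
    if predicted_cost < 0:
        raise ValueError("predicted propagation cost must be non-negative")
    return base_lambda / (1.0 + strength * predicted_cost)


def select_mode(block: Sequence[float],
                past_images: Sequence[Sequence[float]],
                predictor: Predictor,
                candidates: Dict[str, Tuple[float, float]],
                base_lambda: float) -> str:
    """candidates maps a mode name to (distortion, rate in bits).

    Only already encoded images are passed to the predictor, so no image
    that has not yet been encoded needs to be buffered for this decision.
    """
    lam = adjusted_lambda(base_lambda, predictor(block, past_images))
    return min(candidates,
               key=lambda m: candidates[m][0] + lam * candidates[m][1])
```

In this sketch, a block predicted not to propagate keeps the base multiplier, whereas a block predicted to propagate strongly is encoded with a smaller multiplier, so that a more accurate (and more costly) mode may be selected for it.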

In one or more embodiments, the characteristic may include a measurement of the presence of a transition in the current block.

[0020] In one or more embodiments, the characteristic may include a measurement of the evolution over time of the quantity of information in the current block.

[0021] In one or more embodiments, the prediction of the current block can further be determined on the basis of at least a fourth image distinct from the first image and not yet encoded according to the encoding sequence. These embodiments advantageously make it possible to determine the prediction not only on the basis of images already encoded, but also on the basis of images not yet encoded insofar as these are available for determining the prediction, that is, where they are stored in memory. The determination of the prediction can thus be refined whenever images of the set of images to be encoded which have not yet been encoded are available in storage.

[0022] In one or more embodiments, the prediction of the characteristic of the current block is determined using an artificial intelligence algorithm such as, for example, a supervised learning algorithm.

In one or more embodiments, the proposed method can then comprise a learning phase of a neural network performed on a second set of images, the learning phase comprising, for a current block of a current image of the second set of images: determining, on the basis of at least one image of the second set of images distinct from the current image and not yet encoded according to an encoding sequence of the images of the second set of images, a reference prediction of the characteristic of the current block in an image of the second set of images distinct from the current image and not yet encoded according to the encoding sequence of the second set of images; and performing a learning phase of the neural network on the basis of input data, and on the basis of the reference prediction of the current block included in reference data, to generate a prediction model of the characteristic of the block current in the images of the second set of images not yet encoded according to the encoding sequence.
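The construction of the learning data described above can be sketched as follows, under simplifying assumptions: each image is represented by an opaque value, the input data for the image of index t consist of the current image and the immediately preceding image, and the reference prediction is produced by a function `reference_fn` which has access to the following images (available offline during learning). The names `build_training_set`, `reference_fn` and `n_future` are hypothetical, and the training of the neural network itself is not shown.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")
Sample = Tuple[Tuple[Frame, Frame], float]


def build_training_set(
    frames: Sequence[Frame],
    reference_fn: Callable[[Frame, Sequence[Frame]], float],
    n_future: int,
) -> List[Sample]:
    """Pair causal inputs with a reference computed from future images.

    For each index t, the inputs are (frames[t-1], frames[t]), i.e. an
    already encoded image and the current image, and the target is the
    reference value computed from the n_future images that follow.
    """
    samples: List[Sample] = []
    for t in range(1, len(frames) - n_future):
        inputs = (frames[t - 1], frames[t])
        target = reference_fn(frames[t], frames[t + 1:t + 1 + n_future])
        samples.append((inputs, target))
    return samples
```

The point of the sketch is the asymmetry between learning and encoding: the reference values require images that follow the current image, whereas the inputs of the model contain only the current image and earlier images, so that the trained model can be used without buffering images not yet encoded.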

In one or more embodiments, the plurality of encoding modes may include at least one encoding mode of the temporal correlation prediction type using a plurality of images from a set of images to be encoded. The method can then further comprise, for a current block of a current image of the second set of images: determining a motion estimation vector of the current block, the motion estimation vector pointing to a block correlated with the current block in an image of the second set of images distinct from the current image and previously encoded according to the predefined sequence for encoding the images of the second set of images; and the learning of the neural network can be further performed based on the motion estimation vector of the current block included in the input data.

In one or more embodiments, the learning of the neural network can be performed on the basis of the current image included in the input data and/or on the basis of an image of the second set of images distinct from the current image and previously encoded according to the encoding sequence of the images of the second set of images, included in the input data.

[0026] In one or more embodiments, the neural network can be chosen to be convolutional.

In one or more embodiments, the prediction of the characteristic of the current block can be determined using the prediction model, based on the first image and based on the at least one second image included in input data of the prediction model.

In one or more embodiments, the plurality of encoding modes may include at least one encoding mode of the temporal correlation prediction type using a plurality of images from the first set of images, the proposed method then further comprising: determining a motion estimation vector of the current block, the motion estimation vector pointing to a block correlated with the current block in an image of the first set of images distinct from the first image and previously encoded according to the predefined sequence for encoding the images of the first set of images; and the prediction of the current block can be determined using the prediction model, based on the motion estimation vector included in the input data of the prediction model.

The proposed method is particularly suitable, although not exclusively, for encoding or compressing an image of a sequence of images according to a scheme of the H.261, MPEG-1 Part 2, H.262, MPEG-2 Part 2, H.264, AVC, MPEG-4 Part 2, H.265, HEVC or SHVC (Scalable HEVC) type. It is also suitable for encoding images, for example of a video sequence, according to any video encoding scheme operating on images divided into blocks, in particular one in which the blocks are encoded according to a plurality of encoding modes comprising at least one encoding mode of the temporal or spatial correlation prediction type.

The proposed method is particularly suitable, although in a non-limiting manner, for the encoding or the compression of an image of a sequence of images corresponding to one or more multimedia contents distributed live, using a technology for broadcasting multimedia content on the Internet, for example according to a scheme of the "HLS" type (standing for "HTTP Live Streaming", the acronym "HTTP" denoting the "HyperText Transfer Protocol"), "MSS" (for "Microsoft Smooth Streaming"), "HDS" (for "HTTP Dynamic Streaming"), "MPEG-DASH" (for "MPEG Dynamic Adaptive Streaming over HTTP"), or "HAS" (for "HTTP Adaptive Streaming"), or using a technology for broadcasting multimedia content on a television broadcasting network, for example according to a scheme of the "DVB" type (for "Digital Video Broadcasting") or of the "ATSC" type (for "Advanced Television Systems Committee").

The proposed method can advantageously be implemented in any device configured for the encoding or the compression of an image of a sequence of images, in particular corresponding to one or more multimedia contents distributed live, for example according to an MPEG-DASH, HLS, HDS, MSS, or HAS type scheme, such as, without limitation, any computer, server, broadcast headend equipment, broadcast network equipment, etc.

According to a second aspect, there is proposed an image encoding device comprising an input interface configured to receive a first image of a set of images, and an image encoding unit, operatively coupled to the input interface, and configured to divide the first image into blocks, and to encode each block according to one of a plurality of encoding modes according to the proposed method.

According to another aspect, there is proposed a computer program, loadable into a memory associated with a processor, and comprising portions of code for the implementation of the steps of the method proposed during the execution of said program by the processor, as well as a set of data representing, for example by compression or encoding, said computer program.

Another aspect relates to a non-transitory storage medium for a computer-executable program, comprising a set of data representing one or more programs, said one or more programs comprising instructions which, during the execution of said one or more programs by a computer comprising a processing unit operatively coupled to memory means and to an input/output interface module, cause the computer to encode a first image divided into blocks according to the proposed method.

Brief description of the drawings

Other features and advantages of the present invention will become apparent from the following description of non-limiting embodiments, with reference to the accompanying drawings, in which:

Fig. 1

[0036] [Fig. 1] is a diagram illustrating an example of encoder architecture, according to one or more embodiments.

Fig. 2

[0037] [Fig. 2] is a block diagram illustrating the storage technique.

Fig. 3

[0038] [Fig. 3] is a diagram illustrating an example of image reordering for encoding with B images.

Fig. 4a

[0039] [Fig. 4a] illustrates an example of the configuration of a storage unit memory buffer comprising 3 parts.

Fig. 4b

[0040] [Fig. 4b] illustrates an example of an encoder integrating an analysis module of the "MB Tree" type and using a storage unit whose memory has a structure comprising three memory buffers.

Fig. 4c

[0041] [Fig. 4c] illustrates an example of a structure for storing images from a source video sequence within a storage unit.

Fig. 5

[0042] [Fig. 5] is a diagram illustrating the method proposed according to one or more embodiments.

Fig. 6a

[0043] [Fig. 6a] is a diagram illustrating an image encoding system configured for implementing the method proposed according to one or more embodiments.

Fig. 6b

[0044] [Fig. 6b] is a diagram illustrating a system configured for implementing the method proposed according to one or more embodiments.

Fig. 6c

[0045] [Fig. 6c] is a diagram illustrating a system configured for implementing the method proposed according to one or more other embodiments.

Fig. 6d

[0046] [Fig. 6d] is a diagram illustrating a system configured for implementing the method proposed according to one or more embodiments.

Fig. 6e

[0047] [Fig. 6e] is a diagram illustrating a system configured for implementing the method proposed according to one or more embodiments.

Fig. 7a

[0048] [Fig. 7a] is a diagram illustrating a system configured for the implementation of a learning phase according to one or more embodiments.

Fig. 7b

[0049] [Fig. 7b] is a diagram illustrating a system configured for implementing the method proposed according to one or more embodiments.

Fig. 8

[0050] [Fig. 8] is a diagram illustrating an example of the architecture of a device configured for the implementation of the method proposed according to one or more embodiments.

Description of the embodiments

In the following detailed description of embodiments of the invention, many specific details are presented to provide a more complete understanding. Nevertheless, one skilled in the art will appreciate that embodiments can be practiced without these specific details. In other cases, well-known characteristics are not described in detail to avoid unnecessarily complicating the description. The present description refers to functions, engines, units, modules, platforms, and diagram illustrations of the methods and devices according to one or more embodiments. Each of the functions, engines, modules, platforms, units and diagrams described can be implemented in hardware, software (including in the form of embedded software ("firmware") or "middleware"), microcode, or any combination of these. In the case of implementation in software form, the functions, engines, units, modules and/or diagram illustrations may be implemented by computer program instructions or software code, which may be stored or transmitted on a computer-readable medium, including a non-transitory medium, or a medium loaded into the memory of a general-purpose or special-purpose computer, or of any other programmable data processing apparatus or device, to produce a machine, so that the computer program instructions or the software code executed on the computer or the programmable data processing apparatus or device constitute means of implementing these functions.

Embodiments of a computer-readable medium include, but are not limited to, computer storage media and communication media, including any medium facilitating the transfer of a computer program from one location to another. By “computer storage medium (or media)” is meant any physical medium that can be accessed by a computer. Examples of computer storage media include, but are not limited to, flash memory disks or components or any other flash memory devices (e.g., USB keys, memory sticks, key disks), CD-ROMs or other optical data storage devices, DVDs, magnetic disk data storage devices or other magnetic data storage devices, data memory components, RAM, ROM, EEPROM, memory cards ("smart cards"), memories of the SSD type ("Solid State Drive"), and any other form of medium which can be used to transport or store data or data structures which can be read by a computer processor. In addition, various forms of computer-readable medium can transmit or carry instructions to a computer, such as a router, a gateway, a server, or any data transmission equipment, whether it is wired transmission equipment (by coaxial cable, optical fiber, telephone wires, DSL cable, or Ethernet cable), wireless transmission equipment (by infrared, radio, cellular, microwave), or virtualized transmission equipment (virtual router, virtual gateway, virtual tunnel endpoint, virtual firewall). The instructions may, depending on the embodiments, include code of any computer programming language or computer program element, such as, without limitation, assembly languages, C, C++, Visual Basic, HyperText Markup Language (HTML), Extensible Markup Language (XML), HyperText Transfer Protocol (HTTP), Hypertext Preprocessor (PHP), SQL, MySQL, Java, JavaScript, JavaScript Object Notation (JSON), Python, and bash scripting.

In addition, the terms "in particular", "for example", "example", "typically" are used in the present description to denote examples or illustrations of nonlimiting embodiments, which do not necessarily correspond to preferred or advantageous embodiments over other possible aspects or embodiments.

By "server" or "platform" is meant in the present description any point of service (virtualized or not) or device operating data processing, one or more databases, and/or data communication functions. For example, and in a non-limiting manner, the term "server" or the term "platform" can refer to a physical processor operably coupled with associated communication, database and data storage functions, or refer to a network, group, set or complex of processors and associated data storage and networking equipment, as well as an operating system and one or more database system(s) and application software supporting the services and functions provided by the server. A computing device can be configured to send and receive signals, by wireless and/or wired transmission network(s), or can be configured for processing and/or storage of data or signals, and can therefore function as a server. Thus, equipment configured to operate as a server may include, by way of non-limiting examples, dedicated rack-mounted servers, desktops, laptops, service gateways (sometimes referred to as "boxes" or "residential gateways"), multimedia decoders (sometimes called "set-top boxes"), and integrated equipment combining various functionalities, such as two or more of the functionalities mentioned above. Servers can vary widely in their configuration or capabilities, but a server will typically include one or more central processing unit(s) and memory. A server can also include one or more mass memory equipment(s), one or more power supply(ies), one or more wireless and/or wired network interface(s), one or more input/output interface(s), and one or more operating system(s), such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or an equivalent.

By "multimedia content" is meant in the present description any audio and / or video, audiovisual, music, sound, image and interactive graphical interface, and any combination of these types of content.

The terms "network" and "communication network" as used in the present description refer to one or more data links which can couple or connect equipment, possibly virtualized, so as to allow the transport of electronic data between computer systems and/or modules and/or other electronic devices or equipment, such as between a server and a client device or other types of devices, including between wireless devices coupled or connected by a wireless network, for example. A network can also include a mass memory for storing data, such as a NAS ("network attached storage"), a SAN ("storage area network"), or any other form of computer-readable or machine-readable medium, for example. A network may include, in whole or in part, the Internet, one or more local area networks (LANs), one or more wide area networks (WANs), wired connections, wireless connections, cellular connections, or any combination of these different networks. Similarly, subnets may use different architectures or be compliant or compatible with different protocols, and interoperate with larger networks. Different types of equipment can be used to make different architectures or different protocols interoperable. For example, a router can be used to provide a communications link or a data link between two LANs that would otherwise be separate and independent.

The terms "operably coupled", "coupled", "mounted", "connected" and their various variants and forms used in the present description refer to couplings, connections or assemblies, which can be direct or indirect, and include in particular connections between electronic equipment or between portions of such equipment which allow operations and functions as described in the present description. In addition, the terms "connected" and "coupled" are not limited to physical or mechanical connections or couplings. For example, an operative coupling may include one or more wired connection(s) and/or one or more wireless connection(s) between two or more devices that allow simplex and/or duplex communication links between the equipment or portions of the equipment. According to another example, an operational coupling or connection may include a wired and/or wireless link coupling to allow data communications between a server of the proposed system and other equipment of the system.

The terms “application” or “application program” (AP) and their variants (“app”, “webapp”, etc.) as used in the present description correspond to any tool which operates and is operated by means of a computer, to provide or execute one or more function(s) or task(s) for a user or another application program. In order to interact with an application program, and to control it, a user interface can be provided on the equipment on which the application program is implemented. For example, a graphical user interface (GUI) can be generated and displayed on a screen of the user equipment, or an audio user interface can be rendered to the user using a loudspeaker, a headset or an audio output.

The term "current" as used in the present description in connection with an image ("current image") or an image portion, such as for example a block ("current block") for an image divided into blocks, refers to an image or portion of an image being processed, for example being encoded, analyzed or compressed. In particular, the terms “current image” refer to an image being encoded from among the images of a set of images, the encoding of the current image possibly comprising the implementation of the proposed method on the current image, and the terms “current block” refer to a block being encoded in a current image divided into blocks whose encoding is implemented according to an encoding sequence of the blocks of the image, the encoding of the current block possibly comprising the implementation of the proposed method on the current block. In the present description, the current image may be associated with a time index, for example the index "t", to distinguish it from the images already encoded (which may be associated with time indices less than t, such as "t-1", "t-2", ..., "t-k" for a set of k images), and from images not yet encoded (which may be associated with time indices greater than t, such as "t+1", "t+2", ..., "t+n" for a set of n images).

In the present description, the terms “real-time” distribution, “linear mode” distribution, “linear TV mode” distribution, “dynamic mode” distribution and “live” or “live mode” distribution are used interchangeably to denote the distribution in dynamic mode or of dynamic type (“live” or “dynamic” mode) of multimedia content in a system for distributing content to terminals, including in particular the distribution of the content as it is generated, as opposed to the distribution of previously generated content on a user's access request (distribution on access request, or distribution in “static” mode or of “static” type), such as, for example, content recorded on a server and made available to users by a video on demand (VOD) service.

In the present description, the terms “live content” refer to content, for example of multimedia type, distributed, for example according to an OTT distribution mode, in dynamic mode (as opposed to the static distribution mode). Live content will typically be generated by a television channel, or by any type of television media, and may also be broadcast over a multimedia content distribution network, in addition to being made available on content servers in an OTT distribution system.

In the present description, the terms "video input signal" or "input video stream" refer to a signal carrying data corresponding to a set of images supplied at the input of a device used for the implementation of the proposed method. The set of images may be referred to as a "source video sequence".

The proposed method can be implemented by any type of encoder of images of a set of images using a coding mode of the temporal and/or spatial correlation prediction type, such as for example a video codec compliant with the H.264/AVC, H.265/HEVC, and/or MPEG-2 standards.

A video codec typically comprises a set of video sequence processing and representation tools. The specification of the video codec generally makes it possible to design a decoder which transforms a compressed bitstream conforming to the specification of the codec into a so-called “reconstructed” video. The aim of the video encoder is to transform a so-called “source” video into a bitstream conforming to the specification of the codec. Depending on the implementation chosen for the encoder, the same source content can be represented in different ways by the same codec. Not all representations are equivalent. For example, for a given target bit rate, different representations will give different qualities. Likewise, for a given target quality, different representations will give different bit rates. Figure 1 illustrates an example of an encoder architecture, according to one or more embodiments.

Referring to Figure 1, the encoder 100 receives on an input interface 109 an input video stream 101 comprising a plurality of images to be processed in order to encode the stream. The encoder 100 includes a controller 102, operably coupled to the input interface 109, which drives a motion estimation unit 110 and a temporal correlation prediction unit 104 for performing temporal correlation predictions (such as, for example, Inter and Skip predictions, or so-called “Merge” and “Affine” predictions, depending on the video encoding scheme used), as well as a spatial correlation prediction unit 103 for performing spatial correlation predictions (such as, for example, Intra predictions). The data received on the input interface 109 is supplied as input to the spatial correlation prediction unit 103, the motion estimation unit 110, the temporal correlation prediction unit 104 and the controller 102. Together, the controller 102, the motion estimation unit 110, the temporal correlation prediction unit 104 and the spatial correlation prediction unit 103 form an encoding unit 111 operatively coupled to the input interface 109.

The spatial correlation prediction unit 103 generates spatial correlation prediction data 107 (for example Intra prediction data) which is provided at the input of an entropy encoder 105. The motion estimation unit 110 for its part generates motion estimation data which is supplied to the controller 102 as well as to the temporal correlation prediction unit 104 for the purposes of the prediction in Inter mode. The temporal correlation prediction unit 104 generates temporal correlation prediction data 106 (e.g. Inter or Skip prediction data) which is input to the entropy encoder 105. For example, data supplied to the decoder for prediction by temporal correlation may include a residual of pixels and information regarding one or more motion vectors. This information relating to one or more motion vectors may comprise one or more indices identifying a predictor vector in a list of predictor vectors known to the decoder. The data supplied to the decoder for a Skip type prediction will typically not include any pixel residual, and may also include information identifying a predictor vector in a list of predictors known to the decoder. The list of predictor vectors used for Inter type coding will not necessarily be identical to the list of predictor vectors used for Skip type coding. The spatial correlation prediction data can include an Intra coding mode. For a current block of a current image, the entropy encoder 105 receives spatial correlation prediction data 107 or temporal correlation prediction data 106.

The controller 102 calculates encoding data 108, which may include, in one or more embodiments, a residual of pixels as well as data relating to the partitioning of the image into elementary entities, after transformation and quantization, which are also supplied at the input of the entropy encoder 105. The data relating to the selected encoding mode can be included in the prediction data 106-107, by temporal or spatial correlation as a function of each encoded block.

The controller 102 is configured to drive the spatial correlation prediction unit 103 and the temporal correlation prediction unit 104 in order to control the prediction data which are respectively supplied at the input of the entropy encoder 105 by the spatial correlation prediction unit 103 and the temporal correlation prediction unit 104. Depending on the encoding scheme implemented by the encoder 100, the controller 102 may further be configured to select, from among different types of prediction mode (for example Intra mode, Inter mode or Skip mode, depending on the encoding modes implemented in the encoding unit 111), the one for which prediction data will be transmitted to the entropy encoder 105. Thus, the encoding scheme may include a decision for each image set processed aiming to choose the type of prediction for which data will be transmitted to the entropy encoder 105. This choice will typically be implemented by the controller 102, which decides on the application of a prediction mode (for example Inter, Intra or Skip prediction mode) to the block being processed. This makes it possible to control the sending to the entropy encoder of either spatial correlation prediction data 107 or temporal correlation prediction data 106, depending on the decision taken by the controller 102.

The encoder 100 can be a computer, a computer network, an electronic component, or another device comprising a processor operably coupled to a memory, as well as, depending on the embodiment chosen, a data storage unit and other associated hardware elements such as a network interface and a media drive for reading from and writing to removable storage media (not shown in the figure). The removable storage medium can be, for example, a compact disc (CD), a digital video / versatile disc (DVD), a flash disk, a USB stick, etc. Depending on the embodiment, the memory, the data storage unit or the removable storage medium contains instructions which, when executed by the controller 102, cause this controller 102 to perform or control the input interface 109, spatial correlation prediction 103, temporal correlation prediction 104, motion estimation 110 and / or data processing parts of the examples of implementation of the proposed method described herein. The controller 102 can be a component implementing a processor or a computing unit for encoding images according to the proposed method and controlling the units 109, 110, 103, 104, 105 of the encoder 100.

In addition, the encoder 100 can be implemented in software form, as described above, in which case it takes the form of a program executable by a processor, or in hardware form, such as an application-specific integrated circuit (ASIC) or a system on chip (SOC), or in the form of a combination of hardware and software elements, such as for example a software program intended to be loaded and executed on a component of FPGA (Field-Programmable Gate Array) type. SOCs (Systems On Chip) are embedded systems that integrate all the components of an electronic system in a single chip. An ASIC (Application-Specific Integrated Circuit) is a specialized electronic circuit that groups together functionalities tailored to a given application. ASICs are typically configured during manufacture and can only be simulated by the user. Programmable logic circuits of the FPGA type are electronic circuits that can be reconfigured by the user.

An encoder can also use hybrid architectures, such as for example architectures based on a CPU + FPGA, a GPU (Graphics Processing Unit) or an MPPA (Multi-Purpose Processor Array).

The image being processed is divided into blocks or coding units ("Coding Unit", or CU), the shape and size of which are determined as a function in particular of the size of the matrix of pixels representing the image, for example into square-shaped macroblocks of 16 x 16 pixels. A set of blocks is thus formed for which a processing sequence is defined (also called “processing path”). In the case of square-shaped blocks, the blocks of the current image can for example be processed by starting with the one located at the top left of the image, followed by the one immediately to its right, and so on until the end of the first row of blocks is reached. Processing then continues with the leftmost block in the row of blocks immediately below, and ends with the block at the bottom right of the image.
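By way of non-limiting illustration, the processing path described above can be sketched as a raster scan over the blocks of an image. The function name `raster_scan_blocks`, the tuple layout and the clipping of edge blocks are assumptions made for this sketch only and are not taken from the present description.

```python
from typing import Iterator, Tuple


def raster_scan_blocks(
    width: int, height: int, block_size: int = 16
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) for each block, row by row, left to right.

    Hypothetical helper: blocks on the right and bottom edges are clipped
    when the image dimensions are not multiples of block_size.
    """
    for y in range(0, height, block_size):
        for x in range(0, width, block_size):
            yield x, y, min(block_size, width - x), min(block_size, height - y)


# A 48 x 32 image cut into 16 x 16 macroblocks gives 2 rows of 3 blocks.
blocks = list(raster_scan_blocks(48, 32))
```

The first element of `blocks` is the top-left block and the last one is the bottom-right block, which matches the processing sequence described above.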

We thus consider a "current block" (sometimes called "original block"), that is to say a block being processed in the current image. The processing of the current block can comprise the partitioning of the block into sub-blocks, in order to process the block with a spatial granularity finer than that obtained with the block. The processing of a block furthermore comprises the prediction of the pixels of the block, by exploiting the spatial (in the same image) or temporal (in the previously encoded images) correlation between the pixels. When several types of prediction, such as for example an Intra type prediction, an Inter type prediction, and / or a skip type prediction are implemented in the encoder, the prediction of the pixels of the block typically comprises the selection of a type of prediction of the block and of prediction information corresponding to the type selected, the whole forming a set of encoding parameters.

The prediction of the processed block of pixels makes it possible to calculate a residual of pixels, which corresponds to the difference between the pixels of the current block and the pixels of the prediction block, and is transmitted in certain cases to the decoder after transformation and quantization.
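As a minimal sketch of the residual computation described above, the following example subtracts a prediction block from the current block and shows that adding the residual back to the prediction recovers the block. The transformation and quantization steps are deliberately omitted, and the names `residual` and `reconstruct` are hypothetical.

```python
from typing import List

Block = List[List[int]]


def residual(current: Block, prediction: Block) -> Block:
    """Pixel-wise difference between the current block and its prediction."""
    return [
        [c - p for c, p in zip(cur_row, pred_row)]
        for cur_row, pred_row in zip(current, prediction)
    ]


def reconstruct(prediction: Block, res: Block) -> Block:
    """Inverse operation (exact here, since no quantization is applied)."""
    return [
        [p + r for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, res)
    ]


current_block = [[10, 12], [8, 9]]
prediction_block = [[9, 12], [10, 9]]
res = residual(current_block, prediction_block)
```

In a real encoder the residual would be transformed and quantized before transmission, so the reconstruction would generally be only an approximation of the current block.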

To encode a current block, several coding modes are thus possible, and the data generated by the encoding must include coding information 106-108 indicating the choice of coding mode made during encoding, according to which the data was encoded. This coding information 106-108 can include in particular the coding mode (for example the particular type of predictive coding among the “Intra” and “Inter” codings, or among the “Intra”, “Inter” and “Skip” codings), the partitioning (in the case of one or more blocks partitioned into sub-blocks), as well as a motion information item 106 in the case of predictive coding of the temporal correlation type and an Intra prediction mode 107 in the case of predictive coding of the spatial correlation type. As indicated above for the “Inter” and “Skip” coding modes, these last two items of information can also be predicted in order to reduce their coding cost, for example by using the information from the neighboring blocks of the current block.

As indicated above, spatial correlation type predictive coding includes a prediction of the pixels of a block (or set) of pixels being processed using the previously coded pixels of the current image. There are various “Intra” type predictive coding modes, such as for example the “Intra” prediction mode called “DC” (for “Discrete Continuous”), the “Intra” prediction mode called “V” (for “Vertical”), the “Intra” prediction mode called “H” (for “Horizontal”) and the “Intra” prediction mode called “VL” (for “Vertical-Left”).

The H.264 / AVC video coding standard provides for 9 intra prediction modes (including the DC, H, V and VL prediction modes). The HEVC video coding standard provides for a larger number, namely 35 intra prediction modes.

These video coding standards also provide for special cases for performing an intra prediction. For example, the H.264 / AVC standard allows 16x16 pixel blocks to be cut into smaller blocks, up to 4x4 pixels in size, to increase the granularity of predictive coding processing.

As indicated above, the Intra prediction mode information is itself predicted in order to reduce its coding cost. In fact, the transmission in the encoded stream of an index identifying the Intra prediction mode has a cost that grows with the number of prediction modes that can be used. Even in the case of H.264 / AVC encoding, transmitting an index between 1 and 9 identifying the intra prediction mode used for each block among the 9 possible modes turns out to be expensive in terms of encoding cost.

A most probable mode is thus calculated, denoted MPM (in English "Most Probable Mode"), which is used to encode the most probable Intra prediction mode on a minimum of bits. The MPM is the result of the prediction of the Intra prediction mode used to code the current block.

When the Intra mode is selected for encoding the current block, we can typically transmit to the decoder the residual pixels and the MPM.

The predictive coding mode referred to in certain video coders as “Inter” includes a prediction of the pixels of a block (or set) of pixels being processed using pixels from previously coded images (pixels which therefore do not come from the current image, unlike in the Intra prediction mode).

The Inter prediction mode typically uses one or two sets of pixels located in one or two previously encoded images in order to predict the pixels of the current block. That said, it is possible to envisage, for an Inter prediction mode, the use of more than two sets of pixels, each located in a distinct previously coded image. This technique, called motion compensation, involves the determination of one or two vectors, called motion vectors, which respectively indicate the position of the set or sets of pixels to be used for the prediction in the previously encoded image or images (sometimes referred to as “reference images”). With reference to FIG. 1, the vectors used for the “Inter” mode are chosen by the encoder 100 by means of the motion estimation unit 110 and the temporal correlation prediction unit 104. The implementation of the motion estimation within the encoder 100 may therefore provide, depending on the case, for the determination of a single motion estimation vector or of two motion estimation vectors which point to potentially different images.

The motion estimation vector(s) generated at the output of the motion estimation unit 110 will be supplied to the temporal correlation prediction unit 104 for the generation of prediction vectors, for example an Inter prediction vector. Each Inter prediction vector can in fact be generated from a corresponding motion estimation vector.

The motion estimation can consist in studying the movement of the blocks between two images by exploiting the temporal correlation between the pixels. For a given block in the current image (the “current block” or “original block”), the motion estimation makes it possible to select the most similar block (called “reference block”) in a previously coded image, called “reference image”, by representing the movement of this block for example with a two-dimensional vector (horizontal displacement, vertical displacement).
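By way of non-limiting illustration, the selection of a reference block can be sketched as an exhaustive block-matching search. The use of a sum of absolute differences as the matching cost, the square search window and the names `block_sad` and `full_search` are assumptions of this sketch; practical encoders generally use faster search strategies.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def block_sad(cur: Frame, ref: Frame, bx: int, by: int,
              rx: int, ry: int, size: int) -> int:
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[by + dy][bx + dx] - ref[ry + dy][rx + dx])
        for dy in range(size)
        for dx in range(size)
    )


def full_search(cur: Frame, ref: Frame, bx: int, by: int,
                size: int, search_range: int) -> Tuple[Tuple[int, int], int]:
    """Return ((horizontal, vertical) displacement, cost) of the best match."""
    height, width = len(ref), len(ref[0])
    best_mv: Tuple[int, int] = (0, 0)
    best_cost: Optional[int] = None
    for mvy in range(-search_range, search_range + 1):
        for mvx in range(-search_range, search_range + 1):
            rx, ry = bx + mvx, by + mvy
            # Skip candidates that fall outside the reference image.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = block_sad(cur, ref, bx, by, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (mvx, mvy), cost
    assert best_cost is not None, "block must lie inside the reference image"
    return best_mv, best_cost


# Synthetic test: the current image is the reference shifted by (2, 1).
ref_img = [[7 * x + 13 * y for x in range(8)] for y in range(8)]
cur_img = [[ref_img[(y + 1) % 8][(x + 2) % 8] for x in range(8)] for y in range(8)]
mv, cost = full_search(cur_img, ref_img, bx=2, by=2, size=3, search_range=2)
```

In this synthetic example the search recovers the displacement used to build the current image, with a matching cost of zero.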

A video encoder can thus be designed to decide, for each elementary portion (for example for each block in the case of a division into blocks) of the images of a video sequence to be encoded, which encoding tools and parameters to apply. The better the decision, the better the encoding quality.

One decision method consists in testing all the possibilities, for example all the different coding modes available at the encoder, so as to retain only the best combination. But the number of combinations is in some cases so high that this method cannot be performed in a reasonable time.

The so-called "lookahead" technique described above, which consists of storing images of the source video in memory before they are processed by the encoder, provides a set of images from the future relative to the images currently being encoded. It improves the encoding quality: because images of the video stream are temporarily stored in memory before being encoded, the encoder has at least part of the future of an image available when encoding that image. This storage thus allows the implementation of image analysis and processing tools from which the encoder can benefit when encoding the images of the video stream.

Figure 2 is a block diagram illustrating the storage technique.

Referring to Figure 2, a storage unit (120) comprising a memory is interposed between the encoder (100) and the arrival of the video input signal (101) (corresponding to a source video sequence comprising a set of images) in order to store in the memory N images (121) of the source video sequence (images of temporal indices t + 1, t + 2, ..., t + N) before encoding. The encoder performs the encoding of k images (112) of the source video sequence (images of temporal indices t, t-1, ..., t-k) according to the chosen encoding sequence, and outputs a compressed video (113). An analysis unit (130) of the images of the source video sequence can process one or more images (121) stored by the storage unit (120), so as to provide the encoder (100) with results of analysis of these images (121).

However, in the case of a real-time encoding system, the storage introduces a delay equal to the number of images it contains. In practice, encoders typically implement storage delays of 0.5 to 5 seconds. Two non-limiting examples of image analysis of the source video sequence that can use the storage technique are described below: the detection of transitions, and the “macroblock tree” algorithm.

Example 1: detection of transitions

A video encoder encodes the images making up the video sequence to be encoded according to an encoding sequence defining an encoding order of the images of the video sequence. Depending on the prediction mode used for encoding the images, several types of images can be defined, in order to distinguish the images encoded independently of other images (called “Intra” or “I” images) from images using encoding with unidirectional temporal prediction (called “Inter-predicted” or “P” images) and from images using encoding with bidirectional temporal prediction (called “Inter-bi-predicted” images). The bi-predicted images which can serve as a reference for other images are usually denoted Br. Conversely, those which cannot serve as a reference are denoted B. The coding cost of the Inter-bi-predicted images is lower than that of the Inter-predicted images, which is itself lower than that of the Intra images.

The reference encoders of conventional standards set the succession of image types a priori. Some advanced encoders, on the other hand, dynamically adapt image types to the content, in particular to the presence of transitions between video scenes. In the case of an abrupt transition ("scene cut"), it is desirable to encode the first image of the new scene as an Intra image, in order to avoid unnecessary dependencies on the previous scene. Likewise, it is desirable to avoid Intra images at the end of a scene, as this is an unnecessary expense of bits.
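By way of non-limiting illustration, the adaptation of image types to abrupt transitions can be sketched as follows. The detection metric (mean absolute difference between consecutive images), the threshold value and the names `mean_abs_diff` and `assign_picture_types` are assumptions of this sketch; it does not handle gradual transitions and is not the detection method of any particular encoder.

```python
from typing import List

Frame = List[List[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two images of equal size."""
    count = len(a) * len(a[0])
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / count


def assign_picture_types(frames: List[Frame], cut_threshold: float = 30.0) -> List[str]:
    """Mark the first image and every image following an abrupt change as "I"."""
    if not frames:
        return []
    types = ["I"]
    for previous, current in zip(frames, frames[1:]):
        is_cut = mean_abs_diff(previous, current) > cut_threshold
        types.append("I" if is_cut else "P")
    return types


frames = [
    [[10, 10], [10, 10]],
    [[12, 11], [10, 9]],        # same scene, small change
    [[200, 210], [190, 205]],   # abrupt change: new scene
    [[201, 209], [191, 204]],
]
picture_types = assign_picture_types(frames)
```

Only the image that opens the new scene is coded as an Intra image in this example, while the other images remain predicted from their predecessor.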

Thus, it can be useful when encoding an image to know if there will be a transition in the near future, in order to choose the most suitable type of image. Storing makes it possible to execute methods for detecting transitions upstream from encoding, so as to provide the encoder with the transitions detected for the choice of the most suitable encoding mode as a function of the criterion used.

[0100] Example 2: "Macroblock tree"

[0101] Bit rate control is an important element in the design of a video encoder, because it very strongly conditions the quality of the encoding. The bit rate is allocated by adjusting the quantization parameter ("QP"). In recent codecs, the QP is configured at the level of a spatial subset of pixels, that is to say at the level of an elementary image portion. For example, in the HEVC codec, several types of nested partitions are defined: a partition in Coding Units (CU), a partition in Prediction Units (PU), and a partition in Transform Units (TU). The QP is set at the TU level. For AVC / H.264, the QP is set at the level of the elementary image portion called "macroblock". For the future VVC codec, the QP is set at the level of the elementary image portion called the coding unit (CU). The bit rate control must decide for each image, and then for each block of the partitioned image, which QP to use. We speak of temporal and spatial allocation respectively. As indicated above, the QP of a block can be allocated by minimizing a rate-distortion criterion, preferably with a so-called perceptual distortion measurement. It is also possible to take into account the phenomena of temporal propagation. Indeed, at the level of the image, the cost of coding an image depends on the quality of its reference, that is to say on the image used for the prediction of the image being encoded. The better the reference, the better the prediction.

When encoding an image, it can therefore be useful to ask whether this image will serve as a reference for other images in the future. If this is the case, it will be preferable to allocate more bit rate to this image than to the following ones. For example, an I image is usually very important for the future. By contrast, a B image matters little, since it does not serve as a reference at all. The same reasoning applies at the block level (or, depending on the case, at the macroblock, TU or CU level). A block that serves as a reference for many other blocks in the future is more important than a block that does not serve as a reference. However, to know this information, one must not only look into the future, but also take into account the movement of objects. The so-called “macroblock tree” (MB tree) algorithm is an example of implementation of this principle which provides a high gain in coding efficiency.

The MB tree algorithm makes it possible, from an image cut into blocks, to adjust the encoding parameters for each block on the basis of a criterion determining whether the block will serve as a reference in the future, that is to say whether it will be used for the prediction of other blocks belonging to one or more images which will be encoded subsequently.
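By way of non-limiting illustration, the propagation principle can be sketched as follows, in the spirit of published descriptions of macroblock-tree rate control. This sketch makes strong simplifying assumptions that are not part of the present description: each image references only the previous image in encoding order, motion is zero (a block references the co-located block), and per-block Intra and Inter cost estimates are given as inputs. The names `propagation_costs`, `qp_offsets` and the `strength` parameter are hypothetical.

```python
import math
from typing import List


def propagation_costs(intra: List[List[float]],
                      inter: List[List[float]]) -> List[List[float]]:
    """Accumulate, per block, how much future images depend on it.

    intra[t][b] and inter[t][b] are estimated coding costs of block b of
    image t (encoding order). Images are visited from the last to the first.
    """
    n_frames, n_blocks = len(intra), len(intra[0])
    propagate = [[0.0] * n_blocks for _ in range(n_frames)]
    for t in range(n_frames - 1, 0, -1):
        for b in range(n_blocks):
            if intra[t][b] <= 0:
                continue
            # Fraction of the block's information inherited from its reference.
            fraction = 1.0 - min(inter[t][b], intra[t][b]) / intra[t][b]
            propagate[t - 1][b] += (intra[t][b] + propagate[t][b]) * fraction
    return propagate


def qp_offsets(intra_row: List[float], propagate_row: List[float],
               strength: float = 2.0) -> List[float]:
    """Lower the QP (negative offset) of blocks that are heavily referenced."""
    return [
        -strength * math.log2(1.0 + p / i) if i > 0 else 0.0
        for i, p in zip(intra_row, propagate_row)
    ]


# Block 0 is well predicted over time; block 1 is not predicted at all.
intra_costs = [[100.0, 100.0], [100.0, 100.0], [100.0, 100.0]]
inter_costs = [[0.0, 0.0], [10.0, 100.0], [10.0, 100.0]]
prop = propagation_costs(intra_costs, inter_costs)
offsets = qp_offsets(intra_costs[0], prop[0])
```

In this example the first block of the first image accumulates a large propagation cost and receives a negative QP offset, while the second block, which is never usefully referenced, receives none.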

[0104] Example 3: coding complexity / cost prediction

[0105] As explained in the previous example, bit rate control is an important element in the design of a video encoder. In Example 2, the "MB tree" algorithm is used to determine the relative importance of the blocks of an image for future images. This provides a criterion for the allocation of spatial throughput through the setting of the QP per block. Another criterion for bit rate allocation is the temporal evolution of the coding cost of the blocks.

[0106] The intrinsic characteristics of a video vary over time. Thus, if we encode a video at a given fixed QP, we observe variations in bit rate over time, because all the images / blocks do not contain the same amount of information.

The principle of constant bit rate control ("Constant Bit Rate", or "CBR") is to smooth the variations of the bit rate by modifying the QP over time and by relying on a memory buffer of given fixed size to absorb the residual variations ("Video Buffering Verifier", or "VBV"). As the QP varies over time, the quality of the rendered video varies accordingly. To ensure consistent perceived quality for viewers, the quality should vary as smoothly as possible, while respecting the constraints of fixed bit rate and VBV buffer size. To achieve this goal, it is useful to anticipate variations in the amount of information over time, which can be done using the storage (or “lookahead”) principle described above, or using two-pass encoding. As noted above, the downside of the storage principle is that it introduces latency. As for two-pass encoding, it does not work for live streaming.
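By way of non-limiting illustration, the interaction between a fixed bit rate, a buffer of fixed size and the QP can be sketched with a simple encoder-side leaky-bucket model. The buffer model, the half-full target, the one-step QP adjustment and the names `simulate_buffer` and `next_qp` are assumptions of this sketch; the QP range 0 to 51 corresponds to 8-bit H.264 / AVC and HEVC.

```python
from typing import List


def simulate_buffer(frame_bits: List[float], bitrate: float, fps: float,
                    initial_fullness: float = 0.0) -> List[float]:
    """Track buffer fullness: each image adds its bits, the channel drains
    bitrate / fps bits per image period. Fullness cannot go below zero."""
    drain_per_frame = bitrate / fps
    fullness = initial_fullness
    trace = []
    for bits in frame_bits:
        fullness = max(0.0, fullness + bits - drain_per_frame)
        trace.append(fullness)
    return trace


def next_qp(qp: int, fullness: float, buffer_size: float,
            target: float = 0.5, step: int = 1) -> int:
    """Raise the QP when the buffer is above target, lower it when below."""
    if fullness > target * buffer_size:
        qp += step
    elif fullness < target * buffer_size:
        qp -= step
    return max(0, min(51, qp))


# 25 images per second at 25 000 bit/s: 1000 bits drained per image period.
trace = simulate_buffer([1000, 3000, 500], bitrate=25000, fps=25)
qp_after_peak = next_qp(30, trace[1], buffer_size=3000)
```

Changing the QP by a single step per image is one possible way of keeping quality variations smooth; anticipating the size of upcoming images, as discussed above, would allow the adjustment to start before the buffer fills.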

When the configuration of the analysis unit allows the execution of different analyses of the images stored in memory in the storage unit, for example in parallel, different orderings of the images in the storage memory can be considered. Indeed, the use of type B images involves reordering the images during encoding and decoding, since if a B image refers to an image of the past and an image of the future, the image of the future must be encoded / decoded before the B image, as shown in figure 3.

FIG. 3, which illustrates an example of reordering of images for encoding with B images, shows three images, one of type I, one of type B, and one of type P, placed in display order (image of index 1 of type I, then image of index 2 of type B, and image of index 3 of type P) and in encoding / decoding order (image of index 1 of type I, then image of index 3 of type P, and image of index 2 of type B). In the figure, the arrows indicate the reference images: the image of type B (index 2) uses the two images of indexes 1 and 3 as reference images, while the image of index 3 uses the image of index 1 as the reference image.
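By way of non-limiting illustration, the reordering described for FIG. 3 can be sketched as follows. The rule applied here (each run of B images is encoded after the next I or P image) and the function name `display_to_encoding_order` are assumptions of this sketch; it does not model referenced B images (Br) or hierarchical structures.

```python
from typing import List


def display_to_encoding_order(types: List[str]) -> List[int]:
    """Return 1-based display indices listed in encoding order."""
    order: List[int] = []
    pending_b: List[int] = []
    for index, picture_type in enumerate(types, start=1):
        if picture_type == "B":
            # A B image must wait for its future reference to be encoded.
            pending_b.append(index)
        else:
            order.append(index)
            order.extend(pending_b)
            pending_b.clear()
    # Trailing B images without a future reference are kept at the end here.
    order.extend(pending_b)
    return order


fig3_order = display_to_encoding_order(["I", "B", "P"])
longer_order = display_to_encoding_order(["I", "B", "B", "P", "B", "P"])
```

For the three images of FIG. 3, this yields the encoding order 1, 3, 2 described above.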

[0110] Depending on the embodiment, some of the planned analyses of the images of the input video sequence will be easier in display order, while others will be better suited to encoding order. These images being stored in the storage unit, it is possible to provide several ways of structuring the memory of the storage unit by integrating one or more reorderings of the images of the input video sequence which are stored there.

FIG. 4a illustrates an example of the configuration of a memory buffer of a storage unit comprising 3 parts. As illustrated in the figure, the first part could for example store Na images of the input video sequence in display order, the second part could store Nr images in a temporary reordering order, and the third part could store Ne images in encoding order, the total number of images stored being equal to N (N = Na + Nr + Ne). The use of a reordering buffer allows the temporary storage of images received in display order, in order to produce a sequence of images in encoding order. For example, if the images are received at the input of the encoder in a certain order, for example the images of indices 1, 2, 3, but image 3 is to be encoded second, image 2 can be stored temporarily in a reordering buffer while waiting to receive image 3.

[0112] FIG. 4b illustrates an example of an encoder incorporating an analysis module of the "MB Tree" type and using a storage unit whose memory has a structure comprising three memory buffers, as illustrated in FIG. 4a. The MB Tree algorithm uses the images stored in encoding order to provide the encoder with propagation costs, in order to improve the efficiency of the encoding performed by the encoder to produce the compressed video. In this example, the “lookahead” part and the encoding part can be built from any implementation of an HEVC encoder, such as the x265 implementation.

[0113] FIG. 4c illustrates a more detailed example of the structure of the storage of images from a source video sequence within a storage unit. In this figure, each column corresponds to an image stored in the storage unit. The first part, on the left, contains Na images in order of display. The second part, in the middle, contains Nr images being reordered. The third part, on the right, contains Ne images in order of encoding. This corresponds exactly to the structure of Figure 4a.

For each image, the index of the image in display order is indicated and, for each image stored in the second and third parts, the type of image and the index of the image in encoding order are also indicated. The type of image can for example be generated by analyzing the images in display order, and recorded in memory, for example at the level of the reordering buffer. Thus in the left part, the index in display order is sorted, while in the right part, it is the index in encoding order which is sorted.

[0116] Each time an image is encoded, this storage in memory is updated (one image leaves to be encoded, another enters to be stored in memory) while retaining its properties.

Thus, since the detection of transitions, and in particular of gradual transitions, is carried out more easily in display order, whereas the MB tree is concerned with the propagation of references in encoding order, FIG. 4a illustrates an example of a storage configuration in a storage unit allowing an analysis module to execute these two algorithms, including in parallel.

[0118] FIG. 5 is a diagram illustrating the method proposed according to one or more embodiments.

We consider the case of an image (called "current image") from a set of images, for example a sequence of images corresponding to a video sequence, and cut into blocks, the encoding of which is performed by encoding blocks, each block being encoded according to one of a plurality of encoding modes.

[0120] The coding of a current block is thus considered according to one coding mode from among a plurality of coding modes, for example comprising one or more coding modes of the temporal correlation prediction type using a plurality of images of the sequence of images and / or one or more coding modes of the spatial correlation prediction type on the image being encoded. The images in the set of images may be encoded in a sequence defining an encoding sequence of the images in the set of images.

With reference to FIG. 5, a prediction of at least one characteristic of the current block is determined (200) in one or more images of the set of images distinct from the current image and not yet encoded according to an encoding sequence, based on at least one image distinct from the current image and previously encoded according to the encoding sequence of the images of the set of images.

The determined prediction is then used (201) for the encoding of the current block, for example by minimizing a rate-distortion criterion in order to select, from among a plurality of encoding modes, an encoding mode for the current block considered to be optimal with regard to a decision criterion. An example of such a criterion, denoted $J$, is of the form $J = D + \lambda R$, where $D$ denotes the distortion, $\lambda$ is a Lagrange multiplier and $R$ designates the bit rate associated with the coding of the estimated decision. Different types of criteria can be used, such as criteria using a so-called "objective" metric for calculating the distortion $D$, such as the sum of absolute differences ("Sum of Absolute Differences", or SAD) or the mean square error ("Mean Square Error", or MSE), or criteria incorporating a measure of visual distortion (also called "subjective distortion"). For example, the correlation between a block and its displacement according to a motion estimation vector can be calculated using the Sum of Absolute Differences (SAD):

[0123] $$\mathrm{SAD} = \sum_{x} \sum_{y} \left| p_{xy} - p'_{xy} \right|$$

[0124] where $p_{xy}$ is the pixel at position $(x, y)$ of the original block and $p'_{xy}$ the pixel at position $(x, y)$ of the reference block. A low SAD will be interpreted as an indication that the two blocks are very similar.
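By way of non-limiting illustration, the selection of a coding mode by minimizing a criterion of the form J = D + λR, with the SAD as distortion measure, can be sketched as follows. The candidate predictions, the rates in bits and the names `sad` and `choose_mode` are hypothetical values and helpers introduced for this sketch only.

```python
from typing import List, Optional, Tuple

Block = List[List[int]]
Candidate = Tuple[str, Block, float]  # (mode name, prediction block, rate in bits)


def sad(original: Block, reference: Block) -> int:
    """Sum of absolute differences between two blocks of equal size."""
    return sum(
        abs(p - q)
        for row_p, row_q in zip(original, reference)
        for p, q in zip(row_p, row_q)
    )


def choose_mode(current: Block, candidates: List[Candidate],
                lam: float) -> Tuple[str, float]:
    """Return the (mode, cost) pair minimizing J = D + lam * R."""
    best: Optional[Tuple[str, float]] = None
    for name, prediction, rate in candidates:
        cost = sad(current, prediction) + lam * rate
        if best is None or cost < best[1]:
            best = (name, cost)
    if best is None:
        raise ValueError("at least one candidate mode is required")
    return best


current = [[10, 20], [30, 40]]
candidates = [
    ("Intra", [[10, 20], [30, 40]], 40.0),  # exact prediction, expensive to signal
    ("Inter", [[11, 20], [30, 42]], 12.0),  # small error, moderate rate
    ("Skip", [[14, 24], [34, 44]], 1.0),    # larger error, almost free
]
decision = choose_mode(current, candidates, lam=1.0)
```

With these hypothetical values, a multiplier of 0 selects the mode with the lowest distortion, whereas a large multiplier favors the cheapest mode in bits, which illustrates the trade-off expressed by the criterion.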

The proposed method introduces the use of a prediction of one or more characteristics of the current block (i.e. the block being encoded), determined in the future of the video, that is to say in one or more images of the set of images (typically of the video sequence) which have not yet been encoded according to the encoding sequence. In one or more embodiments, this prediction is calculated on the basis of the images of the past, that is to say on the basis of one or more images which have been previously encoded according to the encoding sequence. Determining this prediction makes it possible to dispense, in whole or in part, with the use of the storage technique (called "lookahead") to keep in memory a knowledge of the future of the video relative to the image being encoded, and thus to reduce the processing latency associated with this storage. Indeed, the proposed method can be implemented by using, for the determination of the prediction, only images which have already been encoded and are therefore already available at the encoder, or by storing a smaller number of images not yet encoded. This provides good performance in terms of latency, bit rate and video quality, which can be compatible with video encoding for “live” broadcasting.

[0127] In one or more embodiments, the prediction of the current block can further be determined on the basis of at least one image not yet encoded according to the encoding sequence. Thus, the determination of the prediction can in certain embodiments use at least one already encoded image of the set of images (i.e. at least one image from the past) and at least one not yet encoded image of the set of images (i.e. at least one image from the future). In one or more embodiments, one can use a lookahead to have images of the future available for the determination of the prediction, while limiting the number of images of the set of images which are stored in memory in order to obtain a gain in latency reduction.

In one or more embodiments, the predicted characteristic or characteristics of the current block may correspond to the results of analyzes to be carried out upstream of the encoding in order to improve its efficiency as described above. For example, in correspondence with an MB Tree type analysis, the predicted characteristic could correspond to a score which indicates the persistence of the current block as a reference for encoding images of the future. This score may, depending on the embodiment, include a cost of propagation of the current block in images of the future (that is to say not yet encoded) of the set of images.

In correspondence with a transition detection analysis, the predicted characteristic could correspond, in one or more embodiments, to a score indicating the presence or absence of a transition (for example a score indicating whether or not the image belongs to the same scene as the preceding image).
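To make the notion of a transition score concrete, the sketch below computes a simple normalized difference between two images of the past. It is only an assumed baseline measure (mean absolute luma difference), not the prediction described here; the function name and the 8-bit sample range are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of luma samples


def transition_score(previous: Image, current: Image,
                     max_value: int = 255) -> float:
    """Return a score in [0, 1]; higher values suggest a scene change.

    Assumption: both images have the same dimensions and hold integer
    luma samples in the range 0..max_value.
    """
    total = sum(abs(a - b)
                for row_prev, row_cur in zip(previous, current)
                for a, b in zip(row_prev, row_cur))
    count = sum(len(row) for row in current)
    return total / (count * max_value)
```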

In correspondence with a rate control analysis, the predicted characteristic may correspond, in one or more embodiments, to a score which indicates for the current block the evolution of the quantity of information over time, that is to say in blocks of images of the future (not yet encoded) corresponding to the current block.

In what follows, we consider image encoding systems configured for the implementation of the method proposed according to one or more embodiments comprising an encoder provided with a buffer memory configured to store k images, including an image being encoded and k - 1 images already encoded. Images already encoded and stored at the encoder may be considered as images from the past compared to the image being encoded with regard to the image encoding sequence used by the encoder. There are no special penalties for storing these images in memory, apart from some memory consumption. These images from the past are therefore considered to be still available.
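The buffer memory of k images described above can be sketched as a fixed-size queue in which the most recent entry is the image being encoded and the remaining entries are images of the past. The class and attribute names below are assumptions introduced for illustration.

```python
from collections import deque


class PastImageBuffer:
    """Hold the image being encoded plus up to k - 1 already encoded images."""

    def __init__(self, k: int):
        if k < 1:
            raise ValueError("k must be at least 1")
        # deque(maxlen=k) silently discards the oldest image when full.
        self._images = deque(maxlen=k)

    def push(self, image):
        """Register a new image as the one being encoded."""
        self._images.append(image)

    @property
    def current(self):
        """Image being encoded (time index t)."""
        return self._images[-1]

    @property
    def past(self):
        """Already encoded images, oldest first."""
        return list(self._images)[:-1]
```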

[0132] FIG. 6a illustrates an image encoding system configured for implementing the method proposed according to one or more embodiments.

With reference to FIG. 6a, a storage unit (120a) comprising a memory is interposed between the encoder (100a) and the arrival of the source video sequence (101a) in order to store in memory n images (121a) of the source video sequence (not yet encoded images with time indices t + 1, t + 2, ..., t + n) before encoding. The encoder performs the encoding of k images (112a) of the source video sequence (image being encoded with time index t, and images already encoded with respective time indices t-1, ..., t-k) according to the chosen encoding sequence, and outputs a compressed video (113a). A prediction unit (131a) of the images of the source video sequence is configured to predict the evolution of the video in the future, from one or more images of the past (images (112a) already encoded or being encoded by the encoder), and possibly from one or more images of the future (images (121a) not yet being encoded, and stored in memory, for example in the storage unit (120a)). In one or more embodiments, the prediction unit (131a) of the images of the source video sequence is configured to provide the analysis unit (130a) with predictions of at least one characteristic of blocks of the images to be encoded on the basis of one or more images among the images (112a) of the past (image being encoded with time index t, and images already encoded with respective time indices t-1, ..., t-k) stored in the memory of the encoder (100a), as well as, in the embodiments illustrated in FIG. 6a, on the basis of one or more images among the images (121a) of the future (images not yet encoded with time indices t + 1, t + 2, ..., t + n) stored in the memory of the storage unit (120a). The analysis unit (130a) can be configured to perform certain analysis processing of the images to be encoded, such as, for example, as described above, detection of transitions and / or the implementation of an algorithm of the “MB Tree” type, on the basis of the predictions supplied to it by the prediction unit (131a). Preferably, the number (n) of images stored in the memory of the storage unit (120a), in embodiments using a storage unit, will be less than the number (N) of images stored in the memory of the storage unit when the proposed prediction module is not used (for example as illustrated in FIG. 2), so as to reduce the latency induced by this storage, even when storage of images of the source video sequence before encoding is used.
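The latency benefit of choosing n smaller than N can be estimated with a simple calculation, assuming that the latency induced by the storage unit is dominated by the time needed for the stored images to arrive at a constant frame rate. The function name and the numerical values in the usage note are hypothetical.

```python
def lookahead_latency_seconds(stored_images: int, frame_rate: float) -> float:
    """Approximate latency added by buffering `stored_images` images.

    Assumption: images arrive at a constant `frame_rate` (images per
    second) and all other processing delays are ignored.
    """
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return stored_images / frame_rate
```

For example, at an assumed 50 images per second, reducing the storage from N = 40 images to n = 8 images would reduce this component of the latency from about 0.8 s to about 0.16 s.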

[0134] Thus, the analysis of future images can be reconstructed without generating latency, or by generating a latency corresponding to the amount of storage used, for example the number of images stored in the memory of the storage unit used.

[0135] FIG. 6b is a diagram illustrating a system configured for the implementation of the method proposed according to one or more embodiments, corresponding to a variant of the implementation illustrated by FIG. 6a in which the analysis unit (130a) and the prediction unit (131a) are grouped together within an analysis prediction unit (132a), the functionalities of which correspond to the combined functionalities of the analysis unit (130a) and of the prediction unit (131a). In one or more embodiments, the analysis prediction unit (132a) of the images of the source video sequence is configured to provide the encoder (100a) with predictions of at least one characteristic of blocks of the images to be encoded on the basis of one or more images among the images (112a) of the past (image being encoded with time index t, and images already encoded with respective time indices t-1, ..., t-k) stored in the memory of the encoder (100a), as well as, in the embodiments illustrated in FIG. 6b, on the basis of one or more images among the images (121a) of the future (images not yet encoded with time indices t + 1, t + 2, ..., t + n) stored in the memory of the storage unit (120a). The encoder (100a) can be configured to use certain processing for analyzing the images to be encoded, on the basis of the predictions and the analysis results supplied to it by the analysis prediction unit (132a).

[0136] FIG. 6c is a diagram illustrating a system configured for the implementation of the method proposed according to one or more other embodiments not using storage before encoding of images of the source video sequence.

With reference to FIG. 6c, and unlike the diagrams of FIGS. 6a and 6b, no storage unit is interposed between the encoder (100b) and the arrival of the source video sequence (101b) to store images of the source video sequence in memory before encoding. The encoder performs the encoding of k images (112b) of the source video sequence (image being encoded with time index t, and images already encoded with respective time indices t-1, ..., t-k) according to the encoding sequence chosen, and outputs a compressed video (113b). An analysis prediction unit (131b) of the images of the source video sequence is configured to provide the encoder (100b) with predictions of block characteristics of the images to be encoded only on the basis of one or more images among the images (112b) of the past (image currently being encoded with time index t, and images already encoded with respective time indices t-1, ..., t-k) stored in the memory of the encoder (100b). The encoder (100b) can be configured to perform or use certain analysis processing of the images to be encoded, on the basis of the predictions supplied to it by the analysis prediction unit (131b). Thus, in this case, the analysis prediction unit (131b) relies only on images from the past to predict the desired analysis results. The latency (otherwise induced by storing images of the source video in memory before encoding) is then advantageously reduced to its minimum.
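The data flow of FIG. 6c can be sketched as an encoding loop in which each image is encoded as soon as it arrives, with a prediction computed only from the images held at the encoder. The `predict` and `encode` callables are hypothetical placeholders for the analysis prediction unit and the encoder; nothing here reflects a particular codec implementation.

```python
from collections import deque
from typing import Any, Callable, Iterable, List, Sequence


def encode_without_lookahead(
    frames: Iterable[Any],
    k: int,
    predict: Callable[[Sequence[Any]], Any],
    encode: Callable[[Any, Any], Any],
) -> List[Any]:
    """Encode each image using predictions derived from past images only.

    Assumption: `predict` receives the current image together with up to
    k - 1 earlier images (oldest first) and returns predicted block
    characteristics; `encode` consumes the image and that prediction.
    """
    window = deque(maxlen=k)  # image t and the images before it
    compressed = []
    for frame in frames:
        window.append(frame)  # no waiting for images of the future
        prediction = predict(list(window))
        compressed.append(encode(frame, prediction))
    return compressed
```

Because the loop never waits for images t + 1, t + 2, and so on, the only buffering delay in this sketch is that of the image being encoded.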

[0138] FIG. 6d is a diagram illustrating a system configured for the implementation of the method proposed according to one or more embodiments.

With reference to FIG. 6d, a storage unit (120c) comprising a memory is interposed between the encoder (100c) and the arrival of the source video sequence (101c) in order to store in memory n images (121c1; 121c2; 121c3) of the source video sequence before encoding. As illustrated in FIG. 4a, the storage within the storage unit (120c) is structured with three memories: a first memory (121c1) stores images of the source video sequence in the display order of the images, a second memory (121c2) stores images of the source video sequence after reordering relative to their display order, and a third memory (121c3) stores images of the source video sequence in the encoding order of these images. The encoder (100c) performs the encoding of k images (112c) of the source video sequence (image being encoded with time index t, and images already encoded with respective time indices t-1, ..., t-k) according to the chosen encoding sequence, and outputs a compressed video (113c). An analysis prediction unit (132c) of the images of the source video sequence is configured to predict the evolution of the video in the future, from one or more images from the past (images (112c) already encoded or being encoded by the encoder), and possibly from one or more images from the future (images (121c1; 121c2; 121c3) not yet being encoded, and stored in memory, for example in the storage unit (120c)). In one or more embodiments, the analysis prediction unit (132c) of the images of the source video sequence is configured to provide the encoder (100c) with predictions of at least one characteristic of blocks of the images to be encoded on the basis of one or more images among the images (112c) of the past (image being encoded with time index t, and images already encoded with respective time indices t-1, ..., t-k) stored in the memory of the encoder (100c), as well as, in the embodiments illustrated in FIG. 6d, on the basis of one or more images among the images (121c1; 121c2; 121c3) of the future stored in the memory of the storage unit (120c). The encoder (100c) can be configured to use certain processing for analyzing the images to be encoded, on the basis of the predictions and the analysis results supplied to it by the analysis prediction unit (132c). As indicated above, the number (n) of images stored in the memory of the storage unit (120c), in embodiments using a storage unit, will preferably be chosen to be less than the number (N) of images stored in the memory of the storage unit when the proposed prediction module is not used (for example as illustrated in FIG. 2), so as to obtain a reduction in latency.
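The difference between display order and encoding order handled by the memories of the storage unit can be illustrated as follows. The sketch assumes a fixed structure in which each anchor image is encoded before the bidirectionally predicted images that precede it in display order; this structure, the function name and the `b_frames` parameter are assumptions and are not imposed by the present description.

```python
from typing import List


def display_to_encoding_order(num_images: int, b_frames: int = 2) -> List[int]:
    """Return display indices listed in the order they would be encoded.

    Assumption: an anchor image occurs every `b_frames + 1` images and is
    encoded before the intermediate images that reference it.
    """
    order: List[int] = []
    previous_anchor = -1
    anchor = 0
    while previous_anchor < num_images - 1:
        anchor = min(anchor, num_images - 1)  # last group may be shorter
        order.append(anchor)
        # Intermediate images follow the anchor they depend on.
        order.extend(range(previous_anchor + 1, anchor))
        previous_anchor = anchor
        anchor += b_frames + 1
    return order
```

With seven images and two intermediate images per group, the display order 0 to 6 becomes the encoding order 0, 3, 1, 2, 6, 4, 5, which shows why images must be held in memory while waiting for their anchor.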

[0140] FIG. 6e is a diagram illustrating a system configured for the implementation of the method proposed according to one or more embodiments, corresponding to a variant of the implementation illustrated by FIG. 6d in which the storage unit (120d) comprises only one memory (121d2), in which a reordering of the images of the source video sequence (101d) is carried out. This limitation of the number of images stored in the memory of the storage unit, and therefore of the size of that memory, with respect to the embodiments illustrated in FIG. 6d, reduces the latency induced by the use of the storage unit, while keeping an analysis prediction unit (132d) configured to predict at least one characteristic for the blocks of the images of the future, that is to say the images which follow, in the encoding sequence, the images being encoded (112d) within the encoder (100d), on the basis of the images being encoded (112d) and the images (121d2) stored in the memory of the storage unit (120d), for the production of the compressed video (113d).

[0141] In one or more embodiments, the prediction of a characteristic of the current block can be determined using an artificial intelligence algorithm, such as for example a supervised learning algorithm.

[0142] FIG. 4b illustrates an image encoding system using a unit for storing and reordering not yet encoded images of the source video sequence in order to supply them to an analysis unit implementing an “MB Tree” type algorithm, which determines the costs of propagation of a current block in the images stored in the memory of the storage unit. With reference to this figure, the analysis unit implementing the “MB Tree” type algorithm, as well as the storage unit, can be wholly or partly replaced in one or more embodiments by a propagation cost prediction unit which relies only on past images, that is to say on images available at the encoder. In one or more embodiments, this prediction unit can be configured to implement an artificial intelligence algorithm, and for example use a neural network. For example, in one or more embodiments, the prediction unit may be configured to implement a supervised machine learning algorithm.

[0143] The implementation of an artificial intelligence algorithm can, in one or more embodiments, involve performing a learning phase in order to determine parameters of a prediction model, prior to a so-called inference phase during which the prediction model is used to determine the prediction of the characteristic of the current block. The learning phase can be performed on a set of images different from the set of images comprising the images to be encoded, so that the algorithm used for determining the prediction has been trained on data different from those to which the prediction is applied when encoding the images of a set of images.

In one or more embodiments, the learning phase can comprise the determination of reference data comprising a reference prediction of the characteristic of a current block on which the learning is performed, the current block belonging to an image of the set of images on which the learning is carried out, called the current image, in an image of the set of images distinct from the current image and not yet encoded according to the encoding sequence of the set of images on which the learning is performed, on the basis of another image of the set of images distinct from the current image and not yet encoded according to the encoding sequence. In one or more embodiments, the reference data can thus correspond to the characteristic that one seeks to predict using the artificial intelligence algorithm.

[0146] These reference data (for example, in one or more embodiments, the reference prediction of the current block) can be used to carry out a training phase on a neural network, further on the basis of input data, in order to generate a prediction model of a characteristic of a current block in the images of the set of images not yet encoded according to an encoding sequence, for example to determine parameters of the model. The prediction model can then be used to determine a prediction of the characteristic in one or more not yet encoded images, based on at least one already encoded image, as provided in one or more embodiments.

[0147] For example, in one or more embodiments, the reference data used during the learning phase may comprise data generated by an analysis unit implementing an “MB Tree” type algorithm, comprising for example block propagation costs of one or more images not yet encoded which are stored in memory, for example of a storage unit as illustrated in FIG. 4b. The reference data can thus for example comprise the results of an “MB Tree” type algorithm applied to the current image of the learning phase.

[0148] The input data of the prediction model used for the learning phase of this model can include, in one or more embodiments, data of the current image of the learning phase.

In one or more embodiments, the input data can comprise data of an image preceding the current image according to an encoding sequence used for the learning phase, that is to say of an image already encoded prior to the encoding of the current image. For example, the input data may include data from the image immediately preceding the current image in the encoding sequence used for the learning phase.

[0150] In one or more embodiments, the input data may comprise data for estimating the movement of the current block between the current image and an image preceding the current image according to the encoding sequence used for the learning phase. Depending on the embodiment, this estimation data may include a motion estimation vector for the current block of the learning phase, as well as optionally an objective distortion metric value, such as the sum of absolute differences (SAD) or the mean square error (MSE), for the current block.
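As a minimal sketch of how such motion estimation data might be obtained, the following code performs an exhaustive block-matching search and returns a motion vector together with its SAD value. The search strategy, block size and search range are assumptions chosen for brevity; practical encoders typically use faster searches.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # rows of luma samples


def block_sad(cur: Image, ref: Image, bx: int, by: int,
              dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between a block and its displaced match."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))


def estimate_motion(cur: Image, ref: Image, bx: int, by: int,
                    size: int = 4, search: int = 2
                    ) -> Tuple[Tuple[int, int], Optional[int]]:
    """Return ((dx, dy), sad) minimizing SAD within +/- `search` samples.

    Assumption: the block at (bx, by) lies entirely inside both images.
    """
    height, width = len(ref), len(ref[0])
    best_mv: Tuple[int, int] = (0, 0)
    best_sad: Optional[int] = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that fall outside the reference image.
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue
            cost = block_sad(cur, ref, bx, by, dx, dy, size)
            if best_sad is None or cost < best_sad:
                best_mv, best_sad = (dx, dy), cost
    return best_mv, best_sad
```

The returned pair corresponds to the motion vector (MV) and objective metric value (SAD) that can be included in the input data of the prediction model.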

[0151] Thus, in one or more embodiments in which the plurality of coding modes comprises at least one coding mode of the prediction type by temporal correlation using a plurality of images of the set of images used for the learning, for example a coding mode of the Inter type, the proposed method can comprise the determination of a motion estimation vector of the current block, the motion estimation vector pointing to a block correlated with the current block in an image of the set of images distinct from the current image and previously encoded according to the encoding sequence of the images of the set of images used for training.

The learning of the prediction model, for example of the neural network in the cases where the prediction model is implemented using a neural network, can, in one or more embodiments, be performed using the motion estimation vector determined for the current block. Depending on the embodiment, training the prediction model, for example the neural network, may further use an objective distortion metric value, such as the sum of absolute differences (SAD) or the mean square error (MSE), for the current block.

[0153] As indicated above, in one or more embodiments, as an alternative or in addition to the use of motion estimation data, the training of the neural network can be performed based on the current image of the learning phase (the current image then being included in the input data of the network for learning), and / or on the basis of an image of the set of images distinct from the current image and previously encoded according to the sequence for encoding the images of the set of images used for training (this image then being included in the input data of the network for training).

[0154] In one or more embodiments, the learning phase therefore makes it possible to determine parameters of a prediction model, on the basis of input data of the model and of reference data supplied to the model for its training. Depending on the embodiment, the input data can comprise data of the current image comprising the current block on which the learning is carried out, data of an image preceding the current image in an encoding sequence of the images of the set of images used for training, and / or one or more motion vectors and an objective metric value, for example an SAD metric value, resulting from an estimation of motion between these two images. In the embodiments where the block characteristic to be predicted by the prediction model corresponds to a propagation cost representing a score of persistence of the block in the not yet encoded images of the set of images to be encoded, the reference data can comprise propagation cost data, for example obtained by applying an “MB Tree” type algorithm to the current image.
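The principle of the learning phase, namely fitting model parameters so that the model output approaches the reference data for given input data, can be sketched with a deliberately simple model. A linear model trained by gradient descent stands in here for the neural network; the feature layout, learning rate and number of epochs are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def train_linear_model(inputs: Sequence[Sequence[float]],
                       references: Sequence[float],
                       learning_rate: float = 0.1,
                       epochs: int = 2000) -> Tuple[List[float], float]:
    """Fit weights and bias minimizing the mean squared error.

    Assumption: each entry of `inputs` is a per-block feature vector (for
    example a normalized SAD value) and each entry of `references` is the
    reference value to be estimated (for example a propagation cost
    produced offline by an MB Tree type analysis).
    """
    n_samples = len(inputs)
    n_features = len(inputs[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(inputs, references):
            error = sum(w * xi for w, xi in zip(weights, x)) + bias - y
            for i, xi in enumerate(x):
                grad_w[i] += 2.0 * error * xi / n_samples
            grad_b += 2.0 * error / n_samples
        # Gradient descent step on the mean squared error.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias
```

The returned weights and bias play the role of the model parameters determined by the learning phase; in the embodiments using a neural network, the same roles are played by the network parameters.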

[0155] In one or more embodiments, the neural network used for determining a prediction of a characteristic of the current block may be of the convolutional neural network type. Such a network is typically configured to learn filtering operations, so that learning parameters of the neural network includes learning filtering parameters.
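The filtering operation that a convolutional layer learns can be illustrated by the following sketch, which applies a single two-dimensional filter to an image without padding. The function name is hypothetical, and a real network would stack many such filters with non-linearities; the coefficients of `kernel` correspond to the filtering parameters determined during learning.

```python
from typing import List, Sequence


def conv2d_valid(image: Sequence[Sequence[float]],
                 kernel: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply one filter over every position where it fits entirely.

    As in most neural network libraries, the kernel is not flipped, so
    this is strictly a cross-correlation.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(k_h) for j in range(k_w))
             for x in range(out_w)]
            for y in range(out_h)]
```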

[0156] FIG. 7a illustrates a system configured for the implementation of a learning phase according to one or more embodiments.

[0157] With reference to FIG. 7a, the system (1e) comprises a storage unit (120e) comprising a memory interposed between an encoder (100e) and the arrival of the source video sequence (101e) to store in a memory N images (121e1; 121e2; 121e3) of the source video sequence (101e) before encoding. The encoder (100e) performs the encoding of k images (112e) of the source video sequence (image being encoded with time index t, and images already encoded with respective time indices t-1, ..., t-k) according to the chosen encoding sequence, and outputs a compressed video (113e). For example, the storage unit (120e) and encoder (100e) can be built from a HEVC encoder implementation, such as x265.

A prediction unit (131e) of the images of the source video sequence is configured to predict the evolution of the video in the future, from one or more images from the past (images (112e) already encoded or being encoded by the encoder), and possibly from one or more images of the future (images not yet being encoded, and stored in memory, for example in the storage unit (120e)). In one or more embodiments, the prediction unit (131e) of the images of the source video sequence is configured to perform a training phase to generate an estimation model. In the example shown, the prediction unit (131e) comprises an analysis unit (133e) configured to execute an "MB Tree" type algorithm from image data (121e3) stored in the memory of the storage unit (120e) and to generate propagation cost data provided as reference data to a learning unit (135e) of the prediction unit (131e). The learning unit (135e) is configured to receive this reference data, as well as input data comprising data of the current image stored in the encoder (100e) (image of time index "t"), data of an image preceding the current image in the encoding sequence (for example, as illustrated in the figure, image of time index "t - 1"), as well as motion estimation data (comprising for example motion vectors (MV) and objective criterion values (SAD)) between these two images generated by a motion estimation unit (134e) of the prediction unit (131e). The reference data can thus be generated in one or more embodiments by applying an “MB Tree” type algorithm to images of the future with respect to the images being encoded. Depending on the embodiment, the input data may include data from the current image and data from the image preceding the current image in the encoding sequence without including motion estimation data between these two images, or conversely may include motion estimation data between these two images without including data from the current image or data from the image preceding the current image in the encoding sequence. The learning unit (135e) can be configured to learn to estimate the reference data from the input data supplied to it, in order to generate the parameters of an estimation model (136e).

In one or more embodiments, the system (1e) can use a neural network, to which the input data is provided for a learning phase for estimating the reference data, as described above. At the end of the learning, the parameters of the neural network are saved, the parameterized neural network providing the estimation model (136e).

[0160] FIG. 7b illustrates a system configured for implementing the method proposed according to one or more embodiments.

[0161] With reference to FIG. 7b, the system (1f) comprises a storage unit (120f) comprising a memory interposed between an encoder (100f) and the arrival of the source video sequence (101f) to store in a memory images (121f1; 121f2) of the source video sequence (101f) before encoding. The encoder (100f) performs the encoding of k images (112f) of the source video sequence (image being encoded with time index t, and images already encoded with respective time indices t-1, ..., t-k) according to the chosen encoding sequence, and outputs a compressed video (113f). For example, the storage unit (120f) and encoder (100f) can be built from the HEVC x265 encoder.

A prediction unit (131f) of the images of the source video sequence is configured to predict the evolution of the video in the future, from one or more images from the past (images (112f) already encoded or being encoded by the encoder), and possibly from one or more images of the future (images not yet being encoded, and stored in memory, for example in the storage unit (120f)). In one or more embodiments, the prediction unit (131f) of the images of the source video sequence comprises an inference unit (137f) configured to determine a prediction of a characteristic of a current block of a current image (image being encoded with time index t), and to supply the determined prediction to the encoder (100f). In the example illustrated in the figure, the predicted characteristic comprises a propagation cost of the current block, which corresponds to the learning phase illustrated by FIG. 7a, which uses reference data comprising propagation costs generated by application of an “MB Tree” type algorithm. The system (1f) therefore advantageously makes it possible to supply an encoder with propagation cost data, without having recourse to an analysis unit configured to apply an “MB Tree” type algorithm to image data of the future stored in the memory of a storage unit. In one or more embodiments, the types of input data for the learning phase can correspond to the types of input data for the inference phase (or prediction phase).

[0164] Depending on the embodiment, the input data of the inference unit (137f) can comprise data of the current image stored in the encoder (100f) (image of time index "t"), data of an image preceding the current image in the encoding sequence (for example, as illustrated in the figure, image of time index "t - 1"), as well as motion estimation data (comprising for example motion vectors (MV) and objective criterion values (SAD)) between these two images generated by a motion estimation unit (134f) of the prediction unit (131f). Alternatively, the input data can comprise data of the current image and data of the image preceding the current image in the encoding sequence without including motion estimation data between these two images, or conversely motion estimation data between these two images without including data from the current image or data from the image preceding the current image in the encoding sequence.

In one or more embodiments, the system (1f) can use a neural network, to which, after learning, input data is provided for the estimation of a characteristic of a current block of a current image in a set of images to be encoded. The inference unit (137f) can be configured to apply the neural network with the parameters saved during training. The neural network can be configured to take the same data as input as during training, and to output estimated propagation costs, replacing the propagation costs generated by an "MB Tree" type algorithm. The part of the storage unit (120f) that is no longer needed is removed, which effectively reduces encoding latency.
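The relationship between saved parameters and the inference phase can be sketched as follows. A linear model again stands in for the neural network, and the JSON layout, the function names and the clamping of negative outputs to zero are assumptions introduced for illustration only.

```python
import json
from typing import Sequence


def serialize_model(weights: Sequence[float], bias: float) -> str:
    """Save the parameters obtained at the end of the learning phase."""
    return json.dumps({"weights": list(weights), "bias": bias})


def infer_propagation_cost(serialized_model: str,
                           features: Sequence[float]) -> float:
    """Estimate a propagation cost for one block from its input features.

    Assumption: `features` has the same layout as the input data used
    during learning; a negative estimate is clamped to zero because a
    cost is taken here to be non-negative.
    """
    model = json.loads(serialized_model)
    estimate = sum(w * x for w, x in zip(model["weights"], features))
    estimate += model["bias"]
    return max(0.0, estimate)
```

In this sketch the estimate is obtained from data already available at the encoder, so no image of the future needs to be stored in order to produce it.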

[0166] FIG. 8 illustrates an exemplary architecture of a device for implementing the proposed method, according to one or more embodiments.

[0167] With reference to FIG. 8, the device 300 comprises a controller 301, operably coupled to an input interface 302, an output interface 303 and to a memory 304, which drives a prediction unit 305.

The input interface 302 is configured to receive, for example via a storage unit (to implement a functionality of the “lookahead” type) or a video encoding unit (not shown in the figure), data corresponding to images of a set of images. The input interface 302 may further be configured to receive reference data and input data, for implementations of a learning phase and an inference phase, as described above, in embodiments in which the prediction unit 305 is configured to implement an artificial intelligence algorithm such as, for example, a supervised machine learning algorithm.

[0169] The output interface 303 is configured to provide data generated by the prediction unit to a device configured to use that data, such as, for example, a video encoder unit.

[0170] The controller 301 is configured to drive the prediction unit 305 for the implementation of one or more embodiments of the proposed method.

[0171] Prediction unit 305 can be configured to determine a prediction of a characteristic of a current block, and to provide that prediction on output interface 303 to a video encoder unit. In one or more embodiments, the prediction unit 305 can be configured to implement an artificial intelligence algorithm, using a neural network, such as, for example, a supervised learning algorithm. In one or more embodiments, the prediction unit 305 may include an analysis unit configured to perform analysis of image data received on the input interface 302, such as, for example, analysis using an “MB Tree” type algorithm.

[0172] The device 300 may be a computer, a computer network, an electronic component, or other apparatus comprising a processor operably coupled to a memory, as well as, depending on the embodiment selected, a data storage unit, and other associated hardware elements such as a network interface and a media drive for reading and writing removable storage media (not shown in the figure). The removable storage medium can be, for example, a compact disc (CD), a digital video / versatile disc (DVD), a flash disk, a USB stick, etc. Depending on the embodiment, the memory, the data storage unit or the removable storage medium contains instructions which, when executed by the controller 301, cause this controller 301 to carry out or control the input interface 302, output interface 303, memory 304 and prediction unit 305 parts for implementing the proposed method. The controller 301 can be a component implementing one or more processors or a calculation unit for the image encoding according to the proposed method and the control of the units 302, 303, 304 and 305 of the device 300.

[0173] In addition, the device 300 can be implemented in software form, in hardware form, such as an ASIC type circuit, or in the form of a combination of hardware and software elements, such as for example a software program intended to be loaded and executed on an FPGA type component.

Industrial application

[0174] Depending on the embodiment chosen, certain acts, actions, events or functions of each of the methods described in this document may be performed or occur in a different order from that in which they have been described, or may be added, merged, or omitted altogether, as the case may be. Further, in some embodiments, certain acts, actions or events are performed or occur concurrently and not sequentially.

[0175] Although described through a number of detailed exemplary embodiments, the proposed encoding method and the equipment for implementing the method include various variants, modifications and improvements which will be apparent to those skilled in the art, it being understood that these different variants, modifications and improvements form part of the scope of the invention, as defined by the claims which follow. Furthermore, different aspects and characteristics described above can be implemented together, or separately, or alternatively substituted for each other, and all of the different combinations and sub-combinations of aspects and characteristics are within the scope of the invention. In addition, some systems and equipment described above may not incorporate all of the modules and functions described for the preferred embodiments.

Claims

[Claim 1] A method of encoding a first image in a first set of images, wherein the first image is cut into blocks, each block being encoded according to one of a plurality of encoding modes, the method comprising, for a current block of the first image: determining, on the basis of at least one second image distinct from the first image and previously encoded according to a sequence for encoding the images of the first set of images, a prediction of a characteristic of the current block in one or more third images of the first set of images distinct from the first image and not yet encoded according to the encoding sequence; and using the prediction for encoding the current block by minimizing a rate-distortion criterion.

[Claim 2] An encoding method according to claim 1, wherein the characteristic includes a cost of propagating the current block in one or more third images of the first set of images.

[Claim 3] An encoding method according to claim 1, wherein the characteristic comprises a measure of the presence of a transition in the current block.

[Claim 4] An encoding method according to claim 1, wherein the characteristic comprises a measure of the change in the current block of the amount of information over time.

[Claim 5] An encoding method according to any one of the preceding claims, wherein the prediction of the current block is further determined on the basis of at least one fourth image distinct from the first image and not yet encoded according to the encoding sequence.

[Claim 6] A method according to one of the preceding claims, wherein the prediction of the characteristic of the current block is determined using a supervised learning algorithm.

[Claim 7] A method according to one of the preceding claims, comprising a learning phase of a neural network carried out on a second set of images, the learning phase comprising, for a current block of a current image of the second set of images: determining, on the basis of at least one image of the second set of images distinct from the current image and not yet encoded according to an encoding sequence of the images of the second set of images, a reference prediction of the characteristic of the current block in an image of the second set of images distinct from the current image and not yet encoded according to the encoding sequence of the second set of images; and performing a learning phase of the neural network on the basis of input data, and on the basis of the reference prediction of the current block included in reference data, to generate a prediction model of the characteristic of the current block in the images of the second set of images not yet encoded according to the encoding sequence.
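By way of illustration only, the assembly of the input data and reference data used in the learning phase of claim 7 may be sketched as follows. The sketch assumes that the encoding sequence coincides with the order of the list `frames`, and it uses the mean absolute difference between the current block and the co-located block of the next image as a stand-in for the reference prediction; both choices are assumptions, as are the names `extract_block`, `reference_feature` and `build_training_pairs`. The training of the neural network itself is not shown.

```python
from typing import List, Tuple

Image = List[List[int]]  # luma samples, indexed as image[y][x]


def extract_block(image: Image, bx: int, by: int, size: int) -> Image:
    """Return the size x size block whose top-left sample is at (bx, by)."""
    return [row[bx:bx + size] for row in image[by:by + size]]


def reference_feature(current: Image, future: Image,
                      bx: int, by: int, size: int) -> float:
    """Hypothetical reference value: mean absolute difference between the
    current block and the co-located block of a not-yet-encoded image."""
    cur = extract_block(current, bx, by, size)
    fut = extract_block(future, bx, by, size)
    total = sum(abs(c - f)
                for cur_row, fut_row in zip(cur, fut)
                for c, f in zip(cur_row, fut_row))
    return total / (size * size)


def build_training_pairs(frames: List[Image],
                         size: int) -> List[Tuple[Tuple[Image, Image], float]]:
    """Pair, for every block, inputs drawn from the current and previous
    images with a reference value computed from the following image."""
    pairs = []
    for t in range(1, len(frames) - 1):
        previous, current, future = frames[t - 1], frames[t], frames[t + 1]
        height, width = len(current), len(current[0])
        for by in range(0, height - size + 1, size):
            for bx in range(0, width - size + 1, size):
                inputs = (extract_block(current, bx, by, size),
                          extract_block(previous, bx, by, size))
                label = reference_feature(current, future, bx, by, size)
                pairs.append((inputs, label))
    return pairs


# Three 4x4 images; only the top-left 2x2 block changes in the last one.
f0 = [[0] * 4 for _ in range(4)]
f1 = [[10] * 4 for _ in range(4)]
f2 = [[14 if x < 2 and y < 2 else 10 for x in range(4)] for y in range(4)]
pairs = build_training_pairs([f0, f1, f2], size=2)
```

At inference time only the inputs (current and previously encoded images) are available, which is why the reference value drawn from the following image appears solely on the label side of each pair.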

[Claim 8] The method of claim 7, wherein the plurality of encoding modes comprises at least one time correlation prediction type encoding mode using a plurality of images of a set of images to be encoded, the method further comprising, for a current block of a current image of the second set of images: determining a motion estimation vector of the current block, the motion estimation vector pointing to a block correlated to the current block in an image of the second set of images distinct from the current image and previously encoded according to the sequence for encoding the images of the second set of images; and wherein the learning of the neural network is further performed based on the motion estimation vector of the current block included in the input data.
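By way of illustration only, a motion estimation vector such as the one recited in claim 8 may be obtained by an exhaustive block-matching search that minimizes the sum of absolute differences (SAD). The claim does not prescribe any particular search; full search with SAD is an assumption of this sketch, and the names `sad`, `estimate_motion_vector` and `search_range` are hypothetical. The current block is assumed to lie entirely inside both images.

```python
from typing import List, Tuple

Image = List[List[int]]  # luma samples, indexed as image[y][x]


def sad(cur: Image, ref: Image, bx: int, by: int,
        dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def estimate_motion_vector(cur: Image, ref: Image, bx: int, by: int,
                           size: int, search_range: int
                           ) -> Tuple[Tuple[int, int], int]:
    """Full search over displacements within +/- search_range samples.

    Returns ((dx, dy), cost). Ties are broken in favour of the shortest
    displacement, so a static block yields the zero vector.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = bx + dx, by + dy
            # Skip candidates that fall partly outside the reference image.
            if x0 < 0 or y0 < 0 or x0 + size > width or y0 + size > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            candidate = (cost, abs(dx) + abs(dy), dx, dy)
            if best is None or candidate < best:
                best = candidate
    cost, _, dx, dy = best
    return (dx, dy), cost


# Reference image with distinct sample values; the current image shows the
# same content displaced by one sample horizontally and vertically.
ref = [[8 * y + x for x in range(8)] for y in range(8)]
cur = [[8 * (y + 1) + (x + 1) for x in range(8)] for y in range(8)]
vector, cost = estimate_motion_vector(cur, ref, bx=2, by=2, size=4, search_range=2)
```

In this toy example the search returns the vector (1, 1) with a cost of zero, the reference block at that displacement being identical to the current block.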

[Claim 9] A method according to any one of claims 7 and 8, wherein the learning of the neural network is performed on the basis of the current image included in the input data and / or on the basis of an image of the second set of images distinct from the current image and previously encoded according to the sequence for encoding the images of the second set of images, included in the input data.

[Claim 10] A method according to any of claims 7 to 9, wherein the neural network is convolutional.

[Claim 11] A method according to any of claims 7 to 10, wherein the prediction of the characteristic of the current block is determined using the prediction model, based on the first image and based on the at least one second image included in input data of the prediction model.

[Claim 12] A method according to any of claims 7 to 11, wherein the plurality of coding modes comprises at least one time correlation prediction type coding mode using a plurality of images of the first set of images, the method further comprising: determining a motion estimation vector of the current block, the motion estimation vector pointing to a block correlated with the current block in an image of the first set of images distinct from the first image and previously encoded according to the sequence of encoding the images of the first set of images; and wherein the prediction of the current block is determined using the prediction model, based on the motion estimation vector included in the input data of the prediction model.

[Claim 13] An image encoding device comprising: an input interface configured to receive a first image of a set of images; an image encoding unit, operably coupled to the input interface, and configured to: divide the first image into blocks; and encode each block according to one of a plurality of encoding modes according to the method of any one of claims 1 to 12.

[Claim 14] A computer program, loadable into a memory associated with a processor, and comprising portions of code for implementing the steps of a method according to any one of claims 1 to 12 during the execution of said program by the processor.

[Claim 15] A set of data representing, for example by compression or encoding, a computer program according to claim 14.