WO2022222842A1 - Dynamic image encoding and decoding methods, apparatus and device and storage medium - Google Patents

Publication number
WO2022222842A1
WO2022222842A1 PCT/CN2022/086880 CN2022086880W WO2022222842A1 WO 2022222842 A1 WO2022222842 A1 WO 2022222842A1 CN 2022086880 W CN2022086880 W CN 2022086880W WO 2022222842 A1 WO2022222842 A1 WO 2022222842A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
area
moving
objects
frame
Application number
PCT/CN2022/086880
Other languages
French (fr)
Chinese (zh)
Inventor
闫宁
陈焕浜
李照洋
马飞龙
宋星光
周建同
杨海涛
李江
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/167 Position within a video image, e.g. region of interest [ROI]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present application disclose dynamic image encoding and decoding methods, an apparatus and device and a storage medium, which belong to the technical field of encoding and decoding. In the encoding method, semantic segmentation is performed on any image frame in a dynamic image to obtain an image segmentation mask, the dynamic image comprising multiple objects, and the image segmentation mask comprising multiple image regions in one-to-one correspondence to multiple objects; a moving image sequence is determined on the basis of the dynamic image, each image frame in the moving image sequence comprising an image region in which one or more moving objects among the multiple objects are located; position indication information is determined on the basis of the image segmentation mask, the position indication information being used to indicate the position of the image region in which the one or more moving objects are located; and the moving image sequence and the position indication information are encoded into a code stream. The embodiments of the present application improve encoding efficiency, and effectively reduce decoding complexity and power consumption.

Description

Dynamic image encoding and decoding method, apparatus, device, and storage medium
This application claims priority to Chinese Patent Application No. 202110421196.1, filed on April 19, 2021 and entitled "Dynamic image encoding and decoding method, apparatus, device, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of this application relate to the field of encoding and decoding technologies, and in particular, to a dynamic image encoding and decoding method, apparatus, device, and storage medium.
Background
A dynamic image is a media format between a static image and a video: a group of static images is switched at a specified frequency to produce a dynamic effect. Compared with a static image, a dynamic image has multiple image frames, and temporal correlation exists among these frames. Compared with a video, a dynamic image has less inter-frame correlation and no fixed frame rate.
Compared with video codecs, current dynamic image codecs are lightweight and consume little power, and they do not use streaming transmission. Currently, the most widely used dynamic image coding method is the Graphics Interchange Format (GIF). However, this method has low encoding efficiency and poor picture quality, and it is increasingly unable to meet today's application requirements for high-resolution dynamic images.
Summary
Embodiments of this application provide a dynamic image encoding and decoding method, apparatus, device, and storage medium, which can improve encoding efficiency and reduce decoding complexity and power consumption. The technical solutions are as follows:
According to a first aspect, a dynamic image encoding method is provided. In the method, semantic segmentation is performed on any image frame in a dynamic image to obtain an image segmentation mask, where the dynamic image includes multiple objects and the image segmentation mask includes multiple image regions in one-to-one correspondence with the multiple objects. A moving image sequence is determined based on the dynamic image, where each image frame in the moving image sequence includes the image region in which one or more moving objects among the multiple objects are located. Position indication information is determined based on the image segmentation mask, where the position indication information indicates the position of the image region in which the one or more moving objects are located. The moving image sequence and the position indication information are encoded into a code stream.
In a dynamic image, only the image regions in which moving objects are located change; the image regions in which stationary objects are located do not. Each image frame in the moving image sequence includes the image region in which the one or more moving objects are located, and the position indication information indicates the position of that image region. Therefore, encoding the moving image sequence and the position indication information into the code stream is sufficient for the dynamic image to be decoded later. The image regions in which stationary objects are located need not be encoded, which improves encoding efficiency.
Because the location region of each object in the dynamic image is basically unchanged and only the object itself changes, semantic segmentation can be performed on any image frame in the dynamic image to obtain the image segmentation mask. Typically, the first image frame of the dynamic image is segmented.
In addition, because the image segmentation mask includes multiple image regions in one-to-one correspondence with the multiple objects, the image regions of different objects are usually represented by different pixel values so that the objects can be distinguished, and the image region of a single object is represented by a single pixel value.
It should be noted that each object in the dynamic image may be a single individual in the dynamic image. For example, if the dynamic image shows a user, grass, a hillside, a river, and the sky, the multiple objects in the dynamic image include the user, the grass, the hillside, the river, and the sky.
In addition, the multiple objects in the dynamic image are generally divided into moving objects and stationary objects. A moving object is an object that itself changes, and may also be called an object in a moving state. For example, the water in a river in the dynamic image changes, and the user's facial features or limbs change, so the river and the user can be called moving objects. A stationary object is an object that itself does not change, and may also be called an object in a stationary state. For example, the grass, the hillside, and the sky in the dynamic image do not change, so they can be called stationary objects.
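The distinction between moving and stationary objects can be illustrated with a small sketch. The text does not state how an object's state is detected, so the criterion below (an object is moving if any pixel inside its mask region differs from the first frame) is an assumption made for illustration, as are the function name `object_states` and the representation of frames and masks as nested lists of integers.

```python
def object_states(frames, mask):
    """Classify each mask label as "moving" or "stationary".

    Assumed criterion (not taken from the source): a label is moving if
    any pixel in its region differs from the first frame in a later frame.
    """
    labels = {value for row in mask for value in row}
    moving = set()
    first = frames[0]
    for frame in frames[1:]:
        for y, mask_row in enumerate(mask):
            for x, label in enumerate(mask_row):
                if frame[y][x] != first[y][x]:
                    moving.add(label)
    return {
        label: "moving" if label in moving else "stationary"
        for label in labels
    }


# Two tiny 2x3 frames; only the pixel at (x=2, y=0) changes.
frames = [
    [[1, 1, 5], [1, 1, 5]],
    [[1, 1, 6], [1, 1, 5]],
]
mask = [[0, 0, 1], [0, 0, 1]]  # label 0 = background, label 1 = object
states = object_states(frames, mask)
```

A real encoder would likely tolerate noise with a threshold instead of exact equality; exact comparison keeps the sketch short.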
It should be noted that the moving image sequence may include one or more sub-image sequences in one-to-one correspondence with the one or more moving objects, or may be the dynamic image itself. The position indication information may be the image segmentation mask, or may be the coordinates, in the dynamic image, of a designated position in the image region in which each of the one or more moving objects is located. The following therefore describes several cases.
In a first case, the moving image sequence includes one or more sub-image sequences, and the position indication information is the image segmentation mask.
In the first case, determining the moving image sequence based on the dynamic image includes: extracting one or more sub-image sequences based on the image segmentation mask and the dynamic image, where the one or more sub-image sequences are in one-to-one correspondence with the one or more moving objects.
The sub-image sequence corresponding to each moving object is extracted in the same manner. Therefore, one moving object can be selected from the one or more moving objects, and the following operations are performed to determine the sub-image sequence corresponding to the selected moving object, until the sub-image sequence corresponding to every moving object has been determined: determine, based on the image segmentation mask, the location region of the selected moving object; and, based on that location region, extract the image region in which the selected moving object is located from each image frame of the dynamic image other than the first image frame, to obtain the sub-image sequence corresponding to the selected moving object.
The image segmentation mask includes multiple image regions in one-to-one correspondence with the multiple objects; that is, the mask already delimits the image region in which each object is located. As described above, the image region of a single object is represented in the mask by a single pixel value, and the image regions of different objects are represented by different pixel values. Therefore, determining the location region of the selected moving object based on the image segmentation mask includes: scanning the pixels in the image segmentation mask to obtain a pixel coordinate set corresponding to the selected moving object, where the pixel coordinate set includes the coordinates of multiple pixels; and determining the location region formed by this pixel coordinate set as the location region of the selected moving object.
That is, the pixels in the image segmentation mask are scanned to find the pixels whose value equals the pixel value assigned to the selected moving object, and the coordinates of these pixels form the pixel coordinate set corresponding to the selected moving object. The location region of the selected moving object can then be determined. This location region is where the selected moving object is actually located, and its boundary is the contour of the selected moving object.
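The mask scan described above can be sketched as follows. This is a minimal illustration, assuming the mask is a nested list of integer labels and that coordinates are written as (x, y) with x as the column index; the name `pixel_coordinates` is hypothetical.

```python
def pixel_coordinates(mask, label):
    """Scan every pixel of the mask and collect the (x, y) coordinates
    whose value equals the label assigned to the selected object."""
    return {
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value == label
    }


# A 3x4 mask with three objects, labelled 0, 1 and 2.
mask = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 0],
]
coords = pixel_coordinates(mask, 2)
```

The resulting set is the pixel coordinate set of the object labelled 2; its outline corresponds to the contour of the object in the mask.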
Typically, the region enclosed by the contour of a moving object is irregular; that is, the location region of the moving object is not a regular region. The image region within the location region of the moving object can therefore be extracted directly from each image frame of the dynamic image other than the first image frame. In other embodiments, the location region of the moving object may first be converted into a regular region, and the image region within that regular region is then extracted from each image frame of the dynamic image other than the first image frame.
That is, extracting the image region in which the selected moving object is located from each image frame of the dynamic image other than the first image frame, based on the location region of the selected moving object, includes: extracting, from each image frame of the dynamic image other than the first image frame, the image region within the location region of the selected moving object. Alternatively, the location region of the selected moving object is expanded so that the expanded location region is a rectangular region, and the image region within the expanded location region is extracted from each image frame of the dynamic image other than the first image frame.
It should be noted that the location region of a moving object can be expanded in several ways. For example, the minimum abscissa, minimum ordinate, maximum abscissa, and maximum ordinate are determined from the pixel coordinate set corresponding to the selected moving object; the rectangular region whose abscissas lie between the minimum and maximum abscissa and whose ordinates lie between the minimum and maximum ordinate is then determined as the expanded location region. Alternatively, a circumscribed rectangular region is drawn directly around the location region of the moving object and is determined as the expanded location region.
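The min/max expansion and the per-frame extraction can be sketched together. This is an illustration only: frames are assumed to be nested lists of pixel values, rectangles are written as inclusive bounds `(x0, y0, x1, y1)`, and the function names are hypothetical.

```python
def bounding_rectangle(coords):
    """Expand an irregular pixel set to the rectangle spanned by its
    minimum and maximum abscissa and ordinate (inclusive bounds)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def crop(frame, rect):
    """Cut the rectangle out of one frame."""
    x0, y0, x1, y1 = rect
    return [row[x0:x1 + 1] for row in frame[y0:y1 + 1]]


def sub_image_sequence(frames, rect):
    """Extract the region from every frame except the first one."""
    return [crop(frame, rect) for frame in frames[1:]]


# An L-shaped object; its expanded region is the 2x2 rectangle around it.
coords = {(1, 1), (2, 1), (2, 2)}
rect = bounding_rectangle(coords)
frames = [
    [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0]],
    [[0, 0, 0, 0], [0, 5, 6, 0], [0, 7, 8, 0]],
    [[0, 0, 0, 0], [0, 9, 9, 0], [0, 9, 9, 0]],
]
sequence = sub_image_sequence(frames, rect)
```

The first frame is skipped because the text extracts the region only from the frames other than the first one.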
In a second case, the moving image sequence includes one or more sub-image sequences, and the position indication information includes the coordinates of one or more designated positions.
In the second case, determining the moving image sequence based on the dynamic image includes: extracting one or more sub-image sequences based on the image segmentation mask and the dynamic image, where the one or more sub-image sequences are in one-to-one correspondence with the one or more moving objects. In this case, determining the position indication information based on the image segmentation mask includes: determining, based on the image segmentation mask, the coordinates in the dynamic image of a designated position within the image region in which each of the one or more moving objects is located.
For details of the second case, refer to the related description of the first case. Details are not repeated here.
It should be noted that the designated position within the image region in which a moving object is located may be the position with the smallest coordinates, the position with the largest coordinates, or the position of the geometric center. Other positions may also be used; this is not limited in the embodiments of this application.
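The three designated positions named above can be sketched as below. The interpretation is an assumption: "smallest coordinates" is read as (minimum x, minimum y), "largest coordinates" as (maximum x, maximum y), and the geometric center as the mean of the pixel coordinates. The function name and the `mode` parameter are hypothetical.

```python
def designated_position(coords, mode="min"):
    """Return one reference point for a region given as (x, y) pairs.

    Assumed readings: "min" = (min x, min y), "max" = (max x, max y),
    "center" = mean of all pixel coordinates.
    """
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    if mode == "min":
        return min(xs), min(ys)
    if mode == "max":
        return max(xs), max(ys)
    if mode == "center":
        return sum(xs) / len(xs), sum(ys) / len(ys)
    raise ValueError(f"unknown mode: {mode}")


coords = {(1, 1), (2, 1), (1, 2), (2, 2)}
top_left = designated_position(coords, "min")
bottom_right = designated_position(coords, "max")
center = designated_position(coords, "center")
```

Whichever position is chosen, the encoder and decoder have to agree on it, since the decoder places each sub-image relative to that point.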
Optionally, in the second case, the number of the one or more moving objects may also be encoded into the code stream. The decoder can then determine, based on this number, whether any of the one or more sub-image sequences failed to be transmitted, which ensures the reliability of dynamic image decoding.
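The decoder-side check can be sketched in a few lines. This is a sketch under assumptions: the declared object count and the list of received sub-image sequences are taken as already parsed, and the function name is hypothetical.

```python
def has_transmission_failure(declared_count, received_sequences):
    """Compare the number of moving objects signalled in the code stream
    with the number of sub-image sequences that actually arrived."""
    return len(received_sequences) != declared_count


# Two objects were declared, but only one sub-image sequence arrived.
received = [[[[5, 6], [7, 8]]]]
failed = has_transmission_failure(2, received)
```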
In a third case, the moving image sequence is the dynamic image, and the position indication information is the image segmentation mask.
In a fourth case, the moving image sequence is the dynamic image, and the position indication information is the image segmentation mask. In this case, the method further includes: determining, based on the image segmentation mask, multiple segmentation regions in one-to-one correspondence with the multiple objects; and dividing each image frame of the dynamic image other than the first image frame according to the multiple segmentation regions, to obtain multiple image regions. An object state corresponding to each of the multiple segmentation regions is determined, where the object state is either a stationary state or a moving state. Encoding the moving image sequence into the code stream then includes: encoding the multiple image regions into the code stream. The method further includes: encoding the object state corresponding to each of the multiple segmentation regions into the code stream.
Determining, based on the image segmentation mask, the multiple segmentation regions in one-to-one correspondence with the multiple objects includes: determining, based on the image segmentation mask, the location region of each of the multiple objects. If the location region of any of the multiple objects does not contain an integer number of coding tree units (CTUs), the boundary of that location region is expanded so that the location region contains an integer number of CTUs. The location regions of the multiple objects after the expansion are determined as the multiple segmentation regions.
That is, after the expansion, the location region of each object contains an integer number of CTUs, and the expanded location region can be determined as the segmentation region. In other words, each of the multiple segmentation regions contains an integer number of CTUs.
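The boundary expansion to whole CTUs can be sketched as rounding a rectangle outward to a CTU grid. Several things here are assumptions and are not stated in the text: the region is treated as a rectangle with inclusive bounds `(x0, y0, x1, y1)`, the CTU grid starts at the picture origin, and the CTU size of 64 is only an example value.

```python
def align_to_ctu(rect, ctu_size=64):
    """Round a rectangle outward so that it covers whole CTUs.

    Assumes a CTU grid anchored at (0, 0); ctu_size=64 is illustrative.
    """
    x0, y0, x1, y1 = rect
    aligned_x0 = (x0 // ctu_size) * ctu_size
    aligned_y0 = (y0 // ctu_size) * ctu_size
    aligned_x1 = (x1 // ctu_size + 1) * ctu_size - 1
    aligned_y1 = (y1 // ctu_size + 1) * ctu_size - 1
    return aligned_x0, aligned_y0, aligned_x1, aligned_y1


region = align_to_ctu((70, 10, 130, 100))
already_aligned = align_to_ctu((0, 0, 63, 63))
```

A complete implementation would also clamp the result to the picture size, which is omitted here.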
In this case, encoding the multiple image regions into the code stream includes: encoding each of the multiple image regions into the code stream as one coding block. Alternatively, the region formed by each row of CTUs within each of the multiple image regions is encoded into the code stream as one coding block. The location region of a referencing coding block lies within the location region of the referenced coding block.
Because each image region contains an integer number of CTUs, the entire image region (a tile) can be encoded into the code stream separately as one coding block, or the region formed by each row of CTUs within each image region (a slice) can be encoded into the code stream separately as one coding block. Each coding block can then be decoded independently during subsequent decoding.
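The two partitioning options can be sketched as follows. This is an illustration under assumptions: an image region is a CTU-aligned rectangle with inclusive bounds `(x0, y0, x1, y1)`, the CTU size of 64 is an example value, and the function name and the `mode` strings are hypothetical.

```python
def coding_blocks(rect, mode, ctu_size=64):
    """Split a CTU-aligned region into independently coded blocks.

    mode="tile":  the whole region is one coding block.
    mode="slice": each row of CTUs in the region is one coding block.
    """
    x0, y0, x1, y1 = rect
    if mode == "tile":
        return [rect]
    if mode == "slice":
        return [
            (x0, y, x1, y + ctu_size - 1)
            for y in range(y0, y1 + 1, ctu_size)
        ]
    raise ValueError(f"unknown mode: {mode}")


region = (64, 0, 191, 127)  # 2 CTUs wide, 2 CTUs high
tiles = coding_blocks(region, "tile")
slices = coding_blocks(region, "slice")
```

Finer blocks give the decoder more freedom to skip unchanged data, while each additional block typically adds some signalling overhead.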
In addition, decoding a coding block may require reference to a coding block in an image frame that precedes the current frame; that is, decoding a coding block in the current frame depends on a coding block in a reference frame. To ensure successful decoding, the location region of the coding block in the reference frame must lie within the location region of the coding block in the current frame, so that the current coding block can be decoded on the basis of the reference coding block.
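The region constraint can be expressed as a rectangle containment test. This is only a sketch: regions are assumed to be rectangles with inclusive bounds `(x0, y0, x1, y1)`, the function name is hypothetical, and the roles of the two rectangles follow the wording of the paragraph above (the current-frame block region is the outer one).

```python
def contains(outer, inner):
    """Return True if rectangle `inner` lies entirely within `outer`."""
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner
    return ox0 <= ix0 and oy0 <= iy0 and ix1 <= ox1 and iy1 <= oy1


current_block_region = (64, 0, 191, 127)
reference_block_region = (64, 64, 191, 127)
reference_allowed = contains(current_block_region, reference_block_region)
```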
In all four cases above, the first image frame of the dynamic image may also be encoded into the code stream.
According to a second aspect, a dynamic image decoding method is provided. In the method, a first image frame is parsed from a code stream, and a moving image sequence and position indication information are parsed from the code stream, where each image frame in the moving image sequence includes the image region in which one or more moving objects are located, and the position indication information indicates the position of the image region in which the one or more moving objects are located. Based on the moving image sequence and the position indication information, the image region in which the one or more moving objects are located is rendered and displayed in the first image frame, to obtain a dynamic image.
That is, when a dynamic image is decoded, once the first image frame has been decoded, only the image regions in which moving objects are located need to be decoded for subsequent frames. The image regions in which stationary objects are located need not be decoded, which effectively reduces decoding complexity and power consumption. Moreover, when the dynamic image is displayed, only the image regions in which moving objects are located need to be rendered and refreshed on top of the first image frame, which effectively reduces display power consumption.
It should be noted that the moving image sequence may include one or more sub-image sequences in one-to-one correspondence with the one or more moving objects, or may be the dynamic image itself. The position indication information may be an image segmentation mask, or may be the coordinates, in the dynamic image, of a designated position in the image region in which each of the one or more moving objects is located. The following therefore describes several cases.
In a first case, the moving image sequence includes one or more sub-image sequences in one-to-one correspondence with the one or more moving objects. The position indication information is an image segmentation mask, which includes multiple image regions in one-to-one correspondence with multiple objects, and the multiple objects include the one or more moving objects.
In the first case, rendering and displaying, in the first image frame, the image region in which the one or more moving objects are located based on the moving image sequence and the position indication information includes: selecting one moving object from the one or more moving objects, and performing the following operations to render and display the image region in which the selected moving object is located, until the image region in which every moving object is located has been rendered and displayed: determine, based on the image segmentation mask, the position of the image region in which the selected moving object is located; and, according to that position, render and display in the first image frame the image regions included in the sub-image sequence corresponding to the selected moving object.
The image segmentation mask includes multiple image regions in one-to-one correspondence with the multiple objects; that is, the mask already delimits the image region in which each object is located. As described above, the image region of a single object is represented in the mask by a single pixel value, and the image regions of different objects are represented by different pixel values. Therefore, determining the position of the image region in which the selected moving object is located based on the image segmentation mask includes: scanning the pixels in the image segmentation mask to obtain a pixel coordinate set corresponding to the selected moving object, where the pixel coordinate set includes the coordinates of multiple pixels. The location region formed by the pixel coordinate set is determined as the position of the image region in which the selected moving object is located. Alternatively, the location region formed by the pixel coordinate set is expanded so that the expanded location region is a rectangular region, and the expanded location region is determined as the position of the image region in which the selected moving object is located.
That is, the pixels in the image segmentation mask are scanned to find the pixels whose value equals the pixel value assigned to the selected moving object, and the coordinates of these pixels form the pixel coordinate set corresponding to the selected moving object. The position in the dynamic image of the image region in which the selected moving object is located can then be determined.
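A decoder-side sketch of the first case: the mask is scanned for the selected object's label, and the decoded pixels of that object are written over a copy of the first frame. The data layout is an assumption made for illustration: frames and masks are nested lists, and one decoded sub-image is given as a dictionary from (x, y) to pixel value. The function name is hypothetical.

```python
def render_object(first_frame, mask, label, region_pixels):
    """Overwrite the pixels of one object on a copy of the first frame.

    region_pixels maps (x, y) -> decoded pixel value for the object
    (an assumed representation of one sub-image).
    """
    canvas = [row[:] for row in first_frame]  # keep the first frame intact
    for y, mask_row in enumerate(mask):
        for x, value in enumerate(mask_row):
            if value == label:
                canvas[y][x] = region_pixels[(x, y)]
    return canvas


first_frame = [[0, 0, 0], [0, 0, 0]]
mask = [[0, 1, 1], [0, 0, 1]]
region_pixels = {(1, 0): 7, (2, 0): 8, (2, 1): 9}
displayed = render_object(first_frame, mask, 1, region_pixels)
```

Only the pixels belonging to the object are touched, so everything that belongs to stationary objects is reused from the first frame.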
通常情况下,运动对象的轮廓构成的区域是不规则区域,即该像素坐标集合构成的位置区域不是规则区域,因此,在一些实施例中,可以直接将运动对象对应的像素坐标集合构成的位置区域确定为运动对象所处的图像区域在动态图像中的位置。当然,在另一些实施例中,也可以将该像素坐标集合构成的位置区域处理为规则区域,然后再将该规则区域的位置确定为运动对象所处的图像区域在动态图像中的位置。Usually, the area formed by the outline of the moving object is an irregular area, that is, the location area formed by the pixel coordinate set is not a regular area. Therefore, in some embodiments, the location formed by the pixel coordinate set corresponding to the moving object can be directly The area is determined as the position of the image area where the moving object is located in the dynamic image. Of course, in other embodiments, the position area formed by the pixel coordinate set may also be processed as a regular area, and then the position of the regular area is determined as the position of the image area where the moving object is located in the dynamic image.
第二种情况,运动图像序列包括一个或多个子图像序列,该一个或多个子图像序列与该一个或多个运动对象一一对应。位置指示信息包括该一个或多个运动对象中每个运动对象所处的图像区域内的指定位置在动态图像中的坐标。In the second case, the moving image sequence includes one or more sub-image sequences, and the one or more sub-image sequences are in one-to-one correspondence with the one or more moving objects. The position indication information includes the coordinates in the dynamic image of a specified position within the image area where each of the one or more moving objects is located.
In the second case, rendering and displaying, in the first frame of image, the image regions in which the one or more moving objects are located, based on the moving image sequence and the position indication information, includes: selecting one moving object from the one or more moving objects, and rendering and displaying the image region in which the selected moving object is located according to the following operation, until the image region of every moving object has been rendered and displayed: according to the coordinates in the dynamic image of the specified position of the image region in which the selected moving object is located, rendering and displaying, in the first frame of image, the image regions included in the sub-image sequence corresponding to the selected moving object.
Because the encoder side writes the coordinates in the dynamic image of the specified position within the image region of each moving object directly into the code stream, in the embodiments of this application, once the coordinates of the specified position corresponding to the selected moving object have been parsed from the code stream, the image regions included in the corresponding sub-image sequence can be rendered and displayed in the first frame of image directly. This increases the image reconstruction speed.
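The rendering step of the second case can be sketched as follows. This is a minimal sketch under stated assumptions: frames and patches are lists of rows of pixel values, the specified position is assumed to be the minimum-coordinate (top-left) corner of each patch, and the names `render_sub_sequences`, `sub_sequences` and `anchors` are hypothetical rather than taken from this application.

```python
import copy


def render_sub_sequences(first_frame, sub_sequences, anchors):
    """Paste each object's patch for time step t onto a copy of the first
    frame at the parsed anchor, producing one reconstructed frame per step."""
    if not sub_sequences:
        return []
    frame_count = min(len(patches) for patches in sub_sequences.values())
    frames = []
    for t in range(frame_count):
        canvas = copy.deepcopy(first_frame)  # static background stays untouched
        for object_id, patches in sub_sequences.items():
            top, left = anchors[object_id]  # assumed top-left anchor (row, col)
            for dr, patch_row in enumerate(patches[t]):
                for dc, value in enumerate(patch_row):
                    canvas[top + dr][left + dc] = value
        frames.append(canvas)
    return frames


# Hypothetical data: a 3x3 background and one object with two 1x1 patches.
first_frame = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
sub_sequences = {"ball": [[[5]], [[6]]]}
anchors = {"ball": (1, 2)}
frames = render_sub_sequences(first_frame, sub_sequences, anchors)
```

No mask scan is needed in this sketch, which reflects why parsing the coordinates directly can speed up reconstruction.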
It should be noted that the specified position within the image region in which a moving object is located may be the position with the smallest coordinates, the position with the largest coordinates, or the position of the geometric center. Other positions may also be used, which is not limited in the embodiments of this application.
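The three candidate specified positions can be illustrated for a rectangular region. This is only a sketch: the region is assumed to be given as inclusive `(top, left, bottom, right)` bounds, and the function name `specified_position` and the `kind` labels are hypothetical.

```python
def specified_position(rect, kind="min"):
    """Return the chosen reference point of a rectangular region given as
    inclusive (top, left, bottom, right) bounds."""
    top, left, bottom, right = rect
    if kind == "min":      # position with the smallest coordinates
        return (top, left)
    if kind == "max":      # position with the largest coordinates
        return (bottom, right)
    if kind == "center":   # geometric center of the rectangle
        return ((top + bottom) / 2, (left + right) / 2)
    raise ValueError(f"unknown position kind: {kind}")


rect = (1, 0, 2, 1)
min_corner = specified_position(rect, "min")
center = specified_position(rect, "center")
```

Whichever convention is used, the encoder side and the decoder side have to agree on it so that the parsed coordinates are interpreted consistently.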
If the encoder side writes the number of the one or more moving objects into the code stream, the number may also be parsed from the code stream in the embodiments of this application. By comparing the number of moving objects with the number of sub-image sequences, it can be determined whether any sub-image sequence failed to be transmitted, which improves the reliability of dynamic image decoding.
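The consistency check described above amounts to a simple count comparison. A minimal sketch, in which the function name and argument names are hypothetical:

```python
def missing_sub_sequence_count(signalled_object_count, received_sub_sequences):
    """Compare the object count parsed from the code stream with the number
    of sub-image sequences actually received; a positive result suggests
    that some sub-image sequences were lost in transmission."""
    return max(signalled_object_count - len(received_sub_sequences), 0)


lost = missing_sub_sequence_count(3, ["seq_a", "seq_b"])
```

How the decoder side reacts to a non-zero result (for example, requesting retransmission or skipping the object) is not specified here and is left open in this sketch.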
In the third case, the moving image sequence is the dynamic image, and the position indication information is an image segmentation mask. The image segmentation mask includes a plurality of image regions in one-to-one correspondence with a plurality of objects, and the plurality of objects include the one or more moving objects.
In the third case, rendering and displaying, in the first frame of image, the image regions in which the one or more moving objects are located, based on the moving image sequence and the position indication information, includes: selecting one moving object from the one or more moving objects, and rendering and displaying the image region in which the selected moving object is located according to the following operations, until the image region of every moving object has been rendered and displayed: determining, based on the image segmentation mask, the position of the image region in which the selected moving object is located; extracting, based on that position, the image region in which the selected moving object is located from each frame of the dynamic image other than the first frame of image; and, according to that position, rendering and displaying, in the first frame of image, the image region in which the selected moving object is located in each frame of the dynamic image.
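The third case can be sketched by combining the mask scan with per-frame extraction. This is an illustration under assumptions: frames and the mask are lists of rows, the irregular pixel set is used directly as the region, and the function name `render_from_mask` is hypothetical.

```python
def render_from_mask(frames, mask, object_value):
    """For every frame after the first, copy only the pixels belonging to the
    selected object (as located by the mask) onto a copy of the first frame."""
    first_frame = frames[0]
    coords = [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value == object_value
    ]
    rendered = []
    for frame in frames[1:]:
        canvas = [row[:] for row in first_frame]  # keep the static background
        for r, c in coords:
            canvas[r][c] = frame[r][c]            # take only the object region
        rendered.append(canvas)
    return rendered


# Hypothetical 2x2 frames; the object labelled 1 occupies pixel (0, 1).
frames = [
    [[0, 0], [0, 0]],
    [[9, 1], [9, 9]],
    [[8, 2], [8, 8]],
]
mask = [[0, 1], [0, 0]]
rendered = render_from_mask(frames, mask, 1)
```

Pixels outside the object region are taken from the first frame even though later frames differ there, which mirrors the idea that only the moving object's region changes.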
In the fourth case, the position indication information is an image segmentation mask. The image segmentation mask includes a plurality of image regions in one-to-one correspondence with a plurality of objects, and the plurality of objects include the one or more moving objects. In this case, parsing the moving image sequence from the code stream includes: determining, based on the image segmentation mask, a plurality of partition regions in one-to-one correspondence with the plurality of objects; parsing from the code stream the object state corresponding to each of the partition regions, where the object state is either a static state or a moving state; and, based on the object state corresponding to each partition region, parsing from the code stream the image regions delimited by the partition regions whose object state is the moving state, to obtain the moving image sequence.
Determining, based on the image segmentation mask, the plurality of partition regions in one-to-one correspondence with the plurality of objects includes: determining, based on the image segmentation mask, the location region in which each of the objects is located; if the location region of any object does not contain an integer number of CTUs, expanding the boundary of that location region so that it contains an integer number of CTUs; and determining the location regions of the objects after the expansion as the plurality of partition regions.
That is, after the expansion, the location region of each object contains an integer number of CTUs, and the expanded location region can be determined as the corresponding partition region. In other words, each of the plurality of partition regions contains an integer number of CTUs.
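One way to expand a location region so that it covers whole CTUs is to snap its edges outward to the CTU grid. The sketch below assumes a square CTU of `ctu_size` pixels, a region given as `(left, top, right, bottom)` with exclusive right and bottom edges, and picture dimensions that are multiples of the CTU size; the function name `align_to_ctu` is hypothetical.

```python
def align_to_ctu(rect, ctu_size):
    """Expand (left, top, right, bottom), with exclusive right/bottom edges,
    outward to the nearest CTU grid lines so it covers whole CTUs only."""
    left, top, right, bottom = rect
    aligned_left = (left // ctu_size) * ctu_size          # round down
    aligned_top = (top // ctu_size) * ctu_size            # round down
    aligned_right = -(-right // ctu_size) * ctu_size      # round up
    aligned_bottom = -(-bottom // ctu_size) * ctu_size    # round up
    return aligned_left, aligned_top, aligned_right, aligned_bottom


aligned = align_to_ctu((70, 10, 130, 64), 64)
```

A region that is already aligned is returned unchanged, so the expansion applies only when the region does not contain an integer number of CTUs.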
In each of the four cases above, the one or more moving objects may be all of the moving objects among the plurality of objects included in the dynamic image, or only some of them. That is, for the moving objects in the dynamic image, the decoder side may further determine whether all of these moving objects are to be in the moving state, or whether a subset of them needs to be selected to be in the moving state.
Specifically, an object selection instruction is received, where the object selection instruction is used to select one or more objects from the plurality of objects included in the dynamic image. The one or more objects selected by the object selection instruction are determined as the one or more moving objects in the foregoing steps.
The object selection instruction may be triggered by a user based on the first frame of image. For example, all moving objects in the dynamic image are marked in the first frame of image, and the user may select some or all of them there. The selected objects are the one or more moving objects in the foregoing steps.
In addition, the type of encoder used on the encoder side may be agreed on in advance by the encoder side and the decoder side, or may be selected by a user on the encoder side. If it is selected by the user, the encoder side also needs to write the encoder type into the code stream. The decoder side then needs to parse the encoder type from the code stream, determine the corresponding decoder type according to the parsed encoder type, and parse the foregoing image or image sequence from the code stream using the determined decoder type.
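The choice between a pre-agreed encoder type and one parsed from the code stream can be sketched as a table lookup. Everything named here is an assumption for illustration: the type identifiers, the `DECODER_FOR_ENCODER` table and the `select_decoder` function do not come from this application.

```python
# Hypothetical mapping from signalled encoder type to decoder type.
DECODER_FOR_ENCODER = {
    "hevc_encoder": "hevc_decoder",
    "vvc_encoder": "vvc_decoder",
}


def select_decoder(parsed_encoder_type, agreed_encoder_type="hevc_encoder"):
    """Use the encoder type parsed from the code stream if present;
    otherwise fall back to the type agreed on in advance."""
    encoder_type = parsed_encoder_type or agreed_encoder_type
    if encoder_type not in DECODER_FOR_ENCODER:
        raise ValueError(f"unsupported encoder type: {encoder_type}")
    return DECODER_FOR_ENCODER[encoder_type]


decoder_type = select_decoder("vvc_encoder")
fallback_type = select_decoder(None)
```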
According to a third aspect, a dynamic image encoding apparatus is provided. The encoding apparatus has a function of implementing the behavior of the dynamic image encoding method in the first aspect. The encoding apparatus includes at least one module, and the at least one module is configured to implement the dynamic image encoding method provided in the first aspect.
According to a fourth aspect, a dynamic image decoding apparatus is provided. The decoding apparatus has a function of implementing the behavior of the dynamic image decoding method in the second aspect. The decoding apparatus includes at least one module, and the at least one module is configured to implement the dynamic image decoding method provided in the second aspect.
According to a fifth aspect, an encoder-side device is provided. The encoder-side device includes a processor and a memory. The memory is configured to store a program for performing the dynamic image encoding method provided in the first aspect. The processor is configured to execute the program stored in the memory, to implement the dynamic image encoding method provided in the first aspect.
Optionally, the encoder-side device may further include a communication bus, and the communication bus is configured to establish a connection between the processor and the memory.
According to a sixth aspect, a decoder-side device is provided. The decoder-side device includes a processor and a memory. The memory is configured to store a program for performing the dynamic image decoding method provided in the second aspect. The processor is configured to execute the program stored in the memory, to implement the dynamic image decoding method provided in the second aspect.
Optionally, the decoder-side device may further include a communication bus, and the communication bus is configured to establish a connection between the processor and the memory.
According to a seventh aspect, a computer-readable storage medium is provided. The storage medium stores instructions. When the instructions are run on a computer, the computer is enabled to perform the steps of the dynamic image encoding method according to the first aspect, or the steps of the dynamic image decoding method according to the second aspect.
According to an eighth aspect, a computer program product including instructions is provided. When the instructions are run on a computer, the computer is enabled to perform the steps of the dynamic image encoding method according to the first aspect, or the steps of the dynamic image decoding method according to the second aspect.
The technical effects achieved by the third, fourth, fifth, sixth, seventh and eighth aspects are similar to those achieved by the corresponding technical means in the first aspect or the second aspect. Details are not described herein again.
The technical solutions provided in the embodiments of this application can bring at least the following beneficial effects:
In a dynamic image, only the image regions in which moving objects are located change, and the image regions in which static objects are located do not. Each frame of image in the moving image sequence includes the image regions in which one or more moving objects among the plurality of objects are located, and the position indication information indicates the positions of those image regions. Therefore, writing the moving image sequence and the position indication information into the code stream is sufficient for the dynamic image to be decoded subsequently. The image regions in which static objects are located do not need to be written into the code stream, which improves encoding efficiency.
Brief Description of Drawings
FIG. 1 is a schematic diagram of an implementation environment according to an embodiment of this application;
FIG. 2 is a schematic diagram of an exemplary implementation environment according to an embodiment of this application;
FIG. 3 is a schematic structural block diagram of an encoder according to an embodiment of this application;
FIG. 4 is a schematic structural block diagram of a decoder according to an embodiment of this application;
FIG. 5 is a flowchart of a first dynamic image encoding method according to an embodiment of this application;
FIG. 6 is a flowchart of a first dynamic image decoding method according to an embodiment of this application;
FIG. 7 is a block diagram of a first exemplary encoding and decoding method according to an embodiment of this application;
FIG. 8 is a block diagram of a second exemplary encoding and decoding method according to an embodiment of this application;
FIG. 9 is a flowchart of a second dynamic image encoding method according to an embodiment of this application;
FIG. 10 is a flowchart of a second dynamic image decoding method according to an embodiment of this application;
FIG. 11 is a block diagram of a third exemplary encoding and decoding method according to an embodiment of this application;
FIG. 12 is a block diagram of a fourth exemplary encoding and decoding method according to an embodiment of this application;
FIG. 13 is a flowchart of a third dynamic image encoding method according to an embodiment of this application;
FIG. 14 is a flowchart of a third dynamic image decoding method according to an embodiment of this application;
FIG. 15 is a flowchart of a fourth dynamic image encoding method according to an embodiment of this application;
FIG. 16 is a flowchart of a fourth dynamic image decoding method according to an embodiment of this application;
FIG. 17 is a block diagram of a fifth exemplary encoding and decoding method according to an embodiment of this application;
FIG. 18 is a schematic structural diagram of a dynamic image encoding apparatus according to an embodiment of this application;
FIG. 19 is a schematic structural diagram of a dynamic image decoding apparatus according to an embodiment of this application;
FIG. 20 is a schematic block diagram of an encoding and decoding apparatus according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of this application clearer, the following further describes the implementations of this application in detail with reference to the accompanying drawings.
Before the dynamic image encoding and decoding methods provided in the embodiments of this application are explained in detail, the terms and the implementation environment involved in the embodiments are introduced.
For ease of understanding, the terms involved in the embodiments of this application are explained first.
Encoding: the process of compressing a to-be-encoded image into a code stream. Encoding is mainly divided into image encoding and video encoding. Image encoding is the process of compressing a to-be-encoded static image into a code stream, and video encoding is the process of compressing the image sequence included in a to-be-encoded video into a code stream.
A dynamic image is an image in which a group of static images is switched at a specified frequency to produce a dynamic effect. In the embodiments of this application, the encoding of a dynamic image is divided into the encoding of a static image and the encoding of a video.
It should be noted that a static image that has been compressed into a code stream may also be referred to as an encoded static image, and a video that has been compressed into a code stream may also be referred to as an encoded video. Similarly, a dynamic image that has been compressed into a code stream may also be referred to as an encoded dynamic image.
Decoding: the process of restoring an encoded code stream into a reconstructed image according to specific syntax rules and processing methods. Decoding is mainly divided into decoding of an image code stream and decoding of a video code stream. Decoding of an image code stream is the process of restoring the image code stream into a reconstructed image, and decoding of a video code stream is the process of restoring the video code stream into a reconstructed video.
Sub-image sequence: a sequence of image regions extracted from each frame of image included in an image sequence.
Coding block: a coding region obtained by dividing a to-be-encoded image. One frame of image may be divided into a plurality of coding blocks, which together make up the frame. Each coding block can be encoded independently.
A coding block may consist of tiles or of slices. One tile includes at least one coding tree unit (CTU), and one slice includes a plurality of CTUs.
Next, the implementation environment involved in the embodiments of this application is introduced.
Refer to FIG. 1. FIG. 1 is a schematic diagram of an implementation environment according to an embodiment of this application. The implementation environment includes a source device 10, a destination device 20, a link 30 and a storage device 40. The source device 10 may generate an encoded dynamic image, and may therefore also be referred to as a dynamic image encoding device. The destination device 20 may decode the encoded dynamic image generated by the source device 10, and may therefore also be referred to as a dynamic image decoding device. The link 30 may receive the encoded dynamic image generated by the source device 10 and transmit it to the destination device 20. The storage device 40 may receive and store the encoded dynamic image generated by the source device 10; in this case, the destination device 20 may obtain the encoded dynamic image directly from the storage device 40. Alternatively, the storage device 40 may correspond to a file server or another intermediate storage device that can hold the encoded dynamic image generated by the source device 10; in this case, the destination device 20 may obtain the encoded dynamic image stored in the storage device 40 via streaming or download.
The source device 10 and the destination device 20 may each include one or more processors and a memory coupled to the one or more processors. The memory may include random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures accessible by a computer. For example, the source device 10 and the destination device 20 may each be a desktop computer, a mobile computing device, a notebook (for example, laptop) computer, a tablet computer, a set-top box, a telephone handset such as a so-called "smart" phone, a television, a camera, a display device, a digital media player, a video game console, an in-vehicle computer, or the like.
The link 30 may include one or more media or devices capable of transmitting the encoded dynamic image from the source device 10 to the destination device 20. In a possible implementation, the link 30 may include one or more communication media that enable the source device 10 to send the encoded dynamic image directly to the destination device 20 in real time. In the embodiments of this application, the source device 10 may modulate the encoded dynamic image according to a communication standard, such as a wireless communication protocol, and send the modulated dynamic image to the destination device 20. The one or more communication media may include wireless and/or wired communication media, for example, a radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (for example, the Internet). The one or more communication media may include routers, switches, base stations, or other equipment that facilitates communication from the source device 10 to the destination device 20. This is not specifically limited in the embodiments of this application.
In a possible implementation, the storage device 40 may store the encoded dynamic image received from the source device 10, and the destination device 20 may obtain the encoded dynamic image directly from the storage device 40. In this case, the storage device 40 may include any of a variety of distributed or locally accessed data storage media, for example, a hard disk drive, a Blu-ray disc, a digital versatile disc (DVD), a compact disc read-only memory (CD-ROM), flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing the encoded dynamic image.
In a possible implementation, the storage device 40 may correspond to a file server or another intermediate storage device that can hold the encoded dynamic image generated by the source device 10, and the destination device 20 may obtain the dynamic image stored in the storage device 40 via streaming or download. The file server may be any type of server capable of storing the encoded dynamic image and sending it to the destination device 20. In a possible implementation, the file server may include a web server, a file transfer protocol (FTP) server, a network attached storage (NAS) device, a local disk drive, or the like. The destination device 20 may obtain the encoded dynamic image through any standard data connection, including an Internet connection. Such a data connection may include a wireless channel (for example, a Wi-Fi connection), a wired connection (for example, a digital subscriber line (DSL) or a cable modem), or a combination of both that is suitable for accessing the encoded dynamic image stored on the file server. The transmission of the encoded dynamic image from the storage device 40 may be a streaming transmission, a download transmission, or a combination of both.
The implementation environment shown in FIG. 1 is only one possible implementation. The techniques of the embodiments of this application are applicable not only to the source device 10 shown in FIG. 1, which can encode a dynamic image, and the destination device 20, which can decode an encoded dynamic image, but also to other devices that can encode dynamic images and decode encoded dynamic images. This is not specifically limited in the embodiments of this application.
In the implementation environment shown in FIG. 1, the source device 10 includes a data source 120, an encoder 100 and an output interface 140. In some embodiments, the output interface 140 may include a modulator/demodulator (modem) and/or a sender, where the sender may also be referred to as a transmitter. The data source 120 may include an image capture device (for example, a camera), an archive containing previously captured dynamic images, a feed interface for receiving dynamic images from a dynamic image content provider, and/or a computer graphics system for generating dynamic images, or a combination of these sources of dynamic images.
The data source 120 may send a dynamic image to the encoder 100, and the encoder 100 may encode the dynamic image received from the data source 120 to obtain an encoded dynamic image. The encoder 100 may send the encoded dynamic image to the output interface 140. In some embodiments, the source device 10 sends the encoded dynamic image directly to the destination device 20 via the output interface 140. In other embodiments, the encoded dynamic image may also be stored on the storage device 40 for the destination device 20 to obtain later for decoding and/or display.
In the implementation environment shown in FIG. 1, the destination device 20 includes an input interface 240, a decoder 200 and a display device 220. In some embodiments, the input interface 240 includes a receiver and/or a modem. The input interface 240 may receive the encoded dynamic image via the link 30 and/or from the storage device 40 and send it to the decoder 200. The decoder 200 may decode the received encoded dynamic image to obtain a decoded dynamic image, and may send the decoded dynamic image to the display device 220. The display device 220 may be integrated with the destination device 20 or may be external to it. In general, the display device 220 displays the decoded dynamic image. The display device 220 may be any of a variety of types of display device, for example, a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.
Although not shown in FIG. 1, in some aspects the encoder 100 and the decoder 200 may each be integrated with an audio encoder and an audio decoder, and may include an appropriate multiplexer-demultiplexer (MUX-DEMUX) unit or other hardware and software to handle the encoding of both audio and video in a common data stream or in separate data streams. In some embodiments, if applicable, the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, or to other protocols such as the user datagram protocol (UDP).
The encoder 100 and the decoder 200 may each be any of the following circuits: one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the techniques of the embodiments of the present application are implemented partly in software, a device may store the software instructions in a suitable non-volatile computer-readable storage medium and may execute those instructions in hardware using one or more processors, thereby implementing the techniques of the embodiments of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, and so on) may be regarded as one or more processors. Each of the encoder 100 and the decoder 200 may be included in one or more encoders or decoders, any of which may be integrated as part of a combined encoder/decoder (codec) in a respective device.
The embodiments of the present application may generally describe the encoder 100 as "signaling" or "sending" certain information to another device, such as the decoder 200. The terms "signaling" and "sending" generally refer to the transfer of syntax elements and/or other data used to decode the compressed dynamic image. This transfer may occur in real time or near real time. Alternatively, the communication may occur over a span of time, for example when syntax elements are stored in the encoded bitstream on a computer-readable storage medium at encoding time; the decoding device may then retrieve the syntax elements at any time after they have been stored on that medium.
Refer to FIG. 2, which is a schematic diagram of an exemplary implementation environment provided by an embodiment of the present application. The implementation environment includes a cloud server 101 and a terminal device 201, which are communicatively connected. The connection may be wireless or wired, which is not limited in the embodiments of the present application.
The cloud server 101 may be the source device 10 in the implementation environment shown in FIG. 1. The cloud server 101 is configured to encode the dynamic image and to transmit the encoded dynamic image to the terminal device 201.
The terminal device 201 may be the destination device 20 in the implementation environment shown in FIG. 1. The terminal device 201 is configured to decode the encoded dynamic image transmitted by the cloud server 101 and to display the dynamic image obtained by decoding.
Optionally, the terminal device 201 is further configured to capture images and transmit them to the cloud server 101, which generates the dynamic image from the images captured by the terminal device 201. In this way the terminal device 201 provides the cloud server 101 with a data source.
The terminal device 201 may be any electronic product capable of human-computer interaction with a user through one or more of a keyboard, a touchpad, a touchscreen, a remote control, voice interaction, or a handwriting device, for example a personal computer (PC), a mobile phone, a smartphone, a personal digital assistant (PDA), a wearable device, a pocket PC (PPC), a tablet computer, a smart in-vehicle unit, a smart TV, or a smart speaker.
The cloud server 101 may be a single server, a server cluster composed of multiple servers, or a cloud computing service center.
Those skilled in the art will understand that the terminal device 201 and the cloud server 101 described above are only examples. Other existing or future terminals or servers that are applicable to the embodiments of the present application also fall within the protection scope of the embodiments of the present application and are hereby incorporated by reference.
Refer to FIG. 3, which is a schematic structural block diagram of an encoder 100 provided by an embodiment of the present application. The encoder 100 includes an encoding mode determination module 110, a semantic segmentation module 111, an image sequence extraction module 112, a position indication information encoding module 113, an image encoding module 114, a first video encoding module 115, a first code stream encapsulation module 116, a second video encoding module 117, and a second code stream encapsulation module 118.
The encoding mode determination module 110 is configured to determine the encoding mode of the dynamic image, that is, whether the dynamic image is encoded in the region segmentation encoding mode or in the video encoding mode. The region segmentation encoding mode is the encoding mode provided by the embodiments of the present application, and the video encoding mode is the conventional encoding mode. In other words, the dynamic image may be encoded either in the encoding mode provided by the embodiments of the present application or in the conventional video encoding mode.
When the dynamic image is encoded in the region segmentation encoding mode, the encoder 100 includes the semantic segmentation module 111, the image sequence extraction module 112, the position indication information encoding module 113, the image encoding module 114, the first video encoding module 115, and the first code stream encapsulation module 116. When the dynamic image is encoded in the video encoding mode, the encoder 100 includes the second video encoding module 117 and the second code stream encapsulation module 118.
The semantic segmentation module 111 is configured to perform semantic segmentation on any frame of the dynamic image to obtain an image segmentation mask. The image sequence extraction module 112 is configured to extract a moving image sequence from the dynamic image. The moving image sequence may be the sub-image sequence corresponding to a moving object in the dynamic image, or it may be the dynamic image itself; the embodiments below describe each case, so no details are given here. The position indication information encoding module 113 is configured to encode the position indication information to obtain a code stream that includes the encoded position indication information. The position indication information may be the image segmentation mask, or it may be the coordinates, in the dynamic image, of a specified position within the image area where a moving object is located. These coordinates can be determined from the image segmentation mask.
The image encoding module 114 is configured to encode the first frame of the dynamic image to obtain a code stream of the encoded first frame. Note that the moving image sequence may be the sub-image sequence corresponding to a moving object in the dynamic image, or it may be the dynamic image itself. When the moving image sequence is the sub-image sequence corresponding to a moving object, the first frame must be encoded by the image encoding module 114. When the moving image sequence is the dynamic image itself, the first frame need not be encoded separately, and in that case the encoder 100 may omit the image encoding module 114.
The first video encoding module 115 is configured to encode the moving image sequence determined by the image sequence extraction module 112 to obtain a code stream of the encoded moving image sequence. The first code stream encapsulation module 116 is configured to encapsulate the code streams produced by the position indication information encoding module 113, the image encoding module 114, and the first video encoding module 115 into a combined code stream, and to send the combined code stream to the output interface 140. The output interface 140 may send the combined code stream to the decoder 200.
Note that the embodiments of the present application provide several implementations of the region segmentation encoding mode. Depending on the implementation, the encoder 100 may include all of the position indication information encoding module 113, the image encoding module 114, and the first video encoding module 115, or only some of them.
The second video encoding module 117 is configured to encode the dynamic image by video encoding to obtain a code stream that includes the encoded dynamic image. The second code stream encapsulation module 118 is configured to encapsulate the code stream produced by the second video encoding module 117 and to send the encapsulated code stream to the output interface 140. The output interface 140 may send the encapsulated code stream to the decoder 200.
It should be understood that the encoder 100 shown in FIG. 3 is only one implementation provided by the embodiments of the present application. In other implementations, the encoder 100 may include more or fewer modules than shown in FIG. 3, which is not limited in the embodiments of the present application.
Refer to FIG. 4, which is a schematic structural block diagram of a decoder 200 provided by an embodiment of the present application. The decoder 200 includes a decoding mode determination module 210, a position indication information decoding module 211, an image decoding module 212, a first video decoding module 213, an image synthesis module 214, and a second video decoding module 215.
The decoding mode determination module 210 is configured to determine the decoding mode of the dynamic image, that is, whether the dynamic image is decoded in the region segmentation decoding mode or in the video decoding mode. The region segmentation decoding mode is the decoding mode provided by the embodiments of the present application, and the video decoding mode is the conventional decoding mode. In other words, a dynamic image encoded in the encoding mode provided by the embodiments of the present application can be decoded in the decoding mode provided by the embodiments of the present application, and a dynamic image encoded in the conventional encoding mode can be decoded in the conventional video decoding mode.
When the dynamic image is decoded in the region segmentation decoding mode, the decoder 200 includes the position indication information decoding module 211, the image decoding module 212, the first video decoding module 213, and the image synthesis module 214. When the dynamic image is decoded in the video decoding mode, the decoder 200 includes the second video decoding module 215.
The position indication information decoding module 211 is configured to decode the code stream that includes the encoded position indication information to obtain the position indication information. The position indication information may be the image segmentation mask, or it may be the coordinates, in the dynamic image, of a specified position within the image area where a moving object is located.
The image decoding module 212 is configured to parse the first frame from the code stream. Note that the moving image sequence may be the sub-image sequence corresponding to a moving object in the dynamic image, or it may be the dynamic image itself. When the moving image sequence is the sub-image sequence corresponding to a moving object, the code stream transmitted by the encoding end includes a code stream of the encoded first frame, and the image decoding module 212 decodes that code stream to obtain the first frame. When the moving image sequence is the dynamic image itself, the image decoding module 212 parses the first frame from the code stream that includes the encoded dynamic image.
The first video decoding module 213 is configured to decode the code stream that includes the encoded moving image sequence to obtain the moving image sequence. The moving image sequence may be the sub-image sequence corresponding to a moving object in the dynamic image, or it may be the dynamic image itself; the embodiments below describe each case, so no details are given here. The image synthesis module 214 is configured to combine the outputs decoded by the position indication information decoding module 211, the image decoding module 212, and the first video decoding module 213 to obtain the dynamic image, and to transmit the dynamic image to the display device 220. The display device 220 may display the dynamic image.
Note that the embodiments of the present application provide several implementations of the region segmentation decoding mode. Depending on the implementation, the decoder 200 may include all of the position indication information decoding module 211, the image decoding module 212, and the first video decoding module 213, or only some of them.
The second video decoding module 215 is configured to decode the code stream that includes the encoded dynamic image to obtain the dynamic image. The dynamic image can then be transmitted to the display device 220, which may display it.
It should be understood that the decoder 200 shown in FIG. 4 is only one implementation provided by the embodiments of the present application. In other implementations, the decoder 200 may include more or fewer modules than shown in FIG. 4, which is not limited in the embodiments of the present application.
The dynamic image encoding and decoding methods provided by the embodiments of the present application are described next. With reference to the implementation environment shown in FIG. 1, any of the dynamic image encoding methods below may be executed by the encoder 100 in the source device 10; taking FIG. 2 as an example, they may be executed by the cloud server 101. Likewise, any of the dynamic image decoding methods below may be executed by the decoder 200 in the destination device 20; taking FIG. 2 as an example, they may be executed by the terminal device 201.
In the dynamic image encoding method provided by the embodiments of the present application, semantic segmentation can be performed on any frame of the dynamic image to obtain an image segmentation mask. The dynamic image includes a plurality of objects, and the image segmentation mask includes a plurality of image areas in one-to-one correspondence with those objects. A moving image sequence is determined based on the dynamic image; each frame in the moving image sequence includes the image area where one or more moving objects among the plurality of objects are located. Position indication information is determined based on the image segmentation mask; it indicates the position of the image area where the one or more moving objects are located. The moving image sequence and the position indication information are then encoded into the code stream.
In a dynamic image, only the image areas where moving objects are located change; the image areas where stationary objects are located do not. Each frame in the moving image sequence includes the image area where the one or more moving objects are located, and the position indication information indicates the position of that image area. Therefore, encoding the moving image sequence and the position indication information into the code stream is sufficient for the dynamic image to be decoded later. The image areas where stationary objects are located need not be encoded into the code stream, which improves encoding efficiency.
In the dynamic image decoding method provided by the embodiments of the present application, the first frame can be parsed from the code stream, and the moving image sequence and the position indication information can also be parsed from the code stream. Each frame in the moving image sequence includes the image area where one or more moving objects are located, and the position indication information indicates the position of that image area. Based on the moving image sequence and the position indication information, the image area where the one or more moving objects are located is rendered and displayed within the first frame to obtain the dynamic image.
That is, when decoding a dynamic image, once the first frame has been decoded, only the image area where the moving objects are located needs to be decoded for subsequent frames; the image area where stationary objects are located does not, which effectively reduces decoding complexity and power consumption. Moreover, when displaying the dynamic image, only the image area where the moving objects are located needs to be rendered and refreshed on top of the first frame, which effectively reduces display power consumption.
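The display-side idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the decoder of the application: frames are plain 2D lists of pixel values, the moving region is assumed to be a rectangular patch, and the names `render_frames`, `patches`, and `top_left` are hypothetical.

```python
import copy

def render_frames(first_frame, patches, top_left):
    """Yield one displayed frame per patch by overwriting only the moving region.

    first_frame: 2D list, the decoded first frame (static background).
    patches:     list of 2D lists, the decoded moving-region images (assumed rectangular).
    top_left:    (x0, y0) position of the region in the full frame (assumed form of
                 the position indication information).
    """
    x0, y0 = top_left
    canvas = copy.deepcopy(first_frame)  # the background is decoded only once
    for patch in patches:
        for dy, row in enumerate(patch):
            # Refresh only the pixels that belong to the moving region.
            canvas[y0 + dy][x0:x0 + len(row)] = row
        yield copy.deepcopy(canvas)
```

Pixels outside the patch are never rewritten, which is the source of the decoding and display savings the text describes.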
Note that the moving image sequence may include one or more sub-image sequences in one-to-one correspondence with the one or more moving objects, or it may be the dynamic image itself. The position indication information may be the image segmentation mask, or it may be the coordinates, in the dynamic image, of a specified position of the image area where each of the one or more moving objects is located. The encoding and decoding methods provided by the embodiments of the present application are therefore explained in detail below across several embodiments.
Refer to FIG. 5, which is a flowchart of a first dynamic image encoding method provided by an embodiment of the present application. In this method, the moving image sequence includes one or more sub-image sequences, and the position indication information is the image segmentation mask. The encoding method includes the following steps.
Step 501: Perform semantic segmentation on any frame of the dynamic image to obtain an image segmentation mask. The dynamic image includes a plurality of objects, the image segmentation mask includes a plurality of image areas in one-to-one correspondence with those objects, and the image segmentation mask indicates the position of the image area where one or more moving objects among the plurality of objects are located.
Because the location area of each object in the dynamic image is essentially unchanged and only the object itself varies, the embodiments of the present application can perform semantic segmentation on any frame of the dynamic image to obtain the image segmentation mask. Typically, the first frame of the dynamic image is segmented.
In addition, because the image segmentation mask includes a plurality of image areas in one-to-one correspondence with the plurality of objects, the image areas of different objects are usually represented by different pixel values so that the objects can be told apart, while the image area of any one object is represented by a single pixel value.
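As a small illustration of this representation, the sketch below stores a mask as a 2D list with one integer label per pixel. The mask contents, the label values 0, 1, and 2, and the helper name `region_sizes` are hypothetical and chosen only for the example.

```python
# Hypothetical 4x6 segmentation mask: one integer label per pixel.
# Each object gets its own label; all pixels of one object share that label.
MASK = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]

def region_sizes(mask):
    """Count the pixels carrying each label, i.e. the size of each object's image area."""
    sizes = {}
    for row in mask:
        for label in row:
            sizes[label] = sizes.get(label, 0) + 1
    return sizes
```

Here label 0 could stand for a background object such as the sky, and labels 1 and 2 for two other objects.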
Note that each object in the dynamic image may be a single entity in the dynamic image. For example, if the dynamic image shows a user, grass, a hillside, a river, and the sky, the plurality of objects in the dynamic image are the user, the grass, the hillside, the river, and the sky.
In addition, the objects in the dynamic image are generally divided into moving objects and stationary objects. A moving object is an object that itself changes, also called an object in a moving state. For example, the water in the river varies, and the user's facial features or limbs vary, so the river and the user are moving objects. A stationary object is an object that does not itself change, also called an object in a stationary state. For example, the grass, the hillside, and the sky do not vary, so they are stationary objects.
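This passage does not say how objects are classified as moving or stationary. One simple possibility, shown here purely as an assumption, is to mark an object as moving if any pixel inside its mask region differs between consecutive frames. The function name `moving_labels` and the exact-equality comparison are hypothetical; a practical system would likely tolerate noise with a threshold.

```python
def moving_labels(frames, mask):
    """Return the set of mask labels whose region changes between consecutive frames.

    frames: list of 2D lists of pixel values, all the same size as `mask`.
    mask:   2D list of integer labels, one per pixel.
    Assumption: an object is 'moving' if at least one of its pixels changes.
    """
    moving = set()
    for prev, cur in zip(frames, frames[1:]):
        for y, row in enumerate(mask):
            for x, label in enumerate(row):
                if prev[y][x] != cur[y][x]:
                    moving.add(label)
    return moving
```

Labels that never appear in the result correspond to stationary objects under this assumption.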
Step 502: Based on the image segmentation mask and the dynamic image, extract one or more sub-image sequences that are in one-to-one correspondence with the one or more moving objects among the plurality of objects.
The sub-image sequence of every moving object is extracted in the same way. In some embodiments, therefore, one moving object is selected from the one or more moving objects, and its sub-image sequence is determined by the following operations, which are repeated until the sub-image sequence of every moving object has been determined: based on the image segmentation mask, determine the location area of the selected moving object; then, based on that location area, extract the image area where the selected moving object is located from every frame of the dynamic image except the first frame, which yields the sub-image sequence of the selected moving object.
The image segmentation mask includes a plurality of image areas in one-to-one correspondence with the plurality of objects; in other words, the mask already delineates the image area of each object. As described above, the image area of one object is represented in the mask by a single pixel value, and the image areas of different objects are represented by different pixel values. Determining the location area of the selected moving object from the image segmentation mask therefore proceeds as follows: scan the pixels of the image segmentation mask to obtain the pixel coordinate set of the selected moving object, which contains the coordinates of multiple pixels, and take the area formed by that pixel coordinate set as the location area of the selected moving object.
That is, scanning the pixels of the image segmentation mask identifies the pixels whose value equals the pixel value assigned to the selected moving object. The coordinates of these pixels form the pixel coordinate set of the selected moving object, from which its location area can be determined. The location area is where the selected moving object actually lies, and its boundary is the outline of the selected moving object.
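The scan described above can be sketched in a few lines. The mask is assumed to be a 2D list indexed as `mask[y][x]`, and the function name `pixel_coordinates` is hypothetical.

```python
def pixel_coordinates(mask, label):
    """Scan the mask row by row and collect (x, y) for every pixel equal to `label`.

    mask:  2D list of integer labels, indexed as mask[y][x] (assumed layout).
    label: the pixel value assigned to the selected moving object.
    """
    return [(x, y)
            for y, row in enumerate(mask)
            for x, value in enumerate(row)
            if value == label]
```

The returned list is the pixel coordinate set of the selected moving object; its extent is the object's location area.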
The area enclosed by the outline of a moving object is usually irregular, so the location area of the moving object is not a regular area. In some embodiments, the image area inside the location area of the moving object can be extracted directly from every frame of the dynamic image except the first frame. In other embodiments, the location area of the moving object can first be converted into a regular area, and the image area inside that regular area is then extracted from every frame of the dynamic image except the first frame.
That is, extracting the image area where the selected moving object is located from every frame except the first, based on the object's location area, can be implemented in either of two ways. In the first, the image area lying inside the location area of the selected moving object is extracted from every frame of the dynamic image except the first frame. In the second, the location area of the selected moving object is expanded into a rectangular area, and the image area lying inside the expanded location area is extracted from every frame of the dynamic image except the first frame.
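The first option, extracting the irregular region directly, might look like the sketch below. It assumes frames are 2D lists indexed as `frame[y][x]` and that the object's pixel coordinate set is already available as a list of `(x, y)` pairs; the names `extract_region` and `sub_image_sequence` are hypothetical, and storing the region as a dictionary is only one possible representation.

```python
def extract_region(frame, coords):
    """Keep only the pixels of `frame` located at the given (x, y) coordinates."""
    return {(x, y): frame[y][x] for (x, y) in coords}

def sub_image_sequence(frames, coords):
    """Extract the object's region from every frame except the first one."""
    return [extract_region(frame, coords) for frame in frames[1:]]
```

Skipping `frames[0]` mirrors the text: the first frame is handled separately, so only later frames contribute to the sub-image sequence.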
Note that the location area of a moving object can be expanded in several ways. For example, the minimum abscissa, minimum ordinate, maximum abscissa, and maximum ordinate are determined from the pixel coordinate set of the selected moving object; the rectangular area whose abscissas lie between the minimum and maximum abscissa and whose ordinates lie between the minimum and maximum ordinate is then taken as the expanded location area. Alternatively, the circumscribed rectangle of the location area is drawn directly from the location area of the moving object and taken as the expanded location area. The embodiments of the present application do not limit the expansion method, provided that the expanded location area contains the location area of the moving object.
比如,对于运动对象K来说,运动对象K在图像分割掩膜中的像素值为Mk。对图像分割掩膜中的各个像素点进行扫描,确定像素值为Mk的像素点的坐标,从而得到像素坐标集合:{(x k1,y k1),(x k2,y k2),......,(x kN,y kN)},其中N为像素值为Mk的像素点的个数。此时,可以确定出最小横坐标min_X k、最小纵坐标min_Y k、最大横坐标max_X k和最大纵坐标max_Y k,即,min_Xk=min{x k1,x k2,......,x kN},min_Yk=min{y k1,y k2,......,y kN},max_Xk=max{x k1,x k2,......,x kN},max_Yk=max{y k1,y k2,......,y kN}。此时,可以将集合{(x,y)|min_Xk<=x<=max_Xk,min_Yk<=x<=max_Yk)}中坐标所在的方形区域确定为运动对象K对应的扩展后的位置区域。然后,从动态图像中除第一帧图像之外的每帧图像中,提取出位于扩展后的位置区域内的图像区域。 For example, for a moving object K, the pixel value of the moving object K in the image segmentation mask is Mk. Scan each pixel in the image segmentation mask to determine the coordinates of the pixel whose pixel value is Mk, so as to obtain the pixel coordinate set: {(x k1 , y k1 ), (x k2 , y k2 ), ... ..., (x kN , y kN )}, where N is the number of pixels whose pixel value is Mk. At this time, the minimum abscissa min_X k , the minimum ordinate min_Y k , the maximum abscissa max_X k and the maximum ordinate max_Y k , that is, min_Xk=min{x k1 , x k2 ,  …, x kN }, min_Yk=min{y k1 , y k2 ,...,y kN },max_Xk=max{x k1 ,x k2 ,...,x kN },max_Yk=max{y k1 , y k2 , ..., y kN }. At this time, the square area where the coordinates in the set {(x,y)|min_Xk<=x<=max_Xk, min_Yk<=x<=max_Yk)} may be determined as the expanded position area corresponding to the moving object K. Then, an image area located in the expanded position area is extracted from each frame of the moving image except the first frame image.
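The scan described above can be sketched in a few lines. This is only an illustration under stated assumptions: the mask is modeled as a list of rows of integer pixel values, and the function name `bounding_box` is hypothetical rather than taken from the application.

```python
def bounding_box(mask, mk):
    """Return (min_x, min_y, max_x, max_y) of the pixels whose value is mk.

    Returns None when no pixel of the mask has the value mk.
    """
    xs, ys = [], []
    for y, row in enumerate(mask):          # scan the mask row by row
        for x, value in enumerate(row):
            if value == mk:                 # pixel belongs to object K
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


# Toy mask: 0 is background, 7 is the (assumed) value Mk of one moving object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 7, 0, 0],
    [0, 7, 7, 7, 0],
    [0, 0, 7, 0, 0],
]
print(bounding_box(mask, 7))  # (1, 1, 3, 3)
```

The returned tuple corresponds to (min_Xk, min_Yk, max_Xk, max_Yk), and the rectangle it spans contains every pixel of the object, which is the only property the text requires of an expanded position area.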
Step 503: Encode the first frame of the dynamic image, the one or more sub-image sequences, and the image segmentation mask into the code stream.
The first frame of the dynamic image and the image segmentation mask can be encoded into the code stream with an image encoder. Each of the one or more sub-image sequences can be encoded into the code stream with a video encoder.
For ease of description, the image encoder used for the first frame of the dynamic image is called the first image encoder, the image encoder used for the image segmentation mask is called the second image encoder, and the video encoder used for the one or more sub-image sequences is called the first video encoder. The first image encoder and the second image encoder may be the same or different.
Generally, within the image area where one object is located, the pixel values differ from pixel to pixel in the first frame of the dynamic image, whereas in the image segmentation mask all pixels in that area have the same value. Therefore, an image encoder with higher coding efficiency can be used to encode the first frame of the dynamic image, and a general image encoder can be used to encode the image segmentation mask.
It should be noted that the encoding end and the decoding end may agree on the first image encoder, the second image encoder and the first video encoder in advance. Alternatively, these encoders may be selected by the user. When the user selects them, the type of the first image encoder, the type of the second image encoder and the type of the first video encoder also need to be encoded into the code stream. In addition, these image encoders and the video encoder may be encoders that the encoding end itself already includes.
The code streams obtained by the above encoding are then encapsulated into a combined code stream, and the combined code stream is transmitted to the decoding end.
In this embodiment of the present application, the code streams may be encapsulated according to the ISO base media file format (ISOBMFF) standard (ISO/IEC 14496-12, MPEG-4 Part 12); this is not limited in this embodiment. The HEIF format (ISO/IEC 23008-12) may also be extended to encapsulate the code streams.
For example, assume that this embodiment adds a derived image sequence of type sovl on top of the high efficiency image file format (HEIF) (ISO/IEC 23008-12), indicating that the derived image sequence is obtained by overlaying one or more sub-image sequences on the first frame. The one or more sub-image sequences and the first frame are specified by a sequence reference box (SequenceReferenceBox). The one or more sub-image sequences are encapsulated in tracks as specified by the HEIF standard, and the first frame is encapsulated in an item as specified by the HEIF standard.
The syntax of the derived image sequence is as follows:
[Figure PCTCN2022086880-appb-000001: syntax of the derived image sequence (image not reproduced in this text)]
Here, output_width and output_height are the width and height of the output derived image sequence.
reference_count is determined through the SequenceReferenceBox and indicates the number of the one or more sub-image sequences.
horizontal_offset and vertical_offset indicate the offset of a sub-image sequence relative to the upper left corner of the first frame.
[Figure PCTCN2022086880-appb-000002: syntax figure (image not reproduced in this text)]
Here, from_track_id is the identifier of the derived image sequence, to_item_id is the identifier of the first frame, reference_count is the number of the one or more sub-image sequences, and to_track_id is the identifier of a sub-image sequence.
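Because the syntax figures are not reproduced in this text, the following sketch only groups the fields named above into plain data records. The class names `SubSequenceRef` and `DerivedImageSequence`, the grouping of the offsets per referenced track, and the use of Python integers are assumptions made for illustration; the sketch says nothing about the actual box layout or field widths.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubSequenceRef:
    """One referenced sub-image sequence (hypothetical grouping)."""
    to_track_id: int          # identifier of the sub-image sequence
    horizontal_offset: int    # offset from the left edge of the first frame
    vertical_offset: int      # offset from the top edge of the first frame


@dataclass
class DerivedImageSequence:
    """Fields of the 'sovl' derived image sequence named in the text."""
    from_track_id: int        # identifier of the derived image sequence
    to_item_id: int           # identifier of the first frame
    output_width: int
    output_height: int
    references: List[SubSequenceRef] = field(default_factory=list)

    @property
    def reference_count(self) -> int:
        # Number of referenced sub-image sequences.
        return len(self.references)


seq = DerivedImageSequence(
    from_track_id=1, to_item_id=10, output_width=640, output_height=480,
    references=[SubSequenceRef(2, 100, 50), SubSequenceRef(3, 300, 200)],
)
print(seq.reference_count)  # 2
```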
In this embodiment of the present application, only the image areas where moving objects are located change across the dynamic image, while the image areas where stationary objects are located do not change, and the image segmentation mask indicates the position of the image area of each of the one or more moving objects. Therefore, once the image area of each moving object has been extracted from each frame of the dynamic image other than the first frame, encoding the first frame, the image segmentation mask, and the image area of each moving object in each frame into the code stream is sufficient for the dynamic image to be decoded later. That is, the image areas of moving objects are separated from those of stationary objects, and only the image areas of moving objects are encoded into the code stream for frames after the first; the image areas of stationary objects need not be encoded, which improves coding efficiency. In addition, because this embodiment can directly reuse the encoders that the encoding end already includes and only needs to encapsulate the resulting code streams, no dedicated encoder needs to be designed.
Refer to FIG. 6. FIG. 6 is a flowchart of a first dynamic image decoding method provided by an embodiment of the present application. The decoding method corresponds to the encoding method shown in FIG. 5 and includes the following steps.
Step 601: Parse the first frame from the code stream.
As described above, the image encoder used for the first frame is called the first image encoder. For ease of description, the image decoder used for the first frame is likewise called the first image decoder.
The first image encoder may be agreed in advance between the encoding end and the decoding end, or selected by the user during encoding. If it is agreed in advance, the first image decoder is also agreed in advance, and the first frame can be parsed from the code stream directly with the agreed first image decoder. If it is selected by the user, the type of the first image encoder is first parsed from the code stream, the first image decoder is determined based on that type, and the first frame is then parsed from the code stream with the determined first image decoder.
Step 602: Parse one or more sub-image sequences and an image segmentation mask from the code stream. The image segmentation mask includes multiple image areas in one-to-one correspondence with multiple objects, the one or more sub-image sequences are in one-to-one correspondence with one or more moving objects included in the multiple objects, and the image segmentation mask is used to indicate the position of the image area where each of the one or more moving objects is located.
As described above, the image encoder used for the image segmentation mask is called the second image encoder. For ease of description, the image decoder used for the image segmentation mask is likewise called the second image decoder. Similarly, the video decoder used for the one or more sub-image sequences is called the first video decoder.
The second image encoder may be agreed in advance between the encoding end and the decoding end, or selected by the user during encoding. If it is agreed in advance, the second image decoder is also agreed in advance, and the image segmentation mask can be parsed from the code stream directly with the agreed second image decoder. If it is selected by the user, the type of the second image encoder is first parsed from the code stream, the second image decoder is determined based on that type, and the image segmentation mask is then parsed from the code stream with the determined second image decoder.
Similarly, the first video encoder may be agreed in advance between the encoding end and the decoding end, or selected by the user during encoding. If it is agreed in advance, the first video decoder is also agreed in advance, and each of the one or more sub-image sequences can be parsed from the code stream directly with the agreed first video decoder. If it is selected by the user, the type of the first video encoder is first parsed from the code stream, the corresponding first video decoder is determined based on that type, and each of the one or more sub-image sequences is then parsed from the code stream with the determined first video decoder.
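The two cases (decoder agreed in advance, or encoder type signalled in the code stream) can be sketched as a small lookup. The function name `resolve_decoder` and the use of codec names as strings are assumptions for illustration. The mapping 0 = JPEG, 1 = PNG follows the example values that the application gives for image_codec_type in its worked example; no numeric values are assumed for other codecs.

```python
# Example values for image_codec_type taken from the application's worked
# example (0 = JPEG, 1 = PNG); other codec types are not assigned values here.
IMAGE_CODECS = {0: "JPEG", 1: "PNG"}


def resolve_decoder(agreed, signalled_type, table):
    """Return the name of the decoder to use.

    agreed          decoder agreed in advance, or None if none was agreed
    signalled_type  encoder type parsed from the code stream (used only
                    when no decoder was agreed in advance)
    table           mapping from signalled type to decoder name
    """
    if agreed is not None:
        return agreed                      # pre-agreed: nothing to parse
    if signalled_type not in table:
        raise ValueError(f"unknown codec type: {signalled_type!r}")
    return table[signalled_type]           # user-selected: follow the signal


print(resolve_decoder("PNG", None, IMAGE_CODECS))  # PNG
print(resolve_decoder(None, 0, IMAGE_CODECS))      # JPEG
```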
Step 603: Based on the one or more sub-image sequences and the image segmentation mask, render and display, in the first frame, the image areas where the one or more moving objects are located, to obtain the dynamic image.
The process of rendering and displaying the image area of each moving object in the first frame is the same. Therefore, in some embodiments, one moving object is selected from the one or more moving objects and the following operations are performed on the image area of the selected moving object, and this is repeated until the image area of every moving object has been rendered and displayed: determine, based on the image segmentation mask, the position of the image area where the selected moving object is located; then, according to that position, render and display in the first frame the image areas included in the sub-image sequence corresponding to the selected moving object.
The image segmentation mask includes multiple image areas in one-to-one correspondence with the multiple objects, that is, the image area of each of the multiple objects has already been delimited in the image segmentation mask. As described above, the image area of one object is represented in the mask by a single pixel value, and the image areas of different objects are represented by different pixel values. Therefore, determining the position of the image area of the selected moving object based on the image segmentation mask includes: scanning each pixel in the image segmentation mask to obtain the pixel coordinate set corresponding to the selected moving object, where the pixel coordinate set includes the coordinates of multiple pixels; and determining the position area formed by the pixel coordinate set as the position of the image area of the selected moving object, or expanding the position area formed by the pixel coordinate set so that the expanded position area is a rectangular area and determining the expanded position area as the position of the image area of the selected moving object.
That is, each pixel in the image segmentation mask is scanned to find the pixels whose value equals the pixel value corresponding to the selected moving object, and the coordinates of these pixels are taken as the pixel coordinate set of the selected moving object, from which the position in the dynamic image of the image area of the selected moving object can be determined.
Generally, the area enclosed by the outline of a moving object is irregular, that is, the position area formed by the pixel coordinate set is not a regular area. In some embodiments, the position area formed by the pixel coordinate set of the moving object is therefore used directly as the position in the dynamic image of the image area of the moving object. In other embodiments, the position area formed by the pixel coordinate set is first converted into a regular area, and the position of the regular area is then used as the position in the dynamic image of the image area of the moving object.
It should be noted that, for the process of expanding the position area formed by the pixel coordinate set, reference may be made to the description of step 502 above; details are not repeated here.
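The difference between the irregular and the rectangular variant can be illustrated with a masked paste. This sketch assumes that the decoded patch is a rectangle aligned with the top-left corner of the object's bounding rectangle and that frames are lists of rows; the function name `paste_irregular` is hypothetical. Only pixels that the mask assigns to the object are overwritten, so the irregular outline is preserved.

```python
def paste_irregular(canvas, mask, mk, patch, origin):
    """Copy patch pixels into canvas only where the mask value equals mk.

    origin is the (x, y) position of the patch's top-left pixel in the canvas.
    The canvas is modified in place and also returned.
    """
    ox, oy = origin
    for py, row in enumerate(patch):
        for px, value in enumerate(row):
            x, y = ox + px, oy + py
            if mask[y][x] == mk:           # inside the object's outline
                canvas[y][x] = value
    return canvas


canvas = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
mask = [[0, 5, 0], [5, 5, 0], [0, 0, 0]]   # object with value 5, L-shaped
patch = [[9, 9], [9, 9]]                   # decoded 2x2 rectangle
print(paste_irregular(canvas, mask, 5, patch, (0, 0)))
# [[0, 9, 0], [9, 9, 0], [0, 0, 0]]
```

Dropping the `mask[y][x] == mk` test gives the rectangular variant, in which the whole patch replaces the corresponding rectangle of the canvas.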
In addition, the order in which the image areas of the one or more moving objects are rendered is the same as the order of the code streams of those image areas within the overall code stream.
When the encoding end encapsulates the code streams by extending the HEIF format (ISO/IEC 23008-12), this embodiment can obtain the code stream of the first frame through to_item_id and decode it to obtain the first frame, and obtain the code stream of each sub-image sequence through to_track_id and decode it to obtain the sub-image sequence. Then, according to horizontal_offset and vertical_offset and in the order in which the to_track_id values are parsed, the one or more sub-image sequences are overlaid on the first frame to obtain the reconstructed image of the derived image sequence, that is, the reconstructed dynamic image.
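The overlay step for a single time instant can be sketched as follows, assuming that images are lists of rows of pixel values and that every patch fits inside the base image. The function name `overlay` and the tuple layout of `layers` are illustrative assumptions; only the offsets and the parsing order come from the description above.

```python
def overlay(base, layers):
    """Overlay rectangular patches on a copy of the base image.

    layers is a list of (horizontal_offset, vertical_offset, patch) tuples,
    given in the order in which the corresponding to_track_id values were
    parsed. Later layers overwrite earlier ones where they overlap.
    """
    out = [row[:] for row in base]         # leave the first frame untouched
    for h_off, v_off, patch in layers:
        for dy, row in enumerate(patch):
            for dx, value in enumerate(row):
                out[v_off + dy][h_off + dx] = value
    return out


base = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
layers = [
    (1, 0, [[1, 1], [1, 1]]),              # first parsed sub-image sequence
    (2, 1, [[2, 2]]),                      # second one, overlaps the first
]
print(overlay(base, layers))
# [[0, 1, 1, 0], [0, 1, 2, 2], [0, 0, 0, 0]]
```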
The one or more moving objects mentioned in steps 601-603 may be all of the moving objects among the multiple objects included in the dynamic image, or only some of them. That is, the decoding end can also decide whether all moving objects in the dynamic image are shown in motion or only a selected subset of them.
Specifically, an object selection instruction is received, which is used to select one or more objects from the multiple objects included in the dynamic image. The one or more objects selected through the object selection instruction are determined as the one or more moving objects in the above steps.
The object selection instruction may be triggered by the user based on the first frame. For example, all moving objects of the dynamic image are marked in the first frame, and the user selects some or all of them; the selected objects are the one or more moving objects in the above steps.
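One way to turn user input into a set of selected objects is to look up each clicked position in the image segmentation mask. The sketch below assumes that clicks arrive as (x, y) pixel positions in the first frame, that the mask has the same size as the first frame, and that the set of mask values belonging to moving objects is known; the function name `select_moving_objects` is hypothetical.

```python
def select_moving_objects(mask, moving_values, clicks):
    """Return the mask values of the moving objects the user clicked.

    Values are returned in click order without duplicates. Clicks that land
    on the background or on a stationary object are ignored.
    """
    selected = []
    for x, y in clicks:
        value = mask[y][x]                 # object that owns this pixel
        if value in moving_values and value not in selected:
            selected.append(value)
    return selected


mask = [[0, 1, 1], [2, 2, 0]]              # 0: background, 1 and 2: objects
clicks = [(1, 0), (0, 0), (2, 0), (0, 1)]
print(select_moving_objects(mask, {1, 2}, clicks))  # [1, 2]
```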
In this embodiment of the present application, the image segmentation mask indicates the position in the dynamic image of the image area of each of the one or more moving objects. Therefore, after the first frame has been parsed from the code stream, the image areas of the one or more moving objects can be rendered and displayed in the first frame according to the position of each area in the dynamic image. That is, when the dynamic image is decoded, once the first frame has been decoded, only the image areas of moving objects need to be decoded for subsequent images and the image areas of stationary objects need not be decoded, which reduces decoding complexity and power consumption. Moreover, during display, only the image areas of moving objects need to be rendered and refreshed on top of the first frame, which reduces display power consumption.
Next, the dynamic image encoding and decoding methods provided by the embodiments shown in FIG. 5 and FIG. 6 are described by way of example with reference to FIG. 7.
Encoding end steps:
1) The user selects the encoders to use, and the following syntax elements are encoded at the system layer to indicate them:
image_codec_type: the type of image encoder used to encode the first frame. For example, image_codec_type may take the value 0 or 1, where 0 indicates Joint Photographic Experts Group (JPEG) and 1 indicates Portable Network Graphics (PNG). Other encoder types, such as Better Portable Graphics (BPG), may also be indicated; this is not limited here.
mask_codec_type: the type of image encoder used to encode the image segmentation mask, for example JPEG or PNG. Other encoder types, such as BPG, may also be indicated; this is not limited here.
video_codec_type: the type of video encoder used to encode the sub-image sequences, for example H.265. Other encoder types, such as H.264, may also be indicated; this is not limited here.
2) Call the encoder indicated by image_codec_type to encode the first frame; an efficient image encoder can be used for the first frame;
3) Call the encoder indicated by mask_codec_type to encode the image segmentation mask; a general image encoder can be used for the mask;
4) Use the image segmentation mask to extract the image areas where moving objects are located from the images of the dynamic image other than the first frame, forming several sub-video sequences. The value of object K in the mask is denoted Mk. The image area of object K is extracted as follows:
Loop over the moving objects. When the image area of object K is being extracted, scan the image segmentation mask row by row and record the coordinates of the pixels whose value is Mk, forming the set {(xk,1, yk,1), (xk,2, yk,2), ..., (xk,N, yk,N)}, where N is the number of coordinate points;
Find the minimum and maximum values of these coordinates:
min_Xk = min{xk,1, xk,2, ..., xk,N}
min_Yk = min{yk,1, yk,2, ..., yk,N}
max_Xk = max{xk,1, xk,2, ..., xk,N}
max_Yk = max{yk,1, yk,2, ..., yk,N}
The positions of the coordinates in the set {(x, y) | min_Xk <= x <= max_Xk, min_Yk <= y <= max_Yk} are the position of object K.
Extract the rectangular area at the position of object K from every image of the dynamic image other than the first frame, and use the extracted areas as the sub-image sequence.
5) Call the video encoder indicated by video_codec_type to encode each sub-image sequence;
6) Concatenate and encapsulate (and transmit) the code streams obtained in the above steps according to the ISOBMFF standard (ISO/IEC 14496-12, MPEG-4 Part 12).
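Step 4) of the encoding end can be sketched as a crop of the same rectangle from every frame after the first. The sketch assumes frames are lists of rows and that the rectangle is given as (min_Xk, min_Yk, max_Xk, max_Yk) with inclusive bounds; the function names `crop` and `sub_image_sequence` are hypothetical, and the actual encoding of the cropped frames is outside the sketch.

```python
def crop(frame, box):
    """Return the rectangle box = (min_x, min_y, max_x, max_y), bounds inclusive."""
    min_x, min_y, max_x, max_y = box
    return [row[min_x:max_x + 1] for row in frame[min_y:max_y + 1]]


def sub_image_sequence(frames, box):
    """Crop the same rectangle from every frame except the first one."""
    return [crop(frame, box) for frame in frames[1:]]


# Three toy 3x4 frames; pixel value encodes frame index, row and column.
frames = [
    [[f * 100 + y * 10 + x for x in range(4)] for y in range(3)]
    for f in range(3)
]
seq = sub_image_sequence(frames, (1, 0, 2, 1))
print(seq[0])  # [[101, 102], [111, 112]]
```

Each sub-image sequence produced this way has a fixed size across frames, which is what allows it to be passed to an ordinary video encoder in step 5).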
Decoding end steps:
1) Decode the following information at the system layer:
image_codec_type
mask_codec_type
video_codec_type
2) According to image_codec_type, call the corresponding image decoder to decode the first frame and display it;
3) According to mask_codec_type, call the corresponding decoder to decode the image segmentation mask;
4) Use the image segmentation mask to determine the position of each moving object in the image. The pixel value of object K in the image segmentation mask is denoted Mk. The position of a moving object is determined as follows:
Loop over the moving objects. When the position of object K is being determined, scan the image segmentation mask row by row and record the coordinates of the pixels whose value is Mk. These coordinates form the set {(xk,1, yk,1), (xk,2, yk,2), ..., (xk,N, yk,N)};
Find the minimum and maximum values of these coordinates:
min_Xk = min{xk,1, xk,2, ..., xk,N}
min_Yk = min{yk,1, yk,2, ..., yk,N}
max_Xk = max{xk,1, xk,2, ..., xk,N}
max_Yk = max{yk,1, yk,2, ..., yk,N}
The positions of the coordinates in the set {(x, y) | min_Xk <= x <= max_Xk, min_Yk <= y <= max_Yk} are the position of object K.
5) According to video_codec_type, call the corresponding decoder to decode the sub-image sequences;
6) Render each moving object at its corresponding position and refresh the display. Objects are rendered in the same order as their code streams appear in the overall code stream.
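The decoding end steps above can be tied together in a short reconstruction loop. This is a sketch under several assumptions: frames are lists of rows, each object's position is a rectangle (min_x, min_y, max_x, max_y) already derived from the mask, its decoded sub-image sequence is a list of rectangles of that size, and all rectangles lie inside the frame. The function name `reconstruct` is hypothetical. Only the object rectangles are rewritten for each new frame; everything else is carried over from the first frame.

```python
def reconstruct(first_frame, objects):
    """Rebuild all frames from the first frame and per-object patches.

    objects is a list of (box, patches) pairs in code-stream order, where
    box = (min_x, min_y, max_x, max_y) and patches[t] is the decoded
    rectangle of that object for frame t + 1.
    """
    canvas = [row[:] for row in first_frame]
    frames = [[row[:] for row in canvas]]              # frame 0
    count = min(len(p) for _, p in objects) if objects else 0
    for t in range(count):
        for (min_x, min_y, _max_x, _max_y), patches in objects:
            for dy, row in enumerate(patches[t]):      # refresh this region
                canvas[min_y + dy][min_x:min_x + len(row)] = row
        frames.append([row[:] for row in canvas])      # snapshot for display
    return frames


first = [[0, 0, 0], [0, 0, 0]]
objects = [
    ((0, 0, 1, 0), [[[1, 1]], [[2, 2]]]),
    ((1, 0, 2, 1), [[[3, 3], [3, 3]], [[4, 4], [4, 4]]]),
]
for frame in reconstruct(first, objects):
    print(frame)
# [[0, 0, 0], [0, 0, 0]]
# [[1, 3, 3], [0, 3, 3]]
# [[2, 4, 4], [0, 4, 4]]
```

In the example the two rectangles overlap at column 1 of the top row, and the second object wins because it comes later in code-stream order.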
User interactivity can also be added on top of FIG. 7: the user at the decoding end can choose to let specific objects move while the other areas remain still. Next, the dynamic image encoding and decoding methods provided by the embodiments shown in FIG. 5 and FIG. 6 are described by way of example with reference to FIG. 8.
Encoding end steps:
1) The user selects the encoder types to use (for example, an H.265 encoder), and the following syntax elements are encoded at the system layer to indicate them:
image_codec_type: the type of image encoder used to encode the first frame. For example, image_codec_type may take the value 0 or 1, where 0 indicates JPEG and 1 indicates PNG. Other encoder types, such as BPG, may also be indicated; this is not limited here.
mask_codec_type: the type of image encoder used to encode the image segmentation mask, for example JPEG or PNG. Other encoder types, such as BPG, may also be indicated; this is not limited here.
video_codec_type: the type of video encoder used to encode the sub-image sequences or the dynamic image itself, for example H.265. Other encoder types, such as H.264, may also be indicated; this is not limited here.
2) Call the encoder indicated by image_codec_type to encode the first frame; an efficient image encoder can be used for the first frame;
3) Call the encoder indicated by mask_codec_type to encode the image segmentation mask; a general image encoder can be used for the mask;
4) Use the image segmentation mask to extract the image areas where moving objects are located from the images of the dynamic image other than the first frame, forming several sub-video sequences. The value of object K in the mask is denoted Mk. The image area of object K is extracted as follows:
Loop over the moving objects. When the image area of object K is being extracted, scan the image segmentation mask row by row and record the coordinates of the pixels whose value is Mk, forming the set {(xk,1, yk,1), (xk,2, yk,2), ..., (xk,N, yk,N)}, where N is the number of coordinate points;
Find the minimum and maximum values of these coordinates:
min_Xk = min{xk,1, xk,2, ..., xk,N}
min_Yk = min{yk,1, yk,2, ..., yk,N}
max_Xk = max{xk,1, xk,2, ..., xk,N}
max_Yk = max{yk,1, yk,2, ..., yk,N}
The positions of the coordinates in the set {(x, y) | min_Xk <= x <= max_Xk, min_Yk <= y <= max_Yk} are the position of object K.
Extract the rectangular area at the position of object K from every image of the dynamic image other than the first frame, and use the extracted areas as the sub-image sequence.
5) Call the video encoder indicated by video_codec_type to encode each sub-image sequence;
6) Concatenate and encapsulate (and transmit) the code streams obtained in the above steps according to the ISOBMFF standard (ISO/IEC 14496-12, MPEG-4 Part 12).
Decoding side steps:
1) The system layer decodes the following information:
image_codec_type;
mask_codec_type;
video_codec_type.
2) According to image_codec_type, select the corresponding image decoder to decode and display the first frame image;
3) The user clicks a position in the first frame image to set the object to which that position belongs in motion, or chooses to set all objects in motion;
4) According to mask_codec_type, call the corresponding decoder to decode the image segmentation mask;
5) Determine the coordinate range of each moving object using the scheme in step 6). From the position (x, y) clicked by the user, the position range and code stream index of the moving object to which the click belongs can then be determined. Decode the sub-code stream of the selected moving object to obtain the reconstruction of the selected moving object;
6) Use the image segmentation mask to determine the position of each moving object in the image. The pixel value of object K in the image segmentation mask is denoted Mk. The object position is determined as follows:
Suppose the position of object K is currently being determined: scan the image segmentation mask line by line and record the coordinates of the pixels whose value is Mk, forming the set {(x_{k,1}, y_{k,1}), (x_{k,2}, y_{k,2}), …, (x_{k,N}, y_{k,N})};
Find the minimum and maximum of these coordinates:
min_Xk = min{x_{k,1}, x_{k,2}, …, x_{k,N}}
min_Yk = min{y_{k,1}, y_{k,2}, …, y_{k,N}}
max_Xk = max{x_{k,1}, x_{k,2}, …, x_{k,N}}
max_Yk = max{y_{k,1}, y_{k,2}, …, y_{k,N}}
The positions in the set {(x, y) | min_Xk <= x <= max_Xk, min_Yk <= y <= max_Yk} are where object K is located.
7) Render each moving object at its corresponding position and refresh the display. Objects are rendered in the same order as their code streams appear in the overall code stream.
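Step 5) of the decoding side, mapping a click to a moving object and its code stream index, might look like the sketch below. It assumes the coordinate ranges from step 6) are already available as a list of (min_x, min_y, max_x, max_y) tuples stored in code stream order; the name `select_substream` and the rule of returning the first matching range when ranges overlap are assumptions made for illustration.

```python
def select_substream(click, boxes):
    """Return the code stream index of the object whose range contains `click`.

    `boxes` holds one inclusive (min_x, min_y, max_x, max_y) range per moving
    object, in the same order as the sub-code streams. Returns None when the
    click falls outside every range (for example on a stationary area).
    """
    x, y = click
    for index, (min_x, min_y, max_x, max_y) in enumerate(boxes):
        if min_x <= x <= max_x and min_y <= y <= max_y:
            return index
    return None


boxes = [(1, 1, 2, 2), (5, 0, 7, 3)]
chosen = select_substream((6, 2), boxes)
```

Only the sub-code stream at the returned index then needs to be decoded.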
Please refer to FIG. 9, which is a flowchart of a second dynamic image encoding method provided by an embodiment of the present application. In this method, the moving image sequence includes one or more sub-image sequences, and the position indication information includes the coordinates of one or more specified positions. The encoding method includes the following steps.
Step 901: Perform semantic segmentation on any frame image of the dynamic image to obtain an image segmentation mask. The dynamic image includes multiple objects, and the image segmentation mask includes multiple image areas in one-to-one correspondence with the multiple objects.
For the content of step 901, refer to the related description of step 501; it is not repeated here.
Step 902: Based on the image segmentation mask and the dynamic image, extract one or more sub-image sequences, which are in one-to-one correspondence with one or more moving objects among the multiple objects.
For the content of step 902, refer to the related description of step 502; it is not repeated here.
Step 903: Determine the coordinates, in the dynamic image, of a specified position within the image area where each of the one or more moving objects is located, obtaining the coordinates of one or more specified positions.
The position of the image area where each moving object is located has already been determined while extracting the sub-image sequences in step 902. That area is either the location area of the moving object or the rectangular area obtained by expanding that location area. The coordinates of the specified position within the image area of each moving object can therefore be determined directly.
It should be noted that the specified position within the image area where a moving object is located may be the position with the smallest coordinates, the position with the largest coordinates, or the geometric center point. Other positions may also be used, which is not limited in this embodiment of the present application.
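The three choices of specified position named above can be sketched as follows, assuming the image area is given as an inclusive rectangle (min_x, min_y, max_x, max_y). The function name, the mode strings, and the use of integer division for the center are assumptions for illustration only.

```python
def specified_position(box, mode="smallest"):
    """Return one reference point of the inclusive rectangle `box`."""
    min_x, min_y, max_x, max_y = box
    if mode == "smallest":      # position with the smallest coordinates (top-left)
        return (min_x, min_y)
    if mode == "largest":       # position with the largest coordinates (bottom-right)
        return (max_x, max_y)
    if mode == "center":        # geometric center, rounded down to a pixel
        return ((min_x + max_x) // 2, (min_y + max_y) // 2)
    raise ValueError(f"unknown mode: {mode}")


box = (2, 4, 9, 11)
top_left = specified_position(box)
center = specified_position(box, "center")
```

Whichever choice is made, the encoding end and the decoding end have to use the same one so that the decoder places each sub-image correctly.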
Step 904: Encode the first frame image of the dynamic image, the one or more sub-image sequences, and the coordinates of the one or more specified positions into the code stream.
Optionally, this embodiment of the present application may also encode the number of the one or more moving objects into the code stream. The decoding end can then use this number to determine whether any of the one or more sub-image sequences failed to be transmitted, which ensures the reliability of dynamic image decoding.
For the other content of step 904, refer to the related description of step 503; it is not repeated here.
In this embodiment of the present application, only the image areas where moving objects are located change in the dynamic image; the image areas where stationary objects are located do not change. Therefore, after the image area where each moving object is located is extracted from every frame of the dynamic image other than the first frame image, and the coordinates in the dynamic image of the specified position within each such image area are determined, the first frame image, the image area of each moving object in each frame, and those coordinates are encoded into the code stream, from which the dynamic image can later be decoded. In other words, the image areas of moving objects are separated from those of stationary objects, and only the image areas of moving objects are encoded into the code stream for the subsequent frames; the image areas of stationary objects need not be encoded, which improves coding efficiency. In addition, because this embodiment can directly reuse the encoders already included in the encoding end, only the code streams obtained by encoding need to be encapsulated, and no dedicated encoder has to be designed.
Please refer to FIG. 10, which is a flowchart of a second dynamic image decoding method provided by an embodiment of the present application. This decoding method corresponds to the encoding method shown in FIG. 9 and includes the following steps.
Step 1001: Parse the first frame image from the code stream.
For the content of step 1001, refer to the related description of step 601; it is not repeated here.
Step 1002: Parse one or more sub-image sequences and the coordinates of one or more specified positions from the code stream. The one or more sub-image sequences are in one-to-one correspondence with one or more moving objects, and the coordinates of the one or more specified positions are in one-to-one correspondence with the same moving objects. A specified position is a specified position within the image area where the corresponding moving object is located.
For the content of step 1002, refer to the related description of step 602; it is not repeated here.
Step 1003: Based on the one or more sub-image sequences and the coordinates of the one or more specified positions, render and display the image areas where the one or more moving objects are located in the first frame image, obtaining the dynamic image.
The process of rendering and displaying an image area in the first frame image is the same for every moving object. Therefore, in some embodiments, one moving object is selected from the one or more moving objects and the following operation is performed for it, until the image area of every moving object has been rendered and displayed: according to the coordinates in the dynamic image of the specified position of the image area where the selected moving object is located, render and display, in the first frame image, the image areas included in the sub-image sequence corresponding to the selected moving object.
Because the encoding end directly encodes into the code stream the coordinates in the dynamic image of the specified position within the image area of each moving object, in this embodiment of the present application the image areas included in the sub-image sequence of the selected moving object can be rendered and displayed in the first frame image as soon as the coordinates of its specified position have been parsed from the code stream, which speeds up image reconstruction.
It should be noted that the specified position within the image area where a moving object is located may be the position with the smallest coordinates, the position with the largest coordinates, or the geometric center point. Other positions may also be used, which is not limited in this embodiment of the present application.
In addition, the image areas where the one or more moving objects are located are rendered in the same order as the code streams of those image areas appear in the overall code stream.
When the encoding end has encoded the number of the one or more moving objects into the code stream, this embodiment of the present application may also parse that number from the code stream. By comparing the number of moving objects with the number of sub-image sequences, it can be determined whether any sub-image sequence failed to be transmitted, which improves the reliability of dynamic image decoding.
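The consistency check described above amounts to comparing two counts. A minimal sketch, in which the function name and the representation of parsed sequences as a list are assumptions:

```python
def has_lost_sequences(num_moving_objects, parsed_sequences):
    """Return True when fewer sub-image sequences were parsed than signalled."""
    return len(parsed_sequences) < num_moving_objects


# Three moving objects were signalled, but only two sequences arrived.
lost = has_lost_sequences(3, ["seq0", "seq1"])
```

How the decoding end reacts to a detected loss (for example by leaving that object static) is not specified in the text and is left open here.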
The one or more moving objects mentioned in steps 1001-1003 may be all of the moving objects among the multiple objects included in the dynamic image, or only some of them. That is, the decoding end can also decide whether all moving objects in the dynamic image are set in motion or whether only a selected subset is.
Specifically, an object selection instruction is received, which is used to select one or more objects from the multiple objects included in the dynamic image. The one or more objects selected by the object selection instruction are determined as the one or more moving objects in the above steps.
The object selection instruction may be triggered by the user based on the first frame image. For example, all moving objects of the dynamic image are marked in the first frame image, and the user selects some or all of them; the selected objects are the one or more moving objects in the above steps.
For the other content of step 1003, refer to the related description of step 603; it is not repeated here.
In this embodiment of the present application, after the first frame image is parsed from the code stream, the image areas where the one or more moving objects are located can be rendered and displayed in the first frame image according to the coordinates in the dynamic image of the specified position within the image area of each moving object. That is, when decoding the dynamic image, once the first frame image has been decoded, only the image areas of moving objects need to be decoded for subsequent images; the image areas of stationary objects need not be decoded, which effectively reduces decoding complexity and power consumption. Moreover, while the dynamic image is displayed, only the image areas of moving objects need to be rendered and refreshed on top of the first frame image, which effectively reduces display power consumption.
Next, with reference to FIG. 11, the dynamic image encoding and decoding methods provided by the embodiments shown in FIG. 9 and FIG. 10 are described by way of example. In this embodiment, the image segmentation mask does not need to be encoded; only the starting position of each moving object needs to be encoded.
Encoding side steps:
1) Use the image segmentation mask to determine the position of each moving object in the image. The pixel value of object K in the image segmentation mask is denoted Mk, and the object count num_sub_sequences is initialized to 0. The position of a moving object is determined as follows:
Loop over the moving objects; suppose object K is currently being extracted.
Scan the image segmentation mask line by line and record the coordinates of the pixels whose value is Mk, forming the set {(x_{k,1}, y_{k,1}), (x_{k,2}, y_{k,2}), …, (x_{k,N}, y_{k,N})};
If this coordinate set is not empty, num_sub_sequences = num_sub_sequences + 1;
Find the minimum and maximum of these coordinates:
min_Xk = min{x_{k,1}, x_{k,2}, …, x_{k,N}}
min_Yk = min{y_{k,1}, y_{k,2}, …, y_{k,N}}
max_Xk = max{x_{k,1}, x_{k,2}, …, x_{k,N}}
max_Yk = max{y_{k,1}, y_{k,2}, …, y_{k,N}}
The area represented by the set {(x, y) | min_Xk <= x <= max_Xk, min_Yk <= y <= max_Yk} is the area where object K is located. Extract this rectangular area to obtain the sub-image sequence;
The width of the sub-image sequence is max_Xk-min_Xk, and its height is max_Yk-min_Yk;
Add min_Xk to position_top_left_x_list;
Add min_Yk to position_top_left_y_list.
2) The system layer encodes the following information:
image_codec_type: the type of image encoder used to encode the first frame image. For example, image_codec_type may take the value 0 or 1, where 0 indicates JPEG and 1 indicates PNG; other encoder types, such as BPG, may also be indicated, which is not limited here.
video_codec_type: the type of video encoder used to encode the sub-image sequences or the dynamic image itself, for example H.265. Other encoder types, such as H.264, may also be indicated, which is not limited here.
num_sub_sequences: the number of sub-image sequences
position_top_left_x_list: list of horizontal coordinates of the top-left corners
position_top_left_y_list: list of vertical coordinates of the top-left corners
3) Call the encoder indicated by image_codec_type to encode the first frame image; an efficient image encoder can be used for this;
4) Call the video encoder indicated by video_codec_type to encode each sub-image sequence;
5) Splice and encapsulate (and transmit) the code streams obtained in the above steps according to the ISOBMFF standard (ISO/IEC 14496-12, MPEG-4 Part 12).
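Steps 1) and 2) of the encoding side can be sketched as one pass over the mask that fills in the system-layer fields. This is an illustration under stated assumptions: the mask is a list of rows of integers, the fields are collected in a plain dict, and video_codec_type is shown as a string, none of which is prescribed by the application. The function name `build_system_info` is hypothetical.

```python
def build_system_info(mask, object_values, image_codec_type=0, video_codec_type="H.265"):
    """Collect the system-layer fields for the moving objects found in `mask`."""
    info = {
        "image_codec_type": image_codec_type,   # 0 = JPEG, 1 = PNG in the text's example
        "video_codec_type": video_codec_type,   # illustrative representation only
        "num_sub_sequences": 0,
        "position_top_left_x_list": [],
        "position_top_left_y_list": [],
    }
    for value in object_values:                 # loop over the moving objects
        coords = [(x, y)
                  for y, row in enumerate(mask)
                  for x, pixel in enumerate(row)
                  if pixel == value]
        if not coords:                          # empty coordinate set: no sub-sequence
            continue
        info["num_sub_sequences"] += 1
        info["position_top_left_x_list"].append(min(x for x, _ in coords))
        info["position_top_left_y_list"].append(min(y for _, y in coords))
    return info


mask = [
    [0, 1, 1, 0],
    [0, 0, 1, 2],
    [0, 0, 0, 2],
]
info = build_system_info(mask, object_values=[1, 2, 3])
```

Object value 3 does not occur in the example mask, so it contributes no sub-image sequence and no list entry.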
Decoding side steps:
1) Decode the following information at the system layer:
image_codec_type
video_codec_type
num_sub_sequences
position_top_left_x_list
position_top_left_y_list
2) According to image_codec_type, select the corresponding image decoder to decode and display the first frame image;
3) According to video_codec_type, select the corresponding video decoder to decode the sub-image sequences. Specifically, for each sub-image j in the K-th frame image:
Decode the corresponding code stream to obtain the reconstruction of the object;
Obtain the top-left position of the object: position_top_left_x_list[j], position_top_left_y_list[j];
4) Render each moving object at its corresponding position and refresh the display. Objects are rendered in the same order as their code streams appear in the overall code stream.
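Steps 3) and 4) of the decoding side can be sketched as a loop over frames and sub-images. The sketch assumes the sub-image sequences have already been decoded into lists of small pixel grids, that all sequences have the same number of frames, and that every sub-image fits inside the first frame; `reconstruct_frames` is a hypothetical name.

```python
def reconstruct_frames(first_frame, sub_sequences, xs, ys):
    """Rebuild the frames after the first one by pasting sub-images in stream order.

    sub_sequences[j][k] is the decoded sub-image of object j in frame k;
    xs[j], ys[j] are position_top_left_x_list[j] and position_top_left_y_list[j].
    """
    canvas = [row[:] for row in first_frame]            # start from the first frame
    frames = []
    num_frames = len(sub_sequences[0]) if sub_sequences else 0
    for k in range(num_frames):
        for j, sequence in enumerate(sub_sequences):    # code stream order
            for dy, row in enumerate(sequence[k]):
                canvas[ys[j] + dy][xs[j]:xs[j] + len(row)] = row
        frames.append([row[:] for row in canvas])       # snapshot of the display
    return frames


first_frame = [[0, 0, 0], [0, 0, 0]]
sub_sequences = [
    [[[1]], [[2]]],          # object 0: one 1x1 sub-image per frame
    [[[9, 9]], [[8, 8]]],    # object 1: one 1x2 sub-image per frame
]
frames = reconstruct_frames(first_frame, sub_sequences, xs=[0, 1], ys=[0, 1])
```

Stationary pixels are never rewritten: they keep the values decoded for the first frame.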
Optionally, this embodiment of the present application may extend the HEIF format (ISO/IEC 23008-12) to encapsulate the above code streams. For example, a derived image sequence of type sovl is added to the HEIF format, indicating that the derived image sequence is obtained by superimposing one or more sub-image sequences on the first frame image. The one or more sub-image sequences and the first frame image are specified through a sequence reference box (SequenceReferenceBox). The one or more sub-image sequences are encapsulated in tracks as specified by the HEIF standard, and the first frame image is encapsulated in an item as specified by the HEIF standard.
The syntax of the derived image sequence is as follows:
Figure PCTCN2022086880-appb-000003
Here, output_width and output_height are the width and height of the output derived image sequence.
reference_count is determined through the SequenceReferenceBox and indicates the number of the one or more sub-image sequences.
horizontal_offset and vertical_offset indicate the offset of a sub-image sequence relative to the top-left corner of the first frame image.
Figure PCTCN2022086880-appb-000004
Figure PCTCN2022086880-appb-000005
Here, from_track_id is the identifier of the derived image sequence, to_item_id is the identifier of the first frame image, reference_count is the number of the one or more sub-image sequences, and to_track_id is the identifier of a sub-image sequence.
When the encoding end encapsulates the code streams by extending the HEIF format (ISO/IEC 23008-12), this embodiment of the present application can obtain the code stream of the first frame image through to_item_id and decode it to obtain the first frame image, obtain the code stream of each sub-image sequence through to_track_id and decode it to obtain the sub-image sequence, and then, according to horizontal_offset and vertical_offset and in the order in which the to_track_id values are parsed, superimpose the one or more sub-image sequences on the first frame image to obtain the reconstructed image of the derived image sequence, that is, the reconstructed dynamic image.
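The relationship between the fields named above can be illustrated with a small in-memory model. This is not the box syntax from the figures, which is not reproduced in the text; the classes `SubSequenceRef` and `DerivedSequence`, the grouping of each offset pair with its to_track_id, and the function `overlay_plan` are all hypothetical and only show how a decoder might order the superimposition.

```python
from dataclasses import dataclass, field


@dataclass
class SubSequenceRef:
    to_track_id: int          # identifier of one sub-image sequence (track)
    horizontal_offset: int    # offset from the top-left corner of the first frame
    vertical_offset: int


@dataclass
class DerivedSequence:
    from_track_id: int        # identifier of the derived image sequence
    to_item_id: int           # identifier of the first frame image (item)
    output_width: int
    output_height: int
    refs: list = field(default_factory=list)  # kept in parsing order

    @property
    def reference_count(self):
        """Number of referenced sub-image sequences."""
        return len(self.refs)


def overlay_plan(derived):
    """Return (track id, x offset, y offset) tuples in the order they are overlaid."""
    return [(r.to_track_id, r.horizontal_offset, r.vertical_offset)
            for r in derived.refs]


derived = DerivedSequence(
    from_track_id=1, to_item_id=10, output_width=640, output_height=480,
    refs=[SubSequenceRef(2, 100, 50), SubSequenceRef(3, 0, 0)],
)
plan = overlay_plan(derived)
```

A decoder following this plan would decode item 10 first and then superimpose track 2 and track 3, in that order, at their offsets.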
User interactivity can be added on the basis of FIG. 11: the user at the decoding end can choose to set specific objects in motion while the other areas remain stationary. Next, with reference to FIG. 12, the dynamic image encoding and decoding methods provided by the embodiments shown in FIG. 9 and FIG. 10 are described by way of example.
Encoding side steps:
1) Use the image segmentation mask to determine the position of each moving object in the image. The pixel value of object K in the image segmentation mask is denoted Mk, and the object count num_sub_sequences is initialized to 0. The position of a moving object is determined as follows:
Loop over the moving objects; suppose object K is currently being extracted.
Scan the image segmentation mask line by line and record the coordinates of the pixels whose value is Mk, forming the set {(x_{k,1}, y_{k,1}), (x_{k,2}, y_{k,2}), …, (x_{k,N}, y_{k,N})};
If this coordinate set is not empty, num_sub_sequences = num_sub_sequences + 1;
Find the minimum and maximum of these coordinates:
min_Xk = min{x_{k,1}, x_{k,2}, …, x_{k,N}}
min_Yk = min{y_{k,1}, y_{k,2}, …, y_{k,N}}
max_Xk = max{x_{k,1}, x_{k,2}, …, x_{k,N}}
max_Yk = max{y_{k,1}, y_{k,2}, …, y_{k,N}}
The area represented by the set {(x, y) | min_Xk <= x <= max_Xk, min_Yk <= y <= max_Yk} is the area where object K is located. Extract this rectangular area to obtain the sub-image sequence;
The width of the sub-image sequence is max_Xk-min_Xk, and its height is max_Yk-min_Yk;
Add min_Xk to position_top_left_x_list;
Add min_Yk to position_top_left_y_list.
2) The system layer encodes the following information:
image_codec_type: the type of image encoder used to encode the first frame image. For example, image_codec_type may take the value 0 or 1, where 0 indicates JPEG and 1 indicates PNG; other encoder types, such as BPG, may also be indicated, which is not limited here.
video_codec_type: the type of video encoder used to encode the sub-image sequences or the dynamic image itself, for example H.265. Other encoder types, such as H.264, may also be indicated, which is not limited here.
num_sub_sequences: the number of sub-image sequences
position_top_left_x_list: list of horizontal coordinates of the top-left corners
position_top_left_y_list: list of vertical coordinates of the top-left corners
3) Call the encoder indicated by image_codec_type to encode the first frame image; an efficient image encoder can be used for this;
4) Call the video encoder indicated by video_codec_type to encode each sub-image sequence;
5) Splice and encapsulate (and transmit) the code streams obtained in the above steps according to the ISOBMFF standard (ISO/IEC 14496-12, MPEG-4 Part 12).
Decoding side steps:
1) Decode the following information at the system layer:
image_codec_type
video_codec_type
num_sub_sequences
position_top_left_x_list
position_top_left_y_list
2) According to image_codec_type, select the corresponding image decoder to decode and display the first frame image;
3) The user issues an instruction choosing to set specific objects or all objects in motion;
4) According to the command signal issued by the user, decode the sub-code streams of the selected objects to obtain the reconstruction of the selected objects;
5) According to video_codec_type, select the corresponding video decoder to decode the sub-image sequences. Specifically, for the sub-image j corresponding to the target object in the K-th frame image:
Decode the corresponding code stream to obtain the reconstruction of the object;
Obtain the top-left position of the object: position_top_left_x_list[j], position_top_left_y_list[j];
6) Render each moving object at its corresponding position and refresh the display. Objects are rendered in the same order as their code streams appear in the overall code stream.
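The selection in steps 3) and 4) of the decoding side reduces to choosing which sub-code stream indices to decode. A minimal sketch, assuming the user's instruction has already been translated into a set of indices (or None for "all objects"); the function name `substreams_to_decode` is hypothetical.

```python
def substreams_to_decode(num_sub_sequences, selected=None):
    """Return the indices of the sub-code streams to decode, in code stream order.

    selected=None means the user chose to set all objects in motion; otherwise
    `selected` is a set of indices into the position lists.
    """
    if selected is None:
        return list(range(num_sub_sequences))
    return [j for j in range(num_sub_sequences) if j in selected]


everything = substreams_to_decode(3)
only_some = substreams_to_decode(3, selected={2, 0})
```

Returning the indices in code stream order keeps the rendering order of step 6) intact regardless of the order in which the user selected the objects.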
Optionally, this embodiment of the present application may extend the HEIF format (ISO/IEC 23008-12) by adding the syntax of the derived image sequence, so as to encapsulate the first frame image and the sub-image sequences. For details, refer to the description above.
Please refer to FIG. 13, which is a flowchart of a third dynamic image encoding method provided by an embodiment of the present application. In this method, the moving image sequence is the dynamic image itself, and the position indication information is the image segmentation mask. The encoding method includes the following steps.
Step 1301: Perform semantic segmentation on any frame image of the dynamic image to obtain an image segmentation mask. The dynamic image includes multiple objects, and the image segmentation mask includes multiple image areas in one-to-one correspondence with the multiple objects.
For the content of step 1301, refer to the related description of step 501; it is not repeated here.
Step 1302: Encode the image segmentation mask and the dynamic image into the code stream. The image segmentation mask is used to indicate the position of the image area where the one or more moving objects are located.
The image segmentation mask can be encoded into the code stream with an image encoder, and the dynamic image with a video encoder. For ease of description, the image encoder used for the image segmentation mask is called the second image encoder, and the video encoder used for the dynamic image is called the second video encoder. The second video encoder may be the same as or different from the first video encoder described above.
It should be noted that the encoding end and the decoding end may agree on the second image encoder and the second video encoder in advance. The second image encoder and the second video encoder may also be selected by the user, in which case the type of the second image encoder and the type of the second video encoder also need to be encoded into the code stream. These image and video encoders may be encoders already included in the encoding end.
The code streams obtained by the above encoding also need to be encapsulated into a combined code stream, which is then transmitted to the decoding end. This embodiment of the present application may use the ISOBMFF standard (ISO/IEC 14496-12, MPEG-4 Part 12) to encapsulate the code streams, which is not limited in this embodiment.
In this embodiment of the present application, only the image areas where moving objects are located change in the dynamic image; the image areas where stationary objects are located do not change, and the image segmentation mask indicates the position of the image area where the one or more moving objects are located. Therefore, the image segmentation mask and the entire dynamic image are encoded into the code stream. After decoding, the image areas of the moving objects can be extracted from the dynamic image based on the image segmentation mask and then rendered and displayed on top of the first frame image, without rendering and displaying the image areas of stationary objects again, which reduces display power consumption. In addition, because this embodiment can directly reuse the encoders already included in the encoding end, only the code streams obtained by encoding need to be encapsulated, and no dedicated encoder has to be designed.
Please refer to FIG. 14, which is a flowchart of a third dynamic image decoding method provided by an embodiment of the present application. This decoding method corresponds to the encoding method shown in FIG. 13 and includes the following steps.
Step 1401: Parse the first frame of the image from the code stream.
For the details of step 1401, refer to the related description of step 601; they are not repeated here.
Step 1402: Parse the image segmentation mask and the dynamic image from the code stream. The image segmentation mask includes multiple image regions in one-to-one correspondence with multiple objects, the multiple objects include the one or more moving objects, and the image segmentation mask is used to indicate the position of the image region where the one or more moving objects are located.
For the process of parsing the image segmentation mask from the code stream, refer to the related description of step 602; it is not repeated here.
As described above, the video encoder used for the dynamic image is called the second video encoder. For ease of description, the video decoder used for the dynamic image is likewise called the second video decoder.
The second video encoder may be agreed on in advance by the encoding end and the decoding end, or it may be selected by the user during encoding. If the second video encoder is agreed on in advance, the second video decoder is also agreed on in advance, and the dynamic image can be parsed from the code stream directly with the agreed second video decoder. If the second video encoder is selected by the user, the type of the second video encoder is first parsed from the code stream, the second video decoder is determined based on that type, and the dynamic image is then parsed from the code stream with the determined second video decoder.
Step 1403: Based on the image segmentation mask and the dynamic image, render and display, in the first frame of the image, the image region where the one or more moving objects are located, to obtain the dynamic image.
The process of rendering and displaying an image region in the first frame is the same for every moving object. Therefore, in some embodiments, one moving object may be selected from the one or more moving objects and its image region rendered and displayed according to the following operations, repeating until the image region of every moving object has been rendered and displayed: based on the image segmentation mask, determine the position of the image region where the selected moving object is located; based on that position, extract the image region where the selected moving object is located from every frame of the dynamic image other than the first frame; and, according to that position, render and display in the first frame the image region that the selected moving object occupies in each frame of the dynamic image.
For the process of determining, based on the image segmentation mask, the position of the image region where the selected moving object is located, refer to the related description of step 603 above; it is not repeated here. For the process of extracting the image region where the selected moving object is located from every frame other than the first frame, refer to the related description of step 502 above; it is likewise not repeated here.
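The per-object rendering loop of step 1403 can be illustrated with a minimal sketch. It assumes frames and the mask are plain row-major lists of integer samples of the same size, that each object is identified in the mask by an integer label, and that the extracted region is exactly the set of pixels carrying that label; the function name `render_moving_objects` and these data layouts are hypothetical and are not taken from the application.

```python
from typing import List, Set

Frame = List[List[int]]  # row-major samples; one int per pixel (assumption)


def render_moving_objects(first_frame: Frame,
                          later_frames: List[Frame],
                          mask: Frame,
                          moving_labels: Set[int]) -> List[Frame]:
    """Return the displayed frames: the first frame with only the
    moving-object pixels refreshed from each later frame."""
    canvas = [row[:] for row in first_frame]  # static content is drawn once
    shown = []
    for frame in later_frames:
        for y, mask_row in enumerate(mask):
            for x, label in enumerate(mask_row):
                if label in moving_labels:
                    # refresh only pixels that belong to a moving object
                    canvas[y][x] = frame[y][x]
        shown.append([row[:] for row in canvas])
    return shown


if __name__ == "__main__":
    mask = [[0, 1], [0, 0]]                      # label 1 is the moving object
    first = [[10, 20], [30, 40]]
    later = [[[11, 21], [31, 41]], [[12, 22], [32, 42]]]
    for displayed in render_moving_objects(first, later, mask, {1}):
        print(displayed)
```

Pixels outside the moving-object labels keep their first-frame values in every displayed frame, which is the behavior the step relies on to avoid re-rendering stationary regions.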
It should be noted that the order in which the image regions of the one or more moving objects are rendered is consistent with the order of the code streams of those image regions within the overall code stream.
The one or more moving objects mentioned in steps 1401 to 1403 above may be all of the moving objects among the multiple objects included in the dynamic image, or only some of them. In other words, the decoding end can further determine whether all of the moving objects in the dynamic image are to be shown in motion, or whether only a selected subset of them is.
That is, an object selection instruction is received, where the object selection instruction is used to select one or more objects from the multiple objects included in the dynamic image. The one or more objects selected by the object selection instruction are determined as the one or more moving objects in the above steps.
The object selection instruction may be triggered by the user based on the first frame of the image. For example, all moving objects in the dynamic image are marked in the first frame, and the user can select some or all of them; the selected objects are the one or more moving objects in the above steps.
In the embodiments of the present application, the image segmentation mask indicates the position, within the dynamic image, of the image region where the one or more moving objects are located. Therefore, after the first frame is parsed from the code stream, the image region of each moving object can be extracted, according to its position in the dynamic image, from every frame other than the first frame, and then rendered and displayed in the first frame. In other words, while the dynamic image is displayed, only the image regions of moving objects need to be rendered and refreshed on the basis of the first frame, which effectively reduces display power consumption.
The following gives an example of the dynamic image encoding and decoding methods provided by the embodiments shown in FIG. 13 and FIG. 14. In this embodiment, the user can choose to make specific objects move while the other regions remain still.
Encoding-end steps:
1) The user selects the encoder types to use, which are indicated by encoding the following syntax elements at the system layer:
mask_codec_type: the type of image encoder used to encode the image segmentation mask, for example JPEG or PNG; other encoder types, such as BPG, may also be indicated, which is not limited here.
video_codec_type: the type of video encoder used to encode the sub-image sequence or the dynamic image itself, for example H.265; other encoder types, such as H.264, may also be indicated, which is not limited here.
2) According to mask_codec_type, invoke the corresponding encoder to encode the image segmentation mask; a general-purpose image encoder can be used for this;
3) Treat the dynamic image as one complete video and, according to video_codec_type, invoke the corresponding video encoder to encode it;
4) Splice and encapsulate (and transmit) the code streams obtained in the above steps according to the ISOBMFF standard (ISO/IEC 14496-12, MPEG-4 Part 12). Alternatively, transmit the encoded code stream of the image segmentation mask in an SEI message.
Decoding-end steps:
1) The system layer decodes the following information:
mask_codec_type;
video_codec_type.
2) According to mask_codec_type, select the corresponding decoder to decode the image segmentation mask;
3) According to video_codec_type, select the corresponding decoder to decode and reconstruct the dynamic image;
4) The user issues an instruction choosing whether specific objects or all objects should move;
5) According to the instruction issued by the user, render the designated objects at their corresponding positions and refresh the display.
Please refer to FIG. 15, which is a flowchart of a fourth dynamic image encoding method provided by an embodiment of the present application. In this method, the moving object sequence includes the dynamic image, and the position indication information includes the image segmentation mask. The encoding method includes the following steps.
Step 1501: Perform semantic segmentation on any frame of the dynamic image to obtain an image segmentation mask. The dynamic image includes multiple objects, and the image segmentation mask includes multiple image regions in one-to-one correspondence with the multiple objects.
For the details of step 1501, refer to the related description of step 501; they are not repeated here.
Step 1502: Based on the image segmentation mask, determine multiple segmented regions in one-to-one correspondence with the multiple objects.
In some embodiments, the location region of each of the multiple objects may be determined based on the image segmentation mask. If the location region of any object does not contain an integer number of CTUs, the boundary of that location region is extended so that it contains an integer number of CTUs. The location regions of the multiple objects after this extension are determined as the multiple segmented regions.
In other words, after the extension, the location region of each object contains an integer number of CTUs, and the extended location region is determined as the segmented region. Each of the multiple segmented regions therefore contains an integer number of CTUs.
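The boundary extension in step 1502 amounts to snapping a rectangle outward to the CTU grid. The sketch below is only an illustration under stated assumptions: the location region is an inclusive pixel rectangle, CTUs are square with side `ctu_size` (64 is used in the example but is not specified by the application), and a region touching the right or bottom picture edge is clamped to the picture size. The function name `align_to_ctu` is hypothetical.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # inclusive (x0, y0, x1, y1) in pixels


def align_to_ctu(box: Box, ctu_size: int, width: int, height: int) -> Box:
    """Extend an inclusive pixel rectangle outward so that every side lies
    on a CTU boundary, clamped to the picture dimensions."""
    x0, y0, x1, y1 = box
    ax0 = (x0 // ctu_size) * ctu_size             # move left edge left
    ay0 = (y0 // ctu_size) * ctu_size             # move top edge up
    ax1 = min((x1 // ctu_size + 1) * ctu_size, width) - 1   # right edge right
    ay1 = min((y1 // ctu_size + 1) * ctu_size, height) - 1  # bottom edge down
    return ax0, ay0, ax1, ay1


if __name__ == "__main__":
    print(align_to_ctu((70, 10, 130, 65), 64, 1920, 1080))
```

A rectangle that already sits on CTU boundaries is returned unchanged, so the function can be applied to every object without first checking whether extension is needed.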
Step 1503: According to the multiple segmented regions, divide every frame of the dynamic image other than the first frame into regions to obtain multiple image regions.
Because the multiple segmented regions are in one-to-one correspondence with the multiple objects, after every frame other than the first frame has been divided, each frame contains image regions in one-to-one correspondence with the multiple segmented regions, that is, image regions in one-to-one correspondence with the multiple objects.
Step 1504: Determine the object state corresponding to each of the multiple segmented regions, where the object state is either a stationary state or a moving state.
Because one segmented region corresponds to one object, the state of the object corresponding to each segmented region can be determined as the object state of that segmented region.
Step 1505: Encode the first frame of the dynamic image, the multiple image regions, the object state corresponding to each of the multiple segmented regions, and the image segmentation mask into the code stream, where the image segmentation mask is used to indicate the position of the image region where the one or more moving objects are located.
For the encoding of the first frame of the dynamic image and of the image segmentation mask, refer to the related description of step 503; it is not repeated here.
Encoding the multiple image regions into the code stream includes: encoding each of the multiple image regions into the code stream as one coding block; or encoding the region formed by each row of CTUs within each image region into the code stream as one coding block. Here, the location region of a coding block that is used as a reference must lie within the location region of the coding block that refers to it.
Because each image region contains an integer number of CTUs, the whole image region can be encoded into the code stream separately as one coding block (a tile), or the region formed by each row of CTUs within each image region can be encoded into the code stream separately as one coding block (a slice), so that it can be decoded independently later.
In addition, decoding a coding block may require referring to a coding block in a frame that precedes the current frame; that is, the decoding of a coding block in the current frame depends on a coding block in the reference frame. For decoding to succeed, the location region of the coding block in the reference frame therefore has to lie within the location region of the coding block in the current frame, so that the current coding block can be decoded on the basis of the reference coding block.
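The reference constraint above can be read as a check on each candidate motion vector: the block it points to in the reference frame must stay inside the same region as the current block. The following sketch assumes whole-pixel motion vectors and inclusive rectangles, and it ignores the extra margin that fractional-sample interpolation would need; the function name `reference_inside_region` and the tuple layouts are hypothetical.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]    # inclusive (x0, y0, x1, y1) in pixels
Block = Tuple[int, int, int, int]  # (x, y, width, height) of the current block


def reference_inside_region(block: Block, mv: Tuple[int, int], region: Box) -> bool:
    """True if the reference block addressed by the whole-pixel motion vector
    lies entirely inside the region that contains the current block."""
    bx, by, bw, bh = block
    dx, dy = mv
    x0, y0, x1, y1 = region
    rx, ry = bx + dx, by + dy          # top-left corner of the reference block
    return (x0 <= rx and y0 <= ry
            and rx + bw - 1 <= x1 and ry + bh - 1 <= y1)


if __name__ == "__main__":
    region = (64, 0, 191, 127)
    print(reference_inside_region((64, 0, 16, 16), (8, 8), region))
    print(reference_inside_region((64, 0, 16, 16), (-1, 0), region))
```

An encoder applying this idea would discard or clip candidate motion vectors for which the check fails, so that each region remains decodable without samples from other regions.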
The code streams obtained by the above encoding also need to be encapsulated into a combined code stream, which is then transmitted to the decoding end. The embodiments of the present application may use the ISOBMFF standard (ISO/IEC 14496-12, MPEG-4 Part 12) to encapsulate these code streams, which is not limited in the embodiments of the present application.
In the embodiments of the present application, only the image regions where moving objects are located change within the dynamic image; the image regions where stationary objects are located do not change. After every frame other than the first frame has been divided according to the multiple segmented regions, the first frame, the resulting image regions, the object state corresponding to each segmented region, and the image segmentation mask are encoded into the code stream, from which the dynamic image can later be decoded. In other words, the image regions of moving objects and the image regions of stationary objects are separated and each encoded into the code stream on its own. During decoding, only the image regions corresponding to the moving state need to be decoded, and the image regions corresponding to the stationary state do not, which improves decoding efficiency. In addition, because the embodiments of the present application can directly reuse the encoders that the encoding end already includes, only the encoded code streams need to be encapsulated, and no dedicated encoder needs to be designed.
Please refer to FIG. 16, which is a flowchart of a fourth dynamic image decoding method provided by an embodiment of the present application. This decoding method corresponds to the encoding method shown in FIG. 15 and includes the following steps.
Step 1601: Parse the first frame of the image from the code stream.
For the details of step 1601, refer to the related description of step 601; they are not repeated here.
Step 1602: Parse the image segmentation mask from the code stream. The image segmentation mask includes multiple image regions in one-to-one correspondence with multiple objects, the multiple objects include the one or more moving objects, and the image segmentation mask is used to indicate the position of the image region where the one or more moving objects are located.
For the details of step 1602, refer to the related description of step 602; they are not repeated here.
Step 1603: Based on the image segmentation mask, determine multiple segmented regions in one-to-one correspondence with the multiple objects.
For the details of step 1603, refer to the related description of step 1502; they are not repeated here.
Step 1604: Parse from the code stream the object state corresponding to each of the multiple segmented regions, where the object state is either a stationary state or a moving state.
Step 1605: Based on the object state corresponding to each of the multiple segmented regions, parse from the code stream the image regions delimited by the segmented regions that correspond to the moving state.
Each segmented region corresponds to one object state, which is either the moving state or the stationary state, and the image regions in the code stream were delimited by the segmented regions. The image regions delimited by the segmented regions corresponding to the moving state can therefore be parsed directly from the code stream.
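Step 1605 can be pictured as a filter over the per-region sub-streams: only regions whose object state is the moving state are handed to the decoder. The sketch below is an illustration only. It assumes the sub-streams have already been separated and keyed by a region identifier, and the `ObjectState` values, the `decode_region` callback and the function name `decode_moving_regions` are hypothetical; the application does not define how the states are coded.

```python
from enum import Enum
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class ObjectState(Enum):
    STATIONARY = 0  # numeric values are assumptions, not from the application
    MOVING = 1


def decode_moving_regions(sub_streams: Dict[int, bytes],
                          states: Dict[int, ObjectState],
                          decode_region: Callable[[bytes], T]) -> Dict[int, T]:
    """Decode only the regions whose object state is MOVING; the sub-streams
    of stationary regions are skipped without being decoded."""
    decoded = {}
    for region_id, payload in sub_streams.items():
        if states[region_id] is ObjectState.MOVING:
            decoded[region_id] = decode_region(payload)
    return decoded


if __name__ == "__main__":
    streams = {0: b"sky", 1: b"bird", 2: b"tree"}
    states = {0: ObjectState.STATIONARY, 1: ObjectState.MOVING,
              2: ObjectState.STATIONARY}
    # a stand-in "decoder" that just turns the payload into text
    print(decode_moving_regions(streams, states, lambda b: b.decode()))
```

Because stationary regions never reach the decoder callback, the decoding work scales with the number of moving regions rather than with the size of the picture.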
Step 1606: Based on the image regions delimited by the segmented regions corresponding to the moving state, render and display, in the first frame of the image, the image region where the one or more moving objects are located, to obtain the dynamic image.
That is, the image regions delimited by the segmented regions corresponding to the moving state are rendered and displayed in the first frame to obtain the dynamic image.
It should be noted that the order in which the image regions of the one or more moving objects are rendered is consistent with the order of the code streams of those image regions within the overall code stream.
The one or more moving objects mentioned in steps 1601 to 1606 above may be all of the moving objects among the multiple objects included in the dynamic image, or only some of them. In other words, the decoding end can further determine whether all of the moving objects in the dynamic image are to be shown in motion, or whether only a selected subset of them is.
That is, an object selection instruction is received, where the object selection instruction is used to select one or more objects from the multiple objects included in the dynamic image. The one or more objects selected by the object selection instruction are determined as the one or more moving objects in the above steps.
The object selection instruction may be triggered by the user based on the first frame of the image. For example, all moving objects in the dynamic image are marked in the first frame, and the user can select some or all of them; the selected objects are the one or more moving objects in the above steps.
In the embodiments of the present application, after the first frame is parsed from the code stream, the image regions delimited by the segmented regions corresponding to the moving state can be parsed from the code stream according to the object state of each segmented region, and the image regions delimited by the segmented regions corresponding to the stationary state do not need to be parsed, which effectively reduces decoding complexity and power consumption. Moreover, while the dynamic image is displayed, only the image regions delimited by the segmented regions corresponding to the moving state need to be rendered and refreshed on the basis of the first frame, which effectively reduces display power consumption.
Next, with reference to FIG. 17, an example is given of the dynamic image encoding and decoding methods provided by the embodiments shown in FIG. 15 and FIG. 16. This embodiment uses the picture partitioning mechanisms of existing video coding standards. At the encoding end, the image segmentation mask is used to divide every frame of the dynamic image into several slices or tiles with a fixed pattern, and each slice or tile can be decoded independently.
Encoding-end steps:
1) The user selects the encoder types to use, which are indicated by encoding the following syntax elements at the system layer:
image_codec_type: the type of image encoder used to encode the first frame of the image. For example, image_codec_type may take the value 0 or 1, where 0 indicates JPEG and 1 indicates PNG; other encoder types, such as BPG, may also be indicated, which is not limited here.
mask_codec_type: the type of image encoder used to encode the image segmentation mask, for example JPEG or PNG; other encoder types, such as BPG, may also be indicated, which is not limited here.
video_codec_type: the type of video encoder used to encode the dynamic image itself, for example H.265; other encoder types, such as H.264, may also be indicated, which is not limited here.
2) According to image_codec_type, invoke the corresponding encoder to encode the first frame of the image; an efficient image encoder can be used for this;
3) According to mask_codec_type, invoke the corresponding encoder to encode the image segmentation mask; a mainstream image encoder can be used for this;
4) Use the image segmentation mask to divide every frame of the dynamic image into slices or tiles with a fixed pattern.
First, the image segmentation mask is used to determine the position of each moving object in the image. The pixel value of object K in the image segmentation mask is denoted $M_k$. The object position is determined as follows:
To determine the position of object K, scan the image segmentation mask line by line and record the coordinates of the pixels whose value is $M_k$. These coordinates form the set $\{(x_{k,1}, y_{k,1}), (x_{k,2}, y_{k,2}), \dots, (x_{k,N}, y_{k,N})\}$;
Find the minimum and maximum of these coordinates:
$\mathit{min\_X}_k = \min\{x_{k,1}, x_{k,2}, \dots, x_{k,N}\}$
$\mathit{min\_Y}_k = \min\{y_{k,1}, y_{k,2}, \dots, y_{k,N}\}$
$\mathit{max\_X}_k = \max\{x_{k,1}, x_{k,2}, \dots, x_{k,N}\}$
$\mathit{max\_Y}_k = \max\{y_{k,1}, y_{k,2}, \dots, y_{k,N}\}$
The positions in the set $\{(x, y) \mid \mathit{min\_X}_k \le x \le \mathit{max\_X}_k,\ \mathit{min\_Y}_k \le y \le \mathit{max\_Y}_k\}$ constitute the location of object K;
Boundary processing: if the region determined by the above coordinate set does not contain an integer number of CTUs (its top, bottom, left, or right side is not on a CTU boundary), pad several rows or columns of pixels upward, downward, to the left, and to the right as needed, so that the region contains an integer number of CTUs;
If tiles are used for the region partitioning, the above rectangular region can be used directly as an independent tile; if slices are used, the region formed by the CTUs in each row of the above rectangular region is used as a separate slice;
5) According to video_codec_type, invoke the corresponding video encoder to encode each slice or tile separately. During encoding, the range of inter-prediction motion vectors must be constrained to the slice or tile at the corresponding position in the reference picture; for an H.265 encoder, motion-constrained tile sets (MCTS) can be used;
6) Splice and encapsulate (and transmit) the code streams obtained in the above steps according to the ISOBMFF standard (ISO/IEC 14496-12, MPEG-4 Part 12).
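The position determination in step 4) of the encoding-end steps, and the slice option that follows it, can be sketched as below. This is an illustration under stated assumptions: the mask is a row-major list of integer labels, rectangles are inclusive, and the region passed to `ctu_row_slices` has already been extended to CTU boundaries. The function names `object_bbox` and `ctu_row_slices` and the CTU size of 64 in the example are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # inclusive (min_x, min_y, max_x, max_y)


def object_bbox(mask: List[List[int]], mk: int) -> Box:
    """Scan the mask row by row, collect the coordinates whose value is mk,
    and return their minimum and maximum x and y."""
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value == mk:
                xs.append(x)
                ys.append(y)
    if not xs:
        raise ValueError(f"label {mk} does not occur in the mask")
    return min(xs), min(ys), max(xs), max(ys)


def ctu_row_slices(region: Box, ctu_size: int) -> List[Box]:
    """Split a CTU-aligned region into one rectangle per row of CTUs,
    matching the 'one slice per CTU row' option."""
    x0, y0, x1, y1 = region
    return [(x0, y, x1, min(y + ctu_size - 1, y1))
            for y in range(y0, y1 + 1, ctu_size)]


if __name__ == "__main__":
    mask = [[0, 0, 0, 0],
            [0, 7, 7, 0],
            [0, 0, 0, 0],
            [0, 0, 7, 0]]
    print(object_bbox(mask, 7))
    print(ctu_row_slices((64, 0, 191, 127), 64))
```

With the tile option, the aligned rectangle itself is the coding unit, so `ctu_row_slices` is only needed when slices are chosen for the partitioning.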
Decoding-end steps:
1) The system layer decodes the following information:
image_codec_type;
mask_codec_type;
video_codec_type.
2) The system layer extracts the individual sub-streams from the code stream for subsequent decoding;
3) According to image_codec_type, invoke the corresponding image decoder to decode and display the first frame of the image;
4) According to mask_codec_type, invoke the corresponding decoder to decode the image segmentation mask;
5) According to the image segmentation mask, the system layer controls the decoder to decode only the slices or tiles corresponding to objects in the moving state;
6) Use the image segmentation mask to divide the dynamic image into slices or tiles with a fixed pattern;
First, the image segmentation mask is used to determine the position of each moving object in the image. The pixel value of object K in the image segmentation mask is denoted $M_k$. The position of an object is determined as follows:
To determine the position of object K, scan the image segmentation mask row by row and record the coordinates of the pixels whose value is $M_k$. These coordinates form the set $\{(x_{k,1}, y_{k,1}), (x_{k,2}, y_{k,2}), \ldots, (x_{k,N}, y_{k,N})\}$;
Find the minimum and maximum of these coordinates:
$\text{min\_X}_k = \min\{x_{k,1}, x_{k,2}, \ldots, x_{k,N}\}$
$\text{min\_Y}_k = \min\{y_{k,1}, y_{k,2}, \ldots, y_{k,N}\}$
$\text{max\_X}_k = \max\{x_{k,1}, x_{k,2}, \ldots, x_{k,N}\}$
$\text{max\_Y}_k = \max\{y_{k,1}, y_{k,2}, \ldots, y_{k,N}\}$
The set $\{(x, y) \mid \text{min\_X}_k \le x \le \text{max\_X}_k,\ \text{min\_Y}_k \le y \le \text{max\_Y}_k\}$ is the position of object K;
Boundary handling: if the region formed by the above set does not contain an integer number of CTUs (that is, its top, bottom, left, or right edge does not lie on a CTU boundary), rows/columns of pixels are padded outward on the top, bottom, left, and right as needed so that the region contains an integer number of CTUs;
If tiles are used for region partitioning, the above rectangular region can be used directly as an independent tile; if slices are used for region partitioning, the CTUs in each CTU row of the rectangular region form a separate slice;
7) Using the partitioned regions obtained from the image segmentation mask, together with the object states, the system layer skips the slices/tiles corresponding to objects in the static state and decodes only the slices/tiles corresponding to objects in the moving state;
8) Render each object at its corresponding position and refresh the display. Objects are rendered in the same order in which their slice/tile bitstreams appear in the overall bitstream.
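The position determination and boundary handling in step 6 can be sketched as follows. This is only an illustrative sketch, not the method of the application: the function names `bounding_box` and `align_to_ctu`, the inclusive rectangle layout, and the choice to clamp the expanded region to the picture size (so that a CTU at the picture edge may be partial) are all assumptions.

```python
from typing import List, Optional, Tuple

# (min_x, min_y, max_x, max_y), all inclusive; hypothetical layout
Rect = Tuple[int, int, int, int]


def bounding_box(mask: List[List[int]], value: int) -> Optional[Rect]:
    """Scan the mask row by row and return the bounding box of pixels equal to `value`."""
    xs, ys = [], []
    for y, row in enumerate(mask):
        for x, pixel in enumerate(row):
            if pixel == value:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # the object does not appear in the mask
    return min(xs), min(ys), max(xs), max(ys)


def align_to_ctu(rect: Rect, ctu: int, width: int, height: int) -> Rect:
    """Expand the box outward so each edge lies on a CTU boundary.

    Assumption: the result is clamped to the picture, so the last CTU
    row/column may be partial when the picture size is not a multiple of `ctu`.
    """
    min_x, min_y, max_x, max_y = rect
    left = (min_x // ctu) * ctu
    top = (min_y // ctu) * ctu
    right = min((max_x // ctu + 1) * ctu, width) - 1
    bottom = min((max_y // ctu + 1) * ctu, height) - 1
    return left, top, right, bottom
```

For example, with a CTU size of 4 in an 8x8 picture, a box spanning (3, 2) to (5, 4) straddles the CTU grid and is expanded to cover the whole picture.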
On the basis of FIG. 17, user interactivity can also be added; that is, a user at the decoder side can choose to have specific objects move while the other regions remain static.
FIG. 18 is a schematic structural diagram of a dynamic image encoding apparatus provided by an embodiment of this application. The encoding apparatus may be implemented in software, hardware, or a combination of the two as part or all of an encoder-side device, which may be the source apparatus shown in FIG. 1 or the cloud server shown in FIG. 2. Referring to FIG. 18, the apparatus includes a semantic segmentation module 1801, an image sequence extraction module 1802, a position indication information determination module 1803, and a first encoding module 1804.
The semantic segmentation module 1801 is configured to perform semantic segmentation on any frame of the dynamic image to obtain an image segmentation mask, where the dynamic image includes a plurality of objects and the image segmentation mask includes a plurality of image regions in one-to-one correspondence with the plurality of objects. For the detailed implementation, refer to the corresponding content in the foregoing embodiments; details are not repeated here. The semantic segmentation module 1801 corresponds to the semantic segmentation module 111 in FIG. 3.
The image sequence extraction module 1802 is configured to determine a moving image sequence based on the dynamic image, where each frame of the moving image sequence includes the image regions where one or more moving objects among the plurality of objects are located. For the detailed implementation, refer to the corresponding content in the foregoing embodiments; details are not repeated here. The image sequence extraction module 1802 corresponds to the image sequence extraction module 112 in FIG. 3.
The position indication information determination module 1803 is configured to determine position indication information based on the image segmentation mask, where the position indication information indicates the positions of the image regions where the one or more moving objects are located. For the detailed implementation, refer to the corresponding content in the foregoing embodiments; details are not repeated here. FIG. 3 does not show a module corresponding to the position indication information determination module 1803.
The first encoding module 1804 is configured to encode the moving image sequence and the position indication information into a bitstream. For the detailed implementation, refer to the corresponding content in the foregoing embodiments; details are not repeated here. The first encoding module 1804 corresponds to the position indication information encoding module 113 and the first video encoding module 115 in FIG. 3.
Optionally, the moving image sequence includes one or more sub-image sequences, and the position indication information is the image segmentation mask;
the image sequence extraction module 1802 includes:
an image sequence extraction submodule, configured to extract the one or more sub-image sequences based on the image segmentation mask and the dynamic image, where the one or more sub-image sequences are in one-to-one correspondence with the one or more moving objects.
Optionally, the moving image sequence includes one or more sub-image sequences, and the position indication information includes the coordinates of one or more designated positions;
the image sequence extraction module 1802 includes:
an image sequence extraction submodule, configured to extract the one or more sub-image sequences based on the image segmentation mask and the dynamic image, where the one or more sub-image sequences are in one-to-one correspondence with the one or more moving objects;
the position indication information determination module 1803 includes:
a position coordinate determination submodule, configured to determine, based on the image segmentation mask, the coordinates in the dynamic image of the designated position within the image region where each of the one or more moving objects is located.
Optionally, the image sequence extraction submodule includes:
a selection submodule, configured to select one moving object from the one or more moving objects and determine the sub-image sequence corresponding to the selected moving object by means of the following submodules, until the sub-image sequence corresponding to every moving object has been determined:
a position region determination submodule, configured to determine, based on the image segmentation mask, the position region where the selected moving object is located;
an image region extraction submodule, configured to extract, based on the position region, the image region where the selected moving object is located from every frame of the dynamic image except the first frame, to obtain the sub-image sequence corresponding to the selected moving object.
Optionally, the position region determination submodule is specifically configured to:
scan the pixels of the image segmentation mask to obtain a pixel coordinate set corresponding to the selected moving object, where the pixel coordinate set includes the coordinates of a plurality of pixels;
determine the position region formed by the pixel coordinate set as the position region where the selected moving object is located.
Optionally, the image region extraction submodule is specifically configured to:
extract the image region located within the position region from every frame of the dynamic image except the first frame;
or,
expand the position region so that the expanded position region is a rectangular region, and extract the image region located within the expanded position region from every frame of the dynamic image except the first frame.
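The second extraction option (a rectangular position region applied to every frame except the first) can be sketched as below. This is a minimal illustration under assumed data layouts: frames are row-major lists of pixel values, the rectangle is an inclusive `(min_x, min_y, max_x, max_y)` tuple, and the names `crop` and `extract_sub_sequence` are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]            # row-major pixel values (assumed layout)
Rect = Tuple[int, int, int, int]   # (min_x, min_y, max_x, max_y), inclusive


def crop(frame: Frame, rect: Rect) -> Frame:
    """Return the pixels of `frame` that lie inside the inclusive rectangle."""
    min_x, min_y, max_x, max_y = rect
    return [row[min_x:max_x + 1] for row in frame[min_y:max_y + 1]]


def extract_sub_sequence(frames: List[Frame], rect: Rect) -> List[Frame]:
    """Crop the moving object's region from every frame except the first.

    The first frame is skipped because it is coded separately as a still image.
    """
    return [crop(frame, rect) for frame in frames[1:]]
```

A dynamic image with T frames therefore yields a sub-image sequence of T - 1 crops per moving object, all of the same size.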
Optionally, the designated position is the position with the smallest coordinates or the position with the largest coordinates.
Optionally, the apparatus further includes:
a second encoding module, configured to encode the number of the one or more moving objects into the bitstream. FIG. 3 does not show a module corresponding to the second encoding module.
Optionally, the moving image sequence is the dynamic image, and the position indication information is the image segmentation mask.
Optionally, the apparatus further includes:
a segmented region determination module, configured to determine, based on the image segmentation mask, a plurality of segmented regions in one-to-one correspondence with the plurality of objects;
a region partitioning module, configured to partition every frame of the dynamic image except the first frame according to the plurality of segmented regions, to obtain a plurality of image regions;
an object state determination module, configured to determine the object state corresponding to each of the plurality of segmented regions, where the object state is either a static state or a moving state;
the first encoding module includes:
an image region encoding submodule, configured to encode the plurality of image regions into the bitstream;
the apparatus further includes:
a third encoding module, configured to encode the object state corresponding to each of the plurality of segmented regions into the bitstream.
FIG. 3 does not show modules corresponding to the segmented region determination module, the region partitioning module, the object state determination module, or the third encoding module. The image region encoding submodule corresponds to the first video encoding module 115 in FIG. 3.
Optionally, the segmented region determination module is specifically configured to:
determine, based on the image segmentation mask, the position region where each of the plurality of objects is located;
when the position region of any of the plurality of objects does not contain an integer number of coding tree units (CTUs), expand the boundary of that position region so that it contains an integer number of CTUs;
determine the position regions of the plurality of objects after the expansion as the plurality of segmented regions.
Optionally, the image region encoding submodule is specifically configured to:
encode each of the plurality of image regions into the bitstream as a separate coding block;
or,
encode the region formed by each row of CTUs within each of the plurality of image regions into the bitstream as a separate coding block;
where the position region of a referencing coding block lies within the position region of the coding block that it references.
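The slice option and the reference constraint can be illustrated with a small sketch. It assumes CTU-aligned inclusive rectangles, integer-pixel motion vectors, and no interpolation margin; a real encoder would also have to account for the extra samples read by fractional-pel interpolation filters. The names `ctu_row_slices` and `mv_stays_inside` are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (min_x, min_y, max_x, max_y), inclusive


def ctu_row_slices(region: Rect, ctu: int) -> List[Rect]:
    """Split a CTU-aligned region into one rectangle per CTU row (one slice each)."""
    min_x, min_y, max_x, max_y = region
    return [
        (min_x, y, max_x, min(y + ctu - 1, max_y))
        for y in range(min_y, max_y + 1, ctu)
    ]


def mv_stays_inside(block: Tuple[int, int, int, int],
                    mv: Tuple[int, int],
                    region: Rect) -> bool:
    """Check that the block displaced by `mv` lies fully inside `region`.

    `block` is (x, y, width, height); `mv` is (dx, dy) in whole pixels
    (assumption: fractional-pel motion is ignored in this sketch).
    """
    x, y, w, h = block
    dx, dy = mv
    rx0, ry0, rx1, ry1 = region
    return (rx0 <= x + dx and ry0 <= y + dy
            and x + dx + w - 1 <= rx1 and y + dy + h - 1 <= ry1)
```

An encoder following this constraint would discard candidate motion vectors for which `mv_stays_inside` is false during motion search, so that each region can be decoded without data from neighbouring regions.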
Optionally, the apparatus further includes:
a fourth encoding module, configured to encode the first frame of the dynamic image into the bitstream. The fourth encoding module corresponds to the image encoding module 114 in FIG. 3.
In a dynamic image, only the image regions where moving objects are located change; the image regions where static objects are located do not. Each frame of the moving image sequence includes the image regions where one or more moving objects among the plurality of objects are located, and the position indication information indicates the positions of those image regions. Encoding the moving image sequence and the position indication information into the bitstream is therefore sufficient for the dynamic image to be decoded later. The image regions where static objects are located need not be encoded into the bitstream, which improves coding efficiency.
It should be noted that the division into the functional modules described above is merely an example of how the dynamic image encoding apparatus provided in the foregoing embodiment encodes a dynamic image. In practice, the functions may be assigned to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or some of the functions described above. In addition, the dynamic image encoding apparatus provided in the foregoing embodiment and the dynamic image encoding method embodiments belong to the same concept; for the specific implementation, refer to the method embodiments, which are not repeated here.
FIG. 19 is a schematic structural diagram of a dynamic image decoding apparatus provided by an embodiment of this application. The decoding apparatus may be implemented in software, hardware, or a combination of the two as part or all of a decoder-side device, which may be the destination apparatus shown in FIG. 1 or the terminal device shown in FIG. 2. Referring to FIG. 19, the apparatus includes an image decoding module 1901, a first decoding module 1902, and an image synthesis module 1903.
The image decoding module 1901 is configured to parse the first frame from the bitstream. For the detailed implementation, refer to the corresponding content in the foregoing embodiments; details are not repeated here. The image decoding module 1901 corresponds to the image decoding module 212 in FIG. 4.
The first decoding module 1902 is configured to parse the moving image sequence and the position indication information from the bitstream, where each frame of the moving image sequence includes the image regions where one or more moving objects are located, and the position indication information indicates the positions of those image regions. For the detailed implementation, refer to the corresponding content in the foregoing embodiments; details are not repeated here. The first decoding module 1902 corresponds to the position indication information decoding module 211 and the first video decoding module 213 in FIG. 4.
The image synthesis module 1903 is configured to render and display, in the first frame, the image regions where the one or more moving objects are located, based on the moving image sequence and the position indication information, to obtain the dynamic image. For the detailed implementation, refer to the corresponding content in the foregoing embodiments; details are not repeated here. The image synthesis module 1903 corresponds to the image synthesis module 214 in FIG. 4.
Optionally, the moving image sequence includes one or more sub-image sequences, and the one or more sub-image sequences are in one-to-one correspondence with the one or more moving objects;
the position indication information is an image segmentation mask, where the image segmentation mask includes a plurality of image regions in one-to-one correspondence with a plurality of objects, and the plurality of objects include the one or more moving objects.
Optionally, the image synthesis module 1903 includes:
a selection submodule, configured to select one moving object from the one or more moving objects and render and display the image region where the selected moving object is located by means of the following submodules, until the image region of every moving object has been rendered and displayed:
a position determination submodule, configured to determine, based on the image segmentation mask, the position of the image region where the selected moving object is located;
a rendering and display submodule, configured to render and display, in the first frame and at the position of the image region where the selected moving object is located, the image regions included in the sub-image sequence corresponding to the selected moving object.
Optionally, the position determination submodule is specifically configured to:
scan the pixels of the image segmentation mask to obtain a pixel coordinate set corresponding to the selected moving object, where the pixel coordinate set includes the coordinates of a plurality of pixels;
determine the position region formed by the pixel coordinate set as the position of the image region where the selected moving object is located; or expand the position region formed by the pixel coordinate set so that the expanded position region is a rectangular region, and determine the expanded position region as the position of the image region where the selected moving object is located.
Optionally, the moving image sequence includes one or more sub-image sequences, and the one or more sub-image sequences are in one-to-one correspondence with the one or more moving objects;
the position indication information includes the coordinates in the dynamic image of the designated position within the image region where each of the one or more moving objects is located.
Optionally, the image synthesis module 1903 is specifically configured to:
select one moving object from the one or more moving objects and render and display the image region where the selected moving object is located according to the following operation, until the image region of every moving object has been rendered and displayed:
render and display, in the first frame and according to the coordinates in the dynamic image of the designated position within the image region where the selected moving object is located, the image regions included in the sub-image sequence corresponding to the selected moving object.
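The coordinate-based rendering described above can be sketched as pasting each decoded sub-image onto a copy of the first frame. The sketch assumes that the designated position is the smallest-coordinate (top-left) corner of each region, that all sub-image sequences have the same length, and that frames are row-major lists of pixel values; `paste` and `compose` are hypothetical names.

```python
from typing import List, Tuple

Frame = List[List[int]]  # row-major pixel values (assumed layout)


def paste(canvas: Frame, patch: Frame, top_left: Tuple[int, int]) -> None:
    """Overwrite the canvas pixels covered by `patch`, starting at `top_left`."""
    x0, y0 = top_left
    for dy, row in enumerate(patch):
        canvas[y0 + dy][x0:x0 + len(row)] = row


def compose(first_frame: Frame,
            sub_sequences: List[List[Frame]],
            positions: List[Tuple[int, int]]) -> List[Frame]:
    """Rebuild the displayed frames from the first frame and per-object crops.

    `sub_sequences[i]` holds the crops of moving object i and `positions[i]`
    its designated (top-left) coordinates in the dynamic image.
    """
    canvas = [row[:] for row in first_frame]      # static content stays as decoded
    frames = [[row[:] for row in canvas]]
    for patches in zip(*sub_sequences):           # one time step at a time
        for patch, position in zip(patches, positions):
            paste(canvas, patch, position)        # refresh only the moving regions
        frames.append([row[:] for row in canvas])
    return frames
```

Because the canvas persists across time steps, pixels outside the moving regions are never rewritten, which mirrors the power-saving argument made for the decoder.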
Optionally, the designated position is the position with the smallest coordinates or the position with the largest coordinates.
Optionally, the apparatus further includes:
a second decoding module, configured to parse the number of the one or more moving objects from the bitstream. For the detailed implementation, refer to the corresponding content in the foregoing embodiments; details are not repeated here. FIG. 4 does not show a module corresponding to the second decoding module.
Optionally, the moving image sequence is the dynamic image, and the position indication information is an image segmentation mask, where the image segmentation mask includes a plurality of image regions in one-to-one correspondence with a plurality of objects, and the plurality of objects include the one or more moving objects.
Optionally, the image synthesis module 1903 is specifically configured to:
select one moving object from the one or more moving objects and render and display the image region where the selected moving object is located according to the following operations, until the image region of every moving object has been rendered and displayed:
determine, based on the image segmentation mask, the position of the image region where the selected moving object is located;
extract, based on that position, the image region where the selected moving object is located from every frame of the dynamic image except the first frame;
render and display, in the first frame and at that position, the image region where the selected moving object is located in each frame of the dynamic image.
Optionally, the position indication information is an image segmentation mask, where the image segmentation mask includes a plurality of image regions in one-to-one correspondence with a plurality of objects, and the plurality of objects include the one or more moving objects;
the first decoding module 1902 includes:
a segmented region determination submodule, configured to determine, based on the image segmentation mask, a plurality of segmented regions in one-to-one correspondence with the plurality of objects;
an object state determination submodule, configured to parse from the bitstream the object state corresponding to each of the plurality of segmented regions, where the object state is either a static state or a moving state;
an image region decoding submodule, configured to parse from the bitstream, based on the object state corresponding to each of the plurality of segmented regions, the image regions partitioned by the segmented regions whose state is the moving state, to obtain the moving image sequence.
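The state-based skipping performed by the image region decoding submodule can be sketched as a filter over per-region sub-bitstreams. Everything here is illustrative: the `RegionStream` record, its fields, and the `decode` callback stand in for whatever container and video decoder an implementation actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Rect = Tuple[int, int, int, int]  # (min_x, min_y, max_x, max_y), inclusive


@dataclass
class RegionStream:
    """Hypothetical record for one segmented region's slice/tile sub-bitstream."""
    rect: Rect       # segmented region derived from the mask
    moving: bool     # object state parsed from the bitstream
    payload: bytes   # coded slice/tile data for this region


def decode_moving_regions(
    regions: List[RegionStream],
    decode: Callable[[bytes], object],
) -> List[Tuple[Rect, object]]:
    """Decode only the regions in the moving state, preserving bitstream order."""
    decoded = []
    for region in regions:
        if not region.moving:
            continue  # static region: its payload is never handed to the decoder
        decoded.append((region.rect, decode(region.payload)))
    return decoded
```

The returned list keeps the order of the input, so it can be rendered directly in the same order as the sub-bitstreams appear.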
Optionally, the segmented region determination submodule is specifically configured to:
determine, based on the image segmentation mask, the position region where each of the plurality of objects is located;
when the position region of any of the plurality of objects does not contain an integer number of CTUs, expand the boundary of that position region so that it contains an integer number of CTUs;
determine the position regions of the plurality of objects after the expansion as the plurality of segmented regions.
Optionally, the apparatus further includes:
an instruction receiving module, configured to receive an object selection instruction, where the object selection instruction is used to select one or more objects from the plurality of objects included in the dynamic image;
a moving object determination module, configured to determine the one or more objects selected by the object selection instruction as the one or more moving objects.
Optionally, the apparatus further includes:
a third decoding module, configured to parse from the bitstream the type of encoder used for encoding;
a decoder type determination module, configured to determine the corresponding decoder type according to the parsed encoder type.
FIG. 4 does not show modules corresponding to the instruction receiving module, the moving object determination module, the third decoding module, or the decoder type determination module.
In the dynamic image decoding method provided by the embodiments of this application, after the first frame is decoded, only the image regions where moving objects are located need to be decoded for subsequent frames; the image regions where static objects are located need not be decoded, which effectively reduces decoding complexity and power consumption. Moreover, while the dynamic image is displayed, only the image regions where moving objects are located need to be rendered and refreshed on top of the first frame, which effectively reduces display power consumption.
It should be noted that the division into the functional modules described above is merely an example of how the dynamic image decoding apparatus provided in the foregoing embodiment decodes a dynamic image. In practice, the functions may be assigned to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or some of the functions described above. In addition, the dynamic image decoding apparatus provided in the foregoing embodiment and the dynamic image decoding method embodiments belong to the same concept; for the specific implementation, refer to the method embodiments, which are not repeated here.
图20为用于本申请实施例的一种编解码装置2000的示意性框图。其中,编解码装置2000可以包括处理器2001、存储器2002和总线系统2003。其中,处理器2001和存储器2002通过总线系统2003相连,该存储器2002用于存储指令,该处理器2001用于执行该存储器2002存储的指令,以执行本申请实施例描述的各种动态图像的编码或解码方法。为避免重复,这里不再详细描述。FIG. 20 is a schematic block diagram of an encoding and decoding apparatus 2000 used in an embodiment of the present application. The encoding and decoding apparatus 2000 may include a processor 2001 , a memory 2002 and a bus system 2003 . The processor 2001 and the memory 2002 are connected through a bus system 2003, the memory 2002 is used to store instructions, and the processor 2001 is used to execute the instructions stored in the memory 2002, so as to execute the coding of various dynamic images described in the embodiments of this application or decoding method. To avoid repetition, detailed description is omitted here.
在本申请实施例中,该处理器2001可以是中央处理单元(central processing unit,CPU),该处理器2001还可以是其他通用处理器、DSP、ASIC、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。In this embodiment of the present application, the processor 2001 may be a central processing unit (central processing unit, CPU), and the processor 2001 may also be other general-purpose processors, DSPs, ASICs, FPGAs, or other programmable logic devices, discrete gates Or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
该存储器2002可以包括ROM设备或者RAM设备。任何其他适宜类型的存储设备也可以用作存储器2002。存储器2002可以包括由处理器2001使用总线2003访问的代码和数据20021。存储器2002可以进一步包括操作系统20023和应用程序20022,该应用程序20022包括允许处理器2001执行本申请实施例描述的动态图像的编码或解码方法的至少一个程序。例如,应用程序20022可以包括应用1至N,其进一步包括执行在本申请实施例描述的动态图像的编码或解码方法的动态图像编码或解码应用(简称动态图像编解码应用)。The memory 2002 may include a ROM device or a RAM device. Any other suitable type of storage device may also be used as memory 2002. Memory 2002 may include code and data 20021 accessed by processor 2001 using bus 2003 . The memory 2002 may further include an operating system 20023 and an application program 20022, where the application program 20022 includes at least one program that allows the processor 2001 to execute the dynamic image encoding or decoding method described in the embodiments of the present application. For example, the application 20022 may include applications 1 to N, which further include moving image encoding or decoding applications (referred to as moving image encoding and decoding applications) that execute the moving image encoding or decoding methods described in the embodiments of the present application.
该总线系统2003除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都标为总线系统2003。In addition to the data bus, the bus system 2003 may also include a power bus, a control bus, a status signal bus, and the like. However, for the sake of clarity, the various buses are labeled as bus system 2003 in the figure.
可选地,编解码装置2000还可以包括一个或多个输出设备,诸如显示器2004。在一个示例中,显示器2004可以是触感显示器,其将显示器与可操作地感测触摸输入的触感单元合并。显示器2004可以经由总线2003连接到处理器2001。Optionally, the codec apparatus 2000 may also include one or more output devices, such as a display 2004 . In one example, the display 2004 may be a touch-sensitive display that incorporates a display with a touch-sensitive unit operable to sense touch input. Display 2004 may be connected to processor 2001 via bus 2003 .
It should be noted that the codec apparatus 2000 may perform the dynamic image encoding method in the embodiments of this application, and may also perform the dynamic image decoding method in the embodiments of this application.
Those skilled in the art will appreciate that the functions described in connection with the various illustrative logical blocks, modules, and algorithm steps disclosed herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions described by the illustrative logical blocks, modules, and steps may be stored on or transmitted over a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which correspond to tangible media such as data storage media, or communication media, which include any medium that facilitates transfer of a computer program from one place to another (for example, according to a communication protocol). In this manner, computer-readable media generally may correspond to (1) non-transitory tangible computer-readable storage media, or (2) communication media such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementing the techniques described in this application. A computer program product may include a computer-readable medium.
By way of example and not limitation, such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. As used herein, disk and disc include compact disc (CD), laser disc, optical disc, DVD, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits. Accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein. In addition, in some aspects, the functions described by the illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Moreover, the techniques may be fully implemented in one or more circuits or logic elements. In one example, the various illustrative logical blocks, units, and modules in the encoder 100 and the decoder 200 may be understood as corresponding circuit devices or logic elements.
The techniques of the embodiments of this application may be implemented in a wide variety of apparatuses or devices, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chipset). Various components, modules, or units are described in the embodiments of this application to emphasize functional aspects of apparatuses for performing the disclosed techniques, but they do not necessarily need to be realized by different hardware units. Rather, as described above, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by a collection of interoperating hardware units (including one or more processors as described above).
That is, the foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)). It should be noted that the computer-readable storage medium mentioned in the embodiments of this application may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that "a plurality of" herein means two or more. In the description of the embodiments of this application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that both A and B exist, or that B exists alone. In addition, to describe the technical solutions of the embodiments of this application clearly, words such as "first" and "second" are used to distinguish between identical or similar items whose functions and effects are basically the same. Those skilled in the art will understand that the words "first" and "second" do not limit quantity or execution order, and that items so labeled are not necessarily different.
The foregoing describes the embodiments provided by this application and is not intended to limit this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (57)

  1. A method for encoding a dynamic image, wherein the method comprises:
    performing semantic segmentation on any frame of image in the dynamic image to obtain an image segmentation mask, wherein the dynamic image includes a plurality of objects, and the image segmentation mask includes a plurality of image regions in one-to-one correspondence with the plurality of objects;
    determining a moving image sequence based on the dynamic image, wherein each frame of image in the moving image sequence includes an image region in which one or more moving objects among the plurality of objects are located;
    determining position indication information based on the image segmentation mask, wherein the position indication information indicates the position of the image region in which the one or more moving objects are located; and
    encoding the moving image sequence and the position indication information into a code stream.
  2. The method according to claim 1, wherein the moving image sequence includes one or more sub-image sequences, and the position indication information is the image segmentation mask;
    the determining a moving image sequence based on the dynamic image comprises:
    extracting the one or more sub-image sequences based on the image segmentation mask and the dynamic image, wherein the one or more sub-image sequences are in one-to-one correspondence with the one or more moving objects.
  3. The method according to claim 1, wherein the moving image sequence includes one or more sub-image sequences, and the position indication information includes coordinates of one or more designated positions;
    the determining a moving image sequence based on the dynamic image comprises:
    extracting the one or more sub-image sequences based on the image segmentation mask and the dynamic image, wherein the one or more sub-image sequences are in one-to-one correspondence with the one or more moving objects;
    the determining position indication information based on the image segmentation mask comprises:
    determining, based on the image segmentation mask, the coordinates in the dynamic image of a designated position within the image region in which each of the one or more moving objects is located.
  4. The method according to claim 2 or 3, wherein the extracting the one or more sub-image sequences based on the image segmentation mask and the dynamic image comprises:
    selecting one moving object from the one or more moving objects, and determining the sub-image sequence corresponding to the selected moving object through the following operations, until the sub-image sequence corresponding to each moving object has been determined:
    determining, based on the image segmentation mask, the location area in which the selected moving object is located; and
    extracting, based on the location area, the image region in which the selected moving object is located from each frame of image in the dynamic image other than the first frame of image, to obtain the sub-image sequence corresponding to the selected moving object.
  5. The method according to claim 4, wherein the determining, based on the image segmentation mask, the location area in which the selected moving object is located comprises:
    scanning each pixel in the image segmentation mask to obtain a pixel coordinate set corresponding to the selected moving object, wherein the pixel coordinate set includes the coordinates of a plurality of pixels; and
    determining the location area formed by the pixel coordinate set as the location area in which the selected moving object is located.
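The mask scan recited in claim 5 can be illustrated with a minimal sketch. It assumes, purely for illustration, that the image segmentation mask is a 2-D list of integer object labels and that coordinates are (row, column) pairs; the function name `pixel_coordinates` is hypothetical and does not come from the application.

```python
def pixel_coordinates(mask, label):
    """Scan every pixel of the mask and collect the (row, col) coordinates
    of the pixels that carry the label of the selected moving object."""
    return [
        (r, c)
        for r, row in enumerate(mask)
        for c, value in enumerate(row)
        if value == label
    ]


# Toy 3x3 mask: label 0 is background, labels 1 and 2 are two objects.
mask = [
    [0, 0, 1],
    [0, 1, 1],
    [2, 2, 0],
]
coords_of_object_1 = pixel_coordinates(mask, 1)
```

The resulting coordinate set is what the claim treats as the location area of the selected object.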
  6. The method according to claim 4 or 5, wherein the extracting, based on the location area, the image region in which the selected moving object is located from each frame of image in the dynamic image other than the first frame of image comprises:
    extracting, from each frame of image in the dynamic image other than the first frame of image, the image region located within the location area;
    or,
    expanding the location area so that the expanded location area is a square area, and extracting, from each frame of image in the dynamic image other than the first frame of image, the image region located within the expanded location area.
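The second alternative of claim 6 can be sketched as follows. The sketch reads the expanded "square area" as the axis-aligned box enclosing the object's pixel coordinates, which is an interpretive assumption, and it represents frames as 2-D lists. The names `bounding_box` and `extract_sub_sequence` are hypothetical.

```python
def bounding_box(coords):
    """Expand a pixel coordinate set to its enclosing axis-aligned box.
    Returns (top, left, bottom, right), all inclusive."""
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def extract_sub_sequence(frames, box):
    """Crop the box from every frame except the first one, yielding the
    sub-image sequence of one moving object."""
    top, left, bottom, right = box
    return [
        [row[left:right + 1] for row in frame[top:bottom + 1]]
        for frame in frames[1:]
    ]


frames = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],          # first frame (not cropped)
    [[10, 11, 12], [13, 14, 15], [16, 17, 18]],
]
box = bounding_box([(0, 1), (1, 2)])
sub_sequence = extract_sub_sequence(frames, box)
```

Cropping a regular box rather than an arbitrary pixel set keeps each sub-image a plain 2-D array, which is the shape a conventional video encoder expects as input.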
  7. The method according to claim 3, wherein the designated position is the position with the smallest coordinates or the position with the largest coordinates.
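One way to read claim 7 is sketched below: "the position with the smallest coordinates" is taken as the component-wise minimum of the object's pixel coordinates (the top-left corner of its enclosing box), and "the largest" as the component-wise maximum. This reading is an assumption, as is the hypothetical name `designated_position`.

```python
def designated_position(coords, use_min=True):
    """Return the component-wise minimum (or maximum) of a set of
    (row, col) pixel coordinates as the designated position."""
    pick = min if use_min else max
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return pick(rows), pick(cols)


coords = [(4, 7), (5, 6), (6, 9)]
top_left = designated_position(coords)                      # min corner
bottom_right = designated_position(coords, use_min=False)   # max corner
```

Under this reading, one coordinate pair per moving object, together with the size of the decoded sub-images, is enough for a decoder to place each sub-image.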
  8. The method according to claim 3, wherein the method further comprises:
    encoding the number of the one or more moving objects into the code stream.
  9. The method according to claim 1, wherein the moving image sequence is the dynamic image, and the position indication information is the image segmentation mask.
  10. The method according to claim 9, wherein the method further comprises:
    determining, based on the image segmentation mask, a plurality of segmented regions in one-to-one correspondence with the plurality of objects;
    dividing, according to the plurality of segmented regions, each frame of image in the dynamic image other than the first frame of image into regions, to obtain a plurality of image regions;
    determining an object state corresponding to each of the plurality of segmented regions, wherein the object state is a static state or a motion state;
    the encoding the moving image sequence into a code stream comprises:
    encoding the plurality of image regions into the code stream;
    the method further comprises:
    encoding the object state corresponding to each of the plurality of segmented regions into the code stream.
  11. The method according to claim 10, wherein the determining, based on the image segmentation mask, a plurality of segmented regions in one-to-one correspondence with the plurality of objects comprises:
    determining, based on the image segmentation mask, the location area in which each of the plurality of objects is located;
    in a case where the location area in which any one of the plurality of objects is located does not contain an integer number of coding tree units (CTUs), expanding the boundary of the location area in which that object is located so that the location area contains an integer number of CTUs; and
    determining the location areas in which the plurality of objects are located after the expansion as the plurality of segmented regions.
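The boundary expansion in claim 11 can be sketched as rounding a rectangular location area outward to the CTU grid. The sketch assumes a rectangular area given as (top, left, bottom, right) with exclusive bottom and right edges, a CTU grid anchored at the picture origin, and a CTU size of 64 samples; none of these details is fixed by the claim, and `align_to_ctu` is a hypothetical name.

```python
def align_to_ctu(box, ctu_size=64):
    """Expand a (top, left, bottom, right) box, with bottom/right exclusive,
    so that every edge falls on a multiple of ctu_size. The box then covers
    a whole number of CTUs."""
    top, left, bottom, right = box
    aligned_top = (top // ctu_size) * ctu_size
    aligned_left = (left // ctu_size) * ctu_size
    # Ceiling division written with integers only.
    aligned_bottom = -(-bottom // ctu_size) * ctu_size
    aligned_right = -(-right // ctu_size) * ctu_size
    return aligned_top, aligned_left, aligned_bottom, aligned_right


expanded = align_to_ctu((10, 70, 100, 130))
```

A box that is already aligned is returned unchanged, so the function can be applied to every object without first checking whether expansion is needed. A real encoder would also need to handle pictures whose dimensions are not CTU multiples, which this sketch leaves out.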
  12. The method according to claim 11, wherein the encoding the plurality of image regions into the code stream comprises:
    encoding each of the plurality of image regions into the code stream as one coding block;
    or,
    encoding the region formed by each row of CTUs within each of the plurality of image regions into the code stream as one coding block;
    wherein the location area in which a referencing coding block is located lies within the location area in which the referenced coding block is located.
  13. The method according to any one of claims 1-12, wherein the method further comprises:
    encoding the first frame of image of the dynamic image into the code stream.
  14. A method for decoding a dynamic image, wherein the method comprises:
    parsing a first frame of image from a code stream;
    parsing a moving image sequence and position indication information from the code stream, wherein each frame of image in the moving image sequence includes an image region in which one or more moving objects are located, and the position indication information indicates the position of the image region in which the one or more moving objects are located; and
    rendering and displaying, in the first frame of image and based on the moving image sequence and the position indication information, the image region in which the one or more moving objects are located, to obtain a dynamic image.
  15. The method according to claim 14, wherein the moving image sequence includes one or more sub-image sequences, and the one or more sub-image sequences are in one-to-one correspondence with the one or more moving objects;
    the position indication information is an image segmentation mask, the image segmentation mask includes a plurality of image regions in one-to-one correspondence with a plurality of objects, and the plurality of objects include the one or more moving objects.
  16. The method according to claim 15, wherein the rendering and displaying, in the first frame of image and based on the moving image sequence and the position indication information, the image region in which the one or more moving objects are located comprises:
    selecting one moving object from the one or more moving objects, and rendering and displaying the image region in which the selected moving object is located through the following operations, until the image region in which each moving object is located has been rendered and displayed:
    determining, based on the image segmentation mask, the position of the image region in which the selected moving object is located; and
    rendering and displaying, in the first frame of image and according to the position of the image region in which the selected moving object is located, the image regions included in the sub-image sequence corresponding to the selected moving object.
  17. The method according to claim 16, wherein the determining, based on the image segmentation mask, the position of the image region in which the selected moving object is located comprises:
    scanning each pixel in the image segmentation mask to obtain a pixel coordinate set corresponding to the selected moving object, wherein the pixel coordinate set includes the coordinates of a plurality of pixels; and
    determining the location area formed by the pixel coordinate set as the position of the image region in which the selected moving object is located, or expanding the location area formed by the pixel coordinate set so that the expanded location area is a square area, and determining the expanded location area as the position of the image region in which the selected moving object is located.
  18. The method according to claim 14, wherein the moving image sequence includes one or more sub-image sequences, and the one or more sub-image sequences are in one-to-one correspondence with the one or more moving objects;
    the position indication information includes the coordinates in the dynamic image of a designated position within the image region in which each of the one or more moving objects is located.
  19. The method according to claim 18, wherein the rendering and displaying, in the first frame of image and based on the moving image sequence and the position indication information, the image region in which the one or more moving objects are located comprises:
    selecting one moving object from the one or more moving objects, and rendering and displaying the image region in which the selected moving object is located through the following operation, until the image region in which each moving object is located has been rendered and displayed:
    rendering and displaying, in the first frame of image and according to the coordinates in the dynamic image of the designated position within the image region in which the selected moving object is located, the image regions included in the sub-image sequence corresponding to the selected moving object.
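The decoder-side rendering of claim 19 can be sketched as pasting each decoded sub-image onto a copy of the first frame at the designated position. The sketch assumes that the designated position is the top-left corner of each sub-image as a (row, column) pair, that all sub-image sequences have the same length, and that frames are 2-D lists; the name `render_frames` is hypothetical, and including the first frame itself in the output is a choice made for the sketch.

```python
def render_frames(first_frame, sub_sequences, positions):
    """Rebuild the dynamic image. sub_sequences[i] is the list of patches
    of moving object i; positions[i] is the (top, left) corner at which
    those patches are pasted onto the first frame."""
    frame_count = len(sub_sequences[0]) if sub_sequences else 0
    frames = [first_frame]
    for t in range(frame_count):
        canvas = [row[:] for row in first_frame]  # static background copy
        for patches, (top, left) in zip(sub_sequences, positions):
            for dr, patch_row in enumerate(patches[t]):
                canvas[top + dr][left:left + len(patch_row)] = patch_row
        frames.append(canvas)
    return frames


first = [[0, 0, 0, 0] for _ in range(3)]
one_object = [[[[1, 1]], [[2, 2]]]]  # one object, two 1x2 patches
rebuilt = render_frames(first, one_object, [(1, 2)])
```

Only the moving regions change from frame to frame; every other pixel is taken from the first frame, which is why the scheme needs to carry only the first frame, the sub-image sequences, and their positions.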
  20. The method according to claim 18 or 19, wherein the designated position is the position with the smallest coordinates or the position with the largest coordinates.
  21. The method according to any one of claims 18-20, wherein the method further comprises:
    parsing the number of the one or more moving objects from the code stream.
  22. The method according to claim 14, wherein the moving image sequence is the dynamic image, the position indication information is an image segmentation mask, the image segmentation mask includes a plurality of image regions in one-to-one correspondence with a plurality of objects, and the plurality of objects include the one or more moving objects.
  23. The method according to claim 22, wherein the rendering and displaying, in the first frame of image and based on the moving image sequence and the position indication information, the image region in which the one or more moving objects are located comprises:
    selecting one moving object from the one or more moving objects, and rendering and displaying the image region in which the selected moving object is located through the following operations, until the image region in which each moving object is located has been rendered and displayed:
    determining, based on the image segmentation mask, the position of the image region in which the selected moving object is located;
    extracting, based on the position of the image region in which the selected moving object is located, the image region in which the selected moving object is located from each frame of image in the dynamic image other than the first frame of image; and
    rendering and displaying, in the first frame of image and according to the position of the image region in which the selected moving object is located, the image region in which the selected moving object is located in each frame of image of the dynamic image.
  24. The method according to claim 14, wherein the position indication information is an image segmentation mask, the image segmentation mask includes a plurality of image regions in one-to-one correspondence with a plurality of objects, and the plurality of objects include the one or more moving objects;
    the parsing a moving image sequence from the code stream comprises:
    determining, based on the image segmentation mask, a plurality of segmented regions in one-to-one correspondence with the plurality of objects;
    parsing, from the code stream, an object state corresponding to each of the plurality of segmented regions, wherein the object state is a static state or a motion state; and
    parsing from the code stream, based on the object state corresponding to each of the plurality of segmented regions, the image regions delimited by the segmented regions whose object state is the motion state, to obtain the moving image sequence.
  25. The method according to claim 24, wherein the determining, based on the image segmentation mask, a plurality of segmented regions in one-to-one correspondence with the plurality of objects comprises:
    determining, based on the image segmentation mask, the location area in which each of the plurality of objects is located;
    in a case where the location area in which any one of the plurality of objects is located does not contain an integer number of CTUs, expanding the boundary of the location area in which that object is located so that the location area contains an integer number of CTUs; and
    determining the location areas in which the plurality of objects are located after the expansion as the plurality of segmented regions.
  26. The method according to any one of claims 14-20 and 22-25, wherein before the parsing a moving image sequence and position indication information from the code stream, the method further comprises:
    receiving an object selection instruction, wherein the object selection instruction is used to select one or more objects from a plurality of objects included in the dynamic image; and
    determining the one or more objects selected by the object selection instruction as the one or more moving objects.
  27. The method according to any one of claims 14-26, wherein the method further comprises:
    parsing, from the code stream, the type of the encoder used for encoding; and
    determining the corresponding decoder type according to the parsed encoder type.
  28. An apparatus for encoding a dynamic image, wherein the apparatus comprises:
    a semantic segmentation module, configured to perform semantic segmentation on any frame of image in the dynamic image to obtain an image segmentation mask, wherein the dynamic image includes a plurality of objects, and the image segmentation mask includes a plurality of image regions in one-to-one correspondence with the plurality of objects;
    an image sequence extraction module, configured to determine a moving image sequence based on the dynamic image, wherein each frame of image in the moving image sequence includes an image region in which one or more moving objects among the plurality of objects are located;
    a position indication information determination module, configured to determine position indication information based on the image segmentation mask, wherein the position indication information indicates the position of the image region in which the one or more moving objects are located; and
    a first encoding module, configured to encode the moving image sequence and the position indication information into a code stream.
  29. The apparatus according to claim 28, wherein the moving image sequence includes one or more sub-image sequences, and the position indication information is the image segmentation mask;
    the image sequence extraction module comprises:
    an image sequence extraction sub-module, configured to extract the one or more sub-image sequences based on the image segmentation mask and the dynamic image, wherein the one or more sub-image sequences are in one-to-one correspondence with the one or more moving objects.
  30. The apparatus according to claim 28, wherein the moving image sequence includes one or more sub-image sequences, and the position indication information includes coordinates of one or more designated positions;
    the image sequence extraction module comprises:
    an image sequence extraction sub-module, configured to extract the one or more sub-image sequences based on the image segmentation mask and the dynamic image, wherein the one or more sub-image sequences are in one-to-one correspondence with the one or more moving objects;
    the position indication information determination module comprises:
    a position coordinate determination sub-module, configured to determine, based on the image segmentation mask, the coordinates in the dynamic image of a designated position within the image region in which each of the one or more moving objects is located.
  31. The apparatus according to claim 29 or 30, wherein the image sequence extraction sub-module includes:
    a selection sub-module, configured to select one moving object from the one or more moving objects, the sub-image sequence corresponding to the selected moving object being determined by the following sub-modules until the sub-image sequence corresponding to each moving object has been determined:
    a location region determination sub-module, configured to determine, based on the image segmentation mask, the location region in which the selected moving object is located;
    an image region extraction sub-module, configured to extract, based on the location region, the image region in which the selected moving object is located from each frame of the dynamic image other than the first frame, to obtain the sub-image sequence corresponding to the selected moving object.
  32. The apparatus according to claim 31, wherein the location region determination sub-module is specifically configured to:
    scan each pixel in the image segmentation mask to obtain a pixel coordinate set corresponding to the selected moving object, the pixel coordinate set including the coordinates of a plurality of pixels;
    determine the location region formed by the pixel coordinate set as the location region in which the selected moving object is located.
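The mask scan recited in claim 32 can be sketched as follows. This is a minimal illustration only: it assumes the image segmentation mask is a 2-D list of integer labels with one label per object, which the claim does not specify, and the function name `pixel_coordinates` is hypothetical.

```python
def pixel_coordinates(mask, label):
    """Scan every pixel of the mask and collect the (x, y) coordinates
    of the pixels that carry the given object label."""
    coords = []
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value == label:
                coords.append((x, y))
    return coords


# Assumed mask layout: 0 = background, 1 and 2 = two segmented objects.
mask = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [0, 2, 2, 0],
]
coords = pixel_coordinates(mask, 2)
print(coords)  # [(1, 1), (1, 2), (2, 2)]
```

The returned coordinate set plays the role of the location region of the selected moving object.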
  33. The apparatus according to claim 31 or 32, wherein the image region extraction sub-module is specifically configured to:
    extract, from each frame of the dynamic image other than the first frame, the image region located within the location region;
    or,
    expand the location region so that the expanded location region is a rectangular region, and extract, from each frame of the dynamic image other than the first frame, the image region located within the expanded location region.
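The second alternative of claim 33 can be sketched as below. The sketch assumes that expanding the location region means taking the axis-aligned bounding rectangle of the pixel coordinate set, and that each frame is a 2-D list of pixel values; both are assumptions, and the names `bounding_box` and `extract_sub_sequence` are hypothetical.

```python
def bounding_box(coords):
    """Return the inclusive bounding rectangle (x0, y0, x1, y1) of a pixel set."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def extract_sub_sequence(frames, box):
    """Crop the rectangle from every frame except the first one."""
    x0, y0, x1, y1 = box
    return [
        [row[x0:x1 + 1] for row in frame[y0:y1 + 1]]
        for frame in frames[1:]
    ]


# Three synthetic 4x3 frames; pixel value encodes frame index, row and column.
frames = [
    [[f * 100 + y * 10 + x for x in range(4)] for y in range(3)]
    for f in range(3)
]
box = bounding_box([(1, 1), (1, 2), (2, 2)])
sub_sequence = extract_sub_sequence(frames, box)
print(box)           # (1, 1, 2, 2)
print(sub_sequence)  # [[[111, 112], [121, 122]], [[211, 212], [221, 222]]]
```

The first frame is skipped because, per claim 40, it is encoded into the bitstream separately as a complete picture.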
  34. The apparatus according to claim 30, wherein the specified position is the position with the smallest coordinates or the position with the largest coordinates.
  35. The apparatus according to claim 30, wherein the apparatus further includes:
    a second encoding module, configured to encode the number of the one or more moving objects into the bitstream.
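To make the side information of claims 30 and 35 concrete, the sketch below packs the number of moving objects followed by one specified-position coordinate per object, and reads it back. The byte layout (one unsigned byte for the count, two big-endian 16-bit values per coordinate) is purely hypothetical; the claims do not define any syntax for these fields.

```python
import struct


def pack_positions(positions):
    """Serialize the object count, then one (x, y) pair per moving object.
    Hypothetical layout: count as uint8, each coordinate as big-endian uint16."""
    payload = struct.pack(">B", len(positions))
    for x, y in positions:
        payload += struct.pack(">HH", x, y)
    return payload


def unpack_positions(payload):
    """Parse the count first, then exactly that many (x, y) pairs."""
    (count,) = struct.unpack_from(">B", payload, 0)
    return [struct.unpack_from(">HH", payload, 1 + 4 * i) for i in range(count)]


positions = [(1, 1), (300, 40)]
payload = pack_positions(positions)
print(len(payload))               # 9
print(unpack_positions(payload))  # [(1, 1), (300, 40)]
```

Carrying the count lets a parser know how many coordinate pairs and sub-image sequences to expect before it reads them.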
  36. The apparatus according to claim 28, wherein the moving image sequence is the dynamic image, and the position indication information is the image segmentation mask.
  37. The apparatus according to claim 36, wherein the apparatus further includes:
    a segmentation region determination module, configured to determine, based on the image segmentation mask, a plurality of segmentation regions in one-to-one correspondence with the plurality of objects;
    a region division module, configured to divide each frame of the dynamic image other than the first frame into regions according to the plurality of segmentation regions, to obtain a plurality of image regions;
    an object state determination module, configured to determine the object state corresponding to each of the plurality of segmentation regions, the object state being a static state or a moving state;
    the first encoding module includes:
    an image region encoding sub-module, configured to encode the plurality of image regions into the bitstream;
    the apparatus further includes:
    a third encoding module, configured to encode the object state corresponding to each of the plurality of segmentation regions into the bitstream.
  38. The apparatus according to claim 37, wherein the segmentation region determination module is specifically configured to:
    determine, based on the image segmentation mask, the location region in which each of the plurality of objects is located;
    when the location region in which any one of the plurality of objects is located does not contain an integer number of coding tree units (CTUs), expand the boundary of that location region so that it contains an integer number of CTUs;
    determine the location regions in which the plurality of objects are located after the expansion as the plurality of segmentation regions.
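The boundary expansion of claim 38 can be sketched as snapping a rectangle outward to the CTU grid. The sketch assumes a rectangular location region, a CTU grid anchored at the picture origin, and a CTU size of 64 pixels; none of these is fixed by the claim, and `align_to_ctu` is a hypothetical name. Partial CTUs at the right and bottom picture edges are handled by clamping to the picture size.

```python
def align_to_ctu(box, width, height, ctu=64):
    """Expand an inclusive box (x0, y0, x1, y1) outward so that it covers
    whole CTUs, clamped to the picture dimensions."""
    x0, y0, x1, y1 = box
    ax0 = (x0 // ctu) * ctu                        # snap left edge down
    ay0 = (y0 // ctu) * ctu                        # snap top edge down
    ax1 = min((x1 // ctu + 1) * ctu, width) - 1    # snap right edge up
    ay1 = min((y1 // ctu + 1) * ctu, height) - 1   # snap bottom edge up
    return ax0, ay0, ax1, ay1


aligned = align_to_ctu((70, 10, 130, 100), width=256, height=128)
print(aligned)  # (64, 0, 191, 127)
```

A box that is already CTU-aligned is returned unchanged, which matches the conditional wording of the claim.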
  39. The apparatus according to claim 38, wherein the image region encoding sub-module is specifically configured to:
    encode each of the plurality of image regions into the bitstream as one coding block;
    or,
    encode the region formed by each row of CTUs within each of the plurality of image regions into the bitstream as one coding block;
    wherein the location region of a referencing coding block lies within the location region of the coding block it references.
  40. The apparatus according to any one of claims 28-39, wherein the apparatus further includes:
    a fourth encoding module, configured to encode the first frame of the dynamic image into the bitstream.
  41. A dynamic image decoding apparatus, wherein the apparatus includes:
    an image decoding module, configured to parse a first frame from a bitstream;
    a first decoding module, configured to parse a moving image sequence and position indication information from the bitstream, wherein each frame in the moving image sequence includes the image region in which one or more moving objects are located, and the position indication information indicates the position of the image region in which the one or more moving objects are located;
    an image synthesis module, configured to render and display, in the first frame, the image region in which the one or more moving objects are located, based on the moving image sequence and the position indication information, to obtain a dynamic image.
  42. The apparatus according to claim 41, wherein the moving image sequence includes one or more sub-image sequences, the one or more sub-image sequences being in one-to-one correspondence with the one or more moving objects;
    the position indication information is an image segmentation mask, the image segmentation mask includes a plurality of image regions in one-to-one correspondence with a plurality of objects, and the plurality of objects include the one or more moving objects.
  43. The apparatus according to claim 42, wherein the image synthesis module includes:
    a selection sub-module, configured to select one moving object from the one or more moving objects, the image region in which the selected moving object is located being rendered and displayed by the following sub-modules until the image region in which each moving object is located has been rendered and displayed:
    a position determination sub-module, configured to determine, based on the image segmentation mask, the position of the image region in which the selected moving object is located;
    a rendering and display sub-module, configured to render and display, in the first frame and according to the position of the image region in which the selected moving object is located, the image regions included in the sub-image sequence corresponding to the selected moving object.
  44. The apparatus according to claim 43, wherein the position determination sub-module is specifically configured to:
    scan each pixel in the image segmentation mask to obtain a pixel coordinate set corresponding to the selected moving object, the pixel coordinate set including the coordinates of a plurality of pixels;
    determine the location region formed by the pixel coordinate set as the position of the image region in which the selected moving object is located; or expand the location region formed by the pixel coordinate set so that the expanded location region is a rectangular region, and determine the expanded location region as the position of the image region in which the selected moving object is located.
  45. The apparatus according to claim 41, wherein the moving image sequence includes one or more sub-image sequences, the one or more sub-image sequences being in one-to-one correspondence with the one or more moving objects;
    the position indication information includes the coordinates in the dynamic image of a specified position within the image region in which each of the one or more moving objects is located.
  46. The apparatus according to claim 45, wherein the image synthesis module is specifically configured to:
    select one moving object from the one or more moving objects, and render and display the image region in which the selected moving object is located according to the following operation, until the image region in which each moving object is located has been rendered and displayed:
    render and display, in the first frame and according to the coordinates in the dynamic image of the specified position within the image region in which the selected moving object is located, the image regions included in the sub-image sequence corresponding to the selected moving object.
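The rendering step of claim 46 can be illustrated by pasting each decoded sub-image onto a copy of the first frame. The sketch assumes the specified position is the smallest-coordinate (top-left) corner of the region, which is one of the two options in claim 47, and that frames are 2-D lists of pixel values; the names `composite` and `reconstruct` are hypothetical.

```python
def composite(first_frame, sub_frame, top_left):
    """Paste one sub-image onto a copy of the first frame at top_left = (x, y)."""
    x0, y0 = top_left
    out = [row[:] for row in first_frame]  # leave the first frame untouched
    for dy, row in enumerate(sub_frame):
        for dx, value in enumerate(row):
            out[y0 + dy][x0 + dx] = value
    return out


def reconstruct(first_frame, sub_sequence, top_left):
    """Rebuild the dynamic image: the first frame, then one composite per sub-image."""
    return [first_frame] + [composite(first_frame, s, top_left) for s in sub_sequence]


first_frame = [[0] * 4 for _ in range(3)]
sub_sequence = [[[7, 8], [9, 1]]]
frames = reconstruct(first_frame, sub_sequence, top_left=(1, 1))
print(frames[1])  # [[0, 0, 0, 0], [0, 7, 8, 0], [0, 9, 1, 0]]
```

With several moving objects, the same paste would be repeated per object on the same output frame before display.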
  47. The apparatus according to claim 45 or 46, wherein the specified position is the position with the smallest coordinates or the position with the largest coordinates.
  48. The apparatus according to any one of claims 45-47, wherein the apparatus further includes:
    a second decoding module, configured to parse the number of the one or more moving objects from the bitstream.
  49. The apparatus according to claim 41, wherein the moving image sequence is the dynamic image, the position indication information is an image segmentation mask, the image segmentation mask includes a plurality of image regions in one-to-one correspondence with a plurality of objects, and the plurality of objects include the one or more moving objects.
  50. The apparatus according to claim 49, wherein the image synthesis module is specifically configured to:
    select one moving object from the one or more moving objects, and render and display the image region in which the selected moving object is located according to the following operations, until the image region in which each moving object is located has been rendered and displayed:
    determine, based on the image segmentation mask, the position of the image region in which the selected moving object is located;
    extract, based on the position of the image region in which the selected moving object is located, the image region in which the selected moving object is located from each frame of the dynamic image other than the first frame;
    render and display, in the first frame and according to the position of the image region in which the selected moving object is located, the image region in which the selected moving object is located in each frame of the dynamic image.
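When the whole dynamic image is transmitted, as in claims 49 and 50, the decoder can use the mask to copy only the selected object's pixels from each later frame onto the first frame. The sketch below assumes a label-valued 2-D mask and pixel-exact copying without rectangular expansion; `render_selected` is a hypothetical name.

```python
def render_selected(first_frame, later_frames, mask, label):
    """For each later frame, copy only the pixels belonging to the selected
    object (mask == label) onto a copy of the first frame."""
    coords = [
        (x, y)
        for y, row in enumerate(mask)
        for x, value in enumerate(row)
        if value == label
    ]
    rendered = []
    for frame in later_frames:
        out = [row[:] for row in first_frame]
        for x, y in coords:
            out[y][x] = frame[y][x]
        rendered.append(out)
    return rendered


first_frame = [[0, 0], [0, 0]]
later_frames = [[[5, 6], [7, 8]]]
mask = [[1, 2], [1, 1]]
print(render_selected(first_frame, later_frames, mask, label=2))  # [[[0, 6], [0, 0]]]
```

Everything outside the selected region stays frozen at its first-frame value, which is what keeps unselected objects still.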
  51. The apparatus according to claim 41, wherein the position indication information is an image segmentation mask, the image segmentation mask includes a plurality of image regions in one-to-one correspondence with a plurality of objects, and the plurality of objects include the one or more moving objects;
    the first decoding module includes:
    a segmentation region determination sub-module, configured to determine, based on the image segmentation mask, a plurality of segmentation regions in one-to-one correspondence with the plurality of objects;
    an object state determination sub-module, configured to parse, from the bitstream, the object state corresponding to each of the plurality of segmentation regions, the object state being a static state or a moving state;
    an image region decoding sub-module, configured to parse from the bitstream, based on the object state corresponding to each of the plurality of segmentation regions, the image regions delimited by the segmentation regions whose object state is the moving state, to obtain the moving image sequence.
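The state-driven selection in claim 51 amounts to decoding only the regions flagged as moving. The sketch below assumes one coded payload per segmentation region and a caller-supplied decode function; the state strings, the payload values, and the name `decode_moving_regions` are all hypothetical stand-ins for whatever the actual bitstream carries.

```python
def decode_moving_regions(payloads, states, decode):
    """Decode only the regions whose parsed object state is 'moving'.
    Returns a dict mapping region index to its decoded content."""
    return {
        index: decode(payload)
        for index, (payload, state) in enumerate(zip(payloads, states))
        if state == "moving"
    }


# Stand-in payloads and a trivial decode function, for illustration only.
payloads = [b"sky", b"water", b"tree"]
states = ["static", "moving", "static"]
decoded = decode_moving_regions(payloads, states, decode=bytes.upper)
print(decoded)  # {1: b'WATER'}
```

Static regions are never decoded from later frames, since their content is taken from the first frame.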
  52. The apparatus according to claim 51, wherein the segmentation region determination sub-module is specifically configured to:
    determine, based on the image segmentation mask, the location region in which each of the plurality of objects is located;
    when the location region in which any one of the plurality of objects is located does not contain an integer number of CTUs, expand the boundary of that location region so that it contains an integer number of CTUs;
    determine the location regions in which the plurality of objects are located after the expansion as the plurality of segmentation regions.
  53. The apparatus according to any one of claims 41-47 and 49-52, wherein the apparatus further includes:
    an instruction receiving module, configured to receive an object selection instruction, the object selection instruction being used to select one or more objects from the plurality of objects included in the dynamic image;
    a moving object determination module, configured to determine the one or more objects selected by the object selection instruction as the one or more moving objects.
  54. The apparatus according to any one of claims 41-53, wherein the apparatus further includes:
    a third decoding module, configured to parse, from the bitstream, the type of the encoder used for encoding;
    a decoder type determination module, configured to determine the corresponding decoder type according to the parsed encoder type.
  55. An encoder-side device, wherein the encoder-side device includes a memory and a processor;
    the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, to implement the dynamic image encoding method according to any one of claims 1-13.
  56. A decoder-side device, wherein the decoder-side device includes a memory and a processor;
    the memory is configured to store a computer program, and the processor is configured to execute the computer program stored in the memory, to implement the dynamic image decoding method according to any one of claims 14-27.
  57. A computer-readable storage medium, wherein instructions are stored in the storage medium, and when the instructions are run on a computer, the computer is caused to perform the steps of the method according to any one of claims 1-27.
PCT/CN2022/086880 2021-04-19 2022-04-14 Dynamic image encoding and decoding methods, apparatus and device and storage medium WO2022222842A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110421196.1 2021-04-19
CN202110421196.1A CN115225901A (en) 2021-04-19 2021-04-19 Method, device, equipment and storage medium for encoding and decoding dynamic image

Publications (1)

Publication Number Publication Date
WO2022222842A1 (en) 2022-10-27

Family

ID=83604064

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/086880 WO2022222842A1 (en) 2021-04-19 2022-04-14 Dynamic image encoding and decoding methods, apparatus and device and storage medium

Country Status (2)

Country Link
CN (1) CN115225901A (en)
WO (1) WO2022222842A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861330A (en) * 2023-03-03 2023-03-28 深圳市小辉智驾智能有限公司 Camera image data transmission method, camera image data identification method and camera image data identification device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030194148A1 (en) * 2000-04-28 2003-10-16 Paul Haeberli System and method of cropping an image
CN103997687A (en) * 2013-02-20 2014-08-20 英特尔公司 Techniques for adding interactive features to videos
WO2017176349A1 (en) * 2016-04-07 2017-10-12 Intel Corporation Automatic cinemagraph

Also Published As

Publication number Publication date
CN115225901A (en) 2022-10-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22790950

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22790950

Country of ref document: EP

Kind code of ref document: A1