WO2021244182A1 - Video encoding method, video decoding method and related devices - Google Patents

Video encoding method, video decoding method and related devices

Info

Publication number
WO2021244182A1
WO2021244182A1 PCT/CN2021/089583
Authority
WO
WIPO (PCT)
Prior art keywords
current
pixels
string
block
reference string
Prior art date
Application number
PCT/CN2021/089583
Other languages
English (en)
French (fr)
Inventor
王英彬
许晓中
刘杉
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority to EP21817776.4A priority Critical patent/EP4030756A4/en
Publication of WO2021244182A1 publication Critical patent/WO2021244182A1/zh
Priority to US17/706,951 priority patent/US12063353B2/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103Selection of coding mode or of prediction mode
    • H04N19/105Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/436Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/186Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a colour or a chrominance component
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/43Hardware specially adapted for motion estimation or compensation
    • H04N19/433Hardware specially adapted for motion estimation or compensation characterised by techniques for memory access
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Definitions

  • The present disclosure relates to the technical field of video coding and decoding, and in particular to video encoding and decoding.
  • The Internet is about to enter a new 5G (fifth-generation mobile communication technology) era, and the images and videos appearing in various Internet applications have become the main consumers of Internet bandwidth.
  • The image traffic of the mobile Internet is increasing day by day and will grow explosively in the 5G era, which will inject a new and powerful driving force into the accelerated development of image coding and decoding technology.
  • At the same time, it also poses many new and severe challenges that image coding and decoding technology has not encountered before.
  • In the 5G era of the interconnection of all things, the new Internet images produced in various emerging applications are diverse and differentiated. It has therefore become an urgent need to study efficient image coding and decoding technologies suited to the diverse and differentiated characteristics of new Internet images.
  • video data is usually compressed before being transmitted over modern telecommunications networks.
  • Before transmission, video compression devices usually use software and/or hardware on the source side to encode the video data, thereby reducing the amount of data required to represent digital video images.
  • the compressed data is then received at the destination by the video decompression device, which decodes the video data.
  • The string prediction scheme in the related art is also called intra-frame string copy (ISC) technology or string matching technology.
  • the embodiments of the present disclosure provide a video encoding method, a video decoding method, an electronic device, and a computer-readable storage medium, which can simplify the hardware implementation of string prediction.
  • an embodiment of the present disclosure provides a video encoding method.
  • The method includes: acquiring a current image, where the current image includes maximum coding units, the maximum coding units include a current maximum coding unit and an encoded maximum coding unit, the current maximum coding unit includes a current coding block, and the current coding block includes a current string; using a first part of a storage space of size M*W to store pixels in the current coding block, and using a second part of the storage space to store at least part of the encoded blocks in the encoded maximum coding unit and the current maximum coding unit, where M is a positive integer greater than or equal to W; and searching the second part of the storage space for a reference string of the current string, so as to obtain a predicted value of the current string according to the reference string and encode the current string.
  • An embodiment of the present disclosure provides a video decoding method. The method includes: obtaining a code stream of a current image, where the code stream includes maximum coding units, the maximum coding units include a current maximum coding unit and a coded maximum coding unit, the current maximum coding unit includes a current decoding block, and the current decoding block includes a current string; using a first part of a storage space of size M*W to store pixels in the current decoding block, and using a second part of the storage space to store at least part of the decoded blocks in the coded maximum coding unit and the current maximum coding unit, where M is a positive integer greater than or equal to W; and searching the second part of the storage space for a reference string of the current string, so as to obtain a predicted value of the current string according to the reference string and decode the current string.
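As a rough illustration of the constraint described above, the following hypothetical Python sketch models the second part of the storage space as the set of already-reconstructed pixel positions outside the current block, and admits only reference strings that lie entirely inside it. All function and variable names are illustrative, not taken from the patent.

```python
# Hypothetical sketch: the reference-string search is only allowed inside the
# "second part" of the M*W reconstruction buffer, modelled here as the set of
# already-coded pixel positions outside the current coding block.
def is_valid_reference(pos, current_block, coded_positions):
    """pos: (x, y) candidate reference pixel.
    current_block: set of (x, y) positions of the current coding block.
    coded_positions: set of (x, y) positions already reconstructed and kept
    in the second part of the storage space."""
    return pos in coded_positions and pos not in current_block

def reference_string_allowed(start, length, current_block, coded_positions):
    """A horizontal reference string is allowed only if every one of its
    pixels satisfies is_valid_reference."""
    x0, y0 = start
    return all(is_valid_reference((x0 + i, y0), current_block, coded_positions)
               for i in range(length))
```

An encoder following this constraint would simply skip any candidate string vector for which `reference_string_allowed` is false.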
  • the embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored, and the program is executed by a processor to implement the video encoding method or the video decoding method as described in the foregoing aspect.
  • an embodiment of the present disclosure provides an electronic device, including: at least one processor; a storage device configured to store at least one program, and when the at least one program is executed by the at least one processor, the The at least one processor implements the video encoding method or the video decoding method as described in the foregoing aspect.
  • embodiments of the present application provide a computer program product including instructions, which when run on a computer, cause the computer to execute the video encoding method or the video decoding method described in the above aspect.
  • In the technical solutions provided by some embodiments of the present disclosure, the pixels in the current coding block are stored in the first part of the M*W storage space, and the second part of the storage space is used to store at least part of the encoded blocks in the encoded maximum coding unit and the current maximum coding unit, where M is a positive integer greater than or equal to W; the search for the reference string of the current string is restricted to the second part of the storage space, which can simplify the hardware implementation of string prediction.
  • In addition, the reconstruction dependency between different strings can be relaxed, so that different strings can be reconstructed in parallel, thereby simplifying the hardware implementation of string prediction and improving its efficiency.
  • Figure 1 schematically shows a basic block diagram of video coding in the related art;
  • Figure 2 schematically shows a schematic diagram of inter-frame prediction in the related art;
  • Figure 3 schematically shows the positions of spatial-domain candidate MVPs in the Merge mode of inter-frame prediction in the related art;
  • Figure 4 schematically shows the positions of temporal-domain candidate MVPs in the Merge mode of inter-frame prediction in the related art;
  • Figure 5 schematically shows a schematic diagram of intra-frame string copy in the related art;
  • Figure 6 schematically shows a video encoding method according to an embodiment of the present disclosure;
  • Figure 7 schematically shows the search range of the ISC reference block in the related art;
  • Figure 8 schematically shows a video decoding method according to an embodiment of the present disclosure;
  • Figure 9 shows a schematic structural diagram of an electronic device suitable for implementing the embodiments of the present disclosure.
  • AVS: Audio Video coding Standard.
  • HEVC: High Efficiency Video Coding, also known as H.265.
  • VVC: Versatile Video Coding, also known as H.266.
  • Intra (picture) Prediction: intra-frame prediction.
  • Inter (picture) Prediction: inter-frame prediction.
  • SCC: Screen Content Coding.
  • Loop Filtering: in-loop filtering.
  • QP: Quantization Parameter.
  • LCU: Largest Coding Unit.
  • CTU: Coding Tree Unit, generally obtained by dividing down from the largest coding unit.
  • CU: Coding Unit.
  • PU: Prediction Unit.
  • MV: Motion Vector.
  • MVP: Motion Vector Prediction, the predicted value of a motion vector.
  • MVD: Motion Vector Difference, the difference between the MVP and the actual MV.
  • AMVP: Advanced Motion Vector Prediction.
  • ME: Motion Estimation; the process of obtaining a motion vector (MV) is called motion estimation, which is part of motion compensation (Motion Compensation, MC).
  • Motion compensation is a method of describing the difference between adjacent frames ("adjacent" here means adjacent in coding order; the two frames may not be adjacent in playback order). Specifically, it describes how each small block of the previous frame moves to a certain position in the current frame. This method is often used by video compression/video codecs to reduce temporal redundancy in video sequences: adjacent frames are usually very similar, that is, they contain a lot of redundancy, and the purpose of motion compensation is to improve the compression ratio by eliminating this redundancy.
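As a minimal sketch of this idea, assuming frames are plain 2-D arrays of luma samples and ignoring sub-pixel interpolation and boundary handling, motion compensation copies a block from the reference frame at the motion-vector offset:

```python
# Minimal motion-compensation sketch: the prediction for a w*h block whose
# top-left corner in the current frame is (x, y) is copied from the reference
# frame at displacement (mx, my). Frames are plain 2-D lists of samples;
# sub-pixel interpolation and out-of-bounds handling are deliberately omitted.
def motion_compensate(ref_frame, x, y, w, h, mx, my):
    """Return the w*h prediction block read from ref_frame at (x+mx, y+my)."""
    return [[ref_frame[y + my + j][x + mx + i] for i in range(w)]
            for j in range(h)]
```

The residual between this prediction and the actual block is what the encoder then transforms, quantizes, and entropy-codes.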
  • I Slice Intra Slice, intra-frame slice/slice.
  • An image can be divided into one frame or two fields, and a frame can be divided into one or several slices.
  • Video signals, from the perspective of how they are acquired, can include those captured by a camera and those generated by a computer. Because their statistical characteristics differ, the corresponding compression coding methods may also differ.
  • Video coding technologies, such as HEVC, VVC, and AVS, use a hybrid coding framework. As shown in Figure 1, the pictures of the original video signal (input video) are coded sequentially, with the following series of operations and processing:
  • Block partition structure: the input image is divided into several non-overlapping processing units, and a similar compression operation is performed on each of them. This processing unit is called a CTU or LCU. Below the CTU or LCU, finer division can be continued to obtain at least one basic coding unit, which is called a CU.
  • Each CU is the most basic element in a coding pass. The various encoding methods that may be adopted for each CU are described below.
  • Predictive coding: this includes intra-frame prediction and inter-frame prediction. After the original video signal is predicted from a selected reconstructed video signal, a residual video signal is obtained. For the current CU, the encoder needs to select the most suitable of the many possible predictive coding modes and inform the decoder.
  • Intra-frame prediction The predicted signal comes from an area that has been coded and reconstructed in the same image.
  • adjacent pixels refer to reconstructed pixels of the coded CU around the current CU.
  • Inter-frame prediction The predicted signal comes from another image that has been coded and is different from the current image (referred to as a reference image).
  • The encoder also needs to select one of the transforms for the current CU to be encoded and notify the decoder.
  • The fineness of quantization is usually determined by the quantization parameter (QP). With a larger QP, transform coefficients over a larger value range are quantized to the same output, which usually brings larger distortion and a lower bit rate; with a smaller QP, transform coefficients over a smaller value range are quantized to the same output, which usually brings less distortion and corresponds to a higher bit rate.
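The QP trade-off above can be illustrated with a toy uniform quantizer (not the actual HEVC/VVC quantizer): a larger step maps a wider range of coefficients to the same output, lowering the bit rate at the cost of more distortion.

```python
# Toy uniform quantizer: a larger step maps a wider range of transform
# coefficients to the same integer level, which lowers the bit rate but
# increases reconstruction distortion.
def quantize(coeff, step):
    return round(coeff / step)

def dequantize(level, step):
    return level * step

def distortion(coeffs, step):
    """Sum of squared reconstruction errors for a list of coefficients."""
    return sum((c - dequantize(quantize(c, step), step)) ** 2 for c in coeffs)
```

With the same coefficients, a step of 8 yields visibly more distortion than a step of 2, mirroring the larger-QP case in the text.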
  • Entropy coding or statistical coding: the quantized transform-domain signal is statistically compressed according to the frequency of occurrence of each value, and finally a binary (0 or 1) compressed bitstream is output.
  • Encoding also produces other information, such as the selected coding mode and motion vectors, which likewise requires entropy coding to reduce the bit rate.
  • VLC Variable Length Coding
  • CABAC: Context-Adaptive Binary Arithmetic Coding.
  • Loop filtering: after an image has been coded, the reconstructed decoded image can be obtained through inverse quantization, inverse transform, and prediction compensation (the inverse operations of steps 2) to 4) above). Compared with the original input image, part of the information in the reconstructed decoded image differs from the original because of quantization, resulting in distortion. Filtering the reconstructed decoded image, for example with deblocking, SAO (Sample Adaptive Offset), or ALF (Adaptive Loop Filter) filters, can effectively reduce the degree of distortion caused by quantization. Since these filtered reconstructed decoded images will be used as references for subsequently coded images to predict future signals, the above filtering operation is also called loop filtering, that is, a filtering operation within the coding loop.
  • SAO Sample Adaptive Offset, sample adaptive compensation
  • ALF Adaptive Loop Filter, adaptive loop filter
  • Figure 1 shows the basic flow chart of a video encoder. The k-th CU (denoted s_k[x, y]) is taken as an example, where k is a positive integer greater than or equal to 1 and less than or equal to the number of CUs in the current input image, s_k[x, y] represents the pixel with coordinates [x, y] in the k-th CU, x represents the abscissa of the pixel, and y represents its ordinate.
  • s_k[x, y] yields a prediction signal after a suitable process such as motion compensation or intra prediction.
  • The quantized output data has two different destinations: one copy is sent to the entropy encoder for entropy coding, and the encoded bitstream is output to a buffer for storage until it is sent out; the other copy undergoes inverse quantization and inverse transform to obtain the signal u'_k[x, y].
  • Adding u'_k[x, y] to the prediction signal yields the reconstructed signal s*_k[x, y]; s*_k[x, y] yields f(s*_k[x, y]) after intra-image prediction.
  • s*_k[x, y] is loop-filtered to obtain s'_k[x, y], which is sent to the decoded picture buffer for storage, to be used for generating the reconstructed video.
  • In motion-compensated prediction, s'_r[x + m_x, y + m_y] represents the reference block, where m_x and m_y represent the horizontal and vertical components of the motion vector, respectively.
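The reconstruction path above can be sketched in a simplified, hypothetical form (1-D signals, the transform step omitted for brevity): the residual is quantized, inverse-quantized to u', and added back to the prediction to obtain the reconstructed signal s*. All names follow the text; the function itself is illustrative.

```python
# Simplified trace of the reconstruction path in Figure 1, in 1-D and
# without a real transform: residual u = s - pred is quantized; the
# in-loop path dequantizes to u' and adds the prediction back to get s*.
def encode_reconstruct(s, pred, step):
    u = [a - b for a, b in zip(s, pred)]             # residual signal u_k
    levels = [round(v / step) for v in u]            # quantization (to entropy coder)
    u_prime = [lv * step for lv in levels]           # inverse quantization -> u'_k
    s_star = [p + v for p, v in zip(pred, u_prime)]  # reconstruction -> s*_k
    return levels, s_star
```

Note that s* generally differs from s: that quantization error is exactly the distortion the loop filters discussed above try to reduce.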
  • Some video coding standards such as HEVC, VVC, AVS3, all adopt a block-based hybrid coding framework. They divide the original video data into a series of coding blocks, and combine video coding methods such as prediction, transformation, and entropy coding to achieve video data compression.
  • motion compensation is a type of prediction method commonly used in video coding. Motion compensation is based on the redundancy characteristics of the video content in the time domain or the space domain, and derives the prediction value of the current coding block from the coded area.
  • Such prediction methods include: inter-frame prediction, intra-frame block copy prediction, intra-frame string copy prediction, and so on. In a specific coding implementation, these prediction methods may be used alone or in combination.
  • In different prediction modes and implementations, the displacement vector may have different names. The following description uses the terms uniformly as follows: 1) the displacement vector in inter-frame prediction is called a motion vector (MV); 2) the displacement vector in intra-frame block copy is called a block vector or block displacement vector; 3) the displacement vector in intra-frame string copy is called a string vector (SV).
  • Fig. 2 schematically shows a schematic diagram of inter prediction in the related art.
  • Inter-frame prediction uses the temporal correlation of video: the pixels of the current image are predicted from neighbouring coded image pixels, which effectively removes temporal redundancy in the video and effectively saves the bits for coding residual data.
  • P is the current frame
  • Pr is the reference frame
  • B is the current block to be coded
  • Br is the reference block of B.
  • The coordinate positions of B′ and B in their respective images are the same, and the MV (motion vector) is the displacement from B′ to Br.
  • inter-frame prediction includes two MV prediction technologies, Merge and AMVP.
  • The Merge mode establishes an MV candidate list for the current PU, which contains 5 candidate MVs (and their corresponding reference images). The encoder traverses these 5 candidate MVs and selects the one with the least rate-distortion cost as the optimal MV. If the encoder and decoder build the MV candidate list in the same way, the encoder only needs to transmit the index of the optimal MV in the list.
  • HEVC's MV prediction also has a skip mode, a special case of the merge mode: after finding the optimal MV in merge mode, if the current block to be coded is essentially the same as the reference block, no residual data needs to be transmitted; only the index of the MV and a skip flag (a flag indicating that the block is coded in skip mode) are sent.
  • The MV candidate list established in Merge mode covers two cases: the spatial domain and the temporal domain.
  • The spatial domain provides at most 4 candidate MVs; their derivation is shown in Figure 3.
  • The spatial MV candidate list of the current block is established in the order A1 -> B1 -> B0 -> A0 -> B2, where B2 is a substitute: the motion information of B2 is used only when at least one of A1, B1, B0, and A0 does not exist. That is, although five positions are checked, at most 4 of them are used in HEVC (even if all five exist), and when one is unavailable, the next one in order is used.
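The spatial candidate derivation above can be sketched as follows; this is a simplification that ignores the redundancy (pruning) checks HEVC also applies between candidates.

```python
# Sketch of HEVC-style spatial merge candidate selection: positions are
# checked in the order A1, B1, B0, A0, B2; at most four candidates are kept,
# so B2 is effectively a substitute used only when one of the first four is
# missing. Redundancy pruning between candidates is omitted.
def spatial_merge_candidates(available_mvs):
    """available_mvs: dict mapping position name to its MV; unavailable
    neighbours are simply absent from the dict."""
    order = ["A1", "B1", "B0", "A0", "B2"]
    candidates = []
    for pos in order:
        if len(candidates) == 4:
            break  # list is full; B2 never needed if A1/B1/B0/A0 all exist
        if pos in available_mvs:
            candidates.append(available_mvs[pos])
    return candidates
```

With all five neighbours available, only A1, B1, B0, A0 are used; drop one of those and B2 fills the gap.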
  • By analogy with the spatial domain, which uses adjacent blocks, the temporal domain uses the MV of the co-located (corresponding-position) PU in an adjacent frame to estimate the MV at the current position.
  • The temporal domain provides at most one candidate MV; its derivation is shown in Figure 4.
  • Pictures coded with inter-frame prediction must have reference pictures; for example, a B-frame refers to pictures before it.
  • the current image is called cur
  • the reference image of the current image is called cur_ref
  • the co-located image of the current image is called col
  • the reference image of the co-located image is called col_ref.
  • tb can be the difference between the picture order count (POC) of the co-located picture and that of its reference picture,
  • and td can be the difference between the picture order count (POC) of the current picture and that of its reference picture. The MV of the current PU can be obtained by scaling the MV of the co-located PU as follows:
  • curMV = td * colMV / tb (2)
  • curMV and colMV represent the MVs of the current PU and the co-located PU, respectively, so the MV of the current PU can be derived from the col (co-located) picture. If the PU at position D0 in the co-located block is unavailable, the co-located PU at position D1 is used instead.
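Formula (2) can be illustrated directly in code, following the text's naming (td from the current picture and its reference, tb from the co-located picture and its reference); the rounding here is a simplification of the standard's fixed-point scaling.

```python
# Illustrative implementation of formula (2): the co-located MV is scaled
# by the ratio of POC distances, curMV = td * colMV / tb, where td is the
# POC distance of the current picture to its reference and tb the POC
# distance of the co-located picture to its reference (naming as in the text).
def scale_temporal_mv(col_mv, poc_cur, poc_cur_ref, poc_col, poc_col_ref):
    td = poc_cur - poc_cur_ref
    tb = poc_col - poc_col_ref
    return (round(td * col_mv[0] / tb),
            round(td * col_mv[1] / tb))
```

When the two POC distances are equal the co-located MV is reused unchanged; otherwise it is stretched or shrunk proportionally.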
  • the MV candidate list established in the Merge mode includes the above-mentioned spatial domain and the time domain.
  • For B slices, there is also a combined-list method.
  • A B-frame is a frame that needs to refer to both a forward MV and a backward MV; it has two MV lists, list0 and list1. Therefore, for a PU in a B slice, since there are two MVs, its MV candidate list also needs to provide two MVPs.
  • HEVC generates a combined list for B slices by pairwise combining the first 4 candidate MVs in the MV candidate list.
  • The AMVP mode uses the MV correlation of spatially and temporally adjacent blocks to first establish a candidate prediction MV list for the current PU.
  • AMVP uses the correlation of motion vectors in space and time to establish a spatial candidate list and a temporal candidate list respectively, and then selects the final MVP from these candidate lists.
  • By building the same list, the decoder can calculate the MV of the current decoding block using only the MVD and the index of the MVP in the list.
  • The AMVP candidate prediction MV list also covers two cases, the spatial domain and the temporal domain; the difference is that the length of the AMVP list is only 2.
  • The MVD needs to be encoded.
  • The resolution of the MVD is controlled by use_integer_mv_flag in the slice_header (slice header).
  • When the value of this flag is 0, the MVD is encoded at quarter (luma) pixel resolution; when the value of this flag is 1, the MVD is encoded at integer (luma) pixel resolution.
  • a method of adaptive motion vector resolution is used in VVC. This method allows each CU to adaptively select the resolution of the coded MVD.
  • the selectable resolutions include 1/4, 1/2, 1 and 4 pixels.
  • A first flag is encoded to indicate whether quarter-luma-sample MVD precision is used for the CU. If this flag is 0, the MVD of the current CU is coded at 1/4-pixel resolution. Otherwise, a second flag is encoded to indicate whether the CU uses 1/2-pixel resolution or another MVD resolution; if it does not use 1/2-pixel resolution, a third flag is encoded to indicate whether 1-pixel or 4-pixel resolution is used for the CU.
  • The available resolutions include 1/16 pixel, 1/4 (luma) pixel, and 1 pixel.
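The flag cascade described above can be sketched as follows; the bit value chosen for the half-pel branch is an assumption made for illustration, not taken from the VVC specification.

```python
# Sketch of the AMVR flag cascade paraphrased from the text: the first flag
# selects quarter-pel; otherwise a second flag selects half-pel; otherwise a
# third flag chooses between one-pel and four-pel. The exact bit semantics
# of the second and third flags are illustrative assumptions.
def mvd_resolution(flags):
    """flags: list of 0/1 bits in the order they would be parsed."""
    if flags[0] == 0:
        return 0.25          # quarter-luma-sample MVD precision
    if flags[1] == 0:        # assumed: 0 selects half-pel here
        return 0.5
    return 1 if flags[2] == 0 else 4
```

Note how the cascade lets the common quarter-pel case cost a single bit while rarer resolutions spend more bits.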
  • Screen images are images produced by computers, mobile phones, TVs, and other electronic devices. They mainly contain two types of content: one is computer-generated non-continuous-tone content containing a large number of small and sharp line shapes, such as text, icons, buttons, and grids; the other is content with a large amount of continuous tone captured by a camera, such as movies, TV clips, and natural-image videos.
  • Video coding standards in the related art based on block-based hybrid coding, such as AVS and HEVC, achieve a high compression ratio for natural images and videos containing a large amount of continuous-tone content, but their effect on screen images containing non-continuous-tone content is not good.
  • ISC (intra string copy) improves the coding of screen images; it serializes the two-dimensional image into one-dimensional strings, coding unit (CU) by coding unit.
  • ISC divides a coding block into a series of pixel strings or unmatched pixels according to a certain scanning order (raster scan, round-trip scan, zig-zag scan, etc.). Each string looks for a reference string of the same shape in the coded area of the current image and derives the predicted value of the current string from it. Encoding the residual between the pixel values of the current string and the predicted values, rather than encoding the pixel values directly, effectively saves bits.
  • Figure 5 shows a schematic diagram of intra-frame string replication.
  • the dark area is the coded area
  • the 28 white pixels are string 1
  • the 35 light-colored pixels are string 2
  • the black pixels represent unmatched pixels. A pixel for which no corresponding reference is found in the referenceable area is called an unmatched pixel, also known as an isolated point.
  • The pixel value of an unmatched pixel is encoded directly, rather than being derived from the predicted value of a reference string.
  • The ISC technology needs to encode, for each string in the current coding block, the corresponding string vector (String Vector, SV), the string length, and a flag indicating whether a matching reference string exists, among others.
  • The string vector (SV) represents the displacement from the string to be encoded (the current string, that is, the string currently being encoded) to its reference string.
  • The string length indicates the number of pixels contained in the current string; in different implementations, there are many ways to encode the string length.
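Putting these pieces together, a hypothetical decoder-side sketch of string reconstruction: each string copies `length` pixels from the already-decoded buffer at the (negative) offset given by its string vector, while unmatched pixels carry their values directly. A 1-D buffer stands in for the scanned 2-D image, and all names are illustrative.

```python
# Hypothetical ISC decoding sketch over a 1-D scanned pixel buffer: strings
# are reconstructed by copying pixels from already-decoded positions at the
# string-vector offset; unmatched pixels are written out directly. Copying
# pixel by pixel allows a reference string to overlap the current string.
def decode_strings(units):
    """units: list of ("string", sv, length) or ("pixel", value), in scan
    order; sv is a negative offset into the decoded buffer."""
    out = []
    for unit in units:
        if unit[0] == "string":
            _, sv, length = unit
            for _ in range(length):
                out.append(out[len(out) + sv])  # copy from decoded area
        else:  # unmatched pixel: its value is coded directly
            out.append(unit[1])
    return out
```

Because each copied pixel may itself have just been produced by the same string, short vectors naturally repeat patterns, much like LZ-style copies.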
  • In the following, a field containing "_" indicates a syntax element that needs to be decoded, while a field without "_" that begins with a capital letter indicates a variable.
  • The value of a variable can be obtained by decoding the syntax elements.
  • the current ISC scheme has parts that are not conducive to hardware implementation.
  • the solutions provided in the embodiments of the present disclosure are used to solve at least one technical problem described above.
  • the method provided in the embodiments of the present disclosure can be applied to video codecs or video compression products that use ISC, can be applied to the encoding and decoding of lossy data compression, and can also be applied to the encoding and decoding of lossless data compression.
  • the data involved in the encoding and decoding process refers to one or a combination of the following examples:
• the coding block is a coding area of the image, which may include at least one of the following: a group of images, a predetermined number of images, an image, one frame of an image, one field of an image, a sub-image of an image, slices, macroblocks, the largest coding unit LCU, the coding tree unit CTU, and the coding unit CU.
  • Fig. 6 schematically shows a schematic diagram of a video encoding method according to an embodiment of the present disclosure. It should be noted that the methods involved in the embodiments of the present disclosure can be used alone or in combination.
• the embodiment of FIG. 6 is introduced by taking the encoding end as an example. The encoding end and the decoding end correspond to each other: the encoding end performs a series of analyses to determine the value of each syntax element, and the present disclosure does not limit the analysis process.
  • the encoding terminal described here may be a video compression device deployed with a video encoder, and the video compression device includes a terminal device or a server that has a function of implementing a video encoder, and so on.
  • the method provided by the embodiment of the present disclosure may include the following steps.
• a current image is acquired, where the current image includes maximum coding units, the maximum coding units include a current maximum coding unit and an encoded maximum coding unit, the current maximum coding unit includes a current coding block, and the current coding block includes the current string.
  • the encoder at the encoding end receives the original video signal, and sequentially encodes the images in the original video signal.
• the image to be encoded is called the current image, which can be any frame of image in the original video signal.
  • the current image can be divided into blocks, for example, into several non-overlapping CTUs or LCUs.
  • the CTU may continue to be more finely divided to obtain at least one CU.
  • the current CU to be coded is referred to as the current coding block, but the present disclosure is not limited to this, for example, it may also be a PU or a TU.
  • CU is used as an example.
  • the CTU corresponding to the current CU is called the current CTU
  • the CTU that is in the coded area of the current image and does not belong to the current CTU is called the coded CTU.
  • the pixels in the current CU are encoded using ISC, and the pixels in the current CU are divided into strings or unmatched pixels according to a certain scanning order.
• a reference string of the same shape is searched for each string in the coded area of the current image; the string whose reference string is currently being searched for is called the current string.
  • the first part of the storage space of M*W size is used to store the pixels in the current coding block
  • the second part of the storage space is used to store the coded maximum coding unit and the current maximum coding unit
• M and W are both positive integers greater than or equal to 1, and M is greater than or equal to W.
  • the ISC only uses 1 CTU-sized memory as the storage space.
  • the ISC is limited to only use a 128*128 memory.
• a first part of 64*64 size in the 128*128 memory is used to store the pixels in the current CU of 64*64 size to be encoded (Curr in the figure)
• three second parts of 64*64 size in the 128*128 memory are used to store the coded pixels in three coded CUs of 64*64 size, where a coded CU is referred to as a coded block. Therefore, the ISC can only search for the reference string in these three 64*64 coded CUs.
• a first part of 64*64 size in the 128*128 memory is used to store the pixels in the current CTU of 64*64 size to be encoded, and three second parts of 64*64 size in the 128*128 memory are used to store the coded pixels in three coded CTUs of 64*64 size, where a coded CTU is referred to as a coded block. Therefore, the ISC can only search for the reference string in these three 64*64 coded CTUs.
  • the reference string of the current string is searched in the second part of the storage space to obtain the predicted value of the current string according to the reference string, and the current string is encoded.
• encoding the residual between the pixel values of the current string and its predicted values can reduce the number of bits and improve coding efficiency. Similar processing is performed on each frame of image in the original video signal, and finally a bit stream is generated, which can be transmitted to the decoder at the decoding end.
  • the pixels in the current encoding block are stored by using the first part of the M*W storage space, and the second part of the storage space is used to store the encoded maximum encoding unit
  • M is a positive integer greater than or equal to W
• the search for the reference string of the current string in the second part of the storage space is restricted, so that the hardware implementation of string prediction can be simplified.
  • the current ISC solution has parts that are not conducive to hardware implementation. For example, the following situation exists: the position of the reference string overlaps with the current CU to be reconstructed, which causes the string reconstruction to be dependent. For example, suppose that a CU is divided into two strings, called string 1 and string 2, and string 2 refers to string 1. In this case, string 2 needs to wait for string 1 to be rebuilt before starting to rebuild.
• the reference string may be set to satisfy the following conditions: the reference string is within the range of the current maximum coding unit and N coded maximum coding units, where the N coded maximum coding units are adjacent to the target side of the current maximum coding unit, and N is a positive integer greater than or equal to 1; when the pixels in the reference string are in the N coded maximum coding units, after the pixels of the reference string are moved by a predetermined number of pixels in a predetermined direction, the pixels in the corresponding target area have not been coded; the pixels in the reference string are located within the boundary of the independent coding area of the current image; and the pixels in the reference string do not overlap with the uncoded blocks of the current image.
  • the size of N may be determined according to the size of the largest coding unit.
• when the size of the maximum coding unit is M*M, the number of the predetermined pixels is M, and the target area is the area corresponding to the pixels of the reference string after they are moved by M pixels in the predetermined direction.
• when the size of the maximum coding unit is K*K, where K is a positive integer less than M, the number of predetermined pixels is N*K, and the target area is the maximum coding unit corresponding to the pixels of the reference string after they are moved by N*K pixels in the predetermined direction.
  • the smallest coordinate in the target area is not equal to the smallest coordinate in the current coding block.
  • the target area may include at least one CU.
  • the current coding block may be the first CU in the target area.
• the uncoded blocks include the current coding block, and pixels in the reference string do not overlap with pixels in the current coding block.
• the abscissa of a pixel in the reference string is smaller than the abscissa of the pixels in the current coding block; or, the ordinate of a pixel in the reference string is smaller than the ordinate of the pixels in the current coding block.
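The coordinate condition above can be sketched as a predicate. The function name is an assumption for illustration; the condition itself is the one stated in the text: a reference pixel is acceptable if it lies strictly to the left of the current block or strictly above it, which guarantees it cannot fall inside the not-yet-coded current block.

```python
# Sketch of the reference-pixel coordinate condition: (xRef, yRef) is the
# position of a pixel in the reference string, and (xCb, yCb) is the
# top-left corner of the current coding block.

def ref_pixel_outside_current_block(xRef, yRef, xCb, yCb):
    # strictly left of the block, or strictly above it
    return xRef < xCb or yRef < yCb
```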
• the uncoded blocks of the current image may include the current CU to be coded and other CUs that have not been coded; that is, the reference string is not allowed to overlap with the coded pixels in the current CU, so that there is no interdependence between strings, the complexity of coding can be reduced, and parallel coding can be realized.
  • the unencoded block does not include the current encoding block
  • the pixels in the reference string are allowed to overlap with the encoded pixels in the current encoding block
• the pixels in the reference string do not overlap with the uncoded pixels in the current coding block.
  • the uncoded block of the current image does not include the current CU that is currently to be encoded, that is, the reference string is allowed to overlap with the encoded pixels in the current CU.
  • This situation is called inter-string dependency.
• a string needs to wait for the previous string to finish encoding before it can be encoded, but compared with intra-string dependency, inter-string dependency is less complex; at the same time, because neighboring pixels have greater correlation with the current pixel to be encoded in the current CU, using neighboring pixels as references can achieve a better prediction effect. Therefore, the performance of inter-string dependency is higher than that of no dependency.
• intra-string dependency means that the position of the reference string overlaps with the current CU and with the position of the current string currently to be encoded. In this case, the string can only be encoded pixel by pixel according to the scanning order.
• the pixels in the reference string are allowed to overlap with the encoded pixels in the current coding block, but the pixels in the reference string do not overlap with any row of the current coding block that contains unencoded pixels.
• that is, the pixels in the reference string are allowed to overlap with the coded pixels in the current coding block, but the pixels in the reference string are not allowed to overlap with a row of the current coding block containing uncoded pixels (note that this refers to a row of the current coding block, not a row of the current image).
• the independent coding area of the current image includes the current image, or a slice or stripe in the current image.
• the pixels in the reference string come from the same aligned area.
  • the pixels in the reference string come from the same largest coding unit.
  • the circumscribed rectangle of the reference string does not overlap with the uncoded area of the current image.
• ISC uses only a memory of 1 CTU size; for example, assuming that the size of 1 CTU is 128*128 samples (in video coding standards, "sample" can be used to express "pixel"; it can include 128*128 luma samples and the corresponding chroma samples), the ISC is limited to using only a 128*128 memory.
• one 64*64 space is used to store the current CU of 64*64 size to be encoded (the 64*64 CU marked Curr in Figure 7, corresponding to the uncoded pixels in the 128*128 current CTU), and three 64*64 spaces can be used to store the coded pixels in the three coded CUs in the coded area of the current image. Therefore, ISC can only search for the reference string of the current string of the current CU among these three 64*64 coded CUs, and the following conditions should be met:
  • the pixels of the reference string pointed to by the string vector should not include the pixels of the current CU.
• the coordinates of the pixels of the above reference string should meet the condition (xRef_i < xCb) or (yRef_i < yCb).
• the coding order is from left to right and from top to bottom, so the upper left corner is the smallest coordinate, or the pixel corresponding to the smallest coordinate; however, the scheme provided by the embodiment of the present disclosure can also be applied to other coding orders, which is not limited.
• the size of N is determined by the size of the largest coding unit. For example, N can be determined according to the following formula:
• N = (1 << ((7 - (log2_lcu_size_minus2 + 2)) << 1)) - (((log2_lcu_size_minus2 + 2) < 7) ? 1 : 0)   (3)
  • the width or height of the largest coding unit is lcu_size
  • lcu_size is a positive integer greater than or equal to 1
• log2_lcu_size_minus2 = log2(lcu_size) - 2.
  • the " ⁇ " operator means left shift, which is used to shift all the binary digits of a number to the left by K (K is a positive integer greater than or equal to 1) bits, discard the high bits, and fill the low bits with 0. (((log2_lcu_size_minus2+2) ⁇ 7)?
• ISC allows searching for the reference string of the current string in the current CU only within the current CTU and the adjacent CTU to its left.
  • the square represents a 64*64 area.
• the 128*128 memory is divided into 4 parts, one of which stores the pixels of the current LCU, and the other parts are used to store the pixels of the N LCUs to the left of the current LCU in the same row. The same applies to smaller LCUs.
  • N can also be determined according to the following formula:
• N = (1 << ((7 - log2_lcu_size) << 1)) - ((log2_lcu_size < 7) ? 1 : 0)   (4)
• log2_lcu_size = log2(lcu_size).
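Formula (4) above can be checked with a short sketch (the function name is an assumption; the arithmetic follows the formula as reconstructed, for power-of-two LCU widths):

```python
import math

# Sketch of formula (4): the number N of coded LCUs to the left of the
# current LCU that may be referenced, as a function of the LCU width
# lcu_size (assumed to be a power of two, at most 128).

def num_left_lcus(lcu_size):
    log2_lcu_size = int(math.log2(lcu_size))
    # N = (1 << ((7 - log2_lcu_size) << 1)) - ((log2_lcu_size < 7) ? 1 : 0)
    return (1 << ((7 - log2_lcu_size) << 1)) - (1 if log2_lcu_size < 7 else 0)
```

With a 128-wide LCU this gives N = 1 (the current CTU plus its left neighbour, matching the description above), with a 64-wide LCU N = 3 (the four 64*64 parts of the 128*128 memory), and with a 32-wide LCU N = 15.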
• for example, suppose the position of the luminance component of a pixel in the reference string is (xRefTL, yRefTL), where xRefTL and yRefTL are both integers greater than or equal to 0. If the position (((xRefTL+128)/64)*64, (yRefTL/64)*64) is unavailable, that is, the corresponding coded pixel cannot be found in the memory used to store the coded pixels of the three 64*64 coded CUs, then (((xRefTL+128)/64)*64, (yRefTL/64)*64) should not be equal to the position (xCb, yCb) of the upper left corner of the current CU. The division here is rounded down.
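The shifted, 64*64-aligned position in the example above can be sketched as follows. The function names and the `available` predicate are assumptions for illustration; the arithmetic (shift by 128 samples, align down to a 64-sample grid with flooring division) is as stated in the text.

```python
# Sketch of the position check: compute the 64*64-aligned position obtained
# by shifting the reference top-left luma position 128 samples to the right.

def shifted_aligned_position(xRefTL, yRefTL):
    # Python's // floors for non-negative operands, matching the
    # round-down division described in the text.
    return (((xRefTL + 128) // 64) * 64, (yRefTL // 64) * 64)

def position_check_ok(xRefTL, yRefTL, xCb, yCb, available):
    # `available` is a caller-supplied predicate telling whether a coded
    # pixel exists at a given aligned position in the reference memory.
    pos = shifted_aligned_position(xRefTL, yRefTL)
    # if pos is unavailable, it must not equal the current CU's top-left
    return available(pos) or pos != (xCb, yCb)
```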
• a mainstream processing unit in the related art can process a 64*64 image area; therefore, some coding elements in standard formulation are also limited by the 64*64 processing capacity, such as the maximum size of the transform unit, and so on.
• when the current CU is a 64*64 block in the upper left corner of the current CTU, the 64*64 blocks in the lower right corner, the lower left corner, and the upper right corner of the left CTU can be used as references for the current CU.
• when the current CU is a 64*64 block in the upper right corner of the current CTU, in addition to the coded part of the current CTU, if the position (0,64) relative to the current CTU has not been coded, the current CU can also refer to the 64*64 block in the lower right corner and the 64*64 block in the lower left corner of the left CTU.
• similarly, when the current CU is a 64*64 block in the lower left corner of the current CTU, in addition to the coded part of the current CTU, the current CU can also refer to the 64*64 block in the lower right corner of the left CTU.
• when the current CU is a 64*64 block in the lower right corner of the current CTU, it can only refer to the coded part of the current CTU.
• the above step 3) gives the limitation when the maximum coding unit is 128*128, and the above step 4) gives the limitation when the size of the maximum coding unit is less than or equal to 64*64, so that the 128*128 memory can be fully utilized in the encoding process.
• the above takes a memory size of 128*128 as an example; the size of the LCU during encoding can be configured through parameters. For hardware design, however, if a 128*128 memory has been provided, the memory should be fully utilized even when the LCU is smaller than 128*128.
• a slice is a concept in AVS3: a rectangular area in an image that contains parts of several largest coding units in the image, where slices do not overlap with each other. A stripe is the corresponding concept in HEVC.
• any reference string sample in the reference string position pointed to by the string vector should not overlap with the uncoded area or with the coding block area currently being coded (that is, the current CU).
• the circumscribed rectangle of the reference string at the position pointed to by the string vector should not overlap with the uncoded area or the coding block area currently being coded.
• the four corner points of the circumscribed rectangle of the reference string can be used to determine whether the position of the reference string meets the restriction: if the circumscribed rectangle does not overlap with the uncoded area or the coding block area currently being coded, then the reference string, which lies within the rectangle, also satisfies the restriction of no overlap with those areas.
• the solutions proposed in the embodiments of the present disclosure make a series of simplifications to the ISC scheme, including the restriction on the position of the reference string. These methods simplify the hardware implementation of the ISC. On the one hand, after the position of the reference string is limited, there is no dependency between strings, and strings can be coded in parallel. On the other hand, the reference string can also be restricted to a 128*128 memory area.
  • the current ISC solution also has other parts that are not conducive to hardware implementation.
• at the encoding end, the sum of the number of strings and the number of isolated points (unmatched pixels) is limited to not more than one quarter of the number of pixels of the CU. Without such a limit, the number of strings could be large, which would result in a larger number of syntax elements that need to be coded.
• N1 denotes the number of strings and N2 the number of unmatched pixels; N1 and N2 are integers greater than or equal to 0.
  • the value range of T1 can be an integer in [1,W*H], W is the width of the current CU, H is the height of the current CU, and both W and H are positive integers greater than or equal to 1.
• the value range of T1 may be restricted to be less than or equal to one quarter of W*H; for example, T1 is preferably W*H/4.
  • the encoding end has the following optional methods:
• when N1+N2 is equal to T1-1, if the number NR of remaining pixels in the current CU (NR is an integer greater than or equal to 0) is equal to 1, there is no need to encode "sp_is_matched_flag"; that is, there is no need to encode the matching flag for determining the type of the next remaining pixel, which can be directly confirmed as an unmatched pixel.
• when N1+N2 is equal to T1-1, if the number NR of remaining pixels in the current CU is greater than 1, there is no need to encode "sp_is_matched_flag"; that is, there is no need to encode the matching flag to determine the type of the remaining pixels, which can be directly confirmed as one string with string length NR.
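The T1 shortcut above can be sketched as a small decision function. This is one interpretation of the two options, with an assumed return convention, not normative semantics:

```python
# Sketch of the encoder-side shortcut when the element count reaches T1-1:
# once N1 + N2 == T1 - 1, the type of the final element need not be
# signalled with "sp_is_matched_flag" -- with one remaining pixel it must
# be an unmatched pixel, and with several remaining pixels they must form
# a single string of length NR.

def next_element_without_flag(n1, n2, t1, nr):
    if n1 + n2 != t1 - 1:
        return None                      # the flag still has to be coded
    if nr == 1:
        return ("unmatched_pixel",)      # type inferred, no flag coded
    return ("string", nr)                # one final string of length NR
```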
• Limit N1 to be less than or equal to a second number threshold T2.
  • the value range of T2 can be an integer in [1,W*H].
  • the code "sp_is_matched_flag" is the second value (here assumed to be 1, but the present disclosure is not limited to this, and can be limited according to actual conditions) , Indicating that the remaining pixels in the current CU are a string.
• alternatively, when N1 is equal to T2-1, it is directly confirmed that the remaining pixels form one string, so that N1 is equal to T2.
• Limit N2 to be less than or equal to a third number threshold T3.
  • the value range of T3 can be an integer in [1,W*H].
• when N2 is equal to T3, encode "sp_is_matched_flag" to indicate that the remaining pixels of the current CU form one string.
• alternatively, when N2 is equal to T3, without encoding "sp_is_matched_flag", it is directly confirmed that the types of the remaining pixels of the current CU are all strings, and the string length of each string is encoded.
• Limit N1+N2 to be greater than or equal to a fourth number threshold T4.
  • the value range of T4 can be an integer in [1,W*H].
  • T4 is preferably a positive integer greater than 2.
• the restriction here is that N1+N2 is greater than or equal to T4, considering that the number of strings in string prediction is usually more than 1. This restriction can save the coding of syntax elements. There are the following options:
• when N1+N2 is less than T4 and "sp_is_matched_flag" is encoded with a third value such as 1 (the present disclosure is not limited to this; it can be set according to actual conditions) confirming that the next remaining pixel is the starting point of a string, it can be directly judged that this string is not the last string, and there is no need to encode "sp_last_len_flag" to confirm whether it is the last string, thereby improving coding efficiency.
• a similar judgment can be made when N1 is less than T4.
• the solution proposed in the embodiments of the present disclosure carries out a series of simplifications on the ISC scheme, including the restriction on the position of the reference string and the restriction on the number of strings. These methods simplify the hardware implementation of the ISC:
• after the position of the reference string is limited, there is no dependency between strings, and strings can be coded in parallel. In addition, similar to IBC, the reference string can also be restricted to a 128*128 memory area.
• the current ISC scheme also has other parts that are not conducive to hardware implementation, such as allowing small blocks of 4*4 size to use string prediction, even though the strings in a small block are short and the performance gain that a small block can bring is small.
  • T11 is related to the block size allowed by the encoder, and the value range can be an integer in the block size allowed by the encoder (minimum size * minimum size, maximum size * maximum size).
  • T11 can be an integer in (4*4, 64*64). At the coding end, T11 can be selected based on coding performance and complexity considerations.
  • T21 is related to the block size allowed by the encoder, and the value range can be an integer in the block size (minimum size, maximum size) allowed by the encoder.
  • T21 can be an integer in (4, 64). At the coding end, T21 can be selected based on coding performance and complexity considerations.
  • T31 is related to the block size allowed by the encoder, and the value range can be an integer in the block size (minimum size, maximum size) allowed by the encoder.
  • T31 can be an integer in (4, 64). At the coding end, T31 can be selected based on coding performance and complexity considerations.
  • T41 is related to the block size allowed by the encoder, and the value range can be an integer in the block size allowed by the encoder (minimum size * minimum size, maximum size * maximum size).
  • T41 can be an integer in (4*4, 64*64). At the coding end, T41 can be selected based on coding performance and complexity considerations.
  • T51 is related to the block size allowed by the encoder, and the value range can be an integer in the block size (minimum size, maximum size) allowed by the encoder.
  • T51 can be an integer in (4, 64). At the coding end, T51 can be selected based on coding performance and complexity considerations.
  • T61 is related to the block size allowed by the encoder, and the value range can be an integer in the block size allowed by the encoder (minimum size, maximum size).
  • T61 can be an integer in (4, 64). At the coding end, T61 can be selected based on coding performance and complexity considerations.
• the above steps 4)-6) restrict the use of string prediction for large blocks, considering that the performance improvement brought by using string prediction for large blocks is small.
• this limitation can save the coding of syntax elements on the one hand, and on the other hand, the encoding end can skip the string prediction analysis for blocks of this size.
  • a block with a width equal to 4 and a height equal to 4 does not use string matching by default, and does not need to encode "sp_flag”.
  • Blocks with a width equal to 4 or a height equal to 4 do not use string matching by default, and do not need to encode "sp_flag”.
  • Blocks with an area less than or equal to 32 do not use string matching by default, and do not need to encode "sp_flag".
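The three block-size rules above can be sketched as alternative gates (the rule names and the function interface are assumptions; the thresholds are the examples given in the text). For a block ruled out by size, string matching is off by default and "sp_flag" is neither analysed at the encoder nor coded in the bitstream.

```python
# Sketch of the block-size gating alternatives for string prediction.

def string_prediction_allowed(width, height, rule):
    if rule == "4x4":                 # width == 4 and height == 4 excluded
        return not (width == 4 and height == 4)
    if rule == "w4_or_h4":            # width == 4 or height == 4 excluded
        return width != 4 and height != 4
    if rule == "area_le_32":          # area <= 32 excluded
        return width * height > 32
    raise ValueError(f"unknown rule: {rule}")
```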
  • the solution proposed by the embodiment of the present disclosure has carried out a series of simplifications on the ISC solution, including the restriction on the position of the reference string, the restriction on the number of strings, and the restriction on the block size.
• after the position of the reference string is limited, there is no dependency between strings, and strings can be coded in parallel. In addition, similar to IBC, the reference string can also be restricted to a 128*128 memory area.
  • Restrictions on the block size can reduce the number of small strings, which is beneficial to reduce the number of memory accesses.
  • the encoding end can skip the analysis of string prediction for blocks of certain sizes (for example, blocks of 4*4 size), which reduces the complexity.
  • the coding of string prediction flags on certain blocks can also be saved, which is beneficial to the improvement of coding performance.
  • Fig. 8 schematically shows a schematic diagram of a video decoding method according to an embodiment of the present disclosure. It should be noted that the methods involved in the embodiments of the present disclosure can be used alone or in combination.
  • the embodiment of FIG. 8 is introduced by taking the decoding end as an example.
  • the decoder described here may be a video decompression device deployed with a video decoder, and the video decompression device includes a terminal device or a server that has the function of implementing a video decoder.
  • the method provided by the embodiment of the present disclosure may include the following steps.
  • the code stream of the current image is acquired, the code stream includes a maximum coding unit, the maximum coding unit includes the current maximum coding unit and the encoded maximum coding unit corresponding to the current image, and the current maximum coding unit includes The current decoded block, and the current decoded block includes the current string.
  • the current decoded block is the current CU as an example for illustration, but the present disclosure is not limited to this.
• in S820, the first part of the M*W storage space is used to store the pixels in the current decoding block, and the second part of the storage space is used to store at least part of the decoded blocks in the encoded maximum coding unit and the current maximum coding unit, where M is a positive integer greater than or equal to W.
  • the reference string of the current string is searched in the second part of the storage space to obtain the predicted value of the current string according to the reference string, and the current string is decoded.
• the reference string may be set to satisfy the following conditions: the reference string is within the range of the current maximum coding unit and N coded maximum coding units, where the N coded maximum coding units are adjacent to the target side of the current maximum coding unit, and N is a positive integer greater than or equal to 1; when the pixels in the reference string are in the N coded maximum coding units, after the pixels of the reference string are moved by a predetermined number of pixels in a predetermined direction, the pixels in the corresponding target area have not been reconstructed; the pixels in the reference string are located within the boundary of the independent decoding area of the current image; and the pixels in the reference string do not overlap with the undecoded blocks of the current image.
  • the size of N is determined according to the size of the largest coding unit.
• when the size of the maximum coding unit is M*M, the number of the predetermined pixels is M, and the target area is the area corresponding to the pixels of the reference string after they are moved by M pixels in the predetermined direction.
• when the size of the maximum coding unit is K*K, where K is a positive integer less than M, the number of predetermined pixels is N*K, and the target area is the maximum coding unit corresponding to the pixels of the reference string after they are moved by N*K pixels in the predetermined direction.
  • the smallest coordinate in the target area is not equal to the smallest coordinate in the current decoding block.
  • the target area may include at least one CU.
  • the current decoding block may be the first CU in the target area.
  • the undecoded block may include the current decoded block, and pixels in the reference string do not overlap with pixels in the current decoded block.
• the abscissa of a pixel in the reference string is smaller than the abscissa of the pixels in the current decoding block; or, the ordinate of a pixel in the reference string is smaller than the ordinate of the pixels in the current decoding block.
• the undecoded blocks may not include the current decoding block, allowing pixels in the reference string to overlap with reconstructed pixels in the current decoding block, while the pixels in the reference string do not overlap with unreconstructed pixels in the current decoding block.
• the undecoded blocks of the current image may include the current CU to be decoded and other CUs that have not been decoded; that is, the reference string is not allowed to overlap with the reconstructed pixels in the current CU, so that there is no interdependence between strings, the complexity of decoding can be reduced, and parallel decoding can be realized.
  • the undecoded block does not include the current decoded block, allowing pixels in the reference string to overlap with reconstructed pixels in the current decoded block.
  • the undecoded block of the current image does not include the current CU that is currently to be decoded, that is, the reference string is allowed to overlap with the reconstructed pixels in the current CU.
• this situation is called inter-string dependency: according to the scanning order, a later string needs to wait for the previous string to be decoded before it can be decoded.
• compared with intra-string dependency, inter-string dependency is less complex; at the same time, because neighboring pixels have greater correlation with the current pixel to be decoded in the current CU, using neighboring pixels as references can achieve a better prediction effect. Therefore, the performance of inter-string dependency is higher than that of no dependency.
  • the dependency within the string means that the position of the reference string overlaps with the current CU and overlaps with the position of the current string currently to be decoded.
  • the string can only be reconstructed pixel by pixel according to the scanning order.
• the pixels in the reference string are allowed to overlap with the reconstructed pixels in the current decoding block, but the pixels in the reference string do not overlap with any row of the current decoding block that contains unreconstructed pixels.
• that is, the pixels in the reference string are allowed to overlap with the reconstructed pixels in the current decoding block, but the pixels in the reference string are not allowed to overlap with a row of the current decoding block containing unreconstructed pixels (note that this refers to a row of the current decoding block, not a row of the current image).
  • reconstruction can be done in parallel.
  • because of the existence of references, reconstruction cannot be fully parallelized: it is necessary to wait for the referenced string to be reconstructed before the current string can be reconstructed. After this restriction condition is added, the reference string and the current string are not in the same row of the current decoding block, so reconstruction can proceed row by row without waiting.
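The row restriction above can be sketched as a simple check (a minimal illustration with hypothetical helper names, not the normative decoder logic): a reference string is acceptable only if none of its pixels sits on a row of the current decoding block that still contains unreconstructed pixels, so the block can be rebuilt row by row without inter-string waiting.

```python
def block_row(y, block_y0):
    """Row index relative to the top of the current decoding block
    (a row of the block, not a row of the whole image)."""
    return y - block_y0

def reference_rows_ok(ref_pixels, unreconstructed_pixels, block_y0):
    """True when no reference pixel lies on a block row that still holds
    unreconstructed pixels; rows can then be reconstructed in order
    without waiting on other strings."""
    dirty_rows = {block_row(y, block_y0) for (_x, y) in unreconstructed_pixels}
    return all(block_row(y, block_y0) not in dirty_rows
               for (_x, y) in ref_pixels)
```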
  • the independent decoding area of the current image includes the current image, or slices or stripes in the current image.
  • the pixels in the reference string come from the same alignment area.
  • the pixels in the reference string come from the same largest coding unit.
  • the circumscribed rectangle of the reference string does not overlap with the undecoded block of the current image.
  • the ISC only uses 1 CTU of memory. For example, if the size of 1 CTU is 128*128, the ISC is limited to using only a 128*128 memory. In this 128*128 memory, one 64*64 space is used to store the unreconstructed pixels of the current 64*64 CU to be reconstructed, and three 64*64 spaces can be used to store the reconstructed pixels of the 3 decoded CUs in the decoded area of the current image. Therefore, the ISC can only search for the reference string of the current string of the current CU among these three decoded 64*64 CUs, and the following conditions should be met:
  • the pixels of the reference string pointed to by the string vector should not include the pixels of the current CU.
  • the coordinates of the pixels in the above reference string should meet the condition (xRef_i < xCb
  • the reference string pointed to by the string vector is limited to the range of the current CTU and the N CTUs (belonging to the coded CTUs) to the left of the current CTU (assuming that the target side is the left side in the referenced coordinate system).
  • the size of N is determined by the size of the largest coding unit; for example, N can be determined according to the above formula (3) or (4).
  • if the position of the luminance component of a pixel in the reference string is (xRefTL, yRefTL), and (((xRefTL+128)/64)*64, (yRefTL/64)*64) is not available, that is, it cannot be found in the memory used to store the reconstructed pixels of the 3 reconstructed 64*64 CUs, then (((xRefTL+128)/64)*64, (yRefTL/64)*64) should not be equal to the upper-left corner position (xCb, yCb) of the current CU.
  • the division here is rounded down.
  • the above step 3) gives the limitation when the largest coding unit is 128*128, and the above step 4) gives the limitation when the size of the largest coding unit is less than or equal to 64*64, so that the 128*128 memory can be fully utilized during decoding.
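As a sketch (assuming integer pixel coordinates and the floor division stated above; the function name is illustrative, not from the specification), the 128*128-memory condition of step 3) can be written as:

```python
def memory_region_ok(xRefTL, yRefTL, xCb, yCb):
    """128*128 reference-memory restriction when the LCU is 128*128:
    the top-left corner of the 64*64 region given by
    (((xRefTL + 128) / 64) * 64, (yRefTL / 64) * 64), with the divisions
    rounded down, must not equal the top-left corner (xCb, yCb) of the
    current CU, since that 64*64 space is being reused to hold the
    current CU's unreconstructed pixels."""
    region = (((xRefTL + 128) // 64) * 64, (yRefTL // 64) * 64)
    return region != (xCb, yCb)
```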
  • the position of the reference string pointed to by the string vector should not exceed the boundary of an independent decoding area such as an image, slice, or stripe.
  • Any reference string sample in the reference string position pointed to by the string vector should not overlap with the unreconstructed area or the coding block area currently being reconstructed.
  • the circumscribed rectangle of any reference string sample in the reference string position pointed to by the string vector should not overlap with the unreconstructed area or the coding block area currently being reconstructed.
  • the four corner points of the circumscribed rectangle of the reference string can be used to determine whether the position of the reference string meets the restriction.
  • if the circumscribed rectangle does not overlap with the unreconstructed area or the coding block area currently being reconstructed, then the reference string also meets the limitation of not overlapping with the unreconstructed area or the coding block area currently being reconstructed.
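A minimal sketch of this corner test (the `is_reconstructed` predicate is a stand-in for the decoder's actual reconstruction map, and the function names are illustrative): because the circumscribed rectangle contains the whole reference string, it is sufficient to verify its four corner points.

```python
def corners(x0, y0, x1, y1):
    """Four corner points of the circumscribed rectangle of a reference
    string, given its top-left (x0, y0) and bottom-right (x1, y1)."""
    return [(x0, y0), (x1, y0), (x0, y1), (x1, y1)]

def rectangle_ok(x0, y0, x1, y1, is_reconstructed):
    """The rectangle meets the restriction when all four corners fall in
    already-reconstructed area (outside the unreconstructed area and the
    coding block currently being reconstructed); the reference string
    inside it then meets the restriction as well."""
    return all(is_reconstructed(x, y) for x, y in corners(x0, y0, x1, y1))
```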
  • the solutions proposed in the embodiments of the present disclosure make a series of simplifications to the ISC scheme, including the restriction on the position of the reference string. These methods simplify the hardware implementation of the ISC. On the one hand, after the position of the reference string is limited, there is no dependency between strings, and the strings can be reconstructed in parallel. In addition, similar to IBC, the reference string can also be restricted to use only a 128*128 memory area.
  • N1 and N2 are integers greater than or equal to 0.
  • the value range of T1 can be an integer in [1,W*H], W is the width of the current CU, H is the height of the current CU, and both W and H are positive integers greater than or equal to 1.
  • the value range of T1 is limited to be less than or equal to a quarter of W*H.
  • T1 is preferably 4.
  • the decoder has the following optional methods:
  • if N1+N2 is equal to T1-1 and the number of remaining pixels NR in the current CU (NR is an integer greater than or equal to 0) is equal to 1, there is no need to decode "sp_is_matched_flag"; that is, there is no need to decode the matching flag to determine the type of the next remaining pixel, and it can be directly confirmed that the remaining pixel is an unmatched pixel.
  • if N1+N2 is equal to T1-1 and the number NR of remaining pixels in the current CU is greater than 1, there is no need to decode "sp_is_matched_flag"; that is, there is no need to decode the matching flag to determine the type of the next remaining pixel, and it can be directly confirmed that the remaining pixels form one string whose length is NR.
  • step iii is another way of performing the above step ii: when N1+N2 is equal to T1-1, decode "sp_is_matched_flag"; if the decoded "sp_is_matched_flag" is a first value, for example 1 (the present disclosure is not limited to this, and it can be set according to the actual situation), it is directly confirmed that the remaining pixels of the current CU form one string whose length is NR.
  • Limit N1 to be less than or equal to a second number threshold T2.
  • the value range of T2 can be an integer in [1,W*H].
  • if N1 is equal to T2-1 and the decoded "sp_is_matched_flag" is a second value such as 1 (the present disclosure is not limited to this; it can be set according to the actual situation), then it is confirmed that the next remaining pixel is the starting point of a string, and the remaining pixels of the current CU are directly confirmed to be one string.
  • if N1 is equal to T2-1, there is no need to decode "sp_is_matched_flag", and it is directly confirmed that the remaining pixels in the current CU form one string.
  • N1 is equal to T2
  • Limit N2 to be less than or equal to a third number threshold T3.
  • the value range of T3 can be an integer in [1,W*H].
  • if N2 is equal to T3, the remaining pixels of the current CU are directly used as a string.
  • if N2 is equal to T3, there is no need to decode "sp_is_matched_flag"; it is directly confirmed that the remaining pixels of the current CU are all strings, and the string length of each string is decoded.
  • Limit N1+N2 to be greater than or equal to a fourth number threshold T4.
  • the value range of T4 can be an integer in [1,W*H].
  • T4 is preferably a positive integer greater than 2.
  • the restriction here is that N1+N2 is greater than or equal to T4, considering that the number of strings in string prediction is usually more than 1. This restriction can save the decoding of syntax elements.
  • the decoder has the following optional methods:
  • if N1+N2 is less than T4 and the next remaining pixel is confirmed as the starting point of a string by decoding "sp_is_matched_flag" (for example, if "sp_is_matched_flag" is decoded to a third value such as 1; the present disclosure is not limited to this, and it can be set according to the actual situation), then it can be directly determined that this string is not the last string, so there is no need to decode "sp_last_len_flag" to confirm whether it is the last string, thereby improving decoding efficiency.
  • Limit N1 to be greater than or equal to the fourth number threshold T4.
  • the restriction here is that N1 is greater than or equal to T4, considering that the number of strings in string prediction is usually more than 1. This restriction can save the decoding of syntax elements. There are the following options:
  • N1 is less than T4
  • the solution proposed in the embodiments of the present disclosure makes a series of simplifications to the ISC scheme, including the restriction on the position of the reference string and the restriction on the number of strings.
  • after the position of the reference string is limited, there is no dependency between strings, and the strings can be reconstructed in parallel.
  • the reference string can also be restricted to be used only in a 128*128 memory area.
  • T11 is related to the block sizes allowed by the encoder, and its value range can be an integer within the range of allowed block sizes (from minimum size * minimum size to maximum size * maximum size).
  • T11 can be an integer in (4*4, 64*64). T11 can be selected based on coding performance and complexity considerations.
  • the above steps 4)-6) restrict the use of string prediction for large blocks, considering that the performance improvement brought by using string prediction for large blocks is small. On the one hand, this limitation can save the decoding of syntax elements; on the other hand, the string prediction analysis for blocks of those sizes can be skipped.
  • a block with a width equal to 4 and a height equal to 4 does not use string matching by default, and there is no need to decode "sp_flag".
  • Blocks with a width equal to 4 or a height equal to 4 do not use string matching by default, and there is no need to decode "sp_flag".
  • Blocks with an area less than or equal to 32 do not use string matching by default, and do not need to decode "sp_flag".
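The three alternative size restrictions above can be illustrated as one predicate (combining them is purely for illustration; each bullet is a separate option in the embodiments, and the thresholds are the examples given):

```python
def sp_flag_needed(width, height, min_area=32):
    """Return False for block sizes that skip string matching by default,
    so the decoder does not decode 'sp_flag' for them: blocks with width
    or height equal to 4 (which also covers the 4x4-only variant), and
    blocks whose area is at most min_area."""
    if width == 4 or height == 4:
        return False
    if width * height <= min_area:
        return False
    return True
```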
  • the solution proposed in the embodiments of the present disclosure makes a series of simplifications to the ISC scheme, including the restriction on the position of the reference string, the restriction on the number of strings, and the restriction on the block size.
  • after the position of the reference string is limited, there is no dependency between strings, and the strings can be reconstructed in parallel.
  • the reference string can also be restricted to be used only in a 128*128 memory area.
  • Restrictions on the block size can reduce the number of small strings, which is beneficial to reduce the number of memory accesses.
  • the encoding end can skip the analysis of string prediction for blocks of certain sizes (for example, blocks of 4*4 size), which reduces the complexity.
  • the decoding of string prediction flags on certain blocks can also be saved, which is beneficial to the improvement of decoding performance.
  • an embodiment of the present disclosure also provides a video encoding device, which can be applied to an encoding end/encoder. The device may include: a current image acquisition unit, which may be used to acquire a current image, where the current image includes largest coding units, the largest coding units include a current largest coding unit and encoded largest coding units, the current largest coding unit includes a current coding block, and the current coding block includes a current string;
  • a storage space determining unit, which may be used to store the pixels in the current coding block in a first part of a storage space of size M*W, and to store the encoded largest coding units and at least some of the encoded blocks in the current largest coding unit in a second part of the storage space, where M is a positive integer greater than or equal to W;
  • a reference string search unit, which may be used to search for the reference string of the current string in the second part of the storage space, so as to obtain the predicted value of the current string according to the reference string and encode the current string.
  • the reference string may be set to satisfy the following conditions: the reference string is within the range of the current maximum coding unit and the N coded maximum coding units, and the N coded maximum coding units are related to the current maximum coding unit.
  • the N encoded largest coding units are adjacent to the target side of the current largest coding unit, and N is a positive integer greater than or equal to 1; when the pixels in the reference string are in the N encoded largest coding units, after the pixels of the reference string move a predetermined number of pixels in a predetermined direction, the pixels of the corresponding target area have not been encoded; the pixels in the reference string are located within the boundary of the independent encoding area of the current image; and the pixels in the reference string do not overlap with the unencoded blocks of the current image.
  • the size of N is determined according to the size of the largest coding unit.
  • if the size of the largest coding unit is M*M, the number of the predetermined pixels is M, and the target area is the area corresponding to the pixels of the reference string after they move M pixels in the predetermined direction.
  • if the size of the largest coding unit is K*K, where K is a positive integer less than M, then the number of predetermined pixels is N*K, and the target area is the largest coding unit corresponding to the pixels of the reference string after they move N*K pixels in the predetermined direction.
  • the smallest coordinate in the target area is not equal to the smallest coordinate in the current coding block.
  • the unencoded block includes the currently encoded block, and pixels in the reference string do not overlap with pixels in the currently encoded block.
  • the abscissa of the pixels in the reference string is smaller than the abscissa of the pixels in the current coding block; or, the ordinate of the pixels in the reference string is smaller than the ordinate of the pixels in the current coding block.
  • the unencoded block does not include the current encoding block
  • the pixels in the reference string are allowed to overlap with the encoded pixels in the current encoding block
  • the pixels in the reference string do not overlap with the unencoded pixels in the current coding block.
  • the pixels in the reference string are allowed to overlap with the encoded pixels in the current coding block, and the pixels in the reference string do not overlap with the rows of the current coding block that contain unencoded pixels.
  • the independent encoding area of the current image includes the current image, or slices or stripes in the current image.
  • the pixels in the reference string come from the same alignment area.
  • the pixels in the reference string come from the same largest coding unit.
  • the circumscribed rectangle of the reference string does not overlap with the uncoded block of the current image.
  • for each unit in the video encoding device provided in the embodiments of the present disclosure, reference may be made to the content of the above-mentioned video encoding method, and details are not described herein again.
  • an embodiment of the present disclosure also provides a video decoding device, which can be applied to a decoding end/decoder. The device may include: a code stream obtaining unit, which may be used to obtain the code stream of the current image.
  • the code stream includes a maximum coding unit, the maximum coding unit includes a current maximum coding unit and an encoded maximum coding unit corresponding to the current image, the current maximum coding unit includes a current decoding block, and the current decoding block includes a current string;
  • the storage space storage unit may be configured to use the first part of the M*W storage space to store the pixels in the current decoding block, and use the second part of the storage space to store the encoded maximum coding unit and the At least part of the decoded block in the current largest coding unit, M is a positive integer greater than or equal to W;
  • the reference string determining unit may be used to search for the reference string of the current string in the second part of the storage space to Obtain the predicted value of the current string according to the reference string, and
  • the reference string may be set to satisfy the following conditions: the reference string is within the range of the current maximum coding unit and the N coded maximum coding units, and the N coded maximum coding units are related to the current maximum coding unit.
  • the N encoded largest coding units are adjacent to the target side of the current largest coding unit, and N is a positive integer greater than or equal to 1; when the pixels in the reference string are in the N encoded largest coding units, after the pixels of the reference string move a predetermined number of pixels in a predetermined direction, the pixels in the corresponding target area have not been reconstructed; the pixels in the reference string are located within the boundary of the independent decoding area of the current image; and the pixels in the reference string do not overlap with the undecoded blocks of the current image.
  • the size of N is determined according to the size of the largest coding unit.
  • if the size of the largest coding unit is M*M, the number of the predetermined pixels is M, and the target area is the area corresponding to the pixels of the reference string after they move M pixels in the predetermined direction.
  • if the size of the largest coding unit is K*K, where K is a positive integer less than M, then the number of predetermined pixels is N*K, and the target area is the largest coding unit corresponding to the pixels of the reference string after they move N*K pixels in the predetermined direction.
  • the smallest coordinate in the target area is not equal to the smallest coordinate in the current decoding block.
  • the undecoded block includes the current decoded block, and pixels in the reference string do not overlap with pixels in the current decoded block.
  • the abscissa of the pixels in the reference string is smaller than the abscissa of the pixels in the current decoding block; or, the ordinate of the pixels in the reference string is smaller than the ordinate of the pixels in the current decoding block.
  • the undecoded block does not include the current decoded block
  • the pixels in the reference string are allowed to overlap with the reconstructed pixels in the current decoded block
  • the pixels in the reference string do not overlap with the unreconstructed pixels in the current decoding block.
  • the pixels in the reference string are allowed to overlap with the reconstructed pixels in the current decoded block, and the pixels in the reference string do not overlap with the rows of the current decoded block that contain unreconstructed pixels.
  • the independent decoding area of the current image includes the current image, or slices or stripes in the current image.
  • the pixels in the reference string come from the same alignment area.
  • the pixels in the reference string come from the same largest coding unit.
  • the circumscribed rectangle of the reference string does not overlap with the undecoded block of the current image.
  • for each unit in the video decoding device provided by the embodiments of the present disclosure, reference may be made to the content of the above-mentioned video encoding method and video decoding method, and details are not described herein again.
  • the embodiment of the present disclosure provides a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the video encoding method as described in the above-mentioned embodiment is implemented.
  • the embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the video decoding method as described in the above-mentioned embodiment is implemented.
  • An embodiment of the present disclosure provides an electronic device, including: at least one processor; and a storage device configured to store at least one program, where, when the at least one program is executed by the at least one processor, the at least one processor implements the video encoding method described in the above embodiments.
  • An embodiment of the present disclosure provides an electronic device, including: at least one processor; and a storage device configured to store at least one program, where, when the at least one program is executed by the at least one processor, the at least one processor implements the video decoding method described in the above embodiments.
  • FIG. 9 shows a schematic structural diagram of an electronic device suitable for implementing the embodiments of the present disclosure.
  • the electronic device 900 includes a central processing unit (CPU) 901, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 902 or a program loaded from a storage part 908 into a random access memory (RAM) 903.
  • In the RAM 903, various programs and data required for system operation are also stored.
  • the CPU 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904.
  • An input/output (I/O) interface 905 is also connected to the bus 904.
  • the following components are connected to the I/O interface 905: an input part 906 including a keyboard, a mouse, etc.; an output part 907 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, etc.; a storage part 908 including a hard disk, etc.; and a communication part 909 including a network interface card such as a LAN (Local Area Network) card and a modem. The communication part 909 performs communication processing via a network such as the Internet.
  • the drive 910 is also connected to the I/O interface 905 as needed.
  • a removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 910 as needed, so that the computer program read from it is installed into the storage part 908 as needed.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable storage medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication part 909, and/or installed from the removable medium 911.
  • when the computer program is executed by the central processing unit (CPU) 901, various functions defined in the method and/or device of the present application are executed.
  • the computer-readable storage medium shown in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or a combination of any of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with at least one wire, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein.
  • This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium; the computer-readable signal medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable storage medium can be transmitted by any suitable medium, including but not limited to: wireless, wire, optical cable, RF (Radio Frequency), etc., or any suitable combination of the foregoing.
  • each block in the flowchart or block diagram may represent a module, program segment, or part of code, and the above-mentioned module, program segment, or part of code contains at least one executable instruction for realizing the specified logical function. It should also be noted that the functions marked in the blocks may occur in an order different from the order marked in the drawings. For example, two blocks shown one after the other can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram or flowchart, and combinations of blocks in the block diagram or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present disclosure may be implemented in software or hardware, and the described units may also be provided in a processor, where the names of these units do not, under certain circumstances, constitute a limitation on the units themselves.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be included in the electronic device described in the above embodiments; or it may exist alone without being assembled into the electronic device.
  • the above computer-readable storage medium carries one or more programs, and when the one or more programs are executed by an electronic device, the electronic device implements the methods described in the above embodiments. For example, the electronic device can implement the steps shown in FIG. 6 or FIG. 8.
  • the embodiments of the present application also provide a computer program product including instructions, which when run on a computer, cause the computer to execute the method provided in the above-mentioned embodiments.
  • the exemplary embodiments described herein can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, a touch terminal, a network device, etc.) execute the method according to the embodiments of the present disclosure.


Abstract

The present disclosure provides a video encoding method, a video decoding method, and related devices. The method includes: acquiring a current image, the current image including largest coding units, the largest coding units including a current largest coding unit and encoded largest coding units, the current largest coding unit including a current coding block, and the current coding block including a current string; storing the pixels in the current coding block in a first part of a storage space of size M*W, and storing the encoded largest coding units and at least some of the encoded blocks in the current largest coding unit in a second part of the storage space, where M is a positive integer greater than or equal to W; and searching for a reference string of the current string within the second part, so as to obtain a predicted value of the current string according to the reference string and encode the current string.

Description

Video encoding method, video decoding method, and related devices
This application claims priority to the Chinese patent application No. 202010487809.7, entitled "Video encoding method, video decoding method, and related devices", filed with the China National Intellectual Property Administration on June 2, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of video encoding and decoding, and in particular, to video encoding and decoding.
Background
On the one hand, the Internet is about to enter the new 5G (5th-generation mobile communication technology) era, and the images (video) appearing in various Internet applications have become the main consumer of Internet bandwidth. In particular, mobile Internet image traffic is growing day by day and will grow explosively in the 5G era, injecting a powerful new impetus into the accelerated development of image encoding and decoding technology. At the same time, it also poses many severe new challenges that image encoding and decoding technology has never encountered before. In the 5G era of the Internet of Everything, the new types of Internet images generated in various emerging applications are diverse and heterogeneous. Therefore, studying efficient image encoding and decoding technologies tailored to the characteristics of these diverse and heterogeneous new Internet images has become an urgent need.
On the other hand, the amount of video data required to depict even a relatively short film can be considerable, which may cause difficulties when the data is streamed or otherwise transmitted over a communication network with limited bandwidth capacity. Therefore, video data is usually compressed before being transmitted over modern telecommunication networks. Before transmission, video compression devices usually use software and/or hardware on the source side to encode the video data, thereby reducing the amount of data required to represent digital video images. The compressed data is then received at the destination by a video decompression device, which decodes the video data. Given limited network resources and the ever-increasing demand for higher video quality, improved compression and decompression techniques that increase image quality without increasing the bit rate are needed.
The string prediction scheme in the related art (which may also be referred to as an intra string copy technique or a string matching technique) has parts that are unfavorable for hardware implementation.
Therefore, a new video encoding method, video decoding method, electronic device, and computer-readable storage medium are needed.
It should be noted that the information disclosed in the above Background section is only used to enhance the understanding of the background of the present disclosure.
Summary
The embodiments of the present disclosure provide a video encoding method, a video decoding method, an electronic device, and a computer-readable storage medium, which can simplify the hardware implementation of string prediction.
Other characteristics and advantages of the present disclosure will become apparent through the following detailed description, or will be learned in part through the practice of the present disclosure.
In one aspect, the embodiments of the present disclosure provide a video encoding method, the method including: acquiring a current image, the current image including largest coding units, the largest coding units including a current largest coding unit and encoded largest coding units, the current largest coding unit including a current coding block, and the current coding block including a current string; storing the pixels in the current coding block in a first part of a storage space of size M*W, and storing the encoded largest coding units and at least some of the encoded blocks in the current largest coding unit in a second part of the storage space, where M is a positive integer greater than or equal to W; and searching for a reference string of the current string in the second part of the storage space, so as to obtain a predicted value of the current string according to the reference string and encode the current string.
In another aspect, the embodiments of the present disclosure provide a video decoding method, the method including: acquiring a code stream of a current image, the code stream including largest coding units, the largest coding units including a current largest coding unit and encoded largest coding units, the current largest coding unit including a current decoding block, and the current decoding block including a current string; storing the pixels in the current decoding block in a first part of a storage space of size M*W, and storing the encoded largest coding units and at least some of the decoded blocks in the current largest coding unit in a second part of the storage space, where M is a positive integer greater than or equal to W; and searching for a reference string of the current string in the second part of the storage space, so as to obtain a predicted value of the current string according to the reference string and decode the current string.
In another aspect, the embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the video encoding method or the video decoding method described in the above aspects is implemented.
In another aspect, the embodiments of the present disclosure provide an electronic device, including: at least one processor; and a storage device configured to store at least one program, where, when the at least one program is executed by the at least one processor, the at least one processor implements the video encoding method or the video decoding method described in the above aspects.
In yet another aspect, the embodiments of the present application provide a computer program product including instructions which, when run on a computer, cause the computer to execute the video encoding method or the video decoding method described in the above aspects.
In the technical solutions provided by some embodiments of the present disclosure, the pixels in the current coding block are stored in a first part of a storage space of size M*W, and the encoded largest coding units and at least some of the encoded blocks in the current largest coding unit are stored in a second part of the storage space, where M is a positive integer greater than or equal to W; and the search for the reference string of the current string is restricted to the second part of the storage space, so that the hardware implementation of string prediction can be simplified.
In the technical solutions provided by some embodiments of the present disclosure, by restricting the position of the reference string of a string in the string prediction technique, the reconstruction dependency between different strings can be removed; different strings can therefore be reconstructed in parallel, which simplifies the hardware implementation of string prediction and improves its implementation efficiency.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief Description of the Drawings
The drawings herein are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, and are used together with the specification to explain the principles of the present disclosure. Obviously, the drawings in the following description are only some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
FIG. 1 schematically shows a basic block diagram of video encoding in the related art;
FIG. 2 schematically shows a schematic diagram of inter prediction in the related art;
FIG. 3 schematically shows the positions of the spatial candidate MVPs of the Merge mode of inter prediction in the related art;
FIG. 4 schematically shows the positions of the temporal candidate MVPs of the Merge mode of inter prediction in the related art;
FIG. 5 schematically shows a schematic diagram of intra string copy in the related art;
FIG. 6 schematically shows a schematic diagram of a video encoding method according to an embodiment of the present disclosure;
FIG. 7 schematically shows a schematic diagram of the ISC reference block search range in the related art;
FIG. 8 schematically shows a schematic diagram of a video decoding method according to an embodiment of the present disclosure;
FIG. 9 shows a schematic structural diagram of an electronic device suitable for implementing the embodiments of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in various forms and should not be construed as being limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be more comprehensive and complete, and the concepts of the example embodiments will be fully conveyed to those skilled in the art.
In addition, the described features, structures, or characteristics may be combined in one or more embodiments in any suitable manner. In the following description, many specific details are provided to give a full understanding of the embodiments of the present disclosure. However, those skilled in the art will realize that the technical solutions of the present disclosure can be practiced without one or more of the specific details, or that other methods, components, devices, steps, etc. can be adopted. In other cases, well-known methods, devices, implementations, or operations are not shown or described in detail, in order to avoid obscuring aspects of the present disclosure.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically independent entities. That is, these functional entities may be implemented in software, or in at least one hardware module or integrated circuit, or in different networks and/or processor devices and/or microcontroller devices.
The flowcharts shown in the drawings are only exemplary illustrations; they do not necessarily include all contents and operations/steps, nor must they be executed in the described order. For example, some operations/steps may be decomposed, while others may be combined or partially combined, so the actual execution order may change according to the actual situation.
首先对本公开实施例中涉及的部分术语进行说明。
AVS:Audio Video Coding Standard,音视频编码标准。
HEVC:High Efficiency Video Coding,高效视频编码,也称之为H.265。
VVC:Versatile Video Coding,多功能视频编码,也称之为H.266。
Intra(picture)Prediction:帧内预测。
Inter(picture)Prediction:帧间预测。
SCC:screen content coding,屏幕内容/图像编码。
Loop Filtering:环路滤波。
QP:Quantization Parameter,量化参数。
LCU:Largest Coding Unit,最大编码单元。
CTU:Coding Tree Unit,编码树单元,一般由最大编码单元开始往下划分。
CU:Coding Unit,编码单元。
PU:Prediction Unit,预测单元。
MV:Motion Vector,运动矢量。
MVP:Motion Vector Prediction,运动矢量预测值。
MVD:Motion Vector Difference,MVP与MV的真正估值之间的差值。
AMVP:Advanced Motion Vector Prediction,高级运动矢量预测。
ME:Motion Estimation,运动估计,得到运动矢量MV的过程称为运动估计,是运动补偿(Motion Compensation,MC)中的关键技术。
MC:根据运动矢量和帧间预测方法,求得当前图像的估计值的过程。运动补偿是一种描述相邻帧(相邻在这里表示在编码关系上相邻,在播放顺序上两帧未必相邻)差别的方法,具体来说是描述相邻帧中前面一帧的每个小块怎样移动到相邻帧中当前帧的某个位置去。这种方法经常被视频压缩/视频编解码器用来减少视频序列中的时域冗余。相邻的帧通常很相似,也就是说,包含了很多冗余。使用运动补偿的目的是通过消除这种冗余,来提高压缩比。
I Slice:Intra Slice,帧内条带/片。可以把图像分成一帧(frame)或两场(field),而帧又可以分成一个或几个片(Slice)。
视频信号,从信号的获取方式看,可以包括摄像机拍摄到的以及计算机生成的两种方式。由于统计特性的不同,其对应的压缩编码方式也可能有所区别。
部分视频编码技术例如HEVC,VVC以及AVS,采用了混合编码框架,如图1所示,对输入的原始视频信号(input video)中的图像(pictures)按顺序编码,进行了如下一系列的操作和处理:
1)块划分结构(block partition structure):将输入图像划分成若干个不重叠的处理单元,对每个处理单元将进行类似的压缩操作。这个处理单元可以称之为CTU或者LCU。CTU或者LCU再往下,可以继续进行更加精细的划分,得到至少一个基本编码的单元,称之为CU。每个CU是一个编码环节中最基本的元素。以下描述的是对每一个CU可能采用的各种编码方式。
2)预测编码(Predictive Coding):包括了帧内预测和帧间预测等方式,原始视频信号经过选定的已重建视频信号的预测后,得到残差视频信号。编码端需要为当前CU决定在众多可能的预测编码模式中,选择最适合的一种,并告知解码端。
a.帧内预测:预测的信号来自于同一图像内已经编码重建过的区域。
其中,帧内预测的基本思想就是利用相邻像素的相关性去除空间冗余。在视频编码中,相邻像素指的就是当前CU周围的已编码CU的重建(reconstructed)像素。
b.帧间预测:预测的信号来自已经编码过的,且不同于当前图像的其他图像(称之为参考图像)。
3)变换编码及量化(Transform&Quantization):残差视频信号经过DFT(Discrete Fourier Transform,离散傅里叶变换),DCT(Discrete Cosine Transform,离散余弦变换)等变换操作,将残差视频信号转换到变换域中,称之为变换系数。在变换域中的残差视频信号,进一步的进行有损的量化操作,丢失掉一定的信息,使得量化后的信号有利于压缩表达。
在一些视频编码标准中,可能有多于一种变换方式可以选择,因此,编码端也需要为待编码的当前CU选择其中的一种变换,并告知解码端。
其中,量化的精细程度通常由量化参数(Quantization Parameters,QP)来决定,QP取值较大时,表示更大取值范围的变换系数将被量化为同一个输出,因此通常会带来更大的失真及较低的码率;相反,QP取值较小时,表示较小取值范围的变换系数将被量化为同一个输出,因此通常会带来较小的失真,同时对应较高的码率。
4)熵编码(Entropy Coding)或统计编码:量化后的变换域信号,将根据各个值出现的频率,进行统计压缩编码,最后输出二值化(0或者1)的压缩码流(bitstream)。
同时,编码产生其他信息,例如选择的编码模式(coding modes),运动矢量等,也需要进行熵编码以降低码率。
其中,统计编码是一种无损编码方式,可以有效的降低表达同样的信号所需要的码率。常见的统计编码方式有变长编码(Variable Length Coding,VLC)或者基于上下文的二值化算术编码(Content Adaptive Binary Arithmetic Coding,CABAC)。
5)环路滤波(Loop Filtering):已经编码过的图像,经过反量化,反变换及预测补偿的操作(上述步骤2)~4)的反向操作),可获得重建的解码图像(decoded picture)。重建的解码图像与原始的输入图像相比,由于存在量化的影响,部分信息与原始的输入图像有所不同,产生失真(Distortion)。对重建的解码图像进行滤波操作,例如去块效应滤波(deblocking),SAO(Sample Adaptive Offset,样点自适应补偿)或者ALF(Adaptive Loop Filter,自适应环路滤波)等滤波器,可以有效地降低量化所产生的失真程度。由于这些经过滤波后的重建的解码图像,将作为后续编码图像的参考,用于对将来的信号进行预测,所以上述的滤波操作也称为环路滤波,即在编码环路内的滤波操作。
图1展示了一个视频编码器的基本流程图。图1中以第k个CU(标记为s_k[x,y])为例进行举例说明。其中,k为大于或等于1且小于或等于输入的当前图像中的CU的数量的正整数,s_k[x,y]表示第k个CU中坐标为[x,y]的像素点,x表示像素点的横坐标,y表示像素点的纵坐标。s_k[x,y]经过运动补偿或者帧内预测等中的一种较优处理后获得预测信号ŝ_k[x,y]。s_k[x,y]与ŝ_k[x,y]相减得到残差信号u_k[x,y],然后对该残差信号u_k[x,y]进行变换和量化,量化输出的数据有两个不同的去处:一个是送给熵编码器进行熵编码,编码后的码流输出到一个缓冲器(buffer)中保存,等待传送出去;另一个是进行反量化和反变换后,得到信号u'_k[x,y]。将信号u'_k[x,y]与ŝ_k[x,y]相加得到新的预测信号s*_k[x,y],并将s*_k[x,y]送至当前图像的缓冲器中保存。s*_k[x,y]经过帧内-图像预测获得f(s*_k[x,y])。s*_k[x,y]经过环路滤波后获得s'_k[x,y],并将s'_k[x,y]送至解码图像缓冲器中保存,以用于生成重建视频。s'_k[x,y]经过运动-补偿预测后获得s'_r[x+m_x,y+m_y],s'_r[x+m_x,y+m_y]表示参考块,m_x和m_y分别表示运动矢量的水平和竖直分量。
一些视频编码标准,如HEVC,VVC,AVS3,均采用基于块的混合编码框架。它们将原始的视频数据分成一系列的编码块,结合预测、变换和熵编码等视频编码方法,实现视频数据的压缩。其中,运动补偿是视频编码常用的一类预测方法,运动补偿基于视频内容在时域或空域的冗余特性,从已编码的区域导出当前编码块的预测值。这类预测方法包括:帧间预测、帧内块复制预测、帧内串复制预测等。在具体的编码实现中,可能单独或组合使用这些预测方法。对于使用了这些预测方法的编码块,通常需要在码流中显式或隐式的编码至少一个二维的位移矢量,指示当前块(或当前块的同位块)相对它的至少一个参考块的位移。
其中,在不同的预测模式及不同的实现下,位移矢量可能有不同的名称,本公开实施例中统一按照以下方式进行描述:1)帧间预测中的位移矢量称为运动矢量(MV);2)帧内块复制中的位移矢量称为块矢量或者块位移矢量;3)帧内串复制中的位移矢量称为串矢量(String Vector,简称SV)。
以下先对帧间预测中相关的技术进行介绍。
图2示意性示出了相关技术中的帧间预测的示意图。
如图2所示,帧间预测利用视频时间域的相关性,使用邻近已编码图像像素预测当前图像的像素,以达到有效去除视频时域冗余的目的,能够有效节省编码残差数据的比特。其中,P为当前帧,Pr为参考帧,B为当前待编码块,Br是B的参考块。B’与B在图像中的坐标位置相同。
假设Br坐标为(xr,yr),B’坐标为(x,y)。当前待编码块B与其参考块Br之间的位移,称为运动向量(MV),即:
MV=(xr-x,yr-y)      (1)
考虑到时域或空域邻近块具有较强的相关性,可以采用MV预测技术进一步减少编码MV所需要的比特。在H.265/HEVC中,帧间预测包含Merge和AMVP两种MV预测技术。
其中,Merge模式会为当前PU建立一个MV候选列表,其中存在5个候选MV(及其对应的参考图像)。遍历这5个候选MV,选取率失真代价最小的作为最优MV。若编解码器依照相同的方式建立MV候选列表,则编码器只需要传输最优MV在MV候选列表中的索引即可。
需要注意的是,HEVC的MV预测技术还有一种skip(跳过)模式,是merge模式的一种特例。在merge模式找到最优MV后,如果当前待编码块和参考块基本一样,那么不需要传输残差数据,只需要传送MV的索引和一个skip flag(指示编码是否是skip模式的标志)。
其中,Merge模式建立的MV候选列表中包含了空域和时域两种情形。
其中,空域最多提供4个候选MV,它的建立如图3所示。当前块空域上的MV候选列表按照A1->B1->B0->A0->B2的顺序建立,其中B2为替补,即当A1,B1,B0,A0中有至少一个不存在时,则需要使用B2的运动信息。即MV候选列表的大小为5,但HEVC中至多使用其中4个(即使五个都存在),并且当其中某一个不可用时,顺序用下一个。
其中,与空域类比,空域使用相邻块的MV,时域则使用相邻帧的同位(对应位置)PU的MV来推测当前位置的MV。时域最多只提供1个候选MV,它的建立如图4所示。帧间预测的图像都要有参考图像,例如B帧参考它之前的图像。当前图像叫cur,当前图像的参考图像叫cur_ref,当前图像的同位图像叫col,同位图像的参考图像叫col_ref。假设同位图像col与其参考图像col_ref之间的距离为tb,当前图像cur与其参考图像cur_ref之间的距离为td,具体实现中,tb可以为同位图像与其参考图像的序号(picture order count,POC)的差值,td可以为当前图像与其参考图像的序号(picture order count,POC)的差值,当前PU的MV可以由同位PU的MV按下式伸缩得到:
curMV=td*colMV/tb     (2)
其中,curMV和colMV分别表示当前PU和同位PU的MV,这样就能由col(同位)图像推导出当前图像的MV。若同位块上D0位置PU不可用,则用D1位置的同位PU进行替换。
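公式(2)的时域MV伸缩可以用如下Python草图示意(示意性的非规范实现;函数名为本示例自拟,MV以整数元组表示,实际标准中还包含定点乘法、截断与舍入处理,此处从略):

```python
def scale_temporal_mv(col_mv, td, tb):
    """按公式(2) curMV = td * colMV / tb 对同位PU的MV进行伸缩。

    col_mv: 同位PU的运动矢量 (x, y);
    td: 当前图像cur与其参考图像cur_ref的POC差值;
    tb: 同位图像col与其参考图像col_ref的POC差值。
    这里用整数除法近似,实际标准采用带舍入的定点运算。
    """
    return (td * col_mv[0] // tb, td * col_mv[1] // tb)
```

例如,同位PU的MV为(8, -4)、td=1、tb=2时,伸缩得到当前PU的MV为(4, -2)。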
其中,Merge模式建立的MV候选列表中包含了上述空域和时域两种情形,对于B Slice,还包含组合列表的方式。对于B帧,即既需要参考前向帧MV又需要参考后向帧MV的帧,它有两个MV列表:list0和list1。因此,对于B Slice中的PU,由于存在两个MV,其MV候选列表也需要提供两个MVP。HEVC通过将MV候选列表中的前4个候选MV进行两两组合,产生了用于B Slice的组合列表。
上文中提到的merge模式下直接使用MVP作为MV。AMVP模式,类似merge模式,利用空域和时域邻近块的MV相关性,先为当前PU建立候选预测MV列表。AMVP可以利用空间、时间上运动向量的相关性,分别建立空域候选列表以及时域候选列表,再从候选列表中选取最终的MVP。与Merge模式不同,AMVP模式下从候选预测MV列表中选择最优的预测MV即MVP,与当前待编码块通过运动搜索得到的最优MV(即真正的MV)进行差分编码,即编码MVD=MV-MVP。解码端通过建立相同的列表,仅需要MVD与MVP在该列表中的序号,即可计算当前解码块的MV。AMVP候选预测MV列表也包含空域和时域两种情形,不同的是AMVP列表长度仅为2。
如上所述,在HEVC的AMVP模式中,需要对MVD进行编码。在HEVC中,MVD的分辨率由slice_header(片头或者条带头或者切片数据头)中的use_integer_mv_flag控制,当该标志的值为0,MVD以1/4(亮度)像素分辨率进行编码;当该标志的值为1,MVD采用整(亮度)像素分辨率进行编码。
VVC中使用了一种自适应运动矢量精度(Adaptive motion vector resolution,简称AMVR)的方法。该方法允许每个CU自适应地选择编码MVD的分辨率。在普通的AMVP模式中,可选的分辨率包括1/4,1/2,1和4像素。对于具有至少一个非零MVD分量的CU,首先编码一个标志指示是否将四分之一亮度采样MVD精度用于CU。如果该标志为0,则当前CU的MVD采用1/4像素分辨率进行编码;否则,需要编码第二个标志,以指示CU使用1/2像素分辨率还是其他MVD分辨率;若并非1/2像素分辨率,则再编码第三个标志,以指示CU使用1像素分辨率还是4像素分辨率。在Affine AMVP模式中,可选的分辨率包括1/16像素,1/4(亮度)像素,1像素。
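上述AMVR标志级联的推导过程可用如下Python草图示意(示意性实现;各标志的具体取值语义为本示例的假设,并非规范定义):

```python
def parse_amvr_resolution(flags):
    """根据上文描述的AMVR标志级联,推导普通AMVP模式下MVD的编码分辨率。

    flags 为按编码顺序排列的标志列表(0或1),标志语义为本示例的假设:
    第一个标志为0表示1/4像素;否则第二个标志为1表示1/2像素,
    为0时再由第三个标志在1像素与4像素之间选择。
    返回以像素为单位的分辨率。
    """
    if flags[0] == 0:
        return 0.25
    if flags[1] == 1:
        return 0.5
    return 1 if flags[2] == 0 else 4
```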
屏幕图像即由电脑、手机、电视等电子设备产生的图像,主要包含两类内容:一类是计算机生成的非连续色调的内容,包含大量小而尖的线条形状,如文字、图标、按钮和网格等;另一类是摄像机拍摄的包含大量连续色调的内容,例如电影、电视片段、自然图像视频等。相关技术中的基于块的混合编码方式的视频编码标准例如AVS、HEVC,对于包含大量连续内容的自然图像和视频有很高的压缩比,但是对于包含非连续色调内容的屏幕图像压缩效果并不好。
伴随着云计算、移动通信技术和无线显示技术的快速发展,如何在低码率下使屏幕图像在各类电子终端设备上高质量显示,是SCC需要解决的问题。为提高屏幕图像编码性能,开发HEVC标准的SCC版本,并已经采用了一些有利于屏幕图像编码的工具,例如ISC(Intra String Copy,帧内串复制技术/串预测技术/串匹配技术)。
ISC较好地提高了屏幕图像编码效果,其将二维图像逐个编码单元CU一维化。ISC按照某种扫描顺序(光栅扫描、往返扫描和Zig-Zag扫描等)将一个编码块分成一系列像素串或未匹配像素。每个串在当前图像的已编码区域中寻找相同形状的参考串,导出当前串的预测值,通过编码当前串的像素值与预测值之间的残差,代替直接编码像素值,能够有效节省比特。
图5给出了帧内串复制的示意图,深色的区域为已编码区域,白色的28个像素为串1,浅色的35个像素为串2,黑色的1个像素表示未匹配像素。如果一个像素在可参考的区域中没有找到对应的参考,即称之为未匹配像素,也称为孤立点,未匹配像素的像素值被直接编码,而不是通过参考串的预测值导出。
ISC技术需要编码当前编码块中各个串对应的串矢量(String Vector,SV)、串长度以及是否有匹配的参考串的标志等。其中,串矢量(SV)表示待编码串(当前串,即当前待编码串)到其参考串的位移。串长度表示该当前串所包含的像素数量。在不同的实现方式中,串长度的编码有多种方式,以下给出几种示例(部分示例可能组合使用):1)直接在码流中编码串的串长度;2)在码流中编码处理该当前串后待处理像素数量,解码端则根据当前编码块的大小P,已处理的像素数量P1,解码得到不包括该当前串后的待处理像素数量P2,计算得到当前串的串长度L,L=P-P1-P2,其中,L、P均为大于0的整数,P1和P2均为大于或等于0的整数;3)在码流中编码一个标志指示该串是否为最后一个串,如果是最后一个串,则根据当前块的大小P,已处理的像素数量P1,计算得到当前串的串长度L=P-P1。如果一个像素在可参考的区域中没有找到对应的参考,则作为未匹配像素,将直接对未匹配像素的像素值进行编码。
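上述第2)、3)种串长度的编码方式中,解码端推导串长度的计算可用如下Python草图示意(示意性的非规范实现,函数名为本示例自拟;P、P1、P2、L的含义同上文):

```python
def string_length_from_remaining(P, P1, P2):
    """方式2):由当前编码块的像素总数P、已处理像素数量P1与
    本串之后的待处理像素数量P2,推导当前串的串长度 L = P - P1 - P2。"""
    L = P - P1 - P2
    assert L > 0, "串长度必须为大于0的整数"
    return L


def last_string_length(P, P1):
    """方式3):当标志指示当前串为最后一个串时,串长度为 L = P - P1。"""
    return P - P1
```

例如,当前块共64个像素、已处理28个、本串之后还剩20个待处理时,当前串的串长度为16。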
以下给出当前方案中ISC的解码流程:
(ISC解码流程的语法表在原文中以图像形式给出,此处仅保留文字说明。)
上述解码过程中,带“_”的字段表示需要解码的语法元素,无“_”且首字母大写的字段表示变量,变量的值可由语法元素解码得到,上述流程中省略了一些本公开实施例无关的细节。
目前的ISC方案存在不利于硬件实现的部分。下面通过本公开实施例提供的方案,来解决上述至少一个技术问题。
本公开实施例提供的方法可以应用到使用了ISC的视频编解码器或视频压缩的产品中,可以适用于有损数据压缩的编码和解码,也可以适用于无损数据压缩的编码和解码。其中,编码和解码过程中所涉及到的数据是指以下例举之一或者其组合:
1)一维数据;
2)二维数据;
3)多维数据;
4)图形;
5)图像;
6)图像的序列;
7)视频;
8)三维场景;
9)持续变化的三维场景的序列;
10)虚拟现实的场景;
11)持续变化的虚拟现实的场景的序列;
12)像素形式的图像;
13)图像的变换域数据;
14)二维或二维以上字节的集合;
15)二维或二维以上比特的集合;
16)像素的集合;
17)三分量像素(Y,U,V)的集合;
18)三分量像素(Y,Cb,Cr)的集合;
19)三分量像素(Y,Cg,Co)的集合;
20)三分量像素(R,G,B)的集合;
21)四分量像素(C,M,Y,K)的集合;
22)四分量像素(R,G,B,A)的集合;
23)四分量像素(Y,U,V,A)的集合;
24)四分量像素(Y,Cb,Cr,A)的集合;
25)四分量像素(Y,Cg,Co,A)的集合。
当数据为上述列举出的图像、或者图像的序列、或者视频时,编码块是图像的一个编码区域,应当至少包括以下一种:一组图像、预定数目的若干幅图像、一幅图像、一帧图像、一场图像、图像的子图像、条带、宏块、最大编码单元LCU、编码树单元CTU、编码单元CU。
图6示意性示出了根据本公开的一实施例的视频编码方法的示意图。需要说明的是,本公开实施例中涉及到的方法,可以单独使用,或者组合起来一起使用。图6实施例以编码端为例进行介绍。其中,编码端与解码端是对应的,编码端进行一系列分析决定各语法元素的值,对于分析过程,本公开不做限定。这里所述的编码端可以是部署有视频编码器的视频压缩设备,该视频压缩设备包括终端设备或服务器等具有实施视频编码器功能的设备。
如图6所示,本公开实施例提供的方法可以包括以下步骤。
在S610中,获取当前图像,所述当前图像包括最大编码单元,所述最大编码单元包括当前最大编码单元和已编码最大编码单元,所述当前最大编码单元包括当前编码块,所述当前编码块包括当前串。
本公开实施例,编码端的编码器接收原始视频信号,按顺序对原始视频信号中的图像进行编码,这里将当前待编码的图像称之为当前图像,其可以是原始视频信号中的任意一帧图像。在编码端,可以对当前图像进行块划分,例如划分为若干个不重叠的CTU或者LCU。CTU可以继续进行更加精细的划分,得到至少一个CU,这里将当前待编码的当前CU称之为当前编码块,但本公开并不限定于此,例如还可以是PU或者TU。在下面的举例说明中,均以CU为例进行举例说明。当前CU所对应的CTU称之为当前CTU,处于所述当前图像的已编码区域且不属于所述当前CTU的CTU称之为已编码CTU。
本公开实施例中,对当前CU中的像素采用ISC进行编码,按照某种扫描顺序将当前CU中的像素分成串或者未匹配像素。在当前图像的已编码区域中为每个串寻找相同形状的参考串,当前待搜索参考串的串称之为当前串。
在S620中,利用M*W大小的存储空间的第一部分存储所述当前编码块中的像素,并利用所述存储空间的第二部分存储所述已编码最大编码单元及所述当前最大编码单元中的至少部分已编码块,M为大于或等于W的正整数。
本公开实施例中,M和W均为大于或等于1的正整数,且M可以大于或等于W。例如,可以设置M=W=128。再例如,可以设置M=256,W=128。在下面的举例说明中,均以M=W=128为例进行举例说明,但本公开并不限定于此。
本公开实施例中,为了便于硬件实现,假设一个CTU的大小小于或等于64*64,ISC被限制为仅使用一个128*128(即M=128)大小的内存。128*128内存中的一个64*64大小的第一部分用于存储当前待编码的64*64大小的当前CTU中的像素,128*128内存中的3个64*64大小的第二部分用于存储3个64*64大小的已编码CTU中的已编码像素,这里将已编码CTU称之为已编码块。因此,ISC仅能在这3个64*64大小的已编码CTU中搜索参考串。
在S630中,在所述存储空间的第二部分内搜索所述当前串的参考串,以根据所述参考串获得所述当前串的预测值,对所述当前串进行编码。
在当前图像的已编码区域中搜索所述当前串的参考串,获得当前串及其参考串之间的SV,用SV和/或串长度表示对应参考串通过预定运算获得当前串中的像素的预测值,通过编码所述当前串的像素值及其预测值的残差,可以减少比特数,提高编码效率。对原始视频信号中的每帧图像进行类似的处理,最终可以产生码流,可以传输至解码端的解码器中。
本公开实施方式提供的视频编码方法,通过利用M*W大小的存储空间的第一部分存储所述当前编码块中的像素,并利用所述存储空间的第二部分存储所述已编码最大编码单元及所述当前最大编码单元中的至少部分已编码块,M为大于或等于W的正整数;并限制在所述存储空间的第二部分内搜索所述当前串的参考串,从而能够简化串预测的硬件实现。
目前的ISC方案存在不利于硬件实现的部分,如存在以下情况:参考串的位置与当前待重建CU重叠,造成使串的重建存在依赖。例如,假设将一个CU分为两个串,分别称之为串1和串2,且串2参考了串1。这种情况下,串2需要等待串1重建完成后,才能开始重建。
本公开实施例中,可以设置所述参考串满足以下条件:所述参考串在所述当前最大编码单元和N个已编码最大编码单元范围内,所述N个已编码最大编码单元与所述当前最大编码单元的目标侧相邻,N为大于或等于1的正整数;当所述参考串中的像素处于所述N个已编码最大编码单元时,则所述参考串的像素往预定方向移动预定像素后对应的目标区域的像素尚未编码;所述参考串中的像素位于所述当前图像的独立编码区域的边界内;所述参考串中的像素不与所述当前图像的未编码块重叠。
在示例性实施例中,可以根据所述最大编码单元的尺寸确定N的大小。
在示例性实施例中,若所述最大编码单元的大小为M*M,则所述预定像素的数量为M,所述目标区域为所述参考串的像素往所述预定方向移动M个像素后对应的(M/2)*(M/2)区域。
在示例性实施例中,若所述最大编码单元的大小为K*K,K为小于M的正整数,则所述预定像素的数量为N*K,所述目标区域为所述参考串的像素往所述预定方向移动N*K个像素后对应的最大编码单元。
在示例性实施例中,所述目标区域中的最小坐标不等于所述当前编码块中的最小坐标。本公开实施例中,目标区域可以包含至少一个CU,当目标区域包括多个CU时,当前编码块可以在目标区域的第一个CU。
在示例性实施例中,所述未编码块包括所述当前编码块,所述参考串中的像素不与所述当前编码块中的像素重叠。
在示例性实施例中,所述参考串中的像素的横坐标小于所述当前编码块中的像素的横坐标;或者,所述参考串中的像素的纵坐标小于所述当前编码块中的像素的纵坐标。
在一些实施例中,当前图像的未编码块可以包括当前待编码的当前CU和其它尚未编码的其它CU,即不允许参考串与当前CU中的已编码像素重叠,这样可以实现串与串之间无依赖,从而可以降低编码的复杂度,可以实现并行编码。
在示例性实施例中,所述未编码块不包括所述当前编码块,允许所述参考串中的像素与所述当前编码块中的已编码像素重叠,且所述参考串中的像素与所述当前编码块中的未编码像素不重叠。
在一些实施例中,当前图像的未编码块不包括当前待编码的当前CU,即允许参考串与当前CU中的已编码像素重叠,这种情况称之为串间依赖,根据扫描顺序处于后面的串需要等待前面的串编码完成后,才能进行编码,但其相比于串内依赖而言,串间依赖的复杂度更小;同时,由于越邻近像素与当前CU中的待编码的当前像素相关性越大,使用邻近像素做参考能够取得更好的预测效果,因此,串间依赖的性能高于无依赖的情况。其中,串内依赖是指参考串的位置与当前CU重叠,且与当前待编码的当前串的位置重叠,这种情况下串只能按照扫描顺序逐像素编码。
在示例性实施例中,允许所述参考串中的像素与所述当前编码块中的已编码像素重叠,且所述参考串中的像素不与所述当前编码块中包含未编码像素的行中的已编码像素重叠。
在一些实施例中,允许所述参考串中的像素与所述当前编码块中的已编码像素重叠,但不允许参考串中的像素与当前编码块中包含未编码像素的一行重叠(注意是当前编码块的一行,不是当前图像的一行)。如上文所述,在硬件实现时,如果串与串之间完全无参考,则可以并行的编码。对于串间参考,由于存在参考无法完全并行,需要等待被参考的串编码完成后,当前串才能开始编码。增加了该限制条件后,参考串与当前串不在当前编码块的同一行,则编码时可以一行一行的编码,而不用等待。
在示例性实施例中,所述当前图像的独立编码区域包括所述当前图像或者所述当前图像中的片、条带。
在示例性实施例中,若所述最大编码单元的大小为M*M,M为大于或等于1的正整数,则所述参考串中的像素来自同一个(M/2)*(M/2)对齐区域。
在示例性实施例中,若所述最大编码单元的大小不为M*M,M为大于或等于1的正整数,则所述参考串中的像素来自同一个最大编码单元。
在示例性实施例中,所述参考串的外接矩形不与所述当前图像的未编码区域重叠。
下面通过具体的实例对上述实施例提供的方案,举例说明在编码端如何限制参考串的位置:
为了便于硬件实现,在编码端,ISC仅使用1个CTU大小的内存,例如假设1个CTU的大小为128*128个样本(在视频编码标准中,“样本”可以用于表述“像素”,可以包括128*128个亮度样本以及对应的色度样本),则ISC被限制为仅使用一个128*128大小的内存。该大小为128*128的内存中,1个64*64大小的空间用于存储当前待编码的64*64的当前CU(图7中标记有Curr的64*64的CU,其对应128*128的当前CTU)中的未编码像素,还有3个64*64大小的空间可用于存储当前图像的已编码区域的3个已编码CU中的已编码像素。因此,ISC仅能在这3个64*64大小的已编码CU中搜索当前CU的当前串的参考串,应满足以下条件:
1)串矢量指向的参考串的像素不应包含当前CU的像素。
例如,假设参考串中的像素的坐标为(xRef_i,yRef_i),xRef_i和yRef_i均为大于或等于0的整数,其中i=0,1,2,…,L-1,L为串长度,L为大于1的正整数,当前CU的左上角位置为(xCb,yCb),xCb和yCb均为大于或等于0的整数,则参考串的像素的坐标应满足条件(xRef_i<xCb||yRef_i<yCb)为真,其中,“||”是“逻辑或”,即在该所参考的坐标系中,如图7所示,参考串位于当前CU的左侧或者上面。
需要说明的是,上述参考串的像素的坐标应满足条件(xRef_i<xCb||yRef_i<yCb)为真,是在编码顺序为从左到右、从上到下的情况下限制的,若编码器/标准按其他顺序进行编码,则可以相应的调整该条件,本公开对此不做限定。类似的,在下文中,均是以编码顺序为从左到右、从上到下的情况下进行举例说明的,因此,左上角为最小坐标或者最小坐标对应的像素,但本公开实施例提供的方案也可以适用于其它编码顺序,对此不做限定。
2)串矢量指向的参考串限制在当前CTU和当前CTU的左边(这里假设所参考的坐标系中,目标侧为左边)的N个CTU(属于已编码CTU)的范围内,N的大小由最大编码单元的尺寸决定,例如可以根据以下公式确定N:
N=(1<<((7-(log2_lcu_size_minus2+2))<<1))-(((log2_lcu_size_minus2+2)<7)?1:0)       (3)
上述公式(3)中,记最大编码单元的宽或高为lcu_size,lcu_size为大于或等于1的正整数,则log2_lcu_size_minus2=log2(lcu_size)-2。“<<”运算符表示左移,用来将一个数的各二进制位全部左移K(K为大于或等于1的正整数)位,高位舍弃,低位补0。(((log2_lcu_size_minus2+2)<7)?1:0)是一个三目运算符,先判断((log2_lcu_size_minus2+2)<7)是否成立,若成立,则(((log2_lcu_size_minus2+2)<7)?1:0)=1;若不成立,则(((log2_lcu_size_minus2+2)<7)?1:0)=0。
例如,若LCU大小为128*128,则lcu_size=128,log2(128)=7,log2_lcu_size_minus2=5,N=(1<<(0<<1))-0=1。如图7所示,为了减少编码端内存和计算复杂度,便于硬件实现,ISC允许只在当前CTU及其左边相邻的左侧CTU中搜索当前CU中的当前串的参考串,每个小方块代表一个64*64的区域。
再例如,若LCU大小等于64*64,则lcu_size=64,log2(64)=6,log2_lcu_size_minus2=4,N=(1<<(1<<1))-1=3,N的值相当于把128*128的内存分成4部分,其中一部分存储当前LCU的像素,其他部分用于存储同一行中当前LCU左边N个LCU的像素。同理适用于更小的LCU。
或者,也可以根据以下公式确定N:
N=(1<<((7-log2_lcu_size)<<1))-(((log2_lcu_size)<7)?1:0)      (4)
上述公式(4)中,log2_lcu_size=log2(lcu_size)。先判断((log2_lcu_size)<7)是否成立,若成立,则(((log2_lcu_size)<7)?1:0)=1;若不成立,则(((log2_lcu_size)<7)?1:0)=0。
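公式(3)与公式(4)给出的N值可以用如下Python草图验证(示意性实现;假设lcu_size为2的幂,函数名为本示例自拟):

```python
def num_left_ctus(lcu_size):
    """按公式(3)计算参考串可用的、位于当前CTU左侧的已编码CTU个数N。

    lcu_size 为最大编码单元的宽(或高),要求为2的幂。
    """
    log2_lcu_size_minus2 = lcu_size.bit_length() - 1 - 2  # log2(lcu_size) - 2
    n = 1 << ((7 - (log2_lcu_size_minus2 + 2)) << 1)
    if (log2_lcu_size_minus2 + 2) < 7:  # 三目运算符 (...<7)?1:0
        n -= 1
    return n
```

例如,lcu_size为128、64、32时,N分别为1、3、15,与上文举例一致。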
3)当串矢量指向的参考串中的像素落在当前CTU左边的相邻的最大编码单元(属于已编码CTU),且最大编码单元的尺寸为128*128时,应符合以下限制条件:
3.1)参考串中的像素右(预定方向)移128像素(预定像素)后的位置所在的64*64区域(目标区域)的左上角尚未编码。
3.2)参考串中的像素右移128像素后的位置所在的64*64区域的左上角坐标不应与当前CU的左上角坐标位置相同。
例如,假设参考串中的像素的亮度分量位置为(xRefTL,yRefTL),xRefTL和yRefTL均为大于或等于0的整数,且(((xRefTL+128)/64)*64,(yRefTL/64)*64)不可得,即用于存储3个64*64大小的已编码CU的已编码像素的内存中无法找到这个已编码的像素,则(((xRefTL+128)/64)*64,(yRefTL/64)*64)不应等于当前CU左上角位置(xCb,yCb)。这里的除法是向下取整的。
这里是考虑到硬件设计流水线的处理能力,相关技术中共识的主流处理单元能够处理64*64大小的图像区域,因此,标准制定中一些编码元素也是以64*64的处理能力为上限,例如变换单元的最大值,等等。
如图7所示,根据当前CU在当前CTU中的位置可以分为4种情况:
如图7左上角的图所示,如果当前CU是当前CTU左上角的64*64的块,左侧CTU的右下角64*64的块、左下角64*64的块、右上角64*64的块可以作为当前CU的参考。
如图7右上角的图所示,如果当前CU是当前CTU右上角的64*64的块,除了当前CTU已编码部分,如果相对于当前CTU的(0,64)位置还未编码,当前CU也能参考左侧CTU的右下角64*64的块、左下角64*64的块。
如图7左下角的图所示,如果当前CU是当前CTU左下角的64*64的块,除了当前CTU已编码部分,当前CU也能参考左侧CTU的右下角64*64的块。
如图7右下角的图所示,如果当前CU是当前CTU右下角的64*64的块,它只能参考当前CTU已编码部分。
4)当串矢量指向的参考串中的像素落在当前CTU左边相邻的最大编码单元(属于已编码CTU),且最大编码单元的尺寸小于或等于64*64时,应符合以下限制条件:
4.1)参考串中的像素右移N*lcu_size像素后的位置所在的CTU区域的左上角尚未编码。
4.2)参考串中的像素右移N*lcu_size像素后的位置所在的CTU区域的左上角不应与当前CU的左上角坐标相同。
例如,假设参考串中的像素的亮度分量位置为(xRefTL,yRefTL),(((xRefTL+lcu_size*N)/lcu_size)*lcu_size,(yRefTL/lcu_size)*lcu_size)不可得,则(((xRefTL+lcu_size*N)/lcu_size)*lcu_size,(yRefTL/lcu_size)*lcu_size)不应等于当前块左上角位置(xCb,yCb)。
上述步骤3)给出了最大编码单元为128*128时的限制,上述步骤4)给出最大编码单元的尺寸小于或等于64*64时的限制,使得在编码过程中能够充分利用128*128的内存。
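步骤3)与步骤4)中"参考像素右移后所在区域的左上角"的计算可用如下Python草图示意(示意性实现;整除为向下取整,与上文一致,函数名为本示例自拟):

```python
def target_region_top_left(x_ref_tl, y_ref_tl, lcu_size, n):
    """计算参考像素右移后所落区域的左上角坐标。

    当 lcu_size == 128 时,按步骤3)右移128像素并对齐到64*64区域;
    当 lcu_size <= 64 时,按步骤4)右移 N*lcu_size 像素并对齐到CTU区域。
    返回该区域左上角坐标,用于与当前CU左上角(xCb, yCb)比较。
    """
    if lcu_size == 128:
        grid, shift = 64, 128
    else:
        grid, shift = lcu_size, n * lcu_size
    # 整除为向下取整,与上文"这里的除法是向下取整的"一致
    return (((x_ref_tl + shift) // grid) * grid, (y_ref_tl // grid) * grid)
```

例如,128*128的CTU下,参考像素(10, 70)右移128像素后落在左上角为(128, 64)的64*64区域。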
需要说明的是,上述举例说明中均以内存大小为128*128为例的,编码时LCU的大小可以通过参数设计。但对硬件设计来说,如果已经设计了128*128的存储器,当LCU小于128*128时,应该充分利用该存储器。
5)对于128*128大小的CTU,串矢量指向的参考串中所有的像素只能来自同一个64*64对齐区域,即参考串中所有样本位置,要求局限在同一个64*64对齐的参考像素区域内。以图7右下角的图为例,左侧128*128的CTU分为了4个64*64的CU,参考串中所有的像素不能跨过64*64的CU的边界。
对于非128*128大小的CTU,参考串中所有的像素将来自同一个CTU,即参考串不能跨过CTU的边界。这种限制降低了内存访问次数,编码端进行ISC预测时,需要访问的64*64大小的内存空间的个数只需要1个。
6)串矢量指向的参考串位置不应超出图像、片、条带等独立编码区域的边界。其中,片是AVS3中的概念,片是图像中的矩形区域,包含若干最大编码单元在图像内的部分,片之间不应重叠。条带是HEVC中的概念。
7)串矢量指向的参考串位置中的任何一个参考串样本,不应与未编码区域或当前正在编码的编码块区域(即当前CU)重叠。
8)可选的,串矢量指向的参考串位置中的任何一个参考串样本的外接矩形,不应与未编码区域或当前正在编码的编码块区域重叠。这是一种简化的方式,可通过参考串的外接矩形的四个角点来判断参考串的位置是否满足限制。外接矩形与未编码区域或当前正在编码的编码块区域不重叠,则表示参考串也满足与未编码区域或当前正在编码的编码块区域不重叠的限制。
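上述条件1)、5)与8)所涉及的位置检查可用如下Python草图示意(示意性的非规范实现,仅覆盖部分限制条件;坐标系与编码顺序沿用上文从左到右、从上到下的假设,函数名为本示例自拟):

```python
def ref_string_valid(ref_pixels, cu_x, cu_y, lcu_size=128):
    """对参考串位置做示意性检查。

    ref_pixels: 参考串各像素坐标 (x, y) 的列表;
    (cu_x, cu_y): 当前CU左上角坐标。
    检查:条件1)每个参考像素位于当前CU左侧或上方,
    即 (xRef_i < xCb || yRef_i < yCb);
    条件5)当CTU为128*128时,所有参考像素落在同一个64*64对齐区域,
    否则落在同一个CTU内。
    """
    for (x, y) in ref_pixels:
        if not (x < cu_x or y < cu_y):
            return False
    grid = 64 if lcu_size == 128 else lcu_size
    regions = {(x // grid, y // grid) for (x, y) in ref_pixels}
    return len(regions) == 1


def bounding_rect(ref_pixels):
    """条件8)的简化判断所用的外接矩形:返回 (x_min, y_min, x_max, y_max),
    只需检查其四个角点即可判断参考串是否与未编码区域重叠。"""
    xs = [x for (x, _) in ref_pixels]
    ys = [y for (_, y) in ref_pixels]
    return (min(xs), min(ys), max(xs), max(ys))
```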
本公开实施例提出的方案,对ISC方案进行了一系列简化,包括参考串位置的限制,这些方法简化了ISC的硬件实现。一方面,限定了参考串位置后,串与串之间不存在依赖,串可以并行的编码。另一方面,还可以限制参考串仅在一个128*128大小的内存区域中使用。
目前的ISC方案还存在其它不利于硬件实现的部分,如仅在编码端限制了串的数量与孤立点(未匹配像素)的数量之和不大于CU像素数量的四分之一。这样会导致串的数量较多,由此导致需要编码的语法元素较多。
下面通过具体的实例来举例说明如何在编码端对串数量及未匹配像素的数量进行限制:
设当前CU中已编码串数量为N1,未匹配像素数量为N2,N1和N2均为大于或等于0的整数,以下的方式可以单独或以任何形式组合使用:
A)限制N1+N2小于或等于第一数量阈值T1。其中,T1的取值范围可以为[1,W*H]中的整数,W为当前CU的宽度,H为当前CU的高度,W和H均为大于或等于1的正整数。在本实施例中,为了能够避免编码块被分割得太细,导致复杂度增加,限制T1的取值范围小于或等于W*H的四分之一。而且,根据实验结果,T1优选为4。其中,编码端有以下可选的方式:
i.当N1+N2等于T1-1时,若当前CU中剩余像素数量NR(NR为大于或等于0的整数)等于1,则无需编码“sp_is_matched_flag”,即无需编码匹配标志,以用于确定下一个剩余像素的类型,可以直接确认该剩余像素为未匹配像素。
ii.当N1+N2等于T1-1时,若当前CU中剩余像素的数量NR大于1,则无需编码“sp_is_matched_flag”,即无需编码匹配标志,以用于确定下一个剩余像素的类型,可以直接确认剩余像素为一个串,且串长度为NR。
iii.上述步骤ii.的另一种方式,当N1+N2等于T1-1时,若当前CU中剩余像素数量NR大于1,则编码“sp_is_matched_flag”,如果剩余像素为一个串,则编码“sp_is_matched_flag”为第一值例如1(但本公开并不限定于此,可以根据实际情况限定),且串长度为NR。
B)限制N1小于或等于第二数量阈值T2。其中,T2的取值范围可以为[1,W*H]中的整数。有以下可选的方式:
i.如果N1等于T2-1,且确认下一个剩余像素为串的起点,则编码“sp_is_matched_flag”为第二值(这里假设为1,但本公开并不限定于此,可以根据实际情况限定),指示当前CU中的剩余像素为一个串。
ii.如果N1等于T2-1,则直接确认剩余像素为一个串。
iii.如果N1等于T2,则直接确认剩余像素都为未匹配像素。
C)限制N2小于或等于第三数量阈值T3。其中,T3的取值范围可以为[1,W*H]中的整数。有以下可选的方式:
i.如果N2等于T3,不用编码“sp_is_matched_flag”和串长度,直接将当前CU的剩余像素作为一个串。
ii.如果N2等于T3,不用编码“sp_is_matched_flag”,直接确认当前CU的剩余像素的类型都为串,并编码每个串的串长度。
D)限制N1+N2大于或等于第四数量阈值T4。其中,T4的取值范围可以为[1,W*H]中的整数。本实施例中,T4优选为大于2的正整数。这里限制N1+N2大于或等于T4,考虑的是通常串预测中串数量不仅为1,该限制能够节省语法元素的编码。有以下可选的方式:
i.如果N1+N2小于T4,且编码“sp_is_matched_flag”为第三值例如1(但本公开并不限定于此,可以根据实际情况限定),以用于确认下一个剩余像素为串的起点,则可直接判断该串不为最后一个串,而无需编码“sp_last_len_flag”来确认是否为最后一个串,从而提升编码效率。
E)限制N1大于或等于第四数量阈值T4。这里限制N1大于或等于T4,考虑的是通常串预测中串数量不仅为1,该限制能够节省语法元素的编码。有以下可选的方式:
i.如果N1小于T4,可直接判断该串不为最后一个串,而无需编码“sp_last_len_flag”来确认是否为最后一个串。
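以限制A)为例,串数量上限对"sp_is_matched_flag"编码的影响可用如下Python草图示意(示意性实现,仅覆盖方式i与方式ii;函数名与返回值约定为本示例自拟):

```python
def need_code_matched_flag(n1, n2, nr, t1):
    """限制A)的示意:判断是否还需要编码"sp_is_matched_flag"。

    n1: 已编码串数量; n2: 未匹配像素数量;
    nr: 当前CU中剩余像素数量; t1: 第一数量阈值。
    返回 (是否需要编码该标志, 可直接推断的剩余像素类型),类型为
    'unmatched'(剩余1个像素为未匹配像素)、'string'(剩余像素为一个
    长度为nr的串)或 None(无法直接推断,需编码标志)。
    """
    if n1 + n2 == t1 - 1:
        if nr == 1:
            return (False, 'unmatched')  # 方式i:直接确认为未匹配像素
        return (False, 'string')         # 方式ii:直接确认为一个串,串长度为nr
    return (True, None)
```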
本公开实施例提出的方案,对ISC方案进行了一系列简化,包括参考串位置的限制,串数量的限制,这些方法简化了ISC的硬件实现:
1)限定了参考串位置后,串与串之间不存在依赖,串可以并行的编码。除此以外,类似IBC,还可以限制参考串仅在一个128*128大小的内存区域中使用。
2)对串数量的限制,可以使串的数量更少,减少内存访问次数。另一方面可以节省一些语法元素的编码,能够提高编码性能。
目前的ISC方案还存在其它不利于硬件实现的部分,如允许大小为4*4的小块使用串预测。小块中串的串长度较小,且允许小块使用串预测所能带来的性能增益较小。
下面通过具体的实例来举例说明如何在编码端对进行串预测的块大小进行限制:
限制在某些大小的块不使用串预测,设当前CU的宽度为W,高度为H,面积S=W*H,有以下可选的方法:
1)如果当前CU的面积S小于或等于预设的第一面积阈值T11,默认该当前CU不使用串预测,不需要编码“sp_flag”即串预测标志。T11的取值与编码器允许的块大小有关,取值范围可以为编码器允许的块大小(最小尺寸*最小尺寸,最大尺寸*最大尺寸)中的整数。
例如,AVS3中,T11可取(4*4,64*64)中的整数。在编码端,可以基于编码性能和复杂度的考虑选择T11。
2)如果当前CU的宽度W小于或等于预设的第一宽度阈值T21,默认该当前CU不使用串预测,不需要编码“sp_flag”。T21的取值与编码器允许的块大小有关,取值范围可以为编码器允许的块大小(最小尺寸,最大尺寸)中的整数。
例如,AVS3中,T21可取(4,64)中的整数。在编码端,可以基于编码性能和复杂度的考虑选择T21。
3)如果当前CU的高度H小于或等于预设的第一高度阈值T31,默认该当前CU不使用串预测,不需要编码“sp_flag”。T31的取值与编码器允许的块大小有关,取值范围可以为编码器允许的块大小(最小尺寸,最大尺寸)中的整数。
例如,AVS3中,T31可取(4,64)中的整数。在编码端,可以基于编码性能和复杂度的考虑选择T31。
4)如果当前CU的面积S大于或等于预设的第二面积阈值T41,默认该当前CU不使用串预测,不需要编码“sp_flag”。T41的取值与编码器允许的块大小有关,取值范围可以为编码器允许的块大小(最小尺寸*最小尺寸,最大尺寸*最大尺寸)中的整数。
例如,AVS3中,T41可取(4*4,64*64)中的整数。在编码端,可以基于编码性能和复杂度的考虑选择T41。
5)如果当前CU的宽度W大于或等于预设的第二宽度阈值T51,默认该当前CU不使用串预测,不需要编码“sp_flag”。T51的取值与编码器允许的块大小有关,取值范围可以为编码器允许的块大小(最小尺寸,最大尺寸)中的整数。
例如,AVS3中,T51可取(4,64)中的整数。在编码端,可以基于编码性能和复杂度的考虑选择T51。
6)如果当前CU的高度H大于或等于预设的第二高度阈值T61,默认该当前CU不使用串预测,不需要编码"sp_flag"。T61的取值与编码器允许的块大小有关,取值范围可以为编码器允许的块大小(最小尺寸,最大尺寸)中的整数。
上述步骤4)-6)中通过限制对大块使用串预测,是考虑到大块使用串预测带来的性能提升较小,该限制一方面可节省语法元素的编码,另一方面可以跳过编码端对该大小的块进行串预测分析。
7)上述方法可组合使用。
以下给出一些具体的例子:
1)宽度等于4且高度等于4的块默认不使用串匹配,不需要编码“sp_flag”。或者
2)宽度等于4或高度等于4的块默认不使用串匹配,不需要编码“sp_flag”。或者
3)面积小于或等于32的块默认不使用串匹配,不需要编码“sp_flag”。
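以上文例子3)为例,编码端按块大小决定是否允许串预测(从而决定是否编码"sp_flag")可用如下Python草图示意(示意性实现;阈值32仅为上文示例之一,实际可按编码器配置选择):

```python
def use_string_prediction_allowed(width, height):
    """按上文例子3)的规则:面积小于或等于32的块默认不使用串匹配,
    此时无需编码"sp_flag"。width、height为当前CU的宽度与高度。"""
    return width * height > 32
```

例如,4*4与4*8的块默认不使用串匹配,8*8的块则允许进行串预测分析。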
本公开实施例提出的方案,对ISC方案进行了一系列简化,包括参考串位置的限制,串数量的限制,块大小的限制,这些方法简化了ISC的硬件实现:
1)限定了参考串位置后,串与串之间不存在依赖,串可以并行的编码。除此以外,类似IBC,还可以限制参考串仅在一个128*128大小的内存区域中使用。
2)对串数量的限制,可以使串的数量更少,减少内存访问次数。另一方面可以节省一些语法元素的编码,能够提高编码性能。
3)对块大小的限制,一方面,可以减少小串的数量,有利于减少内存访问次数。另一方面,编码端可以跳过某些大小的块(例如4*4大小的块)的串预测的分析,降低了复杂度。此外,还可以节省某些块上串预测标志的编码,有利于编码性能的提升。
图8示意性示出了根据本公开的一实施例的视频解码方法的示意图。需要说明的是,本公开实施例中涉及到的方法,可以单独使用,或者组合起来一起使用。图8实施例以解码端为例进行介绍。这里所述的解码端可以是部署有视频解码器的视频解压缩设备,该视频解压缩设备包括终端设备或服务器等具有实施视频解码器功能的设备。
如图8所示,本公开实施例提供的方法可以包括以下步骤。
在S810中,获取当前图像的码流,所述码流包括最大编码单元,所述最大编码单元包括所述当前图像对应的当前最大编码单元和已编码最大编码单元,所述当前最大编码单元包括当前解码块,所述当前解码块包括当前串。
在下面的实施例中,均以当前解码块为当前CU为例进行举例说明,但本公开并不限定于此。
在S820中,利用M*W大小的存储空间的第一部分存储所述当前解码块中的像素,并利用所述存储空间的第二部分存储所述已编码最大编码单元及所述当前最大编码单元中的至少部分已解码块,M为大于或等于W的正整数。
在S830中,在所述存储空间的第二部分内搜索所述当前串的参考串,以根据所述参考串获得所述当前串的预测值,对所述当前串进行解码。
本公开实施例中,可以设置所述参考串满足以下条件:所述参考串在所述当前最大编码单元和N个已编码最大编码单元范围内,所述N个已编码最大编码单元与所述当前最大编码单元的目标侧相邻,N为大于或等于1的正整数;当所述参考串中的像素处于所述N个已编码最大编码单元时,则所述参考串的像素往预定方向移动预定像素后对应的目标区域的像素尚未重建;所述参考串中的像素位于所述当前图像的独立解码区域的边界内;所述参考串中的像素不与所述当前图像的未解码块重叠。
在示例性实施例中,根据所述最大编码单元的尺寸确定N的大小。
在示例性实施例中,若所述最大编码单元的大小为M*M,则所述预定像素的数量为M,所述目标区域为所述参考串的像素往所述预定方向移动M个像素后对应的(M/2)*(M/2)区域。
在示例性实施例中,若所述最大编码单元的大小为K*K,K为小于M的正整数,则所述预定像素的数量为N*K,所述目标区域为所述参考串的像素往所述预定方向移动N*K个像素后对应的最大编码单元。
在示例性实施例中,所述目标区域中的最小坐标不等于所述当前解码块中的最小坐标。本公开实施例中,目标区域可以包含至少一个CU,当目标区域包括多个CU时,当前解码块可以在目标区域的第一个CU。
在示例性实施例中,所述未解码块可以包括所述当前解码块,所述参考串中的像素不与所述当前解码块中的像素重叠。
在示例性实施例中,所述参考串中的像素的横坐标小于所述当前解码块中的像素的横坐标;或者,所述参考串中的像素的纵坐标小于所述当前解码块中的像素的纵坐标。
在示例性实施例中,所述未解码块可以不包括所述当前解码块,允许所述参考串中的像素与所述当前解码块中的已重建像素重叠,且所述参考串中的像素与所述当前解码块中的未重建像素不重叠。
在一些实施例中,当前图像的未解码块可以包括当前待解码的当前CU和其它尚未解码的其它CU,即不允许参考串与当前CU中的已重建像素重叠,这样可以实现串与串之间无依赖,从而可以降低解码的复杂度,可以实现并行解码。
在示例性实施例中,所述未解码块不包括所述当前解码块,允许所述参考串中的像素与所述当前解码块中的已重建像素重叠。
在另一些实施例中,当前图像的未解码块不包括当前待解码的当前CU,即允许参考串与当前CU中的已重建像素重叠,这种情况称之为串间依赖,根据扫描顺序处于后面的串需要等待前面的串解码完成后,才能进行解码,但其相比于串内依赖而言,串间依赖的复杂度更小;同时,由于越邻近像素与当前CU中的待解码的当前像素相关性越大,使用邻近像素做参考能够取得更好的预测效果,因此,串间依赖的性能高于无依赖的情况。其中,串内依赖是指参考串的位置与当前CU重叠,且与当前待解码的当前串的位置重叠,这种情况下串只能按照扫描顺序逐像素重建。
在示例性实施例中,允许所述参考串中的像素与所述当前解码块中的已重建像素重叠,且所述参考串中的像素不与所述当前解码块中包含未重建像素的行中的已重建像素重叠。
在一些实施例中,允许所述参考串中的像素与所述当前解码块中的已重建像素重叠,但不允许参考串中的像素与当前解码块中包含未重建像素的一行重叠(注意是当前解码块的一行,不是当前图像的一行)。如上文所述,在硬件实现时,如果串与串之间完全无参考,则可以并行的重建。对于串间参考,由于存在参考无法完全并行,需要等待被参考的串重建完成后,当前串才能开始重建。增加了该限制条件后,参考串与当前串不在当前解码块的同一行,则重建时可以一行一行的重建,而不用等待。
在示例性实施例中,所述当前图像的独立解码区域包括所述当前图像或者所述当前图像中的片、条带。
在示例性实施例中,若所述最大编码单元的大小为M*M,M为大于或等于1的正整数,则所述参考串中的像素来自同一个(M/2)*(M/2)对齐区域。
在示例性实施例中,若所述最大编码单元的大小不为M*M,M为大于或等于1的正整数,则所述参考串中的像素来自同一个最大编码单元。
在示例性实施例中,所述参考串的外接矩形不与所述当前图像的未解码块重叠。
下面通过具体的实例对上述实施例提供的方案,举例说明在解码端如何根据限制的参考串的位置进行串预测的解码:
为了便于硬件实现,在解码端,ISC仅使用1个CTU大小的内存,例如假设1个CTU的大小为128*128,则ISC被限制为仅使用一个128*128大小的内存。该大小为128*128的内存中,1个64*64大小的空间用于存储当前待重建的、大小为64*64的当前CU中的未重建像素,还有3个64*64大小的空间可用于存储当前图像的已解码区域的3个已解码CU中的已重建像素。因此,ISC仅能在这3个64*64大小的已解码CU中搜索当前CU的当前串的参考串,应满足以下条件:
1)串矢量指向的参考串的像素不应包含当前CU的像素。
例如,假设参考串中的像素的坐标为(xRef_i,yRef_i),xRef_i和yRef_i均为大于或等于0的整数,其中i=0,1,2,…,L-1,L为串长度,L为大于1的正整数,当前CU的左上角位置为(xCb,yCb),xCb和yCb均为大于或等于0的整数,则参考串的像素的坐标应满足条件(xRef_i<xCb||yRef_i<yCb)为真,其中,“||”是“逻辑或”,即在该所参考的坐标系中,参考串位于当前CU的左侧或者上面。
需要说明的是,上述参考串的像素的坐标应满足条件(xRef_i<xCb||yRef_i<yCb)为真,是在解码顺序为从左到右、从上到下的情况下限制的,若解码器/标准按其他顺序进行解码,则可以相应的调整该条件,本公开对此不做限定。类似的,在下文中,均是以解码顺序为从左到右、从上到下的情况下进行举例说明的,因此,左上角为最小坐标或者最小坐标对应的像素,但本公开实施例提供的方案也可以适用于其它解码顺序,对此不做限定。
2)串矢量指向的参考串限制在当前CTU和当前CTU的左边(这里假设所参考的坐标系中,目标侧为左边)的N个CTU(属于已编码CTU)的范围内,N的大小由最大编码单元的尺寸决定,例如可以根据上述公式(3)或(4)确定N。
3)当串矢量指向的参考串中的像素落在当前CTU左边的相邻的最大编码单元(属于已编码CTU),且最大编码单元的尺寸为128*128时,应符合以下限制条件:
3.1)参考串中的像素右移128像素后的位置所在的64*64区域的左上角尚未重建。
3.2)参考串中的像素右移128像素后的位置所在的64*64区域的左上角坐标不应与当前CU的左上角坐标位置相同。
例如,假设参考串中的像素的亮度分量位置为(xRefTL,yRefTL),且(((xRefTL+128)/64)*64,(yRefTL/64)*64)不可得,即用于存储3个64*64大小的已重建CU的已重建像素的内存中无法找到这个已重建像素,则(((xRefTL+128)/64)*64,(yRefTL/64)*64)不应等于当前CU左上角位置(xCb,yCb)。这里的除法是向下取整的。
4)当串矢量指向的参考串中的像素落在当前CTU左边相邻的最大编码单元(属于已编码CTU),且最大编码单元的尺寸小于或等于64*64时,应符合以下限制条件:
4.1)参考串中的像素右移N*lcu_size像素后的位置所在的CTU区域的左上角尚未重建。
4.2)参考串中的像素右移N*lcu_size像素后的位置所在的CTU区域的左上角不应与当前CU的左上角坐标相同。
即:假设参考串中的像素的亮度分量位置为(xRefTL,yRefTL),(((xRefTL+lcu_size*N)/lcu_size)*lcu_size,(yRefTL/lcu_size)*lcu_size)不可得;(((xRefTL+lcu_size*N)/lcu_size)*lcu_size,(yRefTL/lcu_size)*lcu_size)不应等于当前块左上角位置(xCb,yCb)。
上述步骤3)给出了最大编码单元为128*128时的限制,上述步骤4)给出最大编码单元的尺寸小于或等于64*64时的限制,使得在解码过程中能够充分利用128*128的内存。
5)对于128*128大小的CTU,串矢量指向的参考串中所有的像素只能来自同一个64*64对齐区域。对于非128*128大小的CTU,参考串中所有的像素将来自同一个CTU,即参考串不能跨过CTU的边界。这种限制降低了内存访问次数,解码端进行ISC预测时,需要访问的64*64大小的内存空间的个数只需要1个。
6)串矢量指向的参考串位置不应超出图像、片、条带等独立解码区域的边界。
7)串矢量指向的参考串位置中的任何一个参考串样本,不应与未重建区域或当前正在重建的编码块区域重叠。
8)可选的,串矢量指向的参考串位置中的任何一个参考串样本的外接矩形,不应与未重建区域或当前正在重建的编码块区域重叠。这是一种简化的方式,可通过参考串的外接矩形的四个角点来判断参考串的位置是否满足限制。外接矩形与未重建区域或当前正在重建的编码块区域不重叠,则表示参考串也满足与未重建区域或当前正在重建的编码块区域不重叠的限制。
本公开实施例提出的方案,对ISC方案进行了一系列简化,包括参考串位置的限制,这些方法简化了ISC的硬件实现。一方面,限定了参考串位置后,串与串之间不存在依赖,串可以并行的重建。除此以外,类似IBC,还可以限制参考串仅在一个128*128大小的内存区域中使用。
下面通过具体的实例来举例说明如何在解码端对串数量及未匹配像素的数量进行限制:
最大串数量限制:设当前块中已解码串数量为N1,未匹配像素数量为N2,N1和N2均为大于或等于0的整数,以下的方式可以单独或以任何形式组合使用:
A)限制N1+N2小于或等于第一数量阈值T1。其中,T1的取值范围可以为[1,W*H]中的整数,W为当前CU的宽度,H为当前CU的高度,W和H均为大于或等于1的正整数。在本实施例中,为了能够避免编码块被分割得太细,导致复杂度增加,限制T1的取值范围小于或等于W*H的四分之一。而且,根据实验结果,T1优选为4。其中,解码端有以下可选的方式:
i.当N1+N2等于T1-1时,若当前CU中剩余像素的数量NR(NR为大于或等于0的整数)等于1,则无需解码“sp_is_matched_flag”,即无需解码匹配标志,以用于确定下一个剩余像素的类型,可以直接确认该剩余像素为未匹配像素。
ii.当N1+N2等于T1-1时,若当前CU中剩余像素的数量NR大于1,则无需解码“sp_is_matched_flag”,即无需解码匹配标志,以用于确定下一个剩余像素的类型,可以直接确认剩余像素为一个串,且串长度为NR。
iii.上述步骤ii.的另一种方式,当N1+N2等于T1-1时,若当前CU中剩余像素数量NR大于1,则解码“sp_is_matched_flag”,如果解码获得“sp_is_matched_flag”为第一值例如1(但本公开并不限定于此,可以根据实际情况限定),则直接确认当前CU的剩余像素为一个串,串长度为NR。
B)限制N1小于或等于第二数量阈值T2。其中,T2的取值范围可以为[1,W*H]中的整数。有以下可选的方式:
i.如果N1等于T2-1,且解码“sp_is_matched_flag”为第二值例如1(但本公开并不限定于此,可以根据实际情况限定),则确认下一个剩余像素为串的起点,直接确认当前CU的剩余像素为一个串。
ii.如果N1等于T2-1,则不需解码“sp_is_matched_flag”,直接确认当前CU中的剩余像素为一个串。
iii.如果N1等于T2,则不需解码“sp_is_matched_flag”,直接确认当前CU中的剩余像素都为未匹配像素。
C)限制N2小于或等于第三数量阈值T3。其中,T3的取值范围可以为[1,W*H]中的整数。有以下可选的方式:
i.如果N2等于T3,不用解码“sp_is_matched_flag”和串长度,直接将当前CU的剩余像素作为一个串。
ii.如果N2等于T3,不用解码“sp_is_matched_flag”,直接确认当前CU的剩余像素的类型都为串,并解码每个串的串长度。
D)限制N1+N2大于或等于第四数量阈值T4。其中,T4的取值范围可以为[1,W*H]中的整数。本实施例中,T4优选为大于2的正整数。这里限制N1+N2大于或等于T4,考虑的是通常串预测中串数量不仅为1,该限制能够节省语法元素的解码。解码端有以下可选的方式:
i.如果N1+N2小于T4,且通过解码“sp_is_matched_flag”确认下一个剩余像素为串的起点,例如若“sp_is_matched_flag”解码为第三值例如1(但本公开并不限定于此,可以根据实际情况限定),此时,则可直接判断该串不为最后一个串,因此无需解码“sp_last_len_flag”来确认是否为最后一个串,从而提升解码效率。
E)限制N1大于或等于第四数量阈值T4。这里限制N1大于或等于T4,考虑的是通常串预测中串数量不仅为1,该限制能够节省语法元素的解码。有以下可选的方式:
i.如果N1小于T4,可直接判断该串不为最后一个串,而无需解码“sp_last_len_flag”来确认是否为最后一个串。
本公开实施例提出的方案,对ISC方案进行了一系列简化,包括参考串位置的限制,串数量的限制,这些方法简化了ISC的硬件实现:
1)限定了参考串位置后,串与串之间不存在依赖,串可以并行的重建。除此以外,类似IBC,还可以限制参考串仅在一个128*128大小的内存区域中使用。
2)对串数量的限制,可以使串的数量更少,减少内存访问次数。另一方面可以节省一些语法元素的解码,能够提高解码性能。
下面通过具体的实例来举例说明如何在解码端对进行串预测的块大小进行限制:
限制在某些大小的块不使用串预测,设当前CU的宽度为W,高度为H,面积S=W*H,有以下可选的方法:
1)如果当前CU的面积S小于或等于预设的第一面积阈值T11,默认该当前CU不使用串预测,不需要解码“sp_flag”即串预测标志。T11的取值与编码器允许的块大小有关,取值范围可以为编码器允许的块大小(最小尺寸*最小尺寸,最大尺寸*最大尺寸)中的整数。
例如,AVS3中,T11可取(4*4,64*64)中的整数。可以基于编码性能和复杂度的考虑选择T11。
2)如果当前CU的宽度W小于或等于预设的第一宽度阈值T21,默认该当前CU不使用串预测,不需要解码“sp_flag”。
3)如果当前CU的高度H小于或等于预设的第一高度阈值T31,默认该当前CU不使用串预测,不需要解码“sp_flag”。
4)如果当前CU的面积S大于或等于预设的第二面积阈值T41,默认该当前CU不使用串预测,不需要解码“sp_flag”。
5)如果当前CU的宽度W大于或等于预设的第二宽度阈值T51,默认该当前CU不使用串预测,不需要解码“sp_flag”。
6)如果当前CU的高度H大于或等于预设的第二高度阈值T61,默认该当前CU不使用串预测,不需要解码“sp_flag”。
上述步骤4)-6)中通过限制对大块使用串预测,是考虑到大块使用串预测带来的性能提升较小,该限制一方面可节省语法元素的解码,另一方面可以跳过解码端对该大小的块进行串预测分析。
7)上述方法可组合使用。
以下给出一些具体的例子:
1)宽度等于4且高度等于4的块默认不使用串匹配,不需要解码“sp_flag”。或者
2)宽度等于4或高度等于4的块默认不使用串匹配,不需要解码“sp_flag”。或者
3)面积小于或等于32的块默认不使用串匹配,不需要解码“sp_flag”。
本公开实施例提出的方案,对ISC方案进行了一系列简化,包括参考串位置的限制,串数量的限制,块大小的限制,这些方法简化了ISC的硬件实现:
1)限定了参考串位置后,串与串之间不存在依赖,串可以并行的重建。除此以外,类似IBC,还可以限制参考串仅在一个128*128大小的内存区域中使用。
2)对串数量的限制,可以使串的数量更少,减少内存访问次数。另一方面可以节省一些语法元素的解码,能够提高解码性能。
3)对块大小的限制,一方面,可以减少小串的数量,有利于减少内存访问次数。另一方面,编码端可以跳过某些大小的块(例如4*4大小的块)的串预测的分析,降低了复杂度。此外,还可以节省某些块上串预测标志的解码,有利于解码性能的提升。
进一步的,本公开实施例还提供一种视频编码装置,可以应用于编码端/编码器,所述装置可以包括:当前图像获取单元,可以用于获取当前图像,所述当前图像包括最大编码单元,所述最大编码单元包括当前最大编码单元和已编码最大编码单元,所述当前最大编码单元包括当前编码块,所述当前编码块包括当前串;存储空间确定单元,可以用于利用M*W大小的存储空间的第一部分存储所述当前编码块中的像素,并利用所述存储空间的第二部分存储所述已编码最大编码单元及所述当前最大编码单元中的至少部分已编码块,M为大于或等于W的正整数;参考串搜索单元,可以用于在所述存储空间的第二部分内搜索所述当前串的参考串,以根据所述参考串获得所述当前串的预测值,对所述当前串进行编码。
在示例性实施例中,可以设置所述参考串满足以下条件:所述参考串在所述当前最大编码单元和N个已编码最大编码单元范围内,所述N个已编码最大编码单元与所述当前最大编码单元的目标侧相邻,N为大于或等于1的正整数;当所述参考串中的像素处于所述N个已编码最大编码单元时,则所述参考串的像素往预定方向移动预定像素后对应的目标区域的像素尚未编码;所述参考串中的像素位于所述当前图像的独立编码区域的边界内;所述参考串中的像素不与所述当前图像的未编码块重叠。
在示例性实施例中,根据所述最大编码单元的尺寸确定N的大小。
在示例性实施例中,若所述最大编码单元的大小为M*M,则所述预定像素的数量为M,所述目标区域为所述参考串的像素往所述预定方向移动M个像素后对应的(M/2)*(M/2)区域。
在示例性实施例中,若所述最大编码单元的大小为K*K,K为小于M的正整数,则所述预定像素的数量为N*K,所述目标区域为所述参考串的像素往所述预定方向移动N*K个像素后对应的最大编码单元。
在示例性实施例中,所述目标区域中的最小坐标不等于所述当前编码块中的最小坐标。
在示例性实施例中,所述未编码块包括所述当前编码块,所述参考串中的像素不与所述当前编码块中的像素重叠。
在示例性实施例中,所述参考串中的像素的横坐标小于所述当前编码块中的像素的横坐标;或者,所述参考串中的像素的纵坐标小于所述当前编码块中的像素的纵坐标。
在示例性实施例中,所述未编码块不包括所述当前编码块,允许所述参考串中的像素与所述当前编码块中的已编码像素重叠,且所述参考串中的像素与所述当前编码块中的未编码像素不重叠。
在示例性实施例中,允许所述参考串中的像素与所述当前编码块中的已编码像素重叠,且所述参考串中的像素不与所述当前编码块中包含未编码像素的行中的已编码像素重叠。
在示例性实施例中,所述当前图像的独立编码区域包括所述当前图像或者所述当前图像中的片、条带。
在示例性实施例中,若所述最大编码单元的大小为M*M,M为大于或等于1的正整数,则所述参考串中的像素来自同一个(M/2)*(M/2)对齐区域。
在示例性实施例中,若所述最大编码单元的大小不为M*M,M为大于或等于1的正整数,则所述参考串中的像素来自同一个最大编码单元。
在示例性实施例中,所述参考串的外接矩形不与所述当前图像的未编码块重叠。
本公开实施例提供的视频编码装置中的各个单元的具体实现可以参照上述视频编码方法中的内容,在此不再赘述。
进一步的,本公开实施例还提供一种视频解码装置,所述装置可应用于解码端/解码器,所述装置可以包括:码流获取单元,可以用于获取当前图像的码流,所述码流包括最大编码单元,所述最大编码单元包括所述当前图像对应的当前最大编码单元和已编码最大编码单元,所述当前最大编码单元包括当前解码块,所述当前解码块包括当前串;存储空间存储单元,可以用于利用M*W大小的存储空间的第一部分存储所述当前解码块中的像素,并利用所述存储空间的第二部分存储所述已编码最大编码单元及所述当前最大编码单元中的至少部分已解码块,M为大于或等于W的正整数;参考串确定单元,可以用于在所述存储空间的第二部分内搜索所述当前串的参考串,以根据所述参考串获得所述当前串的预测值,对所述当前串进行解码。
在示例性实施例中,可以设置所述参考串满足以下条件:所述参考串在所述当前最大编码单元和N个已编码最大编码单元范围内,所述N个已编码最大编码单元与所述当前最大编码单元的目标侧相邻,N为大于或等于1的正整数;当所述参考串中的像素处于所述N个已编码最大编码单元时,则所述参考串的像素往预定方向移动预定像素后对应的目标区域的像素尚未重建;所述参考串中的像素位于所述当前图像的独立解码区域的边界内;所述参考串中的像素不与所述当前图像的未解码块重叠。
在示例性实施例中,根据所述最大编码单元的尺寸确定N的大小。
在示例性实施例中,若所述最大编码单元的大小为M*M,则所述预定像素的数量为M,所述目标区域为所述参考串的像素往所述预定方向移动M个像素后对应的(M/2)*(M/2)区域。
在示例性实施例中,若所述最大编码单元的大小为K*K,K为小于M的正整数,则所述预定像素的数量为N*K,所述目标区域为所述参考串的像素往所述预定方向移动N*K个像素后对应的最大编码单元。
在示例性实施例中,所述目标区域中的最小坐标不等于所述当前解码块中的最小坐标。
在示例性实施例中,所述未解码块包括所述当前解码块,所述参考串中的像素不与所述当前解码块中的像素重叠。
在示例性实施例中,所述参考串中的像素的横坐标小于所述当前解码块中的像素的横坐标;或者,所述参考串中的像素的纵坐标小于所述当前解码块中的像素的纵坐标。
在示例性实施例中,所述未解码块不包括所述当前解码块,允许所述参考串中的像素与所述当前解码块中的已重建像素重叠,且所述参考串中的像素与所述当前解码块中的未重建像素不重叠。
在示例性实施例中,允许所述参考串中的像素与所述当前解码块中的已重建像素重叠,且所述参考串中的像素不与所述当前解码块中包含未重建像素的行中的已重建像素重叠。
在示例性实施例中,所述当前图像的独立解码区域包括所述当前图像或者所述当前图像中的片、条带。
在示例性实施例中,若所述最大编码单元的大小为M*M,M为大于或等于1的正整数,则所述参考串中的像素来自同一个(M/2)*(M/2)对齐区域。
在示例性实施例中,若所述最大编码单元的大小不为M*M,M为大于或等于1的正整数,则所述参考串中的像素来自同一个最大编码单元。
在示例性实施例中,所述参考串的外接矩形不与所述当前图像的未解码块重叠。
本公开实施例提供的视频解码装置中的各个单元的具体实现可以参照上述视频编码方法和视频解码方法中的内容,在此不再赘述。
应当注意,尽管在上文详细描述中提及了用于动作执行的设备的若干单元,但是这种划分并非强制性的。实际上,根据本公开的实施方式,上文描述的两个或更多单元的特征和功能可以在一个单元中具体化。反之,上文描述的一个单元的特征和功能可以进一步划分为由多个单元来具体化。
本公开实施例提供了一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现如上述实施例中所述的视频编码方法。
本公开实施例提供了一种计算机可读存储介质,其上存储有计算机程序,所述程序被处理器执行时实现如上述实施例中所述的视频解码方法。
本公开实施例提供了一种电子设备,包括:至少一个处理器;存储装置,配置为存储至少一个程序,当所述至少一个程序被所述至少一个处理器执行时,使得所述至少一个处理器实现如上述实施例中所述的视频编码方法。
本公开实施例提供了一种电子设备,包括:至少一个处理器;存储装置,配置为存储至少一个程序,当所述至少一个程序被所述至少一个处理器执行时,使得所述至少一个处理器实现如上述实施例中所述的视频解码方法。
图9示出了适于用来实现本公开实施例的电子设备的结构示意图。
需要说明的是,图9示出的电子设备900仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图9所示,电子设备900包括中央处理单元(CPU,Central Processing Unit)901,其可以根据存储在只读存储器(ROM,Read-Only Memory)902中的程序或者从储存部分908加载到随机访问存储器(RAM,Random Access Memory)903中的程序而执行各种适当的动作和处理。在RAM 903中,还存储有系统操作所需的各种程序和数据。CPU 901、ROM 902以及RAM 903通过总线904彼此相连。输入/输出(input/output,I/O)接口905也连接至总线904。
以下部件连接至I/O接口905:包括键盘、鼠标等的输入部分906;包括诸如阴极射线管(CRT,Cathode Ray Tube)、液晶显示器(LCD,Liquid Crystal Display)等以及扬声器等的输出部分907;包括硬盘等的储存部分908;以及包括诸如LAN(Local Area Network,局域网)卡、调制解调器等的网络接口卡的通信部分909。通信部分909经由诸如因特网的网络执行通信处理。驱动器910也根据需要连接至I/O接口905。可拆卸介质911,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器910上,以便于从其上读出的计算机程序根据需要被安装入储存部分908。
特别地,根据本公开的实施例,下文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在计算机可读存储介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信部分909从网络上被下载和安装,和/或从可拆卸介质911被安装。在该计算机程序被中央处理单元(CPU)901执行时,执行本申请的方法和/或装置中限定的各种功能。
需要说明的是,本公开所示的计算机可读存储介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有至少一个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM(Erasable Programmable Read Only Memory,可擦除可编程只读存储器)或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中,计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读存储介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:无线、电线、光缆、RF(Radio Frequency,射频)等等,或者上述的任意合适的组合。
附图中的流程图和框图,图示了按照本公开各种实施例的方法、装置和计算机程序产品的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分,上述模块、程序段、或代码的一部分包含至少一个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如,两个接连地表示的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图或流程图中的每个方框、以及框图或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
描述于本公开实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现,所描述的单元也可以设置在处理器中。其中,这些单元的名称在某种情况下并不构成对该单元本身的限定。
作为另一方面,本申请还提供了一种计算机可读存储介质,该计算机可读存储介质可以是上述实施例中描述的电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。上述计算机可读存储介质承载有一个或者多个程序,当上述一个或者多个程序被一个该电子设备执行时,使得该电子设备实现如下述实施例中所述的方法。例如,所述的电子设备可以实现如图6或图8所示的各个步骤。
本申请实施例还提供了一种包括指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述实施例提供的方法。
通过以上的实施方式的描述,本领域的技术人员易于理解,这里描述的示例实施方式可以通过软件实现,也可以通过软件结合必要的硬件的方式来实现。因此,根据本公开实施方式的技术方案可以以软件产品的形式体现出来,该软件产品可以存储在一个非易失性存储介质(可以是CD-ROM,U盘,移动硬盘等)中或网络上,包括若干指令以使得一台计算设备(可以是个人计算机、服务器、触控终端、或者网络设备等)执行根据本公开实施方式的方法。
本领域技术人员在考虑说明书及实践这里公开的发明后,将容易想到本公开的其它实施方案。本申请旨在涵盖本公开的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本公开的一般性原理并包括本公开未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施例仅被视为示例性的,本公开的真正范围和精神由下面的权利要求指出。
It is to be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and various modifications and changes can be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (29)

  1. A video encoding method, performed by a video compression device in which a video encoder is deployed, the method comprising:
    obtaining a current picture, the current picture comprising largest coding units (LCUs), the largest coding units comprising a current largest coding unit and encoded largest coding units, the current largest coding unit comprising a current coding block, and the current coding block comprising a current string;
    storing pixels of the current coding block in a first portion of a storage space of size M*W, and storing at least some encoded blocks of the encoded largest coding units and of the current largest coding unit in a second portion of the storage space, M being a positive integer greater than or equal to W; and
    searching the second portion of the storage space for a reference string of the current string, so as to obtain a predicted value of the current string according to the reference string and encode the current string.
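As a rough illustration (not part of the claim language), the storage arrangement in claim 1 can be sketched as follows; the class name `ReferenceMemory` and the concrete sizes are hypothetical stand-ins:

```python
# Illustrative sketch of the M*W storage space in claim 1: a first portion
# holds the pixels of the current coding block, and the remaining second
# portion caches previously encoded pixels that reference strings may use.

class ReferenceMemory:
    def __init__(self, m: int, w: int, block_w: int, block_h: int):
        assert m >= w, "claim 1 requires M to be greater than or equal to W"
        self.total_pixels = m * w                 # full budget of the storage space
        self.first_part = block_w * block_h       # pixels of the current coding block
        self.second_part = self.total_pixels - self.first_part  # reference cache
        assert self.second_part >= 0

# Example with hypothetical sizes: a 128*64 storage space and a 64x64 block.
mem = ReferenceMemory(m=128, w=64, block_w=64, block_h=64)
```

With these example sizes, half of the budget remains for the reference cache; the claim itself fixes only the M >= W relationship, not any concrete split.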
  2. The video encoding method according to claim 1, wherein the reference string satisfies the following conditions:
    the reference string is within the range of the current largest coding unit and N encoded largest coding units, the N encoded largest coding units being adjacent to a target side of the current largest coding unit, N being a positive integer greater than or equal to 1;
    when a pixel of the reference string is located in the N encoded largest coding units, the pixels of the target region corresponding to the pixels of the reference string after they are shifted by a predetermined number of pixels in a predetermined direction have not yet been encoded;
    the pixels of the reference string are located within the boundary of an independent encoding region of the current picture; and
    the pixels of the reference string do not overlap with an unencoded block of the current picture.
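Read as pseudocode, the four conditions of claim 2 amount to a single validity predicate over the pixels of a candidate reference string. The sketch below is only illustrative; every callback (`lcu_index_of`, `target_region_encoded`, and so on) is a hypothetical stand-in for encoder state the claim does not spell out:

```python
# Illustrative validity check for a candidate reference string under the
# four conditions of claim 2. All callbacks are hypothetical stand-ins.

def reference_string_valid(pixels, current_lcu, n,
                           lcu_index_of,            # pixel -> LCU index
                           target_region_encoded,   # condition 2 probe
                           in_independent_region,   # condition 3 probe
                           overlaps_unencoded):     # condition 4 probe
    # Condition 1: the current LCU plus the N encoded LCUs on its target side.
    allowed = {current_lcu - k for k in range(n + 1)}
    for p in pixels:
        if lcu_index_of(p) not in allowed:
            return False
        # Condition 2: for pixels inside the N encoded LCUs, the target
        # region reached by the predetermined shift must still be unencoded.
        if lcu_index_of(p) != current_lcu and target_region_encoded(p):
            return False
        # Condition 3: stay inside the independent encoding region.
        if not in_independent_region(p):
            return False
    # Condition 4: no overlap with any unencoded block of the picture.
    return not overlaps_unencoded(pixels)
```

An encoder would call such a predicate once per candidate during the search of claim 1; the claim constrains only which candidates are legal, not how the search enumerates them.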
  3. The video encoding method according to claim 2, the method further comprising:
    determining the value of N according to the size of the largest coding unit.
  4. The video encoding method according to claim 2, wherein if the size of the largest coding unit is M*M, the predetermined number of pixels is M, and the target region is the region denoted by the formula of Figure PCTCN2021089583-appb-100001 corresponding to the pixels of the reference string after they are shifted by M pixels in the predetermined direction.
  5. The video encoding method according to claim 2, wherein if the size of the largest coding unit is K*K, K being a positive integer less than M, the predetermined number of pixels is N*K, and the target region is the largest coding unit corresponding to the pixels of the reference string after they are shifted by N*K pixels in the predetermined direction.
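Claims 4 and 5 both locate the target region by shifting a reference pixel in the predetermined direction and snapping the result to a grid (the region given by the formula figure in claim 4, or the LCU itself in claim 5). A minimal sketch, assuming the shift runs rightwards along x and `grid` is the region size; both assumptions are illustrative only:

```python
# Illustrative target-region computation for claims 4-5: shift a reference
# pixel by `shift` pixels (M in claim 4, N*K in claim 5) and return the
# top-left corner of the grid-aligned region the shifted pixel lands in.

def target_region_origin(x: int, y: int, shift: int, grid: int):
    shifted_x = x + shift                  # move in the predetermined direction
    return ((shifted_x // grid) * grid,    # snap to the region grid
            (y // grid) * grid)

# Claim 5 example with hypothetical sizes K = 64 and N = 1 (shift = N*K = 64).
origin = target_region_origin(x=10, y=20, shift=64, grid=64)
```

Under claim 2's second condition, the region anchored at `origin` must still be unencoded for the candidate pixel to be usable.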
  6. The video encoding method according to claim 2, wherein the minimum coordinate in the target region is not equal to the minimum coordinate in the current coding block.
  7. The video encoding method according to claim 2, wherein the unencoded block comprises the current coding block, and the pixels of the reference string do not overlap with pixels of the current coding block.
  8. The video encoding method according to claim 7, wherein the horizontal coordinates of the pixels of the reference string are smaller than the horizontal coordinates of the pixels of the current coding block; or the vertical coordinates of the pixels of the reference string are smaller than the vertical coordinates of the pixels of the current coding block.
  9. The video encoding method according to claim 2, wherein the unencoded block does not comprise the current coding block, the pixels of the reference string are allowed to overlap with encoded pixels of the current coding block, and the pixels of the reference string do not overlap with unencoded pixels of the current coding block.
  10. The video encoding method according to claim 2, wherein the pixels of the reference string are allowed to overlap with encoded pixels of the current coding block, and the pixels of the reference string do not overlap with encoded pixels in a row of the current coding block that contains unencoded pixels.
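Claims 7 to 10 describe three alternative overlap policies between the reference string and the current coding block. A sketch with pixels modeled as `(x, y)` tuples in Python sets; the set representation is an assumption made for illustration, not claim language:

```python
# Illustrative overlap policies of claims 7, 9, and 10; pixels are (x, y).

def policy_claim_7(ref, block_pixels):
    # Claim 7: the reference string may not overlap the current block at all.
    return ref.isdisjoint(block_pixels)

def policy_claim_9(ref, unencoded_pixels):
    # Claim 9: overlap with already-encoded pixels of the block is allowed,
    # so only overlap with still-unencoded pixels is forbidden.
    return ref.isdisjoint(unencoded_pixels)

def policy_claim_10(ref, encoded_pixels, unencoded_pixels):
    # Claim 10: additionally forbid encoded pixels that sit in a row which
    # still contains unencoded pixels.
    rows_with_unencoded = {y for (_, y) in unencoded_pixels}
    for p in ref:
        if p in unencoded_pixels:
            return False
        if p in encoded_pixels and p[1] in rows_with_unencoded:
            return False
    return True
```

The three policies trade search range for implementation simplicity: claim 7 is the strictest, while claims 9 and 10 progressively admit partially encoded content of the current block.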
  11. The video encoding method according to claim 1, wherein if the size of the largest coding unit is M*M, M being a positive integer greater than or equal to 1, the pixels of the reference string come from one and the same aligned region denoted by the formula of Figure PCTCN2021089583-appb-100002.
  12. The video encoding method according to claim 1, wherein if the size of the largest coding unit is not M*M, M being a positive integer greater than or equal to 1, the pixels of the reference string come from the same largest coding unit.
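Claims 11 and 12 both require every pixel of the reference string to fall in a single aligned region: a grid cell of the size given by the formula figure in claim 11, or the LCU itself in claim 12. A sketch, with the grid size `g` left as a free parameter because the exact region size is specified only by the referenced figure:

```python
# Illustrative same-aligned-region test for claims 11-12: every pixel of
# the reference string must map to the same g-aligned grid cell.

def same_aligned_region(pixels, g: int) -> bool:
    cells = {(x // g, y // g) for (x, y) in pixels}
    return len(cells) == 1
```

Confining a string to one aligned region is what lets the second portion of the storage space of claim 1 be managed region by region rather than pixel by pixel.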
  13. The video encoding method according to any one of claims 1 to 12, wherein the circumscribed rectangle of the reference string does not overlap with an unencoded block of the current picture.
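Claim 13 tightens the per-pixel rule of claim 2 to the whole circumscribed (bounding) rectangle of the reference string. A sketch using inclusive rectangles `(x0, y0, x1, y1)`; the representation is an assumption for illustration:

```python
# Illustrative claim 13 check: the bounding rectangle of the reference
# string must not intersect any unencoded block of the picture.

def bounding_rect(pixels):
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (min(xs), min(ys), max(xs), max(ys))

def rects_overlap(a, b):  # inclusive rectangles (x0, y0, x1, y1)
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def claim_13_ok(pixels, unencoded_blocks):
    rect = bounding_rect(pixels)
    return not any(rects_overlap(rect, blk) for blk in unencoded_blocks)
```

Because a string can snake through several rows, its bounding rectangle may cover pixels the string itself never touches; claim 13 therefore rejects some strings that the per-pixel test of claim 2 alone would accept.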
  14. A video decoding method, performed by a video decompression device in which a video decoder is deployed, the method comprising:
    obtaining a bitstream of a current picture, the bitstream comprising largest coding units, the largest coding units comprising a current largest coding unit corresponding to the current picture and encoded largest coding units, the current largest coding unit comprising a current decoding block, and the current decoding block comprising a current string;
    storing pixels of the current decoding block in a first portion of a storage space of size M*W, and storing at least some decoded blocks of the encoded largest coding units and of the current largest coding unit in a second portion of the storage space, M being a positive integer greater than or equal to W; and
    searching the second portion of the storage space for a reference string of the current string, so as to obtain a predicted value of the current string according to the reference string and decode the current string.
  15. The video decoding method according to claim 14, wherein the reference string satisfies the following conditions:
    the reference string is within the range of the current largest coding unit and N encoded largest coding units, the N encoded largest coding units being adjacent to a target side of the current largest coding unit, N being a positive integer greater than or equal to 1;
    when a pixel of the reference string is located in the N encoded largest coding units, the pixels of the target region corresponding to the pixels of the reference string after they are shifted by a predetermined number of pixels in a predetermined direction have not yet been reconstructed;
    the pixels of the reference string are located within the boundary of an independent decoding region of the current picture; and
    the pixels of the reference string do not overlap with an undecoded block of the current picture.
  16. The video decoding method according to claim 15, the method further comprising:
    determining the value of N according to the size of the largest coding unit.
  17. The video decoding method according to claim 15, wherein if the size of the largest coding unit is M*M, the predetermined number of pixels is M, and the target region is the region denoted by the formula of Figure PCTCN2021089583-appb-100003 corresponding to the pixels of the reference string after they are shifted by M pixels in the predetermined direction.
  18. The video decoding method according to claim 15, wherein if the size of the largest coding unit is K*K, K being a positive integer less than M, the predetermined number of pixels is N*K, and the target region is the largest coding unit corresponding to the pixels of the reference string after they are shifted by N*K pixels in the predetermined direction.
  19. The video decoding method according to claim 15, wherein the minimum coordinate in the target region is not equal to the minimum coordinate in the current decoding block.
  20. The video decoding method according to claim 15, wherein the undecoded block comprises the current decoding block, and the pixels of the reference string do not overlap with pixels of the current decoding block.
  21. The video decoding method according to claim 20, wherein the horizontal coordinates of the pixels of the reference string are smaller than the horizontal coordinates of the pixels of the current decoding block; or the vertical coordinates of the pixels of the reference string are smaller than the vertical coordinates of the pixels of the current decoding block.
  22. The video decoding method according to claim 15, wherein the undecoded block does not comprise the current decoding block, the pixels of the reference string are allowed to overlap with reconstructed pixels of the current decoding block, and the pixels of the reference string do not overlap with unreconstructed pixels of the current decoding block.
  23. The video decoding method according to claim 15, wherein the pixels of the reference string are allowed to overlap with reconstructed pixels of the current decoding block, and the pixels of the reference string do not overlap with reconstructed pixels in a row of the current decoding block that contains unreconstructed pixels.
  24. The video decoding method according to claim 15, wherein the independent decoding region of the current picture comprises the current picture, or a patch or slice in the current picture.
  25. The video decoding method according to claim 15, wherein if the size of the largest coding unit is M*M, M being a positive integer greater than or equal to 1, the pixels of the reference string come from one and the same aligned region denoted by the formula of Figure PCTCN2021089583-appb-100004.
  26. The video decoding method according to claim 15, wherein if the size of the largest coding unit is not M*M, M being a positive integer greater than or equal to 1, the pixels of the reference string come from the same largest coding unit.
  27. The video decoding method according to claim 15, wherein the circumscribed rectangle of the reference string does not overlap with an undecoded block of the current picture.
  28. An electronic device, comprising:
    at least one processor; and
    a storage apparatus configured to store at least one program which, when executed by the at least one processor, causes the at least one processor to implement the video encoding method according to any one of claims 1 to 13 or the video decoding method according to any one of claims 14 to 27.
  29. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the video encoding method according to any one of claims 1 to 13 or the video decoding method according to any one of claims 14 to 27.
PCT/CN2021/089583 2020-06-02 2021-04-25 Video encoding method, video decoding method, and related device WO2021244182A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21817776.4A EP4030756A4 (en) 2020-06-02 2021-04-25 VIDEO ENCODING METHOD, VIDEO DECODING METHOD AND RELATED DEVICE
US17/706,951 US12063353B2 (en) 2020-06-02 2022-03-29 Video encoding method, video decoding method, and related device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010487809.7A CN112532988B (zh) 2020-06-02 Video encoding method, video decoding method, and related device
CN202010487809.7 2020-06-02

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/706,951 Continuation US12063353B2 (en) 2020-06-02 2022-03-29 Video encoding method, video decoding method, and related device

Publications (1)

Publication Number Publication Date
WO2021244182A1 true WO2021244182A1 (zh) 2021-12-09

Family

ID=74978656

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/089583 WO2021244182A1 (zh) 2020-06-02 2021-04-25 Video encoding method, video decoding method, and related device

Country Status (4)

Country Link
US (1) US12063353B2 (zh)
EP (1) EP4030756A4 (zh)
CN (2) CN112532988B (zh)
WO (1) WO2021244182A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112532988B (zh) * 2020-06-02 2022-05-06 Tencent Technology (Shenzhen) Co., Ltd. Video encoding method, video decoding method, and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104853211A (zh) * 2014-02-16 2015-08-19 Shanghai Tianhe Electronic Information Co., Ltd. Image compression method and apparatus using multiple forms of reference pixel storage space
CN104853209A (zh) * 2014-02-16 2015-08-19 Tongji University Image encoding and decoding method and apparatus
US20180139459A1 (en) * 2016-11-16 2018-05-17 Citrix Systems, Inc. Multi-pixel caching scheme for lossless encoding
US20190325083A1 (en) * 2018-04-20 2019-10-24 International Business Machines Corporation Rapid partial substring matching
CN112532988A (zh) * 2020-06-02 2021-03-19 腾讯科技(深圳)有限公司 视频编码方法、视频解码方法及相关设备

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3761451B2 (ja) * 2001-11-19 2006-03-29 JustSystems Corporation Data structure for storing symbol strings, symbol string search apparatus, symbol string search method, program, and information recording medium
CN102291584B (zh) * 2011-09-01 2013-04-17 Xidian University Apparatus and method for intra-picture luma block prediction
KR20130085389A (ko) * 2012-01-19 2013-07-29 Samsung Electronics Co., Ltd. Video encoding method and apparatus allowing parallel entropy encoding per sub-region, and video decoding method and apparatus allowing parallel entropy decoding per sub-region
JP6150277B2 (ja) * 2013-01-07 2017-06-21 National Institute of Information and Communications Technology Stereoscopic video encoding apparatus, stereoscopic video decoding apparatus, stereoscopic video encoding method, stereoscopic video decoding method, stereoscopic video encoding program, and stereoscopic video decoding program
WO2014122708A1 (ja) * 2013-02-05 2014-08-14 NEC Corporation Screen encoding apparatus, screen decoding apparatus, and screen encoding and transmission system
CN112383780B (zh) * 2013-08-16 2023-05-02 Shanghai Tianhe Electronic Information Co., Ltd. Encoding and decoding method and apparatus for string matching using a point-matching reference set and back-and-forth index scanning
CN110691250B (zh) * 2013-10-12 2022-04-08 Guangzhou Zhongguang Guoke Measurement and Control Technology Co., Ltd. Image compression apparatus combining block matching and string matching
EP3917146A1 (en) * 2014-09-30 2021-12-01 Microsoft Technology Licensing, LLC Rules for intra-picture prediction modes when wavefront parallel processing is enabled
WO2016052977A1 (ko) * 2014-10-01 2016-04-07 KT Corporation Method and apparatus for processing a video signal
CN105704491B (zh) * 2014-11-28 2020-03-13 Tongji University Image encoding method, decoding method, encoding apparatus, and decoding apparatus
WO2017137006A1 (zh) * 2016-02-10 2017-08-17 Tongji University Encoding and decoding methods and apparatuses for data compression
US11595694B2 (en) * 2020-04-01 2023-02-28 Tencent America LLC Method and apparatus for video coding


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4030756A4

Also Published As

Publication number Publication date
US20220224888A1 (en) 2022-07-14
US12063353B2 (en) 2024-08-13
CN112532988B (zh) 2022-05-06
EP4030756A4 (en) 2022-12-07
CN114827618B (zh) 2023-03-14
EP4030756A1 (en) 2022-07-20
CN112532988A (zh) 2021-03-19
CN114827618A (zh) 2022-07-29


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21817776; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2021817776; Country of ref document: EP; Effective date: 20220414)
NENP Non-entry into the national phase (Ref country code: DE)