CN113347437B - Encoding method, encoder, decoder and storage medium based on string prediction - Google Patents


Info

Publication number
CN113347437B
CN113347437B
Authority
CN
China
Prior art keywords
string, matching, current, matching string, pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110405425.0A
Other languages
Chinese (zh)
Other versions
CN113347437A (en)
Inventor
江东
方诚
张雪
林聚财
殷俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110405425.0A priority Critical patent/CN113347437B/en
Publication of CN113347437A publication Critical patent/CN113347437A/en
Priority to PCT/CN2021/135831 priority patent/WO2022117113A1/en
Priority to EP21900135.1A priority patent/EP4241447A4/en
Application granted granted Critical
Publication of CN113347437B publication Critical patent/CN113347437B/en
Priority to US18/328,758 priority patent/US20230319287A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/517 Processing of motion vectors by encoding
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/439 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using cascaded computational arrangements for performing a single operation, e.g. filtering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/523 Motion estimation or motion compensation with sub-pixel accuracy

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses an encoding method, an encoder, a decoder and a storage medium based on string prediction. The method includes: dividing the pixels of each coding unit in an image to be encoded to obtain a plurality of current strings; after the reference pixel of the first pixel in a current string is found, searching for the remaining reference pixels along a plurality of preset directions to obtain corresponding matching strings, and selecting one of all matching strings as the best matching string; and/or, after a matching string matching the current string is found, adjusting the positions of the pixels in the matching string to generate new matching strings, and selecting the best matching string from the new matching strings and the original matching string. In this way, the compression rate of encoding can be improved.

Description

Encoding method, encoder, decoder and storage medium based on string prediction
Technical Field
The present application relates to the field of coding technologies, and in particular, to a coding method, an encoder, a decoder, and a storage medium based on string prediction.
Background
Video image data is large in volume, so the pixel data usually needs to be compressed; the compressed data is a video bitstream, which is transmitted to the user end over a wired or wireless network and then decoded for viewing. The overall video encoding flow includes prediction, transform, quantization, entropy coding and other processes. In the String Prediction (SP) technique, when the pixel strings of the current block are scanned in a horizontal arcuate (bow-shaped) order, the matching string is also constrained to follow the same horizontal arcuate order; as a result, some pixel strings may fail to find their optimal matching string, which limits further improvement of the compression rate in SP mode.
Disclosure of Invention
The present application provides an encoding method, an encoder, a decoder, and a storage medium based on string prediction, which can improve the compression rate of encoding.
In order to solve the above technical problem, the technical solution adopted by the present application is: a string prediction-based encoding method is provided, the method including: dividing the pixels of each coding unit in an image to be encoded to obtain a plurality of current strings; after the reference pixel of the first pixel in a current string is found, searching for the remaining reference pixels along a plurality of preset directions to obtain corresponding matching strings, and selecting one of all matching strings as the best matching string; and/or, after a matching string matching the current string is found, adjusting the positions of the pixels in the matching string to generate new matching strings, and selecting the best matching string from the new matching strings and the original matching string.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided an encoder comprising a memory and a processor connected to each other, wherein the memory is used for storing a computer program, and the computer program, when executed by the processor, is used for implementing the string prediction based encoding method of the above technical solution.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided a decoder, comprising a processor, configured to decode an encoded data stream output by an encoder to obtain decoded image data, wherein the encoder is the encoder in the above technical solution.
In order to solve the above technical problem, another technical solution adopted by the present application is: there is provided a computer-readable storage medium for storing a computer program for implementing the string prediction based encoding method of the above technical solution when the computer program is executed by a processor.
Through the above schemes, the beneficial effects of the present application are as follows. First, the acquired image to be encoded is partitioned into blocks to obtain a plurality of coding units, and the pixels in each coding unit are divided to obtain a plurality of current strings. After the reference pixel of the first pixel in a current string is found, the remaining reference pixels corresponding to the remaining pixels of the current string are searched for along a plurality of preset directions to generate corresponding matching strings, and one of all the matching strings is selected as the best matching string for the current string; this realizes multi-directional string prediction, so reference pixels that better match the current string can be found. And/or, after a matching string is found, the positions of the pixels inside it are adjusted to generate new matching strings, and the best matching string is then selected from the original matching string and all new matching strings, again so that reference pixels that better match the current string can be found. Both schemes, multi-directional string prediction and adjusting the pixel positions inside the matching string, help find a better matching string and can improve the compression rate of encoding, thereby improving encoding efficiency and performance; the two schemes can also be combined to improve encoding performance further.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort. Wherein:
FIG. 1 is a schematic illustration of a normal SP prediction mode;
FIG. 2 is a schematic diagram of a CU level motion search;
FIG. 3 is a schematic diagram of the pixel scanning pattern in the current string;
FIG. 4 is a schematic diagram of pixel level matching;
FIG. 5 is a flowchart illustrating an embodiment of a string prediction based encoding method provided herein;
FIG. 6 is a flow chart illustrating another embodiment of a string prediction based encoding method provided herein;
FIG. 7(a) is a schematic diagram of matching between a current string and a matching string when the matching strings provided by the present application are arranged in a right-to-left order;
FIG. 7(b) is a schematic diagram illustrating matching between a current string and a matching string when the matching strings are arranged in a bottom-up order according to the present application;
FIG. 7(c) is a schematic diagram of the matching between the current string and the matching string when the matching strings provided by the present application are arranged in the order from top to bottom;
FIG. 8(a) is a schematic diagram of the matching of the current string and the matching string when the matching strings provided by the present application are arranged in the reverse order of the scanning manner of the current string;
FIG. 8(b) is a schematic diagram illustrating the matching between the current string and the matching string when the matching strings provided by the present application are arranged in the vertical direction;
FIG. 8(c) is another schematic diagram of the matching of the current string and the matching string when the matching strings provided by the present application are arranged in the vertical direction;
fig. 9(a) is a schematic diagram of matching between a current string and a matching string when the matching strings provided by the present application are arranged in a direction inclined to the upper left;
FIG. 9(b) is a schematic diagram illustrating the matching between the current string and the matching string when the matching strings provided by the present application are arranged in a direction inclined to the right and downward;
FIG. 9(c) is a schematic diagram illustrating the matching between the current string and the matching string when the matching strings provided by the present application are arranged in a direction inclined to the left and downward;
FIG. 9(d) is a schematic diagram illustrating the matching between the current string and the matching string when the matching strings provided by the present application are arranged in a direction inclined to the upper right;
FIG. 10 is a flow chart illustrating a method for encoding based on string prediction according to another embodiment of the present application;
fig. 11(a) is a schematic diagram of flipping a matching string in a first flipping manner provided in the present application;
fig. 11(b) is another schematic diagram of flipping the matching string in the first flipping manner provided in the present application;
FIG. 11(c) is yet another schematic diagram of flipping the matching string in the first flipping manner provided in the present application;
FIG. 12 is a schematic diagram of a matching string flipped in a second flipping manner according to the present application;
FIG. 13 is a schematic diagram of a matching string flipped in a third flipping manner according to the present application;
FIG. 14 is a schematic diagram of the matching strings provided herein aligned in a vertical direction and flipped;
FIG. 15 is a schematic block diagram of an embodiment of an encoder provided in the present application;
FIG. 16 is a block diagram of an embodiment of a decoder provided herein;
FIG. 17 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art from the given embodiments without creative effort shall fall within the protection scope of the present application.
The color coding methods most commonly used in video coding are YUV (Y denotes luminance, i.e. the gray value of the image, while U and V denote chrominance, describing the color and saturation of the image) and RGB (red, green, blue). The present application uses YUV: each Y luminance block corresponds to one U chrominance block and one V chrominance block, and each chrominance block corresponds to exactly one luminance block. Taking the 4:2:0 sampling format as an example, the luminance block corresponding to an N x M block has size N x M, the two corresponding chrominance blocks both have size (N/2) x (M/2), and each chrominance block is therefore 1/4 the size of the luminance block.
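As a sketch of the size relation above (assuming the 4:2:0 format with even luma dimensions; the function name is an illustrative choice):

```python
def chroma_block_size(luma_w, luma_h):
    """For the 4:2:0 sampling format, each chroma (U or V) block is
    subsampled by 2 in both dimensions, so its area is 1/4 of the
    corresponding luma block's area."""
    return luma_w // 2, luma_h // 2

# A 16x8 luma block has two 8x4 chroma blocks.
```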
During video coding, a number of image frames are input. When each frame is encoded, it is first divided into a plurality of Largest Coding Units (LCUs), and each LCU is then further divided into Coding Units (CUs) of different sizes; video coding is performed with the CU as the basic unit.
The present application is related to the SP technology, and therefore the SP technology is introduced to facilitate understanding of the technical solutions adopted in the present application.
SP is an independent prediction technology targeting the case where different areas of an image contain the same content. The SP prediction technique has two modes: a normal SP prediction mode, and an equal-value string and unit-basis-vector string prediction mode.
The present application concerns the normal SP prediction mode, which is therefore introduced here. n consecutive pixels (n >= 1) in the current block are called a string; the current block may contain several different strings, and each string has its own String Vector (SV) pointing to previously encoded pixels (i.e. reference pixels) in the spatial domain. If some pixels in the current block cannot be placed into any string, their pixel values are encoded directly. A schematic diagram of string matching is shown in fig. 1, where different padding types represent three strings (the pixel values within one string are not necessarily equal), the corresponding string vectors are SV1-SV3, and the pixels in the bottom right corner of the current block do not belong to any string and are encoded directly.
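The SV-based copying of reference pixels described above can be sketched as follows; the frame layout, (x, y) coordinate convention, and function name are illustrative assumptions rather than the patent's actual implementation:

```python
def predict_string(frame, positions, sv):
    """Predict each pixel of a string by copying the previously encoded
    pixel that its string vector (SV) points to.  `frame` is a 2-D list
    indexed as frame[y][x], `positions` are the (x, y) coordinates of the
    string's pixels, and `sv` is the (dx, dy) offset to the reference
    pixels.  A simplified sketch of normal SP prediction."""
    dx, dy = sv
    return [frame[y + dy][x + dx] for (x, y) in positions]
```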
The detailed steps of the SP prediction technique are as follows:
(1) application conditions of SP technique
The SP technique can only be used for luminance blocks whose width and height are both greater than or equal to 4 and less than or equal to 32. For some small blocks, if further division would produce chrominance blocks with a side length less than 4, the chrominance blocks of these small blocks are not divided further, and these small blocks do not use the SP mode but use the traditional intra prediction mode instead.
The pixel-level string prediction method also has the following limitations:
A. The positions of the reference pixels are limited to the areas of the left neighboring LCU and the current LCU, where LCUs are evenly divided with a size of 64 x 64, and all pixels of one reference string must come from the same area.
B. The number of strings allowed per CU (counting both matched strings and unmatched pixels) must not exceed 1/4 of the number of pixels in the current CU.
C. The number of pixels in each string must be a multiple of 4.
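The applicability condition and the per-CU constraints B and C above can be sketched like this (a simplified check with assumed function names; constraint B is modeled as matched-string count plus unmatched-pixel count):

```python
def sp_allowed(width, height):
    """SP applies only to luminance blocks whose width and height are
    both between 4 and 32, per the size condition stated above."""
    return 4 <= width <= 32 and 4 <= height <= 32

def strings_valid(string_lengths, num_unmatched, cu_width, cu_height):
    """Check constraints B and C: the string count (matched strings plus
    unmatched pixels) must not exceed 1/4 of the CU's pixel count, and
    each string length must be a multiple of 4."""
    if len(string_lengths) + num_unmatched > (cu_width * cu_height) // 4:
        return False
    return all(n % 4 == 0 for n in string_lengths)
```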
(2) Motion estimation
The purpose of motion estimation is to find the best motion vector of the current block. It is obtained by two methods, CU-level motion search and pixel-level motion search: CU-level motion search is performed first, and if no matching block is found, pixel-level motion search is performed.
1. CU-level motion search
[a] Setting search range
The search proceeds horizontally to the left and vertically upward from the top-left corner of the current block (the black dot in fig. 2); the search may not go above the upper boundary of the current LCU, and its left boundary may not go beyond the left boundary of the left neighboring LCU.
In addition to searching for CU-level matching blocks in the above two search directions, the blocks pointed to by the motion vectors in the History-based Block Vector Prediction (HBVP) list are also taken as matching blocks for cost calculation.
[b] Selection of matching blocks
For the luminance block, a block of the same size as the current block is searched for within the search range; a searched block must be an already encoded block. The search is performed in the vertical and horizontal directions, the Sum of Absolute Differences (SAD) between the current block and each searched block is computed, at most 8 blocks with the smallest SADs are selected, and an SAD list and a list of the coordinate differences (denoted SV) between the top-left corner of each searched block and the top-left corner of the current block are recorded in order of increasing SAD.
For the chrominance block: if the luminance difference list of top-left-corner coordinates between searched blocks and the current block exists, and the SAD corresponding to the first difference in the list is less than or equal to 32, then the SAD list and the difference list are traversed, the total SAD over the three components is recalculated, and the difference corresponding to the minimum total SAD is used as the coordinate difference between the current block and the matching block, i.e. as the SV pointing to the matching block of the current block. Otherwise, the first difference in the list recorded for the luminance SAD is used as the coordinate difference between the current block and the matching block, i.e. as the SV pointing to the matching block of the current block.
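The luminance-side search above (computing SADs and keeping at most 8 candidates in increasing order) can be sketched as follows; the block representation and function names are illustrative assumptions:

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def best_candidates(current, candidates, keep=8):
    """Rank candidate blocks by SAD against the current block and keep at
    most `keep` of them, smallest SAD first, mirroring the SAD/SV lists
    described above.  Each candidate is an (sv, block) pair; the result is
    a list of (sad, sv) pairs."""
    scored = sorted((sad(current, blk), sv) for sv, blk in candidates)
    return scored[:keep]
```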
[c] Obtaining a reconstructed block
The reconstructed block of the current block is the matching block pointed to by the SV. The rate-distortion cost (rdcost) of the current block against the reconstructed block is then computed using the Sum of Squared Errors (SSE), where the bits in the rdcost are the bits consumed by the coordinate difference between the current block and the matching block.
[d] Information to be saved
If the rdcost is smaller than that in the previous best mode, the matching flag (flag) in the mode, the coordinate difference between the current block and the matching block, the number of matched pixels, and the type of the matching string need to be saved.
2. Pixel level motion search
[a] Pixel scanning mode
The current scanning mode is a horizontal arcuate scanning mode, as shown in fig. 3.
[b] Constructing motion search string motion vector candidates
When scanning in the horizontal direction, the motion search candidates are constructed in the following order: first the vertical direction, corresponding to motion vector (0, -1); then the horizontal direction, corresponding to motion vector (-1, 0); then the best motion vector (mv) of the Intra Block Copy (IBC) mode; then the historical motion vectors (up to 12); and finally the motion vectors of all reconstructed pixels within the search range whose hash value equals that of the current pixel (the motion vector of the most recently reconstructed pixel is placed first and that of the earliest reconstructed pixel last), where the search range does not exceed the current LCU and the left neighboring LCU.
[c] Selection of matching strings
With the scan mode determined, each motion search candidate is traversed for the luminance component. When a motion search candidate is evaluated, the current pixel and its subsequent consecutive pixels are matched using that candidate. A threshold related to the Quantization Parameter (QP) is set, where the larger the QP, the larger the threshold; if the differences between a number of consecutive original pixel values and the original values of the corresponding reference pixels are all smaller than the threshold, the number of those consecutive pixels (i.e. the string length) under this motion search candidate is recorded, as shown in fig. 4.
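The threshold-based run matching above can be sketched like this; the exact threshold formula is not given in the text (only that it grows with the QP), so the one used here is an assumption:

```python
def match_run_length(current, reference, qp):
    """Count how many consecutive pixels of `current` match the
    corresponding pixels of `reference` under a QP-dependent threshold.
    threshold = qp is an assumed placeholder; the text only states that
    a larger QP implies a larger threshold."""
    threshold = qp
    run = 0
    for cur, ref in zip(current, reference):
        if abs(cur - ref) < threshold:
            run += 1
        else:
            break
    return run
```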
The best motion search candidate is selected as follows:
I. Rough selection by string-length comparison: if the string length under the current motion search candidate is greater than or equal to the string length under the previous best candidate minus 1, the candidate proceeds to fine selection.
II. Fine selection by string length and cost: the current motion search candidate becomes the best if any of the following three conditions is met:
i. The string length under the current motion search is greater than the string length under the previous best motion search.
ii. The string length under the current motion search equals the string length under the previous best motion search, the current string length is not 0, and the current string cost is less than the string cost under the previous best motion search.
iii. The string length under the current motion search equals the string length under the previous best motion search minus 1, the current string length is not 0, and the average cost per pixel in the current string is less than the average cost per pixel in the string under the previous best motion search.
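The three fine-selection conditions can be sketched directly (costs are taken as per-string totals, with per-pixel averages derived from them; an illustrative sketch with assumed names, not the codec's actual code):

```python
def is_better_candidate(cur_len, cur_cost, best_len, best_cost):
    """Return True if the current motion search candidate should replace
    the previous best, per conditions i-iii above."""
    if cur_len > best_len:                                       # condition i
        return True
    if cur_len == best_len and cur_len != 0 and cur_cost < best_cost:
        return True                                              # condition ii
    if (cur_len == best_len - 1 and cur_len != 0
            and cur_cost / cur_len < best_cost / best_len):      # condition iii
        return True
    return False
```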
It should be understood that, during the search, a reference pixel cannot lie outside the image boundary and must be an already encoded pixel. For the chrominance components, the best motion search candidate is used directly, without participating in the candidate traversal. If no matching pixel is found for the current pixel, its pixel value is encoded directly.
[d] Obtaining a reconstructed block
The current block may contain multiple strings as well as pixels not belonging to any string. For the pixels in strings, the reconstructed value is the pixel value (i.e. the predictor) of the matching string; for the unmatched pixels, the reconstructed value is the original pixel value. The rdcost of the current block against the reconstructed block is then computed using the SSE.
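The SSE and the rdcost combination mentioned above can be sketched as follows; the D + lambda * R weighting is the conventional rate-distortion formulation and is an assumption here, since the text only says the cost combines the SSE with the consumed bits:

```python
def sse(block_a, block_b):
    """Sum of squared errors between two equally sized 2-D blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def rdcost(distortion, bits, lam):
    """Rate-distortion cost in the classic D + lambda * R form (the
    lambda weighting is assumed, not stated in the text)."""
    return distortion + lam * bits
```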
[e] Information to be saved
If this rdcost is smaller than that of the previous best mode, the matching flag of this mode, the coordinate difference between each pixel string in the current block and its matching string, the number of pixels in each matching string, and the type of each matching string (whether it is the last string of the current block) need to be saved.
3. Motion compensation
The reconstructed value in this mode equals the reference value, so no transform or quantization operation is required.
4. Syntax elements
A syntax element is a flag that indicates parameters configured by the encoder, the coding techniques and methods used, and so on. Syntax elements must be encoded at the encoding end: a specific coding mode converts the value of each syntax element into a machine-readable string of '0's and '1's, which is written into the bitstream and transmitted to the decoding end. By parsing the meaning of the syntax elements read from the coded bits, the decoding end learns the encoding information of the encoding end and performs the corresponding operations. For example, the syntax element SPCuFlag indicates whether the current CU uses the SP technique: SPCuFlag equal to 1 means SP is used, and SPCuFlag equal to 0 means it is not. During encoding, the value of SPCuFlag is written into the bitstream and transmitted to the decoding end; by parsing it, the decoding end knows whether the encoding end used the SP technique and, if so, performs the SP-related decoding operations.
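The SPCuFlag behavior described above can be sketched as a single-bit flag; real codecs entropy-code such flags, so this is only an illustration with assumed function names:

```python
def encode_sp_cu_flag(use_sp):
    """Encode SPCuFlag as one bit: '1' means the current CU uses the SP
    technique, '0' means it does not (simplified; real codecs
    entropy-code syntax elements)."""
    return '1' if use_sp else '0'

def decode_sp_cu_flag(bit):
    """Parse SPCuFlag from the bitstream: True means the decoder must
    perform the SP-related decoding operations."""
    return bit == '1'
```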
There is no pixel residual information in SP mode; instead, a string matching mode flag, a flag indicating whether a matching string was found, the string length, the string motion vector (or its index in the history candidate list), or the values of unmatched pixels must be transmitted to represent all the information required for the motion search of the current block.
Referring to fig. 5, fig. 5 is a flowchart illustrating an embodiment of a method for encoding based on string prediction according to the present application, the method including:
step 11: and dividing pixels of each coding unit in the image to be coded to obtain a plurality of current strings.
Before encoding, each frame of the image to be encoded is partitioned into blocks, i.e. divided into a plurality of coding units; several consecutive pixels in each coding unit are then grouped into a string, yielding a plurality of current strings.
Step 12: after the reference pixel of the initial pixel in the current string is found, the rest reference pixels are found along a plurality of preset directions to obtain a corresponding matching string; selecting one matching string from all matching strings as the best matching string; and/or after finding out the matching string matched with the current string, adjusting the position of the pixel in the matching string to generate a new matching string; and selecting the best matching string from the new matching string and the matching string.
The matching string consists of the reference pixel and the remaining reference pixels. This embodiment provides three schemes for improving the coding efficiency and coding performance, as follows:
① Multi-directional string prediction. After the reference pixel matching the first pixel of the current string is found, the matching string is not restricted to the same direction as the scanning direction of the current string; instead, it is searched for along a plurality of preset directions, including the horizontal direction identical to the scanning direction of the current string, the horizontal direction opposite to it, the vertical direction, and diagonal directions. After the matching string for each direction is found, the one with the highest matching degree with the current string is selected from all the matching strings as the best matching string, thereby realizing the prediction of the current string.
It should be understood that the preset directions are not limited to those provided in this embodiment and may be other reasonable directions, such as a direction at plus or minus 30 degrees to the horizontal right direction. It is also not necessary to use all the preset directions provided in this embodiment; they can be adjusted according to the actual application scenario, for example, searching for matching strings only along the horizontal and vertical directions and not along the diagonal directions.
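The multi-directional search described above can be sketched as follows: one candidate matching string is collected per preset direction, starting from the reference pixel of the string's first pixel, and the candidate closest to the current string is kept. The 1-D pixel strings, the SAD criterion for "matching degree", and the function names are illustrative assumptions:

```python
def gather_matching_strings(frame, start_ref, length, directions):
    """Walk from the first reference pixel `start_ref` = (x, y) along each
    preset direction to collect one candidate matching string per
    direction.  Directions are (dx, dy) unit steps, e.g. (1, 0)
    horizontal, (0, 1) vertical, (1, 1) diagonal.  Candidates that would
    leave the frame are skipped."""
    x0, y0 = start_ref
    candidates = {}
    for dx, dy in directions:
        coords = [(x0 + i * dx, y0 + i * dy) for i in range(length)]
        if all(0 <= x < len(frame[0]) and 0 <= y < len(frame)
               for x, y in coords):
            candidates[(dx, dy)] = [frame[y][x] for x, y in coords]
    return candidates

def best_matching_string(current, candidates):
    """Pick the (direction, string) pair whose string has the smallest
    sum of absolute differences against the current string."""
    return min(candidates.items(),
               key=lambda kv: sum(abs(a - b)
                                  for a, b in zip(current, kv[1])))
```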
② Adjusting the pixel positions inside the matching string. After the matching string of the current string is found according to the horizontal arcuate scanning scheme, the matching string is modified, i.e. the positions of its pixels are adjusted; each adjustment generates a new string (recorded as a new matching string). The matching degree between the original matching string and the current string, and between each new matching string and the current string, can then be computed and compared, and one of the original matching string and all new matching strings is selected as the string that best matches the current string, thereby realizing the prediction of the current string.
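The pixel-position adjustment described above can be sketched with a simple reversal ("flip") of the matching string, one of the adjustments the later embodiments illustrate (figs. 11-14); the SAD criterion for matching degree and the function names are assumptions:

```python
def candidate_strings(matching):
    """Generate new matching strings by adjusting pixel positions inside
    a found matching string; only a reversal ("flip") is sketched here,
    one of several possible adjustments."""
    return [matching, list(reversed(matching))]

def pick_best(current, candidates):
    """Select, among the original matching string and its adjusted
    variants, the one closest to the current string (smallest sum of
    absolute differences)."""
    return min(candidates,
               key=lambda c: sum(abs(a - b) for a, b in zip(current, c)))
```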
③ Combining scheme ① with scheme ②: coding efficiency is improved both by multi-directional string prediction and by adjusting the pixel positions inside the matching string.
In one embodiment, scheme ② may be executed within the execution of scheme ①: after the matching string in each preset direction is found, the positions of its pixels are adjusted to generate new matching strings, and the string among the matching string and the new matching strings with the highest matching degree with the current string is determined and recorded as a first matching string. This operation is repeated until the first matching strings for all preset directions are obtained; the first matching string with the highest matching degree with the current string is then recorded as the best matching string.
In another embodiment, the second scheme is executed after the first scheme: after all the matching strings in the preset directions are found, their matching degrees with the current string are calculated, and the best-matching one is recorded as a second matching string. The pixel positions of each matching string in the preset directions are then adjusted to generate new matching strings, and the one among all the new matching strings that best matches the current string is recorded as a third matching string. Finally, the second matching string and the third matching string are compared, and whichever has the higher matching degree with the current string is recorded as the best matching string.
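The two combination strategies above can be sketched as follows (illustrative Python only; `find_match`, `adjust`, and `cost` are hypothetical callbacks standing in for the directional search, the pixel-position adjustment, and the matching-degree calculation, none of which are specified as code in the original):

```python
def best_match_per_direction(current, directions, find_match, adjust, cost):
    """Strategy 1: adjust pixel positions inside each directional match,
    keep the best candidate per direction (the 'first matching strings'),
    then pick the overall winner."""
    first_matches = []
    for d in directions:
        m = find_match(current, d)                 # matching string in direction d
        candidates = [m] + adjust(m)               # m plus its adjusted variants
        first_matches.append(min(candidates, key=lambda c: cost(current, c)))
    return min(first_matches, key=lambda c: cost(current, c))

def best_match_two_stage(current, directions, find_match, adjust, cost):
    """Strategy 2: pick the best unadjusted match (second matching string),
    pick the best adjusted match (third matching string), then compare."""
    matches = [find_match(current, d) for d in directions]
    second = min(matches, key=lambda c: cost(current, c))
    adjusted = [a for m in matches for a in adjust(m)]
    third = min(adjusted, key=lambda c: cost(current, c))
    return min([second, third], key=lambda c: cost(current, c))
```

Both strategies examine the same candidate set, so they return the same winner; they differ only in when the adjusted variants are compared.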
The present embodiment provides a novel encoding method based on the SP technology, which adopts multi-directional string prediction and/or adjusts the pixel positions inside the matching string; the two technical schemes can be used independently or combined. Because a best matching string that matches the current string more closely can be found, the compression rate of the encoding is increased, and the encoding efficiency and encoding performance are improved.
Referring to fig. 6, fig. 6 is a schematic flowchart illustrating another embodiment of a method for encoding based on string prediction according to the present application, the method including:
Step 21: divide the pixels of each coding unit in the image to be encoded to obtain a plurality of current strings.
Step 21 is the same as step 11 in the above embodiment, and is not described again.
Step 22: after the reference pixel of the starting pixel in the current string is found, find the remaining reference pixels along a plurality of preset directions to obtain the corresponding matching strings.
In the existing scheme, because a horizontal bow-shaped scanning mode is adopted for the coding blocks, the current strings are all horizontal, and the matching strings they reference have the same shape as the current strings, i.e. they are also horizontal strings.
To find a string that matches the current string more closely, after the reference pixel of the starting point of the current string (i.e., the starting point of the matching string) is found, the reference pixels of the subsequent pixels in the current string may be arranged not only along the horizontal direction adopted by the current string, but also along the opposite horizontal direction, the vertical direction, or a diagonal direction.
Further, the matching string must satisfy preset conditions: the pixels along the preset direction must all be reconstructed pixels, and when the current string crosses rows (i.e., the current string is a cross-row string), the matching string must turn 90 degrees at the corner. That is, the directional arrangement of the newly added matching strings must satisfy the following rules:
(1) All pixels of the matching string along the preset direction must be reconstructed pixels.
(2) For cross-row strings, the matching string must turn 90 degrees at the corners.
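A minimal sketch of screening a candidate matching string against these two rules (assumed conventions: pixel positions are (row, column) tuples and `reconstructed` is the set of positions already reconstructed; the original prescribes no implementation):

```python
def is_valid_match(positions, reconstructed):
    """Rule (1): every pixel of the candidate matching string must already
    be a reconstructed pixel before it can serve as a reference."""
    return all(p in reconstructed for p in positions)

def corner_is_90_degrees(a, b, c):
    """Rule (2): at a corner of a cross-row matching string, the incoming
    and outgoing step vectors must be perpendicular (zero dot product)."""
    v1 = (b[0] - a[0], b[1] - a[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    return v1[0] * v2[0] + v1[1] * v2[1] == 0
```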
In one embodiment, the scan direction of the current string includes left to right, right to left, vertically up, or vertically down, and the scan direction of the matching string includes left to right, right to left, vertically up, or vertically down.
For example, as shown in figs. 7 (a)-7 (c), a current string with a length of 4 is predicted and scanned from left to right; the matching string may be arranged along the horizontal left direction (i.e., from right to left), as shown in fig. 7 (a); arranged vertically upward, as shown in fig. 7 (b); or arranged vertically downward, as shown in fig. 7 (c).
In another embodiment, the current string includes a first pixel sub-string and a second pixel sub-string located on the row below the first pixel sub-string; either the first pixel sub-string is scanned from left to right and the second from right to left, or the first is scanned from right to left and the second from left to right. The matching string includes a first matching sub-string and a second matching sub-string, and its scanning direction is one of the following: the first matching sub-string from right to left and the second from left to right; the first matching sub-string from bottom to top and the second from top to bottom; the first matching sub-string from top to bottom and the second from bottom to top; or the same scanning direction as the current string.
For example, as shown in figs. 8 (a)-8 (c), a cross-row current string of length 7 is predicted; it is first scanned from left to right and then, after turning to the next row, from right to left, and it includes a first pixel sub-string L1 and a second pixel sub-string L2. The matching string includes a first matching sub-string S1 and a second matching sub-string S2, and may be arranged in the direction opposite to the arrangement direction of the current string, as shown in fig. 8 (a); with S1 arranged vertically upward and S2 vertically downward, as shown in fig. 8 (b); or with S1 arranged vertically downward and S2 vertically upward, as shown in fig. 8 (c).
It is understood that the number of rows of the current string is not limited to one row or two rows, and may be three or more rows, and the scanning manner of the corresponding matching string is similar to that of two rows, and is not described herein again.
In yet another embodiment, the diagonal directions include a first diagonal direction and a second diagonal direction. The scanning direction of the current string is from left to right, and the scanning direction of the matching string is from bottom to top along the first diagonal direction, from top to bottom along the first diagonal direction, from bottom to top along the second diagonal direction, or from top to bottom along the second diagonal direction. Specifically, the first diagonal direction is perpendicular to the second diagonal direction: either the first diagonal direction forms an angle of 45° with the horizontal right direction and the second forms an angle of 135°, or the first forms an angle of 135° and the second forms an angle of 45°.
For example, a current string with a length of 4 is predicted and scanned from left to right; the matching string may be arranged diagonally: up and to the left, as shown in fig. 9 (a); down and to the right, as shown in fig. 9 (b); down and to the left, as shown in fig. 9 (c); or up and to the right, as shown in fig. 9 (d).
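The preset scan directions can be represented as step vectors; the sketch below enumerates the reference-pixel positions of a matching string given its starting point (Python illustration; the direction names and the (row, column) convention with rows growing downward are assumptions, not part of the original):

```python
# Step vectors (d_row, d_col) for the preset scan directions; rows grow downward.
DIRECTION_STEPS = {
    "left_to_right":   (0, 1),
    "right_to_left":   (0, -1),
    "vertical_down":   (1, 0),
    "vertical_up":     (-1, 0),
    "diag_up_right":   (-1, 1),   # 45 degrees to the horizontal right direction
    "diag_down_left":  (1, -1),
    "diag_up_left":    (-1, -1),  # 135 degrees to the horizontal right direction
    "diag_down_right": (1, 1),
}

def matching_string_positions(start, direction, length):
    """Enumerate the reference-pixel positions of a matching string that
    begins at `start` and extends `length` pixels along `direction`."""
    dr, dc = DIRECTION_STEPS[direction]
    return [(start[0] + i * dr, start[1] + i * dc) for i in range(length)]
```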
Step 23: calculate the cost between the matching string corresponding to each preset direction and the current string, and take the matching string with the minimum cost as the best matching string.
After the matching string corresponding to each preset direction is found, the matching degree can be calculated, and the matching string with the highest matching degree with the current string is taken as the best matching string; specifically, the cost between each matching string and the current string can be calculated, and the matching string with the minimum cost is determined as the string with the highest matching degree with the current string; since the best matching string has already been encoded, the current string can be predicted by directly using the encoded data of the best matching string.
Referring to fig. 10, fig. 10 is a flowchart illustrating a method for encoding based on string prediction according to another embodiment of the present application, the method including:
step 31: and dividing pixels of each coding unit in the image to be coded to obtain a plurality of current strings.
Step 31 is the same as step 11 in the above embodiment, and will not be described again.
Step 32: after the matching string matched with the current string is found, flip the pixels in the matching string according to a preset flipping mode to generate a new matching string.
To obtain the best matching string more accurately, in each search process the positions of the pixels inside the found matching string can be adjusted, without changing the reference traversal order between the pixels of the current string and of the matching string. For example, assume the current string includes four consecutive pixels C1-C4, the matching string includes four consecutive pixels D1-D4, pixels C1-C4 correspond to pixels D1-D4 respectively, and the reference traversal order is from left to right. After the position adjustment, the pixels in the matching string become D4, D3, D2 and D1; pixel C1 then corresponds to pixel D4, pixel C2 to pixel D3, pixel C3 to pixel D2, and pixel C4 to pixel D1, and the reference traversal order is still from left to right.
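The C1-C4 / D1-D4 example can be expressed in a few lines; the point is that the traversal order of the current string stays fixed while the adjustment only re-pairs the reference pixels (a sketch, not the patent's implementation):

```python
def predict(current, matching):
    """Pair current-string pixels with matching-string pixels in the fixed
    left-to-right reference traversal order."""
    return {c: m for c, m in zip(current, matching)}

current  = ["C1", "C2", "C3", "C4"]
matching = ["D1", "D2", "D3", "D4"]
adjusted = matching[::-1]            # positions adjusted: D4, D3, D2, D1

before = predict(current, matching)  # C1 -> D1, ..., C4 -> D4
after  = predict(current, adjusted)  # C1 -> D4, ..., C4 -> D1
```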
Further, the preset flipping mode includes a first flipping mode, a second flipping mode, or a third flipping mode.
1) In the first flipping mode, at least one pixel in the matching string is selected as a flip axis; the pixels to the right of the flip axis are flipped to its left, and/or the pixels to its left are flipped to its right, i.e. the pixels in the row where the flip axis is located are rotated 180 degrees. The pixels of each row in the matching string are flipped according to the flip axis of that row, i.e. a flip axis is only effective for its own row. In addition, when multiple flip axes exist in the same row, they must be consecutive.
In a specific embodiment, assuming the matching string has a length of 5 and the flip axis is the second pixel from the right, the situations before and after flipping are shown in fig. 11 (a); or, with the flip axes at the second and third pixels from the right, the situations before and after flipping are shown in fig. 11 (b).
In another specific embodiment, assuming that the matching string has a length of 7 and spans rows, two pixels in the middle of the first row are selected as the flip axis, and one pixel in the middle of the second row is selected as the flip axis, and the conditions before and after flipping are as shown in fig. 11 (c).
2) The second flipping mode is generally the same as the first, except that several non-consecutive pixels in the same row of the matching string are selected as flip axes; of the pixels between two adjacent flip axes, one part is flipped according to the axis that comes earlier in the scanning order and the other part according to the axis that comes later in the scanning order.
In a specific embodiment, assuming that the matching string is 7 in length and there are two non-consecutive flip axes, as shown in FIG. 12, flip axis 1 is located at the second pixel from left to right and flip axis 2 is located at the second pixel from right to left, for the three pixels E1-E3 between the two flip axes, the left two pixels E1-E2 flip based on flip axis 1 and the right one pixel E3 flip based on flip axis 2.
3) In the third flipping mode, the i-th pixel of the matching string (1 ≤ i ≤ m, where m is the number of pixels in the matching string) becomes the j-th pixel of the new matching string (1 ≤ j ≤ m), with i + j = m + 1. That is, the order of the pixels in the matching string is completely reversed: the last pixel of the matching string becomes the first pixel of the new matching string, the second-to-last pixel becomes the second pixel of the new matching string, and so on, until the first pixel of the matching string becomes the last pixel of the new matching string.
In a specific embodiment, assuming the matching string has a length of 5 and spans rows, it is flipped in the third flipping mode; the situations before and after flipping are shown in fig. 13.
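The third flipping mode is the only one whose mapping is fully determined by the text (i + j = m + 1, a complete reversal); a minimal check in Python:

```python
def third_flip(matching):
    """Third flipping mode: the i-th pixel of the matching string becomes
    the j-th pixel of the new matching string, with i + j = m + 1."""
    return matching[::-1]

pixels = [10, 20, 30, 40, 50]
flipped = third_flip(pixels)
m = len(pixels)
# Verify the i + j = m + 1 relation for every 1-based index i.
assert all(flipped[(m + 1 - i) - 1] == pixels[i - 1] for i in range(1, m + 1))
```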
It can be understood that the adjustment modes are not limited to the above three; the adjustment mode may be designed according to the specific application requirements, as long as the positions of the pixels are adjusted. For example, every pixel in turn may be tried as the flip axis.
Step 33: calculate the cost between the matching string and the current string (recorded as the first cost), and calculate the cost between the new matching string and the current string (recorded as the second cost).
After the matching string is flipped to generate a new matching string, the cost between the current string and the matching string (denoted as the first cost) can be calculated, and the cost between the current string and the new matching string (denoted as the second cost) can be calculated.
Step 34: judge whether the first cost is less than the second cost.
To obtain the least costly matching string, the first cost is compared to the second cost.
Step 35: if the first cost is greater than or equal to the second cost, take the new matching string as the best matching string.
If the first cost is greater than or equal to the second cost, the cost between the new matching string and the current string is smaller (or equal), so the new matching string is taken as the string that best matches the current string.
Step 36: if the first cost is less than the second cost, take the matching string as the best matching string.
If the first cost is less than the second cost, the cost between the matching string and the current string is smaller, so the matching string is taken as the string that best matches the current string.
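Steps 33-36 can be sketched as follows; the sum of absolute differences is used as a plausible stand-in for the cost, since the text does not fix a metric, and the flipped string wins ties per step 35:

```python
def sad(current, candidate):
    """Sum of absolute differences: one plausible cost between the current
    string and a candidate matching string (the metric is an assumption)."""
    return sum(abs(c - m) for c, m in zip(current, candidate))

def choose_best(current, matching):
    """Steps 33-36: compare the matching string against its flipped variant
    and keep whichever has the smaller cost (ties go to the flipped string)."""
    flipped = matching[::-1]
    first_cost = sad(current, matching)    # step 33, first cost
    second_cost = sad(current, flipped)    # step 33, second cost
    return matching if first_cost < second_cost else flipped  # steps 34-36
```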
In other embodiments, the two schemes of multi-directional string prediction and pixel flipping of the matching string may be combined for prediction. As shown in fig. 14, the current string has a length of 5 and is a cross-row string; the matching string is arranged upward from its starting point, and the 2nd and 3rd pixels in the matching string are selected as the flip axes. The prediction traversal order of the pixels in the current string is unchanged, but the positions of the pixels in the matching string change, so the prediction values of at least some pixels of the current string change; whether the matching string is flipped is determined by comparing the cost values before and after flipping.
In other embodiments, a new syntax element may be added to identify the method adopted by the present application. For example, a syntax element F1 is set: if F1 is 0, the existing scheme is used for prediction; if F1 is 1, the scheme provided in this embodiment is used.
Further, if the scheme provided by the embodiment is adopted, a first syntax element may be added, where the first syntax element is an angular direction of the matching string, and is used to identify a scanning direction of the matching string; and/or adding a second syntax element, wherein the second syntax element is an adjustment type of the matching string and is used for identifying the turning mode of the matching string.
In a specific embodiment, if only the multi-directional string prediction scheme is adopted, only one syntax element needs to be added to indicate the scanning direction of the matching string; specifically, when the first syntax element is a first preset value, the scanning directions of pixels in the matching string and the current string are the same; when the first syntax element is a second preset value, the scanning direction of the pixels in the matched string is a horizontal direction opposite to the scanning direction of the current string; when the first syntax element is a third preset value, the scanning direction of the pixels in the matched string is vertical downward; when the first syntax element is a fourth preset value, the scanning direction of the pixels in the matched string is vertically upward; when the first syntax element is a fifth preset value, the scanning direction of the pixels in the matched string is from bottom to top along the first diagonal direction; when the first syntax element is a sixth preset value, the scanning direction of the pixels in the matching string is from top to bottom along the first diagonal direction; when the first syntax element is a seventh preset value, the scanning direction of the pixels in the matching string is from bottom to top along the second diagonal direction; when the first syntax element is an eighth preset value, the scanning direction of the pixels in the matching string is from top to bottom along the second diagonal direction. It is understood that the values of the first to eighth preset values are different.
For example, a syntax element ref_dir is added: ref_dir = 0 means the matching string and the current string are arranged in the same way; ref_dir = 1 means the matching string is arranged along the horizontal direction opposite to the scanning direction of the current string; ref_dir = 2 means the matching string is arranged vertically upward; and ref_dir = 3 means it is arranged vertically downward.
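The ref_dir values given above can be tabulated as follows; only codes 0-3 are stated in the text, and codes for the four diagonal directions (the fifth to eighth preset values) are deliberately omitted rather than guessed:

```python
# Mapping from the ref_dir syntax element to the matching string's
# arrangement, as enumerated in the text (codes 0-3 only).
REF_DIR = {
    0: "same arrangement as the current string",
    1: "horizontal, opposite to the current string",
    2: "vertical up",
    3: "vertical down",
}
```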
In another specific embodiment, if only the scheme of adjusting the pixel positions inside the matching string is adopted, only one syntax element needs to be added to indicate whether, and by which flipping mode, the matching string referenced by a current string is flipped. Specifically, when the second syntax element is a ninth preset value, the matching string is not flipped; when it is a tenth preset value, the matching string is flipped in the first flipping mode; when it is an eleventh preset value, in the second flipping mode; and when it is a twelfth preset value, in the third flipping mode. It is understood that the ninth to twelfth preset values are all different.
For example, a syntax element str_rev is added: str_rev = 0 means the matching string corresponding to the current string is not flipped, while str_rev = 1, 2 or 3 means it is flipped in the first, second or third flipping mode, respectively.
If the multi-directional string prediction scheme and the matching-string pixel position adjustment scheme are combined, two syntax elements need to be added: ref_dir indicates the scanning direction of the matching string, and str_rev indicates whether, and by which flipping mode, the matching string referenced by the current string is flipped. For example, when a current string is encoded, the syntax element ref_dir is transmitted first and the syntax element str_rev is transmitted afterwards; together they may indicate, for instance, that the matching string of the current string is arranged vertically upward and must be completely flipped before it can be used to predict the current string.
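When both schemes are combined, a decoder-side sketch of interpreting the two syntax elements might look like this (the concrete values for the worked example, ref_dir = 2 and str_rev = 3, are inferred from the earlier enumerations and are not stated explicitly in the text):

```python
REF_DIR = {0: "same arrangement as the current string",
           1: "horizontal, opposite to the current string",
           2: "vertical up",
           3: "vertical down"}
STR_REV = {0: "no flip",
           1: "first flipping mode",
           2: "second flipping mode",
           3: "third flipping mode"}

def decode_match_signalling(ref_dir, str_rev):
    """ref_dir is transmitted before str_rev; together they tell the decoder
    how the matching string is arranged and whether/how it is flipped."""
    return REF_DIR[ref_dir], STR_REV[str_rev]

# Worked example: a vertically-upward matching string that must be
# completely flipped (third flipping mode) -- assumed codes 2 and 3.
arrangement, flip = decode_match_signalling(2, 3)
```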
The present application provides a new encoding method based on the SP technology, relating to the prediction and syntax expression in the SP technology and mainly comprising three technical points: 1) multi-directional string prediction, in which matching strings in multiple directions participate in the prediction simultaneously; 2) adjustment of the pixel positions inside the matching string, including flipping the pixel positions in the matching string; 3) syntax element expression, including the identification of the method and the added syntax forms. Through multi-directional string prediction and adjustment of the pixel positions inside the matching string, a best matching string that more closely matches the current string can be found, improving the prediction accuracy and hence the compression rate of the encoding.
Referring to fig. 15, fig. 15 is a schematic structural diagram of an embodiment of an encoder provided in the present application, the encoder 150 includes a memory 151 and a processor 152 connected to each other, the memory 151 is used for storing a computer program, and the computer program is used for implementing the encoding method based on string prediction in the above embodiment when being executed by the processor 152.
Referring to fig. 16, fig. 16 is a schematic structural diagram of an embodiment of a decoder provided in the present application, in which the decoder 160 includes a processor 161, and the processor 161 is configured to decode an encoded data stream output by an encoder to obtain decoded image data, where the encoder is an encoder in the above embodiment.
Referring to fig. 17, fig. 17 is a schematic structural diagram of an embodiment of a computer-readable storage medium 170 provided by the present application, where the computer-readable storage medium 170 is used for storing a computer program 171, and the computer program 171, when executed by a processor, is used for implementing the encoding method based on string prediction in the foregoing embodiments.
The computer-readable storage medium 170 may be a server, a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of modules or units is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The above description is only an example of the present application, and is not intended to limit the scope of the present application, and all equivalent structures or equivalent processes performed by the present application and the contents of the attached drawings, which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (16)

1. A method for encoding based on string prediction, comprising:
dividing pixels of each coding unit in an image to be coded to obtain a plurality of current strings;
after the reference pixel of the initial pixel in the current string is found, the remaining reference pixels are found along a plurality of preset directions to obtain a corresponding matching string; selecting one matching string from all the matching strings as a best matching string; wherein the plurality of preset directions include a horizontal direction that is the same as a scanning direction of the current string, a horizontal direction that is opposite to the scanning direction of the current string, a vertical direction, or an oblique diagonal direction; and/or
After the matching string matched with the current string is found, adjusting the position of the pixel in the matching string to generate a new matching string; and selecting the best matching string from the new matching string and the matching string.
2. The string prediction-based encoding method according to claim 1, wherein the matching string satisfies a preset condition, the preset condition includes that all pixels in the preset direction are reconstructed pixels, and the matching string is 90 degrees at a corner when the current string crosses a row; the step of selecting one matching string from all the matching strings as a best matching string comprises:
and calculating the cost between the matching string corresponding to each preset direction and the current string, and taking the matching string with the minimum cost as the optimal matching string.
3. The string prediction-based encoding method according to claim 1,
the scanning direction of the current string comprises from left to right, from right to left, vertically upward or vertically downward, and the scanning direction of the matching string comprises from left to right, from right to left, vertically upward or vertically downward.
4. The string prediction-based encoding method according to claim 1,
the current string comprises a first pixel sub-string and a second pixel sub-string positioned on the next row of the first pixel sub-string, the scanning direction of the first pixel sub-string is from left to right and the scanning direction of the second pixel sub-string is from right to left, or the scanning direction of the first pixel sub-string is from right to left and the scanning direction of the second pixel sub-string is from left to right; the matching strings comprise a first matching sub string and a second matching sub string, the scanning direction of the first matching sub string is from right to left, and the scanning direction of the second matching sub string is from left to right; or the scanning direction of the first matching substring is from bottom to top, and the scanning direction of the second matching substring is from top to bottom; or the scanning direction of the first matching substring is from top to bottom, and the scanning direction of the second matching substring is from bottom to top.
5. The string prediction-based encoding method according to claim 1,
the diagonal directions include a first diagonal direction and a second diagonal direction, the scanning direction of the current string is from left to right, and the scanning direction of the matching string includes from bottom to top along the first diagonal direction, from top to bottom along the first diagonal direction, from bottom to top along the second diagonal direction, or from top to bottom along the second diagonal direction, wherein the first diagonal direction is perpendicular to the second diagonal direction.
6. The string prediction-based encoding method according to claim 1, wherein the step of selecting the best matching string from the new matching string and the matching string comprises:
calculating the cost between the matching string and the current string, and recording the cost as a first cost;
calculating the cost between the new matching string and the current string and recording the cost as a second cost;
judging whether the first cost is smaller than the second cost;
if not, taking the new matching string as the optimal matching string;
and if so, taking the matching string as the best matching string.
7. The string prediction-based encoding method according to claim 1, wherein the step of adjusting the positions of the pixels in the matching string comprises:
turning over the pixels in the matching strings according to a preset turning mode;
the preset turning mode comprises a first turning mode, a second turning mode or a third turning mode, and the pixels of each line in the matching string are turned according to the turning axis corresponding to the current line.
8. The string prediction-based encoding method according to claim 7,
the first turning mode is that at least one pixel in the matching string is selected as a turning shaft, the pixel on the right side of the turning shaft in the matching string is turned to the left side of the turning shaft, and/or the pixel on the left side of the turning shaft in the matching string is turned to the right side of the turning shaft; wherein when a plurality of the flip axes exist in the same row, the plurality of the flip axes are continuous.
9. The string prediction-based encoding method according to claim 7,
the second turning mode is that a plurality of discontinuous pixel points which are positioned in the same row in the matching string are selected as turning axes, one part of pixels between two adjacent turning axes is turned over according to the turning axis positioned at the front in the scanning sequence, and the other part of pixels is turned over according to the turning axis positioned at the back in the scanning sequence.
10. The string prediction-based encoding method according to claim 7,
the third inversion mode is to change the ith pixel of the matching string into the jth pixel of the new matching string, where i is greater than or equal to 1 and less than or equal to m, j is greater than or equal to 1 and less than or equal to m, i + j = m +1, and m is the number of pixels in the matching string.
11. The string prediction-based encoding method according to claim 1, wherein the method further comprises:
adding a first syntax element, wherein the first syntax element is used to identify a scan direction of the matching string; and/or
adding a second syntax element, wherein the second syntax element is used to identify the flip mode of the matching string.
12. The string prediction-based encoding method according to claim 11,
when the first syntax element is a first preset value, the scanning directions of the pixels in the matching string and the current string are the same; when the first syntax element is a second preset value, the scanning direction of the pixels in the matching string is a horizontal direction opposite to the scanning direction of the current string; when the first syntax element is a third preset value, the scanning direction of the pixels in the matching string is vertically downward; when the first syntax element is a fourth preset value, the scanning direction of the pixels in the matching string is vertically upward; when the first syntax element is a fifth preset value, the scanning direction of the pixels in the matching string is from bottom to top along a first diagonal direction; when the first syntax element is a sixth preset value, the scanning direction of the pixels in the matching string is from top to bottom along a first diagonal direction; when the first syntax element is a seventh preset value, the scanning direction of the pixels in the matching string is from bottom to top along a second diagonal direction; and when the first syntax element is an eighth preset value, the scanning direction of the pixels in the matching string is from top to bottom along a second diagonal direction.
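The eight cases of claim 12 amount to a lookup table from the first syntax element to a scan direction. In the Python sketch below, the numeric codes 0–7 are illustrative stand-ins for the first through eighth "preset values", which the claim does not pin to concrete numbers:

```python
# Illustrative mapping of the first syntax element to a scan direction.
# The codes 0-7 stand in for the eight preset values of claim 12, whose
# concrete numeric values the claim leaves unspecified.
SCAN_DIRECTION = {
    0: "same as current string",                 # first preset value
    1: "horizontal, opposite to current string",  # second preset value
    2: "vertical, downward",                      # third preset value
    3: "vertical, upward",                        # fourth preset value
    4: "first diagonal, bottom to top",           # fifth preset value
    5: "first diagonal, top to bottom",           # sixth preset value
    6: "second diagonal, bottom to top",          # seventh preset value
    7: "second diagonal, top to bottom",          # eighth preset value
}

def scan_direction(first_syntax_element):
    """Return the scan direction signalled by the first syntax element."""
    return SCAN_DIRECTION[first_syntax_element]
```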
13. The string prediction-based encoding method according to claim 11,
when the second syntax element is a ninth preset value, the matching string is not flipped; when the second syntax element is a tenth preset value, the matching string is flipped according to the first flip mode; when the second syntax element is an eleventh preset value, the matching string is flipped according to the second flip mode; and when the second syntax element is a twelfth preset value, the matching string is flipped according to the third flip mode.
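Claim 13 is likewise a four-entry lookup from the second syntax element to a flip ("turning") mode. The codes 0–3 below are illustrative stand-ins for the ninth through twelfth "preset values", which the claim does not fix:

```python
# Illustrative mapping of the second syntax element to a flip mode.
# The codes 0-3 stand in for the ninth..twelfth preset values of
# claim 13, whose concrete numeric values the claim leaves open.
FLIP_MODE = {
    0: "no flip",           # ninth preset value
    1: "first flip mode",   # tenth preset value
    2: "second flip mode",  # eleventh preset value
    3: "third flip mode",   # twelfth preset value
}

def flip_mode(second_syntax_element):
    """Return the flip mode signalled by the second syntax element."""
    return FLIP_MODE[second_syntax_element]
```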
14. An encoder, comprising a memory and a processor connected to each other, wherein the memory is configured to store a computer program which, when executed by the processor, implements the string prediction-based encoding method of any one of claims 1-13.
15. A decoder comprising a processor configured to decode an encoded data stream output by an encoder to obtain decoded image data, wherein the encoder is the encoder of claim 14.
16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the string prediction-based encoding method of any one of claims 1-13.
CN202110405425.0A 2020-12-06 2021-04-15 Encoding method, encoder, decoder and storage medium based on string prediction Active CN113347437B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202110405425.0A CN113347437B (en) 2021-04-15 2021-04-15 Encoding method, encoder, decoder and storage medium based on string prediction
PCT/CN2021/135831 WO2022117113A1 (en) 2020-12-06 2021-12-06 Systems and methods for video encoding
EP21900135.1A EP4241447A4 (en) 2020-12-06 2021-12-06 Systems and methods for video encoding
US18/328,758 US20230319287A1 (en) 2020-12-06 2023-06-04 Systems and methods for video encoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110405425.0A CN113347437B (en) 2021-04-15 2021-04-15 Encoding method, encoder, decoder and storage medium based on string prediction

Publications (2)

Publication Number Publication Date
CN113347437A CN113347437A (en) 2021-09-03
CN113347437B true CN113347437B (en) 2022-09-06

Family

ID=77468079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110405425.0A Active CN113347437B (en) 2020-12-06 2021-04-15 Encoding method, encoder, decoder and storage medium based on string prediction

Country Status (1)

Country Link
CN (1) CN113347437B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4241447A4 (en) * 2020-12-06 2024-04-03 Zhejiang Dahua Technology Co Systems and methods for video encoding

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104244007A (en) * 2013-06-13 2014-12-24 上海天荷电子信息有限公司 Image compression method and device based on arbitrary shape matching
CN112565749A (en) * 2020-12-06 2021-03-26 浙江大华技术股份有限公司 Video coding method, device and system and computer readable storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ATE524927T1 (en) * 2008-01-21 2011-09-15 Ericsson Telefon Ab L M IMAGE PROCESSING BASED ON PREDICTION
CN104378644B (en) * 2013-08-16 2020-12-04 上海天荷电子信息有限公司 Image compression method and device for fixed-width variable-length pixel sample string matching enhancement
JPWO2019142821A1 (en) * 2018-01-16 2021-01-28 株式会社ニコン Encoding device, decoding device, coding method, decoding method, coding program, and decoding program
CN111447454B (en) * 2020-03-30 2022-06-07 浙江大华技术股份有限公司 Coding method and related device
CN111866512B (en) * 2020-07-29 2022-02-22 腾讯科技(深圳)有限公司 Video decoding method, video encoding method, video decoding apparatus, video encoding apparatus, and storage medium
CN112565753B (en) * 2020-12-06 2022-08-16 浙江大华技术股份有限公司 Method and apparatus for determining motion vector difference, storage medium, and electronic apparatus


Also Published As

Publication number Publication date
CN113347437A (en) 2021-09-03

Similar Documents

Publication Publication Date Title
US7865027B2 (en) Method and apparatus for encoding and decoding image data
CN110892719A (en) Image encoding/decoding method and apparatus
JP2012170122A (en) Method for encoding image, and image coder
CN109379594B (en) Video coding compression method, device, equipment and medium
CN110662065A (en) Image data decoding method, image data decoding device, image data encoding method, and image data encoding device
JP4522199B2 (en) Image encoding apparatus, image processing apparatus, control method therefor, computer program, and computer-readable storage medium
US5491564A (en) Data compression method and apparatus for binary image using Markov model encoding
TWI833073B (en) Coding using intra-prediction
US8244033B2 (en) Image encoding apparatus, image decoding apparatus, and control method thereof
CN112218092A (en) Encoding method, apparatus and storage medium for string encoding technique
CN114287133A (en) Weighting factors for predictive sampling filtering in intra mode
CN105872539B (en) Image encoding method and apparatus, and image decoding method and apparatus
CN113347437B (en) Encoding method, encoder, decoder and storage medium based on string prediction
CN110312127B (en) Method for constructing most probable prediction mode list, image coding method and processing device
CN110213576B (en) Video encoding method, video encoding device, electronic device, and storage medium
WO2022116113A1 (en) Intra-frame prediction method and device, decoder, and encoder
CN110166772B (en) Method, device, equipment and readable storage medium for coding and decoding intra-frame prediction mode
CN113365080B (en) Encoding and decoding method, device and storage medium for string coding technology
CN112565760B (en) Encoding method, apparatus and storage medium for string encoding technique
JPH118848A (en) Image encoding method and device therefor
CN114097236A (en) Video coding and decoding method, device and storage medium
US20230319287A1 (en) Systems and methods for video encoding
CN105578191B (en) Index map primary colours selection method based on histogram sliding window
JP3327684B2 (en) Color image processing method and apparatus
JP3004272B2 (en) Color image coding method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant