CN110197513B - String matching data compression method based on adjustment threshold matching error

Info

Publication number: CN110197513B
Application number: CN201910393318.3A
Authority: CN
Inventors: 赵利平; 林涛; 沈士根; 胡珂立; 彭华
Original assignee: University of Shaoxing
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2019-05-13
Filing date: 2019-05-13
Publication date: 2023-04-28
Anticipated expiration: 2039-05-13
Also published as: CN110197513A

Abstract

The invention relates to a string matching data compression method based on adjustment of threshold matching errors, which comprises a coding string, a reference string, a set threshold and preset matching conditions, wherein the minimum matching unit of the coding string is a coding primitive, and the minimum matching unit of the reference string is a reference primitive; setting an initial threshold value and corresponding matching conditions, comparing whether the current coding primitive (or coding string) and the reference primitive (or reference string) meet the set matching conditions, adjusting the current threshold value after the current matching judgment is completed, and using the new adjusted threshold value to carry out matching judgment when the next coding primitive or coding string is matched. The invention considers the influence of the current matching error on the global error, and the global optimization processing is carried out for the matching of the subsequent primitives by adjusting the current threshold value, thereby obviously improving the coding efficiency.

Description

String matching data compression method based on adjustment threshold matching error

Technical field:

The invention relates to a method for lossy data compression, and in particular to a string matching data compression method based on adjustment of threshold matching errors.

Background:

In the 5G era, the internet data generated by emerging applications is diverse and highly variable. Developing an efficient data coding technology suited to the characteristics of such data has therefore become an urgent need. To further improve coding efficiency for these new types of data, the latest data coding techniques mainly fall into two groups: traditional coding frameworks that use "blocks" as the codec unit and are made more flexible through multi-tree, symmetric, or asymmetric block structures, and string matching algorithms that use "strings" as the codec unit.

The minimum matching unit in a string matching algorithm is called a primitive: a coding primitive is the minimum matching unit of the coding string, and a reference primitive is the minimum matching unit of the reference string. In a lossy string matching algorithm, deciding whether the current coding primitive matches the reference primitive is a key issue. In the prior art, lossy string matching algorithms mainly use a single fixed threshold: the absolute value of the difference between the current coding primitive and the reference primitive is compared with the preset threshold, and the current coding primitive is judged to match the reference primitive if that absolute difference is smaller than the threshold.

The method adopting a single fixed threshold in the prior art mainly has the following defects:

(1) The disadvantage of using a "single threshold": when the threshold is set small, the error between the reference string and the coding string is small and the distortion is low, but the matched strings are short, so more bits are needed to represent the coding information of the current coding string and the reference string, and coding efficiency falls. Conversely, when the threshold is set large, the matched strings become longer, but the distortion between the reference string and the coding string grows and the reconstruction quality of the image can hardly be guaranteed.

(2) The disadvantage of using a "fixed threshold": a fixed threshold considers only the local error between a coding primitive and a reference primitive. It ignores the overall error between the coding string and the reference string, as well as the error information of the best reference string and best mode found so far. In fact, whether a reference string becomes the final best reference string is generally decided by an average rate-distortion minimization rule, which depends on the overall error value among other factors. Considering only local error values therefore also harms coding efficiency.

To illustrate the problem with a single fixed threshold, consider an example. When the coding string and the reference string differ by only a few large errors (all other errors being very small), the single fixed threshold method must first divide the current coding string into multiple coding strings for processing. For example, suppose the current coding string has the values 177, 23, 177, 23, 40, 177, 165, 165, 165, the reference string has the values 177, 23, 177, 23, 45, 177, 165, 165, and the single fixed threshold is set to 4. The 5th values of the current coding string and the reference string differ by more than the threshold, so that position is a non-matching value. In this situation the single fixed threshold method has to split the coding string into several coding strings and confirm the matches one by one, which makes lossy string matching and coding inefficient.
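The splitting behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the prior-art implementation: the function name `split_by_fixed_threshold` is hypothetical, and the sketch compares only the first eight positions, which both listings in the text cover.

```python
def split_by_fixed_threshold(coding, reference, threshold):
    """Return runs of consecutive coding values that match under one fixed threshold."""
    runs, current = [], []
    for c, r in zip(coding, reference):
        if abs(c - r) < threshold:   # matching condition from the text
            current.append(c)
        else:                        # a single outlier ends the current run
            if current:
                runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


coding = [177, 23, 177, 23, 40, 177, 165, 165]      # first eight coding values
reference = [177, 23, 177, 23, 45, 177, 165, 165]   # the eight reference values
runs = split_by_fixed_threshold(coding, reference, threshold=4)
print(runs)  # [[177, 23, 177, 23], [177, 165, 165]]
```

The single error of 5 at the 5th position breaks what would otherwise be one long match into two shorter strings.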

In view of this, the present invention has been developed.

Summary of the invention:

In view of the low efficiency of prior-art string matching coding methods, the invention improves coding efficiency by adjusting the threshold, and specifically adopts the following technical scheme:

A string matching data compression method based on adjustment of threshold matching errors involves a coding string, a reference string, a set threshold, and preset matching conditions, wherein the minimum matching unit of the coding string is a coding primitive and the minimum matching unit of the reference string is a reference primitive. An initial threshold and a corresponding matching condition are set and used to judge whether the current coding primitive matches the reference primitive, or whether the current coding string matches the reference string. After the judgment is completed, the current threshold is adjusted, and the adjusted threshold is used for the matching judgment of the next coding primitive or coding string.

Further, the current threshold is adjusted by adding a value to it, subtracting a value from it, or multiplying or dividing it by a value; the value may be fixed or non-fixed, and the adjustment may use a single one of these operations or a mixture of several.

Further, a single threshold is set, which is a threshold corresponding to different components, a constant fixed value, or a non-fixed value.

Further, when the threshold is set to a fixed value, the preset matching condition is one of the following: the absolute value of the difference between the current coding primitive and the reference primitive is less than the fixed threshold; the absolute value of the average error value of the current coding string and the reference string is less than the fixed threshold; or the maximum error value of the current coding string and the reference string is less than the fixed threshold. When the threshold is set to a non-fixed value, the preset matching condition is: the absolute value of the difference between the current coding primitive and the reference primitive is less than the non-fixed threshold.

Further, a plurality of thresholds are set, each being a threshold corresponding to different components, a constant fixed value, or a non-fixed value, or a combination of these.

Further, when the thresholds are set to a fixed value A and a fixed value B, the preset matching condition is: the absolute value of the difference between the current coding primitive and the reference primitive is smaller than the fixed value A and/or the absolute value of the average error value of the current coding string and the reference string is smaller than the fixed value B; or the preset matching condition is: the absolute value of the difference between the current coding primitive and the reference primitive is smaller than the fixed value A and/or the maximum error value of the current coding string and the reference string is smaller than the fixed value B. When the thresholds are set to a fixed value A and a non-fixed value B, the preset matching condition is: the absolute value of the difference between the current coding primitive and the reference primitive is smaller than the fixed value A and/or the absolute value of the average error value of the current coding string and the reference string is smaller than the non-fixed value B. When the thresholds are set to a non-fixed value A and a fixed value B, the preset matching condition is: the absolute value of the difference between the current coding primitive and the reference primitive is smaller than the non-fixed value A and/or the absolute value of the average error value of the current coding string and the reference string is smaller than the fixed value B; or the preset matching condition is: the absolute value of the difference between the current coding primitive and the reference primitive is smaller than the non-fixed value A and/or the maximum error value of the current coding string and the reference string is smaller than the fixed value B. When the thresholds are set to a fixed value A for one component and a fixed value B for the other components, the preset matching condition is: the absolute value of the difference between that component of the current coding primitive and the same component of the reference primitive is less than the fixed value A, and the absolute value of the difference between the other components of the current coding primitive and the other components of the reference primitive is less than the fixed value B.

Further, when the threshold value adopts a non-fixed value, the non-fixed value is an average value of absolute values of current best reference string error values, or is a maximum value of absolute values of current best reference string error values, or is a minimum value of absolute values of current best reference string error values, or is an average value of absolute values of error values obtained in a current best mode.

Further, the string matching data in the data compression method includes one of the following types, or a combination of several of them: one-dimensional data; two-dimensional data; multidimensional data; a pattern; an image; a sequence of images; video; a three-dimensional scene; a sequence of continuously varying three-dimensional scenes; a virtual reality scene; a sequence of continuously changing virtual reality scenes; an image in the form of pixels; transform domain data of the image; a set of two or more bytes; a set of two or more bits; a set of pixels; a set of three-component pixels (Y, U, V); a set of three-component pixels (Y, Cb, Cr); a set of three-component pixels (Y, Cg, Co); a set of three-component pixels (R, G, B); a set of four-component pixels (C, M, Y, K); a set of four-component pixels (R, G, B, A); a set of four-component pixels (Y, U, V, A); a set of four-component pixels (Y, Cb, Cr, A); a set of four-component pixels (Y, Cg, Co, A).

Further, the primitives in the coded primitive and the primitives in the reference primitive include a single component, or a single pixel, or a string of multiple components, or a string of multiple pixels.

Further, when a plurality of thresholds are set, either only some of the thresholds or all of them are adjusted when the current threshold is adjusted; the adjustment is achieved by adding, subtracting, multiplying, or dividing the threshold by a value, or by selecting a minimum or maximum value.

The invention improves the string matching algorithm realized by a single fixed threshold value adopted in the prior art, considers the influence of the current matching error on the global error, carries out global optimization processing on the matching of the subsequent primitives by adjusting the threshold value, and obviously improves the efficiency of string matching data coding.

Detailed description:

The invention discloses a string matching data compression method based on adjustment of threshold matching errors. It aims to improve the coding efficiency of data compression that uses a string matching algorithm, and comprises the following steps: setting an initial threshold and a matching condition corresponding to the initial threshold, and using them to judge whether the current coding primitive matches the reference primitive, or whether the current coding string matches the reference string; after the judgment is completed, adjusting the current threshold, and using the adjusted threshold for the matching judgment of the next coding primitive or coding string.

When the data compression processing is performed in a string matching manner, a coding string, a reference string, a set threshold value and preset matching conditions are generally involved, wherein the minimum matching unit of the coding string is a coding primitive, and the minimum matching unit of the reference string is a reference primitive. The invention is mainly suitable for coding the lossy string matching data, wherein the data refers to any one of the following listed data, or can be any combination of the following data, and the data comprises:

1) One-dimensional data;

2) Two-dimensional data;

3) Multidimensional data;

4) A pattern;

5) An image;

6) A sequence of images;

7) Video;

8) A three-dimensional scene;

9) A sequence of continuously varying three-dimensional scenes;

10) A virtual reality scene;

11) A sequence of continuously changing virtual reality scenes;

12) An image in the form of pixels;

13) Transform domain data of the image;

14) A set of two or more bytes;

15) A set of two or more bits;

16) A set of pixels;

17) A set of three-component pixels (Y, U, V);

18) A set of three-component pixels (Y, Cb, Cr);

19) A set of three-component pixels (Y, Cg, Co);

20) A set of three-component pixels (R, G, B);

21) A set of four-component pixels (C, M, Y, K);

22) A set of four-component pixels (R, G, B, A);

23) A set of four-component pixels (Y, U, V, A);

24) A set of four-component pixels (Y, Cb, Cr, A);

25) A set of four-component pixels (Y, Cg, Co, A).

The primitives in the above-given coded primitive or reference primitive mainly refer to the following: a single component, or a single pixel, or a string of multiple components, or a string of multiple pixels.

The threshold is a threshold corresponding to different components, a constant fixed value between 0 and 100, or a non-fixed value. The non-fixed value is one of the following: the average of the absolute values of the current best reference string's error values; the maximum of those absolute values; the minimum of those absolute values; or the average of the absolute values of the error values obtained by the current best mode.
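As a rough illustration of the four non-fixed values listed above, the following Python sketch computes each one from a list of signed error values. The function name, the dictionary keys, and the sample error lists are assumptions made for the example; the source does not specify how the error values are stored.

```python
def non_fixed_threshold_candidates(best_ref_errors, best_mode_errors):
    """Compute the four candidate non-fixed thresholds from signed error values."""
    abs_ref = [abs(e) for e in best_ref_errors]     # errors of the current best reference string
    abs_mode = [abs(e) for e in best_mode_errors]   # errors obtained by the current best mode
    return {
        "mean_abs_best_ref": sum(abs_ref) / len(abs_ref),
        "max_abs_best_ref": max(abs_ref),
        "min_abs_best_ref": min(abs_ref),
        "mean_abs_best_mode": sum(abs_mode) / len(abs_mode),
    }


# Hypothetical error values, chosen only to exercise the function.
candidates = non_fixed_threshold_candidates([0, -2, 5, 1], [3, -3])
print(candidates)
```

Any one of these values could then serve as the initial non-fixed threshold.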

Only one threshold may be set. When that threshold is a fixed value, the corresponding preset matching condition is one or a combination of the following:

1) The absolute value of the difference between the current coding primitive and the reference primitive is less than the fixed threshold;

2) The absolute value of the average error value of the current coding string and the reference string is smaller than the fixed threshold;

3) The maximum error value of the current code string and the reference string is less than the fixed threshold.

When the threshold is set to one and a non-fixed value is adopted, the preset matching condition is: the absolute value of the difference between the current encoded primitive and the reference primitive is less than the non-fixed threshold.
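The single-threshold matching conditions can be written as three small predicates. This is a minimal sketch under two stated assumptions: the "average error value" is read as the mean of the signed per-primitive differences, and the "maximum error value" is read as the largest absolute difference. The function names are hypothetical.

```python
def primitive_matches(coding_primitive, reference_primitive, threshold):
    """Condition 1: absolute primitive difference below the threshold."""
    return abs(coding_primitive - reference_primitive) < threshold


def string_matches_by_average(coding, reference, threshold):
    """Condition 2: absolute value of the average (signed) error below the threshold."""
    errors = [c - r for c, r in zip(coding, reference)]
    return abs(sum(errors) / len(errors)) < threshold


def string_matches_by_maximum(coding, reference, threshold):
    """Condition 3: largest absolute error below the threshold (assumed reading)."""
    return max(abs(c - r) for c, r in zip(coding, reference)) < threshold


# Hypothetical sample strings.
coding, reference = [10, 20, 30], [12, 19, 30]
print(primitive_matches(10, 12, 3))                      # True
print(string_matches_by_average(coding, reference, 1))   # True
print(string_matches_by_maximum(coding, reference, 2))   # False
```

The same `primitive_matches` predicate covers the non-fixed case; only the way the threshold value is obtained differs.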

Multiple thresholds may also be set, in which case the matching conditions corresponding to the thresholds may be required to hold either individually or simultaneously. At least two thresholds are set, for example as follows: (1) a fixed threshold A and a fixed threshold B; (2) a fixed threshold A and a non-fixed threshold B; (3) a non-fixed threshold A and a non-fixed threshold B; (4) a threshold A corresponding to one component and a threshold B corresponding to the other components; (5) a threshold A corresponding to one component, a threshold B corresponding to the other components, and a fixed threshold C; (6) a threshold A corresponding to one component, a threshold B corresponding to the other components, and a non-fixed threshold C. The thresholds A, B, and C are set as needed.

When the threshold value is set to be plural, the preset matching condition is one or a combination of the following, such as:

(1) When the thresholds are set to a fixed value A and a fixed value B, the preset matching conditions are as follows:

(1a) The absolute value of the difference between the current coding primitive and the reference primitive is smaller than the fixed value A and/or the absolute value of the average error value of the current coding string and the reference string is smaller than the fixed value B;

(1b) The absolute value of the difference between the current coding primitive and the reference primitive is smaller than the fixed value A and/or the maximum error value of the current coding string and the reference string is smaller than the fixed value B.

(2) When the thresholds are set to a fixed value A and a non-fixed value B, the preset matching condition is as follows: the absolute value of the difference between the current coding primitive and the reference primitive is smaller than the fixed value A and/or the absolute value of the average error value of the current coding string and the reference string is smaller than the non-fixed value B.

(3) When the thresholds are set to a non-fixed value A and a fixed value B, the preset matching conditions are as follows:

(3a) The absolute value of the difference between the current coding primitive and the reference primitive is smaller than the non-fixed value A and/or the absolute value of the average error value of the current coding string and the reference string is smaller than the fixed value B;

(3b) The absolute value of the difference between the current coding primitive and the reference primitive is smaller than the non-fixed value A and/or the maximum error value of the current coding string and the reference string is smaller than the fixed value B.

(4) When the thresholds are set to a fixed value A for one component and a fixed value B for the other components, the preset matching condition is as follows: the absolute value of the difference between that component of the current coding primitive and the same component of the reference primitive is less than the fixed value A, and the absolute value of the difference between the other components of the current coding primitive and the other components of the reference primitive is less than the fixed value B.
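The two-threshold conditions differ mainly in whether both tests must hold or either suffices, and case (4) applies separate thresholds per component. A hedged Python sketch follows; the function names, the `require_both` flag, and the choice of the first tuple element as the "one component" are assumptions for illustration only.

```python
def matches_two_thresholds(primitive_diff, string_error, a, b, require_both):
    """Cases (1)-(3): test the primitive difference against A and a string-level error against B."""
    cond_a = abs(primitive_diff) < a
    cond_b = abs(string_error) < b   # average or maximum error, depending on the variant
    return (cond_a and cond_b) if require_both else (cond_a or cond_b)


def matches_per_component(coding_pixel, reference_pixel, a, b):
    """Case (4): threshold A for the first component, threshold B for all others."""
    first_ok = abs(coding_pixel[0] - reference_pixel[0]) < a
    others_ok = all(abs(c - r) < b
                    for c, r in zip(coding_pixel[1:], reference_pixel[1:]))
    return first_ok and others_ok


# Hypothetical values: the primitive difference fails A, the string error passes B.
print(matches_two_thresholds(5, 1.0, a=4, b=2, require_both=False))  # True
print(matches_two_thresholds(5, 1.0, a=4, b=2, require_both=True))   # False
print(matches_per_component((100, 50, 60), (102, 55, 58), a=3, b=6)) # True
```

With `require_both=False`, a primitive that exceeds A can still be accepted when the string-level error stays under B, which is what lets an isolated outlier survive inside a long match.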

The threshold is adjusted by adding a value to it, subtracting a value from it, or multiplying or dividing it by a value. The value may be fixed or non-fixed, and the adjustment may use a single one of these operations or a mixture of several, chosen according to actual needs. When multiple thresholds are set, either only some of them or all of them may be adjusted, as required; in that case the adjustment can be achieved by adding, subtracting, multiplying, or dividing the threshold by a value, or by selecting a minimum or maximum value.
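The adjustment rules above amount to applying one or more arithmetic or min/max operations to selected thresholds. The sketch below is one possible way to express that in Python; the operation names, the `(operation, value)` step format, and the threshold names "A" and "B" are assumptions, not part of the source.

```python
import operator

# Supported adjustment operations; "min" and "max" select the smaller or larger value.
OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
    "min": min,
    "max": max,
}


def adjust(threshold, steps):
    """Apply a sequence of (operation, value) steps to one threshold."""
    for op, value in steps:
        threshold = OPS[op](threshold, value)
    return threshold


def adjust_selected(thresholds, updates):
    """Adjust only the thresholds named in `updates`; leave the others unchanged."""
    adjusted = dict(thresholds)
    for name, steps in updates.items():
        adjusted[name] = adjust(adjusted[name], steps)
    return adjusted


print(adjust(4, [("mul", 2), ("sub", 1)]))                      # 7
print(adjust(4, [("add", 3), ("min", 6)]))                      # 6
print(adjust_selected({"A": 4, "B": 10}, {"A": [("add", 1)]}))  # {'A': 5, 'B': 10}
```

Passing an entry for every threshold in `updates` adjusts all of them; passing a subset adjusts only part of them.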

A detailed description of several specific embodiments is given below:

Example 1:

1.1 Setting the threshold: a single non-fixed threshold A is set, whose initial value is the average error value obtained by the current best reference string;

Setting the matching condition: the absolute value of the difference between the current coding primitive and the reference primitive is less than the threshold A;

1.2 Judging the match: if the absolute value of the difference between the current coding primitive and the reference primitive is smaller than the threshold A, the current coding primitive matches the reference primitive; if the matching condition is not met, the current coding primitive does not match the reference primitive;

1.3 Adjusting the current threshold A; the adjusted threshold A' is:

A' = (total error of best reference string - total error of current coding string) / (length of best reference string - length of current coding string);

1.4 Searching for the next coding primitive: the adjusted threshold from step 1.3 and its corresponding matching condition are used to judge the next coding primitive, and the above steps are then repeated.
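The loop of Example 1 can be sketched as follows. This is an illustrative reading, not the patented implementation: the function name is hypothetical, the "total error" is taken as the sum of absolute differences, matching stops at the first non-matching primitive, a guard is added for the case where the formula's denominator reaches zero, and the best reference string's total error (24) and length (8) are made-up values.

```python
def match_with_error_budget(coding, reference, best_total_error, best_length):
    """Match primitives while re-deriving threshold A from the remaining error budget."""
    threshold = best_total_error / best_length   # step 1.1: average error of the best string
    total_error, matched = 0, 0
    for c, r in zip(coding, reference):
        diff = abs(c - r)
        if not diff < threshold:                 # step 1.2: matching condition fails
            break
        total_error += diff
        matched += 1
        remaining = best_length - matched
        if remaining <= 0:                       # guard: the formula is undefined here
            break
        # step 1.3: A' = (best total error - current total error) / (best length - current length)
        threshold = (best_total_error - total_error) / remaining
    return matched, total_error


coding = [177, 23, 177, 23, 40, 177, 165, 165]
reference = [177, 23, 177, 23, 45, 177, 165, 165]
print(match_with_error_budget(coding, reference, best_total_error=24, best_length=8))  # (8, 5)
```

Because the first four primitives match exactly, the unused error budget raises the threshold from 3 to 6 by the time the 5th position is reached, so the error of 5 is accepted and the string is not split.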

Example 2:

2.1 Setting the thresholds: a non-fixed threshold A and a non-fixed threshold B are set, wherein the initial value of the non-fixed threshold A is the average error value obtained by the current best reference string, and the non-fixed threshold B is the average error value obtained by the current best mode;

Setting the matching conditions:

Condition 1: the absolute value of the difference between the current coding primitive and the reference primitive is less than the threshold A;

Condition 2: the absolute value of the average error value of the current coding string is smaller than the threshold B;

2.2 Judging the match: if condition 1 or condition 2 is met, the current coding primitive matches the reference primitive; if neither condition 1 nor condition 2 is met, the current coding primitive does not match the reference primitive;

2.3 Adjusting the current threshold A, the adjusted threshold being A', as follows:

2.4 Searching for the next coding primitive: the adjusted threshold A' from step 2.3 with its corresponding condition 1, together with the original threshold B and the original condition 2, are used to judge the next coding primitive, and the above steps are then repeated.
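A sketch of the Example 2 loop is given below. The text of step 2.3 does not reproduce the adjustment formula, so the sketch takes the adjustment as a caller-supplied function `adjust_a`; the lambda used in the demonstration is purely a placeholder and not the patent's rule. Reading the "average error value of the current coding string" as the mean signed error including the primitive under test is also an assumption.

```python
def match_example2(coding, reference, a, b, adjust_a):
    """Accept a primitive if condition 1 (threshold A) or condition 2 (threshold B) holds."""
    errors = []
    for c, r in zip(coding, reference):
        diff = c - r
        trial = errors + [diff]                    # errors if this primitive were accepted
        cond1 = abs(diff) < a                      # condition 1: primitive difference vs A
        cond2 = abs(sum(trial) / len(trial)) < b   # condition 2: average error vs B
        if not (cond1 or cond2):
            break
        errors = trial
        a = adjust_a(a, errors)                    # step 2.3: only A is adjusted, B is kept
    return len(errors), a


# Placeholder adjustment: widen A by 0.5 after every match (hypothetical rule).
result = match_example2([10, 20, 30, 40], [10, 21, 36, 40],
                        a=2, b=2.5, adjust_a=lambda a, errs: a + 0.5)
print(result)  # (4, 4.0)
```

In this run the third primitive differs by 6 and fails condition 1, but the running average error of about 2.33 is still under B, so condition 2 keeps the match going.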

Example 3:

3.1 Setting the thresholds: a non-fixed threshold A and a non-fixed threshold B are set, wherein the non-fixed threshold A is the average error value obtained by the current best reference string minus a set value, the non-fixed threshold B is the average error value obtained by the current best reference string plus a set value, and A is smaller than B;

Setting the matching conditions:

Condition 1: the absolute value of the difference between the current coding primitive and the reference primitive is less than the threshold A or less than the threshold B;

Condition 2: the average error value of the best reference string A1 obtained for the current coding string using the threshold A is smaller than the average error value of the best reference string B1 obtained using the threshold B;

3.2 Judging the match: if condition 1 is satisfied, the coding primitive matches the reference primitive, and the best reference string A1 and the best reference string B1 are obtained at the same time; if condition 1 is not satisfied, condition 2 is checked: if condition 2 is satisfied, the current coding string matches the best reference string A1 and the threshold A is adjusted according to step 3.3; otherwise, the current coding string matches the best reference string B1 and the threshold B is adjusted according to step 3.4;

3.3 Adjusting the current threshold A, the adjusted threshold being A', as follows:

A' = min(A + preset value, B);

3.4 Adjusting the current threshold B, the adjusted threshold being B', as follows:

B' = max(B - preset value, A);

3.5 Searching for the next coding primitive: the adjusted threshold A' or B' from step 3.3 or 3.4, with the corresponding matching condition 1 and matching condition 2, is used to judge the next coding primitive, and the above steps are then repeated.
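The threshold set-up of step 3.1 and the two adjustment rules of steps 3.3 and 3.4 can be sketched as below. The function names and the numeric values are assumptions for illustration; the search that produces the best reference strings A1 and B1 is outside the scope of the sketch, so their average errors are simply passed in.

```python
def initial_thresholds(best_avg_error, set_value):
    """Step 3.1: A sits below and B above the best reference string's average error."""
    return best_avg_error - set_value, best_avg_error + set_value


def update_thresholds(a, b, preset, avg_error_a1, avg_error_b1):
    """Steps 3.2-3.4: raise A toward B, or lower B toward A, depending on condition 2."""
    if avg_error_a1 < avg_error_b1:        # condition 2 holds: string A1 is chosen
        return min(a + preset, b), b       # step 3.3: A' = min(A + preset value, B)
    return a, max(b - preset, a)           # step 3.4: B' = max(B - preset value, A)


a, b = initial_thresholds(best_avg_error=4.0, set_value=2.0)
print((a, b))                                     # (2.0, 6.0)
print(update_thresholds(a, b, 3.0, 1.0, 1.5))     # (5.0, 6.0)
print(update_thresholds(a, b, 3.0, 2.0, 1.5))     # (2.0, 3.0)
```

The min and max clamps keep A from rising above B and B from falling below A, so the relation A ≤ B from step 3.1 is preserved across repeated adjustments.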

The above embodiments are only for illustrating the technical solution of the present invention, but not for limiting, and other modifications and equivalents thereof by those skilled in the art should be included in the scope of the claims of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims

1. A string matching data compression method based on adjustment of threshold matching errors is characterized in that: the method comprises a coding string, a reference string, a set threshold value and preset matching conditions, wherein the minimum matching unit of the coding string is a coding primitive, and the minimum matching unit of the reference string is a reference primitive; setting an initial threshold value and a matching condition corresponding to the initial threshold value, and judging whether the current coding primitive is matched with the reference primitive or whether the current coding string is matched with the reference string or not by utilizing the initial threshold value and the matching condition corresponding to the initial threshold value; when the comparison is completed, the current threshold value is adjusted, and when the next coding primitive or coding string is matched, the new adjusted threshold value is used for matching judgment; setting one threshold value, wherein the threshold value is a threshold value corresponding to different components, or a constant fixed value or a non-fixed value; when the threshold is set to be a fixed value, the preset matching conditions are as follows: the absolute value of the difference between the current encoded primitive and the reference primitive is less than the fixed threshold, or: the absolute value of the average error value of the current code string and the reference string is less than the fixed threshold, or is: the maximum error value of the current coding string and the reference string is smaller than the fixed threshold value; when the threshold is set to be a non-fixed value, the preset matching condition is as follows: the absolute value of the difference between the current encoded primitive and the reference primitive is less than the non-fixed threshold.

2. The string match data compression method based on the adjustment threshold match error of claim 1, wherein: the adjustment of the current threshold value is realized by adding/subtracting/multiplying/dividing the threshold value by a numerical value, wherein the numerical value adopts a fixed numerical value or adopts a non-fixed numerical value, and one operation mode of adding, subtracting, multiplying and dividing can be adopted in the adjustment of the threshold value, and a plurality of mixed operation modes can also be adopted.

3. The string match data compression method based on the adjustment threshold match error of claim 1, wherein: a plurality of thresholds are set, each being a threshold corresponding to different components, a constant fixed value, or a non-fixed value, or a combination of these.

4. A string match data compression method based on an adjusted threshold match error as claimed in claim 3, wherein: when the thresholds are set to a fixed value A and a fixed value B, the preset matching condition is: the absolute value of the difference between the current coding primitive and the reference primitive is smaller than the fixed value A and/or the absolute value of the average error value of the current coding string and the reference string is smaller than the fixed value B, or the preset matching condition is: the absolute value of the difference between the current coding primitive and the reference primitive is smaller than the fixed value A and/or the maximum error value of the current coding string and the reference string is smaller than the fixed value B; when the thresholds are set to a fixed value A and a non-fixed value B, the preset matching condition is: the absolute value of the difference between the current coding primitive and the reference primitive is smaller than the fixed value A and/or the absolute value of the average error value of the current coding string and the reference string is smaller than the non-fixed value B; when the thresholds are set to a non-fixed value A and a fixed value B, the preset matching condition is: the absolute value of the difference between the current coding primitive and the reference primitive is smaller than the non-fixed value A and/or the absolute value of the average error value of the current coding string and the reference string is smaller than the fixed value B, or the preset matching condition is: the absolute value of the difference between the current coding primitive and the reference primitive is smaller than the non-fixed value A and/or the maximum error value of the current coding string and the reference string is smaller than the fixed value B; when the thresholds are set to a fixed value A for one component and a fixed value B for the other components, the preset matching condition is: the absolute value of the difference between that component of the current coding primitive and the same component of the reference primitive is less than the fixed value A, and the absolute value of the difference between the other components of the current coding primitive and the other components of the reference primitive is less than the fixed value B.

5. A string match data compression method based on an adjusted threshold match error as claimed in claim 1 or 3, characterized by: when the threshold value adopts a non-fixed value, the non-fixed value is the average value of the absolute values of the current best reference string error values, or is the maximum value of the absolute values of the current best reference string error values, or is the minimum value of the absolute values of the current best reference string error values, or is the average value of the absolute values of the error values obtained in the current best mode.

6. The string match data compression method based on the adjustment threshold match error of claim 1, wherein: the string matching data in the data compression method comprises one of the following types or a combination of the following types, wherein the data types comprise: one-dimensional data; two-dimensional data; multidimensional data; a pattern; an image; a sequence of images; video; a three-dimensional scene; a sequence of continuously varying three-dimensional scenes; a virtual reality scene; a sequence of continuously changing virtual reality scenes; an image in the form of pixels; transform domain data of the image; a set of two or more bytes; a set of two or more bits; a set of pixels; a set of three-component pixels (Y, U, V); a set of three-component pixels (Y, Cb, Cr); a set of three-component pixels (Y, Cg, Co); a set of three-component pixels (R, G, B); a set of four-component pixels (C, M, Y, K); a set of four-component pixels (R, G, B, A); a set of four-component pixels (Y, U, V, A); a set of four-component pixels (Y, Cb, Cr, A); a set of four-component pixels (Y, Cg, Co, A).

7. The string match data compression method based on the adjustment threshold match error of claim 1, wherein: the primitives in the coded primitive and the primitives in the reference primitive comprise single components, single pixels, strings of multiple components, or strings of multiple pixels.

8. A string match data compression method based on an adjusted threshold match error as claimed in claim 3, wherein: the number of the thresholds is set to be multiple, and when the current threshold is adjusted, only part of the thresholds are adjusted, or all the thresholds are adjusted; the adjustment of the current threshold is achieved by adding/subtracting/multiplying/dividing the threshold by a value or by selecting a minimum/maximum value.