CN110647508A - Data compression method, data decompression method, device and electronic equipment

Info

Publication number: CN110647508A
Application number: CN201910818105.0A
Authority: CN
Inventors: 舒承椿
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2019-08-30
Filing date: 2019-08-30
Publication date: 2020-01-03
Anticipated expiration: 2039-08-30
Also published as: CN110647508B

Abstract

The disclosure relates to a data compression method, a data decompression method, an apparatus, an electronic device and a storage medium. The data compression method comprises: acquiring original representation data of sparse features to be compressed, wherein the original representation data comprises a non-zero feature array, an index array corresponding to the non-zero feature array and a dense shape array; acquiring index segmentation data of the sparse features according to the index array and a preset index representation length, wherein the index segmentation data comprises a segment value, a segment length and a segment offset of each index segment; for each non-zero feature value in the non-zero feature array, merging the non-zero feature value and its corresponding index value according to the index segmentation data to obtain a compressed representation array of the sparse features; and acquiring compressed representation data of the sparse features according to the compressed representation array, the index segmentation data and the dense shape array. The method improves data compression efficiency and reduces both the number of bytes of the compressed data and its information redundancy.

Description

Data compression method, data decompression method, device and electronic equipment

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a data compression method, a data decompression method, an apparatus, an electronic device, and a storage medium.

Background

In the related art, the features used by a deep learning model are classified into sparse features and dense features. A dense feature has a fixed length in each dimension, and its values are dense and meaningful. A sparse feature has a much larger length, but usually only a few of its values are non-zero and all others are zero. For example, the ID (identity) of an advertiser may be encoded with a length of 2 million in which only one value is 1 and every other position is 0. During model training, the features of the data are usually stored in mini-batches, sparse features are usually represented as triples, and as much as 99% of the training data in a mini-batch may be sparse features. Effective compression of sparse features can therefore greatly improve the network transmission and reading efficiency of the training data.

Currently, the open-source deep learning framework TensorFlow represents sparse features in the form of (index, values, dense_shape) triples, where values is a list of the values of all sparse features in the batch; index indicates which sample each value in values belongs to; and dense_shape gives the dimensions of the batch, that is, how many samples it has and how many features each sample has at most.

However, this representation needs 2|values| data items to represent a sparse feature, so the number of bytes of transmitted data is still too large and the transmitted information is redundant.

Disclosure of Invention

The present disclosure provides a data compression method, a data decompression method, an apparatus, an electronic device, and a storage medium, which at least solve the problems in the related art that the number of bytes of transmitted data for sparse features is large and that the transmitted information is redundant. The technical scheme of the disclosure is as follows:

according to a first aspect of the embodiments of the present disclosure, there is provided a data compression method, including:

acquiring original representation data of sparse features to be compressed, wherein the original representation data comprises a non-zero feature array, an index array corresponding to the non-zero feature array and a dense shape array;

acquiring index segmentation data of the sparse feature according to the index array and a preset index representation length; the index segment data comprises a segment value, a segment length and a segment offset of each index segment;

for each non-zero feature value in the non-zero feature array, merging the non-zero feature value and the index value corresponding to it according to the index segmentation data to obtain a compressed representation array of the sparse features;

and acquiring compressed representation data of the sparse features according to the compressed representation array, the index segmentation data and the dense shape array.

Optionally, the step of obtaining index segment data of the sparse feature according to the index array and a preset index representation length includes:

acquiring an index value upper limit of the sparse feature according to a preset index representation length;

and segmenting the index array by taking the index value upper limit as reference, and acquiring the segment value, the segment length and the segment offset of each index segment.

Optionally, the step of merging, for each non-zero feature value in the non-zero feature array, the non-zero feature value and the index value corresponding to it according to the index segmentation data to obtain a compressed representation array of the sparse features includes:

for each non-zero feature value in the non-zero feature array, acquiring the index value corresponding to the non-zero feature value according to the index segmentation data;

combining the non-zero feature value and the index value in binary form according to the feature value representation length and the index representation length to obtain a binary combined value;

and converting the binary combined value corresponding to each non-zero feature value into decimal form to obtain the compressed representation array of the sparse features.

According to a second aspect of the embodiments of the present disclosure, there is provided a data decompression method, including:

acquiring a non-zero feature array and an offset index array of original sparse features according to a compression representation array, a preset feature value representation length and an index representation length in compression representation data to be decompressed;

performing offset correction on elements in the offset index array according to the segment length and the segment offset of each index segment in the index segment data to obtain an index array of the original sparse feature;

and acquiring the original sparse features corresponding to the compressed representation data according to the non-zero feature array, the index array and the dense shape array in the compressed representation data.

Optionally, the step of obtaining a non-zero feature array and an offset index array of the original sparse feature according to a compression representation array, a preset feature value representation length and an index representation length in the compression representation data to be decompressed includes:

converting decimal elements in the compressed representation array to binary elements;

reading, in each binary element, a first bit segment representing a feature value and a second bit segment representing an index value according to the feature value representation length and the index representation length;

acquiring a non-zero feature array of the original sparse features according to the first segment bits;

and acquiring an offset index array of the original sparse feature according to the second segment bit.
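The splitting step above can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: it assumes the layout used in the worked example of the detailed description (index value in the high 8 bits, feature value in the low 24 bits of a 32-bit integer), and the function name unpack is hypothetical. Python integers are already held in binary, so the decimal-to-binary conversion is implicit in the shift and mask operations.

```python
def unpack(packed, value_bits=24):
    """Split each packed integer into (feature value, offset index value).

    Assumed layout (hypothetical, taken from the worked example):
    packed = (index << value_bits) + value.
    """
    value_mask = (1 << value_bits) - 1             # low bits hold the feature value
    values = [p & value_mask for p in packed]      # first bit segment
    offset_index = [p >> value_bits for p in packed]  # second bit segment
    return values, offset_index


values, offset_index = unpack([1, 16777218, 16777219, 33554436, 50331653])
print(values)        # [1, 2, 3, 4, 5]
print(offset_index)  # [0, 1, 1, 2, 3]
```

The index values recovered here are still relative to their segment offsets; they become the original index array only after the offset correction step.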

Optionally, the step of performing offset correction on elements in the offset index array according to the segment length and the segment offset of each index segment in the index segment data to obtain the index array of the original sparse feature includes:

constructing an offset array of the offset index array according to the segment length and the segment offset of each index segment;

and adding the offset array and the offset index array to obtain the index array of the original sparse feature.
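A minimal Python sketch of this offset correction, under the assumption that each segment offset applies to segment-length consecutive elements of the offset index array; the function name restore_index and the sample numbers are hypothetical.

```python
def restore_index(offset_index, seg_sizes, seg_offsets):
    """Rebuild the original index array from segment-relative index values."""
    # Build the offset array: repeat each segment offset once per element of its segment.
    offset_array = []
    for size, offset in zip(seg_sizes, seg_offsets):
        offset_array.extend([offset] * size)
    if len(offset_array) != len(offset_index):
        raise ValueError("segment lengths do not cover the offset index array")
    # Element-wise addition of the offset array and the offset index array.
    return [o + i for o, i in zip(offset_array, offset_index)]


# Hypothetical data: three segments with offsets 0, 256 and 512.
print(restore_index([0, 255, 0, 44, 0], [2, 2, 1], [0, 256, 512]))
# [0, 255, 256, 300, 512]
```

Because the correction is a single element-wise addition, it costs time linear in the number of non-zero feature values.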

According to a third aspect of the embodiments of the present disclosure, there is provided a data compression apparatus including:

the original representation data acquisition module is configured to execute acquisition of original representation data of sparse features to be compressed, and the original representation data comprises a non-zero feature array, an index array corresponding to the non-zero feature array and a dense shape array;

the index segmentation data acquisition module is configured to acquire the index segmentation data of the sparse feature according to the index array and a preset index representation length; the index segment data comprises a segment value, a segment length and a segment offset of each index segment;

a compressed representation array obtaining module configured to merge, for each non-zero feature value in the non-zero feature array, the non-zero feature value and the index value corresponding to it according to the index segmentation data, so as to obtain a compressed representation array of the sparse features;

a compressed representation data acquisition module configured to perform acquisition of compressed representation data of the sparse features according to the compressed representation array, the index segmentation data, and the dense shape array.

Optionally, the index segmentation data obtaining module includes:

the index value upper limit acquisition submodule is configured to acquire the index value upper limit of the sparse feature according to a preset index representation length;

and the index array segmentation submodule is configured to segment the index array by taking the index value upper limit as reference, and acquire the segment value, the segment length and the segment offset of each index segment.

Optionally, the compressed representation array obtaining module includes:

an index value acquisition submodule configured to acquire, for each non-zero feature value in the non-zero feature array, the index value corresponding to the non-zero feature value according to the index segmentation data;

a binary combined value obtaining submodule configured to combine the non-zero feature value and the index value in binary form according to the feature value representation length and the index representation length to obtain a binary combined value;

and a compressed representation array acquisition submodule configured to convert the binary combined value corresponding to each non-zero feature value into decimal form to obtain the compressed representation array of the sparse features.

According to a fourth aspect of the embodiments of the present disclosure, there is provided a data decompression apparatus, including:

the compression expression array decompression module is configured to execute the steps of obtaining a non-zero feature array and an offset index array of the original sparse feature according to a compression expression array, a preset feature value expression length and an index expression length in compression expression data to be decompressed;

an index array obtaining module configured to perform offset correction on elements in the offset index array according to a segment length and a segment offset of each index segment in the index segment data to obtain an index array of the original sparse feature;

and the original sparse feature construction module is configured to execute the operation of acquiring the original sparse features corresponding to the compressed representation data according to the non-zero feature array, the index array and the dense shape array in the compressed representation data.

Optionally, the compressed representation array decompression module comprises:

an element binary conversion submodule configured to perform conversion of decimal elements in the compressed representation array into binary elements;

a segment bit splitting and reading submodule configured to perform reading a first segment bit characterizing a feature value and a second segment bit characterizing an index value in each binary element according to the feature value representation length and the index representation length;

a non-zero feature array obtaining sub-module configured to obtain a non-zero feature array of the original sparse feature according to the first segment bit;

and the offset index array acquisition submodule is configured to acquire the offset index array of the original sparse feature according to the second segment bit.

Optionally, the index array obtaining module includes:

an offset array construction sub-module configured to perform construction of an offset array of the offset index array according to the segment length and segment offset of each of the index segments;

and the index array acquisition submodule is configured to perform addition of the offset array and the offset index array to obtain an index array of the original sparse feature.

According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic apparatus including:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement any of the data compression methods as previously described.

According to a sixth aspect of embodiments of the present disclosure, there is provided a storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform any one of the data compression methods as described above.

According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product, wherein instructions of the computer program product, when executed by a processor of an electronic device, enable the electronic device to perform any one of the data compression methods as described above.

According to an eighth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement any of the data decompression methods as previously described.

According to a ninth aspect of the embodiments of the present disclosure, there is provided a storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform any one of the data decompression methods as described above.

According to a tenth aspect of embodiments of the present disclosure, there is provided a computer program product, wherein instructions of the computer program product, when executed by a processor of an electronic device, enable the electronic device to perform any one of the data decompression methods as described above.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

In the embodiment of the disclosure, original representation data of sparse features to be compressed is acquired, wherein the original representation data comprises a non-zero feature array, an index array corresponding to the non-zero feature array and a dense shape array; index segmentation data of the sparse features is acquired according to the index array and a preset index representation length, wherein the index segmentation data comprises a segment value, a segment length and a segment offset of each index segment; for each non-zero feature value in the non-zero feature array, the non-zero feature value and the index value corresponding to it are merged according to the index segmentation data to obtain a compressed representation array of the sparse features; and compressed representation data of the sparse features is acquired according to the compressed representation array, the index segmentation data and the dense shape array. The compression efficiency of the sparse features is thereby improved, and the number of bytes of transmitted data and the information redundancy are reduced.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

FIG. 1 is a flow chart illustrating a method of data compression according to an example embodiment.

FIG. 2 is a flow chart illustrating another method of data compression according to an example embodiment.

FIG. 3 is a diagram illustrating an index and value collectively represented by an integer according to an example embodiment.

Fig. 4 is a flow chart illustrating a method of data decompression according to an exemplary embodiment.

FIG. 5 is a diagram illustrating a data structure for storing segment lengths and segment offsets for respective index segments, according to an illustrative embodiment.

Fig. 6 is a flow chart illustrating another method of data decompression according to an example embodiment.

Fig. 7 is a block diagram illustrating a data compression apparatus according to an example embodiment.

Fig. 8 is a block diagram illustrating another data compression apparatus according to an example embodiment.

Fig. 9 is a block diagram illustrating a data decompression apparatus according to an exemplary embodiment.

Fig. 10 is a block diagram illustrating another data decompression apparatus according to an example embodiment.

FIG. 11 is a block diagram illustrating an apparatus in accordance with an example embodiment.

FIG. 12 is a block diagram illustrating another apparatus according to an example embodiment.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

Fig. 1 is a flow chart illustrating a method of data compression according to an exemplary embodiment, which may include the steps of, as shown in fig. 1:

in step S11, original representation data of the sparse feature to be compressed is obtained, where the original representation data includes a non-zero feature array, an index array corresponding to the non-zero feature array, and a dense shape array.

As described above, in the related art, representing sparse features with (index, values, dense_shape) triples has the problems that the number of bytes of data is still too large and that the transmitted information is redundant. The embodiments of the disclosure mainly address the problems that the current sparse feature representation needs a relatively large number of bytes for transmission, and that decompression is not fast enough and easily slows the data loading of the input data interface. A compressed representation algorithm for sparse features is provided that compresses the original representation data of the sparse matrix to reduce the number of bytes, while decompressing quickly enough not to affect the data loading speed.

First, the original representation data of the sparse features to be compressed needs to be acquired. The original representation data may represent the sparse features in triple form, and specifically may include a non-zero feature array (values), an index array (index) corresponding to the non-zero feature array, and a dense shape array (dense_shape).

For example, for a batch of sparse features [1,0,0,0], [0,2,3,0], [0,0,4,0], [0,0,0,5], the (index, values, dense_shape) triple representation gives the following original representation data:

index = [0,1,1,2,3], values = [1,2,3,4,5], dense_shape = [4,4].

The rule that determines the index value of the sample to which each value in values belongs, that is, the index value in index corresponding to each non-zero feature value, may be preset as required, and the embodiment of the present disclosure is not limited thereto. For example, in the index array above, index values start from 0; the row number of each non-zero feature value may also be used as the index value of that non-zero feature value as required, and so on.
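The triple representation of the example batch can be reproduced with a short Python sketch. It assumes that the index value of a non-zero feature value is the zero-based row number of its sample; the function name to_triple is hypothetical.

```python
def to_triple(batch):
    """Convert equal-length dense rows into (index, values, dense_shape)."""
    index, values = [], []
    for row_number, row in enumerate(batch):
        for value in row:
            if value != 0:
                index.append(row_number)  # sample (row) that owns this value
                values.append(value)      # the non-zero feature value itself
    dense_shape = [len(batch), len(batch[0]) if batch else 0]
    return index, values, dense_shape


batch = [[1, 0, 0, 0], [0, 2, 3, 0], [0, 0, 4, 0], [0, 0, 0, 5]]
index, values, dense_shape = to_triple(batch)
print(index)        # [0, 1, 1, 2, 3]
print(values)       # [1, 2, 3, 4, 5]
print(dense_shape)  # [4, 4]
```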

In step S12, according to the index array and a preset index representation length, obtaining index segmentation data of the sparse feature; the index segment data includes a segment value, a segment length, and a segment offset for each index segment.

Analyzing original representation data in triple form leads to the following observations: index usually starts at 0 or 1 and increases gradually, with a maximum that is typically the batch size; the number of bits needed to represent values is determined by the hash bucket size of the sparse feature, which is small for some features (a gender feature may have only four buckets) and large for others (user IDs may reach tens of millions); and index and values contain the same number of elements.

Therefore, in the embodiment of the present disclosure, in order to improve the compression effect, index and values may share their storage space, so that each index value and its feature value are jointly represented by one integer. To reduce the number of bytes after compression, a length limit may be set for the index and for the feature value respectively. Because index values in the index array may exceed the maximum value that the index representation length can represent, the index values of the sparse feature need to be segmented. Specifically, the index segmentation data of the sparse feature may be obtained according to the index array of the sparse feature and a preset index representation length, where the index segmentation data may include a segment value, a segment length, and a segment offset of each index segment.

For example, assuming that the maximum hash bucket size of the sparse features of the current batch is on the order of ten million, 24 binary bits can be used to represent the non-zero feature values, since 24 bits can represent a hash bucket size of up to 2^24, which is about 16.7 million. Furthermore, the word length of a computer is 32 bits, that is, 4 bytes, and integer data is also stored in 32 bits. In order to represent an index value and a feature value jointly with one integer, the remaining 8 bits can therefore be used to represent the index value, whose range is the 256 integers from 0 to 255.

In practice, however, 256 is a small batch size, and the batch size of the sparse features to be compressed is often larger than this value.

Of course, in the embodiment of the present disclosure, the compression lengths of the values and the index may also be customized according to requirements, and the embodiment of the present disclosure is not limited thereto. For example, values may also be represented with 28 bits, index with 4 bits, and so on.
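The bit budget behind the 24/8 split and the 28/4 alternative can be checked with a small Python sketch. The function name bit_budget and the fixed 32-bit word are assumptions made for illustration.

```python
def bit_budget(value_bits, word_bits=32):
    """Return (index_bits, number of representable feature values, index upper limit)."""
    index_bits = word_bits - value_bits          # bits left over for the index value
    value_count = 1 << value_bits                # distinct feature values, e.g. hash buckets
    index_upper_limit = (1 << index_bits) - 1    # largest index value that fits
    return index_bits, value_count, index_upper_limit


print(bit_budget(24))  # (8, 16777216, 255)
print(bit_budget(28))  # (4, 268435456, 15)
```

Giving more bits to values enlarges the hash bucket range but lowers the index upper limit, so more index segments are needed for the same batch size.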

Moreover, when the index segmentation data of the sparse feature is acquired according to the index array and the preset index representation length, the index values in each index segment cannot exceed the maximum value that the index representation length can represent. For example, if the preset index representation length is 8 bits as described above, at most 256 values can be represented. If the batch size of the sparse features is greater than 256, the index array may be segmented so that all index values can still be represented with 8 bits. Since index usually starts from 0 or 1 and its values increase gradually, the segmentation may follow the order of the index values in the index array. In addition, to ensure that every index value in each resulting index segment is less than or equal to the maximum value that the preset index representation length can represent, the index values in the index array that exceed this maximum may be shifted by an offset according to the preset index representation length.

Taking the above-mentioned [1,0,0,0], [0,2,3,0], [0,0,4,0], [0,0,0,5] as an example, the sparse features can be expressed in matrix form, and the matrix can then be represented by a triple as follows:

index = [0, 1, 1, 2, 3], which indicates that the index values of the samples to which the non-zero feature values in the matrix belong are 0, 1, 1, 2, 3 in order; values = [1, 2, 3, 4, 5], which indicates that the non-zero feature values in the matrix are 1, 2, 3, 4, 5 respectively; dense_shape = [4, 5], which indicates that the size of the matrix is 4 rows and 5 columns.

At this time, assuming that the preset index representation length is 8 bits, index values range over the 256 integers from 0 to 255. Since all index values here are less than 255, the whole index array can be treated as a single index segment, and the index segmentation data is as follows:

segment value: [0, 1, 1, 2, 3]; segment size: 5; segment offset: 0.

Alternatively, the index array may be split into a plurality of index segments. For example, it may be split into [0, 1, 1] and [2, 3]. Since the maximum value of each index segment does not exceed the value range of index, no segment needs to be shifted, and the resulting index segmentation data is as follows:

segment value: [0, 1, 1], [2, 3]; segment size: 3, 2; segment offset: 0, 0.
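One possible way to derive the segment values, segment sizes and segment offsets is sketched below in Python. The rule used here, where each offset is a multiple of the number of representable index values (256 for 8 bits), is an assumption made for illustration; the description does not fix a specific segmentation rule. The function name segment_index is hypothetical.

```python
def segment_index(index, index_bits=8):
    """Split an index array into segments whose shifted values fit in index_bits."""
    span = 1 << index_bits  # number of representable index values, e.g. 256
    segments = []           # list of (shifted_values, offset) pairs
    for idx in index:
        offset = (idx // span) * span  # assumed rule: offsets are multiples of span
        if not segments or segments[-1][1] != offset:
            segments.append(([], offset))     # start a new index segment
        segments[-1][0].append(idx - offset)  # store the shifted index value
    seg_values = [vals for vals, _ in segments]
    seg_sizes = [len(vals) for vals in seg_values]
    seg_offsets = [off for _, off in segments]
    return seg_values, seg_sizes, seg_offsets


print(segment_index([0, 1, 1, 2, 3]))
# ([[0, 1, 1, 2, 3]], [5], [0])
print(segment_index([0, 255, 256, 300, 512]))
# ([[0, 255], [0, 44], [0]], [2, 2, 1], [0, 256, 512])
```

For the example index array this rule yields the single-segment result; the two-segment split is equally valid but stores one more segment size and one more segment offset.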

In step S13, for each non-zero feature value in the non-zero feature array, the non-zero feature value and the index value corresponding to it are merged according to the index segmentation data, so as to obtain a compressed representation array of the sparse features.

As described above, in order to improve the compression effect, an index value and a feature value may be jointly represented by one integer. Therefore, in the embodiment of the present disclosure, for each non-zero feature value in the non-zero feature array, the non-zero feature value and the index value corresponding to it may be merged according to the index segmentation data to obtain the compressed representation array of the sparse features. The rule for merging a non-zero feature value with its index value may be preset as required, and the embodiment of the present disclosure is not limited thereto.

For example, the merging rule may place the non-zero feature value first and its index value second, or the index value first and the non-zero feature value second, and so on. When merging, the non-zero feature value and the index value may be merged directly in decimal form, or both may first be converted into another number base, such as binary, and then merged, and so on.

For the sparse features mentioned above, if the merging rule places the non-zero feature value before its index value and the two are merged directly in decimal form, the compressed representation array of the sparse features is [10, 21, 31, 42, 53].

Alternatively, the non-zero feature value and the index value may be converted from decimal into binary, merged, and then converted back into decimal. If the index representation length is set to 8 bits, the representation length of the non-zero feature value may be 24 bits, and the compressed representation array of the sparse features is [(0<<24)+1, (1<<24)+2, (1<<24)+3, (2<<24)+4, (3<<24)+5], that is, [1, 16777218, 16777219, 33554436, 50331653].
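The binary merge in this example can be sketched in Python as follows. The layout (index value in the high 8 bits, feature value in the low 24 bits) follows the example values above; the function name pack and the range checks are hypothetical additions.

```python
def pack(index_values, feature_values, value_bits=24, index_bits=8):
    """Merge each (index value, feature value) pair into one integer."""
    packed = []
    for idx, val in zip(index_values, feature_values):
        if not 0 <= idx < (1 << index_bits):
            raise ValueError(f"index value {idx} does not fit in {index_bits} bits")
        if not 0 <= val < (1 << value_bits):
            raise ValueError(f"feature value {val} does not fit in {value_bits} bits")
        packed.append((idx << value_bits) | val)  # index in high bits, value in low bits
    return packed


compressed = pack([0, 1, 1, 2, 3], [1, 2, 3, 4, 5])
print(compressed)  # [1, 16777218, 16777219, 33554436, 50331653]
```

The explicit parentheses matter: in Python and C, `+` binds tighter than `<<`, so `1<<24+2` would shift by 26 bits rather than add 2 to `1<<24`.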

In step S14, compressed representation data of the sparse feature is acquired from the compressed representation array, the index-segmented data, and the dense shape array.

After the compressed representation array and the index segmentation data of the sparse feature to be compressed of the current batch are obtained, the compressed representation data of the sparse feature can be further obtained according to the compressed representation array, the index segmentation data and the dense shape array.

For example, for the sparse feature of the above-mentioned batch, the compressed representation array, the index segment data, and the dense shape may be directly used as the compressed representation data, so that the compressed representation data may include the following:

compressed representation array: [1, 16777218, 16777219, 33554436, 50331653];

segment value: [0, 1, 1, 2, 3];

segment size: 5;

segment offset: 0;

dense shape array: [4,5].

Where the segment value, segment size, and segment offset are index segmentation data. However, as can be seen from the above analysis, the compressed representation array is a combination of the non-zero eigenvalue and the index value thereof, that is, the compressed representation array includes the index value corresponding to each non-zero eigenvalue. The segment value may also include an index value included in the index segment obtained by segmenting the index array of the sparse feature, and therefore, when the compressed representation data includes the compressed representation array, the segment value in the index segment data may not be included, and the compressed representation data obtained at this time may include the compressed representation array, the segment size, the segment offset, and the dense shape array.

In this case, the original representation data to be transmitted before compression has a size of index size + values size + dense_shape size = 5 + 5 + 2 = 12 integers; the compressed representation data to be transmitted after compression according to this scheme has a size of new_value size + segment size + segment offset + dense_shape size = 5 + 1 + 1 + 2 = 9 integers. The compression ratio is 9/12 = 3/4.
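The integer counts above can be checked with a short sketch. The helper names `original_size` and `compressed_size` are hypothetical and not part of the disclosure; the count assumes, as in the text, that the segment values are not transmitted.

```python
def original_size(index, values, dense_shape):
    """Integers sent before compression: index + values + dense_shape."""
    return len(index) + len(values) + len(dense_shape)


def compressed_size(new_value, seg_size, seg_offset, dense_shape):
    """Integers sent after compression; segment values are not transmitted."""
    return len(new_value) + len(seg_size) + len(seg_offset) + len(dense_shape)


# Small batch from the example above.
index = [0, 1, 1, 2, 3]
values = [1, 2, 3, 4, 5]
dense_shape = [4, 5]
new_value = [1, 16777218, 16777219, 33554436, 50331653]
seg_size, seg_offset = [5], [0]

before = original_size(index, values, dense_shape)                     # 12
after = compressed_size(new_value, seg_size, seg_offset, dense_shape)  # 9
ratio = after / before                                                 # 0.75
```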

Referring to fig. 2, in an embodiment of the present disclosure, the step S12 may further include:

step S121, acquiring an index value upper limit of the sparse feature according to a preset index representation length;

and step S122, segmenting the index array by taking the index value upper limit as reference, and acquiring the segment value, the segment length and the segment offset of each index segment.

In the embodiment of the present disclosure, for the sparse feature to be compressed, since the total number of index values included in the index array is fixed, the data occupied by the index values themselves does not change much regardless of the division of the index array into several segments. However, the larger the number of index segments, the larger the amount of data of segment length and segment offset in the corresponding index segment data.

For example, for the sparse matrix described above, if the index array is divided into two segments, the index segmentation data is: segment value: [0 1 1] [2 3]; segment size: 3, 2; segment offset: 0, 0. The data size of the index segmentation data is then 5 + 2 + 2 = 9. If the index array is kept as one segment, the index segmentation data is: segment value: [0 1 1 2 3]; segment size: 5; segment offset: 0. The data size of the index segmentation data is then 5 + 1 + 1 = 7.

Therefore, in order to reduce the number of index segments obtained by segmenting the index array as much as possible, and thus improve the data compression efficiency, in the embodiment of the present disclosure the index array may be divided into as few index segments as possible. However, since the index representation length is limited, the index values that it can represent are correspondingly limited.

Therefore, in the embodiment of the present disclosure, the index value upper limit of the sparse feature may be obtained according to a preset index representation length, and then the index array is segmented with the index value upper limit as a reference, and a segment value, a segment length, and a segment offset of each index segment are obtained.

For example, assume that the index representation length is 8 bits, so that the index can take 256 integer values in the range from 0 to 255. For the sparse feature above, since no index value exceeds 255, the index array can be kept as a single segment:

segment value: [0 1 1 2 3]; segment size: 5; segment offset: 0.
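As a minimal sketch of step S121, the index value upper limit follows directly from the index representation length. The function names below are illustrative assumptions, not taken from the disclosure, and the segment count is only an upper bound that assumes non-negative index values and segment offsets that are multiples of 2**index_bits.

```python
def index_upper_limit(index_bits):
    """Largest index value representable in index_bits bits."""
    return (1 << index_bits) - 1


def max_segments(max_index, index_bits):
    """Upper bound on the number of segments for index values 0 .. max_index."""
    return max_index // (1 << index_bits) + 1


limit = index_upper_limit(8)      # 255
small = max_segments(3, 8)        # 1: index values 0..3 fit in one segment
wide = max_segments(511, 8)       # 2: index values up to 511 need two segments
```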

Consider a larger batch of sparse features containing 512 samples, and assume that the data has 100000 columns:

[1 1 0 0 … 0 0]

[2 2 0 0 … 0 0]

[3 3 0 0 … 0 0]

…

[511 511 0 0 … 0 0]

[512 512 0 0 … 0 0]

The representation using triplets can be expressed as:

index = [0 0 1 1 2 2 … 510 510 511 511]; values = [1 1 2 2 3 3 … 511 511 512 512]; dense_shape = [512, 100000].

In this case, since values in index may exceed 255, index must be divided into several segments; since 511/256 is greater than 1 and less than 2, two segments are needed. Specifically, since the maximum index value that the index representation length can represent is 255, an offset of 256 × N may be subtracted from every index value that exceeds 255, so that every index value in the resulting index segments lies between 0 and 255. Here, the offset processing is applied to the index values exceeding 255 in the second half of the index array, and index values sharing the same offset are grouped into one index segment. For the index array above, the segmentation result may be as follows:

segment value: [0 0 1 1 2 2 … 254 254 255 255] [0 0 1 1 2 2 … 254 254 255 255];

segment size: 512, 512;

segment offset: 0, 256;

The brackets in the segment values above only mark the boundaries between index segments; they are not needed in the actual transmission, because the segments can be separated by segment size.

Moreover, in order to verify that the original index can be restored according to the index segmentation data, the segment value of each index segment is added with the corresponding segment offset, that is:

index = [0+0 0+0 1+0 1+0 2+0 2+0 … 254+0 254+0 255+0 255+0] [0+256 0+256 1+256 1+256 2+256 2+256 … 254+256 254+256 255+256 255+256] = [0 0 1 1 2 2 … 254 254 255 255] [256 256 257 257 … 510 510 511 511] = [0 0 1 1 2 2 … 510 510 511 511], which is the original index.
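The segmentation and the restoration check above can be sketched with numpy. This is an illustration under stated assumptions rather than the disclosed implementation: `segment_index` is a hypothetical name, the index array is assumed to be non-empty and to hold non-negative integers, and a new segment is started wherever the 256 × N offset changes.

```python
import numpy as np


def segment_index(index, index_bits=8):
    """Split an index array into segments whose values fit in index_bits bits.

    Assumes a non-empty array of non-negative integers; a new segment starts
    wherever the offset (a multiple of 2**index_bits) changes.
    """
    index = np.asarray(index, dtype=np.int64)
    span = 1 << index_bits                    # 256 for 8 bits
    offsets = (index // span) * span          # 256 * N for every element
    seg_value = index - offsets               # always in 0 .. span - 1

    # Positions where the offset differs from the previous element.
    starts = np.concatenate(([0], np.flatnonzero(np.diff(offsets)) + 1))
    seg_size = np.diff(np.concatenate((starts, [index.size])))
    seg_offset = offsets[starts]
    return seg_value, seg_size, seg_offset


# Larger batch: index = [0 0 1 1 ... 511 511]
index = np.repeat(np.arange(512), 2)
seg_value, seg_size, seg_offset = segment_index(index)

# Restoring the original index: add each segment's offset back.
restored = seg_value + np.repeat(seg_offset, seg_size)
```

For the larger batch this yields two segments of 512 elements with offsets 0 and 256, and `restored` equals the original index array.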

Referring to fig. 2, in an embodiment of the present disclosure, the step S13 may further include:

step S131, aiming at each non-zero eigenvalue in the non-zero eigenvalue array, according to the index segmentation data, obtaining an index value corresponding to the non-zero eigenvalue;

step S132, merging the non-zero eigenvalue and the index value in a binary form according to the eigenvalue representation length and the index representation length to obtain a binary merged value;

step S133, converting the binary merged value corresponding to each non-zero eigenvalue into a decimal form, so as to obtain a compressed representation array of the sparse characteristic.

As described above, in the embodiment of the present disclosure, in order to balance the hash bucket size of the sparse feature against the complexity of the decompression process, suitable binary representation lengths may be set for the feature value and the index value. Generally, if the hash bucket corresponding to the sparse feature is larger, a binary value of more than 24 bits and a binary index of fewer than 8 bits may likewise be used. However, to limit the complexity of the index correction operation, the number of binary bits representing the index should not be too small.

Then, when combining the non-zero eigenvalue and the index value, the index value corresponding to the non-zero eigenvalue may be obtained according to the index segment data for each non-zero eigenvalue in the non-zero eigenvalue array, and then the non-zero eigenvalue and the index value corresponding thereto are combined in a binary form according to the eigenvalue representation length and the index representation length to obtain a binary combined value, and finally the binary combined value corresponding to each non-zero eigenvalue is converted into a decimal form to obtain a compressed representation array of the sparse characteristic.

For example, assume that the preset eigenvalue representation length and index representation length are 24 and 8, respectively, that is, index occupies 8 bits and value occupies 24 bits, and that in the merging process the index value comes first and the non-zero eigenvalue after it; fig. 3 shows a diagram in which index and value are jointly represented by one integer. Then, for the sparse feature of the smaller batch above, the segment value and the value are combined in turn into 32-bit integers, giving the compressed representation array new_value = [(0<<24)+1, (1<<24)+2, (1<<24)+3, (2<<24)+4, (3<<24)+5] = [1, 16777218, 16777219, 33554436, 50331653].
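The merge for the smaller batch can be sketched as follows. The function name `pack` and its range checks are assumptions added for illustration; the sketch also assumes non-negative integer inputs and a 32-bit unsigned result with the index in the high bits.

```python
import numpy as np


def pack(seg_value, values, index_bits=8, value_bits=24):
    """Merge each segment value (high bits) with its feature value (low bits)."""
    seg_value = np.asarray(seg_value, dtype=np.uint32)
    values = np.asarray(values, dtype=np.uint32)
    if index_bits + value_bits != 32:
        raise ValueError("this sketch packs into exactly 32 bits")
    if seg_value.size and int(seg_value.max()) >= (1 << index_bits):
        raise ValueError("segment value does not fit in index_bits")
    if values.size and int(values.max()) >= (1 << value_bits):
        raise ValueError("feature value does not fit in value_bits")
    # Shift the index into the upper bits, then fill the lower bits with the value.
    return np.left_shift(seg_value, np.uint32(value_bits)) | values


new_value = pack([0, 1, 1, 2, 3], [1, 2, 3, 4, 5])
```

The range checks reflect the constraint that a feature value wider than 24 bits would overwrite the index bits.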

For the sparse feature of the larger batch, the segment values and the values are combined in turn into 32-bit integers, giving new_value = [[(0<<24)+1, (0<<24)+1, (1<<24)+2, (1<<24)+2, (2<<24)+3, (2<<24)+3, …, (254<<24)+255, (254<<24)+255, (255<<24)+256, (255<<24)+256] [(0<<24)+257, (0<<24)+257, (1<<24)+258, (1<<24)+258, …, (254<<24)+511, (254<<24)+511, (255<<24)+512, (255<<24)+512]] = [1, 1, 16777218, 16777218, 33554435, 33554435, …, 4261413119, 4261413119, 4278190336, 4278190336, 257, 257, 16777474, 16777474, …, 4261413375, 4261413375, 4278190592, 4278190592].

In the intermediate calculation process, the division into the two arrays through the brackets is mainly used for distinguishing different sections, and the actual calculation process may not be divided into the two arrays.

In this case, the compressed presentation data after compression includes:

new_value = [1, 1, 16777218, 16777218, 33554435, 33554435, …, 4261413119, 4261413119, 4278190336, 4278190336, 257, 257, 16777474, 16777474, …, 4261413375, 4261413375, 4278190592, 4278190592];

segment size: 512, 512;

segment offset: 0, 256;

dense_shape: [512, 100000].

The size of the original representation data to be transmitted before compression is 1024 + 1024 + 2 = 2050 integers, and the size of the compressed representation data to be transmitted after compression is 1024 + 2 + 2 + 2 = 1030 integers. The compression ratio in this case is 1030/2050 ≈ 0.5024.
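Both worked examples follow the same count. As a hedged generalization, let n be the number of non-zero values, k the number of index segments, and d the number of integers in the dense shape array (symbols introduced here for illustration only), with the segment values omitted from the transmitted data:

```latex
\[
\text{ratio} = \frac{n + 2k + d}{2n + d},
\qquad
\frac{5 + 2\cdot 1 + 2}{2\cdot 5 + 2} = \frac{9}{12} = 0.75,
\qquad
\frac{1024 + 2\cdot 2 + 2}{2\cdot 1024 + 2} = \frac{1030}{2050} \approx 0.5024 .
\]
```

When k and d are small relative to n, the ratio approaches 1/2, which matches the trend from the smaller batch to the larger batch.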

In the embodiment of the present disclosure, the index value upper limit of the sparse feature may be obtained according to a preset index representation length; and segmenting the index array by taking the index value upper limit as reference, and acquiring the segment value, the segment length and the segment offset of each index segment. Therefore, the segmentation effect of the index array is improved, and the data compression effect of the sparse feature is further improved.

Moreover, in the embodiment of the present disclosure, for each non-zero eigenvalue in the non-zero eigenvalue array, according to the index segmentation data, an index value corresponding to the non-zero eigenvalue may be obtained; according to the eigenvalue representation length and the index representation length, combining the nonzero eigenvalue and the index value in a binary mode to obtain a binary combined value; and converting the binary combination value corresponding to each non-zero characteristic value into a decimal form to obtain a compressed expression array of the sparse characteristics. Therefore, the data compression effect of the sparse feature can be further improved.

Fig. 4 is a flowchart of a data decompression method according to an exemplary embodiment. As shown in fig. 4, the method may include the following steps:

Step S21, obtaining the non-zero feature array and the offset index array of the original sparse feature according to the compressed representation array in the compressed representation data to be decompressed, the preset feature value representation length, and the index representation length.

Step S22, performing offset correction on the elements in the offset index array according to the segment length and the segment offset of each index segment in the index segment data, to obtain the index array of the original sparse feature.

And step S23, obtaining the original sparse characteristics corresponding to the compressed representation data according to the non-zero characteristic array, the index array and the dense shape array in the compressed representation data.

In practical application, in order to improve the transmission efficiency of data, the sparse feature may be compressed and represented, but in the subsequent use process, the compressed and represented data of the sparse feature also needs to be decompressed and restored to obtain the sparse feature and/or the original represented data of the sparse feature. The specific process of decompression can be operated in reverse according to the compression process, so that the original sparse feature before compression can be obtained.

Specifically, the segment values and the non-zero feature values may be obtained by decompressing the compressed representation array in the compressed representation data, and the index array of the sparse feature may then be recovered from the segment values together with the segment length and segment offset of each index segment. The dense shape array may also be obtained from the compressed representation data.

In addition, because the original representation data of the sparse feature includes the non-zero feature array, the index array corresponding to the non-zero feature array, and the dense shape array, decompressing the compressed representation data recovers these three arrays, from which the original sparse feature corresponding to the compressed representation data can then be obtained.

Of course, in the embodiment of the present disclosure, if the purpose of decompression is to obtain the original representation data of the original sparse feature, only the non-zero feature array, the index array, and the dense shape array in the compressed representation data of the original sparse feature need to be obtained.

As described above, the compressed representation array contains the merged value of each non-zero feature value and its index value, and the feature value representation length and the index representation length are preset. Therefore, in the embodiment of the present disclosure, each non-zero feature value and index value can be read from the compressed representation array according to the feature value representation length and the index representation length, which gives the non-zero feature array and the offset index array of the original sparse feature.

Since each merged value in the compressed representation array combines a non-zero eigenvalue with its index value within the segmented index segment, the index array obtained at this point is the one corresponding to the segmented index segments. It may be offset relative to the index array in the original representation data, and is therefore referred to as an offset index array.

Further, in order to decompress the index array in the original representation data corresponding to the original sparse feature, offset correction may be further performed on each element in the offset index array according to the segment length and the segment offset in the compressed representation data, so as to obtain the index array of the original sparse feature. Specifically, the offset corresponding to each element in the offset index array can be obtained according to the segment length and the segment offset, so that each element in the offset index array is subjected to offset correction according to the offset of each element, and the index array of the original sparse feature is obtained.

For example, for the large batch of sparse features, according to the compressed representation array, the preset feature value representation length and the index representation length, the obtained non-zero feature array and the obtained offset index array are respectively as follows:

offset index array: [0,0,1,1,2,2 … 254,254,255,255,0,0,1,1,2,2 … 254,254,255,255], for a total of 1024 values;

non-zero eigenvalue array: [1,1,2,2,3,3 … 511,511,512,512], for a total of 1024 values;

Further, the offset index array may be split according to the segment lengths to obtain the segment values of the 2 index segments:

[0 0 1 1 2 2 … 254 254 255 255], [0 0 1 1 2 2 … 254 254 255 255].

The index can then be restored according to the offset of each index segment by adding the corresponding offset to the segment values above, that is, adding 0 to the first segment and 256 to the second segment, which gives:

[0 0 1 1 2 2 … 254 254 255 255] [256 256 257 257 258 258 … 510 510 511 511], that is, [0 0 1 1 2 2 … 254 254 255 255 256 256 257 257 258 258 … 510 510 511 511].

In addition, in the embodiment of the present disclosure, in order to correctly restore the index value represented by 8 binary bits to the index value in the original representation data, a data structure dedicated to storing the segment length and the segment offset of each index segment may also be defined, as shown in fig. 5, in which the segment length and the segment offset of each index segment are stored in one-to-one correspondence.
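One possible shape for such a structure is sketched below. The actual layout shown in fig. 5 is not reproduced here; `IndexSegment` and `build_segment_table` are hypothetical names, and the only property taken from the text is that each segment length is stored with its own segment offset.

```python
from typing import List, NamedTuple


class IndexSegment(NamedTuple):
    """One index segment: how many elements it holds and the offset to add back."""
    length: int
    offset: int


def build_segment_table(seg_size, seg_offset) -> List[IndexSegment]:
    """Pair each segment length with its offset, one entry per segment."""
    if len(seg_size) != len(seg_offset):
        raise ValueError("segment lengths and offsets must correspond one-to-one")
    return [IndexSegment(int(s), int(o)) for s, o in zip(seg_size, seg_offset)]


table = build_segment_table([512, 512], [0, 256])
total = sum(seg.length for seg in table)   # number of index values covered
```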

In the embodiment of the present disclosure, a non-zero feature array and an offset index array of an original sparse feature can be obtained according to the compressed representation array in the compressed representation data to be decompressed, the preset feature value representation length, and the index representation length; offset correction is performed on elements in the offset index array according to the segment length and the segment offset of each index segment in the index segment data to obtain an index array of the original sparse feature; and the original sparse feature corresponding to the compressed representation data is acquired according to the non-zero feature array, the index array and the dense shape array in the compressed representation data. Therefore, the compressed representation data can be decompressed quickly and accurately, and the original sparse features are recovered.

Referring to fig. 6, in an embodiment of the present disclosure, the step S21 may further include:

step S211, converting decimal elements in the compressed representation array into binary elements;

step S212, reading a first segment bit representing the feature value and a second segment bit representing the index value in each binary element according to the feature value representation length and the index representation length;

step S213, acquiring a non-zero feature array of the original sparse feature according to the first segment bit;

step S214, obtaining the offset index array of the original sparse feature according to the second segment bit.

As described above, in the embodiment of the present disclosure, each non-zero eigenvalue and its index value may be combined into one decimal integer in order to improve compression efficiency. The eigenvalue representation length and the index representation length limit the number of bits that the non-zero eigenvalue and the index value occupy in binary form.

Therefore, in the embodiment of the present disclosure, in order to recover the non-zero feature array and the index array from the compressed representation array, each element in the compressed representation array may first be converted from decimal into binary. A first segment bit representing the feature value and a second segment bit representing the index value are then read from each binary element according to the feature value representation length and the index representation length. Finally, the non-zero feature array of the original sparse feature is obtained from the first segment bits, and the offset index array of the original sparse feature is obtained from the second segment bits.

For example, assume that the preset feature value representation length corresponds to the lower 24 bits of each binary element and the index representation length corresponds to the upper 8 bits. The lower 24 bits representing the feature value and the upper 8 bits representing the index value may then be read from each binary element. The segment bits may be read in any available manner, and the embodiment of the present disclosure is not limited in this respect.

For example, the bitwise_and operation of numpy (an open-source numerical computing extension for Python) may be used to take the lower 24 bits of each binary element in the compressed representation array Z as the first segment bit, e.g., bitwise_and(Z, 0xffffff); converting the binary first segment bit into decimal then restores the original values. In addition, numpy's bitwise_and and right_shift may be used to take the upper 8 bits of each element of the compressed representation array Z and shift them right by 24 bits to form the second segment bit; converting the binary second segment bit into decimal gives the offset index array of the original sparse feature, for example, index = numpy.…
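A minimal sketch of the two reads with numpy follows, using the compressed representation array of the smaller batch. It assumes a 32-bit unsigned array with the value in the low 24 bits and the index in the high 8 bits; the shift expression is one straightforward way to read the upper bits and is not a reconstruction of the source's own expression.

```python
import numpy as np

VALUE_BITS = 24  # assumed: low 24 bits hold the feature value, high 8 bits the index

Z = np.array([1, 16777218, 16777219, 33554436, 50331653], dtype=np.uint32)

# First segment bits: mask off everything above the low 24 bits.
values = np.bitwise_and(Z, np.uint32((1 << VALUE_BITS) - 1))

# Second segment bits: shift the high 8 bits down into the low positions.
offset_index = np.right_shift(Z, np.uint32(VALUE_BITS))
```

Because the array is unsigned, the right shift alone leaves only the upper 8 bits, so no additional mask is needed in this sketch.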

Referring to fig. 6, in an embodiment of the present disclosure, the step S22 may further include:

step S221, constructing an offset array of the offset index array according to the segment length and the segment offset of each index segment;

step S222, adding the offset array and the offset index array to obtain an index array of the original sparse feature.

In the embodiment of the present disclosure, since different elements in the offset index array may have different offsets, correcting each element of the offset index array in turn is time-consuming. Therefore, in order to improve the offset correction efficiency, an offset array, i.e., a correction array for the index, may first be constructed according to the known segment length S and offset O of each index segment. Specifically, the offset O of each index segment may be repeated S times, where S is the corresponding segment length, to obtain an offset array for the entire index, i.e., offsets, which has the same size as the offset index array obtained in the previous step.

Then, the offset array may be added to the offset index array to obtain the index array of the original sparse feature, that is, index_final = index + offsets.

Taking the sparse feature of the smaller batch as an example, segment value: [0 1 1 2 3]; segment size: 5; segment offset: 0. Expanding the offset to a length of 5 gives the offset array, that is, expanded = numpy.repeat([0], [5]) = [0 0 0 0 0]. The corresponding elements of the offset array expanded and the offset index array seg_value are then added one by one, that is, expanded + seg_value = [0+0 1+0 1+0 2+0 3+0] = [0 1 1 2 3], which gives the original index array.

Taking the sparse feature of the larger batch as an example, segment value: [0 0 1 1 2 2 … 254 254 255 255] [0 0 1 1 2 2 … 254 254 255 255]; segment size: 512, 512; segment offset: 0, 256; seg_value = [0 0 1 1 2 2 … 254 254 255 255 0 0 1 1 2 2 … 254 254 255 255]. First, the offset array of the offset index array is constructed according to the segment length and segment offset of each index segment: expanded = numpy.repeat([0, 256], [512, 512]) = [0 0 … 0 0] [256 256 … 256 256] = [0 0 … 0 0 256 256 … 256 256], where each segment of expanded has a length of 512 and the total length is 1024. The corresponding elements of the offset array expanded and the offset index array seg_value are then added one by one to obtain the index array of the sparse feature.
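The larger-batch correction can be run directly with numpy.repeat. In this sketch the offset index array is rebuilt synthetically with `np.tile` purely so that the snippet is self-contained; in practice it would come from the unpacking step.

```python
import numpy as np

seg_size = [512, 512]
seg_offset = [0, 256]

# Offset index array as recovered from the compressed representation array:
# [0 0 1 1 ... 255 255 0 0 1 1 ... 255 255]
seg_value = np.tile(np.repeat(np.arange(256), 2), 2)

# Repeat each segment offset by its segment length, then add element-wise.
expanded = np.repeat(seg_offset, seg_size)
index_final = seg_value + expanded
```

A single vectorized addition replaces a per-element loop, which is the efficiency point made in the text.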

Through the above operation, the corrected index is the same as the index before compression, and values are also restored to the original values. More importantly, the speed of the whole decompression process is very fast through the above efficient operations such as bitwise _ and, right _ shift, repeat, etc.

In embodiments of the present disclosure, decimal elements in the compressed representation array may be converted to binary elements; a first segment bit representing a feature value and a second segment bit representing an index value are read from each binary element according to the feature value representation length and the index representation length; a non-zero feature array of the original sparse feature is acquired according to the first segment bits; and an offset index array of the original sparse feature is acquired according to the second segment bits. An offset array of the offset index array is then constructed according to the segment length and the segment offset of each index segment, and the offset array and the offset index array are added to obtain the index array of the original sparse feature. In this way, the compressed representation data can be decompressed more quickly and accurately.
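To check that decompression inverts compression, the steps can be chained into a round trip. This is a compact sketch under assumptions, not the disclosed implementation: `compress` and `decompress` are hypothetical names, inputs are non-empty arrays of non-negative integers, values fit in 24 bits, and 64-bit integers are used internally for simplicity.

```python
import numpy as np


def compress(index, values, dense_shape, index_bits=8, value_bits=24):
    """Segment the index, then pack (segment value, value) into one integer each."""
    index = np.asarray(index, dtype=np.int64)
    values = np.asarray(values, dtype=np.int64)
    span = 1 << index_bits
    offsets = (index // span) * span
    starts = np.concatenate(([0], np.flatnonzero(np.diff(offsets)) + 1))
    seg_size = np.diff(np.concatenate((starts, [index.size])))
    seg_offset = offsets[starts]
    new_value = ((index - offsets) << value_bits) | values
    return new_value, seg_size, seg_offset, list(dense_shape)


def decompress(new_value, seg_size, seg_offset, dense_shape, value_bits=24):
    """Unpack values and offset indices, then correct the indices by segment."""
    values = np.bitwise_and(new_value, (1 << value_bits) - 1)
    offset_index = np.right_shift(new_value, value_bits)
    index = offset_index + np.repeat(seg_offset, seg_size)
    return index, values, list(dense_shape)


# Larger batch from the description.
index = np.repeat(np.arange(512), 2)
values = index + 1
packed = compress(index, values, [512, 100000])
index_back, values_back, shape_back = decompress(*packed)
```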

Fig. 7 is a block diagram illustrating a data compression apparatus according to an example embodiment. Referring to fig. 7, the apparatus includes: an original representation data acquisition module 31, an index segmentation data acquisition module 32, a compressed representation array acquisition module 33, and a compressed representation data acquisition module 34.

An original representation data obtaining module 31 configured to perform obtaining original representation data of sparse features to be compressed, where the original representation data includes a non-zero feature array, an index array corresponding to the non-zero feature array, and a dense shape array;

an index-segment data obtaining module 32 configured to obtain index-segment data of the sparse feature according to the index array and a preset index representation length; the index segment data comprises a segment value, a segment length and a segment offset of each index segment;

a compressed representation array obtaining module 33, configured to perform merging, according to the index segmentation data, the non-zero eigenvalue and the index value corresponding to the non-zero eigenvalue for each non-zero eigenvalue in the non-zero eigenvalue array, so as to obtain a compressed representation array of the sparse feature;

a compressed representation data acquisition module 34 configured to perform acquisition of compressed representation data of the sparse features according to the compressed representation array, the index segmentation data, and the dense shape array.

Referring to fig. 8, in an embodiment of the present disclosure, the index segmentation data obtaining module 32 may further include:

an index value upper limit obtaining submodule 321 configured to perform obtaining an index value upper limit of the sparse feature according to a preset index representation length;

and an index array segmenting submodule 322 configured to segment the index array with reference to the index value upper limit, and obtain a segment value, a segment length, and a segment offset of each index segment.

Referring to fig. 8, in the embodiment of the present disclosure, the compressed representation array obtaining module 33 may further include:

the index value obtaining sub-module 331 is configured to perform, for each non-zero eigenvalue in the non-zero eigenvalue array, obtaining an index value corresponding to the non-zero eigenvalue according to the index segmentation data;

a binary merge value obtaining sub-module 332, configured to perform merging the non-zero eigenvalue and the index value in a binary form according to the eigenvalue representation length and the index representation length to obtain a binary merge value;

a compressed representation array obtaining sub-module 333 configured to perform conversion of the binary merged value corresponding to each non-zero eigenvalue into a decimal form, so as to obtain a compressed representation array of the sparse characteristic.

Fig. 9 is a block diagram illustrating a data decompression apparatus according to an example embodiment. Referring to fig. 9, the apparatus includes: a compressed representation array decompression module 41, an index array acquisition module 42 and an original sparse feature construction module 43.

A compressed representation array decompression module 41 configured to obtain a non-zero feature array and an offset index array of the original sparse feature according to the compressed representation array in the compressed representation data to be decompressed, the preset feature value representation length, and the preset index representation length.

An index array obtaining module 42 configured to perform offset correction on elements in the offset index array according to the segment length and the segment offset of each index segment in the index segment data, so as to obtain an index array of the original sparse feature.

An original sparse feature constructing module 43 configured to execute obtaining an original sparse feature corresponding to the compressed representation data according to the non-zero feature array, the index array, and the dense shape array in the compressed representation data.

Referring to fig. 10, in the embodiment of the present disclosure, the compressed representation array decompression module 41 may further include:

an element binary conversion submodule 411 configured to perform conversion of decimal elements in the compressed representation array into binary elements;

a segment bit splitting reading submodule 412 configured to perform reading a first segment bit characterizing a feature value and a second segment bit characterizing an index value in each binary element according to the feature value representation length and the index representation length;

a non-zero feature array obtaining submodule 413 configured to perform obtaining a non-zero feature array of the original sparse feature according to the first segment bit;

an offset index array obtaining sub-module 414 configured to perform obtaining an offset index array of the original sparse feature according to the second segment bit.

Referring to fig. 10, in the embodiment of the present disclosure, the index array obtaining module 42 may further include:

an offset array construction submodule 421 configured to perform construction of an offset array of the offset index array according to the segment length and the segment offset of each index segment;

an index array obtaining sub-module 422 configured to perform addition of the offset array and the offset index array to obtain an index array of the original sparse feature.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Fig. 11 is a block diagram illustrating an apparatus 500 for data compression and/or data decompression according to an example embodiment. For example, the apparatus 500 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.

Referring to fig. 11, the apparatus 500 may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.

The processing component 502 generally controls overall operation of the device 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 502 may include one or more processors 520 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 502 can include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.

The memory 504 is configured to store various types of data to support operation at the device 500. Examples of such data include instructions for any application or method operating on device 500, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 504 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.

The power component 506 provides power to the various components of the apparatus 500. The power component 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 500.

The multimedia component 508 includes a screen that provides an output interface between the device 500 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 500 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.

The audio component 510 is configured to output and/or input audio signals. For example, audio component 510 includes a Microphone (MIC) configured to receive external audio signals when apparatus 500 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 504 or transmitted via the communication component 516. In some embodiments, audio component 510 further includes a speaker for outputting audio signals.

The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.

The sensor component 514 includes one or more sensors for providing status assessments of various aspects of the apparatus 500. For example, the sensor component 514 may detect an open/closed state of the apparatus 500 and the relative positioning of components, such as a display and keypad of the apparatus 500. The sensor component 514 may also detect a change in the position of the apparatus 500 or of a component of the apparatus 500, the presence or absence of user contact with the apparatus 500, the orientation or acceleration/deceleration of the apparatus 500, and a change in the temperature of the apparatus 500. The sensor component 514 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor component 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.

The communication component 516 is configured to facilitate communication between the apparatus 500 and other devices in a wired or wireless manner. The apparatus 500 may access a wireless network based on a communication standard, such as WiFi, an operator network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

In an exemplary embodiment, the apparatus 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.

In an exemplary embodiment, an embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory configured to store instructions executable by the processor;

wherein the processor is configured to execute the instructions to implement any of the data compression methods as described above.

In an exemplary embodiment, the disclosed embodiments also provide a storage medium whose instructions, when executed by a processor of an electronic device, enable the electronic device to execute any one of the data compression methods as described above.

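In an exemplary embodiment, an embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory configured to store instructions executable by the processor; wherein the processor is configured to execute the instructions to implement any of the data decompression methods as described above.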

In an exemplary embodiment, the disclosed embodiments also provide a storage medium whose instructions, when executed by a processor of an electronic device, enable the electronic device to execute any one of the data decompression methods described above.

In an exemplary embodiment, a storage medium comprising instructions is also provided, such as the memory 504 comprising instructions executable by the processor 520 of the apparatus 500 to perform the above-described method. Optionally, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Fig. 12 is a block diagram illustrating an apparatus 600 for data compression and/or data decompression according to an example embodiment. For example, the apparatus 600 may be provided as a server. Referring to fig. 12, the apparatus 600 includes a processing component 622, which further includes one or more processors, and memory resources, represented by memory 632, for storing instructions executable by the processing component 622, such as application programs. The application programs stored in memory 632 may include one or more modules that each correspond to a set of instructions. Further, the processing component 622 is configured to execute the instructions to perform any of the data compression methods and/or data decompression methods described above.

The apparatus 600 may also include a power component 626 configured to perform power management of the apparatus 600, a wired or wireless network interface 650 configured to connect the apparatus 600 to a network, and an input/output (I/O) interface 658. The apparatus 600 may operate based on an operating system stored in the memory 632, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

The present disclosure provides A1, a data compression method, comprising:

A2, the method as described in A1, wherein the step of obtaining index segment data of the sparse feature according to the index array and the preset index representation length includes:

A3, the method according to A1, wherein the step of merging, for each non-zero eigenvalue in the non-zero eigenvalue array, the non-zero eigenvalue and the index value corresponding to the non-zero eigenvalue according to the index segment data to obtain the compressed representation array of the sparse feature comprises:

The present disclosure provides B4, a data decompression method, comprising:

B5, the method according to B4, wherein the step of obtaining the non-zero feature array and the offset index array of the original sparse features according to the compressed representation array, the preset feature value representation length and the index representation length in the compressed representation data to be decompressed includes:

B6, the method according to B4 or B5, wherein the step of performing offset correction on the elements in the offset index array according to the segment length and the segment offset of each index segment in the index segment data to obtain the index array of the original sparse feature includes:

The present disclosure provides C7, a data compression apparatus, comprising:

C8, the apparatus of C7, the index segment data obtaining module comprising:

C9, the apparatus of C7, the compressed representation array obtaining module comprising:

The present disclosure provides D10, a data decompression device, comprising:

D11, the apparatus of D10, the compressed representation array decompression module comprising:

and the offset index array acquisition module is configured to acquire the offset index array of the original sparse feature according to the second segment bit.

D12, the apparatus as described in D10 or D11, the index array obtaining module comprising:

and the index array acquisition module is configured to perform addition of the offset array and the offset index array to obtain an index array of the original sparse feature.
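As a non-limiting illustration, the addition performed by the index array acquisition module may be sketched as follows. This is a minimal sketch and not the disclosed implementation: the function name `restore_index_array` and its parameter names are hypothetical, and it assumes that the segment length of an index segment is the number of index entries belonging to that segment and that the segment offset is the amount that was subtracted from every index in that segment during compression.

```python
from typing import List


def restore_index_array(offset_indices: List[int],
                        segment_lengths: List[int],
                        segment_offsets: List[int]) -> List[int]:
    """Recover the original index array from per-segment offset indices.

    Assumption (not stated in the source): segment_lengths[i] is the number
    of index entries in segment i, and segment_offsets[i] is the value that
    was subtracted from every index of segment i during compression.
    """
    if len(segment_lengths) != len(segment_offsets):
        raise ValueError("each index segment needs one length and one offset")
    if sum(segment_lengths) != len(offset_indices):
        raise ValueError("segment lengths must cover every offset index")

    # Build the offset array: repeat each segment offset once per entry
    # of the corresponding segment.
    offset_array: List[int] = []
    for length, offset in zip(segment_lengths, segment_offsets):
        offset_array.extend([offset] * length)

    # Element-wise addition of the offset array and the offset index array.
    return [off + idx for off, idx in zip(offset_array, offset_indices)]
```

For example, under these assumptions, two segments of two entries each with offsets 0 and 256 map the offset indices 3, 200, 5, 17 back to the indices 3, 200, 261, 273.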

The present disclosure provides E13, an electronic device, comprising:

a processor;

a memory configured to store instructions executable by the processor;

wherein the processor is configured to execute the instructions to implement the data compression method of any one of A1 to A3.

The present disclosure provides F14, a storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform a data compression method as recited in any one of A1 to A3.

The present disclosure provides G15, an electronic device, comprising:

a processor;

a memory configured to store instructions executable by the processor;

wherein the processor is configured to execute the instructions to implement the data decompression method as claimed in any one of B4 to B6.

The present disclosure provides H16, a storage medium whose instructions, when executed by a processor of an electronic device, enable the electronic device to perform a data decompression method as recited in any one of B4 to B6.

Claims

1. A method of data compression, comprising:

2. The method according to claim 1, wherein the step of obtaining index-segmented data of the sparse feature according to the index array and a preset index representation length comprises:

3. The method according to claim 1 or 2, wherein the step of, for each non-zero eigenvalue in the non-zero eigenvalue array, merging the non-zero eigenvalue and the index value corresponding to the non-zero eigenvalue according to the index segment data to obtain the compressed representation array of the sparse feature comprises:

4. A method of data decompression, comprising:

5. A data compression apparatus, comprising:

6. A data decompression apparatus, comprising:

7. An electronic device, comprising:

a processor;

a memory configured to store instructions executable by the processor;

wherein the processor is configured to execute the instructions to implement the data compression method of any one of claims 1 to 3.

8. A storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform a data compression method as claimed in any one of claims 1 to 3.

9. An electronic device, comprising:

a processor;

a memory configured to store instructions executable by the processor;

wherein the processor is configured to execute the instructions to implement the data decompression method of claim 4.

10. A storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the data decompression method of claim 4.