CN109542854B

CN109542854B - Data compression method, device, medium and electronic equipment

Info

Publication number: CN109542854B
Application number: CN201811371727.5A
Authority: CN
Inventors: 蒋宇翔
Original assignee: Netease Hangzhou Network Co Ltd
Current assignee: Netease Hangzhou Network Co Ltd
Priority date: 2018-11-14
Filing date: 2018-11-14
Publication date: 2020-11-24
Anticipated expiration: 2038-11-14
Also published as: CN109542854A

Abstract

The embodiment of the invention relates to a data compression method and a device, belonging to the technical field of data processing, wherein the method comprises the following steps: abstracting fields in an original data table and table entries in the fields to obtain a dimensional space corresponding to the fields and discrete points corresponding to the table entries; dividing the discrete points to obtain a discrete point set according to the dimension space to which the discrete points belong and the distance between the discrete points; constructing a discrete point tree structure according to the distance between the discrete points in the discrete point set; deleting the same fields between the discrete points corresponding to the root node and the discrete points corresponding to the child nodes in the tree structure, and reserving the difference fields. The method solves the problem of heavy burden of the terminal equipment caused by excessive redundant data in each table entry in the prior art, reduces the redundant data amount and reduces the burden of the terminal equipment.

Description

Data compression method, device, medium and electronic equipment

Technical Field

The embodiment of the invention relates to the technical field of data processing, in particular to a data compression method, a data compression device based on difference control, a computer readable storage medium and electronic equipment.

Background

As a game grows in size and its operating life lengthens, the exported configuration table data inevitably keeps expanding. In the PC client game era, system memory was relatively plentiful, and the memory occupied by the data configuration tables was acceptable. In the mobile game era, excessive memory occupied by the exported table data places too heavy a burden on the terminal device, so the device may respond slowly during play, which degrades the user experience.

To address these problems, many enterprises have tried to optimize the memory used by game data, for example through data design changes, tupledict, sparsedict, taggeddict, and record classes with __slots__.

However, these schemes reduce the memory footprint by optimizing the format of a single entry or by modifying its internal implementation. Viewed on its own, a single entry may already be very compact, with little room left for compression; but when multiple entries are put together, the entries turn out to share a great deal of identical, redundant data. For example, in the data table shown in fig. 1, the contents in the boxes are completely identical, so a large amount of redundant data is produced and the load on the terminal device increases.

Therefore, it is desirable to provide a new data compression method and apparatus.

It should be noted that the information disclosed in the above background section is only intended to enhance understanding of the background of the present invention, and may therefore include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The present invention is directed to a data compression method, a data compression apparatus based on difference control, a computer-readable storage medium, and an electronic device, which overcome, at least to some extent, the problem of overloading a terminal device due to excessive redundant data caused by the limitations and disadvantages of the related art.

According to an aspect of the present disclosure, there is provided a data compression method including:

abstracting fields in an original data table and table entries in the fields to obtain a dimensional space corresponding to the fields and discrete points corresponding to the table entries;

dividing the discrete points to obtain a discrete point set according to the dimension space to which the discrete points belong and the distance between the discrete points;

constructing a discrete point tree structure according to the distance between the discrete points in the discrete point set;

deleting the same fields between the discrete points corresponding to the root node and the discrete points corresponding to the child nodes in the tree structure, and reserving the difference fields.

In an exemplary embodiment of the present disclosure, dividing the discrete points into discrete point sets according to a dimension space to which the discrete points belong and a distance between the discrete points includes:

calculating distances between the discrete points;

and dividing the discrete points meeting the maximum distance limiting principle into the same set according to the dimension space to which the discrete points belong and the distance to obtain a plurality of discrete point sets.

In an exemplary embodiment of the present disclosure, dividing the discrete points satisfying the maximum distance constraint rule into the same set includes:

judging whether the distance between the discrete points is smaller than the maximum distance limit or not;

if the distance between the discrete points is less than the maximum distance limit, the discrete points are divided into the same set.

In an exemplary embodiment of the present disclosure, calculating the distance between the discrete points comprises:

and calculating the distance between the discrete points according to the difference sum of the discrete points on the field.

In an exemplary embodiment of the present disclosure, constructing a discrete point tree structure according to distances between discrete points in the discrete point set includes:

configuring a father table entry;

configuring the discrete points corresponding to the parent table items in the discrete point set as root nodes, and taking other discrete points except the root nodes as child nodes;

and constructing a discrete point tree structure according to the distance between the child node and the root node.

In an exemplary embodiment of the present disclosure, constructing a discrete point tree structure according to distances between discrete points in the discrete point set further includes:

judging whether the number of the discrete point sets is larger than the number of preset sets or not; the number of the preset sets is determined according to the number of the table entries;

and if the number of the discrete point sets is less than the preset set number, constructing a discrete point tree structure according to the distance between the discrete points in the discrete point sets.

In an exemplary embodiment of the present disclosure, after the retaining the difference field, the data compression method further includes:

and packaging and storing the difference field in a RecordManager mode.

According to an aspect of the present disclosure, there is provided a data compression apparatus including:

the abstract processing module is used for carrying out abstract processing on the field in the original data table and the table entry in the field to obtain a dimensional space corresponding to the field and a discrete point corresponding to the table entry;

the dividing module is used for dividing the discrete points to obtain a discrete point set according to the dimension space to which the discrete points belong and the distance between the discrete points;

the construction module is used for constructing a discrete point tree structure according to the distance between the discrete points in the discrete point set;

and the deleting module is used for deleting the same fields between the discrete points corresponding to the root node and the discrete points corresponding to the child nodes in the tree structure and reserving the difference fields.

According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a data compression method as described in any one of the above.

According to an aspect of the present disclosure, there is provided an electronic device including:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform any of the data compression methods described above via execution of the executable instructions.

The embodiment of the invention relates to a data compression method and a device, wherein the fields and the table entries in the fields are abstracted to obtain a dimensional space and discrete points; the discrete points are then divided to obtain discrete point sets, and a discrete point tree structure is constructed; finally, the identical fields in the tree structure are deleted and the difference fields are retained. First, deleting the identical fields between the discrete points corresponding to the root node and the discrete points corresponding to the child nodes, while retaining the difference fields, solves the prior-art problem of the terminal device being overloaded by excessive redundant data in each table entry: the amount of redundant data is reduced, and so is the burden on the terminal device. Second, because the tree structure is constructed according to the distances between the discrete points in each discrete point set before the identical fields are deleted, the complete data of any table entry in the tree can be obtained directly through the relationship between the child node and its parent node when that entry needs to be accessed again; the amount of redundant data is thus reduced without reducing the accuracy of the table data, which improves the accuracy of table data compression. Third, the fields in the original data table and the table entries in the fields are abstracted to obtain a dimensional space corresponding to the fields and discrete points corresponding to the table entries, and the discrete points are then divided into discrete point sets according to the dimensional space to which they belong and the distances between them, which improves the accuracy of constructing the discrete point sets and, in turn, the accuracy of data compression.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

Figure 1 schematically shows an illustration of a representation of raw data.

Fig. 2 schematically shows a flow chart of a method of data compression.

Fig. 3 schematically shows an exemplary diagram of discrete points.

Fig. 4 schematically shows an example diagram of a set of discrete points.

Fig. 5 schematically shows an example of a tree structure.

FIG. 6 schematically shows an example graph of compressed data.

Fig. 7 schematically shows an exemplary diagram of a RecordManager.

Fig. 8 schematically shows a flow chart of another data compression method.

Fig. 9 schematically shows another raw data representation diagram.

FIG. 10 schematically illustrates an example diagram of a detailed memory footprint prior to data compression.

FIG. 11 schematically illustrates an example diagram of a detailed memory footprint after data compression.

FIG. 12 schematically illustrates an example diagram of overall memory footprint prior to data compression.

FIG. 13 schematically illustrates an example graph of overall memory footprint after data compression.

Fig. 14 schematically shows a block diagram of a data compression apparatus.

Fig. 15 schematically shows an electronic device for implementing the above-described data compression method.

Fig. 16 schematically illustrates a computer-readable storage medium for implementing the above-described data compression method.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the invention.

Furthermore, the drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

In the present exemplary embodiment, a data compression method is first provided, for example, the method may be executed in a server, a server cluster, a cloud server, or the like, or may be executed in an equipment terminal; of course, those skilled in the art may also operate the method of the present invention on other platforms as needed, and this is not particularly limited in this exemplary embodiment. Referring to fig. 2, the data compression method may include the steps of:

Step S210, performing abstract processing on the fields in the original data table and the table entries in the fields to obtain a dimensional space corresponding to the fields and discrete points corresponding to the table entries.

Step S220, dividing the discrete points to obtain discrete point sets according to the dimensional space to which the discrete points belong and the distances between the discrete points.

Step S230, constructing a discrete point tree structure according to the distances between the discrete points in each discrete point set.

Step S240, deleting the identical fields between the discrete points corresponding to the root node and the discrete points corresponding to the child nodes in the tree structure, and retaining the difference fields.

The data compression method has the following effects. First, deleting the identical fields between the discrete points corresponding to the root node and the discrete points corresponding to the child nodes in the tree structure, while retaining the difference fields, solves the prior-art problem of the terminal device being overloaded by excessive redundant data in each table entry: the amount of redundant data is reduced, and so is the burden on the terminal device. Second, because the tree structure is constructed according to the distances between the discrete points in each discrete point set before the identical fields are deleted, the complete data of any table entry in the tree can be obtained directly through the relationship between the child node and its parent node when that entry needs to be accessed again; the amount of redundant data is thus reduced without reducing the accuracy of the table data, which improves the accuracy of table data compression. Third, the fields in the original data table and the table entries in the fields are abstracted to obtain a dimensional space corresponding to the fields and discrete points corresponding to the table entries, and the discrete points are then divided into discrete point sets according to the dimensional space to which they belong and the distances between them, which improves the accuracy of constructing the discrete point sets and, in turn, the accuracy of data compression.

Hereinafter, the above-described data compression method in the present exemplary embodiment will be explained in detail with reference to the drawings.

In step S210, abstract processing is performed on a field in the original data table and an entry in the field, so as to obtain a dimensional space corresponding to the field and a discrete point corresponding to the entry.

In the present exemplary embodiment, an original data table containing n entries with m fields each is abstracted into n discrete points in an m-dimensional space. For example, referring to fig. 1, a data table with 9 entries and 6 fields may be represented as the 9 discrete points shown in fig. 3. (For ease of understanding, the discrete points are projected from the m-dimensional space onto a two-dimensional space for display; in the real algorithm the distances between the discrete points are not known in advance, so the two-dimensional display serves only to convey the idea of the algorithm.) It should be added that, to facilitate subsequent reading of the compressed file, the entries and the discrete points are in one-to-one correspondence; that is, one discrete point corresponds to one table entry.
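As a rough illustration of step S210, the sketch below treats each table entry as one point whose coordinates are its field values. The field names, the sample rows and the helper name to_points are hypothetical and are not taken from fig. 1.

```python
# Hypothetical table: field names and rows are made up for illustration.
FIELDS = ("attrA", "attrB", "attrC", "attrD", "attrE")

TABLE = {
    1: {"attrA": "FFF", "attrB": "DDD", "attrC": 9, "attrD": 3.0, "attrE": None},
    2: {"attrA": "FFF", "attrB": "DDD", "attrC": 9, "attrD": 5.0, "attrE": True},
}


def to_points(table, fields):
    """Map each entry id to a point: a tuple with one coordinate per field."""
    return {
        entry_id: tuple(row.get(name) for name in fields)
        for entry_id, row in table.items()
    }


# n entries with m fields become n points in an m-dimensional space,
# and each point keeps the id of the entry it came from.
points = to_points(TABLE, FIELDS)
```

Keying the points by entry id preserves the one-to-one correspondence between entries and discrete points that the later steps rely on.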

In step S220, the discrete points are divided into discrete point sets according to the dimension space to which the discrete points belong and the distance between the discrete points.

In the present exemplary embodiment, the step of dividing the discrete points into discrete point sets according to the dimension space to which the discrete points belong and the distance between the discrete points may specifically include the step S2202 and the step S2204. Wherein:

in step S2202, the distance between the discrete points is calculated.

In the present example embodiment, calculating the distance between the discrete points may include calculating the distance according to the sum of the differences between the discrete points over the fields, that is, the number of fields in which the two entries differ. For example:

Continuing with FIG. 1, if entry 1 (discrete point 1) and entry 2 (discrete point 2) differ in 6 fields, the distance between the corresponding discrete point 1 and discrete point 2 is 6.
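A minimal sketch of this distance, assuming each discrete point is a tuple of field values in a fixed field order; the function name distance and the sample values are hypothetical.

```python
def distance(point_a, point_b):
    """Number of fields on which two entries differ."""
    if len(point_a) != len(point_b):
        raise ValueError("points must belong to the same dimensional space")
    return sum(1 for a, b in zip(point_a, point_b) if a != b)


# Two hypothetical entries that differ in their last two fields.
entry_1 = ("FFF", "DDD", 9, 3.0, None)
entry_2 = ("FFF", "DDD", 9, 5.0, True)
gap = distance(entry_1, entry_2)  # 2
```

One caveat of this sketch: Python treats 1, 1.0 and True as equal, so a stricter implementation might also compare the types of the two values.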

In step S2204, the discrete points that satisfy the maximum distance limiting principle are divided into the same set according to the dimension space to which the discrete points belong and the size of the distance, so as to obtain a plurality of discrete point sets.

In this exemplary embodiment, after the distances between the discrete points are obtained, the discrete points that satisfy the maximum distance limiting principle may be divided into the same set according to the dimensional space of the discrete points and the magnitudes of the distances, so as to obtain a plurality of discrete point sets; the discrete point sets may be as illustrated, for example, in fig. 4. Further, dividing the discrete points satisfying the maximum distance limiting principle into the same set may include: judging whether the distance between the discrete points is smaller than the maximum distance limit; if the distance between the discrete points is less than the maximum distance limit, dividing the discrete points into the same set. In detail:

The maximum distance limit principle (MAX_DIS_LIMIT) may also be referred to as difference control. Under this principle, two discrete points may belong to the same set only if the distance between them is smaller than the maximum distance limit; if the distance exceeds the limit, they cannot belong to the same set. After testing, the maximum distance limit may be set to min(15, m/2), where m is the number of fields of the original data table; that is, two entries may be considered similar if the number of fields in which they differ is less than min(15, m/2). Because the limit is set, the distance calculation can stop as soon as the distance reaches the limit: for example, if the real distance between two table entries is 100, the distance is recorded only as 15. This further reduces the complexity of the algorithm; from the pseudo code and the maximum distance limit, the worst-case time complexity of the algorithm is O(n²), where n is the number of entries. This approach is therefore a fast and efficient way of aggregating: the distances between all discrete points need not be known in advance, the number of sets after aggregation need not be specified in advance, and the influence of dirty data is avoided.
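The limit and the early stop can be sketched as follows. The function names are hypothetical, and the choice to return the limit itself once it is reached is an assumption based on the "recorded only as 15" example.

```python
def max_distance_limit(field_count):
    """Limit suggested in the text: min(15, m / 2) for a table with m fields."""
    return min(15, field_count / 2)


def capped_distance(point_a, point_b, limit):
    """Count differing fields, stopping as soon as the count reaches the limit."""
    differing = 0
    for a, b in zip(point_a, point_b):
        if a != b:
            differing += 1
            if differing >= limit:
                return limit  # the true distance may be larger; it is not needed
    return differing


def is_similar(point_a, point_b, limit):
    """Two entries may share a set only if their distance is below the limit."""
    return capped_distance(point_a, point_b, limit) < limit
```

With the cap in place, comparing two entries costs at most a number of field comparisons proportional to m, however different the entries are, and comparisons of very different entries usually stop early.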

In step S230, a discrete point tree structure is constructed according to the distance between the discrete points in the discrete point set.

In the present exemplary embodiment, constructing a discrete point tree structure according to the distance between the discrete points in the discrete point set may include steps S310 to S330. Wherein:

in step S310, a parent entry is configured.

In step S320, the discrete points corresponding to the parent table entry in the discrete point set are configured as root nodes, and the other discrete points except the root node are taken as child nodes.

In step S330, a discrete point tree structure is constructed according to the distance between the child node and the root node.

Next, steps S310 to S330 are explained. First, a parent table entry is configured; the parent table entry may be configured according to the differences between the field contents of the entries in the original data table. Then, in each discrete point set, the discrete point corresponding to the parent table entry is configured as the root node, and the other discrete points are taken as child nodes; each set has exactly one root node. Finally, a discrete point tree structure (the interdependence relation between the entries) is constructed according to the distance between each child node and the root node; the tree structure may be as shown in fig. 5. In this way, each set can be represented as a tree and all the nodes together as a forest, where the weight of each edge is the distance from a point to its parent node. Because the discrete points and the entries are in one-to-one correspondence, the entries of the table end up with the same tree-structured interdependence relation.

Furthermore, in order to avoid the problem caused by too many discrete point sets, the constructing the discrete point tree structure according to the distance between the discrete points in the discrete point sets may further include: judging whether the number of the discrete point sets is larger than the number of preset sets or not; the number of the preset sets is determined according to the number of the table entries; and if the number of the discrete point sets is less than the preset set number, constructing a discrete point tree structure according to the distance between the discrete points in the discrete point sets. In detail:

First, it is judged whether the number of discrete point sets is greater than the preset number of sets, where the preset number of sets may be set to n/2 and n is the number of table entries in the original data table. When the number of discrete point sets is smaller than the preset number, a discrete point tree structure can be constructed according to the distances between the discrete points in each discrete point set. When the number of discrete point sets is larger than the preset number, compression of the original data table is stopped; that is, if the number of sets exceeds half the number of entries during compression, the table is deemed unsuitable for compression, and the algorithm stops and uses the original storage form instead.

Further, each set is a tree whose root is the center point of the set. A newly added point is first attached, if possible, to the nearest point of height 2; if that fails, it is attached to the root. In this way the maximum height of the tree is limited to 3 while the weight of each edge in the tree is kept as small as possible. A reader might suggest creating a minimum spanning tree for each set after the aggregation step is completed, which would minimize the total weight of the tree. However, the height of a minimum spanning tree cannot be controlled, and the gain in compression is not large. First, when a piece of data is accessed for the first time, a record instance may need to be created recursively, and an uncontrolled height may make the recursion too deep and therefore inefficient. Second, a minimum spanning tree requires the distance between every pair of discrete points, whereas, to reduce the complexity of the algorithm, the distances between all the discrete points are deliberately not computed. Therefore, storing the data in a tree structure with a controlled maximum height reduces the difference fields between parent and child table entries as far as possible without affecting efficiency, and thus reduces the memory occupation.
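The stop condition can be written as a one-line check. The function name is hypothetical; the text does not say what happens when the number of sets is exactly n/2, and this sketch assumes compression continues in that case.

```python
def should_abandon_compression(set_count, entry_count):
    """True when the number of sets exceeds half the number of entries."""
    return set_count > entry_count / 2
```

For a table of 9 entries, for example, the check would abandon compression once a fifth set has to be created.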

In step S240, the same field between the discrete point corresponding to the root node and the discrete point corresponding to the child node in the tree structure is deleted, and the difference field is retained.

In the present exemplary embodiment, after the above tree structure is obtained, the identical fields between the discrete point corresponding to the root node and the discrete points corresponding to the child nodes are deleted, and the difference fields are retained. In each tree, a child node can depend entirely on its parent node, so every entry except the tree root needs to keep only the fields in which it differs from its parent entry. The complete field information is then constructed indirectly from the parent entry information together with the difference field information, rather than being stored in full, which eliminates the redundant data between entries. For example, after the data compression algorithm is completed, the data export structure in FIG. 3 changes to the form shown in FIG. 6. Further, after the difference fields are retained, the data compression method may further include: packaging and storing the difference fields in the form of a RecordManager. In detail:

Since the difference fields are stored in a very compact form without redundancy, this form does not present complete information and therefore cannot be accessed directly. In order not to affect the efficiency of access, the data may be packaged and stored as a RecordManager. Once the data is accessed through the RecordManager, real data is constructed only when an entry is accessed for the first time; later accesses are as fast as normal accesses, so the efficiency of data access in the game is not affected. Fig. 7 shows a simple implementation of the RecordManager. Its main job is as follows: if the record instance of the accessed entry does not exist, it creates and stores the record instance and deletes the corresponding source data; when the entry is accessed again, the record instance is obtained directly without going through the creation flow.
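The deletion of identical fields and the later reconstruction can be sketched as a pair of functions. The function names, field names and sample values are hypothetical, and tuples of field values stand in for table entries.

```python
def diff_against_parent(fields, parent_values, child_values):
    """Return (diff_keys, diff_values): only the fields where the child differs."""
    diff_keys, diff_values = [], []
    for name, parent_value, child_value in zip(fields, parent_values, child_values):
        if parent_value != child_value:
            diff_keys.append(name)
            diff_values.append(child_value)
    return tuple(diff_keys), tuple(diff_values)


def rebuild(fields, parent_values, diff_keys, diff_values):
    """Rebuild the child's complete values from the parent plus the differences."""
    overrides = dict(zip(diff_keys, diff_values))
    return tuple(
        overrides.get(name, value) for name, value in zip(fields, parent_values)
    )


FIELDS = ("attrA", "attrB", "attrC", "attrD", "attrE")
parent = ("FFF", "DDD", 9, 3.0, None)
child = ("FFF", "DDD", 9, 5.0, True)

diff_keys, diff_values = diff_against_parent(FIELDS, parent, child)
# diff_keys == ("attrD", "attrE"), diff_values == (5.0, True)
restored = rebuild(FIELDS, parent, diff_keys, diff_values)
```

Only two of the five values need to be stored for the child here, and rebuild recovers the full entry exactly, which is why the compression loses no information.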

Further, the point that needs attention in the RecordManager is the creation of a record instance. If the table entry is a root table entry, a record instance is created directly from the complete_values in the table entry; if not, a record instance with the same content as the parent table entry is created first, and the field values that differ from the parent table entry are then modified to construct the final record instance. For example, when table entry No. 3 is accessed for the first time, because its parent table entry is table entry No. 2, the complete content of table entry No. 2 is obtained first; the field values in which table entry No. 3 differs from table entry No. 2 are then modified to construct the record instance of table entry No. 3; finally, the source data of table entry No. 3 is deleted. If the record instance of table entry No. 2 has not yet been created when its complete content is requested, the record instance of table entry No. 2 is created first. After that, if table entry No. 2 or No. 3 is accessed again, it can be obtained directly because its record instance has already been created and saved. In this way, the record instance of each entry is created only once, and every entry other than a root entry incurs only one additional copy and modification of its parent entry's content, which hardly affects access efficiency.

Furthermore, in fig. 6 the Record part is unchanged and the intern part is likewise generated automatically; neither is important, and attention should focus on the format of the data part. The data part now falls into two categories:

One category covers all tree root entries (entries No. 1, 2, 5 in fig. 7), each expressed as id: ("ROOT", complete_values), where complete_values is the complete set of field values of the entry. For example, the content of entry 1 is ("ROOT", ("FFF", "DDD", 9, 3.0, None)), which means: entry 1 is a tree root entry whose complete field information is ("FFF", "DDD", 9, 3.0, None).

The other category covers all non-root entries (entries No. 3, 4, 6, 7, 8, 9 in FIG. 7), each expressed as id: (fast_id, diff_keys, diff_values), where fast_id identifies the parent entry of the entry, and diff_keys and diff_values are, respectively, the set of field names and the set of field values in which the entry differs from its parent entry. For example, the content of entry 6 is (5, ("attrD", "attrE"), (5.0, True)), which means: the parent table entry of entry 6 is entry 5, and entry 6 differs from entry 5 in that its values in the "attrD" and "attrE" fields are 5.0 and True, respectively. It should be noted that the weight of each edge in the tree is the number of fields that the corresponding non-root entry must store explicitly.
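Combining the two entry formats with the lazy creation described for the RecordManager gives roughly the following sketch. It is not the implementation of fig. 7: plain dicts stand in for record instances, and the field names and sample data are hypothetical.

```python
class RecordManager:
    """Lazily builds full records from compressed data and caches them."""

    def __init__(self, fields, data):
        self._fields = fields
        self._data = dict(data)  # compressed source data, consumed on first access
        self._records = {}       # cache of fully built record instances

    def __getitem__(self, entry_id):
        record = self._records.get(entry_id)
        if record is None:
            record = self._create(entry_id)
        return record

    def _create(self, entry_id):
        source = self._data.pop(entry_id)  # source data is deleted once used
        if source[0] == "ROOT":
            # Root entry: the complete field values are stored directly.
            record = dict(zip(self._fields, source[1]))
        else:
            # Non-root entry: copy the parent's record, then apply the differences.
            parent_id, diff_keys, diff_values = source
            record = dict(self[parent_id])
            record.update(zip(diff_keys, diff_values))
        self._records[entry_id] = record
        return record


FIELDS = ("attrA", "attrB", "attrC", "attrD", "attrE")
DATA = {
    1: ("ROOT", ("FFF", "DDD", 9, 3.0, None)),
    2: (1, ("attrD", "attrE"), (5.0, True)),
    3: (2, ("attrC",), (7,)),
}
manager = RecordManager(FIELDS, DATA)
record_3 = manager[3]  # first access builds entries 1, 2 and 3 along the way
```

Because the tree height is limited to 3, the recursion in _create goes at most two levels up before it reaches a root entry.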

Fig. 8 schematically shows a flow chart of a data compression method based on difference control. Referring to fig. 8, the data compression method based on difference control may include the following steps:

Step S801, construct a first set s_1 and take the first table entry d_1 as the central table entry c_1 of this set, which completes the processing of the first table entry; for the second table entry d_2 through the last table entry d_n, steps S802-S806 are repeated.

Step S802, select in order one table entry that has not yet been processed, denoted d_i; calculate the distance from d_i to the central table entry of every existing set, find the set s_j whose central table entry c_j is nearest, and obtain the corresponding distance between d_i and c_j.

Step S803, judge whether the distance between d_i and c_j is less than the maximum distance limit (MAX_DIS_LIMIT); if so, go to step S804; otherwise, go to step S808.

Step S804, add d_i to s_j; among all table entries of height 2 in this set, find the table entry with the minimum distance to d_i, and obtain the corresponding distance.

Step S805, judge whether the distance between d_i and that height-2 table entry is less than the distance between d_i and c_j; if so, go to step S806; otherwise, go to step S807.

Step S806, set the parent table entry of d_i to that height-2 table entry, so that d_i is a table entry of height 3 in the set s_j, which completes the processing of d_i.

Step S807, set the parent table entry of d_i to the central table entry c_j of the set, so that d_i is a table entry of height 2 in the set s_j, which completes the processing of d_i.

Step S808, generate a new set s_next with d_i as the central table entry c_next of this set, which completes the processing of d_i.

The pseudo code is as follows:
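The sketch below is an illustrative Python rendering of steps S801 to S808, not the original listing. It assumes that the distance between two table entries is the number of fields whose values differ, and it uses a hypothetical MAX_DIS_LIMIT of 3; the function names and the sample entries are likewise assumptions.

```python
MAX_DIS_LIMIT = 3  # hypothetical threshold; the text does not state a value


def distance(a, b):
    """Assumed metric: the number of fields whose values differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def build_forest(entries, max_dis_limit=MAX_DIS_LIMIT):
    """entries maps id -> tuple of field values, in processing order.

    Returns a dict id -> parent id, where None marks a central (root) entry.
    """
    parent = {}
    centers = []   # central entry id of each set, in creation order
    height2 = {}   # center id -> ids of the height-2 entries in that set
    for i, d_i in entries.items():
        # S802: nearest central entry (none exists yet for the first entry, S801).
        best = min(centers, key=lambda c: distance(d_i, entries[c]), default=None)
        # S803 / S808: open a new set when no center is within the limit.
        if best is None or distance(d_i, entries[best]) >= max_dis_limit:
            centers.append(i)
            height2[i] = []
            parent[i] = None
            continue
        dis_center = distance(d_i, entries[best])
        # S804: nearest height-2 entry of the chosen set, if any.
        near = min(height2[best], key=lambda e: distance(d_i, entries[e]), default=None)
        # S805 / S806: attach below that entry when it is strictly closer.
        if near is not None and distance(d_i, entries[near]) < dis_center:
            parent[i] = near             # d_i becomes a height-3 entry
        else:
            parent[i] = best             # S807: d_i becomes a height-2 entry
            height2[best].append(i)
    return parent


# Hypothetical three-field table.
entries = {
    1: ("a", "b", "c"),
    2: ("a", "b", "d"),
    3: ("a", "x", "d"),
    4: ("p", "q", "r"),
}
forest = build_forest(entries)
```

With these sample entries, entry 1 becomes the center of the first set, entry 2 attaches to it at height 2, entry 3 attaches to entry 2 at height 3 because it is closer to entry 2 than to the center, and entry 4 opens a new set.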

After the data compression method based on difference control is applied, data access follows a pattern similar to "database + cache". The "database" is not a real database but the data compressed by the algorithm, stored in a compact and almost redundancy-free form; the "cache" is the set of record instances that are created and saved in the RecordManager and actually used for access. Because the client actually uses only a small number of table entries during one complete game session but accesses them many times, the "database + cache" pattern is very effective.

Each project has some tables that contain very little data. Such tables leave little room for compression and the effect of compressing them is not obvious, so they are left uncompressed.

Currently, the above data compression method based on difference control has been used in a number of mobile game projects.

In one mobile game project, counting only the compressed tables, the data memory is reduced to about 31% of its original size; counting all tables, including the uncompressed ones, the total data memory is reduced to about 37% of its original size, which is a considerable compression ratio.

In another mobile game project (see fig. 9 for its original data), part of the optimization work had already been done, so the compression ratio is slightly lower: counting only the compressed tables, the data memory is reduced to about 38% of its original size. The detailed memory usage before and after data compression is shown in fig. 10 and fig. 11, respectively. Counting all tables, including the uncompressed ones, the total data memory is reduced to about 59% of its original size, which is still considerable. The total memory occupancy before and after data compression is shown in fig. 12 and fig. 13, respectively.

In a third mobile game project, which had also been optimized to some extent, preliminary statistics show that the total data memory is still reduced to about 51% of its original size after compression by the algorithm.

The present disclosure also provides a data compression apparatus. Referring to fig. 14, the data compression apparatus may include an abstraction processing module 1410, a division module 1420, a construction module 1430, and a deletion module 1440. Wherein:

the abstraction processing module 1410 may be configured to perform abstraction processing on a field in an original data table and an entry in the field, so as to obtain a dimensional space corresponding to the field and a discrete point corresponding to the entry;

the dividing module 1420 may be configured to divide the discrete points to obtain a discrete point set according to a dimension space to which the discrete points belong and a distance between the discrete points;

the building module 1430 may be configured to build a discrete point tree structure according to the distance between the discrete points in the discrete point set;

the deleting module 1440 may be configured to delete the same field between the discrete point corresponding to the root node and the discrete point corresponding to the child node in the tree structure, and keep the difference field.

In an example embodiment of the present disclosure, dividing the discrete points into discrete point sets according to a dimension space to which the discrete points belong and a distance between the discrete points includes:

calculating distances between the discrete points;

In an example embodiment of the present disclosure, the dividing the discrete points satisfying the maximum distance limiting principle into the same set includes:

In an example embodiment of the present disclosure, calculating the distance between the discrete points comprises:

In an example embodiment of the present disclosure, constructing a discrete point tree structure according to distances between discrete points in the set of discrete points includes:

configuring a father table entry;

In an example embodiment of the present disclosure, constructing a discrete point tree structure according to distances between discrete points in the discrete point set further includes:

In an example embodiment of the present disclosure, the data compression apparatus may further include:

an encapsulation storage module, which may be configured to encapsulate and store the difference fields in a RecordManager manner.

The specific details of each module in the data compression apparatus have been described in detail in the corresponding data compression method, and therefore are not described herein again.

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.

Moreover, although the steps of the methods of the present invention are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, a mobile terminal, or a network device, etc.) execute the method according to the embodiment of the present invention.

In an exemplary embodiment of the present invention, there is also provided an electronic device capable of implementing the above method.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module," or "system."

An electronic device 1500 according to this embodiment of the invention is described below with reference to fig. 15. The electronic device 1500 shown in fig. 15 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 15, electronic device 1500 is in the form of a general purpose computing device. Components of electronic device 1500 may include, but are not limited to: the at least one processing unit 1510, the at least one memory unit 1520, and the bus 1530 that connects the various system components (including the memory unit 1520 and the processing unit 1510).

The memory unit stores program code that is executable by the processing unit 1510, causing the processing unit 1510 to perform the steps according to various exemplary embodiments of the present invention described in the "exemplary methods" section above. For example, the processing unit 1510 may perform the steps shown in fig. 2: step S210, abstracting fields in an original data table and table entries in the fields to obtain a dimensional space corresponding to the fields and discrete points corresponding to the table entries; step S220, dividing the discrete points to obtain a discrete point set according to the dimension space to which the discrete points belong and the distance between the discrete points; step S230, constructing a discrete point tree structure according to the distance between the discrete points in the discrete point set; step S240, deleting the same fields between the discrete points corresponding to the root node and the discrete points corresponding to the child nodes in the tree structure, and reserving the difference fields.

The storage unit 1520 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 15201 and/or a cache memory unit 15202, and may further include a read only memory unit (ROM) 15203.

Storage unit 1520 may also include a program/utility 15204 having a set (at least one) of program modules 15205, such program modules 15205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Bus 1530 may be any bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 1500 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1500, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 1500 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interface 1550. Also, the electronic device 1500 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 1560. As shown, the network adapter 1560 communicates with the other modules of the electronic device 1500 over the bus 1530. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1500, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiment of the present invention.

In an exemplary embodiment of the present invention, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.

Referring to fig. 16, a program product 1600 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.

Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims

1. A method of data compression, comprising:

deleting the same fields between the discrete points corresponding to the root node and the discrete points corresponding to the child nodes in the tree structure, and reserving the difference fields;

the dividing the discrete points into discrete point sets according to the dimension space to which the discrete points belong and the distance between the discrete points comprises:

calculating distances between the discrete points;

dividing the discrete points meeting the maximum distance limiting principle into the same set according to the dimension space to which the discrete points belong and the distance to obtain a plurality of discrete point sets;

constructing a discrete point tree structure according to the distance between the discrete points in the discrete point set comprises the following steps:

configuring a father table entry;

2. The data compression method of claim 1, wherein dividing the discrete points satisfying the maximum distance constraint criterion into the same set comprises:

3. The data compression method of claim 1, wherein calculating the distance between the discrete points comprises:

4. The data compression method of claim 1, wherein constructing a discrete point tree structure according to distances between discrete points in the set of discrete points further comprises:

5. The data compression method of any one of claims 1-4, wherein after retaining the difference field, the data compression method further comprises:

and packaging and storing the difference field in a RecordManager mode.

6. A data compression apparatus, comprising:

a deleting module, configured to delete the same field between the discrete point corresponding to the root node and the discrete point corresponding to the child node in the tree structure, and keep the difference field;

calculating distances between the discrete points;

configuring a father table entry;

7. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the data compression method of any one of claims 1 to 5.

8. An electronic device, comprising:

a processor; and

a memory for storing executable instructions of the processor;

wherein the processor is configured to perform the data compression method of any one of claims 1-5 via execution of the executable instructions.