CA2871435C - Method and device for compressing and storing data based on sparse matrix - Google Patents

Method and device for compressing and storing data based on sparse matrix

Info

Publication number
CA2871435C
Authority
CA
Canada
Prior art keywords
data
attribute
storage file
stored
identifiers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CA2871435A
Other languages
French (fr)
Other versions
CA2871435A1 (en)
Inventor
Daoxin Liu
Hanghai Hu
Jian Zhang
Xiumin Xu
Qiwei Zhang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Beijing Guodiantong Network Technology Co Ltd
Original Assignee
State Grid Corp of China SGCC
Beijing China Power Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Beijing China Power Information Technology Co Ltd
Publication of CA2871435A1
Application granted
Publication of CA2871435C
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and device for compressing and storing data based on a sparse matrix are disclosed. According to the method, the attribute data and the value data of each first data are stored separately: the attribute data of the first data and the attribute identifiers corresponding to the attribute data are stored in a first storage file, and the attribute identifiers corresponding to the attribute data are set as dimension data for identifying the first data. Thus the second storage file stores only the dimension data and the single value data. In the conventional art, five storage domains are used to store the sparse matrix, where real data values are stored in a data value domain. Since many of the data values of the nodes are the same, the same data are stored repeatedly and storage space is wasted.

Description

METHOD AND DEVICE FOR COMPRESSING AND STORING DATA BASED
ON SPARSE MATRIX
[0001]
FIELD
[0002] The present application relates to computer data storage techniques, and in particular to a method and device for compressing and storing data based on a sparse matrix.
BACKGROUND
[0003] In recent years, informatization of enterprises and the like has matured, and the business data produced by various business applications has grown explosively. An examination of this mass data shows that its structure matches the characteristics of a sparse matrix, in which most data elements are 0. If the massive number of 0 elements must be stored in order to store the sparse matrix, storage space is wasted. Accordingly, a technique for compressing and storing the sparse matrix is required.
[0004] Currently, one way of compressing and storing the massive data of a sparse matrix is the orthogonal list. In this method, each nonzero element of the sparse matrix is stored in a node, and five storage domains must be established in the node, in which a row value, a column value, a data value, a row pointer and a column pointer are stored respectively. Specifically, the row value or the column value indicates the row position or the column position of the node in the sparse matrix, while the row pointer or the column pointer points to the next nonzero element in the current row or in the current column. In addition, header nodes must be established for each row and each column, in which pointers to the first node of the current row or column are stored.
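For illustration only, the five storage domains of an orthogonal-list node can be sketched in Python as follows. The names `OLNode` and `build_orthogonal_list` are hypothetical and do not come from the application, and the header nodes are simplified here to plain lists holding the first node of each row and column.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class OLNode:
    """One nonzero element, holding the five storage domains."""
    row: int                          # row value
    col: int                          # column value
    value: Any                        # data value
    right: Optional["OLNode"] = None  # row pointer: next nonzero in the same row
    down: Optional["OLNode"] = None   # column pointer: next nonzero in the same column


def build_orthogonal_list(n_rows, n_cols, entries):
    """Link (row, col, value) triples into row and column chains."""
    row_heads = [None] * n_rows   # simplified header nodes for rows
    col_heads = [None] * n_cols   # simplified header nodes for columns
    row_tails = [None] * n_rows
    col_tails = [None] * n_cols
    # Row-major order keeps every row chain and column chain sorted.
    for r, c, v in sorted(entries, key=lambda e: (e[0], e[1])):
        node = OLNode(r, c, v)
        if row_tails[r] is None:
            row_heads[r] = node
        else:
            row_tails[r].right = node
        row_tails[r] = node
        if col_tails[c] is None:
            col_heads[c] = node
        else:
            col_tails[c].down = node
        col_tails[c] = node
    return row_heads, col_heads


row_heads, col_heads = build_orthogonal_list(3, 3, [(2, 1, 7), (0, 0, 5), (0, 2, 5)])
```

Note that the value 5 is held in full by two separate nodes in this small example, which is the kind of repetition the application aims to avoid.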

[0005] In the above-described storing manner, each node contains five storage domains, and the real data values are stored in the data value domain. Since many of the data values in the nodes are the same, this storing manner wastes storage space.
SUMMARY
[0006] In view of this, a method and a device for compressing and storing data based on a sparse matrix are provided in the present application, to solve the following technical problem in the conventional method for compressing and storing sparse matrix data: each node contains five storage domains, the real data values are stored in a data value domain, and storage space is wasted since many of the data values of the nodes are the same. The technical solution according to the present application is as follows.
[0007] A method for compressing and storing data based on a sparse matrix, including:
[0008] receiving a data set containing a plurality of first data, wherein each first data contains a plurality of attribute data and one value data, each of the plurality of attribute data corresponds to a different attribute, and each first data has a plurality of attributes which are the same;
[0009] generating a first storage file in accordance with the attributes, wherein the attribute data contained in individual attributes and attribute identifiers corresponding to the attributes are contained in the first storage file;
[0010] determining an attribute identifier corresponding to each attribute data in each first data, and combining the attribute identifiers to generate dimension data of the first data;
[0011] determining the dimension data and the value data corresponding to each first data as a data tuple to be stored; and
[0012] generating a second storage file in accordance with the individual data tuples to be stored.
[0013] Preferably, in the above-described method, the generating a first storage file in accordance with the individual attributes may include:
[0014] determining the attribute data corresponding to each attribute;
[0015] determining an attribute element corresponding to the attribute in accordance with a type of the attribute; and
[0016] judging whether the attribute element is the same as the attribute data: if yes, generating an attribute identifier corresponding to the attribute data, and storing the attribute data and the corresponding attribute identifier in the generated first storage file;
[0017] or otherwise, generating an element identifier corresponding to the attribute element, and storing the attribute element and the corresponding element identifier in the generated first storage file.
[0018] Preferably, in the above-described method, the first storage file and the second storage file both may be data sheets.
[0019] Preferably, the above-described method may further include, after generating a second storage file in accordance with the individual data tuples to be stored:
[0020] receiving second data containing a plurality of attribute data and one value data, wherein attribute corresponding to the attribute data contained in the second data is the same as the attribute of the first data;
[0021] acquiring attribute identifiers corresponding to the individual attribute data, and combining the attribute identifiers to generate dimension data of the second data;
[0022] determining the dimension data and the value data corresponding to the second data as a data tuple to be stored; and
[0023] adding the data tuple to be stored into the second storage file.
[0024] Preferably, the above-described method may further include:
[0025] determining a classification query rule in accordance with the individual attribute data corresponding to the individual attributes in the first storage file;
[0026] searching for target dimension data corresponding to the query rule in the second storage file in accordance with the classification query rule; and
[0027] displaying the attribute data corresponding to the target dimension data and the value data corresponding to the target dimension data.
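A minimal sketch of this query flow is given below. It assumes that a classification query rule can be modelled as a mapping from attribute name to the required attribute data, which is an interpretation rather than something the application specifies. The sample contents mirror the power-consumption example used in the detailed description, and `FIRST_FILE`, `SECOND_FILE`, `ATTRIBUTE_ORDER` and `query` are hypothetical names.

```python
# First storage file: attribute data -> attribute identifier, per attribute.
FIRST_FILE = {
    "unit": {"unit A": "A", "unit B": "B", "unit C": "C"},
    "time": {"January, 2013": "T1", "February, 2013": "T2"},
    "index": {"power consumption": "Elec"},
}
# Second storage file: (dimension data, value data) tuples.
SECOND_FILE = [
    (("A", "T1", "Elec"), 1000),
    (("B", "T2", "Elec"), 2000),
    (("C", "T2", "Elec"), 3000),
]
ATTRIBUTE_ORDER = ("unit", "time", "index")  # assumed order inside the dimension data


def query(rule):
    """Return (attribute data, value) pairs whose dimension data match the rule.

    `rule` maps an attribute name to the attribute data it must equal;
    attributes not named in the rule match anything (an assumption).
    """
    # Translate the rule into identifiers using the first storage file.
    wanted = {name: FIRST_FILE[name][data] for name, data in rule.items()}
    # Reverse lookup tables: identifier -> attribute data, for display.
    decode = {
        name: {ident: data for data, ident in table.items()}
        for name, table in FIRST_FILE.items()
    }
    results = []
    for dimension, value in SECOND_FILE:
        ids = dict(zip(ATTRIBUTE_ORDER, dimension))
        if all(ids[name] == ident for name, ident in wanted.items()):
            attrs = tuple(decode[name][ids[name]] for name in ATTRIBUTE_ORDER)
            results.append((attrs, value))
    return results


february = query({"time": "February, 2013"})
```

Here `february` holds the two records for February, 2013, each decoded back to its attribute data for display.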
[0028] A device for compressing and storing data based on a sparse matrix is further provided according to the present application, including:
[0029] a data set receiving unit configured to receive a data set containing a plurality of first data, wherein each of the first data contains a plurality of attribute data and one value data, each of the plurality of attribute data corresponds to a different attribute, and each first data has a plurality of attributes which are the same;
[0030] a first storage file generating unit configured to generate a first storage file in accordance with the individual attributes, wherein the attribute data contained in the individual attributes and attribute identifiers corresponding to the individual attribute data are contained in the first storage file;
[0031] a dimension data generating unit configured to determine an attribute identifier corresponding to each attribute data in each first data, and to combine the attribute identifiers to generate dimension data of the first data;
[0032] a data tuple to be stored determining unit configured to determine the dimension data and the value data corresponding to each first data as a data tuple to be stored; and
[0033] a second storage file generating unit configured to generate a second storage file in accordance with the individual data tuples to be stored.
[0034] Preferably, in the above-described device, the first storage file generating unit may include:
[0035] an attribute data determining sub-unit configured to determine attribute data corresponding to each attribute;
[0036] an attribute element determining sub-unit configured to determine attribute elements corresponding to individual attributes in accordance with a type of the attributes;
[0037] a judging sub-unit configured to judge whether the attribute element is the same as the attribute data; if yes, trigger a first result sub-unit; or otherwise, trigger a second result sub-unit;
[0038] the first result sub-unit configured to generate the attribute identifier corresponding to the attribute data, and to store the attribute data and the corresponding attribute identifier in the generated first storage file; and
[0039] the second result sub-unit configured to generate an element identifier corresponding to the attribute element, and to store the attribute element and the corresponding element identifier in the generated first storage file.
[0040] Preferably, in the above-described device, the first storage file generated by the first storage file generating unit and the second storage file generated by the second storage file generating unit both are data sheets.
[0041] Preferably, the above-described device may further include:
[0042] a second data receiving unit configured to receive a second data containing a plurality of attribute data and one value data, wherein attribute corresponding to the attribute data contained in the second data is the same as the attribute of the first data;
[0043] a second data dimension data generating unit configured to acquire individual attribute identifiers corresponding to the individual attribute data, and to combine the attribute identifiers to generate dimension data of the second data;
[0044] a second data tuple to be stored determining unit configured to determine the dimension data and the value data corresponding to the second data as a data tuple to be stored; and
[0045] a second data adding unit configured to add the data tuple to be stored into the second storage file.
[0046] Preferably, the above-described device may further include:
[0047] a rule determining unit configured to determine a classification query rule in accordance with the attribute data corresponding to the individual attributes in the first storage file;
[0048] a data searching unit configured to search for target dimension data corresponding to the query rule in the second storage file in accordance with the classification query rule; and
[0049] a data displaying unit configured to display attribute data corresponding to the target dimension data and value data corresponding to the target dimension data.
[0050] It can be seen from the above technical solution that, compared with the conventional art, a method and a device for compressing and storing data based on a sparse matrix are provided by the present application. According to the method, the attribute data and the value data of the first data are stored separately; that is, the attribute data of the first data and the attribute identifiers corresponding to the attribute data are stored in the first storage file, and the value data of the first data and the combination of the attribute identifiers corresponding to the individual attribute data of the first data are stored in the second storage file. In the conventional art, five storage domains are used to store the sparse matrix data, where real data values are stored in a data value domain. Since many of the data values of the nodes are the same, the same data are stored repeatedly, and storage space is wasted. According to the present application, the attribute data of the first data are stored separately, and the attribute identifiers corresponding to the attribute data are used as the dimension data of the first data. The degree of data sharing is thereby improved, and storage space is effectively saved.
BRIEF DESCRIPTION
[0051] The accompanying drawings to be used in the description of the embodiments will be described briefly as follows, so that the technical solutions according to the embodiments of the present application will become clearer. It is obvious that the accompanying drawings in the following description are only some embodiments of the present application. For those skilled in the art, other accompanying drawings may be obtained according to these accompanying drawings without any creative work.
[0052] Figure 1 is a flowchart of an embodiment of a method for compressing and storing data based on a sparse matrix according to the present application;
[0053] Figure 2 is a part of a flowchart of another embodiment of a method for compressing and storing data based on a sparse matrix according to the present application;
[0054] Figure 3 is an exemplary diagram of an embodiment according to the present application;
[0055] Figure 4 is a part of a flowchart of yet another embodiment of a method for compressing and storing data based on a sparse matrix according to the present application;
[0056] Figure 5 is a part of a flowchart of a further embodiment of a method for compressing and storing data based on a sparse matrix according to the present application;
[0057] Figure 6 is a structural schematic diagram of an embodiment of a device for compressing and storing data based on a sparse matrix according to the present application; and
[0058] Figure 7 is a structural schematic diagram of another embodiment of a device for compressing and storing data based on a sparse matrix according to the present application.
DETAILED DESCRIPTION
[0059] The technical solution according to the embodiments of the present application will be described clearly and completely as follows in conjunction with the accompanying drawings in the embodiments of the present application. It is obvious that the described embodiments are only a part of the embodiments according to the present application. All the other embodiments obtained by those skilled in the art based on the embodiments in the present application without any creative work belong to the scope of the present application.
[0060] Referring to Figure 1, which shows a flowchart of an embodiment of a method for compressing and storing data based on a sparse matrix according to the present application, the present embodiment may include steps 101 to 105 as follows.
[0061] In step 101, a data set containing a plurality of first data is received, wherein each first data may contain a plurality of attribute data and one value data, the plurality of attribute data may correspond to different attributes respectively, and each first data has the same plurality of attributes.
[0062] Specifically, the data set containing the plurality of first data may be considered as a sparse matrix, where the plurality of first data contained in the data set are element data in the sparse matrix. It should be noted that a sparse matrix refers to a matrix in which most data elements are 0. The value data of the first data contained in the data set is not 0.
[0063] Additionally, the first data may be, but is not limited to, business data produced in an industrial field, and the first data contains a plurality of attribute data and one value data. For example, the first data is data of the power consumption of a unit, such as a company, during a certain period of time, wherein the power consumption of unit A in January, 2013 is 1000 kilowatt-hours, the power consumption of unit B in February, 2013 is 2000 kilowatt-hours, and the power consumption of unit C in February, 2013 is 3000 kilowatt-hours.
[0064] Three first data are contained in the above-described example, wherein unit A, January, 2013, and power consumption are the three attribute data of the first first data; unit B, February, 2013, and power consumption are the three attribute data of the second first data; and unit C, February, 2013, and power consumption are the three attribute data of the third first data. Moreover, 1000 kilowatt-hours is the value data of the first first data; 2000 kilowatt-hours is the value data of the second first data; and 3000 kilowatt-hours is the value data of the third first data.
[0065] In addition, the three attribute data of each first data in the above-described example respectively correspond to different attributes, wherein the attribute corresponding to unit A, unit B and unit C is a unit attribute; the attribute corresponding to January, 2013, February, 2013 and February, 2013 is a time attribute; and the attribute corresponding to the power consumption is an index attribute.
[0066] It should be noted that the plurality of attributes of each first data are the same. For example, the attributes of the first, the second, and the third first data in the above-described example are the same, that is, they each have the unit attribute, the time attribute and the index attribute.
[0067] In step 102, a first storage file is generated in accordance with the attributes as above, wherein the attribute data contained in individual attributes and attribute identifiers corresponding to the individual attributes are contained in the first storage file.
[0068] Specifically, it is necessary to generate the attribute identifiers corresponding to the individual attribute data and to establish a correspondence between the attribute data and the corresponding attribute identifiers. For example, the attributes of the three first data in the example of step 101 are respectively the unit attribute, the time attribute and the index attribute, wherein the attribute data contained in the first first data are respectively unit A, January, 2013 and the power consumption, and the attribute identifier generated for unit A is A, the attribute identifier generated for January, 2013 is T1, and the attribute identifier generated for the power consumption is Elec. Then unit A and the corresponding attribute identifier A, January, 2013 and the corresponding attribute identifier T1, and the power consumption and the corresponding attribute identifier Elec are saved in the first storage file.
[0069] It should be noted that same attribute data are not stored in the first storage file repeatedly, and the same attribute data are indicated by the same attribute identifier. For example, the attribute data of the second first data in the above-described example includes the unit B, February, 2013 and the power consumption, and the attribute data of the third first data includes the unit C, February, 2013 and the power consumption. The same attribute data of the two first data, that is February, 2013, and the power consumption, is indicated by the same attribute identifier T2, and the same attribute identifier Elec respectively.
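Step 102 can be sketched as follows, as one possible reading rather than the application's implementation. The function name `build_first_storage_file` and the identifier scheme (first letter of the attribute name plus a counter, giving U1, T1, I1 and so on) are assumptions made for illustration; the application itself uses identifiers such as A, T1 and Elec without specifying how they are generated.

```python
def build_first_storage_file(records, attributes):
    """Assign one identifier to each distinct attribute data, per attribute."""
    first_file = {name: {} for name in attributes}
    for record in records:
        for name in attributes:
            table = first_file[name]
            data = record[name]
            if data not in table:
                # The same attribute data is stored only once and always
                # maps to the same identifier (hypothetical naming scheme).
                table[data] = f"{name[0].upper()}{len(table) + 1}"
    return first_file


records = [
    {"unit": "unit A", "time": "January, 2013", "index": "power consumption", "value": 1000},
    {"unit": "unit B", "time": "February, 2013", "index": "power consumption", "value": 2000},
    {"unit": "unit C", "time": "February, 2013", "index": "power consumption", "value": 3000},
]
first_file = build_first_storage_file(records, ["unit", "time", "index"])
```

In this sketch, February, 2013 and the power consumption each appear once in `first_file` even though they occur in more than one record.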
[0070] In step 103, the attribute identifiers corresponding to the individual attribute data in each first data are determined, and the attribute identifiers are combined to generate dimension data of the first data.
[0071] Each first data is analyzed, and the individual attribute data contained in each first data are determined. According to the correspondence between the attribute data and the attribute identifiers stored in the first storage file in step 102, the attribute identifier corresponding to each attribute data is determined. The individual attribute identifiers are then combined to generate the dimension data of the first data. For example, the three attribute data contained in the first first data in the example of step 101 are unit A, January, 2013 and the power consumption, and in accordance with the correspondence generated in step 102, A, T1 and Elec are generated as the dimension data of the first first data.
[0072] It should be noted that the dimension data of each first data may be any permutation of the plurality of attribute identifiers corresponding to the first data. For example, the attribute identifiers of the first first data may also be combined in any one of the following orders: A, Elec, and T1; T1, Elec, and A; T1, A, and Elec; Elec, A, and T1; or Elec, T1, and A.
[0073] In step 104, the dimension data and the value data corresponding to each first data are determined as a data tuple to be stored.
[0074] Specifically, each first data contains a plurality of attribute data and one value data, and the plurality of attribute data may be indicated by the dimension data corresponding to the first data generated in step 103. Further, each first data may be indicated by a combination of the dimension data and the value data corresponding to the first data, and the combination of the dimension data and the value data may be set as the data tuple to be stored corresponding to the first data. For example, the dimension data of the first first data in the example of step 101 may be A, T1, and Elec, and the value data is 1000. Accordingly, this first data may be indicated as A, T1, Elec and 1000, which may be determined as the data tuple to be stored.
[0075] In step 105, a second storage file is generated in accordance with the individual data tuples to be stored.
[0076] It should be noted that the content stored in the second storage file is the data tuples to be stored corresponding to the individual first data in the data set.
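Steps 103 to 105 can be sketched as follows, assuming the first storage file already holds the identifiers from the running example. The names `FIRST_FILE`, `ATTRIBUTE_ORDER`, `to_tuple` and `build_second_storage_file` are hypothetical, and the fixed unit, time, index order is just one of the permitted permutations.

```python
# First storage file from the running example: attribute data -> identifier.
FIRST_FILE = {
    "unit": {"unit A": "A", "unit B": "B", "unit C": "C"},
    "time": {"January, 2013": "T1", "February, 2013": "T2"},
    "index": {"power consumption": "Elec"},
}
ATTRIBUTE_ORDER = ("unit", "time", "index")  # one permitted permutation


def to_tuple(record):
    """Steps 103 and 104: build (dimension data, value data) for one first data."""
    dimension = tuple(FIRST_FILE[name][record[name]] for name in ATTRIBUTE_ORDER)
    return dimension, record["value"]


def build_second_storage_file(records):
    """Step 105: the second storage file is the list of data tuples to be stored."""
    return [to_tuple(record) for record in records]


records = [
    {"unit": "unit A", "time": "January, 2013", "index": "power consumption", "value": 1000},
    {"unit": "unit B", "time": "February, 2013", "index": "power consumption", "value": 2000},
    {"unit": "unit C", "time": "February, 2013", "index": "power consumption", "value": 3000},
]
second_file = build_second_storage_file(records)
```

Each entry of `second_file` has only two parts, the dimension data and the value data, so the attribute data themselves are never repeated there.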
[0077] In the conventional art, five storage domains are used to store the sparse matrix, where real data values (the attribute data and the value data) are stored in a data value domain. Since many of the data values of each node are the same, which mainly refers to a plurality of same attribute data, the same data are stored repeatedly, so a waste of storage space is caused.
[0078] It can be seen from the above technical solution that a method and a device for compressing and storing data based on a sparse matrix are provided by the present embodiment, wherein the attribute data and the value data of the first data are stored separately according to the method. That is, the attribute data of the first data and the attribute identifier corresponding to the attribute data are stored in the first storage file, and the same attribute identifier is used for the same attribute data; thus repeated and redundant data are effectively removed, the degree of data sharing is improved, and storage space is saved. Only two storage domains are contained in the second storage file, where the combination of the attribute identifiers corresponding to the attribute data of the first data and the value data of the first data are stored respectively. Compared with the five storage domains used to store the sparse matrix in the conventional art, storage space is further saved.
[0079] It should be noted that the method for compressing and storing data based on the sparse matrix according to the present application may be applied to, but is not limited to, a sparse matrix; it is also applicable for storing a non-sparse matrix.
[0080] It should be noted that there may be one or more attribute items in the attribute data contained in each first data according to the above-described embodiment. For example, the attribute data of January, 2013 includes two attribute items, that is, 2013, which belongs to a year attribute item, and January, which belongs to a month attribute item. Correspondingly, attribute item identifiers corresponding to the attribute items need to be indicated in the attribute identifier corresponding to the attribute data. For example, if the attribute identifier corresponding to the attribute data of January, 2013 is T1, it may be considered that the T corresponds to the year attribute item and the 1 corresponds to the month attribute item.
[0081] It should be noted that the first storage file may be a data sheet containing a plurality of data units, where each data unit includes two parts: one part is used for storing the attribute data of each first data, and the other part is used for storing the attribute identifier corresponding to the attribute data. For example, one data unit includes two parts, wherein one part stores unit A, and the other part stores the attribute identifier A; another data unit includes two parts, wherein one part stores unit B, and the other part stores the attribute identifier B; yet another data unit includes two parts, wherein one part stores unit C, and the other part stores the attribute identifier C.
[0082] The second storage file may also be a data sheet including a plurality of data units. Each data unit includes two parts, wherein one part is used to store the dimension data of each first data, and the other part is used to store the value data corresponding to the first data. For example, the first first data in step 101 is stored in a data unit including two parts, wherein one part stores A, T1, and Elec, and the other part stores 1000.
[0083] The above-described embodiment may be used to compress and store a sparse matrix in which most of the data elements are 0. In order to restore and reproduce the sparse matrix, referring to Figure 2, which shows a part of a flowchart of another embodiment of the method for compressing and storing data based on the sparse matrix according to the present application, step 102 according to the above-described embodiment may be implemented using the following steps 201 to 205.
[0084] In step 201, the attribute data corresponding to each attribute is determined.
[0085] Specifically, each first data contained in the data set received in the above-described step 101 has the same plurality of attributes, and each first data contains the attribute data corresponding to the attributes. For example, each first data has a time attribute, and the attribute data corresponding to the time attribute contained in the individual first data are respectively January, 2013, February, 2013, March, 2013, April, 2013, May, 2013, June, 2013, July, 2013, August, 2013, September, 2013, October, 2013, and November, 2013.
[0086] In step 202, attribute elements corresponding to the attributes are determined in accordance with the type of the individual attributes.
[0087] Specifically, the type of the attribute includes a first type and a second type. The attribute elements contained in an attribute of the first type are fixed. For example, the attribute elements corresponding to the time attribute are January, 2013 to December, 2013, that is, January, 2013, February, 2013, ..., and December, 2013.
[0088] The attribute elements contained in an attribute of the second type are not fixed, and are related to the attribute data contained in the individual first data in the data set received in step 101. Specifically, the individual different attribute data are set as the attribute elements corresponding to the attribute. For example, the attributes of the second type include a unit attribute, and the different attribute data corresponding to the unit attribute contained in the data set received in step 101 are unit A, unit B and unit C; thus the attribute elements corresponding to the unit attribute are unit A, unit B and unit C.
[0089] In step 203, it is judged whether the attribute elements are the same as the attribute data; if the attribute elements are the same as the attribute data, step 204 is performed; else if the attribute elements are not the same as the attribute data, step 205 is performed.
[0090] Specifically, the attribute elements are the attribute elements determined in step 202, and the attribute data are the individual attribute data determined in step 201. The term "the same" means that both the numbers and the contents are the same.
[0091] For example, the attribute elements corresponding to the time attribute determined in step 202 are January, 2013 to December, 2013, but the attribute data corresponding to the time attribute determined in step 201 lacks December, 2013, thus determination result is no; the attribute elements corresponding to the unit attribute determined in step 202 are the same as the attribute data determined in step 201, that is, they are all unit A, unit B and unit C, thus determination result is yes.
[0092] In step 204, the attribute identifiers corresponding to the attribute data are generated, and the attribute data and the corresponding attribute identifiers are stored in the generated first storage file.
[0093] In step 205, the element identifiers corresponding to the attribute elements are generated, and the attribute elements and the corresponding element identifiers are stored in the generated first storage file.
[0094] For example, the element identifiers generated for the date elements are respectively T1, T2, T3, ..., and T12. If the judgment result in step 203 is no, it indicates that certain matrix data elements in the sparse matrix represented by the data set received in step 101 of the above-described embodiment are 0. For example, the power consumption of unit A in December, 2013 is 0, the power consumption of unit B in December, 2013 is 0, and the power consumption of unit C in December, 2013 is 0.
[0095] It can be seen from the above technical solution that, according to the present embodiment, a zero element in the sparse matrix is stored by storing the attribute element identifier. For example, please refer to Figure 3, which shows an exemplary diagram according to the present embodiment; each data unit contains two parts, where 301 is a dimension data part and 302 is a value data part.
Specifically, if 302 is a null value, it means that the corresponding matrix data in the sparse matrix is 0. Thus it is only required to store the attribute element and the corresponding element identifier in the first storage file, and it is not required to store the data unit in which 302 is a null value in the second storage file.
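Steps 203 to 205 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the present application: the function name `build_first_storage_file`, the dictionary used to stand for the first storage file, the month labels, and the identifier scheme (a prefix followed by a running index, with a hypothetical prefix "U" for units) are all introduced here for illustration only.

```python
def build_first_storage_file(attribute_data, attribute_elements, prefix):
    """Return {identifier: label} for one attribute (steps 203 to 205).

    attribute_data: distinct labels observed in the data set (step 201).
    attribute_elements: labels the attribute should contain (step 202).
    prefix: hypothetical identifier prefix, e.g. "T" for the time attribute.
    """
    # Step 203: "the same" means the same number and the same contents.
    same = (len(attribute_data) == len(attribute_elements)
            and set(attribute_data) == set(attribute_elements))
    # Step 204 stores the attribute data; step 205 stores the attribute elements.
    labels = attribute_data if same else attribute_elements
    return {f"{prefix}{i}": label for i, label in enumerate(labels, start=1)}


# Time attribute (first type): fixed elements, but December is not observed.
months = [f"2013-{m:02d}" for m in range(1, 13)]
observed_months = months[:-1]
time_ids = build_first_storage_file(observed_months, months, "T")

# Unit attribute (second type): elements are exactly the observed data.
units = ["unit A", "unit B", "unit C"]
unit_ids = build_first_storage_file(units, units, "U")
```

In this sketch the time attribute still receives the identifier T12 although no data for December was received, which is what allows the zero elements for that month to be represented without any entry in the second storage file.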
[0096] Specifically, X1, X2, X3, Y1, Y2, Y3 and Y4 are stored in the first storage file corresponding to the drawing. Naturally, it is also necessary to store the attribute data or the attribute elements corresponding to the individual identifiers. If it is necessary to present the null value element corresponding to X1Y3, it is only required to look up the attribute data or the attribute elements corresponding to X1 and Y3. For example, the attribute element corresponding to X1 is December, 2013, and the attribute data corresponding to Y3 is unit C. Thus it may be presented that the data value corresponding to unit C in December, 2013 is 0.
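The lookup of a null value element described above might be sketched as follows. The two dictionaries stand in for the first and second storage files; only the labels for X1 and Y3 come from the text, while the entry for X2, the entry for Y1, the stored value 1000.0 and the function name `present` are hypothetical placeholders.

```python
# First storage file: identifier -> attribute data or attribute element.
first_storage_file = {
    "X1": "December, 2013",   # given in the text
    "Y3": "unit C",           # given in the text
    "X2": "November, 2013",   # hypothetical placeholder
    "Y1": "unit A",           # hypothetical placeholder
}

# Second storage file: only non-null data units are stored.
second_storage_file = {
    ("X2", "Y1"): 1000.0,     # hypothetical stored value
}


def present(x_id, y_id):
    """Return (unit, month, value); an absent data unit is presented as 0."""
    value = second_storage_file.get((x_id, y_id), 0)
    return first_storage_file[y_id], first_storage_file[x_id], value


null_element = present("X1", "Y3")
stored_element = present("X2", "Y1")
```

Here `null_element` is presented with the value 0 even though no data unit for X1Y3 exists in the second storage file.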
[0097] When it is necessary to insert data into the data set, please refer to Figure 4, which shows a part of a flowchart of yet another embodiment of a method for compressing and storing data based on the sparse matrix according to the present application. After step 105 in the above-described embodiment, the method may further include the following steps 401 to 404.
[0098] In step 401, the second data containing a plurality of attribute data and one value data is received, wherein the attributes corresponding to the attribute data contained in the second data are the same as the attributes of the first data.
[0099] Specifically, the first data is the first data contained in the data set received in step 101. For example, the attribute corresponding to the second data contains a unit attribute, a time attribute and an index attribute, which are the same as the unit attribute, the time attribute and the index attribute corresponding to the first data.
[0100] In step 402, the attribute identifiers corresponding to the individual attribute data are acquired, and the attribute identifiers are combined to generate the dimension data of the second data.
[0101] The second data is analyzed, the individual attribute data contained in the second data are acquired, and the attribute identifiers corresponding to the individual attribute data are determined. For example, the second data is the power consumption of unit D in April, 2013, which is 3000 kilowatt-hours. The attribute data contained in the second data are unit D, April, 2013 and the power consumption, and the value data contained in the second data is 3000. The attribute identifiers corresponding to the attribute data are acquired, which are respectively D, T4 and Elec. The attribute identifiers are combined to generate the dimension data corresponding to the second data, such as D, T4, and Elec.
[0102] In step 403, the dimension data and the value data corresponding to the second data are determined as a data tuple to be stored.
[0103] In step 404, the data tuple to be stored are added to the second storage file.
[0104] Specifically, the data tuple to be stored may be added at any position in the second storage file, that is, it may be inserted before or after any data tuple already stored in the second storage file, or it may be appended directly to the end of the second storage file.
[0105] In the conventional art, an orthogonal list pattern is used to store the matrix data. If a certain data tuple is required to be inserted, it is necessary to search for the position corresponding to the row identifier and the column identifier of the data tuple and to insert the data tuple at that position, so the inserting position is fixed and unique.
However, the data tuple can be added and stored into any position in the second storage file according to the present embodiment.
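Steps 401 to 404 can be illustrated with the example of unit D above. This is only a sketch: the list and dictionary standing in for the two storage files, the label-to-identifier direction of the lookup, and the function name `add_second_data` are assumptions made for illustration.

```python
# Lookup from attribute data to attribute identifier (from the first storage file).
identifier_of = {
    "unit D": "D",
    "April, 2013": "T4",
    "power consumption": "Elec",
}

# Second storage file: a list of (dimension data, value data) tuples.
second_storage_file = [(("A", "T1", "Elec"), 1000)]


def add_second_data(attribute_data, value, position=None):
    """Build the data tuple for the second data and add it to the file."""
    # Step 402: acquire and combine the attribute identifiers.
    dimension = tuple(identifier_of[label] for label in attribute_data)
    # Step 403: dimension data plus value data form the data tuple to be stored.
    data_tuple = (dimension, value)
    # Step 404: any position is acceptable; the default is the end of the file.
    if position is None:
        second_storage_file.append(data_tuple)
    else:
        second_storage_file.insert(position, data_tuple)
    return data_tuple


added = add_second_data(["unit D", "April, 2013", "power consumption"], 3000)
```

Because each data tuple carries its own dimension data, its position in the list has no meaning, so no search for a row and column position is needed before the insertion.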
[0106] If it is necessary to search for certain matrix data among the individual data of the stored sparse matrix, please refer to Figure 5, which shows a part of a flow chart of a further embodiment of a method for compressing and storing data based on the sparse matrix according to the present application. After step 105 in the above embodiment, the method may further include the following steps 501 to 503.
[0107] In step 501, a classification query rule is determined in accordance with the attribute data corresponding to the individual attributes in the first storage file.
[0108] Specifically, each attribute data contained in the first storage file corresponds to one of the different attributes, such as a time attribute, a unit attribute and an index attribute. If it is necessary to query in accordance with a certain attribute in the first storage file, the classification query rule may be determined in accordance with the attribute data corresponding to the attribute. For example, the attribute data corresponding to the time attribute includes January, 2013, February, 2013, March, 2013, April, 2013, May, 2013 and the like. Accordingly, the power consumption of the individual units in the first quarter of 2013, that is, from January, 2013 to March, 2013, may be queried.
[0109] In step 502, target dimension data corresponding to the classification query rule is searched for in the second storage file in accordance with the classification query rule.
[0110] Specifically, in accordance with the classification query rule, the attribute identifiers contained in the classification query rule are determined, and by iterating through the dimension data of the individual data tuples stored in the second storage file, the dimension data containing the attribute identifiers is determined as the target dimension data. For example, the attribute identifiers corresponding to the classification query rule are T1, T2 and T3, and the determined target dimension data are: A, T1, Elec; B, T2, Elec; and B, T3, Elec.
[0111] In step 503, the attribute data corresponding to the target dimension data and the value data corresponding to the target dimension data are displayed.
[0112] The attribute data corresponding to the target dimension data is determined, the value data corresponding to the target dimension data is determined, and the attribute data and the value data are displayed. For example, in accordance with the example in step 502, the displayed content is: the power consumption of unit A in January, 2013 is 1000 kilowatt-hours, the power consumption of unit B in February, 2013 is 2000 kilowatt-hours, and the power consumption of unit B in March, 2013 is 2000 kilowatt-hours.
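The first-quarter query of steps 501 to 503 might look like the following sketch. The data values repeat the examples given in the text; the container types, the function name `query`, and the choice to return the results rather than display them are assumptions made for illustration.

```python
# First storage file: attribute identifier -> attribute data.
first_storage_file = {
    "A": "unit A", "B": "unit B", "D": "unit D",
    "T1": "January, 2013", "T2": "February, 2013",
    "T3": "March, 2013", "T4": "April, 2013",
    "Elec": "power consumption",
}

# Second storage file: (dimension data, value data) tuples.
second_storage_file = [
    (("A", "T1", "Elec"), 1000),
    (("B", "T2", "Elec"), 2000),
    (("B", "T3", "Elec"), 2000),
    (("D", "T4", "Elec"), 3000),
]


def query(rule_identifiers):
    """Return (attribute data, value) for every tuple matching the rule."""
    results = []
    # Step 502: iterate through the dimension data of every stored tuple.
    for dimension, value in second_storage_file:
        if rule_identifiers & set(dimension):
            # Step 503: map the identifiers back to the attribute data.
            labels = tuple(first_storage_file[i] for i in dimension)
            results.append((labels, value))
    return results


# Step 501: the rule "first quarter of 2013" becomes the identifiers T1 to T3.
first_quarter = query({"T1", "T2", "T3"})
```

The tuple for unit D in April, 2013 is skipped because its dimension data contains none of the identifiers in the rule.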
[0113] A device for compressing and storing data is provided by the present application corresponding to the above-described embodiment of the method. Please refer to Figure 6, which shows a structural schematic diagram of an embodiment of a device for compressing and storing data based on a sparse matrix according to the present application. The device for compressing and storing data may include: a data set receiving unit 601, a first storage file generating unit 602, a dimension data generating unit 603, a data tuple to be stored determining unit 604 and a second storage file generating unit 605.
[0114] The data set receiving unit 601 is configured to receive a data set containing a plurality of first data, wherein each first data contains a plurality of attribute data and one value data, the plurality of attribute data correspond to different attributes respectively, and each first data has the same plurality of attributes.
[0115] The first storage file generating unit 602 is configured to generate a first storage file in accordance with the individual attributes, wherein the attribute data contained in individual attributes and attribute identifiers corresponding to the individual attributes are contained in the first storage file.
[0116] The dimension data generating unit 603 is configured to determine the attribute identifier corresponding to each attribute data in each first data, and to combine the individual attribute identifiers to generate dimension data of the first data.
[0117] The data tuple to be stored determining unit 604 is configured to determine the dimension data and the value data corresponding to the first data as a data tuple to be stored.
[0118] The second storage file generating unit 605 is configured to generate a second storage file in accordance with the individual data tuples to be stored.
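The cooperation of units 601 to 605 can be sketched as two functions. This is a simplified illustration, not the device of the present application: the function names, the nested dictionary standing in for the first storage file, and the identifier scheme (the first letter of a hypothetical attribute name plus a running index) are assumptions.

```python
def generate_first_storage_file(data_set, attribute_names):
    """Role of unit 602: assign an identifier to each distinct attribute data."""
    first = {name: {} for name in attribute_names}
    for attribute_data, _value in data_set:
        for name, label in zip(attribute_names, attribute_data):
            # Hypothetical identifier scheme: initial letter plus running index.
            first[name].setdefault(label, f"{name[0].upper()}{len(first[name]) + 1}")
    return first


def generate_second_storage_file(data_set, attribute_names, first):
    """Roles of units 603 to 605: build and collect the data tuples."""
    second = []
    for attribute_data, value in data_set:
        # Unit 603: combine the attribute identifiers into dimension data.
        dimension = tuple(first[name][label]
                          for name, label in zip(attribute_names, attribute_data))
        # Unit 604: dimension data plus value data form the tuple to be stored.
        second.append((dimension, value))
    return second


# Role of unit 601: the received data set of first data.
names = ("unit", "time", "index")
data_set = [
    (("unit A", "January, 2013", "power consumption"), 1000),
    (("unit B", "February, 2013", "power consumption"), 2000),
]
first_file = generate_first_storage_file(data_set, names)
second_file = generate_second_storage_file(data_set, names, first_file)
```

Each attribute label is stored once in the first storage file, and the second storage file holds only short identifiers and values.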
[0119] Optionally, please refer to Figure 7, which shows a structural schematic diagram of another embodiment of a device for compressing and storing data based on a sparse matrix according to the present application. In the above-described embodiment of the device, the first storage file generating unit 602 may include: an attribute data determining sub-unit 701, an attribute element determining sub-unit 702, a judging sub-unit 703, a first result sub-unit 704 and a second result sub-unit 705.
[0120] The attribute data determining sub-unit 701 is configured to determine the attribute data corresponding to each attribute.
[0121] The attribute element determining sub-unit 702 is configured to determine the attribute elements corresponding to the attribute in accordance with a type of the individual attributes.
[0122] The judging sub-unit 703 is configured to judge whether the attribute elements are the same as the attribute data; if yes, to trigger the first result sub-unit 704; otherwise, to trigger the second result sub-unit 705.
[0123] The first result sub-unit 704 is configured to generate the attribute identifiers corresponding to the attribute data, and to store the attribute data and the corresponding attribute identifiers in the generated first storage file.
[0124] The second result sub-unit 705 is configured to generate element identifiers corresponding to the attribute elements, and to store the attribute elements and the corresponding element identifiers in the generated first storage file.
[0125] For the explanation for the embodiment of the device, reference can be made to the above-described embodiment of the method, which will not be repeated herein.
[0126] It should be noted that the first storage file generated by the first storage file generating unit 602 and the second storage file generated by the second storage file generating unit 605 both are data sheets.
[0127] Optionally, on a basis of the above-described embodiment of the device, the device for compressing and storing data may further include: a second data receiving unit, a second data dimension data generating unit, a second data tuple to be stored determining unit and a second data adding unit.
[0128] The second data receiving unit is configured to receive second data containing a plurality of attribute data and one value data, wherein the attributes corresponding to the attribute data contained in the second data are the same as the attributes of the first data.
[0129] The second data dimension data generating unit is configured to acquire the attribute identifiers corresponding to the attribute data, and to combine the attribute identifiers to generate dimension data of the second data.
[0130] The second data tuple to be stored determining unit is configured to determine the dimension data and the value data corresponding to the second data as a data tuple to be stored.
[0131] The second data adding unit is configured to add the data tuple to be stored into the second storage file.
[0132] For the explanation for the embodiment of the device, reference can be made to the above embodiment of the device, which will not be repeated herein.
[0133] Optionally, on a basis of the above-described embodiment of the device, the device for compressing and storing data may further include: a rule determining unit, a data searching unit and a data displaying unit.
[0134] The rule determining unit is configured to determine a classification query rule in accordance with the attribute data corresponding to the individual attributes in the first storage file.
[0135] The data searching unit is configured to search for target dimension data corresponding to the classification query rule in the second storage file in accordance with the classification query rule.
[0136] The data displaying unit is configured to display the attribute data corresponding to the target dimension data and the value data corresponding to the target dimension data.
[0137] For the explanation for the embodiment of the device, reference can be made to the above embodiment of the device, which will not be repeated herein.
[0138] It is to be noted that in the present specification, the embodiments are described in progression, each embodiment mainly focuses on its difference from other embodiments, and same or similar parts can be referenced among the embodiments.
[0139] The method and the device for compressing and storing data based on the sparse matrix according to the present disclosure are introduced in detail as above.
The above descriptions of the disclosed embodiments enable those skilled in the art to implement or use the present disclosure. Various modifications made to those embodiments will be obvious to those skilled in the art, and the general principles defined in the present disclosure can be implemented in other embodiments. Therefore, the present invention should not be limited to those embodiments disclosed herein, but should be accorded the widest scope consistent with the principles and the novel characteristics disclosed in the present invention.

Claims (10)

1. A method for compressing and storing data based on a sparse matrix, comprising:
receiving a data set containing a plurality of first data, wherein each first data contains a plurality of attribute data and one value data, the plurality of attribute data corresponds to different attributes respectively, and each first data has attribute data corresponding to a same plurality of the different attributes;
generating a first storage file in accordance with the individual attributes, wherein the attribute data contained in individual attributes and attribute identifiers corresponding to the individual attributes are contained in the first storage file;
determining the attribute identifiers corresponding to each attribute data in each first data, and combining the attribute identifiers to generate dimension data of the first data;
determining the dimension data and the value data corresponding to the first data as a data tuple to be stored; and generating a second storage file in accordance with the data tuple to be stored.
2. The method according to claim 1, wherein the generating a first storage file in accordance with the individual attributes comprises:
determining the attribute data corresponding to each attribute;
determining attribute elements corresponding to the attribute in accordance with a type of the individual attributes; and judging whether the attribute elements are the same as the attribute data; if yes, generating the attribute identifiers corresponding to the attribute data, and storing the attribute data and the corresponding attribute identifiers in the generated first storage file;
or otherwise, generating element identifiers corresponding to the attribute elements, and storing the attribute elements and the corresponding element identifiers in the generated first storage file.
3. The method according to claim 1, wherein the first storage file and the second storage file both are data sheets.
4. The method according to claim 1, further comprising, after the generating a second storage file in accordance with the individual data tuples to be stored:
receiving second data containing a plurality of attribute data and one value data, wherein the attributes corresponding to the attribute data contained in the second data are the same as the attributes of the first data;
acquiring the attribute identifiers corresponding to the individual attribute data of the second data, and combining the attribute identifiers to generate dimension data of the second data;
determining the dimension data and the value data corresponding to the second data as a second data tuple to be stored; and adding the second data tuple to be stored into the second storage file.
5. The method according to any one of claims 1 to 4, further comprising:
determining a classification query rule in accordance with the attribute data corresponding to the individual attributes in the first storage file;
searching for target dimension data corresponding to the classification query rule in the second storage file in accordance with the classification query rule; and displaying the attribute data corresponding to the target dimension data and the value data corresponding to the target dimension data.
6. A device for compressing and storing data based on a sparse matrix, comprising:
a data set receiving unit configured to receive a data set containing a plurality of first data, wherein each first data contains a plurality of attribute data and one value data, the plurality of attribute data corresponds to different attributes, and each first data has attribute data corresponding to a same plurality of the different attributes;
a first storage file generating unit configured to generate a first storage file in accordance with the individual attributes, wherein the attribute data contained in individual attributes and attribute identifiers corresponding to the attribute data are contained in the first storage file;
a dimension data generating unit configured to determine attribute identifiers corresponding to each attribute data in each first data, and to combine the attribute identifiers to generate dimension data of the first data;

a data tuple to be stored determining unit configured to determine the dimension data and the value data corresponding to the first data as a data tuple to be stored; and a second storage file generating unit configured to generate a second storage file in accordance with the data tuple to be stored.
7. The device according to claim 6, wherein the first storage file generating unit comprises:
an attribute data determining sub-unit configured to determine the attribute data corresponding to each attribute;
an attribute element determining sub-unit configured to determine the attribute elements corresponding to the attribute in accordance with a type of the individual attributes;
a judging sub-unit configured to judge whether the attribute elements are the same as the attribute data; if yes, trigger a first result sub-unit; or otherwise, trigger a second result sub-unit;
the first result sub-unit configured to generate the attribute identifiers corresponding to the attribute data, and to store the attribute data and the corresponding attribute identifiers in the generated first storage file; and the second result sub-unit configured to generate element identifiers corresponding to the attribute elements, and to store the attribute elements and the corresponding element identifiers in the generated first storage file.
8. The device according to claim 6, wherein the first storage file generated by the first storage file generating unit and the second storage file generated by the second storage file generating unit both are data sheets.
9. The device according to claim 6, further comprising:
a second data receiving unit configured to receive a second data containing a plurality of attribute data and one value data, wherein the attributes corresponding to the attribute data contained in the second data are the same as the attributes of the first data;
a second data dimension data generating unit configured to acquire the attribute identifiers corresponding to the attribute data of the second data, and combining the attribute identifiers to generate dimension data of the second data;
a second data tuple to be stored determining unit configured to determine the dimension data and the value data corresponding to the second data as a second data tuple to be stored; and a second data adding unit configured to add the second data tuple to be stored into the second storage file.
10. The device according to any one of claims 6 to 9, further comprising:
a rule determining unit configured to determine a classification query rule in accordance with the attribute data corresponding to the individual attributes in the first storage file;
a data searching unit configured to search for target dimension data corresponding to the classification query rule in the second storage file in accordance with the classification query rule; and a data displaying unit configured to display the attribute data corresponding to the target dimension data and the value data corresponding to the target dimension data.
CA2871435A 2014-01-26 2014-11-18 Method and device for compressing and storing data based on sparse matrix Expired - Fee Related CA2871435C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410037979.XA CN103761316B (en) 2014-01-26 2014-01-26 A kind of data compression storage method and device based on sparse matrix
CN201410037979.X 2014-01-26

Publications (2)

Publication Number Publication Date
CA2871435A1 CA2871435A1 (en) 2015-07-26
CA2871435C true CA2871435C (en) 2017-02-07

Family

ID=50528553

Family Applications (1)

Application Number Title Priority Date Filing Date
CA2871435A Expired - Fee Related CA2871435C (en) 2014-01-26 2014-11-18 Method and device for compressing and storing data based on sparse matrix

Country Status (2)

Country Link
CN (1) CN103761316B (en)
CA (1) CA2871435C (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10644721B2 (en) 2018-06-11 2020-05-05 Tenstorrent Inc. Processing core data compression and storage system

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156407B (en) * 2014-07-29 2017-08-25 华为技术有限公司 Storage method, device and the storage device of index data
CN104574159B (en) * 2015-01-30 2018-01-23 华为技术有限公司 Data storage, querying method and device
CN109710611B (en) * 2018-12-25 2019-09-17 北京三快在线科技有限公司 The method of storage table data, the method, apparatus of lookup table data and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6009432A (en) * 1998-07-08 1999-12-28 Required Technologies, Inc. Value-instance-connectivity computer-implemented database
US8032499B2 (en) * 2007-05-21 2011-10-04 Sap Ag Compression of tables based on occurrence of values
US7769729B2 (en) * 2007-05-21 2010-08-03 Sap Ag Block compression of tables with repeated values
CN102402617A (en) * 2011-12-23 2012-04-04 天津神舟通用数据技术有限公司 Easily compressed database index storage system using fragments and sparse bitmap, and corresponding construction, scheduling and query processing methods

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10644721B2 (en) 2018-06-11 2020-05-05 Tenstorrent Inc. Processing core data compression and storage system
US10938413B2 (en) 2018-06-11 2021-03-02 Tenstorrent Inc. Processing core data compression and storage system

Also Published As

Publication number Publication date
CN103761316A (en) 2014-04-30
CA2871435A1 (en) 2015-07-26
CN103761316B (en) 2018-02-06


Legal Events

Date Code Title Description
MKLA Lapsed

Effective date: 20211118