CN113742332A - Data storage method, device, equipment and storage medium - Google Patents

Publication number
CN113742332A
Authority
CN
China
Prior art keywords
data
attribute data
length
variable
length attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010477537.2A
Other languages
Chinese (zh)
Inventor
卢栋栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd filed Critical Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202010477537.2A priority Critical patent/CN113742332A/en
Publication of CN113742332A publication Critical patent/CN113742332A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/221Column-oriented storage; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608Saving storage space on storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data

Abstract

The invention discloses a data storage method, device, equipment, and storage medium. The method comprises: acquiring row data to be stored, determining the data attributes of the row data, and storing the row data according to the storage rules corresponding to those data attributes, wherein different data attributes correspond to different storage rules. Because the row data is stored under different storage rules according to the different data attributes it contains, the space occupied by the data can be effectively reduced and waste of storage space is avoided.

Description

Data storage method, device, equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data storage method, a data storage apparatus, a data storage device, and a computer storage medium.
Background
With the continuous development of computer technology, the amount of data keeps growing, and so does the space needed to store it.
Currently, data generated by running an application is typically stored in a database.
In the prior art, storing data in a database usually means inserting the data to be stored into a row of the database: the data is classified according to the attribute fields of the row, and each classified item is inserted into the corresponding attribute field.
However, the prior art still wastes a large amount of storage space and cannot meet current database storage requirements, so a data storage approach that saves more storage space is needed.
Disclosure of Invention
It is an object of the present invention to provide a new solution for data storage.
According to a first aspect of the present invention, there is provided a data storage method comprising:
acquiring row data to be stored;
determining data attributes of the row data;
and storing the row data according to the storage rules corresponding to the data attributes, wherein different data attributes correspond to different storage rules.
Optionally, determining the data attributes of the row data includes:
determining fixed-length attribute data included in the row data;
storing the row data according to the storage rules corresponding to the data attributes includes:
merging fixed-length attribute data that does not exceed a preset byte length;
and storing the merged fixed-length attribute data together with the other fixed-length attribute data in a fixed-length attribute field.
Optionally, determining the data attributes of the row data further includes:
determining null value attribute data included in the row data;
storing the row data according to the storage rules corresponding to the data attributes includes:
establishing an identifier of the null value attribute data;
and adding the identifier of the null value attribute data to any blank storage unit in a null value attribute field.
Optionally, the method further comprises:
generating a first compressed bitmap according to the merged fixed-length attribute data, the other fixed-length attribute data, and the null value attribute data, wherein the data in each storage unit of the first compressed bitmap corresponds to at least two items of fixed-length attribute data and/or at least two items of null value attribute data;
storing the first compressed bitmap in a compressed control bitmap field.
Optionally, generating the first compressed bitmap according to the merged fixed-length attribute data, the other fixed-length attribute data, and the null value attribute data includes:
obtaining a primary compressed bitmap according to the merged fixed-length attribute data, the other fixed-length attribute data, and the null value attribute data;
obtaining a secondary compressed bitmap according to the primary compressed bitmap;
wherein the first compressed bitmap comprises the primary compressed bitmap and the secondary compressed bitmap.
Optionally, determining the data attributes of the row data further includes:
determining variable-length attribute data included in the row data;
storing the row data according to the storage rules corresponding to the data attributes includes:
generating, for each item of variable-length attribute data, a corresponding second compressed bitmap, and acquiring the length of the second compressed bitmap;
determining the bytes of a specified type in the variable-length attribute data;
removing all bytes of the specified type;
placing the length of the second compressed bitmap and the second compressed bitmap in front of the variable-length attribute data from which the bytes were removed, to obtain a variable-length attribute data block corresponding to the variable-length attribute data;
and storing the variable-length attribute data block corresponding to each item of variable-length attribute data in a variable-length attribute field according to a preset storage order.
Optionally, determining the data attributes of the row data further includes:
determining variable-length attribute data included in the row data;
storing the row data according to the storage rules corresponding to the data attributes includes:
determining, for each item of variable-length attribute data, the bytes of a specified type at the tail of the variable-length attribute data and the number of those bytes;
removing all bytes of the specified type;
placing the number of bytes of the specified type in front of the variable-length attribute data from which the bytes were removed, to obtain a variable-length attribute data block corresponding to the variable-length attribute data;
and storing the variable-length attribute data block corresponding to each item of variable-length attribute data in the variable-length attribute field according to a preset storage order.
Optionally, the method further comprises:
acquiring the byte length of each item of variable-length attribute data after the bytes are removed;
determining the offset length of each item of variable-length attribute data according to those byte lengths;
counting the number of items of variable-length attribute data whose offset length exceeds a preset length;
storing the offset lengths in a first field;
and storing the counted number in a second field.
According to a second aspect of the present invention, there is provided a data storage device comprising:
an acquisition module for acquiring row data to be stored;
a determining module for determining data attributes of the row data;
and a storage module for storing the row data according to the storage rules corresponding to the data attributes, wherein different data attributes correspond to different storage rules.
According to a third aspect of the present invention, there is provided data storage equipment comprising the data storage device; or alternatively,
the apparatus comprises: a processor and a memory;
the memory is for storing executable instructions for controlling the processor to perform the data storage method according to the first aspect.
According to a fourth aspect of the present invention, there is provided a computer storage medium storing computer instructions which, when executed by a processor, implement the data storage method as described in the first aspect.
In the embodiments, the row data is stored under different storage rules according to the different data attributes in the row data, so the space occupied by the data can be effectively reduced and waste of storage space is avoided.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a block diagram of a hardware configuration of a data storage device according to an embodiment of the present invention;
FIG. 2 is a flow chart of a data storage method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data storage structure provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data storage device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a data storage device according to an embodiment of the present invention.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
<Hardware configuration>
Fig. 1 is a block diagram of a hardware configuration of a data storage device according to an embodiment of the present invention.
The data storage device 1000 may be a virtual machine or a physical machine. The data storage device 1000 may include a processor 1100, a memory 1200, an interface device 1300, a communication device 1400, a display device 1500, an input device 1600, a speaker 1700, a microphone 1800, and the like. The processor 1100 may be a central processing unit (CPU), a microprocessor (MCU), or the like. The memory 1200 includes, for example, a ROM (read-only memory), a RAM (random access memory), and a nonvolatile memory such as a hard disk. The interface device 1300 includes, for example, a USB interface and a headphone interface. The communication device 1400 is capable of, for example, wired or wireless communication. The display device 1500 is, for example, a liquid crystal display panel or a touch panel. The input device 1600 may include, for example, a touch screen and a keyboard. A user can input and output voice information through the speaker 1700 and the microphone 1800.
Memory 1200 is used to store computer program instructions that control processor 1100 to operate to perform storage methods according to any embodiment of the present invention, as applied to the present embodiment. The skilled person can design the instructions according to the disclosed solution. How the instructions control the operation of the processor 1100 is well known in the art and will not be described in detail herein.
Although a plurality of devices are shown for each data storage device 1000 in fig. 1, the present invention may relate to only some of the devices, for example, the data storage device 1000 relates to only the memory 1200 and the processor 1100.
<Method examples>
An embodiment of the present invention provides a data storage method, as shown in fig. 2, including the following steps:
S201: acquiring the row data to be stored.
In practical applications, data generated by running an application program is usually stored in a database.
Further, in the process of storing data, the embodiments of this specification first need to acquire the row data to be stored.
It should be noted that the row data described in the embodiments of this specification includes data with different data attributes.
S202: determining data attributes of the row data.
Further, because the present invention stores the row data under different storage rules according to the different data attributes in the row data, the data attributes of the row data need to be determined after the row data to be stored is acquired.
It should be noted that the row data may have one data attribute or several different data attributes, and the number of data attributes may be determined according to a preset partition rule.
In the embodiment of the present invention, the partitioning rule may be as follows:
the data attributes of the row data may be divided according to data type, or according to the storage characteristics of the data, for example according to the byte type of the data.
S203: storing the row data according to the storage rules corresponding to the data attributes.
Further, after the data attributes of the row data are determined, the row data is stored according to the storage rule corresponding to each data attribute.
It should be noted that each data attribute corresponds to one storage rule, and different data attributes correspond to different storage rules.
In the process of storing the row data, the data corresponding to each data attribute is stored according to the storage rule for that data attribute.
With this method, the row data is stored under different storage rules according to the different data attributes in the row data, so the space occupied by the data can be effectively reduced and waste of storage space is avoided.
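Steps S201 to S203 can be illustrated with a minimal sketch. It assumes a row is a mapping from attribute name to value and that the partition rule is by data type: `None` is treated as null value attribute data, integers as fixed-length attribute data, and byte strings as variable-length attribute data. The function name `classify_row` and this particular partition rule are illustrative assumptions, not something the specification prescribes.

```python
from typing import Dict, List, Optional, Union

Value = Optional[Union[int, bytes]]


def classify_row(row: Dict[str, Value]) -> Dict[str, List[str]]:
    """Group attribute names by data attribute (hypothetical partition rule)."""
    groups: Dict[str, List[str]] = {"null": [], "fixed": [], "variable": []}
    for name, value in row.items():
        if value is None:
            groups["null"].append(name)      # no value recorded
        elif isinstance(value, int):
            groups["fixed"].append(name)     # same byte count for every value
        else:
            groups["variable"].append(name)  # byte count depends on the value
    return groups


row = {"id": 7, "gender": 1, "name": b"aafa", "note": None}
groups = classify_row(row)
print(groups)
```

Each group would then be handed to its own storage rule.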
Further, in practical applications, the storage space (i.e., the number of bytes) occupied by some attribute data stored in the database is fixed; this is fixed-length attribute data. Numerical data is an example: regardless of the value, every value occupies the same fixed number of bytes. In the embodiments of this specification, the space occupied by fixed-length attribute data can therefore be reduced in the same specific manner every time; that is, if method A is used when row data is stored for the first time, method A is still used when row data is stored for the second time.
Based on this, in the embodiments of this specification, after the row data to be stored is acquired, determining the data attributes of the row data may be determining the fixed-length attribute data included in the row data.
Further, the storage that some fixed-length attributes need is not a whole number of bytes; some need only half a byte. For example, suppose the attribute value 60 needs half a byte. In the prior art such an attribute is stored in one byte: the value 60 occupies half of the byte and every bit of the other half is set to 0, so the value still consumes a complete byte of storage. In the embodiments of this specification, fixed-length attributes that do not occupy a whole number of bytes can therefore be merged. For example, if the value of attribute 1 occupies half a byte and the value of attribute 2 also occupies half a byte, attribute 1 and attribute 2 are merged and stored together in the same byte. As a concrete case, attribute 1 is named "gender" with the value "male" and attribute 2 is named "achievement" with the value "60"; the two can be merged and stored in one byte.
Based on this, in the embodiments of this specification, after the fixed-length attribute data included in the row data is determined, the row data may be stored using the following storage rule for the fixed-length attribute data:
the fixed-length attribute data that does not exceed the preset byte length is merged, and the merged fixed-length attribute data is stored together with the other fixed-length attribute data in a fixed-length attribute field.
It should be noted that the preset byte length may be one byte, two bytes, or multiple bytes, and may be set according to actual situations.
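The merging of two half-byte attributes can be sketched as follows. The sketch assumes each value has already been encoded as a 4-bit code (here a hypothetical gender code 1 and grade code 6); how attribute values map to such codes is not specified in the source, and the function names are illustrative.

```python
from typing import Tuple


def pack_nibbles(high: int, low: int) -> int:
    """Merge two 4-bit values into a single byte."""
    if not (0 <= high <= 0xF and 0 <= low <= 0xF):
        raise ValueError("each value must fit in 4 bits")
    return (high << 4) | low


def unpack_nibbles(packed: int) -> Tuple[int, int]:
    """Recover the two 4-bit values from a merged byte."""
    return (packed >> 4) & 0xF, packed & 0xF


# Hypothetical codes: gender "male" -> 1, grade bucket -> 6.
packed = pack_nibbles(1, 6)
print(hex(packed), unpack_nibbles(packed))
```

Stored separately, the two values would take two bytes; merged, they take one.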
Further, after the fixed-length attribute data that does not exceed the preset byte length is merged, the other fixed-length attribute data may be sorted by the length of the storage space it occupies, from largest to smallest; fixed-length attribute data of the same length keeps the original order of the attributes. Finally, the sorted other fixed-length attribute data and the merged fixed-length attribute data are stored together in the fixed-length attribute field shown in FIG. 3. FIG. 3 shows the new data storage structure provided by the embodiments of this specification.
With this method, the fixed-length attribute data that does not exceed the preset byte length is merged and stored in the fixed-length attribute field, so the space occupied by the data is effectively reduced and waste of storage space is avoided.
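The ordering rule for the remaining fixed-length attributes (largest first, ties kept in original attribute order) can be sketched with a stable sort. The field names and the `(name, byte_length)` representation are assumptions made for illustration.

```python
from typing import List, Tuple


def order_fixed_fields(fields: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Sort (name, byte_length) pairs by length, largest first.

    Python's sorted() is stable, also with reverse=True, so fields of
    equal length keep their original relative order.
    """
    return sorted(fields, key=lambda field: field[1], reverse=True)


fields = [("flag", 1), ("id", 8), ("age", 2), ("score", 8), ("kind", 1)]
ordered = order_fixed_fields(fields)
print(ordered)
```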
Further, in practical applications some attribute data is empty, that is, no value is recorded for it. After the row data to be stored is acquired, determining the data attributes of the row data may therefore be determining the null value attribute data included in the row data.
For null value attribute data, the row data may be stored using the following storage rule:
an identifier of the null value attribute data is established, and the identifier is added to any blank storage unit in the null value attribute field shown in FIG. 3.
It should be noted that the blank storage unit may be a bit or a byte. Each storage unit in the null value attribute field records the identifier of one item of null value attribute data; that is, one storage unit in the null value attribute field represents one attribute.
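One way to realize a null value attribute field whose storage unit is a bit is a bitmap with one bit per attribute. The sketch below assumes that bit i is set when attribute i is null and that bits are numbered from the least significant bit of each byte; both conventions are illustrative choices, not taken from the source.

```python
from typing import List, Optional


def build_null_bitmap(values: List[Optional[object]]) -> bytes:
    """Return a bitmap with bit i set when values[i] is None."""
    bitmap = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value is None:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def is_null(bitmap: bytes, index: int) -> bool:
    """Check whether the attribute at the given index is marked null."""
    return bool(bitmap[index // 8] & (1 << (index % 8)))


values = [7, None, b"x", None, 3, 4, 5, 6, None]
null_bitmap = build_null_bitmap(values)
print(list(null_bitmap), is_null(null_bitmap, 3))
```

Nine attributes need two bytes of bitmap here, instead of reserving full-width storage for each empty value.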
Further, in practical applications, the storage space (i.e., the number of bytes) occupied by some attribute data stored in the database changes with the attribute value; this is variable-length attribute data, such as character data. In the embodiments of this specification, the space occupied by variable-length attribute data can therefore be reduced in the same specific manner every time; that is, if method A is used when row data is stored for the first time, method A is still used when row data is stored for the second time.
Based on this, in the embodiments of this specification, after the row data to be stored is acquired, determining the data attributes of the row data may be determining the variable-length attribute data included in the row data.
It should be noted that, in the embodiments of this specification, the row data may be stored using either of the following two storage rules for the variable-length attribute data:
The first mode: for each item of variable-length attribute data, generate a corresponding second compressed bitmap and obtain the length of the second compressed bitmap; determine the bytes of a specified type in the variable-length attribute data; remove all bytes of the specified type; place the length of the second compressed bitmap and the second compressed bitmap in front of the variable-length attribute data from which the bytes were removed, to obtain the variable-length attribute data block corresponding to that variable-length attribute data; and store the variable-length attribute data block corresponding to each item of variable-length attribute data in the variable-length attribute field shown in FIG. 3 according to a preset storage order.
It should be noted here that the byte of the specified type may be the 0 byte. Each storage unit of the generated second compressed bitmap corresponds to at least two bytes of the variable-length attribute data as it was before the bytes of the specified type were removed. Within a variable-length attribute data block, the data is ordered as follows: the length of the second compressed bitmap comes before the second compressed bitmap, and the second compressed bitmap comes before the byte-removed variable-length attribute data. The preset storage order is the storage order among the items of variable-length attribute data, and describes how the variable-length attribute data blocks are arranged relative to one another.
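The first mode can be sketched as zero-byte suppression with a position bitmap. The sketch assumes one bitmap bit per original byte (set when the byte is non-zero and therefore kept), a one-byte length prefix, and that the original length is known to the decoder from elsewhere; the source does not fix any of these details, and the function names are hypothetical.

```python
def encode_zero_suppressed(data: bytes) -> bytes:
    """Build [bitmap length][bitmap][non-zero bytes] for one attribute value."""
    bitmap = bytearray((len(data) + 7) // 8)
    kept = bytearray()
    for i, byte in enumerate(data):
        if byte != 0:
            bitmap[i // 8] |= 1 << (i % 8)  # mark position i as kept
            kept.append(byte)
    if len(bitmap) > 255:
        raise ValueError("bitmap too long for a one-byte length prefix")
    return bytes([len(bitmap)]) + bytes(bitmap) + bytes(kept)


def decode_zero_suppressed(block: bytes, original_length: int) -> bytes:
    """Restore the value; original_length is assumed to be known."""
    bitmap_len = block[0]
    bitmap = block[1:1 + bitmap_len]
    kept = iter(block[1 + bitmap_len:])
    out = bytearray()
    for i in range(original_length):
        if bitmap[i // 8] & (1 << (i % 8)):
            out.append(next(kept))
        else:
            out.append(0)
    return bytes(out)


original = b"a\x00\x00b\x00c"
block = encode_zero_suppressed(original)
print(block, decode_zero_suppressed(block, len(original)))
```

The scheme only saves space when the value contains enough zero bytes to outweigh the length prefix and bitmap.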
The second mode: for each item of variable-length attribute data, determine the bytes of the specified type at the tail of the variable-length attribute data and the number of those bytes; remove all bytes of the specified type; place the number of bytes of the specified type in front of the variable-length attribute data from which the bytes were removed, to obtain the variable-length attribute data block corresponding to that variable-length attribute data; and store the variable-length attribute data block corresponding to each item of variable-length attribute data in the variable-length attribute field according to a preset storage order.
It should be noted that the bytes of the specified type at the tail of the variable-length attribute data are the bytes of the specified type that repeat consecutively at the tail. For example, if the byte of the specified type is 0 and the variable-length attribute data is aafa0000, then 0000 is the run of bytes of the specified type repeated consecutively at the tail. The number of bytes of the specified type at the tail is stored in the variable-length attribute field so that the variable-length attribute data as it was before the bytes were removed can be restored when a user accesses the variable-length attribute. Within a variable-length attribute data block, the number of bytes of the specified type comes before the byte-removed variable-length attribute data. The preset storage order is the storage order among the items of variable-length attribute data, and describes how the variable-length attribute data blocks are arranged relative to one another.
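The second mode can be sketched as trailing-byte trimming with a count prefix. The sketch reads the example value aafa0000 as hexadecimal (an assumption about the source's notation) and uses a one-byte count, which is also an illustrative choice.

```python
def encode_trailing(data: bytes, fill: int = 0) -> bytes:
    """Strip the trailing run of `fill` bytes and prefix its length."""
    stripped = data.rstrip(bytes([fill]))
    count = len(data) - len(stripped)
    if count > 255:
        raise ValueError("trailing run too long for a one-byte count")
    return bytes([count]) + stripped


def decode_trailing(block: bytes, fill: int = 0) -> bytes:
    """Restore the original value from [count][stripped data]."""
    return block[1:] + bytes([fill]) * block[0]


value = bytes.fromhex("aafa0000")
trimmed = encode_trailing(value)
print(trimmed.hex(), decode_trailing(trimmed).hex())
```

Unlike the first mode, only the tail is compressed, so zero bytes in the middle of the value are kept as they are.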
Furthermore, after the variable-length attribute data is compressed by either of the two storage rules and stored in the variable-length attribute field, later accesses need to know the position of each item of variable-length attribute data within the variable-length attribute field. In the embodiments of this specification, the byte length of each item of byte-removed variable-length attribute data is therefore acquired, the offset length of each item of variable-length attribute data is determined from those byte lengths, and the offset lengths are stored in the first field shown in FIG. 3. Subsequently, when an item of variable-length attribute data in the variable-length attribute field is to be accessed, its offset length is obtained directly from the first field and the item is located in the variable-length attribute field according to that offset length.
It should be noted that the offset lengths may be determined from rear to front, following the order in which the variable-length attribute data is stored in the variable-length attribute field: the offset length of an item of variable-length attribute data is the sum of the byte lengths of all byte-removed variable-length attribute data arranged after it.
Here, the offset length of an item of variable-length attribute data is its distance from the end of the variable-length attribute field.
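The rear-to-front offset computation can be sketched as a running sum. The input is assumed to be the stored byte length of each item in storage order; whether a prefix such as the bitmap or count is included in that length is left to the caller, since the source does not say.

```python
from typing import List


def offsets_from_end(lengths: List[int]) -> List[int]:
    """offset[i] = total length of all items stored after item i."""
    offsets = [0] * len(lengths)
    running = 0
    for i in range(len(lengths) - 1, -1, -1):  # walk from rear to front
        offsets[i] = running
        running += lengths[i]
    return offsets


lengths = [300, 40, 5]
offsets = offsets_from_end(lengths)
print(offsets)
```

Because each offset is a sum over the items behind it, the offsets never increase along the storage order.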
In addition, the variable-length attribute data in the variable-length attribute field is stored in its original order, that is, the order in which variable-length attribute data is stored in the prior art. The offset lengths are stored in the first field in the same order as the variable-length attribute data in the variable-length attribute field. An offset length that exceeds a certain value occupies two bytes when stored (a double-byte offset), and an offset length that does not exceed that value occupies one byte (a single-byte offset).
However, the first field by itself does not show which offset lengths occupy two bytes and which occupy one. To locate the offset length of each item of variable-length attribute data accurately, the number of items of variable-length attribute data whose offset length exceeds the preset length is counted, and the count is stored in the second field shown in FIG. 3. Subsequently, when variable-length attribute data in the variable-length attribute field is accessed, the offset length of the item to be accessed can be determined from the count stored in the second field and the ordering of the offset lengths in the first field.
The variable-length attribute data is then located in the variable-length attribute field according to its offset length.
It should be noted that the preset length may be two bytes.
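One possible reading of how the first and second fields work together is sketched below. It relies on the offsets being non-increasing in storage order, so that all double-byte offsets form a prefix of the first field and a single count is enough to find any entry. The threshold of 255 (the largest value that fits in one byte), the big-endian encoding, and the function names are assumptions of this sketch and are not stated in the source.

```python
from typing import List, Tuple


def encode_offset_table(offsets: List[int], threshold: int = 255) -> Tuple[bytes, int]:
    """Return (first_field, second_field) for non-increasing offsets."""
    if any(a < b for a, b in zip(offsets, offsets[1:])):
        raise ValueError("offsets must be non-increasing")
    wide = sum(1 for offset in offsets if offset > threshold)
    table = bytearray()
    for i, offset in enumerate(offsets):
        # The first `wide` entries use two bytes, the rest one byte.
        table += offset.to_bytes(2 if i < wide else 1, "big")
    return bytes(table), wide


def read_offset(table: bytes, wide: int, index: int) -> int:
    """Look up the offset of item `index` without scanning the table."""
    if index < wide:
        pos = 2 * index
        return int.from_bytes(table[pos:pos + 2], "big")
    return table[2 * wide + (index - wide)]


first_field, second_field = encode_offset_table([345, 45, 5, 0])
print(list(first_field), second_field, read_offset(first_field, second_field, 0))
```

With the count available, the position of any entry is computed arithmetically, which matches the stated goal of avoiding a parse of the whole row.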
It should be noted that, with the prior-art storage structure, accessing a particular item of attribute data requires parsing the whole stored row into its individual attribute data and then searching the parsed attribute data for the item to be accessed. In contrast, the present invention determines the offset length of each item of variable-length attribute data from the byte lengths of the byte-removed variable-length attribute data, counts the items whose offset length exceeds the preset length, stores the offset lengths in the first field, and stores the count in the second field. When an item of attribute data is later accessed, its offset length is determined from the count in the second field and the ordering of the offset lengths in the first field and read from the first field, and the item is then found directly in the variable-length attribute field according to that offset length, which greatly improves the speed at which users access attribute data.
Further, the above mainly improves the access speed for the variable-length attribute data. In the embodiment of the present specification, the access speed for the fixed-length attribute data and the null value attribute data can also be improved in the following manner.
Specifically, a first compressed bitmap is generated according to the combined fixed-length attribute data, other fixed-length attribute data, and the null value attribute data, where data in each storage unit in the first compressed bitmap corresponds to at least two fixed-length attribute data and/or at least two null value attribute data, and the first compressed bitmap is stored in the compressed control bitmap field shown in fig. 3.
It should be noted that, to generate the first compressed bitmap, the combined fixed-length attribute data, the other fixed-length attribute data, and the null value attribute data may be compressed by a compression algorithm (e.g., a bitmap algorithm).
As for the data in each storage unit in the first compressed bitmap corresponding to at least two fixed-length attribute data and/or at least two null value attribute data: assuming that one storage unit in the first compressed bitmap stores one byte of data, that one byte of data is equivalent to eight bytes of data in the fixed-length attribute field or of null value attribute data.
In addition, the generated first compressed bitmap may be a multi-level compressed bitmap. Each level of the compressed bitmap is generated based on the previous level, and each storage unit of a level corresponds to a plurality of storage units of the previous level; that is, each level of the compressed bitmap is an index of the previous level. For example, a two-level compressed bitmap consists of a first-level (primary) compressed bitmap and a second-level (secondary) compressed bitmap, where the second-level compressed bitmap is generated based on the first-level compressed bitmap, one byte of data in the first-level compressed bitmap corresponds to eight bytes of data in the fixed-length attribute field, and one double byte of data in the second-level compressed bitmap corresponds to four bytes or eight bytes of data in the first-level compressed bitmap.
Based on this, an embodiment of the present invention provides an implementation manner for generating a first compressed bitmap according to the merged fixed-length attribute data, other fixed-length attribute data, and the null value attribute data, which is specifically as follows:
A primary compressed bitmap is obtained according to the combined fixed-length attribute data, the other fixed-length attribute data, and the null value attribute data, and a secondary compressed bitmap is obtained according to the primary compressed bitmap, where the first compressed bitmap comprises the primary compressed bitmap and the secondary compressed bitmap.
With this implementation, a multi-level index is established by means of multi-level bitmap compression, which can greatly improve the speed at which the user accesses the attribute data.
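The idea of a two-level bitmap acting as an index can be sketched in Python as follows. This is a simplified illustration, not the exact layout of the embodiment: here each bit of the primary bitmap is assumed to mark whether one attribute is null, and each bit of the secondary bitmap is assumed to mark whether the corresponding primary byte contains any set bit. The function names and bit order are hypothetical.

```python
from typing import List, Optional


def build_primary(values: List[Optional[bytes]]) -> bytes:
    """One bit per attribute; a set bit marks a null value (assumed meaning)."""
    bitmap = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value is None:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def build_secondary(primary: bytes) -> bytes:
    """One bit per primary byte; a set bit means that byte is non-zero."""
    bitmap = bytearray((len(primary) + 7) // 8)
    for i, byte in enumerate(primary):
        if byte:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def is_null(primary: bytes, secondary: bytes, index: int) -> bool:
    """Check the secondary level first and touch the primary level only if needed."""
    group = index // 8  # which primary byte covers this attribute
    if not (secondary[group // 8] >> (group % 8)) & 1:
        return False  # the whole group of eight attributes has no null
    return bool((primary[group] >> (index % 8)) & 1)
```

For a row of twenty attributes in which only the tenth is null, the primary bitmap takes three bytes and the secondary bitmap one byte, and a lookup for any attribute in the first or third group is answered from the secondary bitmap alone.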
< apparatus embodiment >
As shown in fig. 4, the present embodiment also provides a data storage device including:
an obtaining module 401, configured to obtain row data to be stored;
a determining module 402, configured to determine a data attribute of the row data;
a storage module 403, configured to store the row data according to a storage rule corresponding to the data attribute, where different data attributes correspond to different storage rules.
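The division of work among the three modules can be pictured with a short Python sketch. It is a hypothetical illustration only: the description does not say how attributes are classified, so the sketch assumes that `None` stands for a null value, an `int` for fixed-length data (stored here as four signed big-endian bytes), and `bytes` for variable-length data. All names are invented for the example.

```python
from typing import Dict, List, Optional, Union

Value = Optional[Union[int, bytes]]


def classify(value: Value) -> str:
    """Determining step: map a value to its data attribute (assumed mapping)."""
    if value is None:
        return "null"
    if isinstance(value, int):
        return "fixed"
    return "variable"


def store_row(row: List[Value]) -> Dict[str, list]:
    """Storing step: apply a different rule for each data attribute."""
    fields: Dict[str, list] = {"fixed": [], "null": [], "variable": []}
    for position, value in enumerate(row):
        kind = classify(value)
        if kind == "fixed":
            # fixed-length rule: a fixed-width encoding
            fields["fixed"].append(value.to_bytes(4, "big", signed=True))
        elif kind == "null":
            # null rule: record only an identifier (here, the column position)
            fields["null"].append(position)
        else:
            # variable-length rule: keep the raw bytes for later packing
            fields["variable"].append(value)
    return fields
```

Calling `store_row([7, None, b"abc"])` places the integer in the fixed-length field, records position 1 in the null value field, and places `b"abc"` in the variable-length field.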
The determining module 402 is specifically configured to determine fixed-length attribute data included in the row data; the storage module 403 is specifically configured to merge fixed-length attribute data that does not exceed a preset byte length, and to store the combined fixed-length attribute data and the other fixed-length attribute data together into a fixed-length attribute field.
The determining module 402 is specifically configured to determine null value attribute data included in the row data; the storage module 403 is specifically configured to establish an identifier of the null value attribute data, and to add the identifier of the null value attribute data into any one blank storage unit in the null value attribute field.
The device further comprises:
a compressed bitmap establishing module 404, configured to generate a first compressed bitmap according to the combined fixed-length attribute data, the other fixed-length attribute data, and the null value attribute data, where data in each storage unit in the first compressed bitmap corresponds to at least two fixed-length attribute data and/or at least two null value attribute data; and to store the first compressed bitmap into a compressed control bitmap field.
The compressed bitmap establishing module 404 is specifically configured to obtain a primary compressed bitmap according to the combined fixed-length attribute data, the other fixed-length attribute data, and the null value attribute data, and to obtain a secondary compressed bitmap according to the primary compressed bitmap, wherein the first compressed bitmap comprises the primary compressed bitmap and the secondary compressed bitmap.
The determining module 402 is specifically configured to determine variable-length attribute data included in the row data; the storage module 403 is specifically configured to, for each variable-length attribute data: generate a second compressed bitmap corresponding to the variable-length attribute data and obtain the length of the second compressed bitmap; determine the bytes of a specified type in the variable-length attribute data; remove all bytes of the specified type; and place the length of the second compressed bitmap and the second compressed bitmap in front of the variable-length attribute data from which the bytes have been removed, to obtain a variable-length attribute data block corresponding to the variable-length attribute data. The storage module 403 then stores the variable-length attribute data block corresponding to each variable-length attribute data into the variable-length attribute field according to a preset storage sequence.
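A minimal Python sketch of this bitmap-based packing is given below. The details are assumptions rather than the embodiment's exact format: the "specified type" is taken to be the zero byte, the bitmap has one bit per original byte (set where a byte was removed), and the bitmap length is stored in a single leading byte. The function names are hypothetical.

```python
def pack_block(data: bytes, removed: int = 0x00) -> bytes:
    """Return length-of-bitmap + bitmap + data with `removed` bytes stripped."""
    bitmap = bytearray((len(data) + 7) // 8)
    kept = bytearray()
    for i, byte in enumerate(data):
        if byte == removed:
            bitmap[i // 8] |= 1 << (i % 8)  # remember where the byte was
        else:
            kept.append(byte)
    if len(bitmap) > 255:
        raise ValueError("bitmap length does not fit in one byte")
    return bytes([len(bitmap)]) + bytes(bitmap) + bytes(kept)


def unpack_block(block: bytes, removed: int = 0x00) -> bytes:
    """Rebuild the original data from a block produced by pack_block."""
    bitmap_len = block[0]
    bitmap = block[1:1 + bitmap_len]
    kept = block[1 + bitmap_len:]
    out = bytearray()
    next_kept = 0
    for i in range(bitmap_len * 8):
        if (bitmap[i // 8] >> (i % 8)) & 1:
            out.append(removed)
        elif next_kept < len(kept):
            out.append(kept[next_kept])
            next_kept += 1
        else:
            break  # clear bit with no data left: only bitmap padding remains
    return bytes(out)
```

For the ten-byte value `b"ab\x00\x00c\x00\x00\x00\x00d"`, the block is seven bytes long (one length byte, two bitmap bytes, and the four remaining data bytes), and `unpack_block` restores the original value.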
Alternatively, the determining module 402 is specifically configured to determine variable-length attribute data included in the row data; the storage module 403 is specifically configured to, for each variable-length attribute data: determine the bytes of a specified type at the tail of the variable-length attribute data and the number of those bytes; remove all bytes of the specified type; and place the number of the bytes of the specified type in front of the variable-length attribute data from which the bytes have been removed, to obtain a variable-length attribute data block corresponding to the variable-length attribute data. The storage module 403 then stores the variable-length attribute data block corresponding to each variable-length attribute data into the variable-length attribute field according to a preset storage sequence.
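The tail-trimming variant is simpler and can be sketched in a few lines of Python. The sketch assumes that the "specified type" is the space character used as padding and that the count of removed bytes fits in one leading byte; both choices, and the function names, are illustrative assumptions.

```python
def pack_trailing(data: bytes, pad: int = 0x20) -> bytes:
    """Strip trailing `pad` bytes and prefix the block with how many were removed."""
    stripped = data.rstrip(bytes([pad]))
    count = len(data) - len(stripped)
    if count > 255:
        raise ValueError("count of removed bytes does not fit in one byte")
    return bytes([count]) + stripped


def unpack_trailing(block: bytes, pad: int = 0x20) -> bytes:
    """Restore the original value by re-appending the removed bytes."""
    return block[1:] + bytes([pad]) * block[0]
```

A value such as `b"abc   "` is stored as `b"\x03abc"`, and unpacking it yields the original six bytes.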
The device further comprises:
an offset length storage module 405, configured to obtain the byte length of each variable-length attribute data after the bytes are removed; determine the offset length of each variable-length attribute data according to that byte length; count the number of variable-length attribute data whose offset length exceeds a preset length; store the offset lengths into a first field; and store the counted number into a second field.
< equipment embodiment >
An embodiment of the present invention further provides data storage equipment, as shown in fig. 5, which includes the data storage device of the foregoing apparatus embodiment.
Alternatively, the data storage equipment includes a memory 501 and a processor 502, wherein the memory is configured to store executable instructions for controlling the processor to perform the method according to any of the above method embodiments.
< computer storage Medium >
The invention also provides a computer storage medium storing computer instructions which, when executed by a processor, implement a method as in any one of the above method embodiments.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of computer-readable program instructions, which can execute the computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, by software, and by a combination of software and hardware are equivalent.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen in order to best explain the principles of the embodiments, the practical application, or technical improvements to the techniques in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (11)

1. A method of data storage, the method comprising:
acquiring row data to be stored;
determining data attributes of the row data;
and storing the row data according to the storage rule corresponding to the data attribute, wherein different data attributes correspond to different storage rules.
2. The method of claim 1, wherein determining data attributes of the row data comprises:
determining fixed-length attribute data included in the row data;
storing the row data according to the storage rule corresponding to the data attribute comprises:
merging fixed-length attribute data which do not exceed the preset byte length;
and storing the combined fixed-length attribute data and other fixed-length attribute data into a fixed-length attribute field together.
3. The method of claim 2, wherein determining data attributes of the row data further comprises:
determining null value attribute data included in the row data;
storing the row data according to the storage rule corresponding to the data attribute comprises:
establishing an identifier of the null value attribute data;
and adding the identification of the null value attribute data into any one blank storage unit in the null value attribute field.
4. The method of claim 3, further comprising:
generating a first compressed bitmap according to the combined fixed-length attribute data, other fixed-length attribute data and the null value attribute data, wherein the data in each storage unit in the first compressed bitmap corresponds to at least two fixed-length attribute data and/or at least two null value attribute data;
storing the first compressed bitmap into a compressed control bitmap field.
5. The method of claim 4, wherein generating a first compressed bitmap from the merged fixed-length attribute data, other fixed-length attribute data, and the null attribute data comprises:
obtaining a primary compressed bitmap according to the combined fixed-length attribute data, other fixed-length attribute data and the null value attribute data;
obtaining a secondary compressed bitmap according to the primary compressed bitmap;
wherein the first compressed bitmap comprises the primary compressed bitmap and the secondary compressed bitmap.
6. The method of claim 1, wherein determining data attributes of the row data further comprises:
determining variable-length attribute data included in the row data;
storing the row data according to the storage rule corresponding to the data attribute comprises:
for each variable-length attribute data, generating a second compressed bitmap corresponding to the variable-length attribute data, and acquiring the length of the second compressed bitmap;
determining bytes of a specified type in the variable-length attribute data;
removing all bytes of the specified type;
placing the length of the second compressed bitmap and the second compressed bitmap in front of the variable-length attribute data after the bytes are removed to obtain a variable-length attribute data block corresponding to the variable-length attribute data;
and storing the variable length attribute data block corresponding to each variable length attribute data into the variable length attribute field according to a preset storage sequence.
7. The method of claim 1, wherein determining data attributes of the row data further comprises:
determining variable-length attribute data included in the row data;
storing the row data according to the storage rule corresponding to the data attribute comprises:
for each variable-length attribute data, determining the bytes of a specified type at the tail of the variable-length attribute data and the number of those bytes;
removing all bytes of the specified type;
placing the number of the bytes of the specified type before the variable-length attribute data after the bytes are removed to obtain a variable-length attribute data block corresponding to the variable-length attribute data;
and storing the variable length attribute data block corresponding to each variable length attribute data into the variable length attribute field according to a preset storage sequence.
8. The method according to claim 6 or 7, characterized in that the method further comprises:
acquiring the byte length of each variable length attribute data after removing the bytes;
determining the offset length of each variable length attribute data according to the byte length of each variable length attribute data after removing the bytes;
counting the number of variable-length attribute data with the offset length exceeding a preset length;
storing the offset length into a first field;
and storing the counted number into a second field.
9. A data storage device, comprising:
an acquisition module, configured to acquire row data to be stored;
a determining module, configured to determine a data attribute of the row data;
and a storage module, configured to store the row data according to the storage rule corresponding to the data attribute, wherein different data attributes correspond to different storage rules.
10. Data storage equipment, comprising: a processor and a memory;
the memory is configured to store executable instructions for controlling the processor to perform the method of any one of claims 1-8.
11. A computer storage medium storing computer instructions, the computer instructions in the storage medium when executed by a processor implementing the method of any one of claims 1-8.
CN202010477537.2A 2020-05-29 2020-05-29 Data storage method, device, equipment and storage medium Pending CN113742332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010477537.2A CN113742332A (en) 2020-05-29 2020-05-29 Data storage method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010477537.2A CN113742332A (en) 2020-05-29 2020-05-29 Data storage method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113742332A true CN113742332A (en) 2021-12-03

Family

ID=78724807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010477537.2A Pending CN113742332A (en) 2020-05-29 2020-05-29 Data storage method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113742332A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114168085A (en) * 2021-12-16 2022-03-11 潍柴动力股份有限公司 Variable processing method, device, equipment and storage medium
CN114168085B (en) * 2021-12-16 2024-02-20 潍柴动力股份有限公司 Variable processing method, device, equipment and storage medium
CN114443670A (en) * 2022-04-07 2022-05-06 北京奥星贝斯科技有限公司 Data storage and reading method and device
CN114443670B (en) * 2022-04-07 2022-07-08 北京奥星贝斯科技有限公司 Data storage and reading method and device

Similar Documents

Publication Publication Date Title
US10698912B2 (en) Method for processing a database query
US10585915B2 (en) Database sharding
CN107704202B (en) Method and device for quickly reading and writing data
CN111090628A (en) Data processing method and device, storage medium and electronic equipment
US10127254B2 (en) Method of index recommendation for NoSQL database
CN109086456B (en) Data indexing method and device
CN113742332A (en) Data storage method, device, equipment and storage medium
CN111708805A (en) Data query method and device, electronic equipment and storage medium
JP2021535473A (en) Token matching in a large document corpus
CN114579561A (en) Data processing method and device, and storage medium
CN114817651B (en) Data storage method, data query method, device and equipment
CN108255486B (en) View conversion method and device for form design and electronic equipment
CN112559497B (en) Data processing method, information transmission method, device and electronic equipment
CN112000667B (en) Method, apparatus, server and medium for retrieving tree data
CN111488341B (en) Database index management method and device and electronic equipment
CN114117149A (en) Sensitive word filtering method and device and storage medium
CN109840080B (en) Character attribute comparison method and device, storage medium and electronic equipment
CN111723177A (en) Modeling method and device of information extraction model and electronic equipment
CN114359610B (en) Entity classification method, device, equipment and storage medium
CN111143232A (en) Method, apparatus and computer program product for storing metadata
CN110911015B (en) Disease name standardization rapid calculation method based on profile implicit Markov model
US20230214394A1 (en) Data search method and apparatus, electronic device and storage medium
CN112612925B (en) Data storage method, data reading method and electronic equipment
CN115567584A (en) Processing method and device of subscription theme, electronic equipment and readable storage medium
CN115794800A (en) Data processing method, data processing device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination