CN109033271B - Data insertion method and device based on column storage, server and storage medium - Google Patents

Data insertion method and device based on column storage, server and storage medium

Info

Publication number
CN109033271B
Authority
CN
China
Prior art keywords
data
preset
column
auxiliary
rows
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810749909.5A
Other languages
Chinese (zh)
Other versions
CN109033271A (en
Inventor
郭琰
王攀
周智伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Dameng Database Co Ltd
Original Assignee
Shanghai Dameng Database Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Dameng Database Co Ltd
Priority to CN201810749909.5A
Publication of CN109033271A
Application granted
Publication of CN109033271B

Abstract

The invention discloses a data insertion method and device based on column storage, a server, and a storage medium, and relates to the field of databases. The method comprises the following steps: acquiring data to be inserted into a column storage table; acquiring the data in the insertion auxiliary table corresponding to the column storage table; if the total number of rows of the data to be inserted and the data in the insertion auxiliary table is less than a preset number of rows, inserting the data to be inserted into the insertion auxiliary table; if that total is greater than or equal to the preset number of rows, storing each column of the data in the insertion auxiliary table and the data to be inserted as one or more data areas according to the preset number of rows, emptying the insertion auxiliary table, and storing the remaining data in the insertion auxiliary table; and saving the one or more data areas as a data file of the column storage table, and inserting the acquired control information and statistical information of each data area into a column storage auxiliary table corresponding to the column storage table. This technical solution improves the data insertion efficiency of the column storage table.

Description

Data insertion method and device based on column storage, server and storage medium
Technical Field
The embodiment of the invention relates to a database technology, in particular to a data insertion method, a data insertion device, a server and a storage medium based on column storage.
Background
To improve data query performance, methods for storing data by column have been developed. The query performance of a column storage table is inherently better than that of a row storage table, but its insertion performance is correspondingly lower.
The basic idea of a column storage table is to store the table's data in units of columns, so the physical locations holding the data of a single row are not contiguous. As a result, a column storage table performs poorly when single rows (or a few rows) are inserted frequently.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data insertion method and device based on column storage, a server, and a storage medium, so as to address the costly operations incurred when small amounts of data are inserted frequently.
In a first aspect, an embodiment of the present invention provides a data insertion method based on column storage, where the method includes:
acquiring data to be inserted into a column storage table;
acquiring, according to the data to be inserted, the data in an insertion auxiliary table corresponding to the column storage table, wherein the insertion auxiliary table is used for recording inserted data that amounts to fewer rows than a preset number of rows;
if the total number of rows of the data to be inserted and the data in the insertion auxiliary table is less than the preset number of rows, inserting the data to be inserted into the insertion auxiliary table;
if the total number of rows of the data to be inserted and the data in the insertion auxiliary table is greater than or equal to the preset number of rows, storing each column of the data in the insertion auxiliary table and the data to be inserted as one or more data areas according to the preset number of rows, emptying the insertion auxiliary table, and storing the remaining rows of the data to be inserted in the insertion auxiliary table;
and saving the one or more data areas as a data file of the column storage table, acquiring the control information and the statistical information of each data area, and inserting the control information and the statistical information of each data area into a column storage auxiliary table corresponding to the column storage table.
In a second aspect, an embodiment of the present invention further provides a data insertion apparatus based on column storage, where the apparatus includes:
a first data acquisition module, used for acquiring data to be inserted into a column storage table;
a second data acquisition module, used for acquiring, according to the data to be inserted, the data in an insertion auxiliary table corresponding to the column storage table, wherein the insertion auxiliary table is used for recording inserted data that amounts to fewer rows than a preset number of rows;
a data inserting module, used for inserting the data to be inserted into the insertion auxiliary table if the total number of rows of the data to be inserted and the data in the insertion auxiliary table is less than the preset number of rows;
a data storage module, used for, if the total number of rows of the data to be inserted and the data in the insertion auxiliary table is greater than or equal to the preset number of rows, storing each column of the data in the insertion auxiliary table and the data to be inserted as one or more data areas according to the preset number of rows, emptying the insertion auxiliary table, and storing the remaining rows of the data to be inserted in the insertion auxiliary table;
a file saving module, used for saving the one or more data areas as a data file of the column storage table; acquiring control information and statistical information of each data area; and inserting the control information and the statistical information of each data area into a column storage auxiliary table corresponding to the column storage table.
In a third aspect, an embodiment of the present invention further provides a server, including:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the data insertion method based on column storage according to any embodiment of the present invention.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the data insertion method based on column storage according to any one of the embodiments of the present invention.
According to the technical solution provided by the embodiments of the invention, when data is inserted, the insertion rule is determined by comparing the total number of rows of the data to be inserted and the data in the insertion auxiliary table with the preset number of rows. When that total is less than the preset number of rows, the data to be inserted is stored directly in the insertion auxiliary table; otherwise, the data to be inserted and the data in the insertion auxiliary table are stored in partitions. This solves the problem of frequently reading and writing the data files when data is inserted into column storage, reduces IO, and thereby improves data insertion efficiency.
Drawings
FIG. 1 is a flowchart of a data insertion method based on column storage according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a data insertion method based on column storage according to a second embodiment of the present invention;
FIG. 3 is a flowchart of a data insertion method based on column storage according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data insertion device based on column storage according to a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a server according to a fifth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some but not all of the relevant aspects of the present invention are shown in the drawings.
Example one
Fig. 1 is a flowchart of a data insertion method based on column storage according to the first embodiment of the present invention. This embodiment is applicable to a data insertion system, and the method may be executed by a data insertion device. As shown in Fig. 1, the technical solution provided by this embodiment includes the following steps:
Step 110, acquiring data to be inserted into the column storage table.
Column storage stores data in units of columns; the specific storage rules and management implementation directly determine the operating efficiency of a column storage table. The data that the user needs to insert, that is, the data to be inserted, specifies the column storage table into which it is to be inserted.
The data to be inserted in this embodiment may be any data object that needs to be inserted. It may be, for example, a student data object, which generally includes fields such as the student's name, gender, student ID, age, and grade, or a teacher data object, which generally includes fields such as name, age, years of teaching, and salary.
Step 120, acquiring, according to the data to be inserted, the data in an insertion auxiliary table corresponding to the column storage table, wherein the insertion auxiliary table is used for recording inserted data that amounts to fewer rows than a preset number of rows.
In this embodiment, the preset number of rows is a row count configured in advance. The column storage table stores each column of the original data table in partitions of that many rows; each such partition is called a data area, and the preset number of rows is also called the area size.
Optionally, the insertion auxiliary table is structured as row storage and buffers data being inserted into the column storage table; the number of rows it buffers is smaller than the area size of the data areas in the column storage table. When the number of rows in the insertion auxiliary table reaches the area size, the buffered data is written, in units of columns, into the data file corresponding to each column, and the insertion auxiliary table is then emptied. The data to be inserted is row-stored, and the insertion auxiliary table has the same structure as the data to be inserted.
In this embodiment, when small amounts of data are inserted frequently, the data is first inserted into the insertion auxiliary table and is written to the data files only when the number of rows in the insertion auxiliary table reaches the area size. This avoids frequent reads and writes of the data files, reduces IO (input/output), and improves efficiency.
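The row-to-column conversion described above can be sketched as follows. This is only a minimal illustration, not the patented implementation: the function name `rows_to_columns` and the sample student tuples are hypothetical, and in-memory lists stand in for the per-column data files.

```python
from typing import Any, List, Sequence


def rows_to_columns(rows: Sequence[Sequence[Any]]) -> List[List[Any]]:
    """Transpose row-stored records into one list of values per column."""
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same number of fields")
    # zip(*rows) groups the i-th field of every row together.
    return [list(col) for col in zip(*rows)]


# Hypothetical rows buffered in the insertion auxiliary table: (name, age, grade)
buffered = [("Ann", 12, 6), ("Bo", 13, 7), ("Cy", 12, 6)]
columns = rows_to_columns(buffered)
print(columns)
```

Each inner list would then be appended to the data file of its own column, which is why a single-row insert touches as many files as the table has columns.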
After the data to be inserted is acquired, the data in the insertion auxiliary table corresponding to the column storage table is acquired first. Because the data in the insertion auxiliary table was inserted earlier than the data to be inserted, it must be determined whether the total number of rows of the data to be inserted and the data in the insertion auxiliary table is less than the preset number of rows.
Step 130, judging whether the total row number of the data to be inserted and the data in the insertion auxiliary table is less than the preset row number, if so, executing step 140, otherwise, executing step 150.
Step 140, inserting the data to be inserted into the insertion auxiliary table.
After the data to be inserted by the user and the data in the insertion auxiliary table are obtained, their total number of rows is compared with the preset number of rows. If the total is less than the preset number of rows, the data to be inserted is stored directly in the insertion auxiliary table.
Illustratively, suppose the preset number of rows is 10, the data to be inserted has 5 rows, and the insertion auxiliary table is empty before the insertion. Since 5+0<10, the data is inserted directly into the insertion auxiliary table.
Illustratively, suppose the preset number of rows is 10, the data to be inserted has 5 rows, and the insertion auxiliary table holds 4 rows before the insertion. Since 5+4<10, the data is inserted directly into the insertion auxiliary table.
Step 150, storing each column of the data in the insertion auxiliary table and the data to be inserted as one or more data areas according to the preset number of rows, emptying the insertion auxiliary table, and storing the remaining rows of the data to be inserted in the insertion auxiliary table.
If the comparison shows that the total number of rows of the data to be inserted and the data in the insertion auxiliary table is greater than or equal to the preset number of rows, the data in the insertion auxiliary table followed by the data to be inserted, in that order, is taken as the new data to be inserted. Each column of the new data to be inserted is stored as one or more data areas according to the preset number of rows and saved in the corresponding data files, and the remaining data is stored in the insertion auxiliary table.
Illustratively, suppose the preset number of rows is 10, the data to be inserted has 5 rows, and the insertion auxiliary table holds 8 rows before the insertion. Since 5+8>10, the 8 rows in the insertion auxiliary table and the first 2 rows of the data to be inserted are stored, column by column, as one data area, and the remaining 3 rows of the data to be inserted are stored in the insertion auxiliary table.
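Steps 130 to 150 can be sketched in a few lines. The following is a hedged illustration under simplifying assumptions: `insert_rows` is a hypothetical name, rows are plain Python values, and each returned data area is kept as a list of rows rather than being written column by column to a file.

```python
from typing import Any, List, Tuple


def insert_rows(aux: List[Any], new_rows: List[Any], area_size: int
                ) -> Tuple[List[List[Any]], List[Any]]:
    """Return (data areas to write, new contents of the insertion auxiliary table)."""
    if area_size <= 0:
        raise ValueError("area_size must be positive")
    # Step 140: too few rows to fill a data area, so only buffer them.
    if len(aux) + len(new_rows) < area_size:
        return [], aux + new_rows
    # Step 150: buffered rows come first, followed by the new rows.
    combined = aux + new_rows
    full = len(combined) // area_size
    areas = [combined[i * area_size:(i + 1) * area_size] for i in range(full)]
    remainder = combined[full * area_size:]  # becomes the new buffer
    return areas, remainder


# The third example above: preset number of rows 10, 8 buffered rows, 5 new rows.
areas, aux = insert_rows(list(range(8)), list(range(8, 13)), area_size=10)
print(len(areas), len(aux))
```

With the first two examples (0 or 4 buffered rows plus 5 new rows), the function returns no data areas and a buffer of 5 or 9 rows respectively.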
Step 160, saving the one or more data areas as the data file of the column storage table, obtaining the control information and the statistical information of each data area, and inserting the control information and the statistical information of each data area into the column storage auxiliary table corresponding to the column storage table.
Optionally, the data of one data area is stored in a single data file, and one data file may store one or more data areas. The control information and statistical information of each data area form a corresponding column storage auxiliary table record, which is inserted into the column storage auxiliary table. The column storage auxiliary table records control information, such as the offset address and data length of each data area of each column in the data file, and statistical information, such as the maximum and minimum of the column values stored in each area.
Optionally, the column storage auxiliary table has the following structure:
TABLE 1 column storage auxiliary table structure
[Table 1 appears as images in the original publication (Figure BDA0001725303510000061, Figure BDA0001725303510000071).]
The fields of the column storage auxiliary table are explained below:
1) column number: the sequence number of the column in the table definition given when the table was created;
2) area number: each data area has a different number, and the number corresponding to a data area is its area number;
3) file number: the number of the data file in which the data area is stored;
4) offset in file: the position of the data area within its data file. For example, if three data areas are stored in the same data file, the offset in the file of the first data area is 0, the offset of the second data area is the data space occupied by the first data area, and the offset of the third data area is the data space occupied by the first and second data areas;
5) area size: the total number of rows that the data area can store, preset by the user;
6) number of valid rows in the area: the number of rows remaining in the data area after deleted rows are excluded;
7) size of the space occupied by the data: the number of bytes occupied by the stored data;
8) number of rows containing NULL values: the number of rows in the data area whose value is NULL;
9) number of mutually distinct rows: the number of rows whose values differ from one another among the data stored in the column storage table;
10) maximum in the area: the maximum data value in the data area;
11) minimum in the area: the minimum data value in the data area;
12) sum of all values in the area: the sum of all data values in the data area.
In the column storage auxiliary table, the column number, area number, file number, size of the space occupied by the data, and offset in file are control information; the maximum in the area, minimum in the area, sum of all values in the area, area size, number of valid rows in the area, number of rows containing NULL values, and number of mutually distinct rows are statistical information.
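To make the record layout concrete, the sketch below builds one auxiliary record per data area of a single integer column. It is an illustration under stated assumptions, not the patent's storage format: the names `AreaRecord` and `build_records` are hypothetical, values are assumed to be fixed-width (8 bytes each), `None` stands in for NULL, and the distinct count is interpreted as the number of distinct non-NULL values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AreaRecord:
    # Control information
    column_no: int
    area_no: int
    file_no: int
    offset: int            # bytes from the start of the data file
    byte_size: int
    # Statistical information
    area_size: int
    valid_rows: int
    null_rows: int
    distinct_rows: int
    max_value: Optional[int]
    min_value: Optional[int]
    sum_value: int


def build_records(column_no: int, file_no: int,
                  areas: List[List[Optional[int]]],
                  area_size: int, bytes_per_value: int = 8) -> List[AreaRecord]:
    """Build one record per data area stored back to back in one data file."""
    records = []
    offset = 0
    for area_no, values in enumerate(areas):
        present = [v for v in values if v is not None]
        byte_size = len(values) * bytes_per_value  # assumption: fixed-width values
        records.append(AreaRecord(
            column_no=column_no, area_no=area_no, file_no=file_no,
            offset=offset, byte_size=byte_size,
            area_size=area_size,
            valid_rows=len(values),  # nothing has been deleted yet
            null_rows=len(values) - len(present),
            distinct_rows=len(set(present)),
            max_value=max(present) if present else None,
            min_value=min(present) if present else None,
            sum_value=sum(present),
        ))
        offset += byte_size  # the next area starts where this one ends
    return records


records = build_records(column_no=1, file_no=0,
                        areas=[[1, 2, 3], [4, None, 4], [7, 8, 9]], area_size=3)
print([r.offset for r in records])
```

The offsets follow the three-area example in field 4: the first area starts at 0, and each later area starts at the total space occupied by the areas before it.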
According to the technical solution provided by this embodiment of the invention, when data is inserted, the specific insertion rule is determined by comparing the total number of rows of the data to be inserted and the data in the insertion auxiliary table with the preset number of rows. When that total reaches the size of a data area, each column is stored as one or more data areas according to the preset number of rows and saved in the corresponding data files, the insertion auxiliary table is emptied, and the remaining data is stored in the insertion auxiliary table. This solves the problem of frequently reading and writing the data files when data is inserted into column storage, reduces IO (input/output), and improves data insertion efficiency.
On the basis of the above technical solution, the method may optionally further include the following step:
writing the data in the insertion auxiliary table into the data file of the column storage table at a preset time, and emptying the insertion auxiliary table.
The user can set the preset time as needed, for example to a time when the system is idle, such as 3 a.m. every day or 3 a.m. every weekday.
Writing the data in the insertion auxiliary table into the data file of the column storage table when the preset time is reached keeps the insertion auxiliary table from growing excessively and improves data query efficiency.
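A scheduled flush of this kind might look like the sketch below. It rests on several assumptions that the source does not state: the names `flush_due` and `flush` are hypothetical, the check only considers the current day's flush time, an in-memory list stands in for the data file, and the flushed rows are written as a data area that may be shorter than the area size.

```python
from datetime import datetime, time
from typing import Any, List


def flush_due(now: datetime, last_flush: datetime, flush_at: time) -> bool:
    """True if today's flush time has been reached and no flush has run since."""
    todays_slot = datetime.combine(now.date(), flush_at)
    return last_flush < todays_slot <= now


def flush(aux_rows: List[tuple], data_file: List[List[List[Any]]]) -> None:
    """Write the buffered rows column by column, then empty the buffer."""
    if aux_rows:
        data_file.append([list(col) for col in zip(*aux_rows)])
        aux_rows.clear()


aux_rows = [("Ann", 12), ("Bo", 13)]   # hypothetical buffered rows
data_file = []                         # in-memory stand-in for the data file
now = datetime(2024, 1, 2, 3, 0)
if flush_due(now, last_flush=datetime(2024, 1, 1, 3, 0), flush_at=time(3, 0)):
    flush(aux_rows, data_file)
```

A real scheduler would also need to handle a flush time missed while the system was down, which this check deliberately leaves out.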
Example two
Fig. 2 is a flowchart of a data insertion method based on column storage according to a second embodiment of the present invention. The present embodiment provides a preferred embodiment based on the above embodiments, and reference is made to the first embodiment for details that are not described in detail in the present embodiment. As shown in fig. 2, the method for inserting data based on column storage according to this embodiment includes the following steps:
Step 210, acquiring data to be inserted into the column storage table.
Step 220, acquiring data in an insertion auxiliary table corresponding to the column storage table according to the data to be inserted, where the insertion auxiliary table is used to record insertion data smaller than a preset number of rows.
Step 230, determining whether the total row number of the data to be inserted and the data in the insertion auxiliary table is less than the preset row number, if so, executing step 240, otherwise, executing step 250.
Step 240, inserting the data to be inserted into the insertion auxiliary table.
Step 250, judging whether the insertion auxiliary table is empty, if so, executing step 260, otherwise, executing step 270.
Step 260, taking the leading rows of the data to be inserted that make up an integer multiple of the preset number of rows, storing them by column as that integer number of data areas, each with the preset number of rows, storing the remaining rows of the data to be inserted in the insertion auxiliary table, and then executing step 280.
Illustratively, suppose the preset number of rows is 3, the data to be inserted has 8 rows, and the insertion auxiliary table is empty before the insertion. Since 8+0>3, the first 6 rows of the data to be inserted (2 times the preset number of rows) are extracted, 2 data areas of 3 rows each are generated for each column, the extracted data is stored in the corresponding data files, and the remaining 2 rows of the data to be inserted are stored in the insertion auxiliary table.
Step 270, extracting the data in the insertion auxiliary table together with the leading rows of the data to be inserted so that they form an integer multiple of the preset number of rows, storing them by column as that integer number of data areas, each with the preset number of rows, emptying the insertion auxiliary table, inserting the remaining rows of the data to be inserted into the insertion auxiliary table, and then executing step 280.
Illustratively, suppose the preset number of rows is 3, the data to be inserted has 8 rows, and the insertion auxiliary table holds 2 rows before the insertion. Since 8+2>3, the 2 rows in the insertion auxiliary table and the first 7 rows of the data to be inserted, 9 rows in total (3 times the preset number of rows), are taken out, 3 data areas of 3 rows each are generated for each column, the extracted data is stored in the corresponding data files, and the remaining 1 row of the data to be inserted is stored in the insertion auxiliary table.
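The branch structure of steps 230 to 270 reduces to row-count arithmetic, sketched below. This is an illustrative sketch only; `plan_insert` and the dictionary keys are hypothetical names, and the function works purely on row counts rather than on actual data.

```python
from typing import Dict


def plan_insert(aux_count: int, new_count: int, area_size: int) -> Dict[str, int]:
    """Plan an insert: data areas to write, new rows consumed, rows left buffered."""
    if area_size <= 0 or not 0 <= aux_count < area_size or new_count < 0:
        raise ValueError("expected 0 <= aux_count < area_size and new_count >= 0")
    total = aux_count + new_count
    if total < area_size:
        # Step 240: everything stays in the insertion auxiliary table.
        return {"areas": 0, "from_new": 0, "buffered": total}
    if aux_count == 0:
        # Step 260: only the new rows contribute to data areas.
        areas = new_count // area_size
        from_new = areas * area_size
    else:
        # Step 270: buffered rows are consumed first, then leading new rows.
        areas = total // area_size
        from_new = areas * area_size - aux_count
    # The auxiliary table is emptied, so only leftover new rows are buffered.
    return {"areas": areas, "from_new": from_new, "buffered": new_count - from_new}


print(plan_insert(aux_count=0, new_count=8, area_size=3))
print(plan_insert(aux_count=2, new_count=8, area_size=3))
```

The two calls correspond to the two examples above: 2 data areas with 2 rows buffered, and 3 data areas with 1 row buffered.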
Step 280, saving the one or more data areas as the data file of the column storage table, obtaining the control information and the statistical information of each data area, and inserting the control information and the statistical information of each data area into the column storage auxiliary table corresponding to the column storage table.
According to the technical solution provided by this embodiment of the invention, when data is inserted, the insertion rule is determined by comparing the total number of rows of the data to be inserted and the data in the insertion auxiliary table with the preset number of rows. When that total amounts to at least an integer multiple of the preset number of rows, the rows making up that multiple are stored by column as that integer number of data areas, each with the preset number of rows, and written to the data files; the insertion auxiliary table is emptied, and the remaining data is stored in the insertion auxiliary table. This solves the problem of frequently reading and writing the data files, reduces IO, and improves data insertion efficiency.
Example three
Fig. 3 is a flowchart of a data insertion method based on column storage according to a third embodiment of the present invention. The present embodiment provides a preferred embodiment based on the second embodiment, and reference is made to the second embodiment for details that are not described in detail in the present embodiment. As shown in fig. 3, the method for inserting data based on column storage according to this embodiment includes the following steps:
Step 301, acquiring data to be inserted into the column storage table.
Step 302, obtaining data in an insertion auxiliary table corresponding to the column storage table according to the data to be inserted, where the insertion auxiliary table is used to record insertion data smaller than a preset number of rows.
Step 303, determining whether the total number of rows of the data to be inserted and the data in the insertion auxiliary table is less than the preset number of rows, if so, executing step 304, otherwise, executing step 305.
Step 304, inserting the data to be inserted into the insertion auxiliary table.
Step 305, determining whether the insertion auxiliary table is empty, if so, executing step 306, otherwise, executing step 307.
Step 306, taking the leading rows of the data to be inserted that make up an integer multiple of the preset number of rows, storing them by column as that integer number of data areas, each with the preset number of rows, storing the remaining rows of the data to be inserted in the insertion auxiliary table, and then executing step 310.
Step 307, determining whether the total number of rows of the data in the insertion auxiliary table and the data to be inserted is less than twice the preset number of rows, if so, executing step 308, otherwise, executing step 309.
Step 308, extracting the data in the insertion auxiliary table together with the leading rows of the data to be inserted so that they form the preset number of rows, storing them by column as one data area, emptying the insertion auxiliary table, inserting the remaining rows of the data to be inserted into the insertion auxiliary table, and then executing step 310.
Illustratively, suppose the preset number of rows is 6, the data to be inserted has 8 rows, and the insertion auxiliary table holds 2 rows before the insertion. Since 12>8+2>6, the 2 rows in the insertion auxiliary table and the first 4 rows of the data to be inserted are taken out to form 6 rows, 1 data area is generated for each column and stored in the corresponding data file, the insertion auxiliary table is emptied, and the remaining 4 rows of the data to be inserted are stored in the insertion auxiliary table.
Step 309, extracting the data in the insertion auxiliary table together with the leading rows of the data to be inserted so that they form the preset number of rows, and storing them by column as one data area; storing the rows of the rest of the data to be inserted that make up an integer multiple of the preset number of rows by column as that integer number of data areas, each with the preset number of rows; emptying the insertion auxiliary table, inserting the remaining rows of the data to be inserted into the insertion auxiliary table, and then executing step 310.
Illustratively, suppose the preset number of rows is 4, the data to be inserted has 15 rows, and the insertion auxiliary table holds 2 rows before the insertion. Because 15+2>(4×2), the 2 rows in the insertion auxiliary table and the first 2 rows of the data to be inserted are taken out to form 4 rows, and 1 data area is generated for each column and stored in the corresponding data file. The remaining 13 rows of the data to be inserted still contain 3 times the preset number of rows, namely 12 rows, which are stored by column as 3 data areas of 4 rows each, so that each column yields 4 data areas of 4 rows in total. The data areas are stored in the corresponding data files, the insertion auxiliary table is emptied, and the remaining 1 row is inserted into the insertion auxiliary table.
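Steps 307 to 309 can be sketched as follows for the case where the insertion auxiliary table is not empty. This is a hedged illustration: `split_with_buffer` is a hypothetical name, data areas are returned as lists of rows rather than written by column, and the preconditions of step 305 (non-empty buffer) and step 303 (enough rows for one data area) are checked explicitly.

```python
from typing import Any, List, Tuple


def split_with_buffer(aux: List[Any], new_rows: List[Any], area_size: int
                      ) -> Tuple[List[List[Any]], List[Any]]:
    """Return (data areas to write, rows left for the insertion auxiliary table)."""
    if not 0 < len(aux) < area_size:
        raise ValueError("expected a non-empty buffer smaller than area_size")
    if len(aux) + len(new_rows) < area_size:
        raise ValueError("not enough rows to fill one data area")
    # Steps 308 and 309 both start by topping the buffer up to one full area.
    need = area_size - len(aux)
    areas = [aux + new_rows[:need]]
    rest = new_rows[need:]
    if len(aux) + len(new_rows) >= 2 * area_size:
        # Step 309: further whole areas are cut from the remaining new rows.
        extra = len(rest) // area_size
        areas += [rest[i * area_size:(i + 1) * area_size] for i in range(extra)]
        rest = rest[extra * area_size:]
    return areas, rest


# Second example above: preset number of rows 4, 2 buffered rows, 15 new rows.
areas, rest = split_with_buffer(["a0", "a1"], list(range(15)), area_size=4)
print(len(areas), rest)
```

For the first example (preset number of rows 6, 2 buffered rows, 8 new rows), the same function returns a single data area and leaves 4 rows for the buffer.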
Step 310, saving the one or more data areas as the data file of the column storage table, obtaining the control information and the statistical information of each data area, and inserting the control information and the statistical information of each data area into the column storage auxiliary table corresponding to the column storage table.
According to the technical solution provided by this embodiment of the invention, when data is inserted, the insertion rule is determined by comparing the total number of rows of the data to be inserted and the data in the insertion auxiliary table with twice the preset number of rows. When the total number of rows of the data in the insertion auxiliary table and the data to be inserted is less than twice the preset number of rows, the data in the insertion auxiliary table and the leading rows of the data to be inserted are extracted to form the preset number of rows and stored by column as one data area. Otherwise, the data in the insertion auxiliary table and the leading rows of the data to be inserted are extracted to form the preset number of rows and stored by column as one data area, and the rows of the rest of the data to be inserted that make up an integer multiple of the preset number of rows are stored by column as that integer number of data areas, each with the preset number of rows. The insertion auxiliary table is then emptied, and the remaining rows of the data to be inserted are inserted into the insertion auxiliary table. This solves the problem of frequently reading and writing the data files, reduces IO, and improves data insertion efficiency.
Example four
Fig. 4 is a schematic structural diagram of a column storage-based data insertion apparatus according to a fourth embodiment of the present invention; the apparatus is configured to execute a column storage-based data insertion method. As shown in fig. 4, the apparatus includes a first data obtaining module 410, a second data obtaining module 420, a data inserting module 430, a data storage module 440, and a file saving module 450.
The first data obtaining module 410 is configured to obtain data to be inserted into a column storage table;
a second data obtaining module 420, configured to obtain, according to the data to be inserted, data in an insertion auxiliary table corresponding to the column storage table, where the insertion auxiliary table is used to record inserted data amounting to fewer than a preset number of rows;
a data inserting module 430, configured to insert the data to be inserted into the insertion auxiliary table if the total number of rows of the data to be inserted and the data in the insertion auxiliary table is less than the preset number of rows;
a data storage module 440, configured to, if the total number of rows of the data to be inserted and the data in the insertion auxiliary table is greater than or equal to the preset number of rows, store the data in the insertion auxiliary table and the data to be inserted, column by column, as one or more data areas according to the preset number of rows, empty the insertion auxiliary table, and store the remaining data of the data to be inserted into the insertion auxiliary table;
a file saving module 450, configured to save the one or more data areas as data files of the column storage table, acquire control information and statistical information of each data area, and insert the control information and the statistical information of each data area into a column storage auxiliary table corresponding to the column storage table.
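How the five modules cooperate can be sketched as a small in-memory model. This is a minimal sketch, not the patented implementation: the class name `ColumnStoreTable`, its fields, and the choice of row count as control information and per-column minimum and maximum as statistical information are all assumptions made for illustration, since this section does not specify what the control and statistical information contain.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnStoreTable:
    preset: int                                      # preset number of rows per data area
    aux: list = field(default_factory=list)          # insertion auxiliary table (row format)
    data_file: list = field(default_factory=list)    # saved data areas, each a list of columns
    area_index: list = field(default_factory=list)   # stand-in for the column storage auxiliary table

    def insert(self, rows: list) -> None:
        # Modules 410/420: the rows to insert arrive as an argument; the buffered rows are self.aux.
        if len(self.aux) + len(rows) < self.preset:
            # Module 430: too few rows for a data area, so only buffer them.
            self.aux.extend(rows)
            return
        # Module 440: buffered rows come first, followed by the new rows.
        pending = self.aux + list(rows)
        full = len(pending) // self.preset * self.preset
        for start in range(0, full, self.preset):
            self._save_area(pending[start:start + self.preset])
        # Emptying the auxiliary table and storing the leftover rows in one step.
        self.aux = pending[full:]

    def _save_area(self, rows: list) -> None:
        # Module 450: transpose rows into columns and record per-column information.
        columns = [list(col) for col in zip(*rows)]
        area_id = len(self.data_file)
        self.data_file.append(columns)
        for col_no, values in enumerate(columns):
            self.area_index.append({
                "area": area_id,
                "column": col_no,
                "rows": len(values),   # assumed control information
                "min": min(values),    # assumed statistical information
                "max": max(values),
            })
```

With a preset of 3, inserting two rows only fills the buffer; a following insert of five rows then produces two data areas of three rows each and leaves one row in the buffer.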
Further, the data storage module comprises:
a first data storage unit, configured to, if the data in the insertion auxiliary table is empty, store the leading rows of the data to be inserted that amount to an integer multiple of the preset number of rows, column by column, as that integer number of data areas, each containing the preset number of rows, and store the remaining data of the data to be inserted into the insertion auxiliary table;
a second data storage unit, configured to, if the data in the insertion auxiliary table is not empty, extract the data in the insertion auxiliary table and the leading rows of the data to be inserted so as to form data amounting to an integer multiple of the preset number of rows, store that data column by column as that integer number of data areas, each containing the preset number of rows, empty the insertion auxiliary table, and insert the remaining data of the data to be inserted into the insertion auxiliary table.
Further, the second data storage unit is specifically configured to:
if the total number of rows of the data in the insertion auxiliary table and the data to be inserted is greater than or equal to the preset number of rows and less than twice the preset number of rows, extract the data in the insertion auxiliary table and the leading rows of the data to be inserted to form data of the preset number of rows, and store that data column by column as one data area;
if the total number of rows of the data in the insertion auxiliary table and the data to be inserted is greater than or equal to twice the preset number of rows, extract the data in the insertion auxiliary table and the leading rows of the data to be inserted to form data of the preset number of rows, store that data column by column as one data area, and store the following rows of the data to be inserted that amount to an integer multiple of the preset number of rows, column by column, as that integer number of data areas, each containing the preset number of rows.
Further, the data to be inserted is row storage data, and the structure of the insertion auxiliary table is the same as that of the data to be inserted.
Further, the apparatus also comprises:
an insertion table reforming module, configured to write the data in the insertion auxiliary table into the data file of the column storage table at a preset time and empty the insertion auxiliary table.
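The behaviour of the reforming module can be sketched as follows. This is a hedged illustration only: the text says "at a preset time" without saying whether that means a fixed interval or a scheduled moment, so a fixed interval is assumed here, and the class name `AuxFlusher`, the injectable `clock`, and the decision to write the buffered rows as one short column-wise data area are all hypothetical.

```python
import time


class AuxFlusher:
    """Moves buffered rows into the data file once a preset interval has elapsed."""

    def __init__(self, interval_seconds: float, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock                       # injectable so the logic can be tested
        self.next_flush = clock() + interval_seconds

    def maybe_flush(self, aux: list, data_file: list) -> bool:
        """Flush aux into data_file if the preset time has arrived; return True if due."""
        now = self.clock()
        if now < self.next_flush:
            return False
        self.next_flush = now + self.interval
        if aux:
            # Transpose the buffered rows into columns and append them as one data area.
            data_file.append([list(col) for col in zip(*aux)])
            aux.clear()                          # empty the insertion auxiliary table
        return True
```

A caller would invoke `maybe_flush` periodically, for instance from a background task, passing the current contents of the insertion auxiliary table and the list of saved data areas.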
The data insertion device based on the column storage can execute the data insertion method based on the column storage provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to a column storage-based data insertion method provided in any embodiment of the present invention.
Example five
The fifth embodiment of the present invention provides a server that integrates the column storage-based data insertion apparatus according to any embodiment of the present invention. Specifically, as shown in fig. 5, the server includes:
one or more processors 510, one processor 510 being illustrated in FIG. 5;
a memory 520; and one or more modules.
The server may further include an input device 530 and an output device 540. The processor 510, the memory 520, the input device 530 and the output device 540 in the server may be connected by a bus or in another manner; connection by a bus is taken as an example in fig. 5.
The memory 520, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the column storage-based data insertion method in the embodiments of the present invention (for example, the first data obtaining module 410, the second data obtaining module 420, the data inserting module 430, and the data storage module 440 shown in fig. 4). By running the software programs, instructions, and modules stored in the memory 520, the processor 510 executes the various functional applications and data processing of the server, that is, implements the column storage-based data insertion method of the above method embodiments.
The memory 520 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through use of the server, and the like. Further, the memory 520 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 520 may further include memory located remotely from the processor 510, which may be connected to the server over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 530 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the server. The output device 540 may include a display device such as a display screen.
The server can execute the column storage-based data insertion method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example six
The sixth embodiment of the present invention further provides a storage medium storing a computer program which, when executed by a processor, implements the column storage-based data insertion method provided by any embodiment of the present invention;
that is, the program, when executed by the processor, implements:
acquiring data to be inserted of a column storage table;
acquiring data in an insertion auxiliary table corresponding to the column storage table according to the data to be inserted, wherein the insertion auxiliary table is used for recording inserted data amounting to fewer than a preset number of rows;
if the total number of rows of the data to be inserted and the data in the insertion auxiliary table is less than the preset number of rows, inserting the data to be inserted into the insertion auxiliary table;
if the total number of rows of the data to be inserted and the data in the insertion auxiliary table is greater than or equal to the preset number of rows, storing the data in the insertion auxiliary table and the data to be inserted, column by column, as one or more data areas according to the preset number of rows, emptying the insertion auxiliary table, and storing the remaining data of the data to be inserted into the insertion auxiliary table;
and storing the one or more data areas as a data file of the column storage table, acquiring the control information and the statistical information of each data area, and inserting the control information and the statistical information of each data area into a column storage auxiliary table corresponding to the column storage table.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (9)

1. A method for column storage based data insertion, the method comprising:
acquiring data to be inserted of a column storage table;
acquiring data in an insertion auxiliary table corresponding to the column storage table according to the data to be inserted, wherein the insertion auxiliary table is used for recording inserted data amounting to fewer than a preset number of rows;
if the total number of rows of the data to be inserted and the data in the insertion auxiliary table is less than the preset number of rows, inserting the data to be inserted into the insertion auxiliary table;
if the total number of rows of the data to be inserted and the data in the insertion auxiliary table is greater than or equal to the preset number of rows, storing the data in the insertion auxiliary table and the data to be inserted, column by column, as one or more data areas according to the preset number of rows, emptying the insertion auxiliary table, and storing the remaining data of the data to be inserted into the insertion auxiliary table;
storing the one or more data areas as data files of the column storage table, acquiring control information and statistical information of each data area, and inserting the control information and the statistical information of each data area into a column storage auxiliary table corresponding to the column storage table;
the data to be inserted is row storage data, and the structure of the insertion auxiliary table is the same as that of the data to be inserted.
2. The method according to claim 1, wherein, if the total number of rows of the data to be inserted and the data in the insertion auxiliary table is greater than or equal to the preset number of rows, storing the data in the insertion auxiliary table and the data to be inserted, column by column, as one or more data areas according to the preset number of rows, emptying the insertion auxiliary table, and storing the remaining data of the data to be inserted into the insertion auxiliary table comprises:
if the data in the insertion auxiliary table is empty, storing the leading rows of the data to be inserted that amount to an integer multiple of the preset number of rows, column by column, as that integer number of data areas, each containing the preset number of rows, and storing the remaining data of the data to be inserted into the insertion auxiliary table;
if the data in the insertion auxiliary table is not empty, extracting the data in the insertion auxiliary table and the leading rows of the data to be inserted so as to form data amounting to an integer multiple of the preset number of rows, storing that data column by column as that integer number of data areas, each containing the preset number of rows, emptying the insertion auxiliary table, and inserting the remaining data of the data to be inserted into the insertion auxiliary table.
3. The method according to claim 2, wherein extracting the data in the insertion auxiliary table and the leading rows of the data to be inserted so as to form data amounting to an integer multiple of the preset number of rows, and storing that data column by column as that integer number of data areas, each containing the preset number of rows, comprises:
if the total number of rows of the data in the insertion auxiliary table and the data to be inserted is greater than or equal to the preset number of rows and less than twice the preset number of rows, extracting the data in the insertion auxiliary table and the leading rows of the data to be inserted to form data of the preset number of rows, and storing that data column by column as one data area;
if the total number of rows of the data in the insertion auxiliary table and the data to be inserted is greater than or equal to twice the preset number of rows, extracting the data in the insertion auxiliary table and the leading rows of the data to be inserted to form data of the preset number of rows, storing that data column by column as one data area, and storing the following rows of the data to be inserted that amount to an integer multiple of the preset number of rows, column by column, as that integer number of data areas, each containing the preset number of rows.
4. The method of claim 1, further comprising:
writing the data in the insertion auxiliary table into the data file of the column storage table at a preset time, and emptying the insertion auxiliary table.
5. A data insertion device based on column storage, the device comprising:
the first data acquisition module is used for acquiring data to be inserted of the column storage table;
the second data acquisition module is used for acquiring data in an insertion auxiliary table corresponding to the column storage table according to the data to be inserted, wherein the insertion auxiliary table is used for recording inserted data amounting to fewer than a preset number of rows;
the data inserting module is used for inserting the data to be inserted into the insertion auxiliary table if the total number of rows of the data to be inserted and the data in the insertion auxiliary table is less than the preset number of rows;
the data storage module is used for, if the total number of rows of the data to be inserted and the data in the insertion auxiliary table is greater than or equal to the preset number of rows, storing the data in the insertion auxiliary table and the data to be inserted, column by column, as one or more data areas according to the preset number of rows, emptying the insertion auxiliary table, and storing the remaining data of the data to be inserted into the insertion auxiliary table;
the file saving module is used for saving the one or more data areas as the data files of the column storage table; acquiring control information and statistical information of each data area; inserting the control information and the statistical information of each data area into a column storage auxiliary table corresponding to the column storage table;
the data to be inserted is row storage data, and the structure of the insertion auxiliary table is the same as that of the data to be inserted.
6. The apparatus of claim 5, wherein the data storage module comprises:
a first data storage unit, configured to, if the data in the insertion auxiliary table is empty, store the leading rows of the data to be inserted that amount to an integer multiple of the preset number of rows, column by column, as that integer number of data areas, each containing the preset number of rows, and store the remaining data of the data to be inserted into the insertion auxiliary table;
a second data storage unit, configured to, if the data in the insertion auxiliary table is not empty, extract the data in the insertion auxiliary table and the leading rows of the data to be inserted so as to form data amounting to an integer multiple of the preset number of rows, store that data column by column as that integer number of data areas, each containing the preset number of rows, empty the insertion auxiliary table, and insert the remaining data of the data to be inserted into the insertion auxiliary table.
7. The apparatus of claim 6, wherein the second data storage unit is specifically configured to:
if the total number of rows of the data in the insertion auxiliary table and the data to be inserted is greater than or equal to the preset number of rows and less than twice the preset number of rows, extract the data in the insertion auxiliary table and the leading rows of the data to be inserted to form data of the preset number of rows, and store that data column by column as one data area;
if the total number of rows of the data in the insertion auxiliary table and the data to be inserted is greater than or equal to twice the preset number of rows, extract the data in the insertion auxiliary table and the leading rows of the data to be inserted to form data of the preset number of rows, store that data column by column as one data area, and store the following rows of the data to be inserted that amount to an integer multiple of the preset number of rows, column by column, as that integer number of data areas, each containing the preset number of rows.
8. A server, characterized in that the server comprises:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the column storage-based data insertion method of any one of claims 1-4.
9. A computer storage medium on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out a method of column storage based data insertion according to any one of claims 1 to 4.
CN201810749909.5A 2018-07-10 2018-07-10 Data insertion method and device based on column storage, server and storage medium Active CN109033271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810749909.5A CN109033271B (en) 2018-07-10 2018-07-10 Data insertion method and device based on column storage, server and storage medium

Publications (2)

Publication Number Publication Date
CN109033271A CN109033271A (en) 2018-12-18
CN109033271B true CN109033271B (en) 2021-03-02

Family

ID=64641397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810749909.5A Active CN109033271B (en) 2018-07-10 2018-07-10 Data insertion method and device based on column storage, server and storage medium

Country Status (1)

Country Link
CN (1) CN109033271B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129458A (en) * 2011-03-09 2011-07-20 胡劲松 Method and device for storing relational database
CN102609491A (en) * 2012-01-20 2012-07-25 东华大学 Column-storage oriented area-level data compression method
CN103177055A (en) * 2011-12-22 2013-06-26 Sap股份公司 Hybrid database table stored as both row and column store
CN103870483A (en) * 2012-12-13 2014-06-18 厦门雅迅网络股份有限公司 Method for dynamically adjusting batch stored data in memory space

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080162298A1 (en) * 2000-06-15 2008-07-03 American Express Travel Related Services Company, Inc. Online ordering system and method

Also Published As

Publication number Publication date
CN109033271A (en) 2018-12-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant