CN107391544A - Processing method, apparatus, device, and computer storage medium for columnar stored data

Info

Publication number
CN107391544A
Authority
CN
China
Prior art keywords
data, newly, file, raw data, increased
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710374036.XA
Other languages
Chinese (zh)
Other versions
CN107391544B (en)
Inventor
孙垚光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201710374036.XA
Publication of CN107391544A
Application granted
Publication of CN107391544B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/221: Column-oriented storage; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method and an apparatus for processing columnar stored data. The method includes: receiving newly added data for a raw data file, where the raw data file is stored in a columnar storage format and the Footer information of the raw data file is recorded in a Footer file that is independent of the raw data file; and writing the newly added data to the tail of the raw data file in the columnar storage format and adding Footer information for the newly added data to the Footer file, to obtain an updated raw data file and an updated Footer file. The embodiments of the application enable efficient streaming appends to columnar stored data: newly added data can be appended to the raw data file rather than being recorded in a new file, so processing efficiency is higher, fewer resources are occupied, and data queries are faster.

Description

Processing method, apparatus, device, and computer storage medium for columnar stored data
Technical field
The application relates to the field of computer technology, and in particular to a processing method, apparatus, device, and computer storage medium for columnar stored data.
Background
Unlike traditional row storage, which stores each row of data contiguously, a columnar storage format serializes some (or all) of the data values of one column and stores them contiguously in the file, then stores some (or all) of the data values of the next column. The tail of the data file carries Footer information, that is, information describing the storage format of the data file, including the file's metadata, the number of columns, relative positions, data type information, statistics, and so on.
Practical applications often need to add new data to an original data file. In the related art, when new data is obtained, it is usually stored by creating a new data file. However, a new data file occupies resources and reduces data processing efficiency, and a data query must then search the original data file and the new data file separately, so query efficiency is low.
Summary of the invention
To overcome the problems in the related art, this application provides a processing method, apparatus, device, and computer storage medium for columnar stored data.
According to a first aspect of the embodiments of this application, a method for processing columnar stored data is provided, the method including:
receiving newly added data for a raw data file, where the raw data file is stored in a columnar storage format and the Footer information of the raw data file is recorded in a Footer file independent of the raw data file; and
writing the newly added data to the tail of the raw data file in the columnar storage format, and adding Footer information for the newly added data to the Footer file, to obtain an updated raw data file and an updated Footer file.
In an optional implementation, after receiving the newly added data for the raw data file, the method includes:
loading the received newly added data into a high-speed storage space;
and writing the newly added data to the tail of the raw data file in the columnar storage format includes:
when the newly added data loaded in the high-speed storage space meets a preset storage condition, writing the loaded newly added data to the tail of the raw data file in the columnar storage format.
In an optional implementation, the preset storage condition includes one or more of the following conditions:
the data volume of the newly added data reaches a preset data volume threshold; or
the loading duration of the newly added data in the high-speed storage space reaches a preset duration threshold.
In an optional implementation, the method further includes:
when a data query request for the raw data file is obtained, reading first data that satisfies the request from the raw data file according to the Footer file before the update, and reading second data that satisfies the request from the newly added data loaded in the high-speed storage space; and
merging the first data and the second data and outputting the result.
In an optional implementation, the method further includes:
generating a replica file of the loaded newly added data, and storing that replica file in the same directory as the replica file of the raw data file.
According to a second aspect of the embodiments of this application, an apparatus for processing columnar stored data is provided, including:
a data receiving module, configured to: receive newly added data for a raw data file, where the raw data file is stored in a columnar storage format and the Footer information of the raw data file is recorded in a Footer file independent of the raw data file; and
a data writing module, configured to: write the newly added data to the tail of the raw data file in the columnar storage format, and add Footer information for the newly added data to the Footer file, to obtain an updated raw data file and an updated Footer file.
In an optional implementation, the data receiving module is further configured to:
after receiving the newly added data for the raw data file, load the received newly added data into a high-speed storage space;
and the data writing module is specifically configured to:
when the newly added data loaded in the high-speed storage space meets a preset storage condition, write the loaded newly added data to the tail of the raw data file in the columnar storage format.
In an optional implementation, the preset storage condition includes one or more of the following conditions:
the data volume of the newly added data reaches a preset data volume threshold; or
the loading duration of the newly added data in the high-speed storage space reaches a preset duration threshold.
In an optional implementation, the apparatus further includes a reading module, configured to:
when a data query request for the raw data file is obtained, read first data that satisfies the request from the raw data file according to the Footer file before the update, and read second data that satisfies the request from the newly added data loaded in the high-speed storage space; and
merge the first data and the second data and output the result.
In an optional implementation, the apparatus further includes a replica processing module, configured to:
generate a replica file of the loaded newly added data, and store that replica file in the same directory as the replica file of the raw data file.
According to a third aspect of the embodiments of this application, a computer device is provided, including:
a processor; and
a memory for storing processor-executable instructions;
where the processor is configured to:
receive newly added data for a raw data file, where the raw data file is stored in a columnar storage format and the Footer information of the raw data file is recorded in a Footer file independent of the raw data file; and
write the newly added data to the tail of the raw data file in the columnar storage format, and add Footer information for the newly added data to the Footer file, to obtain an updated raw data file and an updated Footer file.
According to a fourth aspect of the embodiments of this application, a computer storage medium is provided. The storage medium stores program instructions, and the program instructions include:
receiving newly added data for a raw data file, where the raw data file is stored in a columnar storage format and the Footer information of the raw data file is recorded in a Footer file independent of the raw data file; and
writing the newly added data to the tail of the raw data file in the columnar storage format, and adding Footer information for the newly added data to the Footer file, to obtain an updated raw data file and an updated Footer file.
The technical solutions provided by the embodiments of this application can have the following beneficial effects:
Unlike the related art, in which the Footer information of the raw data file is stored at the tail of the file, this application records it in an independent Footer file. Newly added data can therefore be appended directly to the tail of the raw data file, while the Footer information of the newly added data is recorded in the Footer file. The embodiments of this application enable efficient streaming appends to columnar stored data: newly added data can be appended to the raw data file without being recorded in a new file, so processing efficiency is higher, fewer resources are occupied, and data queries are faster.
It should be understood that the foregoing general description and the following detailed description are only exemplary and explanatory and do not limit this application.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with this application and, together with the specification, serve to explain the principles of the application.
Fig. 1A is a schematic diagram of a columnar storage format in the related art.
Fig. 1B is a schematic diagram of another columnar storage format in the related art.
Fig. 2 is a flowchart of a method for processing columnar stored data according to an exemplary embodiment of this application.
Fig. 3A is an application scenario diagram of a method for processing columnar stored data according to an exemplary embodiment of this application.
Fig. 3B is a schematic diagram of newly added data loaded in memory according to an exemplary embodiment of this application.
Fig. 4 is a hardware structure diagram of the computer device in which the apparatus for processing columnar stored data of this application is located.
Fig. 5 is a block diagram of an apparatus for processing columnar stored data according to an exemplary embodiment of this application.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the application as detailed in the appended claims.
The terms used in this application are for the purpose of describing particular embodiments only and are not intended to limit the application. The singular forms "a", "said", and "the" used in this application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in this application to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this application, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
The columnar storage format of a file is described first. Fig. 1A is a schematic diagram of a columnar storage format in the related art: all data values of one column of the data file are serialized and stored contiguously on disk, followed by all data values of the next column. A columnar storage format may also operate on part of the data values of a column. Fig. 1B is a schematic diagram of another columnar storage format in the related art: the overall data is first divided by row into two partitions (like the RowGroup concept in Parquet; the number of partitions may also be another value), and the data within each partition is then stored in columnar form.
In addition, the tail of the data file carries Footer information, which may include the metadata of the data file, the number of columns, relative positions, data type information, statistics, and so on. Because a columnar storage format serializes all or part of the data of a column of the data file for storage, the Footer information records the relevant information about how the file is stored so that the data can be read.
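As a rough illustration of the layout described above, the following Python sketch encodes one partition (row group) column by column and records per-column offsets of the kind that Footer information might hold. The function names, the JSON encoding, and the metadata fields are assumptions made for illustration only; the source does not specify a byte-level format.

```python
import json


def encode_row_group(rows, columns):
    """Serialize one row group column by column.

    Returns the encoded bytes plus per-column metadata (name, offset, length),
    a hypothetical stand-in for the Footer information described in the text.
    """
    payload = b""
    metadata = []
    for name in columns:
        # All values of one column are stored contiguously.
        chunk = json.dumps([row[name] for row in rows]).encode("utf-8")
        metadata.append({"name": name, "offset": len(payload), "length": len(chunk)})
        payload += chunk
    return payload, metadata


def read_column(payload, metadata, name):
    """Read a single column using the metadata, without decoding the others."""
    for column in metadata:
        if column["name"] == name:
            start = column["offset"]
            return json.loads(payload[start:start + column["length"]])
    raise KeyError(name)


rows = [{"id": 1, "city": "Hangzhou"}, {"id": 2, "city": "Beijing"}]
payload, metadata = encode_row_group(rows, ["id", "city"])
cities = read_column(payload, metadata, "city")
```

The point of the metadata is that a reader can seek straight to one column's bytes, which is why a columnar file cannot be read without its Footer information.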
Practical applications often need to add new data to an original data file. In the related art, because the tail of the data file carries the Footer information, new data is usually stored by creating a new data file when it is obtained.
Unlike the related art, in which the Footer information of the raw data file is stored at the tail of the file, the embodiments of this application record it in an independent Footer file. Newly added data can therefore be appended directly to the tail of the raw data file, while the Footer information of the newly added data is recorded in the Footer file. The embodiments of this application enable efficient streaming appends to columnar stored data: newly added data can be appended to the raw data file without being recorded in a new file, so processing efficiency is higher, fewer resources are occupied, and data queries are faster. The embodiments of this application are described in detail next.
Fig. 2 is a flowchart of a method for processing columnar stored data according to an exemplary embodiment of this application. The embodiment can be applied in a database that uses a columnar storage format and includes the following steps 201 to 202:
In step 201, newly added data for a raw data file is received. The raw data file is stored in a columnar storage format, and the Footer information of the raw data file is recorded in a Footer file independent of the raw data file.
In step 202, the newly added data is written to the tail of the raw data file in the columnar storage format, and Footer information for the newly added data is added to the Footer file, to obtain an updated raw data file and an updated Footer file.
For a raw data file in columnar storage, this embodiment records the Footer information of the raw data file in a newly created Footer file. With this arrangement, when data needs to be added, the newly added data can be appended directly to the tail of the raw data file as a new data partition of that file. The newly added data is thus still written into the raw data file, and efficient streaming appends are achieved. The Footer information in the Footer file is then updated.
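Steps 201 and 202 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the JSON column encoding, the one-JSON-object-per-line Footer file, and the names `append_row_group` and `read_column` are all hypothetical.

```python
import json
import os


def append_row_group(data_path, footer_path, rows, columns):
    """Append rows as a new row group; record its layout in the separate footer file."""
    # The new row group starts at the current end of the data file.
    offset = os.path.getsize(data_path) if os.path.exists(data_path) else 0
    entry = {"num_rows": len(rows), "columns": []}
    with open(data_path, "ab") as data_file:
        for name in columns:
            # Column-major: all values of one column are written contiguously.
            chunk = json.dumps([row[name] for row in rows]).encode("utf-8")
            data_file.write(chunk)
            entry["columns"].append({"name": name, "offset": offset, "length": len(chunk)})
            offset += len(chunk)
    # Existing data and existing footer entries are never rewritten.
    with open(footer_path, "a", encoding="utf-8") as footer_file:
        footer_file.write(json.dumps(entry) + "\n")
    return entry


def read_column(data_path, footer_path, name):
    """Read one column across all row groups, guided only by the footer file."""
    values = []
    with open(footer_path, encoding="utf-8") as footer_file, open(data_path, "rb") as data_file:
        for line in footer_file:
            for column in json.loads(line)["columns"]:
                if column["name"] == name:
                    data_file.seek(column["offset"])
                    values.extend(json.loads(data_file.read(column["length"])))
    return values
```

Because the Footer information lives outside the data file, both files are only ever appended to; with a footer at the tail of the data file, each append would have to overwrite or relocate that footer.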
In practical applications, a raw data file already in columnar storage has been stored on disk. In some examples, newly added data can be obtained in real time and written to the raw data file in real time. In other examples, the volume of newly added data may be large and the data may arrive continuously; to improve data processing efficiency, after receiving the newly added data for the raw data file, the method may further include:
loading the received newly added data into a high-speed storage space.
Writing the newly added data to the tail of the raw data file in the columnar storage format then includes:
when the newly added data loaded in the high-speed storage space meets a preset storage condition, writing the loaded newly added data to the tail of the raw data file in the columnar storage format.
The high-speed storage space may include space that temporarily stores data, such as memory or a cache, or buffer space for data exchange. It can be determined according to the hardware environment of the actual database system, and this embodiment does not limit it.
In this way, newly added data can be temporarily loaded into a high-speed storage space such as memory; specifically, the data may be loaded into memory in row storage. The newly added data is then written to the raw data file in one batch, which improves data processing efficiency. The preset storage condition indicates when the newly added data is written from the high-speed storage space to the raw data file and can be configured flexibly in practice. For example, the loaded newly added data may be appended to the raw data file when the current utilization of the high-speed storage space is high, to prevent data loss caused by memory or cache overflow; or it may be appended when the current utilization is low, so that the data is processed while the hardware is idle; and so on.
In an optional implementation, the preset storage condition may include one or more of the following conditions:
First, the data volume of the newly added data reaches a preset data volume threshold. In this mode, the preset storage condition takes data volume as the deciding factor: when the volume of newly added data is large, the data is appended to the raw data file promptly, preventing problems such as data loss caused by a large data volume. The data volume threshold can be configured flexibly as needed, and this embodiment does not limit it.
Second, the loading duration of the newly added data in memory reaches a preset duration threshold. In this mode, the preset storage condition takes the loading duration as the deciding factor: after the newly added data has been held in memory for a certain time, it is appended to the raw data file promptly, preventing problems such as data loss caused by a long loading time. The duration threshold can be configured flexibly as needed, and this embodiment does not limit it.
With these two modes, a reasonable time can be determined for appending the newly added data loaded in the high-speed storage space to the raw data file. Using data volume and/or duration as the deciding factor avoids the resource consumption caused by appending newly added data to the raw data file too frequently, while still appending the data promptly enough to prevent problems such as data loss.
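The two conditions above can be sketched as a small in-memory buffer. The class name `AppendBuffer`, the use of a row count as the measure of data volume, and the injectable `clock` parameter are assumptions made for illustration; the source leaves the thresholds and their units open.

```python
import time


class AppendBuffer:
    """Holds newly added rows in memory until a flush condition is met."""

    def __init__(self, max_rows, max_age_seconds, clock=time.monotonic):
        self.max_rows = max_rows                # hypothetical data volume threshold
        self.max_age_seconds = max_age_seconds  # hypothetical duration threshold
        self.clock = clock
        self.rows = []
        self.first_loaded_at = None

    def add(self, row):
        if not self.rows:
            # Track how long the oldest buffered row has been in memory.
            self.first_loaded_at = self.clock()
        self.rows.append(row)

    def should_flush(self):
        if not self.rows:
            return False
        too_big = len(self.rows) >= self.max_rows
        too_old = self.clock() - self.first_loaded_at >= self.max_age_seconds
        return too_big or too_old

    def drain(self):
        """Return the buffered rows and reset the buffer."""
        rows, self.rows = self.rows, []
        self.first_loaded_at = None
        return rows
```

A caller would check `should_flush()` after each `add()` (and on a timer), then pass `drain()` to whatever routine appends a row group to the raw data file.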
In the related art, some hardware such as memory or caches may require data to be stored in blocks when it is loaded. For this case, loading the newly added data into memory in this embodiment may include: taking a preset data block size as the unit, splitting the newly added data into one or more data blocks and loading them into the memory.
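A minimal sketch of this splitting step, assuming the block size is expressed as a number of rows (the source does not say whether the unit is rows or bytes) and using the hypothetical name `split_into_blocks`:

```python
def split_into_blocks(rows, block_size):
    """Split newly added rows into consecutive blocks of at most block_size rows."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


blocks = split_into_blocks(list(range(5)), 2)
```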
In a database system, newly added data usually raises the issue of data visibility timeliness. For example, after newly added data is loaded into memory, a user needs to query some data. The related art usually queries the data only in the raw data file. At this point, however, the newly added data may still be in memory and not yet stored on disk. If some of the data the user needs is still in memory, that part of the data is not output, so the data returned to the user is incomplete and the visibility timeliness of the loaded newly added data is poor. To address this problem, the method of the embodiments of this application may further include:
when a data query request for the raw data file is obtained, reading first data that satisfies the request from the raw data file according to the Footer file before the update, and reading second data that satisfies the request from the newly added data loaded in the high-speed storage space; and
merging the first data and the second data and outputting the result.
In this embodiment, when a data query request for the raw data file is obtained, first data that satisfies the request is read from the raw data file stored on disk, and second data that satisfies the request is also read from the newly added data loaded in the high-speed storage space. The first data and the second data are then merged, and the merged data is output as the response to the data query request. Because the newly added data loaded in the high-speed storage space can also be queried, the problem of returning incomplete data to the user is avoided and the visibility timeliness of the data is higher.
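The read path described above can be sketched as two scans whose results are concatenated. Here the disk side is simplified to an in-memory list standing in for rows located through the Footer file, and all function names are hypothetical.

```python
def disk_table_scan(disk_rows, predicate):
    """Stand-in for reading flushed rows via the Footer file (the first data)."""
    return [row for row in disk_rows if predicate(row)]


def mem_table_scan(buffered_rows, predicate):
    """Scan rows still held in the high-speed storage space (the second data)."""
    return [row for row in buffered_rows if predicate(row)]


def table_scan(disk_rows, buffered_rows, predicate):
    """Merge both partial results into one response."""
    first = disk_table_scan(disk_rows, predicate)
    second = mem_table_scan(buffered_rows, predicate)
    return first + second


on_disk = [{"id": 1, "amount": 5}, {"id": 2, "amount": 50}]
in_memory = [{"id": 3, "amount": 70}]
result = table_scan(on_disk, in_memory, lambda row: row["amount"] > 10)
```

Because buffered rows are newer than flushed rows, concatenating in this order preserves arrival order; a real system would also have to guard against a flush happening between the two scans.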
In practical applications, to prevent data loss, replica files of the raw data file are usually kept as a data backup. If the received newly added data is loaded into the high-speed storage space, it is temporarily not written to the raw data file, so data loss may occur. In this embodiment, the method further includes:
generating a replica file of the loaded newly added data, and storing that replica file in the same directory as the replica file of the raw data file.
In this way, because a corresponding replica file is generated for the loaded newly added data, data loss can be prevented. In addition, the replica file of the loaded newly added data is stored in the same directory as the replica file of the raw data file, so that the two replica files correspond to each other, which facilitates data recovery.
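One way to picture the replica handling is the sketch below, which persists buffered rows to a file placed next to the raw data file's replica and reads them back after a restart. The `.pending` suffix, the JSON-lines encoding, and the function names are assumptions for illustration; the source only requires that the two replica files share a directory.

```python
import json
import os


def buffer_replica_path(data_replica_path):
    """Place the buffer replica in the same directory as the data file replica."""
    directory = os.path.dirname(data_replica_path)
    base = os.path.basename(data_replica_path)
    return os.path.join(directory, base + ".pending")  # hypothetical naming scheme


def persist_buffer_replica(data_replica_path, rows):
    """Append buffered rows to the replica file and force them to disk."""
    path = buffer_replica_path(data_replica_path)
    with open(path, "a", encoding="utf-8") as replica:
        for row in rows:
            replica.write(json.dumps(row) + "\n")
        replica.flush()
        os.fsync(replica.fileno())
    return path


def recover_buffer(data_replica_path):
    """Reload rows that were buffered but not yet appended to the raw data file."""
    path = buffer_replica_path(data_replica_path)
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as replica:
        return [json.loads(line) for line in replica if line.strip()]
```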
The solution provided by this application is described in detail below through a specific embodiment.
Fig. 3A is an application scenario diagram of a method for processing columnar stored data according to an exemplary embodiment of this application. Fig. 3A includes a database management system, which may be a database management system that supports a columnar storage format such as Parquet or ORCFile. The processing solution for columnar stored data provided by the embodiments of this application can run in the database management system as an independent module or process to process columnar stored data.
As shown in Fig. 3A, the database management system maintains a raw data file, which is stored in a columnar storage format at some location on disk. The raw data file is stored in the manner of Fig. 1B: the overall data is first divided by row into multiple partitions (RowGroups), and the data within each partition is then stored in columnar form. The Footer information of the raw data file is recorded in a Footer file independent of the raw data file.
During some period, newly added data for the raw data file is continuously input to the database management system. In this embodiment, newly added data can be received continuously and the received data stored in memory. Fig. 3B is a schematic diagram of newly added data loaded in memory according to an exemplary embodiment of this application: following the loading mechanism of the memory, newly added data is written (write) to memory by streaming append (AppendRecord) and may exist in memory as the DataBlocks (data blocks) shown in Fig. 3B. When the total data volume of the DataBlocks for the newly added data reaches a certain size, or the loading duration of a DataBlock reaches a certain length, the underlying interface of the database management system can be called to append the DataBlocks for the newly added data to the tail end of the raw data file, forming a new RowGroup. When the newly added data corresponds to many DataBlocks, or other conditions are met, multiple DataBlocks can also be merged (compaction) before being appended to the tail end of the raw data file, to reduce the number of DataBlocks; whether to merge DataBlocks can be configured flexibly in practice, and this embodiment does not limit it. In addition, Footer information for the newly added data is added to the Footer file, yielding the updated raw data file and Footer file. Because the Footer information is recorded, the received newly added data can be appended directly to the raw data file without sorting.
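The optional merging of DataBlocks before the append can be sketched as below. The merge policy shown (combine consecutive blocks until a minimum row count is reached) and the name `compact_blocks` are assumptions; the source says only that multiple DataBlocks may be merged to reduce their number.

```python
def compact_blocks(blocks, min_rows):
    """Merge consecutive small blocks so that each merged block, except
    possibly the last, holds at least min_rows rows."""
    merged, current = [], []
    for block in blocks:
        current.extend(block)
        if len(current) >= min_rows:
            merged.append(current)
            current = []
    if current:
        # Leftover rows form a final, possibly smaller, block.
        merged.append(current)
    return merged


compacted = compact_blocks([[1], [2, 3], [4], [5]], 3)
```

Each merged block would then be written as one RowGroup, so fewer, larger RowGroups reach the raw data file and fewer entries are added to the Footer file.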
After data is written to memory, data loss must be considered, so multi-replica processing of the data can be performed. For the newly added data loaded in memory, a corresponding replica can be generated and stored in the same location as the replica of the raw data file; that is, the replica file of the newly added data is stored in the same directory as the replica file of the raw data file.
When reading data, in addition to the normal disk file read, this embodiment must also take into account the newly added data loaded in memory; this is the key to improving data visibility timeliness. When a data query request is received, a read operation (Reader) can be performed as shown in Fig. 3B: the request is split into two sub-requests, a disk file data read (DiskTableScan) and a memory data read (MemTableScan), and the two partial results are merged and returned as the result of the request. Fig. 3B takes a Query Engine as an example, which performs a table scan (TableScan) to obtain the query result.
Corresponding to the foregoing embodiments of the method for processing columnar stored data, this application also provides embodiments of an apparatus for processing columnar stored data and of the computer device to which it is applied.
The embodiments of the apparatus for processing columnar stored data of this application can be applied to a computer device. The apparatus embodiments may be implemented in software, or in hardware or a combination of software and hardware. Taking a software implementation as an example, the apparatus in the logical sense is formed by the processor of the device in which it is located reading the corresponding computer program instructions from non-volatile storage into memory and running them. At the hardware level, Fig. 4 is a hardware structure diagram of the computer device in which the apparatus for processing columnar stored data of this application is located. In addition to the processor 410, memory 430, network interface 420, and non-volatile storage 440 shown in Fig. 4, the computer device in which the apparatus 431 is located may also include other hardware depending on the actual function of the computer device, which is not described further here.
Fig. 5 is a block diagram of a processing apparatus for column-stored data according to an exemplary embodiment of the present application. The apparatus includes:
A data receiving module 51, configured to receive newly added data for an original data file, where the original data file is stored in a columnar storage format and the Footer information of the original data file is recorded in a Footer file that is independent of the original data file.
A data writing module 52, configured to write the newly added data to the tail of the original data file in the columnar storage format, and to add Footer information for the newly added data to the Footer file, thereby obtaining the updated original data file and Footer file.
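The append-plus-separate-Footer idea behind these two modules can be sketched as follows. This is an illustration under stated assumptions, not the patent's file layout: the JSON encoding of column chunks, the one-JSON-line-per-block footer format, and the names `append_block` and `read_column` are all hypothetical.

```python
import json
import os
import tempfile
from typing import Dict, List


def append_block(data_path: str, footer_path: str, columns: Dict[str, List[int]]) -> None:
    """Append one column-organized block to the data file, then record where
    each column chunk landed in the separate footer file."""
    entry: Dict[str, Dict[str, Dict[str, int]]] = {"columns": {}}
    with open(data_path, "ab") as data_file:
        data_file.seek(0, os.SEEK_END)  # make the tail position explicit
        for name, values in columns.items():
            payload = json.dumps(values).encode("utf-8")
            offset = data_file.tell()
            data_file.write(payload)
            entry["columns"][name] = {"offset": offset, "length": len(payload)}
    # The data file is only appended to; the footer lives in its own file,
    # so existing bytes of the data file never need to be rewritten.
    with open(footer_path, "a", encoding="utf-8") as footer_file:
        footer_file.write(json.dumps(entry) + "\n")


def read_column(data_path: str, footer_path: str, name: str) -> List[int]:
    """Use the footer file to locate and read every chunk of one column."""
    values: List[int] = []
    with open(footer_path, encoding="utf-8") as footer_file, \
            open(data_path, "rb") as data_file:
        for line in footer_file:
            meta = json.loads(line)["columns"][name]
            data_file.seek(meta["offset"])
            values.extend(json.loads(data_file.read(meta["length"])))
    return values


workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "table.data")
footer_path = os.path.join(workdir, "table.footer")
append_block(data_path, footer_path, {"id": [1, 2], "price": [10, 20]})
append_block(data_path, footer_path, {"id": [3], "price": [30]})
ids = read_column(data_path, footer_path, "id")
prices = read_column(data_path, footer_path, "price")
```

Because the footer is kept outside the data file in this sketch, the second `append_block` call only adds bytes at the tail of each file.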
In an optional implementation, the data receiving module 51 is further configured to:
After receiving the newly added data for the original data file, load the received newly added data into a high-speed storage space;
The data writing module 52 is specifically configured to:
When the newly added data loaded in the high-speed storage space meets a preset storage condition, write the loaded newly added data to the tail of the original data file in the columnar storage format.
In an optional implementation, the preset storage condition includes one or more of the following conditions:
The data volume of the newly added data reaches a preset data volume threshold; or,
The length of time the newly added data has been loaded in the high-speed storage space reaches a preset duration threshold.
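The two flush conditions can be sketched as a small in-memory buffer. This is a minimal sketch under assumptions: the class name `MemBuffer`, the use of a row count as a stand-in for "data volume", and the optional `now` parameter (added so the behavior can be checked without waiting) are all hypothetical.

```python
import time
from typing import Dict, List, Optional


class MemBuffer:
    """Holds newly added rows until a size or age threshold is reached."""

    def __init__(self, max_rows: int, max_age_seconds: float) -> None:
        self.max_rows = max_rows                # stand-in for the data volume threshold
        self.max_age_seconds = max_age_seconds  # duration threshold
        self.rows: List[Dict[str, int]] = []
        self.first_loaded_at: Optional[float] = None

    def add(self, row: Dict[str, int], now: Optional[float] = None) -> None:
        """Load one row; remember when the oldest buffered row arrived."""
        if self.first_loaded_at is None:
            self.first_loaded_at = time.monotonic() if now is None else now
        self.rows.append(row)

    def should_flush(self, now: Optional[float] = None) -> bool:
        """True when either condition holds: enough data, or buffered long enough."""
        if not self.rows or self.first_loaded_at is None:
            return False
        current = time.monotonic() if now is None else now
        too_big = len(self.rows) >= self.max_rows
        too_old = current - self.first_loaded_at >= self.max_age_seconds
        return too_big or too_old

    def drain(self) -> List[Dict[str, int]]:
        """Hand the buffered rows to the writer and reset the buffer."""
        rows, self.rows, self.first_loaded_at = self.rows, [], None
        return rows


buffer = MemBuffer(max_rows=3, max_age_seconds=5.0)
buffer.add({"id": 1}, now=0.0)
flush_early = buffer.should_flush(now=1.0)   # one row, one second old
flush_by_age = buffer.should_flush(now=5.0)  # duration threshold reached
```

A writer would call `should_flush` periodically and, when it returns true, pass the result of `drain` to whatever routine appends to the data file.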
In an optional implementation, the apparatus further includes a reading module, configured to:
When a data query request for the original data file is obtained, read first data satisfying the request from the original data file according to the Footer file before the update, and read second data satisfying the request from the newly added data loaded in the high-speed storage space;
Merge the first data and the second data and output the result.
In an optional implementation, the apparatus further includes a replica processing module, configured to:
Generate a replica file of the loaded newly added data, and store the replica file of the loaded newly added data in the same directory as the replica file of the original data file.
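The replica placement can be sketched as below. The source only states that the replica of the buffered data is stored in the same directory as the replica of the original data file; the JSON serialization, the `.membuf` suffix, and the function name `write_buffer_replica` are assumptions made for illustration.

```python
import json
import os
import tempfile
from typing import Dict, List


def write_buffer_replica(original_replica_path: str,
                         buffered_rows: List[Dict[str, int]]) -> str:
    """Persist the buffered rows next to the replica of the original data file
    and return the path of the new replica file."""
    replica_dir = os.path.dirname(os.path.abspath(original_replica_path))
    buffer_replica_path = os.path.join(
        replica_dir, os.path.basename(original_replica_path) + ".membuf"
    )
    with open(buffer_replica_path, "w", encoding="utf-8") as replica_file:
        json.dump(buffered_rows, replica_file)
    return buffer_replica_path


replica_dir = tempfile.mkdtemp()
original_replica = os.path.join(replica_dir, "table.data.replica")
open(original_replica, "wb").close()  # placeholder for an existing replica file
buffer_replica = write_buffer_replica(original_replica, [{"id": 3, "price": 30}])
```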
According to a third aspect of the embodiments of the present application, a processing apparatus for column-stored data is provided, including: a processor; and a memory for storing processor-executable instructions; where the processor is configured to:
Receive newly added data for an original data file, where the original data file is stored in a columnar storage format and the Footer information of the original data file is recorded in a Footer file that is independent of the original data file;
Write the newly added data to the tail of the original data file in the columnar storage format, and add Footer information for the newly added data to the Footer file, thereby obtaining the updated original data file and Footer file.
For the implementation of the functions and roles of each module in the above processing apparatus for column-stored data, refer to the implementation of the corresponding steps in the above processing method for column-stored data; details are not repeated here.
Accordingly, the present application also provides a computer storage medium in which program instructions are stored, the program instructions including:
Receiving newly added data for an original data file, where the original data file is stored in a columnar storage format and the Footer information of the original data file is recorded in a Footer file that is independent of the original data file;
Writing the newly added data to the tail of the original data file in the columnar storage format, and adding Footer information for the newly added data to the Footer file, thereby obtaining the updated original data file and Footer file.
The embodiments of the present application may take the form of a computer program product implemented on one or more storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) that contain program code. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Because the apparatus embodiments essentially correspond to the method embodiments, refer to the description of the method embodiments for the relevant details. The apparatus embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present application. Those of ordinary skill in the art can understand and implement it without creative effort.
Other embodiments of the present application will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. The present application is intended to cover any variations, uses, or adaptations of the present application that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed in the present application. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present application indicated by the following claims.
It should be understood that the present application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present application is limited only by the appended claims.
The above are merely preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (12)

1. A processing method for column-stored data, the method comprising:
Receiving newly added data for an original data file, wherein the original data file is stored in a columnar storage format and the Footer information of the original data file is recorded in a Footer file that is independent of the original data file;
Writing the newly added data to the tail of the original data file in the columnar storage format, and adding Footer information for the newly added data to the Footer file, thereby obtaining the updated original data file and Footer file.
2. The method according to claim 1, wherein after receiving the newly added data for the original data file, the method comprises:
Loading the received newly added data into a high-speed storage space;
Wherein writing the newly added data to the tail of the original data file in the columnar storage format comprises:
When the newly added data loaded in the high-speed storage space meets a preset storage condition, writing the loaded newly added data to the tail of the original data file in the columnar storage format.
3. The method according to claim 2, wherein the preset storage condition comprises one or more of the following conditions:
The data volume of the newly added data reaches a preset data volume threshold; or,
The length of time the newly added data has been loaded in the high-speed storage space reaches a preset duration threshold.
4. The method according to claim 2, the method further comprising:
When a data query request for the original data file is obtained, reading first data satisfying the request from the original data file according to the Footer file before the update, and reading second data satisfying the request from the newly added data loaded in the high-speed storage space;
Merging the first data and the second data and outputting the result.
5. The method according to claim 2, the method further comprising:
Generating a replica file of the loaded newly added data, and storing the replica file of the loaded newly added data in the same directory as the replica file of the original data file.
6. A processing apparatus for column-stored data, the apparatus comprising:
A data receiving module, configured to receive newly added data for an original data file, wherein the original data file is stored in a columnar storage format and the Footer information of the original data file is recorded in a Footer file that is independent of the original data file;
A data writing module, configured to write the newly added data to the tail of the original data file in the columnar storage format, and to add Footer information for the newly added data to the Footer file, thereby obtaining the updated original data file and Footer file.
7. The apparatus according to claim 6, wherein the data receiving module is further configured to:
After receiving the newly added data for the original data file, load the received newly added data into a high-speed storage space;
And the data writing module is specifically configured to:
When the newly added data loaded in the high-speed storage space meets a preset storage condition, write the loaded newly added data to the tail of the original data file in the columnar storage format.
8. The apparatus according to claim 7, wherein the preset storage condition comprises one or more of the following conditions:
The data volume of the newly added data reaches a preset data volume threshold; or,
The length of time the newly added data has been loaded in the high-speed storage space reaches a preset duration threshold.
9. The apparatus according to claim 7, the apparatus further comprising a reading module, configured to:
When a data query request for the original data file is obtained, read first data satisfying the request from the original data file according to the Footer file before the update, and read second data satisfying the request from the newly added data loaded in the high-speed storage space;
Merge the first data and the second data and output the result.
10. The apparatus according to claim 7, the apparatus further comprising a replica processing module, configured to:
Generate a replica file of the loaded newly added data, and store the replica file of the loaded newly added data in the same directory as the replica file of the original data file.
11. A computer device, comprising:
A processor;
A memory for storing processor-executable instructions;
Wherein the processor is configured to:
Receive newly added data for an original data file, wherein the original data file is stored in a columnar storage format and the Footer information of the original data file is recorded in a Footer file that is independent of the original data file;
Write the newly added data to the tail of the original data file in the columnar storage format, and add Footer information for the newly added data to the Footer file, thereby obtaining the updated original data file and Footer file.
12. A computer storage medium in which program instructions are stored, the program instructions comprising:
Receiving newly added data for an original data file, wherein the original data file is stored in a columnar storage format and the Footer information of the original data file is recorded in a Footer file that is independent of the original data file;
Writing the newly added data to the tail of the original data file in the columnar storage format, and adding Footer information for the newly added data to the Footer file, thereby obtaining the updated original data file and Footer file.
CN201710374036.XA 2017-05-24 2017-05-24 Processing method, device and equipment of column type storage data and computer storage medium Active CN107391544B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710374036.XA CN107391544B (en) 2017-05-24 2017-05-24 Processing method, device and equipment of column type storage data and computer storage medium

Publications (2)

Publication Number Publication Date
CN107391544A true CN107391544A (en) 2017-11-24
CN107391544B CN107391544B (en) 2020-06-30

Family

ID=60338375

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710374036.XA Active CN107391544B (en) 2017-05-24 2017-05-24 Processing method, device and equipment of column type storage data and computer storage medium

Country Status (1)

Country Link
CN (1) CN107391544B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102929884A (en) * 2011-08-10 2013-02-13 阿里巴巴集团控股有限公司 Method and device for compressing virtual hard disk image file
KR20150012869A (en) * 2013-07-26 2015-02-04 에스케이플래닛 주식회사 System for providing contents authoring tool, apparatus and method for providing authoring tool and storage medium recording program thereof
CN104866497A (en) * 2014-02-24 2015-08-26 华为技术有限公司 Metadata updating method and device based on column storage of distributed file system as well as host
CN105683897A (en) * 2013-08-07 2016-06-15 桑迪士克科技股份有限公司 Data storage system with stale data mechanism and method of operation thereof

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109068311A (en) * 2018-07-31 2018-12-21 Oppo广东移动通信有限公司 A kind of display methods, terminal and storage medium
CN109542889A (en) * 2018-10-11 2019-03-29 平安科技(深圳)有限公司 Stream data column storage method, device, equipment and storage medium
CN109542889B (en) * 2018-10-11 2023-07-21 平安科技(深圳)有限公司 Stream data column storage method, device, equipment and storage medium
CN109447183A (en) * 2018-11-27 2019-03-08 东软集团股份有限公司 Model training method, device, equipment and medium
CN109447183B (en) * 2018-11-27 2020-10-16 东软集团股份有限公司 Prediction model training method, device, equipment and medium
CN112181973A (en) * 2019-07-01 2021-01-05 北京涛思数据科技有限公司 Time sequence data storage method
CN112181973B (en) * 2019-07-01 2023-05-30 北京涛思数据科技有限公司 Time sequence data storage method
CN112925672A (en) * 2021-02-08 2021-06-08 重庆紫光华山智安科技有限公司 Data recovery method, device, equipment and storage medium
CN114442940A (en) * 2022-01-04 2022-05-06 网易(杭州)网络有限公司 Data processing method, device, medium and electronic equipment

Also Published As

Publication number Publication date
CN107391544B (en) 2020-06-30

Similar Documents

Publication Publication Date Title
CN107391544A (en) Processing method, device, equipment and the computer storage media of column data storage
US9916176B2 (en) Method and apparatus of accessing data of virtual machine
CN106201771B (en) Data-storage system and data read-write method
CN106484906B (en) Distributed object storage system flash-back method and device
CN105117351B (en) To the method and device of buffering write data
US10275481B2 (en) Updating of in-memory synopsis metadata for inserts in database table
CN106649828B (en) Data query method and system
US20140101167A1 (en) Creation of Inverted Index System, and Data Processing Method and Apparatus
CN101582076A (en) Data de-duplication method based on data base
KR101548689B1 (en) Method and apparatus for partial garbage collection in filesystems
CN103914483A (en) File storage method and device and file reading method and device
CN104899117A (en) Memory database parallel logging method for nonvolatile memory
CN111177143A (en) Key value data storage method and device, storage medium and electronic equipment
CN102929935B (en) A kind of Large Volume Data reading/writing method based on affairs
CN111309245A (en) Layered storage writing method and device, reading method and device and system
CN108304142A (en) A kind of data managing method and device
CN114924911B (en) Method, device, equipment and storage medium for backing up effective data of Windows operating system
CN115858471A (en) Service data change recording method, device, computer equipment and medium
CN106980616A (en) A kind of mass small documents merge storage method and system
US10063256B1 (en) Writing copies of objects in enterprise object storage systems
JP6695973B2 (en) Computer system and database management method
CN117539690B (en) Method, device, equipment, medium and product for merging and recovering multi-disk data
JPS593567A (en) Buffer number setting system of tree structure
JP2587417B2 (en) File backup and restoration method
US20230409235A1 (en) File system improvements for zoned storage device operations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200921

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200921

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.

TR01 Transfer of patent right