CN110196871A

CN110196871A - Data storage method and system

Info

Publication number: CN110196871A
Application number: CN201910173503.1A
Authority: CN
Inventors: 袁伟康; 徐承杰
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2019-03-07
Filing date: 2019-03-07
Publication date: 2019-09-03

Abstract

An embodiment of the present invention discloses a data storage method and system. The method includes: receiving data to be stored; dividing the data to be stored into data blocks according to the blocking information corresponding to the data to be stored, and determining the target index corresponding to each data block in a non-relational database; performing write processing in the documents associated with the target index based on the data block, to obtain the reported document corresponding to the data block; and synchronizing the reported document to the target partition corresponding to the data block in a relational database. In the embodiments of the present invention, the data to be stored is divided into data blocks, and write operations are performed only on the indexes and partitions corresponding to those data blocks, which narrows the write range during data storage and thereby improves data storage efficiency.

Description

Data storage method and system

Technical field

The present invention relates to database technology, and in particular to a data storage method and system.

Background art

With the rapid development of the Internet, the types and scale of data on the Internet are growing at an astonishing rate, and the arrival of the "big data" era has become a reality.

At present, data storage includes unstructured data storage, which stores unstructured data, and structured data storage, which stores structured data. Structured data, i.e. row data, is data that can be expressed logically with a two-dimensional table structure. Data that cannot conveniently be represented with the two-dimensional logical tables of a database is called unstructured data, and includes office documents of all formats, text, pictures, Extensible Markup Language (XML), HyperText Markup Language (HTML), all kinds of reports, images, and audio/video information. Unstructured data is stored in a non-relational database, for example the distributed search server Elasticsearch (ES), which indexes JavaScript Object Notation (JSON) data over the Hypertext Transfer Protocol (HTTP). Structured data is stored in a relational database, for example the data warehouse Hive, which is based on the Hadoop Distributed File System (HDFS).

The prior art has the technical problem of low data storage efficiency.

Summary of the invention

Embodiments of the present invention are intended to provide a data storage method and system capable of improving data storage efficiency.

The technical solutions of the embodiments of the present invention are implemented as follows:

An embodiment of the present invention provides a data storage method, comprising:

Receiving data to be stored; dividing the data to be stored into data blocks according to the blocking information corresponding to the data to be stored, and determining the target index corresponding to each data block in a non-relational database, wherein different data blocks have different blocking information; performing write processing in the documents associated with the target index based on the data block, to obtain the reported document corresponding to the data block; and synchronizing the reported document to the target partition corresponding to the data block in a relational database.

An embodiment of the present invention provides a database system, comprising:

A data storage server, configured to receive data to be stored, divide the data to be stored into data blocks according to the blocking information corresponding to the data to be stored, and determine the target index corresponding to each data block in a non-relational database;

The data storage server is further configured to perform write processing in the documents associated with the target index based on the data block, to obtain the reported document corresponding to the data block, and to synchronize the reported document to the target partition corresponding to the data block in a relational database;

The non-relational database, configured to store the data of the data block in the documents associated with the target index;

The relational database, configured to store the reported document in the target partition.

An embodiment of the present invention provides a data storage device, comprising:

A receiving unit, configured to receive data to be stored;

A blocking unit, configured to divide the data to be stored into data blocks according to the blocking information corresponding to the data to be stored, and to determine the target index corresponding to each data block in a non-relational database, wherein different data blocks have different blocking information;

A first storage unit, configured to perform write processing in the documents associated with the target index based on the data block, to obtain the reported document corresponding to the data block;

A second storage unit, configured to synchronize the reported document to the target partition corresponding to the data block in a relational database.

An embodiment of the present invention further provides a data storage device, comprising:

A memory, configured to store executable instructions;

A processor, configured to implement the data storage method provided by the embodiments of the present invention when executing the executable instructions stored in the memory.

An embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the data storage method provided by the embodiments of the present invention.

The embodiments of the present invention have the following beneficial effects:

The data to be stored is divided into data blocks, and, according to the blocking information of each data block, operations are performed on the target index corresponding to the data block in the non-relational database and on the target partition corresponding to the data block in the relational database. Write operations are therefore performed only on the indexes and partitions corresponding to the data blocks into which the data to be stored is divided, which narrows the write range during data storage and thereby improves data storage efficiency. In addition, during storage, the reported result of the non-relational database is used as the data written to the relational database, so that storage in the relational database is achieved through upstream-downstream iteration, which overcomes the problem that the relational database does not support primary keys, foreign keys, or indexes.

Brief description of the drawings

Fig. 1 is a first schematic structural diagram of a database system according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a production environment according to an embodiment of the present invention;

Fig. 3 is a first schematic flowchart of a data storage method according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of the principle of sparkSQL according to an embodiment of the present invention;

Fig. 5 is a schematic architecture diagram of ES according to an embodiment of the present invention;

Fig. 6 is a schematic flowchart of index creation according to an embodiment of the present invention;

Fig. 7 is a schematic system architecture diagram of Hive according to an embodiment of the present invention;

Fig. 8 is a second schematic flowchart of a data storage method according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of the data storage server in a database system according to an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

In the following description, "some embodiments" describes a subset of all possible embodiments. It can be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and that they may be combined with one another where no conflict arises.

Unless otherwise defined, all technical and scientific terms used in the embodiments of the present invention have the same meaning as commonly understood by those skilled in the art to which the embodiments belong. The terms used in the embodiments of the present invention are intended only to describe the embodiments and are not intended to limit the present invention.

Before the embodiments of the present invention are described in further detail, the nouns and terms involved in the embodiments are explained; the following explanations apply to them.

1) Non-relational database: a database or data warehouse that stores unstructured data, for example ES. It may run on a single independent server or on multiple cooperating servers. When data is looked up in a non-relational database, the index is searched directly and the data is found through the index.

2) Index and document: an index is the top-level unit of data management in a non-relational database and is a collection of documents. One index has multiple documents under it, and one document corresponds to one piece of data; that is, one index contains multiple pieces of data. For example, the data includes the grades of A, B, and C, where A and B are female students and C is a male student. With "female students" as an index, the data under it includes the grade of A and the grade of B: the grade of A is document 1 under the index "female students", and the grade of B is document 2 under the index "female students". With "male students" as an index, the data under it includes the grade of C, which is document 1 under the index "male students".

3) Relational database: a database or data warehouse that stores structured data and maps structured data files to database tables, for example Hive. It may run on a single independent server or on multiple cooperating servers. When data is looked up in a relational database, the index is searched directly and the data is found through the index. A relational database may include a data engine and a data source; data is stored in the data source after being processed by the data engine. The data source of Hive is HDFS.

4) Table: each table has a corresponding directory in the relational database in which its data is stored. For example, a table ab has the HDFS path /wh/ab, where wh is the directory of the specified data warehouse; the data of all tables is stored in this directory.

5) Partition: one Partition of a table corresponds to one directory under the table, and the data of every Partition is stored in its corresponding directory. For example, if table ab contains the two Partitions ds and city, the HDFS subdirectory corresponding to ds=20090801, city=US is /wh/ab/ds=20090801/city=US, and the subdirectory corresponding to ds=20090801, city=CA is /wh/ab/ds=20090801/city=CA.

6) Metadata: includes the name of the table, the columns and partitions of the table and their attributes, the attributes of the table (for example, whether it is an external table), and the directory where the table's data is located.

7) External table: points to data that already exists in HDFS, and Partitions can be created for it. An external table and an ordinary table are organized identically in the metadata. An external table involves only one process: loading the data and creating the table are completed at the same time, and the data is not moved into the data warehouse directory; only a link to the external data is established. When an external table is deleted, only this link is deleted.
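The directory layout described for tables and partitions in items 4) and 5) can be illustrated with a small sketch. The helper name partition_path and its parameters are hypothetical and chosen only for illustration; the sketch builds a path string and does not access HDFS or Hive.

```python
def partition_path(warehouse_dir, table, partition_spec):
    """Build the subdirectory of one partition.

    partition_spec is an ordered list of (partition column, value) pairs,
    because the nesting order of the directories matters.
    """
    parts = [f"{key}={value}" for key, value in partition_spec]
    return "/".join([warehouse_dir.rstrip("/"), table, *parts])


us_path = partition_path("/wh", "ab", [("ds", "20090801"), ("city", "US")])
ca_path = partition_path("/wh", "ab", [("ds", "20090801"), ("city", "CA")])
```

Both partitions share the parent directory /wh/ab/ds=20090801, so an operation on one city partition leaves the files of the other untouched.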

The embodiments of the present invention may be provided as a data storage method, system, device, and computer-readable storage medium. In practical applications, the data storage method may be implemented by a data storage device, and each functional entity in the data storage device may be implemented cooperatively by the hardware resources of a device (such as a terminal device, a server, or a server cluster), such as the computing resources of a processor and communication resources (for example, those supporting communication in various modes such as optical cable and cellular).

The data storage method of the embodiments of the present invention may be applied to the database system shown in Fig. 1, which includes a data storage server 101, a non-relational database 102, and a relational database 103. The data storage server 101 receives data to be stored; divides the data to be stored into data blocks according to the blocking information corresponding to the data to be stored, and determines the target index corresponding to each data block in the non-relational database 102; performs write processing in the documents associated with the target index based on the data block, to obtain the reported document corresponding to the data block; and synchronizes the reported document to the target partition corresponding to the data block in the relational database 103, so that the reported data is stored in both the non-relational database 102 and the relational database 103.

Fig. 1 shows only one data storage server 101 in the database system by way of example. This does not exclude implementations with multiple data storage servers 101, in which the multiple data storage servers 101 form a data storage server cluster.

The database system shown in Fig. 1 may be applied to the production environment shown in Fig. 2, which includes a business front end 201 and the database system 100 shown in Fig. 1. The business front end 201 interacts with users and generates data, and reports the generated data, as data to be stored, to the data storage server 101 through a network 202, so that the data reported by the business front end 201 is stored in the non-relational database 102 and the relational database 103 of the database system.

In practical applications, the business front end and the data storage server may be the same physical entity or different physical entities.

The non-relational database 102 and the data storage server 101 may be located on the same physical entity or on different physical entities. Likewise, the relational database 103 and the data storage server 101 may be located on the same physical entity or on different physical entities. The data storage server 101 may include the control part of the non-relational database 102 or the relational database 103, other than the data storage part.

Application scenarios of the production environment shown in Fig. 2 may include the following. A business intelligence (BI) system, for example for a group or a large shopping mall: the sales data of each checkout counter is reported and the reported data is stored. In this case, the business front ends are the checkout counters in the mall.

A commodity price monitoring website: the price of each commodity on sale is monitored, and when the price of a commodity, for example toothpaste, changes, the price is reported and the reported data is stored. In this case, the business front ends are the terminals on which commodity prices are entered.

A banking information system: the daily transaction records of each branch are collected and reported, and the reported data is stored. In this case, the business front ends are the service terminals operated by the staff.

Below, with reference to the schematic diagrams of the database system shown in Fig. 1 and the production environment shown in Fig. 2, the embodiments of the data storage method, system, device, and computer-readable storage medium provided by the embodiments of the present invention are described.

An embodiment of the present invention provides a data storage method, shown in Fig. 3, which is applied to the data storage server 101 shown in Fig. 1 and is described with reference to the network structure shown in Fig. 1.

Step S301: receive data to be stored.

The data to be stored includes at least one piece of data. When the business front end obtains data to be stored and transmits it to the data storage server, the data storage server receives the data to be stored, which may include multiple pieces of data. Here, data to be stored is data of the same theme; data of different themes belongs to different data to be stored. For example, the data reported by a banking system and the consumption data of a shopping mall are data of different themes and belong to different data to be stored.

Step S302: divide the data to be stored into data blocks according to the blocking information corresponding to the data to be stored, and determine the target index corresponding to each data block in the non-relational database.

Different data blocks have different blocking information, and one data block includes at least one piece of data.

For a given set of data to be stored, the data storage server extracts fields from every piece of data in it, obtains the blocking information of each of the multiple pieces of data it includes, and divides the multiple pieces of data into data blocks according to the blocking information.

The data to be stored is divided into the corresponding data blocks according to the different blocking information. For example, the data to be stored includes data 1, data 2, data 3, and data 4; the blocking information of data 1 and data 2 is A, and the blocking information of data 3 and data 4 is B. Data 1 and data 2 are then assigned to data block 1, which corresponds to blocking information A, and data 3 and data 4 are assigned to data block 2, which corresponds to blocking information B.

One data block may include one piece of data or multiple pieces of data. For example, the data to be stored includes data 1, data 2, data 3, and data 4; data 1 and data 3 both have blocking information 1, data 2 has blocking information 2, and data 4 has blocking information 3. Data 1 and data 3 are then assigned to data block 1, which corresponds to blocking information 1; data 2 is assigned to data block 2, which corresponds to blocking information 2; and data 4 is assigned to data block 3, which corresponds to blocking information 3.
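The division into data blocks amounts to grouping records by their blocking information. The following is a minimal sketch under the assumption that each piece of data is a Python dict and that the blocking information is a tuple of selected field values; the function name split_into_blocks and the field names id, type, and created are hypothetical.

```python
from collections import defaultdict


def split_into_blocks(records, blocking_fields):
    """Group records into data blocks keyed by their blocking information."""
    blocks = defaultdict(list)
    for record in records:
        # The blocking information is the tuple of the selected field values.
        info = tuple(record[field] for field in blocking_fields)
        blocks[info].append(record)
    return dict(blocks)


# Four records mirroring the second example: records 1 and 3 share
# blocking information, records 2 and 4 each have their own.
records = [
    {"id": 1, "type": "A", "created": "20181201"},
    {"id": 2, "type": "B", "created": "20181201"},
    {"id": 3, "type": "A", "created": "20181201"},
    {"id": 4, "type": "B", "created": "20181202"},
]
blocks = split_into_blocks(records, ["type", "created"])
```

Here three data blocks result, and the block for ("A", "20181201") holds records 1 and 3.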

After the data to be stored is divided into data blocks, the target index on which write processing is to be performed in the non-relational database is determined according to the blocking information of each data block. Here, an index in the non-relational database has the same content as the blocking information, which establishes a mapping between data blocks and indexes. For example, if the blocking information of a data block includes order business type = 'A' and order creation date = '20181201', the index corresponding to the data block is A_20181201.

In practical applications, an index in the non-relational database may be composed of a theme and blocking information. For example, for a data block whose theme is order and whose blocking information includes order business type = 'A' and order creation date = '20181201', the corresponding index is order_A_20181201; if the theme is class and the blocking information includes order business type = 'A' and order creation date = '20181201', the corresponding index is class_A_20181201.
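The mapping from theme and blocking information to an index name can be sketched as a simple string join. The function name target_index_name is hypothetical, and the underscore separator is inferred from the examples above rather than stated as a rule.

```python
def target_index_name(theme, blocking_info):
    """Compose an index name from a theme and the blocking information values."""
    return "_".join([theme, *blocking_info])


order_index = target_index_name("order", ("A", "20181201"))
class_index = target_index_name("class", ("A", "20181201"))
```

Because the name is derived deterministically from the blocking information, the same data block always resolves to the same index.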

Here, after receiving the data to be stored, the data storage server may perform data cleansing on it to filter out data that does not meet requirements, such as incomplete data, erroneous data, and duplicate data. Incomplete data is data with missing information, for example a missing supplier name, branch company name, or customer region, or a main table in the business system that cannot be matched with its detail table. Erroneous data arises when the business system behind the business front end is not robust enough and writes input directly to the back-end database without validating it, for example numeric data entered as full-width numeric characters, string data followed by a carriage return, incorrectly formatted dates, and out-of-range dates.

After the data storage server has cleansed the data to be stored, it sends the data to be stored to the non-relational database in units of data blocks.
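The cleansing step can be sketched as a filter that drops incomplete, erroneous, and duplicate records. This is only an illustration under assumptions: the required fields (id, type, created), the date format yyyymmdd, and the function names is_valid and clean are all hypothetical, and a real system would apply its own validation rules.

```python
from datetime import datetime

REQUIRED_FIELDS = ("id", "type", "created")  # assumed for illustration


def is_valid(record):
    """Reject records with missing fields or a malformed creation date."""
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return False  # incomplete data
    try:
        datetime.strptime(record["created"], "%Y%m%d")
    except ValueError:
        return False  # erroneous data: wrong date format or impossible date
    return True


def clean(records):
    """Keep the first occurrence of every valid record."""
    seen = set()
    kept = []
    for record in records:
        key = tuple(sorted(record.items()))
        if is_valid(record) and key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


good = {"id": "1", "type": "A", "created": "20181201"}
raw = [
    good,
    dict(good),                                          # duplicate
    {"id": "2", "type": "", "created": "20181201"},      # incomplete
    {"id": "3", "type": "A", "created": "2018-12-01"},   # wrong date format
]
cleaned = clean(raw)
```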

Step S303: perform write processing in the documents associated with the target index based on the data block, to obtain the reported document corresponding to the data block.

After the target index corresponding to each data block is determined, the data in each data block is written, with the data block as the unit, to the documents associated with the corresponding target index. For example, the data blocks include data block 1, data block 2, and data block 3. After target index 1, target index 2, and target index 3 are determined from the blocking information of data block 1, data block 2, and data block 3 respectively, the data in data block 1 is written to the documents associated with target index 1, the data in data block 2 is written to the documents associated with target index 2, and the data in data block 3 is written to the documents associated with target index 3.

The types of write processing include adding data, modifying data, and deleting data. Adding data is the add operation for new data; modifying data covers modification operations such as supplementing or changing historical data; deleting data is the delete operation that removes historical data.
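The three types of write processing can be sketched against an in-memory stand-in for the documents of one target index. The dict-based model, the function name apply_write, and the operation labels "add", "modify", and "delete" are assumptions for illustration; they are not the API of any particular non-relational database.

```python
def apply_write(index_docs, op, doc_id, body=None):
    """Apply one write to the documents of a target index.

    index_docs maps a document id to the document body (a dict).
    """
    if op == "add":
        index_docs[doc_id] = body
    elif op == "modify":
        # Merge the changed fields into the existing document;
        # raises KeyError if the document does not exist.
        index_docs[doc_id] = {**index_docs[doc_id], **body}
    elif op == "delete":
        del index_docs[doc_id]
    else:
        raise ValueError(f"unknown write type: {op}")
    return index_docs


docs = {}
apply_write(docs, "add", "1", {"type": "A", "price": 10})
apply_write(docs, "modify", "1", {"price": 12})
after_modify = dict(docs)
apply_write(docs, "delete", "1")
```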

Here, a target index is associated with one or more documents. The documents associated with a target index after write processing are called the reported document; the reported document includes all documents associated with the target index.

In some embodiments, before performing write processing in the documents associated with the target index based on the data block, the data storage server may also format the data in the data block according to the data structure rule of the non-relational database, to obtain data that conforms to that rule.

Here, the data stored in the non-relational database is unstructured data, and the data structure rule of the non-relational database may be JSON key-value pairs. In that case, every piece of data in the data to be stored is converted to data with a key-value structure, where the key is the key name, which characterizes the attribute of the value, and the value is the value corresponding to the key. For example, an order record converted to the JSON structure is "order_type": "A", where the key order_type characterizes the type and A is the specific type.
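As a minimal sketch of the formatting step, a record held as a Python dict can be serialized to a JSON key-value document with the standard json module. The function name to_json_document is hypothetical.

```python
import json


def to_json_document(record):
    """Serialize one record as a JSON object of key-value pairs."""
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


document = to_json_document({"order_type": "A"})
```

Sorting the keys is a design choice made here so that equal records always serialize to the same string, which makes documents easy to compare.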

It should be noted that the data stored in the non-relational database, i.e. the documents, is stored in an ES cluster. An ES cluster consists of multiple nodes: one master node and multiple data nodes. The master node may be chosen through an election process and is responsible for cluster-level operations and for managing cluster changes. The data nodes hold the data and perform data-related operations.

Step S304: synchronize the reported document to the target partition corresponding to the data block in the relational database.

After the data to be stored has been written in step S303, in units of data blocks, to the target index corresponding to each data block in the non-relational database and the reported document has been obtained, data is transferred between the relational database and the non-relational database to synchronize the two.

The ways in which data is synchronized between the relational database and the non-relational database are described below.

Mode one: a migration interface is called to create a temporary external table that points to the reported document in the non-relational database, and the document data in the reported document is written over the files of the target partition, overwriting the file data.

Mode two: the reported document is converted into a target file that meets the structured-data requirements of the relational database, and the files of the target partition are overwritten with the target file.

In both synchronization modes, the reported document on which write processing was performed in the non-relational database is the object of synchronization, and it overwrites either the data in the files of the target partition in the relational database or the files themselves. Data is thus synchronized by overwriting between the non-relational database, as the upstream, and the relational database, as the downstream.
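The overwrite semantics shared by both synchronization modes can be sketched with in-memory stand-ins: a dict of documents for the reported document and a dict of row lists for the partitions. The function name sync_partition and the data model are assumptions for illustration; the sketch does not call the migration interface, ES, or Hive.

```python
def sync_partition(reported_documents, partitions, partition_key):
    """Overwrite one target partition with the rows of the reported document.

    Only the partition identified by partition_key is replaced;
    every other partition is left untouched.
    """
    partitions[partition_key] = [dict(doc) for doc in reported_documents.values()]
    return partitions


partitions = {
    ("A", "20181201"): [{"id": "old"}],
    ("B", "20181201"): [{"id": "keep"}],
}
reported = {"1": {"id": "1"}, "2": {"id": "2"}}
sync_partition(reported, partitions, ("A", "20181201"))
```

Replacing the whole partition rather than merging rows means that additions, modifications, and deletions made upstream all reach the downstream partition in a single step.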

It should be noted that data blocks are synchronized to target partitions of internal tables. When the data storage server receives data to be stored of different themes, the target partitions corresponding to the different data to be stored are located in different internal tables.

In the embodiments of the present invention, the data to be stored is divided into data blocks, and, according to the blocking information of each data block, operations are performed on the target index corresponding to the data block in the non-relational database and on the target partition corresponding to the data block in the relational database. The entire storage process therefore involves only the small number of indexes and partitions that correspond to the data blocks of the data to be stored, which narrows the operating range during data storage and improves data storage efficiency. In addition, during storage, the storage result of the non-relational database is used as the object written to the relational database, so that storage in the relational database is achieved through upstream-downstream iteration, which overcomes the problem that the relational database does not support primary keys, foreign keys, or indexes.

In some embodiments, before step S302, the fields of each piece of data in the data to be stored may also be extracted, and the fields of each piece of data combined to obtain the blocking information of that piece of data.

For example, after receiving the data to be stored, the data storage server extracts the fields of each piece of data in it; for each piece of data, its fields form the feature field set of that piece of data. A field is a key of the data, for example: type, creation date, data identifier.

After the feature field set of each piece of data is obtained, the fields in the set are combined according to a configured partition strategy to obtain the blocking information of each piece of data. The partition strategy can be configured according to user requirements. For example, when the data to be stored is order data whose fields include order type, creation date, data identifier, and data information, the partition strategy may specify that order type and creation date are combined to obtain the blocking information. In the embodiments of the present invention, the partition strategy can be configured according to the fields the data includes, and no limitation is placed on the specific fields involved in the strategy or on their number.

Here, for the data to be stored, the blocking information of the included pieces of data is deduplicated to obtain the blocking information corresponding to the data to be stored. For example, the data to be stored includes data 1, data 2, data 3, and data 4; the blocking information of data 1 is blocking information 1, that of data 2 is blocking information 2, that of data 3 is blocking information 1, and that of data 4 is blocking information 3. The blocking information of data 1 and data 3 is the same, blocking information 1, so the blocking information corresponding to the data to be stored includes blocking information 1, blocking information 2, and blocking information 3.
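Combining fields under a partition strategy and then deduplicating the results can be sketched as follows. The field names order_type and created and both function names are hypothetical; the partition strategy is modeled simply as an ordered list of field names.

```python
def blocking_info(record, strategy_fields):
    """Combine the fields named by the partition strategy into blocking information."""
    return tuple(record[field] for field in strategy_fields)


def distinct_blocking_info(records, strategy_fields):
    """Deduplicate the blocking information of all records, keeping first-seen order."""
    return list(dict.fromkeys(blocking_info(r, strategy_fields) for r in records))


orders = [
    {"id": 1, "order_type": "A", "created": "20181201"},
    {"id": 2, "order_type": "B", "created": "20181201"},
    {"id": 3, "order_type": "A", "created": "20181201"},
    {"id": 4, "order_type": "B", "created": "20181202"},
]
infos = distinct_blocking_info(orders, ["order_type", "created"])
```

Four records yield three distinct pieces of blocking information here, matching the deduplication example in the text.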

In the embodiments of the present invention, the data blocks divided according to blocking information, the indexes in the non-relational database, and the partitions in the relational database are associated with one another, so that the numbers of data blocks, indexes, and partitions that can be divided are the same. During data storage, data is written only to the subset of indexes in the non-relational database and the subset of partitions in the relational database that correspond to the data blocks into which the data to be stored is divided, which speeds up data storage. For example, when order type and creation date are combined to obtain the blocking information, and there are 3 order types and 100 dates, enumerating the combinations yields 3*100=300 data blocks, corresponding to 300 indexes in the non-relational database and 300 partitions in the relational database. If, however, the data to be stored includes 100 pieces of data that fall into only 10 data blocks, the remaining 290 data blocks contain no data, and storage operations are performed only on the 10 corresponding indexes in the non-relational database and the 10 corresponding partitions in the relational database.

In some embodiments, before step S303, when the indexes in the non-relational database do not include the target index corresponding to the data block, the target index is created according to the blocking information of the data block.

Here, the blocking information of a data block is associated with the target index corresponding to the data block. For example, if the blocking information is (order type = 'A', creation date = '20181201'), the target index is order_index_A_20181201. As another example, if the blocking information is (order type = 'B', creation date = '20181201'), the target index is order_index_B_20181201.

If the indexes in the non-relational database do not include the target index corresponding to a data block, the target index is created according to the blocking information of the data block, and the data in the data block is written to the documents associated with the newly created target index. If the target index of a data block already exists among the indexes in the non-relational database, the data in the data block is written to the documents associated with the existing target index.
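The create-if-missing behaviour can be sketched with a dict standing in for the non-relational database, mapping index names to their documents. The function name write_block and the use of an id field as the document key are assumptions for illustration.

```python
def write_block(database, index_name, block):
    """Write a data block to its target index, creating the index if needed.

    Returns True if the target index had to be created, False if it existed.
    """
    created = index_name not in database
    documents = database.setdefault(index_name, {})
    for record in block:
        documents[record["id"]] = record
    return created


database = {}
first = write_block(database, "order_index_A_20181201", [{"id": "1"}])
second = write_block(database, "order_index_A_20181201", [{"id": "2"}])
```

The first call creates the index and the second reuses it, so both records end up under the same target index.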

In some embodiments, based on the data storage method shown in Fig. 3, performing write processing in the documents associated with the target index based on the data block may include: selecting a target field from the fields of each record in the data to be stored, and using the target field as the data identifier of the corresponding record; determining, according to the data identifier of each record in the data block, the target document associated with the target index; and performing, in the target document and based on the data in the data block that corresponds to the target document, write processing that includes at least one of: adding data, modifying data, and deleting data.

Here, the target field can be selected from the fields in the feature field set of each record according to a configured data identifier strategy, and the target field is used as the data identifier that uniquely identifies the record. The target field may include one field or multiple fields. When the target field includes multiple fields, the fields are concatenated as strings with a configured separator to obtain the data identifier. It should be noted that the data identifier strategy can be set according to the user's actual needs, and the embodiment of the present invention places no restriction on it. Data identifiers may repeat across different data blocks; in that case, a data identifier uniquely identifies a record only within its own data block.
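A data identifier built from one or several target fields might look like the sketch below. The field names `createtime` and `serial`, the default separator, and the helper name `data_id` are hypothetical; the embodiment leaves the identifier strategy to the user.

```python
from typing import Dict, Sequence


def data_id(record: Dict[str, str], target_fields: Sequence[str],
            sep: str = "_") -> str:
    """Concatenate the chosen target fields into one identifier string."""
    return sep.join(str(record[field]) for field in target_fields)


record = {"createtime": "20181201", "serial": "002", "type": "A"}

single = data_id(record, ["serial"])                  # one target field
combined = data_id(record, ["createtime", "serial"])  # several target fields
```

Here `single` is "002" and `combined` is "20181201_002". Because the identifier only has to be unique within one data block, a short identifier such as `single` can be sufficient when the blocking fields already separate the records.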

For a given record, the target field and the fields of the blocking information may overlap, that is, one field can serve both as a target field and as a field in the blocking information; alternatively, the target field and the fields of the blocking information may not overlap at all. The embodiment of the present invention does not limit this. After the data identifiers are determined, write processing is performed, according to the data in each data block, on the target documents corresponding to the data identifiers under each target index. For example, data block 1 includes data 1, data 2, and data 3, and data block 2 includes data 1 and data 6, where data 1 in data block 1 and data 1 in data block 2 are different records. Data block 1 corresponds to target index 1, and data block 2 corresponds to target index 2. The target documents for data 1, data 2, and data 3 in data block 1 are then document 1, document 2, and document 3 under target index 1, and the target documents for data 1 and data 6 in data block 2 are document 1 and document 6 under target index 2. The documents under target index 1 include document 1 to document 100, and the documents under target index 2 include document 1 to document 30. Document 1 to document 30 under target index 1 hold different document data from document 1 to document 30 under target index 2.

Here, when the write processing is adding data, a new target document is created under the target index and the document content is added to it. When the write processing is modifying data, the document content of an existing document under the target index is modified. When the write processing is deleting data, the existing document under the target index is deleted.

In some embodiments, determining the target document associated with the target index according to the identifier of each record in the data block may include: for each record, when the documents associated with the target index do not include the target document corresponding to the identifier, creating the target document according to the identifier.

Here, the identifier of a record in a data block is associated with the target document under the target index corresponding to that record. For example, if the blocking information is (order type='A', creation date='20181201') and the identifier of a record in the corresponding data block is 20181201002, the corresponding target document is the document "20181201002" under the target index order_index_A_20181201.

For each record, if the documents associated with the target index do not include the target document corresponding to the record's identifier, the target document is created and the record is written into the newly created target document. If the documents associated with the target index already include the target document corresponding to the record's identifier, the record is written into the existing target document.
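The three kinds of write processing, together with the create-if-absent rule for target documents, can be sketched with plain dictionaries. The helper `apply_write`, the operation names, and the `status` field are assumptions for illustration; a real deployment would issue the equivalent requests to the non-relational database.

```python
from typing import Dict, Optional

Index = Dict[str, dict]  # document ID -> document content


def apply_write(index: Index, doc_id: str, op: str,
                content: Optional[dict] = None) -> None:
    """Apply one write operation to the target document inside one index."""
    if op == "delete":
        index.pop(doc_id, None)          # remove the existing document
    elif op == "add":
        index[doc_id] = dict(content or {})  # create the target document
    elif op == "modify":
        # Create the target document first if the index does not hold it yet.
        index.setdefault(doc_id, {}).update(content or {})
    else:
        raise ValueError(f"unknown write operation: {op}")


# Two target indexes; the same document ID may appear in both.
index_a: Index = {}
index_b: Index = {}

apply_write(index_a, "1", "add", {"status": "new"})
apply_write(index_b, "1", "add", {"status": "new"})
apply_write(index_a, "1", "modify", {"status": "paid"})
apply_write(index_b, "1", "delete")
```

Document "1" under `index_a` and document "1" under `index_b` are independent: modifying one and deleting the other leaves `index_a` with a single paid order and `index_b` empty.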

Next, the two modes of synchronizing the non-relational database and the relational database in S304 of the data storage method provided by the embodiment of the present invention are described.

Mode one

When the reported document needs to be synchronized to the target partition corresponding to the data block in the relational database, the target external table corresponding to the data block is determined; based on the called migration interface, the metadata of the target external table is generated according to the blocking information of the data block, and this metadata makes the target external table point to the reported document corresponding to the data block; and, based on the target external table, the document data of the reported document is overwritten into the file corresponding to the target partition.

The target external tables in the relational database are determined according to the blocking information of each data block divided from the data to be stored, and the migration interface between the relational database and the non-relational database is called. Through the migration interface, the metadata of the target external table corresponding to each data block is generated according to the blocking information of that data block, so that the target external table points to the reported document. For example, if the data to be stored is divided into data block 1, data block 2, and data block 3, the external table corresponding to data block 1 points to reported document 1 of data block 1, the external table corresponding to data block 2 points to reported document 2 of data block 2, and the external table corresponding to data block 3 points to reported document 3 of data block 3.

The metadata of an external table includes the table name, the columns of the table, the attributes of the table (whether it is an external table, and so on), the directory where the table's data resides, and so on. The table name in the metadata of the external table is generated according to the blocking information of the data block, that is, according to the target index.

In the embodiment of the present invention, one data block corresponds to one external table, namely a temporary Hive external table, and to one partition of the Hive internal table. A temporary Hive external table has no partitions. The blocking information is used to generate the table name of the temporary Hive external table. For example, if the blocking information is (order type='A', creation date='20181201'), the target external table is order_external_table_A_20181201.

After the target external table is determined, the document data in the reported document pointed to by the target external table is read through the target external table, and the document data that was read is written into the file corresponding to the target partition using Structured Query Language (SQL). For example, the document data of reported document 1 is overwritten into the file corresponding to target partition 1, the document data of reported document 2 is overwritten into the file corresponding to target partition 2, and the document data of reported document 3 is overwritten into the file corresponding to target partition 3, which realizes the overwriting of document data.

In practical applications, the document data in the reported document pointed to by the target external table can be read through sparkSQL. The framework of sparkSQL is shown in Fig. 4 and includes three steps: read the data, process the data, and write the final result. That is, sparkSQL 402 reads the document data of the reported document in the non-relational database 401 into sparkSQL 402, performs data processing or algorithm implementation on the document data that was read, and then outputs the processed data as the final result to the corresponding output source, namely the file corresponding to the target partition in the relational database 403.

In some embodiments, when the target external table corresponding to the data block is determined and the external tables in the relational database do not include the target external table corresponding to the data block, the target external table is created according to the blocking information of the data block.

The blocking information of a data block is associated with the target external table corresponding to that data block. For example, if the blocking information is (order type='A', creation date='20181201'), the target external table is order_external_table_A_20181201. As another example, if the blocking information is (order type='B', creation date='20181201'), the target external table is order_external_table_B_20181201. Here, order is the theme of the data to be stored.

If the external tables in the relational database do not include the target external table corresponding to a data block, the target external table is created according to the blocking information of that data block, and the metadata generated according to the blocking information of the data block is used as the metadata of the target external table. If the target external table corresponding to a data block already exists among the external tables in the relational database, the metadata generated according to the blocking information of the data block is used directly as the metadata of the existing target external table.
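As a rough illustration of what creating such an external table could look like when the migration interface is the Elasticsearch-Hadoop connector, the sketch below only assembles a Hive DDL string. The storage handler class and the `es.resource` table property come from that connector's Hive integration; the column list, the helper names, and the exact form of the resource value are assumptions and may differ between connector versions.

```python
from typing import Dict


def external_table_name(theme: str, block_info: Dict[str, str]) -> str:
    """Build a name such as order_external_table_A_20181201."""
    return "_".join([theme, "external", "table", *block_info.values()])


def external_table_ddl(theme: str, block_info: Dict[str, str],
                       columns: Dict[str, str]) -> str:
    """Assemble DDL that points a Hive external table at the target index."""
    table = external_table_name(theme, block_info)
    index = "_".join([theme, "index", *block_info.values()])
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
        "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' "
        f"TBLPROPERTIES('es.resource' = '{index}')"
    )


block = {"type": "A", "createtime": "20181201"}
# Hypothetical column list for an order record.
ddl = external_table_ddl("order", block, {"order_id": "STRING", "amount": "DOUBLE"})
```

The statement only registers metadata; no document data is copied when it runs, which matches the description of the external table as a pointer to the reported document.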

It should be noted that the data transmission process described in mode one involves the ES index, the temporary Hive external table, and the partition of the Hive internal table. A data block travels from the ES index through the migration interface (for example, the Elastic Search-hadoop interface) to the temporary Hive external table, and from the temporary Hive external table through spark-sql to the Hive internal table.

During data storage, each round of storage transmits only the relevant data blocks. A data block corresponds to exactly one ES index, exactly one temporary Hive external table, and exactly one partition of the Hive internal table. The target index, the target temporary Hive external table, and the target partition of the Hive internal table are all determined solely from the blocking information of each data block, so data is stored without duplication or omission, which guarantees data integrity.

The temporary Hive external table is the bridge for data transmission between the ES index and the Hive internal table. Going from the ES index to the temporary Hive external table through the Elastic Search-hadoop interface involves no reading or writing of data, only the creation of metadata, and therefore takes almost no time. Going from the temporary Hive external table to the Hive internal table through spark-sql is a migration between Hive tables that is computed entirely in memory and is very fast, which improves data storage efficiency.

Mode two

When the reported document needs to be synchronized to the target partition corresponding to the data block in the relational database, the reported document corresponding to the data block is read from the non-relational database; the reported document is format-converted to obtain a target file that conforms to the data structure rules of the relational database; and the file corresponding to the target partition is overwritten with the target file.

Here, the relational database reads the reported document corresponding to each data block directly from the non-relational database and rearranges the structure of the reported document. The reported document is format-converted as a whole to generate the target file, and the target file directly overwrites the file under the path of the target partition, which realizes the synchronization of data. When the relational database is Hive, the generated target file is an HDFS file.

In some embodiments, before step S304, when the partitions in the relational database do not include the target partition corresponding to the data block, the target partition is created according to the blocking information of the data block.

Here, the blocking information of a data block is associated with the target partition corresponding to that data block. For example, if the blocking information is (order type='A', creation date='20181201'), the target partition is (type='A', createtime='20181201'). As another example, if the blocking information is (order type='B', creation date='20181201'), the target partition is (type='B', createtime='20181201').

If the partitions in the relational database do not include the target partition corresponding to a data block, the target partition is created according to the blocking information of that data block, and the reported document corresponding to the blocking information is synchronized into the file in the target partition. If the target partition corresponding to a data block already exists in the relational database, the reported document corresponding to the data block is synchronized into the file in the existing target partition.
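The mapping from blocking information to a partition, and the create-if-absent step, can be sketched as a Hive statement assembled in Python. `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` is standard Hive DDL; the helper names and the assumption that the blocking keys are named `type` and `createtime` follow the example above and are otherwise illustrative.

```python
from typing import Dict


def partition_spec(block_info: Dict[str, str]) -> str:
    """Render blocking information as a Hive partition specification."""
    return ", ".join(f"{key}='{value}'" for key, value in block_info.items())


def add_partition_sql(table: str, block_info: Dict[str, str]) -> str:
    """Assemble DDL that creates the target partition only if it is missing."""
    return (
        f"ALTER TABLE {table} "
        f"ADD IF NOT EXISTS PARTITION ({partition_spec(block_info)})"
    )


block = {"type": "A", "createtime": "20181201"}
spec = partition_spec(block)
sql = add_partition_sql("order_table", block)
```

Because of the `IF NOT EXISTS` clause, the same statement can be issued for every data block in a round without first checking whether the partition is already present.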

Here, a partition is a partition of the internal table. The metadata of the internal table includes the table name, the columns of the table, the partitions and their attributes, the attributes of the table (whether it is an external table, and so on), the directory where the internal table's data resides, and so on. In the internal table metadata, information such as the table name, the column names, the table attributes, and whether the table is external can be static information preset according to the business theme, while the partitions and their attributes are dynamic information generated according to the blocking information.

The non-relational database provided in the embodiment of the present invention may be ES. ES is a cluster with multiple nodes, one of which is the master node; the master node can be chosen by election, and the master/slave distinction applies within the cluster. A complete ES index can be divided into multiple shards (fragments). The benefit is that a large index can be split into several parts distributed over different nodes, which constitutes distributed storage. The number of shards can only be specified before the index is created and cannot be changed afterward. To improve query throughput or achieve high availability, shard replicas can be used.

A replica is an exact copy of a shard, and each shard can have zero or more replicas. ES can hold many identical shards; one of them is selected to handle operations that change the index, and this special shard is called the primary shard. When the primary shard is lost, for example when the data where the shard resides is unavailable, the cluster promotes a replica to be the new primary shard.

Fig. 5 shows an exemplary architecture of ES. Its seven layers are described below.

The first layer is the gateway (Gateway) layer, the index data storage format supported by Elastic Search, which stores the metadata of the ES cluster. When Elastic Search is shut down and restarted, it reads the data from the gateway.

The second layer is the distributed Lucene directory (Distributed Lucene directory) layer. Elastic Search is developed on the Lucene framework, and Lucene provides a simple yet powerful application interface for full-text indexing and search.

The third layer contains the ways in which Elastic Search processes data: the index module, the search module, mapping (mapping), and data sources (river). Mapping defines the processing rules for the fields of an index type, such as how the index is built and the data types, and is equivalent to a schema in a relational database. River is a plug-in that runs inside the Elastic Search cluster; it is mainly used to fetch heterogeneous data from outside and then create indexes in Elastic Search. Common plug-ins include the rabbitmq river and the twitter river.

The fourth layer contains the mechanism by which Elastic Search discovers nodes automatically, which may include Zen. Zen implements automatic node discovery as well as master node (master) election: if the master fails and cannot work, the other nodes automatically hold an election and produce a new master. Elastic Search is a P2P-based system: it first discovers existing nodes through a broadcast mechanism, then communicates between nodes through a multicast protocol, and also supports point-to-point interaction.

The fifth layer is the script execution function of Elastic Search, which makes it easy to process the data returned by a query. The script types supported by Elastic Search may include mvel, js, python, and so on.

The sixth layer is third-party plug-ins (3rd plugins); Elastic Search supports installing a wide variety of third-party plug-ins.

The seventh layer is the transport layer of Elastic Search, which defines how Elastic Search interacts and supports three protocols: thrift, memcached, and HTTP. Elastic Search uses HTTP for transmission by default.

Above the seven-layer structure of Elastic Search shown in Fig. 5 are Java Management Extensions (JMX), the Restful Style API, and Java (Netty). JMX provides embedded management functions for ES and makes it possible to develop seamlessly integrated systems flexibly across heterogeneous operating system platforms, system architectures, and network transport protocols. The Restful Style application programming interface (API) is the API mode supported by Elastic Search, and Elastic Search exchanges information with web pages (Web) through it. Java (Netty) is the development language of Elastic Search: because Elastic Search is developed on the Lucene framework, it can use a development language that Lucene supports, such as Java.

The process of creating an index in ES is shown in Fig. 6. After the master node, that is, the coordinating node, receives an index creation request, it routes the request to the node where the shard resides. After that node receives the request from the coordinating node, it writes the request to the transaction log and adds the document to the in-memory cache. If the request is processed successfully on the primary shard, it is sent in parallel to the replicas of that shard. Only after the transaction log has been synchronized on all primary shards and their replicas does the acquisition server receive the acknowledgement.

The relational database provided in the embodiment of the present invention may be Hive. The system architecture of Hive may be as shown in Fig. 7 and includes two parts, a Hive part and a Hadoop part. The Hive part includes: the command-line interface (Command-Line Interface, CLI), Java Database Connectivity (JDBC)/Open Database Connectivity (ODBC), the web graphical user interface (Graphical User Interface, GUI), the Thrift service (server), and the driver (driver). CLI, JDBC/ODBC, and the Web GUI are user interfaces; the Thrift server is the engine service; and the driver includes an interpreter, a compiler, and an optimizer.

The user interfaces implement the interaction between Hive and users or relational databases. The CLI is the shell command line. JDBC/ODBC is the Java interface of Hive and interacts with the driver through the engine service. The Web GUI accesses Hive through a browser.

The Thrift server is a software framework for developing scalable, cross-language services. It combines a powerful software stack with a code generation engine to build services that work seamlessly and efficiently across programming languages such as C++, Java, Go, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, and OCaml.

The interpreter (executor), compiler (compiler), and optimizer (optimizer) carry a query statement (Hive Query Language, HQL) through lexical analysis, syntax analysis, compilation, optimization, and query plan (plan) generation. The generated query plan is stored in the HDFS of Hadoop and is subsequently executed through MapReduce calls.

The data of Hive is stored in the HDFS of Hadoop, and most queries are completed by MapReduce.

HDFS stores the files of a Hadoop cluster across all storage nodes. The layer above HDFS (in this architecture) is the MapReduce engine, which consists of job processing services (JobTrackers). The HDFS architecture is built on a specific set of nodes: the name node (NameNode), of which there is only one and which provides the metadata service inside HDFS, and the data nodes (DataNode), which provide storage blocks for HDFS.

The JobTracker corresponds to the NameNode, and the subtask processing service (TaskTracker) corresponds to the DataNode. The JobTracker receives a job (Job), schedules each subtask (task) of the job to run on a TaskTracker, and monitors the subtasks; if it finds a failed task, it reruns that task.

Next, the data storage method provided by the embodiment of the present invention is described, taking ES as the non-relational database and Hive as the relational database.

The types of data storage may include: adding new data, supplementing historical data, deleting historical data, and modifying historical data.

In the embodiment of the present invention, data is stored in both ES and Hive. During data storage, the data to be stored is cleaned to filter out invalid data and keep only valid data; the valid data is arranged into JSON structure and written into the non-relational database ES; the migration interface is then called to convert the structure of the storage result in ES, and the converted storage result is overwritten into Hive. This not only backs up the data and serves the different application scenarios of relational and non-relational databases but, more importantly, makes storage fast. In addition, cleaning the data to be stored solves problems such as duplicate reports and illegal entries in the data to be stored.

Here, the already-stored data in ES and Hive is massive, whereas the data reported by the business front end, that is, the data to be stored, is small in volume. During data storage, the data to be stored either revises already-stored data or is added to the database. In the embodiment of the present invention, the storage process is designed as an upstream-downstream iteration: the storage result of ES serves as the object written into Hive, the ES documents overcome the fact that Hive does not support primary keys, foreign keys, or indexes, and the division into data blocks enables fast data storage.

The data storage method provided by the embodiment of the present invention is shown in Fig. 8 and is described below with reference to the steps shown in Fig. 8.

Step S801: receive the data to be stored.

Here, the business front end collects data and reports it to the data storage server. After receiving the reported data, the data storage server treats it as the data to be stored, filters out the invalid data that does not conform to the reporting protocol, and keeps only the valid data in the reported data.

Step S802: determine the blocking information of the data to be stored, and divide the data to be stored into data blocks according to the blocking information.

For each record of the data to be stored, one or more fields are extracted and combined as the blocking information of that record, and records with the same blocking information are placed in the same data block. In other words, each record in the data to be stored falls into the data block that corresponds to its blocking information.
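The division in step S802 amounts to grouping records by the tuple of their blocking fields. The following minimal sketch assumes the blocking fields are named `type` and `createtime` and uses four made-up order records; the function name `split_into_blocks` is likewise hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_into_blocks(records: List[dict],
                      block_fields: Sequence[str]) -> Dict[Tuple[str, ...], List[dict]]:
    """Group records so that equal blocking information lands in one block."""
    blocks: Dict[Tuple[str, ...], List[dict]] = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in block_fields)
        blocks[key].append(record)
    return dict(blocks)


# Illustrative records only; not the data of the embodiment.
records = [
    {"order_id": "1", "type": "A", "createtime": "20181201"},
    {"order_id": "2", "type": "B", "createtime": "20181201"},
    {"order_id": "3", "type": "A", "createtime": "20181201"},
    {"order_id": "4", "type": "A", "createtime": "20181202"},
]
blocks = split_into_blocks(records, ["type", "createtime"])
```

The four records fall into three data blocks, and only combinations that actually occur in the data produce a block.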

Step S803: convert the format of the data to be stored according to the data structure of ES.

Here, according to the data structure requirements of ES, each record in the data to be stored is arranged into JSON structure.

Step S804: construct the data file corresponding to each record in the data to be stored.

For each record in a data block of the data to be stored, one or more fields are extracted as the data identifier (ID) of that record, so as to uniquely identify each record in the data block. When the data ID includes multiple fields, the fields can be concatenated as strings with "_" as the separator to generate the data ID of the record. The data ID of the record is then used as the data file ID of the document corresponding to the record (also called a data file), which determines the document corresponding to that record.

Step S805: write the data of each data block into ES to obtain the storage result.

According to the blocking information of each data block in the data to be stored, one or more ES indexes (index) are generated in ES, with one data block corresponding to one index. The index corresponding to a data block contained in the data to be stored is called a target index. In ES, when the index corresponding to a piece of blocking information does not exist, that index is created.

For the data in the data block corresponding to an index, write processing such as adding, deleting, modifying, and deduplicating is performed on the data files identified by the data file IDs under that index. The resulting storage result is the reported document, and the data to be stored is thereby stored in ES.

Step S806: determine the target partition corresponding to each data block according to the blocking information of each data block.

According to the blocking information of the data to be stored from S802, one or more partitions (partition) of the Hive internal table are determined in Hive as the target partitions corresponding to the data blocks, with one data block corresponding to one partition. When the Hive table does not contain the partition corresponding to a piece of blocking information, that partition is created.

Step S807: determine the temporary external table corresponding to each data block in the data to be stored according to the blocking information of each data block.

According to the blocking information of the data to be stored from S802, one or more temporary Hive external tables are determined as the target external tables corresponding to the data blocks, with one data block corresponding to one target external table. When the data to be stored forms a single data block, one temporary Hive external table is determined; when the data to be stored includes multiple data blocks, multiple temporary Hive external tables are determined, and their number equals the number of data blocks into which the data to be stored is divided.

When Hive does not contain the temporary external table corresponding to a piece of blocking information, that temporary external table is created.

Step S808: call the migration interface and point each temporary external table to the storage result.

According to the data structure requirements of the relational database, the migration interface, namely the Elastic Search-hadoop interface, is called to convert the structure of each storage result from S805 and to point the temporary Hive external tables to the storage results. This step only generates metadata in the external table to describe the attributes and mapping of the storage result; no data is read or written, so it takes almost no time.

Step S809: through the temporary external table corresponding to each data block, synchronize the storage result corresponding to each data block to the target partition corresponding to that data block in Hive.

For each data block in the data to be stored, spark-sql is used to migrate the specific data in the storage result, that is, the data content, one by one from the temporary Hive external table into the file corresponding to the partition of the Hive internal table by overwriting, which completes the storage in Hive.
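The overwrite in step S809 could be expressed as a Hive-style `INSERT OVERWRITE ... PARTITION` statement per data block. The sketch below only builds the statement text; the table names follow the examples in this description, while the column list and the helper name `sync_sql` are assumptions. Submitting the statement through a Spark session with Hive support is left out.

```python
from typing import Dict, Sequence


def sync_sql(internal_table: str, external_table: str,
             block_info: Dict[str, str], columns: Sequence[str]) -> str:
    """Assemble a statement that overwrites one partition from one external table."""
    spec = ", ".join(f"{key}='{value}'" for key, value in block_info.items())
    # With a static partition spec, the SELECT list holds only non-partition columns.
    cols = ", ".join(columns)
    return (
        f"INSERT OVERWRITE TABLE {internal_table} PARTITION ({spec}) "
        f"SELECT {cols} FROM {external_table}"
    )


block = {"type": "A", "createtime": "20181201"}
statement = sync_sql(
    "order_table",
    "order_external_table_A_20181201",
    block,
    ["order_id", "amount"],  # hypothetical non-partition columns
)
```

Because the statement names a single partition, rerunning it for the same data block replaces that partition's files and leaves every other partition of the internal table untouched.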

In the data storage method described in Fig. 8, steps S803 to S805 are operations in the non-relational database, and steps S806 to S809 are operations in the relational database.

In the data storage method shown in Fig. 8, relevant information is extracted from the data to be stored, the data to be stored is divided into multiple data blocks according to the extracted information, and, according to the blocking information of each data block, the different data blocks are stored into the corresponding index in ES and the corresponding internal table partition in Hive. It should be noted that ES stores different data blocks in different indexes, and Hive stores different data blocks in different internal table partitions. Therefore, in the embodiment of the present invention, the whole storage process for the data to be stored only needs to operate on a small number of data blocks.

In the data storage method shown in Fig. 8, a temporary Hive external table is created through the Elastic Search-hadoop interface to read the index data of Elastic Search directly, and spark-sql then overwrites the data into the partition of the Hive internal table, which realizes data transmission between the two databases. Alternatively, to transmit data between the two databases, the index data of Elastic Search can be read directly and rearranged into the required structure to generate HDFS files, which overwrite the original files under the corresponding path of the Hive internal table partition.

Take order data as an example of the data storage method provided by the embodiment of the present invention: the already-stored orders number in the hundreds of millions, and each report contains orders in the tens of thousands. Normally, an order that is new is stored directly. In practice, however, some orders revise already-stored data. The current solution is to perform a full table scan of the already-stored data to find the orders that need revising, which takes a long time. In the embodiment of the present invention, the data files in ES correspond one-to-one to orders, so orders can be revised conveniently by managing data file IDs, and with the data block design only part of the already-stored data needs to be scanned. Considering that the actual business scenario requires data to be stored in both the non-relational database Elastic Search and the relational database Hive, the present invention designs the storage process as an upstream-downstream iteration: the storage result of Elastic Search serves as the data source written into Hive, and the data file IDs of Elastic Search overcome the fact that Hive does not support primary keys, foreign keys, or indexes.

Next, the data storage method provided by the embodiment of the present invention is further described, taking as an example reported data of 100 order records and already-stored data consisting of the massive order data of the past three years. The 100 reported order records may be new order data, or may be data that modifies or deletes order data already stored. Here, the already-stored data is kept in the table order_table in Hive, and add, delete, and modify operations are performed on the data stored in order_table. In the prior art, a full table scan of order_table is required to judge whether the data to be stored has already been stored and whether the already-stored data needs to be updated.

When data storage server is using 100 order datas as data to be put in storage, the order of every order data of extraction Type of service and order creation date, using order type of service and the two fields of order creation date as blocking information pair 100 order datas carry out the division of data block.Wherein, an order data can only belong to some data block, a certain number It may include an order data or a plurality of order data according to block.When the blocking information of a data block be order type of service=' A', the order creation date=' 20181201', totally 10 order datas, an index are created according to blocking information, such as in ES Order_index_A_20181201 stores this 10 order datas, creates a subregion in Hive for order_table, Such as type='A', createtime='20181201', this 10 order datas are stored.It is all to belong to ordering for this data block The write operations such as the additions and deletions of forms data change, relate only to the index:order_index_A_20181201's and Hive of ES The subregion type='A', createtime='20181201' of order_table.Wherein, the data reported receive data The division of block, primary code are realized；ES is written from the division of data block, primary code is realized, Hive is synchronized to from ES, is used Elastic Search-hadoop interface is realized.

Here, in ES, a data block corresponds to the index order_index_A_20181201, a data document corresponds to each JSON object in the index, and each data document has a different document ID. In Hive, a data block corresponds to the partition (type='A', createtime='20181201') of order_table, and a data document corresponds to one row of data in the table.

It should be noted that too many ES indexes affect cluster operation and cluster recovery speed, and too many Hive partitions increase the burden on the NameNode. Therefore, when determining the blocking information, the relevant fields of the data to be put in storage are combined and deduplicated to reduce the number of data blocks corresponding to the data to be put in storage. For example, if the order business type includes 3 types and the order creation date covers 100 days, combining them by enumeration generates 3*100=300 data blocks, which corresponds to 300 indexes in ES and 300 partitions in Hive. However, some of these data blocks may not contain any order data. Therefore, the relevant fields of the data to be put in storage are extracted, combined, and deduplicated to obtain the number of data blocks that actually correspond to the data to be put in storage.
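The difference between enumerating all combinations and deduplicating the combinations that actually occur can be sketched as follows. This is a minimal illustration only: the record layout, the field names type and createtime as dictionary keys, and the placeholder dates are assumptions, not part of the embodiment.

```python
from itertools import product

# Hypothetical records: only the two blocking fields matter here.
orders = [
    {"type": "A", "createtime": "20181201"},
    {"type": "A", "createtime": "20181201"},
    {"type": "B", "createtime": "20181201"},
    {"type": "A", "createtime": "20181202"},
]


def enumerated_blocks(types, dates):
    """Every type/date combination, whether or not any order uses it."""
    return set(product(types, dates))


def observed_blocks(records):
    """Extract, combine, and deduplicate the blocking fields actually present."""
    return {(r["type"], r["createtime"]) for r in records}


all_types = ["A", "B", "C"]
all_dates = [f"day{i:03d}" for i in range(100)]  # placeholder dates, 100 days

print(len(enumerated_blocks(all_types, all_dates)))  # enumeration: 3 * 100 blocks
print(len(observed_blocks(orders)))                  # only blocks that hold data
```

Under these assumptions, only the blocks returned by observed_blocks would need an index in ES and a partition in Hive.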

Here, taking the case where the data to be put in storage includes the following 5 data records as an example, the data blocks, indexes, and partitions in the embodiment of the present invention are described.

Data 1 are as follows:

Data 2 are as follows:

Data 3 are as follows:

Data 4 are as follows:

Data 5 are as follows:

For above-mentioned 5 data, by the step 82 of Fig. 8, obtained blocking information is as follows:

Blocking information of data block 1: order business type = 'A', order creation date = '20181201';

Blocking information of data block 2: order business type = 'B', order creation date = '20181201';

Blocking information of data block 3: order business type = 'A', order creation date = '20181202';

Blocking information of data block 4: order business type = 'B', order creation date = '20181202';

Each piece of blocking information corresponds to one data block, and the 5 reported data records are divided into 4 data blocks in total. Data 1 and data 2 fall into data block 1, data 3 falls into data block 2, data 4 falls into data block 3, and data 5 falls into data block 4.

The ES indexes corresponding to the respective pieces of blocking information are order_index_A_20181201, order_index_B_20181201, order_index_A_20181202, and order_index_B_20181202. The partitions of order_table in Hive corresponding to the respective pieces of blocking information are:

Partition: type='A', createtime='20181201';

Partition: type='B', createtime='20181201';

Partition: type='A', createtime='20181202';

Partition: type='B', createtime='20181202'.
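The mapping from the 5 data records to data blocks, ES index names, and Hive partitions can be sketched as follows. The record contents are hypothetical stand-ins, since only the blocking fields of data 1 to data 5 are given in the text; the naming patterns follow the index and partition names listed above.

```python
from collections import defaultdict

# Hypothetical stand-ins for data 1 to 5; only the blocking fields follow the text.
records = [
    {"name": "data 1", "type": "A", "createtime": "20181201"},
    {"name": "data 2", "type": "A", "createtime": "20181201"},
    {"name": "data 3", "type": "B", "createtime": "20181201"},
    {"name": "data 4", "type": "A", "createtime": "20181202"},
    {"name": "data 5", "type": "B", "createtime": "20181202"},
]


def index_name(block):
    """ES index name derived from the blocking information."""
    order_type, created = block
    return f"order_index_{order_type}_{created}"


def partition_spec(block):
    """Hive partition specification derived from the same blocking information."""
    order_type, created = block
    return f"type='{order_type}', createtime='{created}'"


def divide_into_blocks(rows):
    """Group records by blocking information; each record lands in exactly one block."""
    blocks = defaultdict(list)
    for row in rows:
        blocks[(row["type"], row["createtime"])].append(row["name"])
    return dict(blocks)


blocks = divide_into_blocks(records)
for block, members in blocks.items():
    print(index_name(block), "|", partition_spec(block), "|", members)
```

Because the index name and the partition specification are both computed from the same blocking information, a data block identifies its write targets in both databases without any lookup over existing data.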

By taking data 1 as an example, the data structure of ES and Hive are illustrated.

Data structure of the data 1 in ES are as follows:

Data structure of the data 1 in Hive are as follows:

20181201001\t00001\t3\t1。
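A conversion from a JSON-style document to a tab-delimited Hive row of this shape might look like the sketch below. The four column names and their order are assumptions made for illustration; the text does not name the fields. The blocking fields are left out of the row on the assumption that they are carried by the partition, since Hive keeps partition column values in the partition path rather than in the data files.

```python
# Hypothetical column order; the text does not name the four fields of the row.
COLUMNS = ["order_id", "user_id", "quantity", "status"]


def to_hive_row(document, columns=COLUMNS):
    """Serialize one document as a tab-delimited text row."""
    return "\t".join(str(document[c]) for c in columns)


# Hypothetical document for data 1; type and createtime would select the partition.
doc = {
    "order_id": "20181201001",
    "user_id": "00001",
    "quantity": 3,
    "status": 1,
    "type": "A",
    "createtime": "20181201",
}
row = to_hive_row(doc)
print(row)
```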

The blocking information of the data to be put in storage is synchronized to ES and Hive, which are connected in series.

The 4 indexes corresponding to the 4 data blocks are located in ES (if an index does not exist, it is created using the blocking information), and, according to the 5 order records, add, delete, and modify operations are performed only on the existing data in these 4 indexes in ES.

The 4 temporary external tables corresponding to the 4 data blocks are located in Hive (if an external table does not exist, it is created using the blocking information); the Elastic Search-hadoop interface is then used to transfer all data documents in the 4 ES indexes to the 4 temporary external tables in Hive. The 4 partitions of the order_table table in Hive corresponding to the 4 data blocks are then located (if a partition does not exist, it is created using the blocking information), and spark-sql is used to transfer all data documents in the 4 temporary external tables in Hive to the 4 partitions of the order_table table in Hive.
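The statements involved in this synchronization for one data block could be generated roughly as follows. This is a sketch under stated assumptions: the temporary table name tmp_order_A_20181201 and the column list are hypothetical, and the form of the es.resource value depends on the Elasticsearch version in use. The storage handler class and the es.resource property are those of the elasticsearch-hadoop Hive integration.

```python
def quote(name):
    """Backtick-quote an identifier for HiveQL / Spark SQL."""
    return f"`{name}`"


def partition_clause(order_type, created):
    return f"PARTITION (`type`='{order_type}', `createtime`='{created}')"


def external_table_ddl(ext_table, es_index, columns):
    """Temporary external table whose rows are read from one ES index."""
    cols = ", ".join(f"{quote(n)} {t}" for n, t in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {quote(ext_table)} ({cols}) "
        "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' "
        f"TBLPROPERTIES ('es.resource' = '{es_index}')"
    )


def add_partition_ddl(table, order_type, created):
    """Create the target partition only if it does not exist yet."""
    return (
        f"ALTER TABLE {quote(table)} ADD IF NOT EXISTS "
        f"{partition_clause(order_type, created)}"
    )


def overwrite_sql(table, ext_table, columns, order_type, created):
    """Overwrite exactly one partition with the contents of the external table."""
    names = ", ".join(quote(n) for n, _ in columns)
    return (
        f"INSERT OVERWRITE TABLE {quote(table)} "
        f"{partition_clause(order_type, created)} "
        f"SELECT {names} FROM {quote(ext_table)}"
    )


# Hypothetical columns; values are interpolated directly, so inputs must be trusted.
COLUMNS = [("order_id", "string"), ("user_id", "string"),
           ("quantity", "int"), ("status", "int")]

statements = [
    external_table_ddl("tmp_order_A_20181201", "order_index_A_20181201", COLUMNS),
    add_partition_ddl("order_table", "A", "20181201"),
    overwrite_sql("order_table", "tmp_order_A_20181201", COLUMNS, "A", "20181201"),
]
for statement in statements:
    print(statement)
```

The INSERT OVERWRITE form is chosen here because it rewrites only the named partition, which matches the idea that storage touches only the partitions of the affected data blocks.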

This completes the storage of the reported data. Subsequent data analysis can be performed on ES or on Hive, to meet business requirements in different scenarios.

The data storage method provided by the embodiment of the present invention has the following characteristics:

(1) Constructing data blocks. Relevant information is extracted from the data, the data to be put in storage and the data already in storage are each divided into blocks, the blocks are matched, and only the target data blocks are transmitted and stored. The data block is the basic unit of data transmission: to transfer data from ES to Hive, only a few data blocks need to be operated on, and most data blocks require no operation.

(2) Constructing data documents. Relevant information is extracted from each data record of a data block to generate a unique ID, and a data document is constructed according to the generated ID. Data processing such as addition, deletion, modification, deduplication, and filtering is performed only on the target data documents. The data document is the basic unit of write processing. ES handles additions, deletions, and modifications of the data in data documents more readily than Hive; in addition, because the program is designed with ES and Hive connected in series, the add, delete, and modify operations are performed in ES, and the data blocks are transferred to Hive after those operations. Within a data block, each data document is unique, which solves the problem, common in reporting scenarios, of the same order being put in storage repeatedly.
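The effect of a unique ID on repeated reports can be sketched with an in-memory stand-in for one index. The choice of order_id as the identifying field, the use of a SHA-1 digest, and the dictionary standing in for an ES index are all assumptions made for illustration.

```python
import hashlib


def document_id(record, id_fields=("order_id",)):
    """Derive a stable ID from pre-selected fields (order_id is hypothetical)."""
    raw = "|".join(str(record[f]) for f in id_fields)
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def apply_report(index, record, deleted=False):
    """Add, modify, or delete the single document selected by the record's ID."""
    doc_id = document_id(record)
    if deleted:
        index.pop(doc_id, None)
    else:
        index[doc_id] = record  # same ID: overwrite instead of duplicating
    return doc_id


index = {}  # stand-in for the documents of one index
apply_report(index, {"order_id": "20181201001", "quantity": 3})
apply_report(index, {"order_id": "20181201001", "quantity": 3})  # repeated report
apply_report(index, {"order_id": "20181201001", "quantity": 5})  # modification
print(len(index), list(index.values()))
```

Because the ID is computed from the record itself, reporting the same order twice addresses the same document, so the repeat is absorbed as an overwrite rather than stored as a second copy.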

(3) Considering that the data already in storage is massive, that the data to be put in storage is small in volume, and that the data needs to be stored in two kinds of databases, the characteristics of the two databases and the Elastic Search-hadoop interface are combined, and the techniques of data blocking and unique identification are used to transmit, process, and store the data, achieving fast storage.

Compared with data storage methods in the related art, the data storage method provided by the embodiment of the present invention has at least the following technical effects:

(1) In the related art, performing a full table scan of the data already in storage based on the data to be put in storage is clearly inappropriate. In the embodiment of the present invention, relevant fields are extracted from the data to be put in storage, the data is divided into data blocks according to the relevant fields, and, according to the block into which each data record falls, the data is stored in the corresponding index in ES and in the corresponding partition of the internal table in Hive. The entire storage process only needs to operate on a small range, which improves data storage speed.

(2) In the related art, Elastic Search is not suited to processing complex computation logic; Hive adopts a write-time schema mode, does not support primary keys or foreign keys, and does not build indexes on data, so adding, deleting, and modifying data is inefficient. The data storage method provided by the embodiment of the present invention constructs data documents, performs data processing using Elastic Search, creates temporary external tables in Hive, obtains the processing results directly through the Elastic Search-hadoop interface, and uses spark-sql to migrate data from the Hive temporary external tables to the partitions of the Hive internal table. This achieves data storage while guaranteeing data integrity, and improves data processing efficiency and data analysis capability.

Non-relational databases are based on key-value pairs and do not need to go through SQL-layer parsing, so their performance is very high; and because they are based on key-value pairs, there is no coupling between data, so they are very easy to scale horizontally. Relational databases, in turn, can use SQL statements to perform very complex data queries within one table and across multiple tables, and their transaction support makes it possible to meet data access requirements with very high security demands. Therefore, different databases can satisfy applications in different scenarios.

After the business front end generates data and the data is reported to the databases, the non-relational database is not suited to processing complex computation logic, while the relational database adopts a write-time schema mode, does not support primary keys or foreign keys, and does not build indexes on data. The data storage method provided by the embodiment of the present invention achieves data backup by writing the data into both the relational database and the non-relational database, satisfies the different application scenarios of the relational database and the non-relational database, reduces the time needed to store the data to be put in storage, and guarantees data integrity. Massive data is stored in blocks in Elastic Search and Hive, and storing the data to be put in storage only requires write operations on individual data blocks, which effectively reduces data processing and data storage time and improves service performance.

The exemplary architecture of the database system provided by the embodiment of the present invention is described below with continued reference to Fig. 1.

The data storage server 101 is configured to receive data to be put in storage, divide the data to be put in storage into data blocks according to the blocking information corresponding to the data to be put in storage, and determine the target index corresponding to each data block in the non-relational database 102. The data storage server 101 is further configured to perform write processing, based on the data block, in the documents associated with the target index to obtain the report document corresponding to the data block, and to synchronize the report document to the target partition corresponding to the data block in the relational database 103. The non-relational database 102 is configured to store the data of the data block in the documents associated with the target index. The relational database 103 is configured to store the report document in the target partition.

In some embodiments, data storage server 101 are also used to index in the non-relational database 102 not When target index corresponding including the data block, according to the blocking information of the data block in the non-relational database Create the target index.

In some embodiments, data storage server 101 are also used to extract described wait be put in storage each data in data Field, and the field of each data is combined, obtains the piecemeal wait be put in storage each data in data and believe Breath.

In some embodiments, when performing write processing in the documents associated with the target index based on the data block, the data storage server 101 specifically performs the following operations: selecting a target field from the fields of each data record in the data to be put in storage, and determining it as the corresponding data identifier; determining, according to the data identifier of each data record in the data block, the target document associated with the target index; and performing, in the target document and based on the data in the data block that corresponds to the target document, write processing including at least one of the following: adding data, modifying data, and deleting data.

In some embodiments, data storage server 101, for the data according to data each in the data block Mark is specific to execute following operation when determining destination document associated by the target index:

For each data wait be put in storage in data, when the associated document of target index does not include described When the corresponding destination document of Data Identification, the destination document is created according to the Data Identification.

In some embodiments, data storage server 101 are also used to: according to the data of the non-relational database Tactical rule formats the data in the data block, obtains the data knot for meeting the non-relational database The data of structure rule.

In some embodiments, when synchronizing the report document to the target partition corresponding to the data block in the relational database, the data storage server 101 specifically performs the following operations: determining the target external table corresponding to the data block; generating, based on an invoked migration interface, the metadata of the target external table according to the blocking information of the data block, where the metadata of the target external table makes the target external table point to the report document corresponding to the data block; and overwriting, based on the target external table, the file corresponding to the target partition with the document data of the report document.

In some embodiments, data storage server 101, for determining the corresponding target external table of the data block When, it is specific to execute following operation: when the external table in the relevant database does not include outside the corresponding target of the data block When portion's table, the target external table is created according to the blocking information of the data block.

In some embodiments, data storage server 101 described report document into relevant database for synchronizing When the corresponding target partition of the data block, executes following steps: reading the data block pair from the non-relational database That answers reports document；It reports document to format to described, obtains the data structure rule for meeting the relevant database File destination then, and the corresponding file of the target partition is covered by the file destination.

In some embodiments, data storage server 101 are also used to: when the subregion in the relevant database not When target partition corresponding including the data block, the target partition is created according to the blocking information of the data block.

Data loading device 900 shown in Fig. 9 is the illustrative function of data storage method provided in an embodiment of the present invention It can structure.The hardware layer of data loading device 900 can be realized by the way of data storage server shown in FIG. 1.

Referring to Fig. 9, Fig. 9 is an exemplary structural schematic diagram of the data storage server 101 in the database system provided by the embodiment of the present invention, which includes at least one processor 1001, a memory 1002, at least one network interface 1003, and a user interface 1004. The components in the data storage server 101 are coupled together by a bus system 1005. It can be understood that the bus system 1005 is used to implement connection and communication between these components. In addition to a data bus, the bus system 1005 includes a power bus, a control bus, and a status signal bus. For clarity of description, however, the various buses are all labeled as the bus system 1005 in Fig. 9. The user interface 1004 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touch pad, a touch screen, or the like. The memory 1002 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memory.

Memory 1002 in the embodiment of the present invention can storing data to support the operation of data storage server 101. The example of these data includes: any computer program for operating on data storage server 101, such as operating system and Application program.Wherein, operating system includes various system programs, such as relevant database, non-relational database etc., is used for Realize various basic businesses and the hardware based task of processing.

As an example in which the method provided by the embodiment of the present invention is implemented by a combination of software and hardware, the data storage method provided by the embodiment of the present invention may be directly embodied as a combination of software modules executed by the processor 1001. The software modules may be located in a computer-readable storage medium, and the computer-readable storage medium is located in the memory 1002. The processor 1001 reads the executable instructions included in the software modules in the memory 1002 and, in combination with the necessary hardware (for example, the processor 1001 and other components connected to the bus 1005), completes the data storage method provided by the embodiment of the present invention.

Illustrate the example of the combination of the software module in memory 1002 below with reference to Fig. 9.

Receiving unit 901, for receiving data to be put in storage.

Blocking unit 902, configured to divide the data to be put in storage into data blocks according to the blocking information corresponding to the data to be put in storage, and to determine the target index corresponding to each data block in the non-relational database, where different data blocks have different blocking information.

First storage unit 903, configured to perform write processing, based on the data block, in the documents associated with the target index, to obtain the report document corresponding to the data block.

Second storage unit 904, configured to synchronize the report document to the target partition corresponding to the data block in the relational database.
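How the four units could cooperate is sketched below with plain functions and in-memory dictionaries standing in for the two databases. All function names, the record fields, and the dictionary layouts are hypothetical; the sketch only illustrates that storage touches the indexes and partitions of the affected data blocks and leaves the rest unchanged.

```python
def receive(payload):
    """Receiving unit: accept the data to be put in storage."""
    return list(payload)


def divide(records):
    """Blocking unit: group records by blocking information."""
    blocks = {}
    for r in records:
        blocks.setdefault((r["type"], r["createtime"]), []).append(r)
    return blocks


def write_block(es, block, records):
    """First storage unit: write into the block's index, creating it if absent."""
    index = es.setdefault(f"order_index_{block[0]}_{block[1]}", {})
    for r in records:
        index[r["order_id"]] = r  # document keyed by a hypothetical identifier
    return index                  # the documents to report for this block


def sync_block(hive, block, documents):
    """Second storage unit: overwrite the block's partition with the documents."""
    hive[block] = sorted(documents.values(), key=lambda d: d["order_id"])


def put_in_storage(es, hive, payload):
    touched = []
    for block, records in divide(receive(payload)).items():
        sync_block(hive, block, write_block(es, block, records))
        touched.append(block)
    return touched


# Existing data in an unrelated block, which the new report must not touch.
es = {"order_index_C_20170101": {"old-1": {"order_id": "old-1"}}}
hive = {("C", "20170101"): [{"order_id": "old-1"}]}
payload = [
    {"order_id": "20181201001", "type": "A", "createtime": "20181201"},
    {"order_id": "20181201002", "type": "A", "createtime": "20181201"},
    {"order_id": "20181202001", "type": "B", "createtime": "20181202"},
]
touched = put_in_storage(es, hive, payload)
print(touched)
```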

In some embodiments, data loading device 900 further include: index determination unit, for working as the non-relational When index in database does not include that the corresponding target of the data block indexes, according to the blocking information of the data block described The target index is created in non-relational database.

In some embodiments, data loading device 900 further include: extraction unit, for extracting the data to be put in storage In each data field；Assembled unit is combined for the field to each data, obtains the data to be put in storage In each data blocking information.

In some embodiments, the first storage unit 903 includes: a selection module, configured to select a target field from the fields of each data record in the data to be put in storage and determine it as the corresponding data identifier; a document determining module, configured to determine, according to the data identifier of each data record in the data block, the target document associated with the target index; and a writing module, configured to perform, in the target document and based on the data in the data block that corresponds to the target document, write processing including at least one of the following: adding data, modifying data, and deleting data.

In some embodiments, the document determining module, is specifically used for: for each wait be put in storage in data Data, when the associated document of target index does not include the corresponding destination document of the Data Identification, according to the number The destination document is created according to mark.

In some embodiments, described device further include: structure converting unit, for according to the non-relational database Data structure rule, the data in the data block are formatted, obtain meeting the non-relational database The data of data structure rule.

In some embodiments, the second storage unit 904 includes: an external table determining module, configured to determine the target external table corresponding to the data block; a metadata generation module, configured to generate, based on an invoked migration interface, the metadata of the target external table according to the blocking information of the data block, where the metadata of the target external table makes the target external table point to the report document corresponding to the data block; and a first overwrite module, configured to overwrite, based on the target external table, the file corresponding to the target partition with the document data of the report document.

In some embodiments, the external table determining module, is specifically used for: the outside in the relevant database When table does not include the data block corresponding target external table, the target external is created according to the blocking information of the data block Table.

In some embodiments, the second storage unit 904, comprising: read module is used for from the non-relational database Read that the data block is corresponding to report document；Second covering writing module, for reporting document to format to described, The file destination for the data structure rule for meeting the relevant database is obtained, and the mesh is covered by the file destination Mark the corresponding file of subregion.

In some embodiments, data loading device 900 further include: subregion determination unit, for working as the relationship type number When according to the subregion in library not including the corresponding target partition of the data block, according to the creation of the blocking information of the data block Target partition.

The embodiment of the present invention also provides a kind of computer readable storage medium, stores in the computer readable storage medium There is computer program, when the computer program is executed by processor, realizes above-mentioned data storage method.

The descriptions of the above system, device, and computer-readable storage medium embodiments are similar to the description of the above method embodiments, and have beneficial effects similar to those of the method embodiments. For technical details not disclosed in the system, device, and computer-readable storage medium embodiments of the present invention, please refer to the description of the method embodiments of the present invention.

The above is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can be readily conceived by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A data storage method, characterized in that the method comprises:

receiving data to be put in storage;

dividing the data to be put in storage into data blocks according to blocking information corresponding to the data to be put in storage, and determining a target index corresponding to the data block in a non-relational database;

performing write processing, based on the data block, in documents associated with the target index, to obtain a report document corresponding to the data block; and

synchronizing the report document to a target partition corresponding to the data block in a relational database.

2. the method according to claim 1, wherein the method also includes:

When the index in the non-relational database does not include that the corresponding target of the data block indexes, according to the data The blocking information of block creates the target index in the non-relational database.

3. the method according to claim 1, wherein the method also includes:

Extract the field wait be put in storage each data in data；

The field of each data is combined, the blocking information wait be put in storage each data in data is obtained.

4. The method according to claim 3, characterized in that performing write processing, based on the data block, in the documents associated with the target index comprises:

selecting a target field from the fields of each data record in the data to be put in storage, and determining it as the corresponding data identifier;

determining, according to the data identifier of each data record in the data block, a target document associated with the target index; and

performing, in the target document and based on the data in the data block that corresponds to the target document, write processing including at least one of the following: adding data, modifying data, and deleting data.

5. according to the method described in claim 4, it is characterized in that, the data according to data each in the data block Mark determines the associated destination document of the target index, comprising:

For each data wait be put in storage in data, when the associated document of target index does not include the data When identifying corresponding destination document, the destination document is created according to the Data Identification.

6. the method according to claim 1, wherein the method also includes:

According to the data structure of non-relational database rule, the data in the data block are formatted, are obtained To the data for the data structure rule for meeting the non-relational database.

7. The method according to claim 1, characterized in that synchronizing the report document to the target partition corresponding to the data block in the relational database comprises:

determining a target external table corresponding to the data block;

generating, based on an invoked migration interface, metadata of the target external table according to the blocking information of the data block, wherein the metadata of the target external table makes the target external table point to the report document corresponding to the data block; and

overwriting, based on the target external table, the file corresponding to the target partition with the document data of the report document.

8. the method according to the description of claim 7 is characterized in that the corresponding target external table of the determination data block, Include:

When the external table in the relevant database does not include the corresponding target external table of the data block, according to the number The target external table is created according to the blocking information of block.

9. the method according to claim 1, wherein described synchronize described reports document into relevant database The corresponding target partition of the data block, comprising:

Document is reported from the non-relational database reading data block is corresponding；

It reports document to format to described, obtains the target text for the data structure rule for meeting the relevant database Part, and the corresponding file of the target partition is covered by the file destination.

10. according to claim 1, method described in 7 or 9, which is characterized in that the method also includes:

When the subregion in the relevant database does not include the corresponding target partition of the data block, according to the data block Blocking information create the target partition.

11. a kind of Database Systems characterized by comprising

Data storage server, will be described according to the corresponding blocking information of the data to be put in storage for receiving data to be put in storage Data to be put in storage are divided into data block, and determine the data block corresponding target index in non-relational database；

The data storage server is also used to block based on the data and is write in the associated document of target index Enter processing, obtain that the data block is corresponding to report document, and synchronizes described report described in document to relevant database The corresponding target partition of data block；

The non-relational database, for storing the number in the data block in the associated document of target index According to；

12. system according to claim 11, which is characterized in that

The data storage server is also used to when the index in the non-relational database not include that the data block is corresponding Target index when, target index is created in the non-relational database according to the blocking information of the data block.

13. system according to claim 11, which is characterized in that

The data storage server is also used to extract the field wait be put in storage each data in data, and to each The field of data is combined, and obtains the blocking information wait be put in storage each data in data.

14. system according to claim 13, which is characterized in that the data storage server, for being based on the number It is specific to execute following operation when carrying out write-in processing in the associated document of target index according to block:

15. system according to claim 14, which is characterized in that

The data storage server determines the target for the Data Identification according to data each in the data block It is specific to execute following operation when destination document associated by index: