A Distributed NewSQL Database System and a Semi-Structured Data Storage Method
Technical Field
The present invention relates to the technical field of big data, and in particular to a distributed NewSQL database system and a semi-structured data storage method.
Background
HBase is currently one of the best-known distributed NoSQL databases in the Hadoop ecosystem. Its main components are the HMaster and the HRegionServer. HBase offers users a table-type data model in which a table is split by primary-key range into multiple regions; the HMaster manages and assigns the regions, and each HRegionServer handles reads and writes of region data. Data stored in existing HBase has no data type and is held as byte arrays, so storing semi-structured data such as JSON causes problems at query time. When JSON-formatted data is stored in HBase, the conventional approach is to store the whole JSON object as a string. This approach has the following defects:
To filter records, all records must be read out and then filtered on the client, which gives unacceptable performance when the data volume is large.
To update a record, the record must be read out, the specific field updated, and the record then written back to HBase to overwrite the old one.
Summary of the Invention
The purpose of the embodiments of the present invention is to provide a distributed NewSQL database system and a semi-structured data storage method that can store data in JSON format and solve the problem of poor results and performance when processing semi-structured data.
To achieve the above purpose, an embodiment of the present invention provides a distributed NewSQL database system, comprising:
a control unit, configured to receive a user request through a database interface and to send the user request to a planning unit, wherein the user request includes JSON data to be written;
the planning unit, configured to parse the user request, compile it, and generate a corresponding execution plan;
an execution unit, configured to write, according to the execution plan, the JSON data as a whole into a data table as a single data field of ordinary string type;
an HBase unit, configured to store the data table and an index table, wherein a JSON data type is added at the bottom layer of the HBase unit and the JSON data is stored as a whole in the underlying HFile; the HBase unit further comprises a coprocessor module, configured to, when the JSON data is written into the data table, treat the JSON data as a nested type, generate index data in inverted-index form, and write the index data into the index table.
Compared with the prior art, in the distributed NewSQL database system disclosed by the present invention, the control unit first receives a user request through a database interface and sends it to the planning unit; the planning unit then parses the user request, compiles it, and generates a corresponding execution plan; the execution unit then, according to the execution plan, writes the JSON data as a whole into the data table as a single data field of ordinary string type, the JSON data being stored as a whole in the underlying HFile of the HBase unit; and the coprocessor module, when the JSON data is written into the data table, treats the JSON data as a nested type, generates index data in inverted-index form, and writes the index data into the index table. This technical solution realizes storage of data in JSON format and solves the problem of poor results and performance when processing semi-structured data.
Further, the execution unit is configured to return the result of the HBase unit to the control unit; the control unit is further configured to return the result to the user.
Further, the system also comprises a distributed transaction manager, configured to coordinate the parties in the execution plan to complete distributed transaction management when the execution plan involves a distributed transaction.
Further, the HBase unit also comprises a filter module; the filter module and the coprocessor module are used to generate the index table for the data.
Further, the database interface is JDBC or ODBC.
An embodiment of the present invention also provides a semi-structured data storage method, based on the distributed NewSQL database system provided by the above embodiment of the present invention, comprising:
receiving, by the control unit, a user request through a database interface, and sending the user request to the planning unit, wherein the user request includes JSON data to be written;
parsing, by the planning unit, the user request, compiling it, and generating a corresponding execution plan;
writing, by the execution unit according to the execution plan, the JSON data as a whole into a data table as a single data field of ordinary string type, wherein the data table is stored in the HBase unit, a JSON data type is added at the bottom layer of the HBase unit, and the JSON data is stored as a whole in the underlying HFile;
when the JSON data is written into the data table, treating, by the coprocessor module of the HBase unit, the JSON data as a nested type, generating index data in inverted-index form, and writing the index data into the index table, wherein the index table is stored in the HBase unit.
Compared with the prior art, in the semi-structured data storage method disclosed by the present invention, the control unit first receives a user request through a database interface and sends it to the planning unit; the planning unit then parses the user request, compiles it, and generates a corresponding execution plan; the execution unit then, according to the execution plan, writes the JSON data as a whole into the data table as a single data field of ordinary string type, the JSON data being stored as a whole in the underlying HFile of the HBase unit; and the coprocessor module, when the JSON data is written into the data table, treats the JSON data as a nested type, generates index data in inverted-index form, and writes the index data into the index table. This technical solution realizes storage of data in JSON format and solves the problem of poor results and performance when processing semi-structured data.
Further, after the index data is written into the index table by the coprocessor module, the method also comprises:
returning, by the execution unit, the result of the HBase unit to the control unit;
returning, by the control unit, the result to the user.
Further, the method also comprises:
when the execution plan involves a distributed transaction, coordinating, by a distributed transaction manager, the parties in the execution plan to complete distributed transaction management.
Further, the HBase unit also comprises a filter module, and the index table for the data is generated by the filter module and the coprocessor module.
Further, the database interface is JDBC or ODBC.
Brief Description of the Drawings
Fig. 1 is a schematic structural diagram of a distributed NewSQL database system provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of a semi-structured data storage method provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic flowchart of generating the execution plan in step S2 of the semi-structured data storage method provided by Embodiment 2 of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, Fig. 1 is a schematic structural diagram of a distributed NewSQL database system provided by Embodiment 1 of the present invention. The structure of this embodiment comprises:
a control unit 1, configured to receive a user request through a database interface and to send the user request to a planning unit 2, wherein the user request includes JSON data to be written;
the planning unit 2, configured to parse the user request, compile it, and generate a corresponding execution plan;
an execution unit 3, configured to write, according to the execution plan, the JSON data as a whole into a data table as a single data field of ordinary string type;
an HBase unit 4, configured to store the data table and an index table, wherein a JSON data type is added at the bottom layer of the HBase unit 4 and the JSON data is stored as a whole in the underlying HFile; the HBase unit 4 further comprises a coprocessor module 41, configured to, when the JSON data is written into the data table, treat the JSON data as a nested type, generate index data in inverted-index form, and write the index data into the index table.
In this embodiment, a JSON data type is added at the bottom layer of the HBase unit 4 and the JSON data is stored as a whole in the underlying HFile; when a secondary index is built, a JSON index column is likewise indexed as a nested type. The system can therefore support querying on arbitrary JSON fields, as well as index creation and modification.
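The idea of indexing a JSON column as a nested type can be sketched as follows. This is a minimal illustration only, not the actual coprocessor implementation: the `JsonTable` class, the dotted path convention, and the in-memory dictionaries standing in for the data table and index table are all assumptions made for the example. The sketch also assumes each row key is written once; handling overwrites is outside its scope.

```python
import json
from collections import defaultdict


def flatten(value, prefix=""):
    """Yield (path, scalar) pairs for every leaf of a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for child in value:
            # Array elements share the parent path, so any element can match.
            yield from flatten(child, prefix)
    else:
        yield prefix, value


class JsonTable:
    """Hypothetical stand-in for the data table plus its index table."""

    def __init__(self):
        self.data = {}                  # row key -> whole JSON string
        self.index = defaultdict(set)   # (path, encoded value) -> row keys

    def put(self, row_key, json_text):
        # The document is stored whole, as a single string field.
        self.data[row_key] = json_text
        # Every leaf is indexed at write time (inverted index).
        for path, scalar in flatten(json.loads(json_text)):
            self.index[(path, json.dumps(scalar))].add(row_key)

    def find(self, path, scalar):
        # Matching rows come from the index, not from a scan of all rows.
        return sorted(self.index.get((path, json.dumps(scalar)), set()))


table = JsonTable()
table.put("r1", '{"user": {"name": "ann", "tags": ["a", "b"]}, "age": 30}')
table.put("r2", '{"user": {"name": "bob", "tags": ["b"]}, "age": 30}')
print(table.find("user.tags", "b"))    # ['r1', 'r2']
print(table.find("user.name", "ann"))  # ['r1']
```

Encoding the value with `json.dumps` keeps the number `30` and the string `"30"` apart in the index, which is one possible design choice among several.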
Further, the execution unit 3 is configured to return the result of the HBase unit 4 to the control unit 1; the control unit 1 is further configured to return the result to the user.
Further, the system also comprises a distributed transaction manager, configured to coordinate the parties in the execution plan to complete distributed transaction management when the execution plan involves a transaction. The distributed transaction manager uses the Java Transaction API (JTA) to implement distributed transaction processing and transaction management. JTA allows an application to perform distributed transactions, that is, to access and update data on two or more networked computer resources.
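JTA itself is a Java API. The Python sketch below only illustrates the two-phase commit pattern on which XA-style transaction coordination generally relies; whether the described manager follows exactly this protocol is an assumption, and the `Participant` class and `run_transaction` function are hypothetical names introduced for the example.

```python
class Participant:
    """Hypothetical resource taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote on whether the local work can be committed.
        self.state = "prepared" if self.can_commit else "failed"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def run_transaction(participants):
    """Commit on every participant, or roll back on every participant."""
    # Phase 1: collect a vote from each participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if all voted yes, otherwise undo everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.rollback()
    return False


ok = [Participant("data_table"), Participant("index_table")]
bad = [Participant("data_table"), Participant("index_table", can_commit=False)]
ok_result = run_transaction(ok)
bad_result = run_transaction(bad)
print(ok_result, [p.state for p in ok])
print(bad_result, [p.state for p in bad])
```

Treating the data table and the index table as the two participants is only an example of where such coordination could matter, since both must change together when a JSON record is written.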
Specifically, after receiving the user request from the control unit 1, the planning unit 2 parses the user request, compiles the SQL with a high-speed SQL engine, and then generates the execution plan. The planning unit 2 is further configured to return the execution plan to the control unit 1 once it has been generated. After receiving the execution plan, the control unit 1 is further configured to judge from its content whether the distributed transaction manager needs to intervene and, if so, to start the distributed transaction manager.
Further, the HBase unit 4 also comprises a filter module; the filter module and the coprocessor module 41 are used to generate the index table for the data.
Further, the database interface is JDBC or ODBC.
Further, the control unit 1 is also connected to a monitor, which is responsible for metadata management and for monitoring the load of the underlying HBase regions, so that no particular region becomes overloaded; regions are redistributed by means of the coprocessor module 41.
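The monitoring and redistribution described above could take a form like the following sketch. It only computes which regions to move; the `rebalance` function, the threshold policy, and the load figures are assumptions made for illustration, and the actual move would be carried out by the coprocessor module rather than by this code.

```python
def rebalance(servers, assignment, load, threshold):
    """Move one region off each server whose total load exceeds threshold.

    servers:    list of server names
    assignment: region name -> server name (updated in place)
    load:       region name -> observed load, in any consistent unit
    Returns a list of (region, from_server, to_server) moves.
    """
    totals = {server: 0 for server in servers}
    for region, server in assignment.items():
        totals[server] += load[region]

    moves = []
    for server in sorted(totals):
        if totals[server] <= threshold:
            continue
        # The least-loaded server is the candidate destination.
        target = min(totals, key=lambda s: (totals[s], s))
        if target == server:
            continue
        # Only regions that fit on the target without overloading it qualify.
        movable = [r for r, s in assignment.items()
                   if s == server and totals[target] + load[r] <= threshold]
        if not movable:
            continue
        region = max(movable, key=lambda r: (load[r], r))
        assignment[region] = target
        totals[server] -= load[region]
        totals[target] += load[region]
        moves.append((region, server, target))
    return moves


servers = ["rs1", "rs2"]
assignment = {"region_a": "rs1", "region_b": "rs1", "region_c": "rs2"}
load = {"region_a": 70, "region_b": 40, "region_c": 20}
moves = rebalance(servers, assignment, load, threshold=100)
print(moves)  # [('region_a', 'rs1', 'rs2')]
```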
In addition, the control unit 1 is further configured to coordinate data communication among the multiple roles and to manage the overall flow.
The process by which the planning unit 2 generates the execution plan specifically comprises:
judging whether a pre-stored SQL statement corresponding to the SQL statement exists in the shared buffer pool; if so, outputting the execution plan corresponding to the SQL statement; if not,
performing a syntax check on the SQL statement; if there is a syntax error, returning an error message to the user; otherwise,
performing a semantic check on the SQL statement; if there is a semantic error, returning an error message to the user; otherwise,
performing view and expression transformation on the SQL statement to obtain a corresponding transformation result;
selecting an optimizer according to the transformation result to obtain a corresponding optimizer selection result;
selecting the corresponding data join method and join order according to the optimizer selection result;
selecting a search path according to the join method and join order;
generating the execution plan according to the search path, and outputting the execution plan.
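The plan-generation steps above can be sketched as a small pipeline with a plan cache in front of it. This is a toy under stated assumptions: the `Planner` class, the `PlanError` exception, the one-table grammar, and the placeholder optimizer and access-path choices are all hypothetical and far simpler than a real SQL engine.

```python
class PlanError(Exception):
    """Raised when a statement fails the syntax or semantic check."""


class Planner:
    def __init__(self, known_tables):
        self.known_tables = set(known_tables)
        self.plan_cache = {}    # stands in for the shared buffer pool
        self.compilations = 0   # number of statements actually compiled

    def plan(self, sql):
        key = " ".join(sql.split())     # collapse whitespace
        if key in self.plan_cache:      # cached statement: reuse its plan
            return self.plan_cache[key]

        tokens = key.lower().split()
        # Syntax check against a toy grammar: SELECT <cols> FROM <table> [...]
        if tokens[:1] != ["select"] or "from" not in tokens[2:-1]:
            raise PlanError("syntax error")
        # Semantic check: the referenced table must exist.
        table = tokens[tokens.index("from", 2) + 1]
        if table not in self.known_tables:
            raise PlanError(f"semantic error: unknown table {table!r}")

        # The remaining stages are reduced to placeholder decisions.
        plan = {
            "table": table,
            "optimizer": "rule_based",
            "join": None,   # single-table toy grammar, so nothing to join
            "access_path": "index_lookup" if "where" in tokens else "full_scan",
        }
        self.compilations += 1
        self.plan_cache[key] = plan
        return plan


planner = Planner(["orders"])
first = planner.plan("SELECT * FROM orders WHERE id = 1")
second = planner.plan("SELECT *   FROM orders  WHERE id = 1")
print(first is second, planner.compilations)  # True 1
```

Checking the cache before any parsing is what lets a repeated statement skip the syntax check, semantic check, and optimization entirely.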
In a specific implementation, the control unit 1 first receives a user request through a database interface and sends it to the planning unit 2. The planning unit 2 then parses the user request, compiles it, and generates a corresponding execution plan. Next, the control unit 1 judges from the content of the execution plan whether the distributed transaction manager needs to intervene; if so, the distributed transaction manager is started and coordinates the parties in the execution plan to complete distributed transaction management. Then the execution unit 3, according to the execution plan, writes the JSON data as a whole into the data table as a single data field of ordinary string type, the JSON data being stored as a whole in the underlying HFile of the HBase unit 4. When the JSON data is written into the data table, the coprocessor module 41 treats the JSON data as a nested type, generates index data in inverted-index form, and writes the index data into the index table. Finally, the execution unit 3 returns the result of the HBase unit 4 to the control unit 1, and the control unit 1 returns the result to the user.
The distributed NewSQL database system of this embodiment can store data in JSON format and processes semi-structured data with good results and performance.
Referring to Fig. 2, Fig. 2 is a schematic flowchart of a semi-structured data storage method provided by Embodiment 2 of the present invention. This embodiment comprises the steps of:
S1, receiving, by the control unit 1, a user request through a database interface, and sending the user request to the planning unit 2, wherein the user request includes JSON data to be written;
S2, parsing, by the planning unit 2, the user request, compiling it, and generating a corresponding execution plan;
S3, writing, by the execution unit 3 according to the execution plan, the JSON data as a whole into a data table as a single data field of ordinary string type, wherein the data table is stored in the HBase unit 4, a JSON data type is added at the bottom layer of the HBase unit 4, and the JSON data is stored as a whole in the underlying HFile;
S4, when the JSON data is written into the data table, treating, by the coprocessor module 41 of the HBase unit 4, the JSON data as a nested type, generating index data in inverted-index form, and writing the index data into the index table, wherein the index table is stored in the HBase unit 4.
In this embodiment, a JSON data type is added at the bottom layer of the HBase unit 4 and the JSON data is stored as a whole in the underlying HFile; when a secondary index is built, a JSON index column is likewise indexed as a nested type. The method can therefore support querying on arbitrary JSON fields, as well as index creation and modification.
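Supporting index modification implies that, when a stored JSON record is overwritten, index entries for values that no longer appear must be removed and entries for new values added. A minimal sketch of that bookkeeping follows; the `postings` and `write_row` functions and the dictionary-based tables are hypothetical and do not represent the coprocessor's actual code.

```python
import json


def postings(json_text):
    """Return the set of (path, encoded value) index entries for a document."""
    entries = set()

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for child in value:
                walk(child, path)
        else:
            entries.add((path, json.dumps(value)))

    walk(json.loads(json_text), "")
    return entries


def write_row(data, index, row_key, json_text):
    """Store the document whole and bring the inverted index up to date."""
    old = postings(data[row_key]) if row_key in data else set()
    new = postings(json_text)
    # Drop entries that the new version of the document no longer contains.
    for entry in old - new:
        index[entry].discard(row_key)
        if not index[entry]:
            del index[entry]
    # Add entries that are new in this version.
    for entry in new - old:
        index.setdefault(entry, set()).add(row_key)
    data[row_key] = json_text


data, index = {}, {}
write_row(data, index, "r1", '{"profile": {"city": "Oslo"}, "age": 30}')
write_row(data, index, "r1", '{"profile": {"city": "Rome"}, "age": 30}')
print(sorted(index))  # [('age', '30'), ('profile.city', '"Rome"')]
```

Only the difference between the old and new entry sets is touched, so an update that changes one field leaves the index entries of unchanged fields alone.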
Further, after the index data is written into the index table by the coprocessor module 41 in step S4, the method also comprises the steps of:
S5, returning, by the execution unit 3, the result of the HBase unit 4 to the control unit 1;
S6, returning, by the control unit 1, the result to the user.
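Steps S1 to S6 can be traced end to end with the sketch below. All class and method names are assumptions introduced for illustration, the request is a plain dictionary rather than a JDBC or ODBC call, the planning step is reduced to building a dictionary, and the index hook covers top-level fields only to keep the example short.

```python
import json


class HbaseUnit:
    """Hypothetical in-memory stand-in for the data table and index table."""

    def __init__(self):
        self.data_table = {}
        self.index_table = {}

    def put(self, row_key, json_text):
        self.data_table[row_key] = json_text         # S3: stored whole
        self._coprocessor_on_put(row_key, json_text)  # S4: index at write time
        return {"row": row_key, "status": "ok"}

    def _coprocessor_on_put(self, row_key, json_text):
        # Top-level fields only; assumes the document is a JSON object.
        for field, value in json.loads(json_text).items():
            entry = (field, json.dumps(value))
            self.index_table.setdefault(entry, set()).add(row_key)


class PlanningUnit:
    def plan(self, request):
        # S2: a real planner would parse and compile SQL here.
        return {"op": "put", "row": request["row"], "json": request["json"]}


class ExecutionUnit:
    def __init__(self, hbase):
        self.hbase = hbase

    def execute(self, plan):
        # S3 and S5: run the plan and hand back the storage result.
        return self.hbase.put(plan["row"], plan["json"])


class ControlUnit:
    def __init__(self, planner, executor):
        self.planner = planner
        self.executor = executor

    def handle(self, request):
        plan = self.planner.plan(request)   # S1 then S2
        return self.executor.execute(plan)  # S3 to S6


hbase = HbaseUnit()
control = ControlUnit(PlanningUnit(), ExecutionUnit(hbase))
result = control.handle({"row": "r1", "json": '{"name": "ann", "age": 30}'})
print(result)
print(hbase.index_table)
```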
Further, after the execution plan has been generated in step S2, this embodiment also returns the execution plan to the control unit 1. After receiving the execution plan, the control unit 1 judges from its content whether the distributed transaction manager needs to intervene. If so, the distributed transaction manager is started; specifically, when the execution plan involves a transaction, the distributed transaction manager coordinates the parties in the execution plan to complete distributed transaction management. If not, step S3 is performed directly.
Further, in this embodiment the index table for the data is generated by the filter module and the coprocessor module 41 of the HBase unit 4.
Further, the database interface is JDBC or ODBC.
Referring to Fig. 3, Fig. 3 is a schematic flowchart of generating the execution plan by the planning unit 2 in step S2, which specifically comprises:
S201, judging whether a pre-stored SQL statement corresponding to the SQL statement exists in the shared buffer pool; if so, outputting the execution plan corresponding to the SQL statement; if not,
S202, performing a syntax check on the SQL statement; if there is a syntax error, returning an error message to the user; otherwise,
S203, performing a semantic check on the SQL statement; if there is a semantic error, returning an error message to the user; otherwise,
S204, performing view and expression transformation on the SQL statement to obtain a corresponding transformation result;
S205, selecting an optimizer according to the transformation result to obtain a corresponding optimizer selection result;
S206, selecting the corresponding data join method and join order according to the optimizer selection result;
S207, selecting a search path according to the join method and join order;
S208, generating the execution plan according to the search path, and outputting the execution plan.
In a specific implementation, the control unit 1 first receives a user request through a database interface and sends it to the planning unit 2. The planning unit 2 then parses the user request, compiles it, and generates a corresponding execution plan. Next, the control unit 1 judges from the content of the execution plan whether the distributed transaction manager needs to intervene; if so, the distributed transaction manager is started and coordinates the parties in the execution plan to complete distributed transaction management. Then the execution unit 3, according to the execution plan, writes the JSON data as a whole into the data table as a single data field of ordinary string type, the JSON data being stored as a whole in the underlying HFile of the HBase unit 4. When the JSON data is written into the data table, the coprocessor module 41 treats the JSON data as a nested type, generates index data in inverted-index form, and writes the index data into the index table. Finally, the execution unit 3 returns the result of the HBase unit 4 to the control unit 1, and the control unit 1 returns the result to the user.
The distributed NewSQL database system of this embodiment can store data in JSON format and processes semi-structured data with good results and performance.
The above are preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and such improvements and modifications are also regarded as falling within the scope of protection of the present invention.