Method for establishing full-text search and distributed NewSQL database system
Technical field
The present invention relates to the technical field of big data, and in particular to a method for establishing full-text search and a distributed NewSQL database system.
Background art
Hbase is currently one of the best-known distributed NoSQL databases in the Hadoop ecosystem. Its design derives from Google Bigtable. The main components of Hbase are HMaster and HRegionServer. Hbase offers users a table-style data model in which a table is divided into multiple regions by primary-key range; HMaster manages and assigns the regions, and HRegionServer handles reads and writes of region data. These advantages have made Hbase one of the most widely used distributed NoSQL databases, and more and more applications are being migrated to it. However, Hbase supports only queries based on the primary key and does not support full-text search, which is inconvenient for many applications.
Summary of the invention
An object of the embodiments of the present invention is to provide a method for establishing full-text search and a distributed NewSQL database system that support distributed full-text search and thereby meet users' need to create full-text indexes.
To achieve the above object, an embodiment of the present invention provides a method for establishing full-text search, applicable to a distributed NewSQL database system that includes a Solr unit. The method includes:
receiving a user request through a JDBC/ODBC interface, wherein the user request includes a full-text-search data field to be written;
parsing the user request, and compiling and generating a corresponding execution plan;
writing the full-text-search data field into a data table according to the execution plan;
when the full-text-search data field is written into the data table, generating index data in inverted-index form according to the description of the full-text-search data field, and writing the index data into an index table of the Solr unit;
returning a processing result to the user, the processing result indicating that the full-text-search data field to be written has been written.
Further, the method also includes: converting the user request into an SQL request in the form of an SQL statement.
Further, parsing the user request and compiling and generating the corresponding execution plan includes:
determining whether a pre-stored SQL statement corresponding to the SQL request exists in a shared buffer pool; if so, outputting the execution plan corresponding to that pre-stored SQL statement; if not,
performing a syntax check on the SQL request; if there is a syntax error, returning an error message to the user; otherwise,
performing a semantic check on the SQL request; if there is a semantic error, returning an error message to the user; otherwise,
performing view and expression transformation on the SQL request to obtain a corresponding transformation result;
selecting an optimizer according to the transformation result to obtain a corresponding optimizer selection result;
selecting a data join method and a join order according to the optimizer selection result;
selecting a search path according to the join method and the join order;
generating an execution plan according to the search path, and outputting the execution plan.
Correspondingly, an embodiment of the present invention also provides a distributed NewSQL database system, including:
a JDBC/ODBC interface unit, configured to interact with the user, including receiving a user request and returning a processing result to the user; wherein the user request includes a full-text-search data field to be written, and the processing result indicates that the full-text-search data field to be written has been written;
a master unit, configured to receive the user request accepted by the JDBC/ODBC interface unit, to coordinate data communication among multiple processors and manage the overall flow, and to send the user request first to the SQLPlaner unit; the master unit is further configured to return the processing result to the JDBC/ODBC interface unit;
a SQLPlaner unit, configured to parse the user request, and to compile and customize an execution plan according to the user request;
a worker unit, configured to execute the plan in parallel, including: writing the full-text-search data field into the data table as an ordinary string type according to the execution plan; the worker unit is further configured to return the processing result of the Hbase unit to the master unit;
an Hbase unit, configured to store the data table; the Hbase unit further includes a coprocessor module, which is configured to, when the full-text-search data field is written into the data table, generate index data in inverted-index form according to the description of the full-text-search data field, and write the index data into an index table;
a Solr unit, configured to store the index table;
a distributed transaction manager, configured to coordinate the parties involved to complete distributed transaction management when the execution plan of the worker unit involves transactions.
Further, the JDBC/ODBC interface unit is also configured to convert the user request into an SQL request in the form of an SQL statement.
Further, the SQLPlaner unit is configured for:
determining whether a pre-stored SQL statement corresponding to the SQL request exists in a shared buffer pool; if so, outputting the execution plan corresponding to that pre-stored SQL statement; if not,
performing a syntax check on the SQL request; if there is a syntax error, returning an error message to the user; otherwise,
performing a semantic check on the SQL request; if there is a semantic error, returning an error message to the user; otherwise,
performing view and expression transformation on the SQL request to obtain a corresponding transformation result;
selecting an optimizer according to the transformation result to obtain a corresponding optimizer selection result;
selecting a data join method and a join order according to the optimizer selection result;
selecting a search path according to the join method and the join order;
generating an execution plan according to the search path, and outputting the execution plan.
Further, the system also includes:
a monitor, responsible for metadata management and configured to monitor the load of the Regions of the Hbase unit and to redistribute Regions through the coprocessor module of the Hbase unit; the monitor is connected to the master unit.
Further, monitoring the load of the Regions of the Hbase unit and redistributing Regions through the coprocessor module of the Hbase unit includes:
receiving data distribution information of the Hbase unit, and receiving, from the master unit, load information of the worker unit, wherein the load information includes a load deviation value of the worker unit;
comparing the load deviation value of the worker unit with a preset load deviation threshold and, if the load deviation value is determined to exceed the threshold, triggering the Hbase unit to redistribute the Regions on servers with a higher hit rate and the Regions on servers with a lower hit rate;
obtaining the data volume of each Region, comparing the data volume of each Region with a preset data volume threshold and, if the data volume of a Region is determined to exceed the threshold, triggering the Hbase unit to split that Region into two.
Further, the JDBC/ODBC interface unit includes:
a JDBC application module, configured to receive the user request, to call JDBC object methods to supply SQL statements, and to retrieve results and return them to the user;
a JDBC driver manager module, configured to load and invoke the JDBC driver module for the JDBC application module;
a JDBC driver module, configured to execute the calls to the JDBC object methods, to send the SQL statement corresponding to the user request to the underlying database, and to return the result obtained from the underlying database to the JDBC application module.
Compared with the prior art, the method for establishing full-text search and the distributed NewSQL database system disclosed by the present invention adopt the following technical solution: a user request is received through a JDBC/ODBC interface; the user request is parsed, and a corresponding execution plan is compiled and generated; the full-text-search data field is written into the data table according to the execution plan; when the full-text-search data field is written into the data table, index data in inverted-index form is generated according to the description of the full-text-search data field and written into an index table; and a processing result is returned to the user, indicating that the full-text-search data field to be written has been written. This solves the prior-art problem that Hbase supports only queries based on the primary key and does not support full-text search. Full-text search is established through Solr so that distributed full-text search is supported, which meets users' need to create full-text indexes and improves the user experience.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a method for establishing full-text search provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic structural diagram of a distributed NewSQL database system provided by Embodiment 2 of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a method for establishing full-text search provided by Embodiment 1 of the present invention. Embodiment 1 is applicable to a distributed NewSQL database system that includes a Solr unit, and includes the following steps:
S1, receiving a user request through a JDBC/ODBC interface, wherein the user request includes a full-text-search data field to be written;
S2, parsing the user request, and compiling and generating a corresponding execution plan;
S3, writing the full-text-search data field into the data table according to the execution plan;
S4, when the full-text-search data field is written into the data table, generating index data in inverted-index form according to the description of the full-text-search data field, and writing the index data into the index table of the Solr unit;
S5, returning a processing result to the user, the processing result indicating that the full-text-search data field to be written has been written.
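Steps S3 to S5 can be illustrated with a minimal sketch. It is not the implementation of the embodiment: plain in-memory dictionaries stand in for the Hbase data table and for the index table of the Solr unit, the tokenizer simply splits on non-alphanumeric characters, and the names `put`, `search`, `data_table` and `index_table` are hypothetical.

```python
import re
from collections import defaultdict

# In-memory stand-ins (assumptions): a dict for the Hbase data table and a
# (field, term) -> set-of-rowkeys mapping for the index table of the Solr unit.
data_table = {}
index_table = defaultdict(set)


def tokenize(text):
    """Lower-case the text and split on non-alphanumeric characters (illustrative only)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def put(rowkey, row, fulltext_fields):
    """Write a row (S3) and index its full-text fields in the same call (S4)."""
    data_table[rowkey] = dict(row)  # the field is stored as an ordinary string
    for field in fulltext_fields:
        for term in tokenize(row.get(field, "")):
            index_table[(field, term)].add(rowkey)
    return "written"  # processing result returned to the user (S5)


def search(field, term):
    """Return the sorted rowkeys whose indexed field contains the term."""
    return sorted(index_table.get((field, term.lower()), set()))
```

For example, after `put("r1", {"title": "Distributed full-text search"}, ["title"])`, the call `search("title", "full")` returns `["r1"]`. The sketch does not remove stale index entries when a row is overwritten, which a real write hook would need to handle.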
This embodiment supports distributed full-text search: users can create full-text indexes for their own tables and search using full-text search syntax in SQL. This approach is a special extension of the secondary index: for fields that need full-text search, the index data is no longer stored in a separate index table but is written to the index table of the Solr unit, as in step S4.
Further, step S1 also includes: converting the user request into an SQL request in the form of an SQL statement.
Further, parsing the user request and compiling and generating the corresponding execution plan in step S2 includes:
S21, determining whether a pre-stored SQL statement corresponding to the SQL request exists in a shared buffer pool; if so, outputting the execution plan corresponding to that pre-stored SQL statement; if not,
S22, performing a syntax check on the SQL request; if there is a syntax error, returning an error message to the user; otherwise,
S23, performing a semantic check on the SQL request; if there is a semantic error, returning an error message to the user; otherwise,
S24, performing view and expression transformation on the SQL request to obtain a corresponding transformation result;
S25, selecting an optimizer according to the transformation result to obtain a corresponding optimizer selection result;
S26, selecting a data join method and a join order according to the optimizer selection result;
S27, selecting a search path according to the join method and the join order;
S28, generating an execution plan according to the search path, and outputting the execution plan.
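The control flow of steps S21 to S28 can be sketched as follows. This is only an illustration under strong assumptions: the "parser" accepts nothing but `SELECT <column> FROM <table>`, the catalog is a hard-coded dictionary, steps S24 to S27 are collapsed into fixed choices, and all names (`plan`, `plan_cache`, `catalog`, `PlanError`) are hypothetical.

```python
class PlanError(Exception):
    """Raised when a check fails; the message is what the user would receive."""


plan_cache = {}                      # stand-in for the shared buffer pool (S21)
catalog = {"docs": {"id", "title"}}  # hypothetical table metadata


def check_syntax(sql):
    """S22, deliberately naive: accept only 'SELECT col FROM table'."""
    parts = sql.split()
    if len(parts) != 4 or parts[0].upper() != "SELECT" or parts[2].upper() != "FROM":
        raise PlanError("syntax error")
    return parts[1], parts[3]


def check_semantics(column, table):
    """S23: the table and column must exist in the catalog."""
    if table not in catalog or column not in catalog[table]:
        raise PlanError("semantic error")


def plan(sql):
    """Return (execution_plan, cache_hit) for the request."""
    if sql in plan_cache:            # S21: reuse the pre-stored plan
        return plan_cache[sql], True
    column, table = check_syntax(sql)
    check_semantics(column, table)
    # S24 to S27 collapsed into fixed choices for this sketch.
    steps = {
        "transformed": sql,          # S24: no views or expressions to rewrite
        "optimizer": "cost-based",   # S25
        "join": None,                # S26: single table, nothing to join
        "path": f"scan:{table}",     # S27
    }
    execution_plan = (steps["path"], column)  # S28
    plan_cache[sql] = execution_plan
    return execution_plan, False
```

The first call with a given statement runs both checks and stores the plan; a second call with the same text is answered from the cache without re-checking, which is the point of step S21.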
In a specific implementation, first, a user request is received through the JDBC/ODBC interface, wherein the user request includes a full-text-search data field to be written. Next, the user request is parsed, and a corresponding execution plan is compiled and generated. Then, according to the execution plan, the full-text-search data field is written into the data table; when the full-text-search data field is written into the data table, index data in inverted-index form is generated according to the description of the full-text-search data field and written into the index table. Finally, a processing result is returned to the user, indicating that the full-text-search data field to be written has been written.
This embodiment solves the prior-art problem that Hbase supports only queries based on the primary key and does not support full-text search. It supports distributed full-text search, meets users' need to create full-text indexes, and improves the user experience.
Referring to Fig. 2, Fig. 2 is a schematic structural diagram of a distributed NewSQL database system provided by Embodiment 2 of the present invention. This embodiment includes:
a JDBC/ODBC interface unit 1, configured to interact with the user, including receiving a user request and returning a processing result to the user; wherein the user request includes a full-text-search data field to be written, and the processing result indicates that the full-text-search data field to be written has been written;
a master unit 2, configured to receive the user request accepted by the JDBC/ODBC interface unit 1, to coordinate data communication among multiple processors and manage the overall flow, and to send the user request first to the SQLPlaner unit 3; the master unit 2 is further configured to return the processing result to the JDBC/ODBC interface unit 1;
a SQLPlaner unit 3, configured to parse the user request, and to compile and customize an execution plan according to the user request;
a worker unit 4, configured to execute the plan in parallel, including: writing the full-text-search data field into the data table as an ordinary string type according to the execution plan; the worker unit 4 is further configured to return the processing result of the Hbase unit to the master unit 2;
an Hbase unit 6, configured to store the data table; the Hbase unit 6 further includes a coprocessor module 61, which is configured to, when the full-text-search data field is written into the data table, generate index data in inverted-index form according to the description of the full-text-search data field, and write the index data into an index table;
a Solr unit 7, configured to store the index table.
In general, the distributed NewSQL database system of this embodiment lets users flexibly create secondary indexes according to their specific business logic. In practice, users often create multiple secondary indexes; at query time the system dynamically calculates the cost of using each index from the query conditions and automatically selects the most suitable one. Because queries by rowkey are extremely efficient, secondary indexes are implemented by using the Coprocessor module 61 and the Filter module 62 of the Hbase unit 6 to generate index tables for the data.
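The cost-based choice among several secondary indexes can be sketched as below. The text does not specify the cost model, so the one used here is an assumption: the estimated number of rows read through an index is the table size divided by the number of distinct values of the indexed column. The function name `choose_index` and the statistics format are hypothetical.

```python
def choose_index(conditions, indexes, table_rows):
    """Pick the cheapest access path for the given equality conditions.

    conditions: set of column names used in the query conditions.
    indexes:    {index_name: (column, estimated_distinct_values)}, hypothetical stats.
    Returns (name, estimated_rows_read); name is "full_scan" if no index applies.
    """
    best = ("full_scan", table_rows)
    for name, (column, distinct) in indexes.items():
        if column not in conditions:
            continue  # this index cannot serve the query
        # Assumed cost model: rows read is about table_rows / distinct values.
        cost = table_rows / max(distinct, 1)
        if cost < best[1]:
            best = (name, cost)
    return best
```

With one million rows, an index on a column with 50,000 distinct values is estimated at 20 rows read, so it is preferred over an index on a column with only 100 distinct values whenever the query constrains both columns.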
This embodiment supports distributed full-text search through the Solr unit 7: users can create full-text indexes for their own tables and search using full-text search syntax in SQL. This approach is a special extension of the secondary index and is likewise implemented with the Coprocessor module 61. For fields that need full-text search, the index data is no longer stored in a separate index table but in the Solr unit 7, which provides the full-text search function.
This embodiment further includes a distributed transaction manager 5, configured to coordinate the parties involved to complete distributed transaction management when the execution plan of the worker unit 4 involves transactions.
Further, the JDBC/ODBC interface unit 1 is configured to convert the user request into an SQL request in the form of an SQL statement.
Further, the SQLPlaner unit 3 is configured for:
determining whether a pre-stored SQL statement corresponding to the SQL request exists in a shared buffer pool; if so, outputting the execution plan corresponding to that pre-stored SQL statement; if not,
performing a syntax check on the SQL request; if there is a syntax error, returning an error message to the user; otherwise,
performing a semantic check on the SQL request; if there is a semantic error, returning an error message to the user; otherwise,
performing view and expression transformation on the SQL request to obtain a corresponding transformation result;
selecting an optimizer according to the transformation result to obtain a corresponding optimizer selection result;
selecting a data join method and a join order according to the optimizer selection result;
selecting a search path according to the join method and the join order;
generating an execution plan according to the search path, and outputting the execution plan.
Further, this embodiment also includes:
a monitor 8, responsible for metadata management and configured to monitor the load of the Regions of the Hbase unit 6 and to redistribute Regions through the coprocessor module 61 of the Hbase unit 6; the monitor 8 is connected to the master unit 2.
Further, monitoring the load of the Regions of the Hbase unit 6 and redistributing Regions through the coprocessor module of the Hbase unit 6 includes:
receiving data distribution information of the Hbase unit 6, and receiving, from the master unit 2, load information of the worker unit 4, wherein the load information includes a load deviation value of the worker unit 4;
comparing the load deviation value of the worker unit 4 with a preset load deviation threshold and, if the load deviation value is determined to exceed the threshold, triggering the Hbase unit 6 to redistribute the Regions on servers with a higher hit rate and the Regions on servers with a lower hit rate;
obtaining the data volume of each Region, comparing the data volume of each Region with a preset data volume threshold and, if the data volume of a Region is determined to exceed the threshold, triggering the Hbase unit to split that Region into two.
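The two threshold checks performed by the monitor can be sketched as follows. The text does not say how Regions are chosen for redistribution or where a Region is split, so this sketch assumes the simplest policy: move one Region from the server with the highest hit rate to the one with the lowest, and split an oversized Region into two halves. The function names and data shapes are hypothetical.

```python
def rebalance(assignment, hit_rate, deviation, threshold):
    """Move one Region from the hottest server to the coldest if deviation is too high.

    assignment: {server: [region, ...]}; hit_rate: {server: hit rate}.
    deviation:  load deviation value reported for the worker unit.
    Returns the (region, source, target) move, or None if nothing is triggered.
    """
    if deviation <= threshold:
        return None
    source = max(hit_rate, key=hit_rate.get)
    target = min(hit_rate, key=hit_rate.get)
    if source == target or not assignment[source]:
        return None
    region = assignment[source].pop()  # assumed policy: move the last Region
    assignment[target].append(region)
    return region, source, target


def split_oversized(region_sizes, size_threshold):
    """Split each Region whose data volume exceeds the threshold into two."""
    result = {}
    for region, size in region_sizes.items():
        if size > size_threshold:
            result[region + "_a"] = size // 2
            result[region + "_b"] = size - size // 2
        else:
            result[region] = size
    return result
```

A deviation of 0.5 against a threshold of 0.3 triggers a move, while 0.1 leaves the assignment untouched; a Region of 25 units against a threshold of 20 becomes two Regions of 12 and 13 units.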
Further, the JDBC/ODBC interface unit 1 includes:
a JDBC application module 11, configured to receive the user request, to call JDBC object methods to supply SQL statements, and to retrieve results and return them to the user;
a JDBC driver manager module 12, configured to load and invoke the JDBC driver module 13 for the JDBC application module 11;
a JDBC driver module 13, configured to execute the calls to the JDBC object methods, to send the SQL statement corresponding to the user request to the underlying database, and to return the result obtained from the underlying database to the JDBC application module 11.
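The division of labour among the three modules can be sketched by analogy. The sketch below is not JDBC: it uses Python's standard `sqlite3` module as the underlying database, and the classes `DriverManager` and `SqliteDriver`, the function `run_request` and the URL prefix `demo:sqlite:` are all hypothetical names chosen to mirror modules 12, 13 and 11 respectively.

```python
import sqlite3


class SqliteDriver:
    """Driver module: connects to the underlying database for a matching URL."""
    prefix = "demo:sqlite:"  # hypothetical URL scheme

    def connect(self, url):
        return sqlite3.connect(url[len(self.prefix):])


class DriverManager:
    """Driver manager module: holds loaded drivers and picks one for a URL."""

    def __init__(self):
        self._drivers = []

    def register(self, driver):
        self._drivers.append(driver)

    def get_connection(self, url):
        for driver in self._drivers:
            if url.startswith(driver.prefix):
                return driver.connect(url)
        raise ValueError(f"no driver for {url}")


def run_request(manager, url, statements):
    """Application module: issues SQL statements and returns the last result set."""
    conn = manager.get_connection(url)
    try:
        rows = []
        for sql in statements:
            rows = conn.execute(sql).fetchall()
        conn.commit()
        return rows
    finally:
        conn.close()
```

The application code never names a concrete driver; it only hands a URL to the manager, which is the property that lets the interface unit sit in front of a different underlying database without changing the application module.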
In a specific implementation, first, a user request is received through the JDBC/ODBC interface unit 1. The master unit 2 then receives the user request accepted by the JDBC/ODBC interface unit 1 and sends it first to the SQLPlaner unit 3. The SQLPlaner unit 3 parses the user request and compiles and customizes an execution plan according to it. The worker unit 4 then executes the plan in parallel, which includes writing the full-text-search data field into the data table as an ordinary string type according to the execution plan. When the full-text-search data field is written into the data table, the coprocessor module 61 generates index data in inverted-index form according to the description of the full-text-search data field and writes the index data into the index table of the Solr unit 7. Finally, the processing result of the Hbase unit 6 is returned to the master unit 2, and the master unit 2 returns the processing result, indicating that the full-text-search data field has been written, to the JDBC/ODBC interface unit 1 for delivery to the user.
This embodiment solves the prior-art problem that Hbase supports only queries based on the primary key and does not support full-text search. Full-text search is established through the Solr unit so that distributed full-text search is supported, which meets users' need to create full-text indexes and improves the user experience.
The above is the preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make improvements and modifications without departing from the principles of the present invention, and such improvements and modifications are also regarded as falling within the scope of protection of the present invention.