CN106372177B - Query extension method supporting associated queries over mixed data types and fuzzy grouping

Info

Publication number
CN106372177B
Authority
CN
China
Prior art keywords
fuzzy
type
database
grouping
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610783143.3A
Other languages
Chinese (zh)
Other versions
CN106372177A (en)
Inventor
黄晓虎
王杰
薛皓
王梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University
Priority to CN201610783143.3A
Publication of CN106372177A
Application granted
Publication of CN106372177B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2468: Fuzzy queries

Abstract

The present invention provides a query extension method supporting associated queries over mixed data types and fuzzy grouping, comprising the following steps: step 1, architecture setup; step 2, data storage; step 3, query extension; step 4, query parsing; step 5, hybrid join; step 6, fuzzy grouping; step 7, encapsulating and returning the result. In a distributed database environment, mixed-type data cannot be joined according to a given rule, and aggregation over data of a specified type is functionally limited. To address these problems, the invention provides users with an extended SQL syntax for aggregation and joins, so that extended queries such as fuzzy grouping and fuzzy joins can be completed by writing the specified statements. This extends the functionality and adaptability of distributed databases.

Description

Query extension method supporting associated queries over mixed data types and fuzzy grouping
Technical field
The present invention relates to a query method for mixed-type data that supports fuzzy joins and fuzzy grouping.
Background art
With the rapid development of computer and information technology and its growing adoption across industries, data at scales of hundreds of TB, or even tens to hundreds of PB, is generated and collected every day. The massive volume and heterogeneity of this data pose a major challenge to traditional database technology, especially centralized databases. To provide distributed support for widely used open-source centralized databases such as MYSQL and PostgreSQL, a series of database middleware products has emerged. This middleware gives users a transparent way to build a database cluster, allows existing single-machine centralized databases and applications to be migrated smoothly to the cloud, and has become an important distributed data management solution. At the same time, distributed database middleware can integrate different types of underlying databases and applications. If relational databases and NoSQL databases are integrated uniformly at the bottom layer, mixed data from different sources and with different structures can be adaptively stored, queried, and managed, enabling effective management of heterogeneous big data. However, because the query capability of current SQL statements is limited, the most commonly used query operations on mixed-type data, including joins and grouping, cannot be supported. For mixed-type data it is therefore necessary to use middleware for unified storage and to extend the query functionality so that associated queries over mixed data are supported.
Summary of the invention
The purpose of the present invention is to extend the functionality of SQL statements on the basis of distributed database middleware, so that mixed-data queries including fuzzy grouping and fuzzy joins can be completed.
To achieve the above purpose, the technical solution of the present invention provides a query extension method supporting associated queries over mixed data types and fuzzy grouping, comprising the following steps:
Step 1, build the mixed-data storage architecture;
Step 2, mixed-type data storage: store data column by column in the node database corresponding to its data type. This step includes:
Step 2.1, table creation: according to the configuration file and the field types specified in the SQL statement, store the tables containing the specified fields in the corresponding databases. Specifically:
Step 2.1.1, obtain the configuration information and determine that the table to be created is included in the vertical sharding configuration;
Step 2.1.2, parse and decompose the statement: according to the field type and field length of each field in the SQL CREATE TABLE statement, divide the fields into structured and unstructured attributes, write them to a file that serves as the index for subsequent insert, delete, update, and query operations, and on this basis construct the sub-statements that act on each sub-database;
Step 2.1.3, routing dispatch: bind each sub-statement obtained in step 2.1.2 to the database of the corresponding type in the configuration and dispatch it, completing table creation;
Step 2.2, data insertion: according to the correspondence between the data to be inserted and the databases recorded in the index file, store the data in the tables of the corresponding databases. Specifically:
Step 2.2.1, obtain the configuration information and determine that the name of the target table is included in the vertical sharding configuration;
Step 2.2.2, parse and decompose the statement: query the index file generated in step 2.1.2 and, according to the correspondence between attributes and tables in the file, construct the sub-statements that act on each sub-database;
Step 2.2.3, routing dispatch: bind each sub-statement obtained in step 2.2.2 to the database of the corresponding type in the configuration and dispatch it, completing data insertion.
Step 3, mixed query extension. Following the principles of conciseness and functionality, the SQL statement is designed as follows:
SELECT * | expression [AS output_name] [...] FROM from_item
[GROUP BY column] [CONTAIN r DIVIDED BY d] | [START WITH num1 PER num2] [WHERE condition]
Here expression denotes a field name or an expression; from_item denotes the table to be queried, that is, the unit corresponding to a table in each database, denoted table1. The GROUP BY clause and the WHERE clause specify the grouping operation and the join operation performed by the statement, respectively:
1) GROUP BY specifies the grouping field column, and CONTAIN...DIVIDED BY... or START WITH...PER... specifies fuzzy grouping on a string column or an integer column, respectively;
2) WHERE specifies the query condition condition, which here comprises a join condition and a subquery. The join condition contains the join field c1 of table1 and the join type; the subquery contains the queried table table2 and the field c2 of that table used in the join;
Step 4, query parsing: before routing dispatch, the system parses the specified SQL statement and obtains the relevant parameters. Specifically:
Step 4.1, return type parsing: obtain the result presentation form that follows the SELECT keyword;
Step 4.2, fuzzy join parsing: determine whether the SQL statement contains the FUZZY IN keyword; if so, execute step 5, otherwise execute step 4.3;
Step 4.3, fuzzy grouping parsing: determine whether the SQL statement contains the CONTAIN or START WITH keyword; if so, execute step 6; otherwise the statement is judged not to be one of the newly added statements, is routed and dispatched by default, and the final result is obtained.
Step 5, hybrid join: support join operations based on fuzzy matching over multi-source heterogeneous data. This step includes:
Step 5.1, query splitting and rewriting: the system splits the original statement at the keyword FUZZY IN, extracts the main query and the conditional statement of FUZZY IN as the query statements for table1 and table2 respectively, and saves the join type in memory;
Step 5.2, routing binding: look up the configuration information, bind the query statements for table1 and table2 to the corresponding routing nodes, and dispatch them;
Step 5.3, query execution: execute the sub-queries on their respective nodes, obtain the result sets, and return them in turn to the routing dispatch point;
Step 5.4, FUZZY IN join: retrieve the join type FUZZY IN from memory and filter the result set obtained from table1 on column c1, retaining and returning only the rows whose c1 value is a substring of a c2 value in table2;
Step 6, fuzzy grouping: group the column to be grouped either as character data, by the specified strings it contains, or as numeric data, by a specified interval. This step includes:
Step 6.1, determine the grouping type: if the statement contains the keyword START WITH, it is judged to be numeric grouping by a fixed interval, and step 6.2 is executed; if it contains the keyword CONTAIN, it is judged to be character grouping by contained string, and step 6.3 is executed;
Step 6.2, numeric grouping: parse the relevant parameters, set the grouping rule according to the parameters, and obtain the grouped result set. This step includes:
Step 6.2.1, parse the relevant parameters: extract the initial value s=num1 and the interval Δ=num2 specified in the statement;
Step 6.2.2, query the initial result set: filter out the GROUP BY and START...WITH... related clauses, issue the query request to the database, and obtain the initial result set t;
Step 6.2.3, traverse each record v in the initial result set t, determine the group number k to which the record belongs according to the formula [formula not reproduced in this text], and encapsulate the result in the form "k:v";
Step 6.3, character grouping: parse the relevant parameters, set the grouping rule according to the parameters, and obtain the grouped result set. This step includes:
Step 6.3.1, parse the relevant parameters: extract the string r and the string delimiter d specified in the statement, split r into multiple substrings at d, treat each substring as one group, and assign it a group number k;
Step 6.3.2, query the initial result set: filter out the GROUP BY and START...WITH... related clauses, issue the query request to the database, and obtain the initial result set t;
Step 6.3.3, traverse each record v in the initial result set t, select the records that contain each substring from step 6.3.1, and encapsulate them in the form "k:v".
Step 6.4, return the grouping result: encapsulate the "k:v" result sets returned by the grouping as a list in a single result set object and return it to the routing dispatch point;
Step 7, encapsulate and return the result: set the table header information according to the return type obtained in step 4.1, then encapsulate the header information and the corresponding result content, in order, as a byte stream and return it.
Preferably, step 1 includes:
Step 1.1, build the database environment: install relational and non-relational databases on a single machine or on multiple machines;
Step 1.2, build the MYCAT middleware platform: add the different types of databases as underlying nodes of the middleware through the configuration file, and specify the database type of each node. This comprises the following steps:
Step 1.2.1, install MYCAT: complete the installation by importing the MYCAT source code into ECLIPSE;
Step 1.2.2, environment setup: add the JAR packages required to access the specified databases to the runtime environment through BUILD PATH in ECLIPSE;
Step 1.2.3, configure node information: add table (table) and node (dataNode) information to the configuration file "schema.xml", specify the correspondence between tables and dataNodes, add the vertical sharding rules, and add information such as the addresses, user names, and passwords of the databases to be added to the configuration file.
The present invention provides a strategy that extends SQL statements at the distributed database middleware layer, performs routing dispatch according to the SQL statement, and obtains the result set satisfying the conditions at either the underlying database layer or the middleware layer.
Specifically, routing dispatch is performed at the middleware layer according to the specified SQL statement, and the qualifying result set is obtained at the database layer or the middleware layer according to the specified conditions. The invention is characterized in that it supports fuzzy grouping in a specified format and fuzzy joins over mixed data, thereby extending the query capability of SQL statements over mixed data.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of step 5 of the present invention.
Detailed description of the embodiments
To make the present invention clearer and easier to understand, a preferred embodiment is described in detail below.
The present invention provides a method that, by extending SQL statements, implements fuzzy joins over mixed data types and fuzzy grouping over data of a specified type. In a distributed database environment, mixed-type data cannot be joined according to a given rule, and aggregation over data of a specified type is functionally limited. To address these problems, the invention provides users with an extended SQL syntax for aggregation and joins, so that extended queries such as fuzzy grouping and fuzzy joins can be completed by writing the specified statements, which extends the functionality and adaptability of distributed databases. Taking MYSQL and MONGODB as the underlying node databases as an example, the specific steps are as follows:
Step 1, architecture setup: using MYCAT as the database middleware, build the distributed database environment and set the environment variables and the underlying node information. This step includes:
Step 1.1, build the database environment: install relational and non-relational databases on a single machine or on multiple machines. Here the relational database is MYSQL and the non-relational database is MONGODB;
Step 1.2, build the MYCAT middleware platform: through configuration, add each database host from step 1.1 as a sub-node, and specify the correspondence between each node and its database. The specific steps are as follows:
Step 1.2.1, install MYCAT: complete the installation by importing the MYCAT open-source code into ECLIPSE;
Step 1.2.2, environment setup: add the JAR packages required to access the specified databases to the runtime environment through BUILD PATH in ECLIPSE;
Step 1.2.3, configure node information: add table (table) and dataNode (node) information to the configuration file "schema.xml", specify the correspondence between tables and dataNodes, add the vertical sharding rules, and add information such as the addresses, user names, and passwords of the databases to be added to the configuration file.
Step 2, mixed-type data storage: store data column by column in the node database corresponding to its data type. This step includes:
Step 2.1, table creation: according to the configuration file and the field types specified in the SQL statement, store the tables containing the specified fields in the corresponding databases. Specifically:
Step 2.1.1, obtain the configuration information and determine that the table to be created is included in the vertical sharding configuration;
Step 2.1.2, parse and decompose the statement: according to the field type and field length of each field in the SQL CREATE TABLE statement, divide the fields into structured and unstructured fields and construct a sub-statement for each;
Step 2.1.3, create the index file: according to the table-to-database correspondence in the configuration file and the type of each sub-database, write all fields of the CREATE TABLE statement to a file in the form "table name: {field name: database name}", to serve as the index for subsequent insert, delete, update, and query operations;
Step 2.1.4, routing dispatch: bind each sub-statement obtained in step 2.1.2 to the database of the corresponding type in the configuration and dispatch it, completing table creation.
Step 2.2, data insertion: according to the correspondence between the data to be inserted and the databases recorded in the index file, store the data in the tables of the corresponding databases. Specifically:
Step 2.2.1, obtain the configuration information and determine that the name of the target table is included in the vertical sharding configuration;
Step 2.2.2, parse and decompose the statement: query the index file generated in step 2.1.3 and, according to the correspondence between attributes and tables in the file, construct the sub-statements that act on each sub-database;
Step 2.2.3, routing dispatch: bind each sub-statement obtained in step 2.2.2 to the database of the corresponding type in the configuration and dispatch it, completing data insertion.
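The field classification and index construction of step 2.1 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names, the set of type names treated as unstructured, and the database labels "mysql" and "mongodb" are all assumptions, since the text does not give the exact type or length criteria.

```python
import re

# Assumption: these SQL type names are treated as unstructured and routed
# to the non-relational node; the patent does not list the exact criteria.
UNSTRUCTURED_TYPES = {"TEXT", "LONGTEXT", "BLOB", "LONGBLOB"}


def split_fields(body):
    """Split a column list on commas that are not inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in body:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def split_create_table(sql, structured_db="mysql", unstructured_db="mongodb"):
    """Return (index, fields_by_db) for a simple CREATE TABLE statement.

    index has the shape {table name: {field name: database name}}.
    """
    match = re.match(r"\s*CREATE\s+TABLE\s+(\w+)\s*\((.*)\)\s*;?\s*$",
                     sql, re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("not a simple CREATE TABLE statement")
    table, body = match.group(1), match.group(2)

    index = {table: {}}
    fields_by_db = {structured_db: [], unstructured_db: []}
    for definition in split_fields(body):
        name, sql_type = definition.split(None, 2)[:2]
        base_type = re.match(r"\w+", sql_type).group(0).upper()
        target = unstructured_db if base_type in UNSTRUCTURED_TYPES else structured_db
        index[table][name] = target
        fields_by_db[target].append(definition)
    return index, fields_by_db
```

For an INSERT, the same index would be consulted to decide which columns of the row go to which node; how rows in the two stores are correlated (for example through a shared key) is not stated in the text and is left out of the sketch.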
Step 3, query extension: following the designed extended query syntax, write the SQL-like statements for fuzzy grouping of data of a specified type and for fuzzy joins over mixed-type data. Specifically:
The SQL statement for numeric fuzzy grouping is as follows:
SELECT COUNT(COLUMN) FROM TABLE GROUP BY COLUMN START WITH num1 PER num2;
This queries the records of column COLUMN in TABLE starting from num1, divides the record values into groups of width num2, and returns the number of records in each group;
The SQL statement for character fuzzy grouping is as follows:
SELECT COUNT(COLUMN) FROM TABLE GROUP BY COLUMN CONTAIN r DIVIDED BY d;
This queries the records in TABLE whose COLUMN value contains a string from the string group r, where r uses d as its separator, and returns the number of records in each group;
The SQL statement for a fuzzy join is as follows:
SELECT c1 FROM table1 WHERE COLUMN FUZZY IN (SELECT c2 FROM table2);
This queries the c1 records of table1 and the c2 records of table2 separately, then uses FUZZY IN to obtain the records of table1 whose c1 value is a substring of a c2 value in table2.
Step 4, query parsing: before routing dispatch, the system parses the specified SQL statement and obtains the relevant parameters. This step includes:
Step 4.1, return type parsing: obtain the result presentation form that follows the SELECT keyword;
Step 4.2, fuzzy join parsing: determine whether the SQL statement contains the FUZZY IN keyword; if so, execute step 5, otherwise execute step 4.3;
Step 4.3, fuzzy grouping parsing: determine whether the SQL statement contains the CONTAIN or START WITH keyword; if so, execute step 6; otherwise the statement is judged not to be one of the newly added statements, is routed and dispatched by default, and the final result is obtained.
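The keyword checks of steps 4.2 and 4.3 can be sketched with regular expressions, as below. This is an assumption-laden illustration: the function name parse_extended_sql and the returned dictionary layout are hypothetical, and the sketch assumes that r and d are written as single-quoted literals, which the text does not specify. A production parser would also need to ignore keywords that appear inside string literals.

```python
import re

FUZZY_IN = re.compile(r"WHERE\s+(\w+)\s+FUZZY\s+IN\s*\((.+)\)",
                      re.IGNORECASE | re.DOTALL)
START_WITH = re.compile(r"START\s+WITH\s+(-?\d+)\s+PER\s+(\d+)", re.IGNORECASE)
# Assumption: r and d are single-quoted literals, e.g. CONTAIN 'a,b' DIVIDED BY ','
CONTAIN = re.compile(r"CONTAIN\s+'([^']*)'\s+DIVIDED\s+BY\s+'([^']*)'",
                     re.IGNORECASE)


def parse_extended_sql(sql):
    """Classify a statement in the order of steps 4.2 and 4.3."""
    match = FUZZY_IN.search(sql)          # step 4.2: fuzzy join first
    if match:
        return {"kind": "fuzzy_join",
                "column": match.group(1),
                "subquery": match.group(2).strip()}
    match = START_WITH.search(sql)        # step 4.3: numeric grouping
    if match:
        return {"kind": "numeric_group",
                "start": int(match.group(1)),
                "step": int(match.group(2))}
    match = CONTAIN.search(sql)           # step 4.3: character grouping
    if match:
        return {"kind": "string_group", "r": match.group(1), "d": match.group(2)}
    return {"kind": "default"}            # ordinary SQL, default routing
```

Checking FUZZY IN before the grouping keywords mirrors the order given in the text, so a statement is handed to step 5 as soon as a fuzzy join is detected.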
Step 5, hybrid join: support join operations based on fuzzy matching over multi-source heterogeneous data. This step includes:
Step 5.1, query splitting and rewriting: the system splits the original statement at the keyword FUZZY IN, extracts the main query and the conditional statement of FUZZY IN as the query statements for table1 and table2 respectively, and saves the join type in memory;
Step 5.2, routing binding: look up the configuration information, bind the query statements for table1 and table2 to the corresponding routing nodes, and dispatch them;
Step 5.3, query execution: execute the sub-queries on their respective nodes, obtain the result sets, and return them in turn to the routing dispatch point;
Step 5.4, FUZZY IN join: retrieve the join type FUZZY IN from memory and filter the result set obtained from table1 on column c1, retaining and returning only the rows whose c1 value is a substring of a c2 value in table2.
The detailed flow of step 5 is shown in Fig. 1.
In Fig. 1, the index file is the one created in step 2.1.3. Unstructured data of types such as string and file is stored in MONGODB, and general types are stored in MYSQL. For a fuzzy join, the statements are dispatched to the corresponding sub-databases according to the field-to-database correspondence in the index file, and the execution results of the sub-databases are obtained. The FUZZY IN join condition is then applied: records whose c1 is not a substring of c2 are filtered out, yielding the final result.
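The middleware-side filter of step 5.4 amounts to a substring semi-join. The sketch below assumes that the two node result sets have already been fetched: rows from table1 arrive as dictionaries and the c2 values from table2 as a plain list. The function name fuzzy_in and these data shapes are illustrative assumptions.

```python
def fuzzy_in(table1_rows, c1, table2_values):
    """Keep the table1 rows whose c1 value is a substring of some c2 value."""
    candidates = [str(value) for value in table2_values if value is not None]
    kept = []
    for row in table1_rows:
        value = row.get(c1)
        if value is None:
            continue  # rows without a c1 value cannot match
        if any(str(value) in candidate for candidate in candidates):
            kept.append(row)
    return kept
```

The comparison is a nested loop over both result sets, so its cost grows with the product of their sizes. Note also that an empty c1 string matches every c2 value under Python's substring semantics; whether the patented method treats empty values this way is not stated.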
Step 6, fuzzy grouping: group the column to be grouped either as character data, by the specified strings it contains, or as numeric data, by a specified interval. This step includes:
Step 6.1, determine the grouping type: if the statement contains the keyword START WITH, it is judged to be numeric grouping by a fixed interval, and step 6.2 is executed; if it contains the keyword CONTAIN, it is judged to be character grouping by contained string, and step 6.3 is executed;
Step 6.2, numeric grouping: parse the relevant parameters, set the grouping rule according to the parameters, and obtain the grouped result set. This step includes:
Step 6.2.1, parse the relevant parameters: extract the initial value s=num1 and the interval Δ=num2 specified in the statement;
Step 6.2.2, query the initial result set: filter out the GROUP BY and START...WITH... related clauses, issue the query request to the database, and obtain the initial result set t;
Step 6.2.3, traverse each record v in the initial result set t, determine the group number k to which the record belongs according to the formula [formula not reproduced in this text], and encapsulate the result in the form "k:v";
Step 6.3, character grouping: parse the relevant parameters, set the grouping rule according to the parameters, and obtain the grouped result set. This step includes:
Step 6.3.1, parse the relevant parameters: extract the string r and the string delimiter d specified in the statement, split r into multiple substrings at d, treat each substring as one group, and assign it a group number k;
Step 6.3.2, query the initial result set: filter out the GROUP BY and START...WITH... related clauses, issue the query request to the database, and obtain the initial result set t;
Step 6.3.3, traverse each record v in the initial result set t, select the records that contain each substring from step 6.3.1, and encapsulate them in the form "k:v";
Step 6.4, return the grouping result: encapsulate the "k:v" result sets returned by the grouping as a list in a single result set object and return it to the routing dispatch point.
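Both grouping modes of step 6 can be sketched as follows. The group-number formula of step 6.2.3 is not reproduced in this text, so the numeric sketch assumes k = floor((v - s) / Δ) with group numbers starting at 0 and with records below s discarded; all three choices are assumptions rather than the patent's stated rule. The function names are likewise hypothetical.

```python
from collections import defaultdict


def group_numeric(values, start, step):
    """Group numbers into fixed-width intervals beginning at `start`.

    Assumed rule: k = (v - start) // step; values below `start` are skipped.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    groups = defaultdict(list)
    for v in values:
        if v < start:
            continue
        groups[(v - start) // step].append(v)
    return dict(groups)


def group_contains(values, r, d):
    """Group strings by which substring of `r` (split on `d`) they contain.

    A value containing several substrings is placed in every matching group.
    """
    substrings = [part for part in r.split(d) if part]
    groups = {k: [] for k in range(len(substrings))}
    for v in values:
        for k, part in enumerate(substrings):
            if part in v:
                groups[k].append(v)
    return groups
```

For a COUNT query such as the examples in step 3, the value returned for each group would be the length of its list. The "k:v" pairs described in the text correspond to the (key, member) pairs of these dictionaries.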
Step 7, encapsulate and return the result: set the table header information according to the return type obtained in step 4.1, then encapsulate the header information and the corresponding result content, in order, as a byte stream and return it.
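The header-then-content ordering of step 7 can be illustrated with a deliberately simple encoding. The tab-separated UTF-8 layout below is purely an assumption made for illustration; the text does not describe the byte format, and a real middleware would emit whatever its client protocol requires.

```python
def encode_result(header, rows):
    """Serialize the header line first, then each result row, as UTF-8 bytes.

    Assumed layout: one line per row, cells separated by tabs.
    """
    lines = ["\t".join(header)]
    for row in rows:
        lines.append("\t".join(str(cell) for cell in row))
    return "\n".join(lines).encode("utf-8")
```

With grouped counts, the header could be the select expression obtained in step 4.1 together with a group column, followed by one row per group.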
It can be seen that this technique places few demands on the user's skill level, offers the user considerable flexibility, and makes full use of the distinctive features of the underlying databases.

Claims (2)

1. A query extension method supporting associated queries over mixed data types and fuzzy grouping, characterized by comprising the following steps:
Step 1, build the mixed-data storage architecture;
Step 2, mixed-type data storage: store data column by column in the node database corresponding to its data type, this step including:
Step 2.1, table creation: according to the configuration file and the field types specified in the SQL statement, store the tables containing the specified fields in the corresponding databases, specifically including:
Step 2.1.1, obtain the configuration information and determine that the table to be created is included in the vertical sharding configuration;
Step 2.1.2, parse and decompose the statement: according to the field type and field length of each field in the SQL CREATE TABLE statement, divide the fields into structured and unstructured attributes, write them to a file that serves as the index for subsequent insert, delete, update, and query operations, and on this basis construct the sub-statements that act on each sub-database;
Step 2.1.3, routing dispatch: bind each sub-statement obtained in step 2.1.2 to the database of the corresponding type in the configuration and dispatch it, completing table creation;
Step 2.2, data insertion: according to the correspondence between the data to be inserted and the databases recorded in the index file, store the data in the tables of the corresponding databases, specifically including:
Step 2.2.1, obtain the configuration information and determine that the name of the target table is included in the vertical sharding configuration;
Step 2.2.2, parse and decompose the statement: query the index file generated in step 2.1.2 and, according to the correspondence between attributes and tables in the file, construct the sub-statements that act on each sub-database;
Step 2.2.3, routing dispatch: bind each sub-statement obtained in step 2.2.2 to the database of the corresponding type in the configuration and dispatch it, completing data insertion;
Step 3, mixed query extension: following the principles of conciseness and functionality, the SQL statement is designed as follows:
SELECT * | expression [AS output_name] [...] FROM from_item
[GROUP BY column] [CONTAIN r DIVIDED BY d] | [START WITH num1 PER num2] [WHERE condition]
wherein expression denotes a field name or an expression; from_item denotes the table to be queried, that is, the unit corresponding to a table in each database, denoted table1; and the GROUP BY clause and the WHERE clause specify the grouping operation and the join operation performed by the statement, respectively:
1) GROUP BY specifies the grouping field column, and CONTAIN...DIVIDED BY... or START WITH...PER... specifies fuzzy grouping on a string column or an integer column, respectively;
2) WHERE specifies the query condition condition, which here comprises a join condition and a subquery; the join condition contains the join field c1 of table1 and the join type, and the subquery contains the queried table table2 and the field c2 of that table used in the join;
Step 4, query parsing: before routing dispatch, the system parses the specified SQL statement and obtains the relevant parameters, specifically including:
Step 4.1, return type parsing: obtain the result presentation form that follows the SELECT keyword;
Step 4.2, fuzzy join parsing: determine whether the SQL statement contains the FUZZY IN keyword; if so, execute step 5, otherwise execute step 4.3;
Step 4.3, fuzzy grouping parsing: determine whether the SQL statement contains the CONTAIN or START WITH keyword; if so, execute step 6; otherwise the statement is judged not to be one of the newly added statements, is routed and dispatched by default, and the final result is obtained;
Step 5, hybrid join: support join operations based on fuzzy matching over multi-source heterogeneous data, this step including:
Step 5.1, query splitting and rewriting: the system splits the original statement at the keyword FUZZY IN, extracts the main query and the conditional statement of FUZZY IN as the query statements for table1 and table2 respectively, and saves the join type in memory;
Step 5.2, routing binding: look up the configuration information, bind the query statements for table1 and table2 to the corresponding routing nodes, and dispatch them;
Step 5.3, query execution: execute the sub-queries on their respective nodes, obtain the result sets, and return them in turn to the routing dispatch point;
Step 5.4, FUZZY IN join: retrieve the join type FUZZY IN from memory and filter the result set obtained from table1 on column c1, retaining and returning only the rows whose c1 value is a substring of a c2 value in table2;
Step 6, fuzzy grouping: group the grouping column either by contained specified strings (character type) or by a specified interval (numeric type); this step includes:
Step 6.1, determine the grouping type: if the statement contains the keyword START WITH, it is determined to be numeric grouping by a fixed interval, and step 6.2 is executed; if it contains the keyword CONTAIN, it is determined to be character-type grouping by contained string, and step 6.3 is executed;
Step 6.2, numeric grouping: parse the relevant parameters, set the grouping rule according to the parameters, and obtain the grouped result set; this step includes:
Step 6.2.1, parse the relevant parameters: extract the start value s = num1 and the interval Δ = num2 specified in the statement;
Step 6.2.2, query the initial result set: strip the GROUP BY and START...WITH... clauses, and send the query request to the database to obtain the initial result set t;
Step 6.2.3, traverse each record v in the initial result set t, determine the group number k to which the record belongs according to the grouping formula, and encapsulate it in the form "k:v";
Step 6.3, character-type grouping: parse the relevant parameters, set the grouping rule according to the parameters, and obtain the grouped result set; this step includes:
Step 6.3.1, parse the relevant parameters: extract the string r and the string delimiter d specified in the statement, and split r by d into multiple substrings, each substring corresponding to one group and being assigned a group number k;
Step 6.3.2, query the initial result set: strip the GROUP BY and START...WITH... clauses, and send the query request to the database to obtain the initial result set t;
Step 6.3.3, traverse each record v in the initial result set t, select the records that contain each substring from step 6.3.1, and encapsulate them in the form "k:v";
Step 6.4, return the grouped results: encapsulate the "k:v" result sets returned by the grouping as a list in a single result set object, and return it to the route distribution point;
Step 7, encapsulate and return the result: set the table header information according to the return-result type obtained in step 4.1, then encapsulate the header information and the corresponding result content in turn as a byte stream and return it.
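Steps 5.4, 6.2 and 6.3 of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names and the in-memory row representation are hypothetical, and because the grouping formula of step 6.2.3 is not reproduced in this text, the sketch assumes k = floor((v - s) / Δ) with 0-based group numbers.

```python
import math


def fuzzy_in(rows1, c1, rows2, c2):
    """Step 5.4 sketch: keep the rows of table1 whose c1 value is a
    substring of at least one c2 value in table2."""
    targets = [str(row[c2]) for row in rows2]
    return [row for row in rows1 if any(str(row[c1]) in t for t in targets)]


def group_numeric(values, s, delta):
    """Step 6.2 sketch: assign each value v a group number k.

    ASSUMPTION: k = floor((v - s) / delta); the source formula is missing.
    Returns the (k, v) pairs in traversal order.
    """
    if delta <= 0:
        raise ValueError("interval must be positive")
    return [(math.floor((v - s) / delta), v) for v in values]


def group_by_substring(values, r, d):
    """Step 6.3 sketch: split r on delimiter d; group k collects every
    value that contains the k-th substring."""
    substrings = [part for part in r.split(d) if part]
    return [(k, v) for k, sub in enumerate(substrings) for v in values if sub in v]


if __name__ == "__main__":
    print(fuzzy_in([{"c1": "ab"}, {"c1": "zz"}], "c1", [{"c2": "cabin"}], "c2"))
    print(group_numeric([0, 5, 10, 19, 20], s=0, delta=10))
    print(group_by_substring(["apple pie", "banana", "pineapple"], "apple,ban", ","))
```

Returning a flat list of (k, v) pairs mirrors the "k:v" list encapsulation of step 6.4; a record containing several substrings appears once per matching group.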
2. The query extension method supporting associated queries over mixed data types and fuzzy grouping according to claim 1, characterized in that step 1 includes:
Step 1.1, build the database environment: install relational and non-relational databases in a single-machine or multi-machine environment;
Step 1.2, build the MYCAT middleware platform: add the different types of databases as underlying nodes of the middleware through configuration files, and specify the database type of each node, comprising the following steps:
Step 1.2.1, install MYCAT: complete the installation by importing the MYCAT source code into ECLIPSE;
Step 1.2.2, configure the environment: add the JAR packages required to access the specified databases to the system runtime environment through BUILD PATH in ECLIPSE;
Step 1.2.3, configure node information: add table and dataNode information to the configuration file "schema.xml", specify the correspondence between each table and dataNode, add the vertical sharding rules, and add the address, user name, and password of each database to be added to the configuration file.
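As an illustration of step 1.2.3, a schema.xml fragment along the following lines maps each table to its own dataNode (vertical sharding by table) and declares the database type of each node. This is a sketch only: the schema, table, node and host names, addresses, and credentials are placeholders, and the namespace URI and attribute set follow the commonly documented MYCAT 1.x layout, so they should be checked against the installed version.

```xml
<mycat:schema xmlns:mycat="http://io.mycat/">
  <!-- Logical schema: each table is bound to one dataNode -->
  <schema name="TESTDB" checkSQLschema="false" sqlMaxLimit="100">
    <table name="table1" dataNode="dn_mysql" />
    <table name="table2" dataNode="dn_mongo" />
  </schema>

  <!-- dataNode: links a logical node to a physical host and database -->
  <dataNode name="dn_mysql" dataHost="host_mysql" database="db1" />
  <dataNode name="dn_mongo" dataHost="host_mongo" database="db2" />

  <!-- Relational node; dbType names the database type -->
  <dataHost name="host_mysql" maxCon="1000" minCon="10" balance="0"
            writeType="0" dbType="mysql" dbDriver="native">
    <heartbeat>select user()</heartbeat>
    <writeHost host="hostM1" url="127.0.0.1:3306" user="USER" password="PASSWORD" />
  </dataHost>

  <!-- Non-relational node accessed through a JDBC driver (assumed setup) -->
  <dataHost name="host_mongo" maxCon="1000" minCon="10" balance="0"
            writeType="0" dbType="mongodb" dbDriver="jdbc">
    <heartbeat>select user()</heartbeat>
    <writeHost host="hostM2" url="mongodb://127.0.0.1/" user="USER" password="PASSWORD" />
  </dataHost>
</mycat:schema>
```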
CN201610783143.3A 2016-08-30 2016-08-30 Support the correlation inquiry of mixed data type and the enquiry expanding method of fuzzy grouping Active CN106372177B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610783143.3A CN106372177B (en) 2016-08-30 2016-08-30 Support the correlation inquiry of mixed data type and the enquiry expanding method of fuzzy grouping

Publications (2)

Publication Number Publication Date
CN106372177A CN106372177A (en) 2017-02-01
CN106372177B true CN106372177B (en) 2019-09-27

Family

ID=57899072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610783143.3A Active CN106372177B (en) 2016-08-30 2016-08-30 Support the correlation inquiry of mixed data type and the enquiry expanding method of fuzzy grouping

Country Status (1)

Country Link
CN (1) CN106372177B (en)

Also Published As

Publication number Publication date
CN106372177A (en) 2017-02-01

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant