CN107515887A - An interactive query method applicable to multiple big data management systems - Google Patents


Info

Publication number
CN107515887A
Authority
CN
China
Prior art keywords
document
data
query language
model
database
Legal status
Granted
Application number
CN201710515380.6A
Other languages
Chinese (zh)
Other versions
CN107515887B (en)
Inventor
沈志宏
李跃鹏
黎建辉
Current Assignee
Computer Network Information Center of CAS
Original Assignee
Computer Network Information Center of CAS
Application filed by Computer Network Information Center of CAS
Priority to CN201710515380.6A
Publication of CN107515887A
Application granted
Publication of CN107515887B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to an interactive query method applicable to multiple big data management systems. Its steps are: 1) establishing a linked document model comprising document sets and link sets, a link set being the set formed by the links between documents; 2) converting the different original data models into the linked document model, so that different data sources are connected into one whole through the linked document model; 3) establishing, on the basis of the linked document model, a unified query language suitable for heterogeneous data; 4) using this unified query language to query relational databases, graph databases and file systems in a uniform way. The invention is the first to propose a unified query language suitable for heterogeneous data management systems, enabling unified queries over relational databases, graph databases and file systems.

Description

Technical field
The present invention relates to a query language, and in particular to an interactive query method based on a query language suitable for big data management systems. It belongs to the technical fields of big data and databases.
Background art
With the continued spread of computers, the need to manage and process data has become increasingly urgent. For data of different forms and characteristics, people have abstracted different data models and implemented corresponding data management systems to manage and analyze that data. Among the more influential data models is the E-R model: since it was proposed in the 1970s, it has essentially dominated the database world for more than 40 years. Over the past decade, as Internet and Internet of Things applications have deepened, the generation of large-scale structured, semi-structured and unstructured data has triggered the NoSQL movement [Cattell R. Scalable SQL and NoSQL data stores[J]. ACM SIGMOD Record, 2010, 39(4): 12-27]. The database world has shifted from an initial monopoly by SQL to a landscape in which traditional SQL, NoSQL and NewSQL systems each hold their own territory.
Building a complete big data application system requires fully taking into account everything from the 4V challenges [Gupta R, Gupta H, Mohania M. Cloud computing and big data analytics: what is new from databases perspective[C]//Proc of 1st BDA, New Delhi, India: Springer Berlin Heidelberg, 2012: 42-61.] to further analysis of big data, association mining and even scientific discovery. Take scientific data in biology as an example. Every day, instruments such as sequencers, mass spectrometers and nuclear magnetic resonance equipment produce large numbers of gene sequence files, protein sequence files and other micro-level data such as protein structure and function. There is also macro-level data, such as species information, physiological and biochemical characteristics and reaction conditions, traditionally stored in MongoDB or SQL databases, as well as a large body of knowledge such as literature and patents. To better support knowledge discovery, researchers often also introduce bio-ontologies and manage the large-scale associations among species, proteins, genes and other data as an RDF association network. Together, this micro- and macro-level information forms an organic database from which life can be understood and studied as a whole. Data-driven scientific discovery usually requires scheduling a series of data pipelines, and these pipelines can span many stages such as data collection, batch writing, querying, analysis and visualization. This raises a major problem: how can pipeline programmers stop worrying about the differences between underlying storage models and access and manipulate data in one unified way? Mapped onto data management technology, the question becomes how to cross the boundaries between SQL, NoSQL and NewSQL databases, achieve general data access across multiple data models, and provide a unified data operation interface for computing frameworks such as Hadoop and Spark.
Relational databases currently range from distributed databases to in-memory databases; the main ones include MySQL, PostgreSQL, Oracle and SQLite. They guarantee consistent data access through ACID transactions, organize data in tables, rows and keys, and suit applications with fixed structure and strong consistency requirements. In October 1986, the American National Standards Institute (ANSI) adopted SQL as the standard language for relational database management systems (ANSI X3.135-1986), and ISO later adopted it as an international standard. SQL has thus become the most popular relational database query language.
NoSQL databases include key-value databases, column-oriented databases, document databases and graph databases. Because NoSQL databases still lack a unified query language, some work has wrapped SQL query interfaces around them: Hive, for example, provides HQL, an SQL-like query language that lowers the difficulty of using NoSQL databases. Spark SQL is an SQL implementation built on Spark's DataFrame big data processing framework and supports SQL-based processing and analysis of big data. Through DataFrames, Spark can offer SQL-based big data query and analysis over today's mainstream data stores such as MySQL, HBase, Cassandra and MongoDB.
As an important branch of NoSQL databases, graph databases are often used to manage large-scale association information, such as the associations between species and genes, people's social networks, or Amazon's warehouse and retail master data system, and they support fast association search based on the property graph model. Typical graph databases include Neo4j, Titan and Virtuoso. For graph databases, Neo4j introduced the Cypher query language, which expresses queries over the associations in graph data concisely with SQL-like syntax and makes graph databases easier to use. The TinkerPop project, targeting property graphs, introduced the Gremlin graph traversal language, which supports multiple graph databases such as Titan, OrientDB and TinkerGraph and has been called the Perl of the graph database world. In addition, the RDF model is a semantic description framework based on a graph model that is well suited to expressing semantic information and its associations; typical RDF stores include Jena and Virtuoso. In 2004 the W3C RDF Data Access Working Group published the first drafts of the RDF query language SPARQL, and in 2008 the SPARQL protocol and query language formally became a W3C Recommendation. SPARQL queries in a structured way, performing association queries through subgraph matching in the WHERE clause, and most RDF stores now support standard SPARQL.
Clearly, SQL and NoSQL databases still lack a unified query language, and graph databases, with their particular style of querying and analysis, have generally been pushed to the opposite side from SQL. As a result, when choosing a data model, people usually have to make a trade-off: should they choose an SQL database (including NoSQL databases that support SQL) or a graph database? This choice often causes differences in upper-layer applications: should they use Gremlin or SPARQL, with their strong association analysis, or traditional SQL based on two-dimensional tables?
Against this background, the present invention proposes a new query language, Simba, to provide unified queries over relational databases, graph databases and file systems.
Summary of the invention
The object of the present invention is to provide an interactive query method applicable to multiple big data management systems which, through the unified query language Simba, can query relational databases, graph databases and file systems.
The technical solution adopted by the present invention is as follows:
An interactive query method applicable to multiple big data management systems, comprising the steps of:
1) establishing a linked document model comprising document sets and link sets, a link set being the set formed by the links between documents;
2) converting different original data models into the linked document model, so that different data sources are connected into one whole through the linked document model;
3) establishing, on the basis of the linked document model, a unified query language suitable for heterogeneous data;
4) using the unified query language suitable for heterogeneous data to query relational databases, graph databases and file systems in a unified way.
Further, the unified query language suitable for heterogeneous data management systems comprises four clauses: FIND, WITH, WHERE and RETURN. The FIND clause determines the basic variables of the query, and these variables must denote documents; the WITH clause determines the intermediate variables used in the matching-condition syntax; the WHERE clause determines the conditions that the returned results must satisfy; the RETURN clause contains the data references to be returned to the user.
Further, the basic query space in the FIND clause consists of one or more classes of documents, and the linked document model does not allow comparisons between two classes of documents that are not linked; the WITH clause implicitly defines an expansion of the documents and links in the basic query space; the WHERE clause can not only implicitly define documents and links that expand the search space, but can also perform selection on the linked document model; the RETURN clause contains URLs at the document, link or attribute level, or variables denoting URLs. The RETURN clause mainly performs the projection operation of the linked document model, and the result returned is a linked document.
Further, execution of the unified query language is divided into four steps: determining the documents, establishing the links between documents, selection, and projection.
Further, the linked document model connects different data sources into one whole, forming a network, and the data reference syntax of the unified query language uses a URL-like form so that data in the network can be accessed uniformly.
Further, an intermediate variable in the unified query language denotes a document collection related to the basic search space, a numeric value or a string. Intermediate variables are used in the matching-condition syntax, and the corresponding condition-matching operation is performed according to the variable's type.
The intermediate language proposed by the present invention is independent of any specific operating system or programming language. Because the Simba language covers operations from several data models, some of which a given database cannot perform directly, in practical applications an SDK (Software Development Kit) for the Simba language can be developed for a database system, performing compensation operations on top of that database's native query language. For example, a Java package (or C++ package) that understands and executes the Simba language can be developed for MongoDB; a client program can then operate on MongoDB in the Simba language by calling the API (Application Programming Interface) of the SDK, i.e., it queries the data management system through a SimbaQL translator. This mode is shown in Fig. 1.
Alternatively, a database can design its communication protocol directly around the Simba language, so that a client program obtains the query results it needs by sending Simba commands as network requests. This mode is shown in Fig. 2.
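To make the idea of an SDK compensation operation concrete: where a store's native interface cannot join two document sets, the SDK could fetch the candidate documents and join them on the client. The following Java sketch shows one such compensation, an in-memory hash join. It assumes the documents are already loaded as Map<String, Object> values; the class name JoinCompensation, the method hashJoin and the "right." field prefix are illustrative inventions, and no MongoDB driver API is used or implied.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical SDK-side compensation: an in-memory hash join run by the client
 * when the native query interface cannot join. Documents are plain maps.
 */
final class JoinCompensation {

    private JoinCompensation() {}

    /** Returns one merged document for every pair with left[leftKey] equal to right[rightKey].
     *  Right-hand fields are copied under a "right." prefix so they cannot overwrite left fields. */
    static List<Map<String, Object>> hashJoin(List<Map<String, Object>> left, String leftKey,
                                              List<Map<String, Object>> right, String rightKey) {
        // Build phase: group the right-hand documents by join-key value (null keys never match).
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> r : right) {
            Object key = r.get(rightKey);
            if (key != null) {
                index.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
            }
        }
        // Probe phase: look up each left-hand document's key in the index.
        List<Map<String, Object>> joined = new ArrayList<>();
        for (Map<String, Object> l : left) {
            for (Map<String, Object> r : index.getOrDefault(l.get(leftKey), List.of())) {
                Map<String, Object> merged = new HashMap<>(l);
                r.forEach((field, value) -> merged.put("right." + field, value));
                joined.add(merged);
            }
        }
        return joined;
    }
}
```

Documents whose join key is missing or unmatched are dropped, so this behaves like an inner join; a left outer join would instead keep unmatched left-hand documents.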
As shown in Fig. 3, the Simba query language of the present invention comprises the following components:
1. SimbaQL syntactic structure: gives the overall structure of SimbaQL and the syntax of each clause, such as FIND, WITH, WHERE and RETURN;
2. Data reference syntax: describes how SimbaQL references data in a data source;
3. Intermediate variable syntax: describes how intermediate variables are defined and used in SimbaQL;
4. Matching condition syntax: describes how matching conditions are written in SimbaQL;
5. SimbaQL parser: a Java-based SimbaQL parser is provided for writing SimbaQL client programs or query engine executors.
Compared with the prior art, the advantages of the present invention are as follows:
(1) It is the first to propose a unified query language suitable for heterogeneous data management systems, enabling unified queries over relational databases, graph databases and file systems. The language can retrieve records in a relational table that satisfy specified attribute conditions, retrieve multiple vertices in a graph database that satisfy specified association conditions, and retrieve specific files in a file system. With current development techniques, an application must use SQL, Cypher/Gremlin and file-system APIs respectively to query relational databases, graph databases and file systems; this fragmented approach forces developers to master several languages and makes programs less general. With SimbaQL, a single unified syntax suffices. The difference is shown in Fig. 4 and Fig. 5.
(2) It distills the typical query patterns of big data management systems and simplifies complex SQL and graph query functions. SimbaQL aims to cover most query needs while remaining easy for mainstream data management systems to support, so it omits sophisticated features of SQL and graph queries such as subqueries and UNION of query results. SimbaQL suggests leaving these secondary operations to big data computing frameworks; SimbaQL itself only performs simple data query and extraction.
(3) A SimbaQL query targets databases of several data models and therefore involves operations from several models. If the query language of one model cannot perform an operation from another model, the SimbaQL implementation can complete it on that model's behalf. For example, MongoDB's find queries cannot perform JOINs between documents, whereas SimbaQL supports JOIN, so the SimbaQL implementation compensates for this operation.
(4) SimbaQL introduces intermediate variables to express, and hide, information the user is not interested in. Take finding two entities related through a third as an example:
FIND x, y WITH $m = x.child WHERE $m.child = y RETURN x, y
This statement introduces the intermediate variable $m, which stands for a child of x, and y is a child of $m. The query returns all (grandparent, grandchild) pairs, yet the application issuing it does not need to care who $m actually is.
This approach also avoids repetitive spelling and allows a concise style of writing.
(5) SimbaQL introduces multi-level attribute references. For example, x.knows.knows.name denotes the name of some person z known by some person y whom x knows. Traditional query languages do not support multi-level references. This approach effectively reduces code repetition and is intuitive to read.
Brief description of the drawings
Fig. 1 shows querying a data management system through a SimbaQL translator.
Fig. 2 shows querying a data management system directly through the SimbaQL network protocol.
Fig. 3 shows the structure of the present invention.
Fig. 4 shows that, in the prior art, different management systems must be queried with different languages.
Fig. 5 shows the present invention querying different management systems uniformly with the SimbaQL language.
Fig. 6 is a schematic structural diagram of the LDM model.
Embodiment
The present invention is further described below with reference to specific embodiments and the accompanying drawings.
SimbaQL, as designed in the present invention, is based on the LinkedDocument intermediate model (Linked Document Model, LDM). Through LDM operations, their mapping onto the operations of other models, and compensation operations in the SDK, it achieves unified queries over databases of multiple data models.
1. Linked Document model
1) Linked Document model definitions
A document is a set of attributes, and an attribute is a set formed by data of the same kind. By default each document contains a uniquely identifying primary key attribute. The primary key plays a role similar to an IP address and must be a globally unique identifier; other attributes may be of any type, including document, link or a custom type. A link is a special document that must contain the two attributes from: <primary key> and to: <primary key>, which represent an association between documents, i.e., a relationship between two pieces of data. For example, a knows link between one person document and another person document means that the first person knows the second. Every document set and link set must have a name identifier stating the semantics of the documents or links in the set. Documents or links of the same class may have different numbers of attributes, which means that {'id': 'fffff0', 'name': 'bluejoe', 'age': 30} can be a member of the person document class and also a member of the teacher document class.
The LDM is a 2-tuple (document sets, link sets), where the link sets are the sets of the various relationships between two classes of documents. The general structure of the LDM is shown in Fig. 6, in which Documents denotes the document sets and Links the link sets; PersonDocument denotes the collection of person documents, SoftwareDocument the collection of software documents, and InventLink the set of "person invents software" links; 1 and 2 denote document primary keys, and attr1, attr2, ... denote document attributes.
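The 2-tuple (document sets, link sets) can be pictured as a small in-memory structure. The sketch below is an illustration only: it assumes primary keys are strings, models a link as just its from and to keys, and uses hypothetical names (Ldm, Document, Link, addDocument, addLink, targets). Because a document joins a set by reference, the same document can sit in both the person and teacher sets, as the definition allows.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal in-memory sketch of the LDM 2-tuple (document sets, link sets). */
final class Ldm {

    /** A document: a globally unique primary key plus attributes of arbitrary type. */
    record Document(String id, Map<String, Object> attributes) {}

    /** A link is a special document whose two required attributes are from and to. */
    record Link(String from, String to) {}

    // Both kinds of set are identified by a name such as "person" or "invent".
    final Map<String, List<Document>> documentSets = new LinkedHashMap<>();
    final Map<String, List<Link>> linkSets = new LinkedHashMap<>();

    /** Adds a document to a named set; the same document may belong to several sets. */
    void addDocument(String setName, Document doc) {
        documentSets.computeIfAbsent(setName, k -> new ArrayList<>()).add(doc);
    }

    /** Records that document fromId is linked to document toId in the named link set. */
    void addLink(String setName, String fromId, String toId) {
        linkSets.computeIfAbsent(setName, k -> new ArrayList<>()).add(new Link(fromId, toId));
    }

    /** Follows a named link set from one document and returns the keys it points to. */
    List<String> targets(String linkSet, String fromId) {
        List<String> out = new ArrayList<>();
        for (Link l : linkSets.getOrDefault(linkSet, List.of())) {
            if (l.from().equals(fromId)) {
                out.add(l.to());
            }
        }
        return out;
    }
}
```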
2) LDM transformation rules
LDM is oriented toward data querying and analysis and provides two kinds of transformation rules: from original data models to LDM, and from LDM to the forms required by existing programming models.
A) original data model → LDM
Data model translation is formally defined as (G, L, M), where G denotes the schema of the global model, i.e., the LDM; L denotes a local data model (relational, key-value, document or property graph model); and M denotes the mapping rules from L to G. Conversion from an original data model to LDM is mainly concerned with the semantics of the data; conversion of data types can be decided by developers according to system requirements. The original data models covered below are the relational, key-value, document and property graph models, and the main transformation rules are shown in Table 1. A custom transformation rule extracts, according to the characteristics of the original model, the data having certain features. For example: entries of a key-value model whose key contains "person" can be extracted as the Person document collection; vertices of a property graph model labeled Person can be extracted as Person documents; and in a document model, the relationship "the personid of a Person document equals the personid of a Software document" can be extracted as the link set invent.
Table 1. Transformation rules from original data models to LDM
LDM          | Relational model | Key-value model | Document model | Property graph model
Attribute    | Attribute        | Key             | Attribute      | Attribute
Document     | Record           | Key-value pair  | Document       | Vertex
Document set | Table            | Custom          | Collection     | Custom
Link         | Foreign key      | Custom          | Custom         | Edge
Link set     | Foreign key      | Custom          | Custom         | Custom
Note that in LDM every document set and link set must have a name. Therefore, when converting the foreign keys of the relational model and the other custom parts, the converter must supply a name expressing the semantics of the set elements. For example, in a property graph model, nodes labeled 'person' can become the person documents in LDM, and nodes containing the attribute 'teacher' can become the teacher documents in LDM; in fact, these two classes of documents may correspond to the same nodes.
In addition, conversion from original models to LDM is not limited to the models above; developers can define transformation rules to LDM for other data models as needed, such as file systems or column-oriented databases.
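Applied to the relational column of Table 1, the rules amount to: every record becomes a document, the table becomes a named document set, and a foreign-key column becomes links in a named link set. The Java sketch below illustrates this under stated assumptions: rows arrive as maps, the converter supplies the link name (as the naming note requires), and globally unique keys are formed as table:primaryKey, a scheme chosen here for illustration rather than taken from the source. The class and record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch of the relational rules of Table 1: record -> document, table -> document set,
 *  foreign key -> link. The "table:key" id scheme is an assumption. */
final class RelationalToLdm {

    record Document(String set, String id, Map<String, Object> attributes) {}
    record Link(String set, String from, String to) {}
    record Result(List<Document> documents, List<Link> links) {}

    private RelationalToLdm() {}

    static Result convert(String table, String pkColumn, List<Map<String, Object>> rows,
                          String fkColumn, String linkName, String targetTable) {
        List<Document> documents = new ArrayList<>();
        List<Link> links = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            // Prefixing with the table name keeps keys unique across tables (assumed scheme).
            String id = table + ":" + row.get(pkColumn);
            documents.add(new Document(table, id, row));
            Object fk = row.get(fkColumn);
            if (fk != null) {
                // A non-null foreign key becomes one link to the referenced record.
                links.add(new Link(linkName, id, targetTable + ":" + fk));
            }
        }
        return new Result(documents, links);
    }
}
```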
B) LDM → programming model
Conversion from LDM to a programming model is mainly concerned with the relationships in data structures. The data structures accepted by currently popular programming models such as MapReduce, Spark SQL and Pregel are mainly arrays, tables and graphs. The transformation rules from LDM to these three data structures are therefore given in Table 2.
Table 2. Transformation rules from LDM to arrays, tables and graphs
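The individual rules of Table 2 are not reproduced in this text. Purely as an assumption, one natural reading mirrors Table 1: documents become graph vertices, links become labelled edges, and a document set becomes a table whose columns are the union of its documents' attribute names, with empty cells where a document lacks an attribute (documents of one class may have different attributes). The hypothetical Java sketch below implements that reading for the graph and table cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** One plausible LDM -> graph / table conversion (an assumption, not Table 2 itself). */
final class LdmToStructures {

    record Link(String label, String from, String to) {}
    record Table(List<String> header, List<List<Object>> rows) {}

    private LdmToStructures() {}

    /** Graph view as an adjacency list: vertices sorted by id, edges kept in insertion order. */
    static Map<String, List<String>> toAdjacency(List<Link> links) {
        Map<String, List<String>> adjacency = new TreeMap<>();
        for (Link l : links) {
            adjacency.computeIfAbsent(l.from(), k -> new ArrayList<>()).add(l.label() + "->" + l.to());
        }
        return adjacency;
    }

    /** Table view: columns are the sorted union of attribute names; absent cells are null. */
    static Table toTable(List<Map<String, Object>> documents) {
        SortedSet<String> columns = new TreeSet<>();
        for (Map<String, Object> d : documents) {
            columns.addAll(d.keySet());
        }
        List<List<Object>> rows = new ArrayList<>();
        for (Map<String, Object> d : documents) {
            List<Object> row = new ArrayList<>();
            for (String c : columns) {
                row.add(d.get(c));
            }
            rows.add(row);
        }
        return new Table(new ArrayList<>(columns), rows);
    }
}
```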
3) LDM operation rules
LDM's operation rules are defined on the basis of the operations of the relational, key-value, document and property graph models: the set operations, join, selection and projection of the relational model; the get operation of the key-value model; the selection and projection of the document model; and the traversal and selection of the property graph model. LDM operations fall into three classes, namely set operations, link operations and document operations; the specific rules are shown in Table 3.
Table 3. LDM operation rules
4) LDM data access rules
Because LDM connects the databases into a network, data in the network can be referenced with a URL-like form:
<datasource>.<document>.<link>.<identity>.<propertyName>
Here, datasource denotes the data source, such as MySQL or MongoDB; document denotes a document class mapped from the data source into LDM; link denotes a link mapped from the data source into LDM; identity denotes the document's primary key; and propertyName denotes a document attribute name.
Data can be referenced at different levels. For example, the name attribute of person documents in a MySQL database can be referenced as:
MySQL.person.name
The father link of person documents in a MongoDB database can be referenced as:
MongoDB.person.father
A link reference denotes the document collection corresponding to that link, and the reference can continue further, for example:
MongoDB.person.father.name
The data denoted by a data reference URL is in fact the result of the LDM link-establishing operation followed by projection. For example, MongoDB.person.father denotes establishing father links between Person documents and then projecting the father relationship.
2. SimbaQL syntactic structure
Just as SQL is built on the relational model, SimbaQL is built on the Linked Document model: every SimbaQL statement can be converted into a Linked Document operation expression composed of the following operations from Table 3: "establish link" and "selection" among the link operations, and "selection" and "projection" among the document operations. A SimbaQL query consists mainly of the four clauses FIND, WITH, WHERE and RETURN, with the following syntactic structure:
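Read this way, each link step in a reference such as MongoDB.person.father.name maps a set of documents to the documents they are linked to, and the final segment projects an attribute; x.knows.knows.name works the same way with two link steps. The Java sketch below evaluates such paths over in-memory data. It assumes set semantics (each reached document counted once) and uses hypothetical names (PathResolver, resolve); it is an illustration, not the parser or engine described later.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Sketch: evaluating a multi-level reference over an in-memory LDM. */
final class PathResolver {

    record Link(String name, String from, String to) {}

    final Map<String, Map<String, Object>> documentsById = new HashMap<>();
    final List<Link> links = new ArrayList<>();

    /** Follows each named link in turn (establish link, keep the target side),
     *  then projects one attribute of the documents finally reached. */
    List<Object> resolve(Collection<String> startIds, List<String> linkPath, String property) {
        Set<String> current = new LinkedHashSet<>(startIds);
        for (String linkName : linkPath) {
            Set<String> next = new LinkedHashSet<>();
            for (Link l : links) {
                if (l.name().equals(linkName) && current.contains(l.from())) {
                    next.add(l.to());
                }
            }
            current = next;
        }
        List<Object> values = new ArrayList<>();
        for (String id : current) {
            Map<String, Object> doc = documentsById.get(id);
            if (doc != null && doc.get(property) != null) {
                values.add(doc.get(property));
            }
        }
        return values;
    }
}
```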
FIND <documents>
WITH <variables>
WHERE <conditions>
RETURN <urls>
Here, the FIND clause determines the basic variables of the query, and each variable must correspond to one class of documents in the Linked Document model; the WITH clause determines the intermediate variables used in the matching-condition syntax, which may be of various data types and are not limited to documents; the WHERE clause determines the conditions that returned results must satisfy; and the RETURN clause contains the data references to be returned to the user. The LDM computation corresponding to SimbaQL is described below.
First, the basic query space in FIND consists of one or more classes of documents, and SimbaQL does not allow LDM to compare two classes of documents that are not linked. For example, a query over the data of person objects and software objects can be written as:
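As a first parsing step, a statement can be cut into its four clauses by locating the clause keywords. The following Java sketch does this with a case-insensitive regular expression, since the examples in this description mix FIND with a lowercase return. It is deliberately naive and hypothetical: it does not skip quoted strings, treats WITH, WHERE and RETURN as optional, and requires only FIND; a real SimbaQL parser would tokenize first.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Naive, hypothetical splitter for FIND ... WITH ... WHERE ... RETURN ... statements. */
final class ClauseSplitter {

    // Whole-word clause keywords, matched case-insensitively. Quoted strings are NOT
    // skipped, so a literal such as 'where am i' would be mis-split.
    private static final Pattern KEYWORD =
            Pattern.compile("\\b(FIND|WITH|WHERE|RETURN)\\b", Pattern.CASE_INSENSITIVE);

    private ClauseSplitter() {}

    /** Returns clause keyword (upper case) -> trimmed clause body, in order of appearance. */
    static Map<String, String> split(String statement) {
        Map<String, String> clauses = new LinkedHashMap<>();
        Matcher m = KEYWORD.matcher(statement);
        String keyword = null;
        int bodyStart = 0;
        while (m.find()) {
            if (keyword != null) {
                clauses.put(keyword, statement.substring(bodyStart, m.start()).trim());
            }
            keyword = m.group(1).toUpperCase(Locale.ROOT);
            bodyStart = m.end();
        }
        if (keyword != null) {
            clauses.put(keyword, statement.substring(bodyStart).trim());
        }
        if (!clauses.containsKey("FIND")) {
            throw new IllegalArgumentException("a FIND clause is required: " + statement);
        }
        return clauses;
    }
}
```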
FIND MySQL.person p, MySQL.software s
If person documents and software documents are not linked, we can only perform selection on person and software separately, and cannot perform a selection such as p.inventid = s.id.
Variables defined in WITH implicitly expand the documents and links in the basic query space, for example:
FIND person p WITH $soft = p.invent
This statement means that the Linked Document being searched contains software documents and the invent link. It is equivalent to:
FIND person p, software s WITH $soft = p.invent
A WHERE clause can not only implicitly define documents and links that expand the search space, but can also perform LDM selection. For example:
FIND person p WHERE p.invent.name = 'simba'
implicitly determines that the LDM contains the document sets (person, software) and the link set (invent), and requires the software documents to satisfy the condition that their name attribute is 'simba'.
A RETURN clause may contain URLs at the document, link or attribute level, or variables denoting URLs. This clause mainly performs LDM projection, and the result returned is a Linked Document.
In summary, SimbaQL execution is divided into four steps: determining the documents, establishing the links between documents, selection and projection. Suppose the basic search space in FIND is A, B; the WHERE clause implicitly determines document C, the link L1 between A and C, and the selection condition condition; the projection space in the RETURN clause is space; and d denotes the documents (doc) produced by the other operations. Then, writing σ for selection and π for projection, the LDM operation corresponding to the SimbaQL statement is:
$$\mathrm{Result} = \pi_{\text{space}}\left(\sigma_{\text{condition}}\left((A \times_{d} B) \times_{L_1} C\right)\right)$$
For example, the SimbaQL statement
FIND person p, software s WHERE p.name = 'bluejoe' AND p.invent.name = 'simbaql' RETURN p.name
corresponds to the LDM operation:
$$\mathrm{Result} = \pi_{\text{p.name}}\left(\sigma_{\text{p.name} = \text{'bluejoe'} \,\wedge\, \text{Software.name} = \text{'simbaql'}}\left(\text{Person} \times_{\text{invent}} \text{Software}\right)\right)$$
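The example can be traced step by step over toy data: establish the invent link between person and software documents, select, then project. The Java sketch below assumes in-memory documents and uses hypothetical names (LdmPipeline, linkJoin, select, project); select and project are thin wrappers over Java streams, and linkJoin keeps only the pairs for which an invent link exists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

/** Sketch of pi_space(sigma_condition(Person x_invent Software)) over in-memory data. */
final class LdmPipeline {

    record Doc(String id, Map<String, Object> attrs) {}
    record Link(String from, String to) {}
    /** One element of A x_L C: a left document and the right document it is linked to. */
    record Pair(Doc left, Doc right) {}

    private LdmPipeline() {}

    /** Establish-link operation: pairs (a, c) such that the link (a.id, c.id) exists. */
    static List<Pair> linkJoin(List<Doc> left, List<Doc> right, List<Link> links) {
        Map<String, Doc> leftById = new HashMap<>();
        Map<String, Doc> rightById = new HashMap<>();
        left.forEach(d -> leftById.put(d.id(), d));
        right.forEach(d -> rightById.put(d.id(), d));
        List<Pair> pairs = new ArrayList<>();
        for (Link l : links) {
            Doc a = leftById.get(l.from());
            Doc c = rightById.get(l.to());
            if (a != null && c != null) {
                pairs.add(new Pair(a, c));
            }
        }
        return pairs;
    }

    /** Selection (sigma): keep the elements that satisfy the condition. */
    static <T> List<T> select(List<T> input, Predicate<T> condition) {
        return input.stream().filter(condition).toList();
    }

    /** Projection (pi): map each element onto the requested space. */
    static <T, R> List<R> project(List<T> input, Function<T, R> space) {
        return input.stream().map(space).toList();
    }
}
```

Applying the three steps in this order to the example data yields just "bluejoe", since only the pair (bluejoe, simbaql) survives the selection.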
3. Data reference syntax (also called attribute expression syntax; see Fig. 3)
Because LDM connects the databases into a network, data in the network can be referenced with a URL-like form:
<datasource>.<document>.<link*>.<identity>.<propertyName>
Here, datasource denotes a data source registered in the linked document; document denotes a document class in the linked document; link denotes a link set in the linked document, and a URL may contain several links; identity denotes the document id; and propertyName denotes a document attribute.
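Syntactically, link names and attribute names are both plain identifiers, so splitting such a URL into its parts needs schema knowledge. The hypothetical Java sketch below assumes the caller passes the link names known from the LDM schema, treats one trailing segment after the links as propertyName and two as identity followed by propertyName, and does not handle identities that themselves contain dots.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical parser for <datasource>.<document>.<link*>.<identity>.<propertyName>. */
final class DataReference {

    /** identity and property are null when the reference stops before them. */
    record Parsed(String datasource, String document, List<String> links,
                  String identity, String property) {}

    private DataReference() {}

    static Parsed parse(String url, Set<String> knownLinks) {
        String[] parts = url.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected <datasource>.<document>...: " + url);
        }
        List<String> links = new ArrayList<>();
        int i = 2;
        // Consume as many consecutive link names as the schema recognises.
        while (i < parts.length && knownLinks.contains(parts[i])) {
            links.add(parts[i++]);
        }
        int remaining = parts.length - i;
        return switch (remaining) {
            case 0 -> new Parsed(parts[0], parts[1], List.copyOf(links), null, null);
            case 1 -> new Parsed(parts[0], parts[1], List.copyOf(links), null, parts[i]);
            case 2 -> new Parsed(parts[0], parts[1], List.copyOf(links), parts[i], parts[i + 1]);
            default -> throw new IllegalArgumentException("too many trailing segments: " + url);
        };
    }
}
```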
4. Intermediate variable syntax
An intermediate variable can denote a document collection related to the basic search space, a number or a string. A variable is written as the $ symbol followed by an identifier and is defined with the WITH clause, for example:
WITH $c1 = p.knows.knows (document collection)
WITH $c2 = 123 (number)
WITH $c3 = 'bluejoe' (string)
Intermediate variable is used in grammer is matched, and corresponding condition coupling operation is carried out according to the type of intermediate variable.When When intermediate variable is collection of document, its main function is a part of content that replacement data quotes URL.
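Since a document-collection variable stands in for part of a reference URL, one simple way to evaluate a reference is to substitute variables textually before resolving it. The Java sketch below assumes a variable is written as $ immediately followed by an identifier and that the WITH definitions are given as a name-to-text map; it expands nested definitions and rejects undefined or cyclic ones. The names VariableExpander and expand are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical textual expansion of WITH variables inside reference expressions. */
final class VariableExpander {

    // '$' immediately followed by an identifier, e.g. $m in $m.child.
    private static final Pattern VARIABLE = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    private VariableExpander() {}

    /**
     * Replaces every $name with its definition, repeating so that definitions written in terms
     * of other variables are fully expanded. Acyclic definitions expand completely within
     * definitions.size() substitution rounds, so a substitution in the final round means a cycle.
     */
    static String expand(String expression, Map<String, String> definitions) {
        String current = expression;
        for (int round = 0; round <= definitions.size(); round++) {
            Matcher m = VARIABLE.matcher(current);
            StringBuilder out = new StringBuilder();
            boolean replaced = false;
            while (m.find()) {
                String definition = definitions.get(m.group(1));
                if (definition == null) {
                    throw new IllegalArgumentException("undefined variable $" + m.group(1));
                }
                // quoteReplacement stops a '$' inside the definition being read as a group reference.
                m.appendReplacement(out, Matcher.quoteReplacement(definition));
                replaced = true;
            }
            m.appendTail(out);
            current = out.toString();
            if (!replaced) {
                return current;
            }
        }
        throw new IllegalArgumentException("cyclic variable definitions in: " + expression);
    }
}
```

For the grandparent query, expand("$m.child", Map.of("m", "x.child")) gives x.child.child, which is the reference the WHERE condition actually compares with y.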
5. Matching condition syntax
A matching condition is a Boolean-valued expression introduced by the WHERE clause. Its grammar rules are as follows:
1) Combined filtering of document collections A and B: (<Document A>.link | <Document A>) = <Document B>
2) Filtering of a document collection: (<Document>.attribute | <Association>.attribute) operator <value of a basic data type>
3) <Expression> AND|OR <Expression>
The operators currently supported are >, <, =, >= and <=. When <Association>.attribute or <Document A>.link denotes a document collection, the "=" operator means "there exists". For example, p.knows.name='bluejoe' means that among the people p knows there is someone named "bluejoe", and p.knows=p1 means that p1 is among the people p knows.
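The operator semantics can be sketched as follows. This is an illustrative Java enum (Op is a hypothetical name) that compares values of a basic type; for a collection-valued left side it implements the "there exists" reading of "=" described above. Because that reading is defined only for "=", the sketch rejects the other operators on collections rather than guessing their meaning.

```java
import java.util.Collection;

// Hypothetical model of the five comparison operators listed for matching conditions.
enum Op {
    GT(">"), LT("<"), EQ("="), GE(">="), LE("<=");

    final String symbol;

    Op(String symbol) { this.symbol = symbol; }

    // Compares two scalar values of the same basic type (e.g. Integer or String).
    <T extends Comparable<T>> boolean test(T left, T right) {
        int c = left.compareTo(right);
        switch (this) {
            case GT: return c > 0;
            case LT: return c < 0;
            case EQ: return c == 0;
            case GE: return c >= 0;
            case LE: return c <= 0;
            default: throw new AssertionError(this);
        }
    }

    // Collection-valued left side: "=" holds if some element equals the right-hand value.
    // Only "=" is defined for collections, so other operators are rejected here.
    <T extends Comparable<T>> boolean testAny(Collection<T> left, T right) {
        if (this != EQ) {
            throw new UnsupportedOperationException(symbol + " is not defined for collections in this sketch");
        }
        for (T value : left) {
            if (value.compareTo(right) == 0) {
                return true;
            }
        }
        return false;
    }
}
```

Under this reading, p.knows.name='bluejoe' corresponds to Op.EQ.testAny(namesOfPeoplePKnows, "bluejoe"), where namesOfPeoplePKnows is a hypothetical collection of the name values reached through p.knows.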
Note that although the WHERE clause corresponds to LDM's selection operation, selection in SimbaQL applies only to the attribute values of documents. For example:
FIND person p, software s WHERE p.invent=s AND s.name='SimbaQL' RETURN p.name
Although the condition contains p.invent=s, the actual selection condition is s.name='SimbaQL'; p.invent=s establishes a relationship between the two documents rather than filtering attribute values.
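The distinction can be illustrated with a small, hypothetical helper that separates conjuncts linking two FIND documents from conjuncts that filter attribute values. It assumes a WHERE clause made only of AND-joined conditions and treats a conjunct as relationship-building when its right-hand side is a FIND variable; it does not handle OR, NOT, or quoted strings that contain " AND ".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: separates relationship-building conjuncts from attribute selections.
final class WhereSplit {
    final List<String> joins = new ArrayList<>();      // e.g. p.invent=s
    final List<String> selections = new ArrayList<>(); // e.g. s.name='SimbaQL'

    // Handles AND-only conditions; findVariables are the document variables declared in FIND.
    static WhereSplit of(String where, Set<String> findVariables) {
        WhereSplit result = new WhereSplit();
        for (String conjunct : where.split("\\s+AND\\s+")) {
            String c = conjunct.trim();
            int eq = c.indexOf('=');
            String rhs = eq < 0 ? "" : c.substring(eq + 1).trim();
            // A right-hand side naming a FIND document makes this a relationship, not a selection.
            if (findVariables.contains(rhs)) {
                result.joins.add(c);
            } else {
                result.selections.add(c);
            }
        }
        return result;
    }
}
```

For the query above, with FIND variables {p, s}, the helper puts p.invent=s among the relationships and s.name='SimbaQL' among the selections.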
6. SimbaQL parsing program
The SimbaQL parsing program consists mainly of the following classes:
Classes for the overall syntactic structure: Statement, SearchSpace, VariableDefines, Conditions and SubSpace; their meanings are given in Table 1 below.
Abstract syntax-tree classes and interfaces: Node, Condition, Variable, Document, AttributeDocument (a document reached through an attribute, such as p.knows or $p.knows) and ValueExprecession (the value type of an expression).
Concrete syntax-tree classes: RawDocument, RawAttribute, RawVarible, WithVarible, StringValue, IntegerValue, TerminalCondition, And, Or, Not, VaribleAttribute, DocumentRefference and Operator. RawDocument represents a document such as Person p in FIND Person p; RawAttribute represents an attribute such as p.name; VaribleAttribute represents an attribute reached through a variable, such as $k.name; WithVarible represents a variable such as $k in WITH $k=p.knows; RawVarible represents a document variable defined by FIND, such as p; StringValue and IntegerValue represent strings and integers, respectively; And, Or and Not represent the logical connectives and, or and not in expressions; DocumentRefference represents the knows link in p.knows.p.knows.name; Operator represents a comparison operator; and TerminalCondition represents an expression that cannot be split further, such as p.age>30.
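The condition classes form a composite tree. The following sketch reuses the class names Condition, TerminalCondition, And, Or and Not from the list above, but their fields, constructors and the eval method are assumptions made for illustration, and attribute values are restricted to integers for brevity.

```java
import java.util.Map;

// Hypothetical evaluation interface; only the name Condition comes from the class list.
interface Condition {
    boolean eval(Map<String, Integer> row);
}

// Leaf such as p.age>30; restricted to integer attributes in this sketch.
final class TerminalCondition implements Condition {
    final String attribute;
    final String operator;
    final int value;

    TerminalCondition(String attribute, String operator, int value) {
        this.attribute = attribute;
        this.operator = operator;
        this.value = value;
    }

    @Override
    public boolean eval(Map<String, Integer> row) {
        Integer v = row.get(attribute);
        if (v == null) {
            return false; // a missing attribute never matches
        }
        switch (operator) {
            case ">":  return v > value;
            case "<":  return v < value;
            case "=":  return v.intValue() == value;
            case ">=": return v >= value;
            case "<=": return v <= value;
            default:   throw new IllegalArgumentException("unknown operator " + operator);
        }
    }
}

// Inner nodes combine sub-conditions with the logical connectives.
final class And implements Condition {
    final Condition left, right;
    And(Condition left, Condition right) { this.left = left; this.right = right; }
    @Override public boolean eval(Map<String, Integer> row) { return left.eval(row) && right.eval(row); }
}

final class Or implements Condition {
    final Condition left, right;
    Or(Condition left, Condition right) { this.left = left; this.right = right; }
    @Override public boolean eval(Map<String, Integer> row) { return left.eval(row) || right.eval(row); }
}

final class Not implements Condition {
    final Condition inner;
    Not(Condition inner) { this.inner = inner; }
    @Override public boolean eval(Map<String, Integer> row) { return !inner.eval(row); }
}
```

Keeping TerminalCondition as the only leaf type means every compound expression reduces to a tree whose leaves cannot be split further, matching the description of TerminalCondition above.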
Basic information about the above classes is given in Table 4:
Table 4. Basic information about the abstract syntax-tree classes.
Besides the Java classes for the abstract syntax tree, the parsing program also includes ANTLR4 lexer and parser classes: SimbaQLLexer, SimbaQLParser, SimbaQLBaseListener, SimbaQLBaseVisitor and SimbaQLVisitor. SimbaQLLexer is the ANTLR4-generated lexer for SimbaQL statements, which checks whether the words in a statement are lexically valid; SimbaQLParser is the SimbaQL parser; SimbaQLBaseListener and SimbaQLBaseVisitor are the base classes for walking the syntax tree with the listener and visitor patterns, respectively; and SimbaQLVisitor inherits from SimbaQLBaseVisitor and walks the syntax tree in visitor mode.
Abstract syntax tree builder class: AstBuilder, which builds the abstract syntax tree. Given a SimbaQL statement as input, it outputs a syntax tree rooted at Statement.
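AstBuilder itself works on the ANTLR4 parse tree, whose grammar is not given here. As a rough, hypothetical illustration of the first thing such a builder has to recover, the sketch below splits a statement into its FIND, WITH, WHERE and RETURN clause bodies with a regular expression. ClauseSplitter is an invented name, not part of the described program, and a keyword appearing inside a quoted string would be mis-split.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, drastically simplified stand-in for what AstBuilder recovers:
// it only splits a statement into its FIND / WITH / WHERE / RETURN clause bodies.
final class ClauseSplitter {
    private static final Pattern KEYWORD = Pattern.compile("\\b(FIND|WITH|WHERE|RETURN)\\b");

    static Map<String, String> split(String query) {
        Matcher m = KEYWORD.matcher(query);
        List<String> names = new ArrayList<>();
        List<Integer> starts = new ArrayList<>(); // where each keyword begins
        List<Integer> bodies = new ArrayList<>(); // where each clause body begins
        while (m.find()) {
            names.add(m.group(1));
            starts.add(m.start());
            bodies.add(m.end());
        }
        Map<String, String> clauses = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            int end = i + 1 < names.size() ? starts.get(i + 1) : query.length();
            String body = query.substring(bodies.get(i), end).trim();
            // Several WITH clauses may appear; join their bodies rather than overwrite.
            clauses.merge(names.get(i), body, (a, b) -> a + "; " + b);
        }
        return clauses;
    }
}
```

A real builder would then turn each clause body into the SearchSpace, VariableDefines and Conditions nodes listed earlier; the regular expression here only stands in for the ANTLR4 parse.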
Syntax error checking classes: AstChecker and DBchecker. AstChecker detects query statements that do not conform to the grammar; DBchecker detects content in a query statement that conflicts with the data source, for example when the query contains p.knows but the data source has no knows link.
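A DBchecker-style check can be sketched as a walk along each reference against the data source schema. SchemaCheck and its three schema maps (variable to document class, class to outgoing links, class to attributes) are assumptions of this sketch, not the described implementation: every segment except the last must be a link, and the last may be a link or an attribute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical schema-conflict check in the spirit of DBchecker.
final class SchemaCheck {
    // varClass: FIND variable -> document class, e.g. p -> person.
    // links: document class -> (link name -> target document class) registered in the data source.
    // attributes: document class -> attribute names.
    static List<String> check(Map<String, String> varClass,
                              Map<String, Map<String, String>> links,
                              Map<String, Set<String>> attributes,
                              List<String> references) {
        List<String> errors = new ArrayList<>();
        for (String ref : references) {
            String[] parts = ref.split("\\.");
            String cls = varClass.get(parts[0]);
            if (cls == null) {
                errors.add(ref + ": undefined variable " + parts[0]);
                continue;
            }
            for (int i = 1; i < parts.length; i++) {
                String seg = parts[i];
                Map<String, String> out = links.getOrDefault(cls, Map.of());
                boolean last = i == parts.length - 1;
                if (out.containsKey(seg)) {
                    cls = out.get(seg); // follow the link to its target class
                } else if (last && attributes.getOrDefault(cls, Set.of()).contains(seg)) {
                    break; // a trailing attribute is valid
                } else {
                    errors.add(ref + ": " + cls + " has no link or attribute named " + seg);
                    break;
                }
            }
        }
        return errors;
    }
}
```

With a person class that has a knows link back to person, p.knows.name passes; if the data source registers no knows link, the same check reports p.knows as a conflict, as in the example above.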
Parsing examples (printing the parse tree): SimbaParser and Treeprinter. SimbaParser is an example program that builds the parse tree of a query statement and prints its structure; Treeprinter is the program that prints the parse tree.
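Printing a parse tree is typically a depth-first walk that indents each node by its depth. The sketch below shows that idea with hypothetical TreeNode and TreePrinterSketch classes; it is not the Treeprinter program described above.

```java
import java.util.List;

// Hypothetical minimal tree node: a label plus ordered children.
final class TreeNode {
    final String label;
    final List<TreeNode> children;

    TreeNode(String label, TreeNode... children) {
        this.label = label;
        this.children = List.of(children);
    }
}

final class TreePrinterSketch {
    // Returns one line per node, indented two spaces per level (pre-order traversal).
    static String print(TreeNode root) {
        StringBuilder sb = new StringBuilder();
        print(root, 0, sb);
        return sb.toString();
    }

    private static void print(TreeNode node, int depth, StringBuilder sb) {
        sb.append("  ".repeat(depth)).append(node.label).append('\n');
        for (TreeNode child : node.children) {
            print(child, depth + 1, sb);
        }
    }
}
```

For a Statement node with children SearchSpace (holding RawDocument person p) and Conditions, print returns four lines, each indented two spaces per level of depth.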
The above embodiments merely illustrate the technical solution of the present invention and do not limit it. A person of ordinary skill in the art may modify the technical solution or make equivalent substitutions without departing from the spirit and scope of the present invention; the scope of protection of the invention is defined by the claims.

Claims (10)

1. An interactive query method suitable for multiple big data management systems, comprising the following steps:
1) establishing an associated document model comprising a document set and an association set, the association set being the set of associations between documents;
2) converting different original data models into the associated document model, and connecting different data sources into a whole through the associated document model;
3) establishing, based on the associated document model, a unified query language suitable for multi-source data;
4) using the unified query language suitable for multi-source data to perform unified queries over relational databases, graph databases and file systems.
2. The method according to claim 1, wherein each document in the document set of the associated document model is a set of attributes, an attribute being a set of data of the same type; each document includes a primary key attribute by default, the primary key attribute being a globally unique identifier; and the document set and the association set each have a name identifier describing the semantics of the documents and associations in the set.
3. The method according to claim 1, wherein the unified query language suitable for multi-source data management systems comprises four clauses: FIND, WITH, WHERE and RETURN; the FIND clause determines the basic variables of the query, which must represent documents; the WITH clause determines the intermediate variables used in the matching condition syntax; the WHERE clause determines the conditions that the returned results must satisfy; and the RETURN clause contains the data references to be returned to the user.
4. The method according to claim 3, wherein the basic query space in the FIND clause consists of one or more classes of documents, and the associated document model does not allow comparison between two classes of documents that have no association; the WITH clause implicitly defines expansions of the documents and associations in the basic query space; the WHERE clause can not only implicitly define documents and associations that expand the search space but can also perform the selection operation of the associated document model; and the RETURN clause contains URLs at the document, link or attribute level, or variables representing URLs, mainly performs the projection operation of the associated document model, and returns an associated document.
5. The method according to claim 3, wherein execution of the unified query language is divided into four steps: determining documents, establishing relationships between documents, selection, and projection.
6. The method according to claim 3, wherein different data sources are connected into a whole through the associated document model to form a network, and the data reference syntax of the unified query language uses a URL-like form to access data in the network uniformly.
7. The method according to claim 3, wherein an intermediate variable in the unified query language represents a document collection, number or string related to the basic search space; intermediate variables are used in the matching condition syntax, and the corresponding condition-matching operation is performed according to the type of the intermediate variable.
8. The method according to claim 3, wherein the matching condition in the unified query language is a Boolean-valued expression introduced by the WHERE clause, and the grammar rules of the expression are as follows:
1) combined filtering of document collections A and B: (<Document A>.link | <Document A>) = <Document B>;
2) filtering of a document collection: (<Document>.attribute | <Association>.attribute) operator <value of a basic data type>;
3) <Expression> AND|OR <Expression>.
9. The method according to claim 3, wherein the parsing program of the unified query language comprises: classes for the overall syntactic structure, abstract syntax-tree classes and interfaces, and concrete syntax-tree classes.
10. The method according to claim 1, wherein, in practical applications, an SDK for the database systems is developed for the unified query language, with some compensating operations performed on top of the native database query language, and a client program operates the databases with the unified query language by calling the API in the SDK; or a communication protocol for the databases is designed directly on the basis of the unified query language, and a client program obtains the required query results by sending network requests.
CN201710515380.6A 2017-06-29 2017-06-29 Interactive query method suitable for various big data management systems Active CN107515887B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710515380.6A CN107515887B (en) 2017-06-29 2017-06-29 Interactive query method suitable for various big data management systems

Publications (2)

Publication Number Publication Date
CN107515887A true CN107515887A (en) 2017-12-26
CN107515887B CN107515887B (en) 2021-01-08

Family

ID=60721837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710515380.6A Active CN107515887B (en) 2017-06-29 2017-06-29 Interactive query method suitable for various big data management systems

Country Status (1)

Country Link
CN (1) CN107515887B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070208723A1 (en) * 2006-03-03 2007-09-06 International Business Machines Corporation System and method for building a unified query that spans heterogeneous environments
CN102073701A (en) * 2010-12-30 2011-05-25 浪潮集团山东通用软件有限公司 Semantic definition-based multi-data source data querying method
CN105468702A (en) * 2015-11-18 2016-04-06 中国科学院计算机网络信息中心 Large-scale RDF data association path discovery method
CN106294402A (en) * 2015-05-21 2017-01-04 阿里巴巴集团控股有限公司 The data search method of a kind of heterogeneous data source and device thereof
CN106372177A (en) * 2016-08-30 2017-02-01 东华大学 Query expansion method supporting correlated query and fuzzy grouping of mixed data type

Non-Patent Citations (3)

Title
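Xu Shichao (徐仕超) et al.: "Research and implementation of a federated query system for massive RDF data based on database clusters", E-Science Technology & Application *
Shen Zhihong (沈志宏) et al.: "Research on the application of OpenCSDB linked data in scientific databases", Journal of Library Science in China *
Wang Linbin (王林彬) et al.: "A survey of NoSQL-based RDF data storage and query technologies", Application Research of Computers *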

Cited By (19)

Publication number Priority date Publication date Assignee Title
CN110109951B (en) * 2017-12-29 2022-12-06 华为技术有限公司 Correlation query method, database application system and server
CN110109951A (en) * 2017-12-29 2019-08-09 华为软件技术有限公司 A kind of method of correlation inquiry, database application system and server
CN109033260A (en) * 2018-07-06 2018-12-18 天津大学 Knowledge mapping Interactive Visualization querying method based on RDF
CN109033260B (en) * 2018-07-06 2021-08-31 天津大学 Knowledge graph interactive visual query method based on RDF
CN110765151A (en) * 2018-07-27 2020-02-07 北京国双科技有限公司 Calculation formula processing method and device
CN109241054A (en) * 2018-08-02 2019-01-18 成都松米科技有限公司 A kind of multimodal data library system, implementation method and server
CN111221785A (en) * 2018-11-27 2020-06-02 中云开源数据技术(上海)有限公司 Semantic data lake construction method of multi-source heterogeneous data
CN112148925A (en) * 2019-06-27 2020-12-29 北京百度网讯科技有限公司 User identification correlation query method, device, equipment and readable storage medium
CN112148925B (en) * 2019-06-27 2024-03-01 北京百度网讯科技有限公司 User identification association query method, device, equipment and readable storage medium
CN111475534A (en) * 2020-05-12 2020-07-31 北京爱笔科技有限公司 Data query method and related equipment
CN111475534B (en) * 2020-05-12 2023-04-14 北京爱笔科技有限公司 Data query method and related equipment
CN112084248A (en) * 2020-09-11 2020-12-15 党丹 Intelligent data retrieval, lookup and model acquisition method based on graph database
CN112632037A (en) * 2020-12-24 2021-04-09 山东浪潮通软信息科技有限公司 Method and device for graphically defining query data set
CN112632037B (en) * 2020-12-24 2023-04-07 浪潮通用软件有限公司 Method and device for graphically defining query data set
CN113761290A (en) * 2021-03-10 2021-12-07 中科天玑数据科技股份有限公司 Query method and query system for realizing full-text search graph database based on SQL
CN113282625B (en) * 2021-05-31 2022-10-04 重庆富民银行股份有限公司 SQL-based API data query and processing system and method
CN113282625A (en) * 2021-05-31 2021-08-20 重庆富民银行股份有限公司 SQL-based API data query and processing system and method
CN113515610B (en) * 2021-06-21 2022-09-13 中盾创新数字科技(北京)有限公司 File management method based on object-oriented language processing
CN113515610A (en) * 2021-06-21 2021-10-19 中盾创新档案管理(北京)有限公司 File management method based on object-oriented language processing

Also Published As

Publication number Publication date
CN107515887B (en) 2021-01-08

Similar Documents

Publication Publication Date Title
CN107515887A (en) A kind of interactive query method suitable for a variety of big data management systems
CN110291517B (en) Query language interoperability in graph databases
US20220050840A1 (en) Natural language query translation based on query graphs
EP2652645B1 (en) Extensible rdf databases
US5873079A (en) Filtered index apparatus and method
US5884304A (en) Alternate key index query apparatus and method
US5870739A (en) Hybrid query apparatus and method
Jensen et al. Converting XML DTDs to UML diagrams for conceptual data integration
CN106934062A (en) A kind of realization method and system of inquiry elasticsearch
WO2015069941A1 (en) Generic indexing for efficiently supporting ad-hoc query over hierarchically marked-up data
CA2576744A1 (en) System for ontology-based semantic matching in a relational database system
Biskup et al. Extracting information from heterogeneous information sources using ontologically specified target views
Bellatreche et al. A design methodology of ontology based database applications
Natarajan et al. [Retracted] Schema‐Based Mapping Approach for Data Transformation to Enrich Semantic Web
Das et al. MyNLIDB: a natural language interface to database
Gapanyuk Metagraph approach to the information-analytical systems development
Zhang et al. Construction of fuzzy ontologies from fuzzy XML models
CN111475534B (en) Data query method and related equipment
Gunaratna et al. Alignment and dataset identification of linked data in semantic web
Palopoli et al. Experiences using DIKE, a system for supporting cooperative information system and data warehouse design
US20230273947A1 (en) System and method for implementing ontologies in sql
Porto et al. ROSA: A Data Model and Query Language for e-Learning Objects.
Tang et al. Ontology-based semantic retrieval for education management systems
Liu et al. Complex event processing for sequence data and domain knowledge
Ramanujam et al. Relationalization of provenance data in complex RDF reification nodes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant