CN107515887A - An interactive query method suitable for multiple big data management systems
- Publication number
- CN107515887A, CN201710515380.6A
- Authority
- CN
- China
- Prior art keywords
- document
- data
- query language
- model
- database
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to an interactive query method suitable for multiple big data management systems. Its steps include: 1) establishing a linked document model, which comprises a document set and a link set, where the link set is the set formed by the associations between documents; 2) converting the different original data models into the linked document model, which connects the different data sources into a single whole; 3) establishing, on the basis of the linked document model, a unified query language suited to heterogeneous data; 4) using that unified query language to query relational databases, graph databases and file systems in a uniform way. The invention is the first to propose a unified query language for heterogeneous data management systems, enabling unified queries over relational databases, graph databases and file systems.
Description
Technical field
The present invention relates to a query language, and in particular to an interactive query method suitable for big data management systems. It belongs to the technical fields of big data and databases.
Background
With the continued spread of computing, the need to manage and process data has become increasingly urgent. Different data models have been proposed for different data formats and characteristics, and corresponding data management systems have been built to manage and analyze the data. One of the more influential data models is the E-R model: since it was proposed in the 1970s, it has largely dominated the database world for more than 40 years. Over the past decade, as Internet and Internet of Things applications have deepened, the generation of large-scale structured, semi-structured and unstructured data has triggered the NoSQL movement [Cattell R. Scalable SQL and NoSQL data stores[J]. ACM SIGMOD Record, 2010, 39(4): 12-27]. The database world has shifted from an initial SQL monopoly to a landscape divided among traditional SQL, NoSQL and NewSQL.
Building a complete big data application system requires fully accounting for the challenges of the 4Vs [Gupta R, Gupta H, Mohania M. Cloud computing and big data analytics: what is new from databases perspective[C]//Proc of 1st BDA. New Delhi, India: Springer Berlin Heidelberg, 2012: 42-61.], as well as further analysis of big data, association mining and even scientific discovery. Take scientific data in biology as an example. Every day, instruments for sequencing, mass spectrometry, nuclear magnetic resonance and the like produce micro-level data such as large numbers of gene sequence files, protein sequence files, and protein structure and function data. There is also macro-level data traditionally stored in MongoDB or SQL databases, such as species information, physiological and biochemical characteristics, and reaction conditions, along with knowledge resources such as large volumes of literature and patents. To support knowledge discovery, researchers often also introduce bio-ontologies and use RDF networks to manage the large-scale associations among species, proteins, genes and other data. Together, this micro- and macro-level information forms an organic database from which life can be understood and studied as a whole. Data-driven scientific discovery usually requires scheduling a series of data pipelines, and these pipelines span multiple stages: data collection, batch writing, querying, analysis and visualization. A major problem arises here: how can pipeline programmers stop worrying about differences in the underlying data storage models and access and operate on data in a unified way? In data management terms, the question is how to cross the boundaries between SQL, NoSQL and NewSQL databases, provide universal data access across heterogeneous data models, and offer a unified data operation interface to computing frameworks such as Hadoop and Spark.
Relational databases today range from distributed databases to in-memory databases; the main ones include MySQL, PostgreSQL, Oracle and SQLite. They guarantee consistent data access through ACID transactions, process data in terms of tables, rows and keys, and suit applications with a fixed structure and strong consistency requirements. In October 1986, ANSI adopted SQL as the standard language for relational database management systems (ANSI X3.135-1986), and ISO later adopted it as an international standard. SQL thereby became the most popular relational database query language.
NoSQL databases include key-value databases, column-oriented databases, document databases and graph databases. Because NoSQL databases still lack a unified query language, some work has focused on wrapping NoSQL databases with a SQL query interface. For example, Hive provides HQL, a SQL-like query language that makes NoSQL databases easier to use. Spark SQL is a SQL implementation built on Spark's DataFrame big data processing framework and supports SQL-based big data processing and analysis. Based on DataFrames, Spark can provide SQL query and analysis capabilities over big data for many current databases such as MySQL, HBase, Cassandra and MongoDB.
As an important branch of NoSQL databases, graph databases are often used to manage large-scale association information, such as the associations between species and genes, people's social networks, and Amazon's warehouse and retail master data system, and they support fast association retrieval based on the property graph model. Typical graph databases currently include Neo4j, Titan and Virtuoso. For graph databases, Neo4j proposed the Cypher query language, which uses a SQL-like syntax to express association queries over the graph data model concisely and makes graph databases easier to use. The TinkerPop project proposed Gremlin, a graph traversal language for property graphs that supports multiple graph databases such as Titan, OrientDB and TinkerGraph, and has been called the Perl of the graph database world. In addition, the RDF model is a semantic description framework based on the graph model, well suited to expressing semantic information and its associations; typical RDF stores currently include Jena and Virtuoso. In 2004 the RDF Data Access Working Group released the first RDF query language, SPARQL, and in 2008 the SPARQL protocol and query language formally became a W3C Recommendation. SPARQL uses a structured query style and implements association queries through subgraph matching in the WHERE clause. At present, most RDF stores support standard SPARQL queries.
It is easy to see that SQL and NoSQL databases still lack a unified query language. Graph databases in particular, because of their distinctive query and analysis style, are generally placed at the opposite end from the SQL query language. When selecting a data model, people therefore often have to choose between a SQL database (including NoSQL databases that support SQL) and a graph database. This choice usually leads to differences in upper-layer applications: should they use the Gremlin or SPARQL query languages, which are strong at association analysis, or the traditional SQL query language based on two-dimensional tables?
Based on the above background, the present invention proposes a new query language, Simba, to realize unified queries over relational databases, graph databases and file systems.
Summary of the invention
The object of the present invention is to provide an interactive query method suitable for multiple big data management systems, so that queries against relational databases, graph databases and file systems can all be expressed in the unified query language Simba.
The technical solution adopted by the present invention is as follows:
An interactive query method suitable for multiple big data management systems, whose steps include:
1) establishing a linked document model, which comprises a document set and a link set, where the link set is the set formed by the associations between documents;
2) converting the different original data models into the linked document model, which connects the different data sources into a single whole;
3) establishing, based on the linked document model, a unified query language suited to heterogeneous data;
4) using that unified query language to realize unified queries over relational databases, graph databases and file systems.
Further, the unified query language includes four clauses: FIND, WITH, WHERE and RETURN. The FIND clause determines the basic variables of the query, and these variables must represent documents. The WITH clause defines the intermediate variables used in the matching condition grammar. The WHERE clause specifies the conditions that returned results must satisfy. The RETURN clause contains the data references to be returned to the user.
Further, the basic query space in the FIND clause consists of one or more classes of documents, and the linked document model does not allow comparisons between two classes of documents that are not associated. The WITH clause implicitly defines an expansion of the documents and associations in the basic query space. The WHERE clause can implicitly define documents and associations that expand the search space, and can also perform the selection operation of the linked document model. The RETURN clause contains URLs at the document, link or attribute level, or variables that represent such URLs; it mainly performs the projection operation of the linked document model, and the result returned is a linked document.
Further, the execution of the unified query language is divided into four steps: determining the documents, establishing the relations between documents, selection, and projection.
Further, the linked document model connects the different data sources into a single network, and the data reference grammar of the unified query language uses a URL-like form so that data in the network can be accessed uniformly.
Further, an intermediate variable in the unified query language represents a document set associated with the basic search space, a number or a character string. Intermediate variables are used in the matching grammar, and the corresponding condition matching is performed according to the type of the intermediate variable.
The language proposed by the present invention is an intermediate language that does not depend on any specific operating system or programming language. Because the Simba language includes operations from several data models, some of which a given database cannot perform directly, in practice an SDK (Software Development Kit) can be developed for the Simba language for each database system, carrying out compensation operations on top of the database's native query language. For example, a Java package (or C++ package) that understands and executes the Simba language can be developed for MongoDB; a client program can then operate MongoDB in the Simba language by calling the API (Application Programming Interface) in the SDK. That is, the data management system is queried through a SimbaQL translator. This mode is shown in Fig. 1.
In the other mode, the database designs its communication protocol directly around the Simba language, and a client program obtains the query results it needs by sending network requests that carry Simba commands. This mode is shown in Fig. 2.
As shown in Fig. 3, the Simba query language of the present invention comprises the following components:
1. SimbaQL syntactic structure: specifies the overall structure of SimbaQL and the grammar of each clause, such as FIND, WITH, WHERE and RETURN;
2. Data reference grammar: specifies how to reference data in a data source from SimbaQL;
3. Intermediate variable grammar: specifies how to define and use intermediate variables in SimbaQL;
4. Matching condition grammar: specifies how to write matching conditions in SimbaQL;
5. SimbaQL parser: provides a Java-based SimbaQL parser for writing SimbaQL client programs or query engine execution programs.
Compared with the prior art, the advantages of the present invention are as follows:
(1) It proposes, for the first time, a unified query language for heterogeneous data management systems. The language supports unified queries over relational databases, graph databases and file systems: it can retrieve the records in a relational table that satisfy specified attribute conditions, the vertices in a graph database that satisfy specified association conditions, and specific files in a file system. With current development techniques, an application must use SQL, Cypher/Gremlin and an API, respectively, to retrieve data from relational databases, graph databases and file systems. This divergence means developers must master several languages and their programs lack generality. With SimbaQL, only one unified syntax is needed. The difference is shown in Fig. 4 and Fig. 5.
(2) It summarizes the typical query patterns of big data management systems and simplifies complex SQL and graph query functions. The goal of SimbaQL is to cover the majority of query needs while making it easy for mainstream data management systems to support the language. It therefore omits complex features of SQL and graph queries, such as subqueries and UNION operations on query results. SimbaQL recommends leaving such secondary operations to a big data computing framework; SimbaQL itself only performs simple data query and extraction.
(3) SimbaQL targets databases with several data models, so it includes operations from several data models. If the query language of one model cannot perform an operation from another model, the SimbaQL implementation helps that model complete it. For example, MongoDB's query language cannot perform JOIN operations on documents, whereas SimbaQL supports JOIN, so the SimbaQL implementation compensates for these operations.
(4) SimbaQL introduces intermediate variables to express, and hide, information the user is not interested in. Take the search for two related entities as an example:
FIND x, y WITH $m=x.child WHERE $m.child=y RETURN x, y
This statement introduces the intermediate variable m, which represents a child of x, and y is in turn a child of m. The query returns all grandparent and grandchild pairs, but the application issuing the query does not need to care who m actually is. This also avoids repeated spelling and gives a compact notation.
(5) SimbaQL introduces multi-level attribute references. For example, x.knows.knows.name denotes the name of someone z known by someone y whom x knows. Traditional query languages do not support multi-level references. This notation effectively reduces code repetition and reads intuitively.
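As a rough illustration of how the intermediate variable in advantage (4) hides the middle entity, the following Python sketch evaluates the grandparent query over a small in-memory child relation. The dictionary layout and the function name grandparent_pairs are assumptions made for this example; they are not part of SimbaQL or of any SimbaQL implementation.

```python
# Hypothetical child links: each key is a person, each value lists that person's children.
child = {"ann": ["bob"], "bob": ["cat", "dan"], "cat": []}


def grandparent_pairs(child_links):
    """Evaluate: FIND x, y WITH $m=x.child WHERE $m.child=y RETURN x, y"""
    result = []
    for x, middle in child_links.items():
        for m in middle:                      # $m = x.child
            for y in child_links.get(m, []):  # $m.child = y
                result.append((x, y))         # RETURN x, y; m is never exposed
    return result


pairs = grandparent_pairs(child)
```

The caller only ever sees (x, y) pairs; the middle generation is used during matching and then dropped, which is the behaviour the text attributes to intermediate variables.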
Brief description of the drawings
Fig. 1 shows the mode in which a data management system is queried through a SimbaQL translator.
Fig. 2 shows the mode in which a data management system is queried directly through the SimbaQL network protocol.
Fig. 3 shows the structure of the present invention.
Fig. 4 shows that the prior art requires different languages to query different management systems.
Fig. 5 shows that the present invention queries different management systems uniformly with the SimbaQL language.
Fig. 6 is a schematic diagram of the structure of the LDM model.
Detailed description
The present invention is further described below through specific embodiments and the accompanying drawings.
The SimbaQL designed by the present invention is based on the Linked Document intermediate model (Linked Document Model, LDM for short; also referred to here as the linked document model or associated document model). By mapping LDM operations onto the operations of other models and using SDK compensation operations, it achieves unified queries over databases with various data models.
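One way to picture an SDK compensation operation is a client-side join over documents that were fetched separately from a store whose native query language has no JOIN. The Python sketch below is only an assumption about how such a step could look: the function name compensate_join and the field names id and inventid are hypothetical, and no real database API is called.

```python
def compensate_join(left_docs, right_docs, left_key, right_key):
    """Hash join performed on the client over two already fetched document lists."""
    index = {}
    for right in right_docs:
        # Group the right-hand documents by their join key.
        index.setdefault(right[right_key], []).append(right)
    # Pair every left document with the right documents sharing its key value.
    return [(left, right)
            for left in left_docs
            for right in index.get(left.get(left_key), [])]


persons = [{"id": 1, "name": "bluejoe", "inventid": 10},
           {"id": 2, "name": "alice"}]
software = [{"id": 10, "name": "simba"}]

joined = compensate_join(persons, software, "inventid", "id")
```

Documents without the join attribute, such as the second person here, simply produce no pair.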
1. Linked Document model
1) Definition of the Linked Document model
A document is a set of attributes, and an attribute is a set of data of the same type. By default, every document contains a uniquely identifying primary key attribute. The primary key plays a role similar to an IP address and must be a globally unique identifier. The other attributes may be of any type, including documents, associations and custom types. An association is a special document that must contain the two attributes (from: primary key, to: primary key) and represents the association between documents, that is, the relation between two pieces of data. For example, a knows association between one person document and another person document means that the first person knows the second. Every document set and link set must have a name identifier that states the semantics of the documents or associations in the set. Documents or associations of the same class may have different numbers of attributes, which means that {'id': 'fffff0', 'name': 'bluejoe', 'age': 30} can be a member of the person class of documents and also a member of the teacher class of documents.
The LDM is a two-tuple (document sets, link sets) consisting of document sets and link sets, where a link set is the set of relations of some kind between two classes of documents. The general structure of the LDM is shown in Fig. 6, in which Documents denotes the document sets, Links denotes the link sets, PersonDocument denotes the set of documents of the person class, SoftwareDocument denotes the set of documents of the software class, InventLink denotes the set of associations of the kind "a person invents software", 1 and 2 denote document primary keys (unique identifiers), and attr1, attr2, ... denote document attributes.
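The definition above can be sketched as a few small data structures. This is a minimal Python illustration under stated assumptions: the class names Document, Link and LDM and the field names pk, attrs, src and dst are chosen for this example and do not come from the patent's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """A document: a globally unique primary key plus arbitrary attributes."""
    pk: str
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Link:
    """An association: a special document holding 'from' and 'to' primary keys."""
    src: str  # primary key of the source document ("from")
    dst: str  # primary key of the target document ("to")


@dataclass
class LDM:
    """Two-tuple (document sets, link sets); every set is stored under its name."""
    documents: Dict[str, List[Document]] = field(default_factory=dict)
    links: Dict[str, List[Link]] = field(default_factory=dict)


joe = Document("fffff0", {"name": "bluejoe", "age": 30})
simba = Document("s1", {"name": "simba"})

ldm = LDM()
# The same document may be a member of several named document sets.
ldm.documents["person"] = [joe]
ldm.documents["teacher"] = [joe]
ldm.documents["software"] = [simba]
ldm.links["invent"] = [Link(src=joe.pk, dst=simba.pk)]
```

Keying both dictionaries by set name reflects the requirement that every document set and link set carries a name that gives its members their semantics.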
2) LDM conversion rules
The LDM is oriented to data query and analysis and provides two kinds of conversion rules: conversion from an original data model to the LDM, and conversion from the LDM to the form required by existing programming models.
A) Original data model → LDM
A data model conversion is formally defined as (G, L, M), where G is the schema of the global model, that is, the LDM; L is the local data model (relational model, key-value model, document model or property graph model); and M is the set of mapping rules from L to G. Conversion from an original data model to the LDM is mainly concerned with the semantics of the data; conversion at the level of data types can be decided by the developer according to system requirements. The original data models covered here are the relational model, key-value model, document model and property graph model, and the main conversion rules are shown in Table 1. A custom conversion rule extracts the set of data that satisfies certain features, according to the characteristics of the original data model. For example: extract the data in a key-value model whose keys contain person as the Person class document set; extract the vertices in a property graph model whose label is Person as Person class documents; or, in a document model, extract the relation "personid of a Person document equals personid of a Software document" as the link set invent.
Table 1. Conversion rules from original data models to the LDM
LDM | Relational model | Key-value model | Document model | Property graph model |
Attribute | Attribute | Key | Attribute | Attribute |
Document | Record | Key-value pair | Document | Vertex |
Document set | Table | Custom | Collection | Custom |
Link | Foreign key | Custom | Custom | Edge |
Link set | Foreign key | Custom | Custom | Custom |
Note that every document set and link set in the LDM must have a name. When converting the foreign keys of a relational model and the other custom parts, the person performing the conversion must therefore supply a name that gives the set's elements their semantics. For example, in a property graph model, the nodes whose label is 'person' can be taken as the person class documents in the LDM, and the nodes that contain the attribute 'teacher' can be taken as the teacher class documents in the LDM; these two classes of documents may in fact correspond to the same node.
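The relational column of Table 1 can be illustrated with a short Python sketch: rows become documents, tables become document sets, and each foreign key becomes a link set whose name is supplied by the person doing the conversion. The function name rows_to_ldm, the tuple layout of foreign_keys and the column names are assumptions made for this example.

```python
def rows_to_ldm(tables, foreign_keys):
    """Convert relational data to (document sets, link sets).

    tables: {table_name: [row dict containing an 'id' column]}
    foreign_keys: [(link_name, table_name, fk_column)]; link_name is chosen
    by the converter because the foreign key itself carries no name.
    """
    # Table -> document set, record -> document keyed by its primary key.
    documents = {name: {row["id"]: dict(row) for row in rows}
                 for name, rows in tables.items()}
    links = {}
    for link_name, table_name, fk_column in foreign_keys:
        # Foreign key -> link set of {'from': pk, 'to': referenced pk}.
        links[link_name] = [
            {"from": row["id"], "to": row[fk_column]}
            for row in tables[table_name]
            if row.get(fk_column) is not None
        ]
    return documents, links


tables = {
    "person": [{"id": 1, "name": "bluejoe", "inventid": 10},
               {"id": 2, "name": "alice", "inventid": None}],
    "software": [{"id": 10, "name": "simba"}],
}
documents, links = rows_to_ldm(tables, [("invent", "person", "inventid")])
```

Rows whose foreign key is empty contribute a document but no link, so the second person appears in the person document set without an invent association.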
In addition, conversion from original models to the LDM is not limited to the models above. Developers can define conversion rules from other data models, such as file systems or column-oriented databases, to the LDM as needed.
B) LDM → programming model
Conversion from the LDM to a programming model is mainly concerned with the relations in the data structure. The data structures accepted by currently popular programming models, such as map/reduce, Spark SQL and Pregel, are mainly arrays, tables and graphs. The conversion rules from the LDM to these three data structures are therefore given in Table 2.
Table 2. Conversion rules from the LDM to arrays, tables and graphs
3) LDM operation rules
The LDM operation rules are defined on the basis of the operations of the relational, key-value, document and property graph models. These include the set, join, selection and projection operations of the relational model; the get operation of the key-value model; the selection and projection operations of the document model; and the traversal and selection operations of the property graph model. The LDM operation rules fall into three main classes: set operations, association operations and document operations. The specific rules are shown in Table 3.
Table 3. LDM operation rules
4) LDM data access rules
Because the LDM connects databases into a network, data in the network can be referenced using a URL-like form. The URL format is as follows:
<datasource>.<document>.<link>.<identity>.<propertyName>
Here datasource denotes the data source, such as MySQL or MongoDB; document denotes a document class that the data source maps to in the LDM; link denotes an association that the data source maps to in the LDM; identity denotes the primary key of a document; and propertyName denotes the name of a document attribute.
Data can be referenced at different levels. For example, a reference to the name attribute of the person documents in a MySQL database can be written as:
MySQL.person.name
A reference to the father association of the person documents in a MongoDB database can be written as:
MongoDB.person.father
An association stands for the set of documents it leads to, so the reference can be continued to a deeper level, for example:
MongoDB.person.father.name
The data that a reference URL denotes is in effect the result of the LDM establish-association and projection operations. For example, MongoDB.person.father denotes the result of establishing the father association between two Person class documents and then projecting onto the father relation.
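To make the reference rules concrete, here is a small Python sketch that resolves such a URL-like reference against an in-memory stand-in for a data source. It is a simplification under stated assumptions: the function name resolve and the nested dictionary layout are hypothetical, the optional identity segment is not handled, and any step that is not a known link name is treated as an attribute name.

```python
def resolve(sources, url):
    """Resolve '<datasource>.<document>.<link*>.<propertyName>' to a list."""
    parts = url.split(".")
    source = sources[parts[0]]
    docs, links = source["documents"], source["links"]
    # Primary keys are globally unique, so one lookup table is enough.
    by_pk = {d["id"]: d for group in docs.values() for d in group}
    current = list(docs[parts[1]])
    for step in parts[2:]:
        if step in links:
            # Follow the link: keep the targets whose source is in the current set.
            pks = {d["id"] for d in current}
            current = [by_pk[dst] for src, dst in links[step] if src in pks]
        else:
            # Otherwise project the attribute and stop.
            return [d[step] for d in current if step in d]
    return current


sources = {
    "MongoDB": {
        "documents": {"person": [{"id": "p1", "name": "tom"},
                                 {"id": "p2", "name": "bob"}]},
        "links": {"father": [("p1", "p2")]},  # p1's father is p2
    }
}
fathers = resolve(sources, "MongoDB.person.father")
father_names = resolve(sources, "MongoDB.person.father.name")
```

Each link step corresponds to the establish-association operation, and the final attribute step corresponds to projection, mirroring the interpretation given in the text.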
2. SimbaQL syntactic structure
Just as SQL is based on the relational model, SimbaQL is based on the Linked Document model, and every SimbaQL statement can be converted into a Linked Document operation formula. The formula is composed of the following operations from Table 3: "establish association" and "selection" among the association operations, and "selection" and "projection" among the document operations. A SimbaQL query statement mainly consists of the four clauses FIND, WITH, WHERE and RETURN, with the following structure:
FIND <documents>
WITH <variables>
WHERE <conditions>
RETURN <urls>
The FIND clause determines the basic variables of the query, each of which must correspond to a class of documents in the Linked Document model. The WITH clause defines the intermediate variables used in the matching condition grammar; these can be of several data types and are not limited to documents. The WHERE clause specifies the conditions that returned results must satisfy. The RETURN clause contains the data references to be returned to the user. The LDM computation that corresponds to a SimbaQL statement is described below.
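The four-clause structure can be illustrated with a short Python sketch that splits a statement into its clauses using a regular expression. This is not the ANTLR4-based parser of the invention; it assumes that the keywords do not occur inside string literals, and the names CLAUSE_RE and split_clauses are hypothetical.

```python
import re

# FIND is required; WITH, WHERE and RETURN are optional and appear in this order.
CLAUSE_RE = re.compile(
    r"^\s*FIND\s+(?P<find_clause>.+?)"
    r"(?:\s+WITH\s+(?P<with_clause>.+?))?"
    r"(?:\s+WHERE\s+(?P<where_clause>.+?))?"
    r"(?:\s+RETURN\s+(?P<return_clause>.+?))?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def split_clauses(statement):
    """Return a dict holding the text of each clause present in the statement."""
    match = CLAUSE_RE.match(statement)
    if match is None:
        raise ValueError("not a SimbaQL statement: " + statement)
    return {name: text for name, text in match.groupdict().items()
            if text is not None}


parsed = split_clauses("FIND x, y WITH $m=x.child WHERE $m.child=y RETURN x, y")
```

A real parser would go on to build a syntax tree from each clause; the sketch stops at the clause boundaries.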
First, the basic query space in FIND consists of one or more classes of documents, and SimbaQL requires that the LDM not compare two classes of documents that are not associated. For example, a query over person objects and software objects can be written as:
FIND MySQL.person p, MySQL.software s
If there is no association between the person documents and the software documents, selection can only be applied to person and software separately; a selection such as p.inventid=s.id is not allowed.
Defining a variable in WITH implicitly defines an expansion of the documents and associations in the basic query space. For example:
FIND person p WITH $soft=p.invent
This statement indicates that the Linked Document being searched also contains the Software documents and the invent association. It is equivalent to
FIND person p, software s WITH $soft=p.invent
The WHERE clause can implicitly define documents and associations that expand the search space, and can also perform the LDM selection operation. For example:
FIND person p WHERE p.invent.name='simba'
implicitly determines that the LDM contains the document sets (person, software) and the link set (invent), and requires the Software documents to satisfy the condition that the name attribute is 'simba'.
The RETURN clause can contain URLs at the document, link or attribute level, or variables that represent such URLs. It mainly performs the LDM projection operation, and the result returned is a Linked Document.
In summary, the execution of SimbaQL is divided into four steps: determining the documents, establishing the relations between documents, selection, and projection. Suppose the basic search space in FIND is A, B; the WHERE clause implicitly determines a document set C and an association L1 between A and C; the selection condition is condition; the projection space in the RETURN clause is space; and the documents obtained by the other operations are doc. The LDM operation corresponding to the SimbaQL statement is then:
\[ \mathrm{Result} = \sigma_{space}\,\pi_{condition}\big((A \times_{d} B)\; A \times_{L1} C\big) \]
For example, for the SimbaQL statement:
FIND person p, software s WHERE p.name='bluejoe' and p.invent.name='simbaql' RETURN p.name
the corresponding LDM operation is:
\[ \mathrm{Result} = \sigma_{p.name}\,\pi_{p.name='bluejoe'\ \mathrm{and}\ software='simbaql'}\big(Person \times_{invent} Software\big) \]
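The four execution steps can be traced on the example statement with a few lines of Python. The data, the tuple layout of the invent link and the function name run_example are assumptions made for this sketch, which walks through the steps by hand and is not a general query engine.

```python
# Hypothetical documents keyed by primary key, and an invent link set.
persons = {1: {"name": "bluejoe"}, 2: {"name": "alice"}}
software = {10: {"name": "simbaql"}, 11: {"name": "other"}}
invent = [(1, 10), (2, 11)]  # (from: person pk, to: software pk)


def run_example():
    """FIND person p, software s
    WHERE p.name='bluejoe' and p.invent.name='simbaql' RETURN p.name"""
    # Steps 1 and 2: determine the documents and relate them through invent.
    related = [(persons[p], software[s]) for p, s in invent]
    # Step 3: selection on attribute values.
    selected = [(p, s) for p, s in related
                if p["name"] == "bluejoe" and s["name"] == "simbaql"]
    # Step 4: projection onto p.name.
    return [p["name"] for p, _ in selected]


result = run_example()
```

Only the person whose invented software is named 'simbaql' survives the selection, and only that person's name is projected.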
3rd, data referencing grammer (alternatively referred to as attribute list reaches grammer, as shown in Figure 3)
Due to LDM by database connection for a network, we can be come in citation network using similar URL form
Data.This URL form is as follows:
<datasource>.<document>.<link*>.<identity>.<propertyName>
Wherein, datasource represents the data source registered in associated document, and document is represented in associated document
Document class, link represent the articulation set in associated document, and multiple link can be included in URL, and identity represents document
Id, propertyName represent the attribute of document.
Data can be quoted in different levels, such as the name attributes to person documents in MySQL database
Reference can be expressed as:
MySQL.person.name
Father associations to person documents in MongoDB databases, which carry out application, to be expressed as:
MongoDB.person.father
What association represented is collection of document corresponding to the association, and we can also continue to deeply be quoted, such as
MongoDB.person.father.name
Data corresponding to data referencing URL be actually LDM opening relationships computing and project after result.Than
Data such as MongoDB.person.father representatives are that two Person class documents are established into father to associate, and to
Father relations carry out the result of project.
4th, intermediate variable grammer
Intermediate variable can represent the collection of document, numerical value, character string with basic search space correlation.The expression of variable by
$ symbols are formed with identifier, and its definition uses WITH statement:
Such as:
WITH $ c1=p.knows.knows (collection of document)
WITH $ c2=123 (numerical value)
WITH $ c3=' bluejoe ' (character string)
Intermediate variable is used in grammer is matched, and corresponding condition coupling operation is carried out according to the type of intermediate variable.When
When intermediate variable is collection of document, its main function is a part of content that replacement data quotes URL.
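The three kinds of intermediate variable can be told apart from the text of their definitions. The Python sketch below shows one possible classification; the function name classify_with, the kind labels and the accepted number format are assumptions for this example rather than rules stated in the grammar.

```python
import re


def classify_with(definition):
    """Return (variable name, kind, value) for one '$name=value' definition."""
    match = re.fullmatch(r"\s*\$\s*(\w+)\s*=\s*(.+?)\s*", definition)
    if match is None:
        raise ValueError("bad variable definition: " + definition)
    name, raw = match.groups()
    if re.fullmatch(r"-?\d+(\.\d+)?", raw):
        value = float(raw) if "." in raw else int(raw)
        return name, "number", value
    if len(raw) >= 2 and raw[0] == raw[-1] == "'":
        return name, "string", raw[1:-1]
    # Anything else is taken to be a data reference such as p.knows.knows.
    return name, "documents", raw


c1 = classify_with("$c1=p.knows.knows")
c2 = classify_with("$c2=123")
c3 = classify_with("$c3='bluejoe'")
```

Knowing the kind up front lets the matching step decide whether a condition is a comparison of values or a test against a document set.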
5th, matching condition grammer
Matching condition is the expression formula that a return value guided by WHERE sentences is bool types.The grammer of expression formula
Rule is as follows:
1) collection of document A, B polymerization screening:(<Document A>.link|<Document A>)=<Document B>
2) collection of document screens:(<Document>.attribute|<Association>.attribute) operator master datas class
Type
3)<Expression formula>AND|OR<Expression formula>
The operator operators wherein supported at present include:><=>=<=.For<Association>.attribute or<Text
Shelves A>.link the situation of a collection of document is represented, the meaning of "=" operator is " presence ", such as:P.knows.name='
Bluejoe ' represents the people that one entitled " bluejoe " in the people known be present, and p.knows=p1 represents the people that p knows
The middle meaning that p1 be present.
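The "exists" reading of "=" on a document set can be shown in a few lines of Python. The helper name exists_equal and the sample data are assumptions made for this illustration.

```python
def exists_equal(values, target):
    """'=' applied to a set: true if at least one member equals the target."""
    return any(value == target for value in values)


# Hypothetical data: the people p knows, by primary key, and their names.
knows = {"p": ["p1", "p2"]}
names = {"p1": "bluejoe", "p2": "alice"}

# p.knows.name='bluejoe': someone p knows is named bluejoe.
has_bluejoe = exists_equal([names[k] for k in knows["p"]], "bluejoe")
# p.knows=p1: p1 is among the people p knows.
knows_p1 = exists_equal(knows["p"], "p1")
# A name nobody in the set carries yields False.
has_carol = exists_equal([names[k] for k in knows["p"]], "carol")
```

The comparison is therefore a membership test rather than an equality test between a set and a single value.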
Note that although the WHERE clause corresponds to the LDM selection operation, the selection operation in SimbaQL selects only on the attribute values of documents. For example:
FIND person p, software s WHERE p.invent=s AND s.name='SimbaQL'
RETURN p.name
Although the condition contains p.invent=s, that part only establishes the relation between p and s; the real selection condition is s.name='SimbaQL'.
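The division of labour in this query (relation establishing versus selection) can be sketched as a four-stage evaluation: determine the documents, establish relations between them, select, and project. The data and the function `run_query` below are hypothetical and hard-code this one query; they are not the SimbaQL execution engine.

```python
# Hypothetical documents for the two classes named in the FIND clause.
persons = [
    {"id": "p1", "name": "bluejoe", "invent": ["s1"]},
    {"id": "p2", "name": "alice", "invent": ["s2"]},
]
software = [
    {"id": "s1", "name": "SimbaQL"},
    {"id": "s2", "name": "OtherTool"},
]

def run_query(persons, software):
    # 1) Determine documents: FIND person p, software s (the two inputs).
    # 2) Establish the relation between documents: p.invent=s
    pairs = [(p, s) for p in persons for s in software if s["id"] in p["invent"]]
    # 3) Selection on an attribute value: s.name='SimbaQL'
    selected = [(p, s) for p, s in pairs if s["name"] == "SimbaQL"]
    # 4) Projection: RETURN p.name
    return [p["name"] for p, _ in selected]

result = run_query(persons, software)
```

Only stage 3 filters on an attribute value, which matches the statement that s.name='SimbaQL' is the real selection condition.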
6. SimbaQL analysis program
The SimbaQL analysis program mainly comprises the following kinds of classes:
Overall syntactic-structure classes: Statement, SearchSpace, VariableDefines, Conditions, SubSpace; their meanings are given in Table 1 below.
Syntax-tree abstract classes and interfaces: Node (node), Condition (condition), Variable (variable), Document (document), AttributeDocument (a document reached through an attribute, such as p.knows or $p.knows), ValueExprecession (the value type of an expression).
Syntax-tree concrete classes: RawDocument, RawAttribute, RawVarible, WithVarible, StringValue, IntegerValue, TerminalCondition, And, Or, Not, VaribleAttribute, DocumentRefference, Operator. RawDocument represents a document such as Person p in FIND Person p; RawAttribute represents an attribute such as p.name; VaribleAttribute represents an attribute reached through a variable, such as $k.name; WithVarible represents a variable such as $k in WITH $k=p.knows; RawVarible represents a document variable defined by FIND, such as p; StringValue and IntegerValue represent character strings and integers respectively; And, Or, and Not represent the logical connectives and, or, and not in expressions; DocumentRefference represents the knows link in references such as p.knows and p.knows.name; Operator represents a comparison operator; TerminalCondition represents an expression that cannot be split further, such as p.age>30.
The basic information of the above classes is shown in Table 4:
Table 4. Basic information of the abstract syntax tree classes
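To make the role of the syntax-tree classes concrete, the following sketch models a few of them as Python dataclasses and renders a condition tree back to text. Only the class names come from the description; the fields, the `render` function, and the use of Python instead of Java are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class RawAttribute:          # an attribute such as p.name
    variable: str
    attribute: str

@dataclass(frozen=True)
class StringValue:
    value: str

@dataclass(frozen=True)
class IntegerValue:
    value: int

@dataclass(frozen=True)
class TerminalCondition:     # an indivisible comparison such as p.age>30
    left: RawAttribute
    operator: str
    right: Union[StringValue, IntegerValue]

@dataclass(frozen=True)
class And:
    left: "Condition"
    right: "Condition"

@dataclass(frozen=True)
class Or:
    left: "Condition"
    right: "Condition"

@dataclass(frozen=True)
class Not:
    operand: "Condition"

Condition = Union[TerminalCondition, And, Or, Not]

def render(node):
    """Turn a condition tree back into query text."""
    if isinstance(node, TerminalCondition):
        if isinstance(node.right, StringValue):
            right = "'" + node.right.value + "'"
        else:
            right = str(node.right.value)
        return f"{node.left.variable}.{node.left.attribute}{node.operator}{right}"
    if isinstance(node, And):
        return f"({render(node.left)} AND {render(node.right)})"
    if isinstance(node, Or):
        return f"({render(node.left)} OR {render(node.right)})"
    if isinstance(node, Not):
        return f"NOT {render(node.operand)}"
    raise TypeError(f"unknown node: {node!r}")

tree = And(
    TerminalCondition(RawAttribute("p", "age"), ">", IntegerValue(30)),
    TerminalCondition(RawAttribute("p", "name"), "=", StringValue("bluejoe")),
)
```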
In addition to the Java classes above that relate to the abstract syntax tree, the analysis program also includes the ANTLR4 lexical and syntax parsing classes: SimbaQLLexer, SimbaQLParser, SimbaQLBaseListener, SimbaQLBaseVisitor, SimbaQLVisitor. SimbaQLLexer is the lexer for SimbaQL statements generated by ANTLR4 and is used to determine whether the words in a statement conform to the grammar; SimbaQLParser is the SimbaQL syntax parser; SimbaQLBaseListener and SimbaQLBaseVisitor are the base classes for traversing the syntax tree in listener mode and visitor mode respectively; SimbaQLVisitor inherits from SimbaQLBaseVisitor and is used to traverse the syntax tree in visitor mode.
Abstract syntax tree builder class: AstBuilder, used to build the abstract syntax tree. Given a SimbaQL statement as input, it outputs a syntax tree rooted at a Statement node.
Syntax error checking classes: AstChecker, DBchecker. AstChecker detects query statements that do not conform to the grammar; DBchecker detects content in a query statement that conflicts with the data source, for example a query statement that contains p.knows when the data source has no knows link.
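The kind of check attributed to DBchecker can be sketched as a walk over a data reference path against a description of the data source. The `schema` dictionary, its layout, and the function `check_path` are hypothetical; the description does not say how DBchecker obtains or represents data source metadata.

```python
# Hypothetical description of a data source: attributes and links per document class.
schema = {
    "person": {"attributes": {"name", "age"},
               "links": {"knows": "person", "invent": "software"}},
    "software": {"attributes": {"name"}, "links": {}},
}

def check_path(schema, doc_type, path):
    """Return a list of error messages for a reference such as p.knows.name."""
    current = doc_type
    for index, step in enumerate(path):
        entry = schema[current]
        if step in entry["links"]:
            current = entry["links"][step]          # follow the link to its target class
        elif step in entry["attributes"] and index == len(path) - 1:
            return []                               # an attribute may only end the path
        else:
            return [f"{current} has no link or final attribute named '{step}'"]
    return []
```

For a `person` variable p, `check_path(schema, "person", ["knows", "name"])` reports no errors, while a path through a link that the source lacks yields one message naming the missing step.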
Syntax parsing examples (which output the syntax parse tree): SimbaParser, Treeprinter. SimbaParser is an example program that builds the parse tree of a query statement and prints its structure; Treeprinter is the program that prints the parse tree.
The above embodiments merely illustrate the technical solution of the present invention and do not limit it. A person of ordinary skill in the art may modify the technical solution or substitute equivalents without departing from the spirit and scope of the present invention. The scope of protection of the invention shall be as defined in the claims.
Claims (10)
1. An interactive query method suitable for a variety of big data management systems, comprising the following steps:
1) establishing an associated document model that comprises a document set and an association set, the association set being the set formed by the associations between documents;
2) converting different original data models into the associated document model, and connecting different data sources into a whole through the associated document model;
3) establishing, based on the associated document model, a unified query language suitable for multivariate data;
4) using the unified query language suitable for multivariate data to perform unified queries over relational databases, graph databases, and file systems.
2. The method according to claim 1, characterized in that each document in the document set of the associated document model is a set of attributes, and each attribute is a set of data of the same type; each document includes by default a primary key attribute, which is a globally unique identifier; the document set and the association set each have a name identifier that describes the semantics of the documents and associations in the set.
3. The method according to claim 1, characterized in that the unified query language suitable for multivariate data management systems comprises four clauses: FIND, WITH, WHERE, and RETURN; the FIND clause determines the basic variables of the query, and these variables must represent documents; the WITH clause defines the intermediate variables used in the matching condition syntax; the WHERE clause specifies the conditions that returned results must satisfy; the RETURN clause contains the data references to be returned to the user.
4. The method according to claim 3, characterized in that: the basic query space in the FIND clause consists of one or more classes of documents, and the associated document model does not permit comparison between two classes of documents that have no association; the WITH clause implicitly defines an expansion of the documents and associations in the basic query space; the WHERE clause can not only implicitly define documents and associations that expand the search space, but can also perform the selection operation of the associated document model; the RETURN clause contains URLs at the document, link, or attribute level, or variables that represent URLs, and mainly performs the projection operation of the associated document model, the returned result being an associated document.
5. The method according to claim 3, characterized in that execution of the unified query language is divided into four steps: determining the documents, establishing the relations between documents, selection, and projection.
6. The method according to claim 3, characterized in that the different data sources are connected into a whole through the associated document model to form a network, and the data reference syntax of the unified query language takes a URL-like form so that data in the network is accessed uniformly.
7. The method according to claim 3, characterized in that an intermediate variable in the unified query language represents a document collection, a numeric value, or a character string related to the basic search space; intermediate variables are used in the matching syntax, and the corresponding condition-matching operation is carried out according to the type of the intermediate variable.
8. The method according to claim 3, characterized in that a matching condition in the unified query language is an expression, introduced by the WHERE clause, whose return value is of type bool, and the syntax rules for expressions are as follows:
1) Join filtering of document collections A and B: (<Document A>.link | <Document A>) = <Document B>;
2) Document collection filtering: (<Document>.attribute | <Association>.attribute) operator <basic data type>;
3) <Expression> AND|OR <Expression>.
9. The method according to claim 3, characterized in that the analysis program of the unified query language comprises: overall syntactic-structure classes, syntax-tree abstract classes and interfaces, and syntax-tree concrete classes.
10. The method according to claim 1, characterized in that, in practical applications, an SDK for database systems is developed for the unified query language, with some compensating operations carried out on top of the database's native query language, and client programs then operate on the database with the unified query language by calling the APIs in the SDK; or a communication protocol is designed for the database directly on the basis of the unified query language, and client programs obtain the required query results by sending network requests.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710515380.6A CN107515887B (en) | 2017-06-29 | 2017-06-29 | Interactive query method suitable for various big data management systems |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710515380.6A CN107515887B (en) | 2017-06-29 | 2017-06-29 | Interactive query method suitable for various big data management systems |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107515887A true CN107515887A (en) | 2017-12-26 |
CN107515887B CN107515887B (en) | 2021-01-08 |
Family
ID=60721837
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710515380.6A Active CN107515887B (en) | 2017-06-29 | 2017-06-29 | Interactive query method suitable for various big data management systems |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107515887B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN107515887B (en) | 2021-01-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |