CN107515887B

CN107515887B - Interactive query method suitable for various big data management systems

Info

Publication number: CN107515887B
Application number: CN201710515380.6A
Authority: CN
Inventors: 沈志宏; 李跃鹏; 黎建辉
Original assignee: Computer Network Information Center of CAS
Current assignee: Computer Network Information Center of CAS
Priority date: 2017-06-29
Filing date: 2017-06-29
Publication date: 2021-01-08
Anticipated expiration: 2037-06-29
Also published as: CN107515887A

Abstract

The invention relates to an interactive query method suitable for various big data management systems, comprising the following steps: 1) establishing an associated document model comprising a document set and an association set, wherein the association set is the set of associations between documents; 2) converting different original data models into the associated document model, and connecting different data sources into a whole through the associated document model; 3) establishing, based on the associated document model, a unified query language suitable for multivariate data; 4) using the unified query language to perform unified queries over relational databases, graph databases, and file systems. The invention provides, for the first time, a unified query language suitable for multivariate data management systems, and enables unified querying of relational databases, graph databases, and file systems.

Description

Interactive query method suitable for various big data management systems

Technical Field

The invention relates to a query language, in particular to an interactive query language and a query method suitable for a big data management system, and belongs to the technical field of big data and databases.

Background

With the continuing spread of computers, the demand for data management and processing has become increasingly urgent. Different data models have been proposed for different forms and characteristics of data, each with a corresponding data management system for managing and analyzing the data. Some of these models have been highly influential: the E-R model, for example, has dominated the database world for more than 40 years since the 1970s. In recent decades, with the spread of Internet and Internet of Things applications, the generation of large-scale structured, semi-structured, and unstructured data has triggered the NoSQL movement [Cattell R.]. The database world has shifted from the original SQL monopoly to a landscape divided among traditional SQL, NoSQL, and NewSQL systems.

Constructing a complete big data application system requires full consideration of the 4V characteristics of the data [Gupta R, Gupta H, Mohania M. Cloud computing and big data analytics: what is new from databases perspective? [C]//Proc of 1st BDA, New Delhi, India. Springer Berlin Heidelberg, 2012: 42-61], as well as further analysis of the big data, correlation mining, and even scientific discovery. Taking biological scientific data as an example, large amounts of microscopic data, such as gene sequence files, protein sequence files, and protein structure and function data, are generated every day by sequencing, mass spectrometry, nuclear magnetic resonance, and other instruments. There are also macroscopic data, such as species information, physiological and biochemical properties, and reaction condition information, which are stored in MongoDB or traditional SQL databases, together with a large amount of knowledge information such as literature and patents. To better support knowledge discovery, researchers often introduce biological ontologies and manage the large-scale associations among species, proteins, genes, and other data as an RDF (Resource Description Framework) association network. Together, this micro-level and macro-level information forms an organic database for understanding and studying life as a whole.

Data-driven scientific discovery often has to be completed by scheduling a series of data pipelines. These pipelines span data acquisition, batch writing, query, analysis, visualization, and other processes, which raises a major problem: how can pipeline programmers access and manipulate data in a uniform manner without considering the differences in the underlying data storage models? In terms of data management technology, the problem becomes how to cross the boundaries between SQL, NoSQL, and NewSQL databases, achieve uniform data access over multivariate data models, and provide a uniform data operation interface for computing frameworks such as Hadoop and Spark.

At present, relational databases range from distributed databases to in-memory databases and mainly include MySQL, PostgreSQL, Oracle, SQLite, and the like. They guarantee consistent data access through ACID transactions and organize data using tables, columns, and keys, which makes them suitable for applications with fixed structures and strong consistency requirements. In October 1986, ANSI adopted SQL as the standard language for relational database management systems (ANSI X3.135-1986), and ISO subsequently adopted it as an international standard. SQL is thus the most widely used relational database query language today.

NoSQL databases comprise key-value databases, column databases, document databases, and graph databases. Because NoSQL databases currently lack a uniform query language, some work has focused on wrapping an SQL query interface around them. For example, Hive provides HQL, an SQL-like query language, which makes such systems easier to use. Spark SQL is an SQL implementation built on the Spark DataFrame big data processing framework and supports SQL-based big data processing and analysis. Based on DataFrame, Spark can provide SQL query and analysis capability over big data for current databases such as MySQL, HBase, Cassandra, and MongoDB.

As an important branch of NoSQL databases, graph databases are often used to manage large-scale association information, such as associations between species and genes, social relationships among people, and Amazon's warehouse and retail master data systems, and they support fast association retrieval based on the property graph model. Typical graph databases currently include Neo4j, Titan, Virtuoso, and the like. For graph databases, Neo4j provides the Cypher query language, which uses a concise SQL-like syntax to express association queries over the graph data model and makes graph databases easier to use. The TinkerPop project proposed Gremlin, a graph traversal and analysis language for property graphs that supports various graph databases, such as Titan, OrientDB, and TinkerGraph, and has been called the Perl of the graph database world. In addition, the RDF model is a semantic description framework based on a graph model and is suitable for expressing semantic information and its associations; typical RDF databases currently include Jena, Virtuoso, and the like. The RDF Data Access Working Group first published the RDF query language SPARQL in 2004, and the SPARQL protocol and query language formally became a W3C Recommendation in 2008. SPARQL adopts a structured query form and performs association queries through subgraph matching in the WHERE clause; at present, most RDF databases support standard SPARQL queries.

Clearly, there is still no unified query language spanning SQL and NoSQL databases, and graph databases, because of their distinctive query and analysis methods, are usually placed in opposition to the SQL query language. Therefore, when selecting a data model, one often has to choose between SQL databases (including SQL-enabled NoSQL databases) and graph databases. That choice in turn determines the upper-level application: either the Gremlin and SPARQL query languages, with their strong association analysis capability, or the traditional SQL query language based on two-dimensional tables.

Based on the above background, the present invention provides a new query language Simba, which is used to implement unified query of relational databases, graph databases, and file systems.

Disclosure of Invention

The invention aims to provide an interactive query method suitable for various big data management systems, which can realize query aiming at a relational database, a graph database and a file system through a uniform query language Simba.

The technical scheme adopted by the invention is as follows:

An interactive query method suitable for a plurality of big data management systems comprises the following steps:

1) establishing an associated document model comprising a document set and an association set, wherein the association set is the set of associations between documents;

2) converting different original data models into the associated document model, and connecting different data sources into a whole through the associated document model;

3) establishing, based on the associated document model, a unified query language suitable for multivariate data;

4) using the unified query language suitable for multivariate data to perform unified queries over relational databases, graph databases, and file systems.

Further, the unified query language suitable for multivariate data management systems comprises four clauses: FIND, WITH, WHERE, and RETURN. The FIND clause determines the basic variables of the query, each of which must represent a document; the WITH clause defines intermediate variables used in the matching condition syntax; the WHERE clause specifies the conditions that returned results must satisfy; and the RETURN clause contains the data references to be returned to the user.

Further, the basic query space in the FIND clause consists of one or more classes of documents, and the associated document model does not allow comparisons between two classes of documents that have no association. The variables defined in the WITH clause implicitly extend the documents and associations in the basic query space. The WHERE clause can implicitly define documents and associations that extend the query space, and can also perform the selection operation of the associated document model. The RETURN clause contains URLs at the document, link, or attribute level, or variables representing such URLs; it mainly performs the projection operation of the associated document model, and the returned result is an associated document.

Further, the execution process of the unified query language is divided into four steps: determining documents, establishing relations among the documents, selecting and projecting.

Further, different data sources are connected into a whole through the associated document model to form a network, and the data reference grammar of the uniform query language is formed in a URL-like form to uniformly access data in the network.

Further, intermediate variables in the unified query language represent document sets, numerical values and character strings related to the basic search space, the intermediate variables are used in matching grammar, and corresponding condition matching operation is carried out according to types of the intermediate variables.

The intermediate language provided by the invention is independent of any specific operating system or programming language. Because the Simba language includes operations from several data models, some of which a given database cannot complete directly, in practical applications an SDK (Software Development Kit) for the Simba language can be developed for the database system to perform compensation operations on top of the database's native query language. For example, a Java package (or C++ package) capable of understanding and executing the Simba language can be developed for the MongoDB database, so that a client can operate MongoDB in the Simba language by calling the API (Application Programming Interface) in the SDK, that is, query the data management system through the SimbaQL translator, as shown in Fig. 1.

Alternatively, the database can build its communication protocol directly on the Simba language, so that a client program obtains the required query results by sending Simba commands as network requests; this mode is shown in Fig. 2.

As shown in FIG. 3, the Simba query language of the present invention comprises the following parts:

1. SimbaQL syntax structure: gives the overall structure of SimbaQL and the syntax of each clause, such as FIND, WITH, WHERE, and RETURN;

2. Data reference syntax: explains how to refer to data in a data source in SimbaQL;

3. Intermediate variable syntax: explains how intermediate variables are defined and used in SimbaQL;

4. Matching condition syntax: explains how to write matching conditions in SimbaQL;

5. SimbaQL parsing program: provides a Java-based SimbaQL parsing program for writing SimbaQL client programs or query engine execution programs.

Compared with the prior art, the invention has the following advantages:

(1) A unified query language suitable for multivariate data management systems is provided for the first time; the language enables unified queries over relational databases, graph databases, and file systems. It can retrieve records in a relational table that satisfy specified attribute conditions, multiple vertices in a graph database that satisfy specified association conditions, and specific files in a file system. With current development techniques, an application must query relational databases, graph databases, and file systems separately, through the SQL query language, the Cypher/Gremlin languages, and file system APIs (application programming interfaces), respectively. These differing methods make it necessary to master several languages and make programs less general. With SimbaQL, only one uniform syntax is required. This difference is shown in Fig. 4 and Fig. 5.

(2) The typical query patterns of big data management systems are summarized, and the complex SQL and graph query functions are simplified. The goal of SimbaQL is to cover most query requirements while remaining easy for mainstream data management systems to support, so complex functions of SQL and graph queries, such as subqueries or the UNION of query results, are omitted. SimbaQL recommends that such secondary operations be completed by a big data computing framework, while SimbaQL itself only performs simple data query and extraction.

(3) SimbaQL targets databases of multiple data models, so its queries include operations from all of these models. Therefore, if the native query language of one model cannot complete operations from other models, the SimbaQL implementation can complete them on its behalf. For example, MongoDB's query language cannot perform JOIN operations on documents, whereas SimbaQL supports JOIN operations, so the SimbaQL implementation compensates for them.
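The compensation idea in (3) can be illustrated with a minimal Python sketch. It assumes a store that can only return whole collections, so the join is carried out on the client side, in the way an SDK might compensate. The collection contents, the field names, and the compensating_join helper are illustrative assumptions and are not part of the SimbaQL SDK.

```python
# Hypothetical collections as a document store without JOIN support might return them.
persons = [
    {"id": "p1", "name": "bluejoe"},
    {"id": "p2", "name": "alice"},
]
software = [
    {"id": "s1", "name": "simbaql", "inventor_id": "p1"},
    {"id": "s2", "name": "other", "inventor_id": "p2"},
]


def compensating_join(left, right, left_key, right_key):
    """Hash join performed outside the database to compensate for a missing JOIN."""
    index = {}
    for doc in left:
        index.setdefault(doc[left_key], []).append(doc)
    pairs = []
    for doc in right:
        for match in index.get(doc[right_key], []):
            pairs.append((match, doc))
    return pairs


joined = compensating_join(persons, software, "id", "inventor_id")
inventors = [p["name"] for p, s in joined if s["name"] == "simbaql"]
print(inventors)  # ['bluejoe']
```

A hash join is used here only because it is simple; an actual SDK could equally push part of the filtering down to the database before joining.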

(4) SimbaQL introduces intermediate variables to express, and hide, information that the user is not interested in. Take the example of finding two entities that are associated:

FIND x,y WITH $m = x.child WHERE $m.child = y RETURN x,y

This statement introduces the intermediate variable $m, which represents a child of x, and y is in turn a child of $m. The query returns all grandparent and grandchild pairs, but the application issuing the query does not need to care who $m specifically is.

Intermediate variables also avoid repeated spelling and serve as a shorthand.
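The effect of the intermediate variable can be sketched in Python as follows. This is only an illustration under stated assumptions: the child association is held in memory as (from, to) pairs, and the names child_links, children, and grand_pairs are hypothetical. It is not the way a SimbaQL engine is required to evaluate the statement.

```python
# Hypothetical "child" association: (from, to) pairs of primary keys.
child_links = [("ann", "bob"), ("bob", "cat"), ("bob", "dan"), ("eve", "fay")]


def children(person):
    """Follow the child association one step from the given person."""
    return [to for frm, to in child_links if frm == person]


def grand_pairs():
    """Evaluate FIND x,y WITH $m = x.child WHERE $m.child = y RETURN x,y."""
    people = sorted({p for link in child_links for p in link})
    result = []
    for x in people:
        for m in children(x):        # $m ranges over x.child
            for y in children(m):    # condition $m.child = y
                if (x, y) not in result:
                    result.append((x, y))
    return result


print(grand_pairs())  # [('ann', 'cat'), ('ann', 'dan')]
```

Note that $m (here "bob") never appears in the result, which matches the point that the caller does not need to know who the intermediate person is.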

(5) SimbaQL introduces multi-level attribute references. For example, x.knows.knows.name represents the name of some person z who is known by some person y whom x knows. Conventional query languages do not support multi-level references. This notation effectively reduces code repetition and is intuitive.
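A multi-level reference of the kind described in (5) can be resolved by following one association per path segment and reading the attribute at the end. The Python sketch below assumes a chain of knows associations held in memory; the data, the links dictionary, and the resolve helper are hypothetical.

```python
# Hypothetical in-memory data: documents keyed by primary key, plus a "knows" association.
people = {
    "x": {"name": "xavier"},
    "y": {"name": "yara"},
    "z": {"name": "zoe"},
}
links = {"knows": [("x", "y"), ("y", "z")]}


def resolve(start_id, path):
    """Resolve a dotted path such as 'knows.knows.name' starting from one document."""
    current = {start_id}
    *hops, attribute = path.split(".")
    for link_name in hops:
        # Each hop replaces the current document set by the set it is linked to.
        current = {to for frm, to in links[link_name] if frm in current}
    return sorted(people[doc_id][attribute] for doc_id in current)


print(resolve("x", "knows.knows.name"))  # ['zoe']
```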

Drawings

Figure 1 illustrates the manner in which a query is made to a data management system by a SimbaQL translator.

Fig. 2 illustrates the manner in which queries are made directly to the data management system via the SimbaQL network protocol.

Fig. 3 shows a block diagram of the content of the present invention.

Fig. 4 shows that different management systems need to be queried in different languages in the prior art.

FIG. 5 shows the present invention using SimbaQL language to query different management systems in a unified way.

Fig. 6 is a schematic structural diagram of the LDM model.

Detailed Description

The invention is further illustrated by the following specific examples and the accompanying drawings.

The design of SimbaQL is based on the Linked Document intermediate model (the associated document model, LDM for short). Unified querying of databases with different data models is achieved by mapping LDM operations to the operations of the other models, together with the compensation operations of the SDK.

1. Linked Document model

1) Linked Document model definition

A document is a set of attributes, and an attribute is a set of data of the same type. By default, each document contains a primary key (main code) attribute that uniquely identifies it. The primary key plays a role similar to an IP address and must be a globally unique identifier; the other attributes may be of any type, including documents, associations, and custom types. An association is a special document that must contain two attributes (from: primary key, to: primary key) and represents a relationship between two documents. For example, a knows association between one person document and another person document indicates that the first person knows the second. Both document sets and association sets must have a name identifier that states the semantics of the documents and associations in the set. Documents or associations of the same type may have different numbers of attributes, which means that {'id': 'fff0', 'name': 'bluejoe', 'age': 30} can be a member of both the person class of documents and the teacher class of documents.

The LDM model is a pair (document set, association set), where the association set consists of multiple sets of relationships between two classes of documents. The general structure of the LDM model is shown in Fig. 6, in which Documents represents the document set, Links represents the association set, PersonDocument represents a set of person documents, SoftwareDocument represents a set of software documents, InventLink represents a set of associations in which a person invented a piece of software, 1 and 2 represent primary keys that uniquely identify documents, and attr1, attr2, … represent attributes of the documents.
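The definition above can be sketched as a small Python data structure. This is a minimal illustration, not the structure used by the invention: the class names Document, Link, and LDM, and the field names source and target (standing in for the from and to primary keys, since from is a reserved word in Python), are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    id: str                                    # globally unique primary key
    attrs: dict = field(default_factory=dict)  # remaining attributes, any type


@dataclass
class Link:
    id: str                                    # an association is itself a document
    source: str                                # the "from" primary key
    target: str                                # the "to" primary key
    attrs: dict = field(default_factory=dict)


@dataclass
class LDM:
    documents: dict = field(default_factory=dict)  # set name -> {id: Document}
    links: dict = field(default_factory=dict)      # set name -> [Link]

    def add_document(self, set_name, doc):
        self.documents.setdefault(set_name, {})[doc.id] = doc

    def add_link(self, set_name, link):
        self.links.setdefault(set_name, []).append(link)


ldm = LDM()
joe = Document("fff0", {"name": "bluejoe", "age": 30})
ldm.add_document("person", joe)
ldm.add_document("teacher", joe)  # the same document may belong to two named sets
ldm.add_document("software", Document("s1", {"name": "simbaql"}))
ldm.add_link("invent", Link("l1", "fff0", "s1"))
print(sorted(ldm.documents), sorted(ldm.links))
```

Keying every set by name reflects the requirement that both document sets and association sets carry a name identifier.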

2) LDM conversion rules

LDM is oriented toward data query and analysis and requires two kinds of conversion rules: conversion from the original data models to LDM, and format conversion from LDM to existing programming models.

a) Original data model → LDM

The data model conversion is formally defined as a triple (G, L, M), where G represents the schema of the global model, i.e. LDM; L represents the local data model (relational model, key-value model, document model, or property graph model); and M represents the mapping rules from L to G. Converting an original data model into LDM mainly concerns the semantics of the data; conversion at the data type level can be decided by developers according to system requirements. The conversions given below cover the relational, key-value, document, and property graph models as original data models, with the main conversion rules shown in Table 1. A custom conversion rule extracts a data set that satisfies certain characteristics of the original data model. For example, data whose keys contain Person in the key-value model can be extracted as the Person class document set; vertices labeled Person in the property graph model can be extracted as Person class documents; and the relation in which the personid of a Person document equals the personid of a Software document in the document model can be extracted as the association set invent.

TABLE 1 conversion rules of raw data model to LDM

LDM	Relational model	Key-value model	Document model	Property graph model
Attribute	Attribute	Key	Attribute	Attribute
Document	Record	Pair	Document	Vertex
Document set	Table	Custom	Collection	Custom
Association	Foreign key	Custom	Custom	Edge
Association set	Foreign key	Custom	Custom	Custom

It should be noted that in LDM both document sets and association sets must have names, so when converting foreign keys of the relational model and the other custom parts, the converter must supply a name that states the semantics of the set's elements. For example, in the property graph model, nodes with the label 'person' may be used as person class documents in the LDM; nodes containing the attribute 'teacher' may be used as teacher class documents in the LDM; and in fact both classes of documents may correspond to the same node.

In addition, the conversion of the original model to the LDM may not be limited to the above model, and a developer may define conversion rules of other data models to the LDM, such as a file system, a column database, etc., according to requirements.
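As an illustration of the relational column of Table 1, the Python sketch below maps tables to named document sets, records to documents, and a foreign key to a named association set. The function relational_to_ldm, the table contents, the association name invented_by, and the scheme of prefixing primary keys with the table name to make them globally unique are all assumptions made for this example.

```python
def relational_to_ldm(tables, foreign_keys):
    """Map tables to document sets and foreign keys to named association sets.

    tables: {table_name: [row dicts with an 'id' column]}
    foreign_keys: {association_name: (from_table, fk_column, to_table)}
    """
    # Record -> document; the table name is prepended so primary keys are globally unique.
    documents = {
        name: {f"{name}/{row['id']}": dict(row) for row in rows}
        for name, rows in tables.items()
    }
    links = {}
    for assoc, (from_table, fk_column, to_table) in foreign_keys.items():
        # Foreign key value -> (from, to) pair of primary keys.
        links[assoc] = [
            (f"{from_table}/{row['id']}", f"{to_table}/{row[fk_column]}")
            for row in tables[from_table]
            if row.get(fk_column) is not None
        ]
    return documents, links


tables = {
    "person": [{"id": 1, "name": "bluejoe"}],
    "software": [{"id": 10, "name": "simbaql", "personid": 1}],
}
# The converter supplies the association name, as the text requires.
docs, links = relational_to_ldm(
    tables, {"invented_by": ("software", "personid", "person")}
)
print(links)  # {'invented_by': [('software/10', 'person/1')]}
```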

b) LDM → Programming model

The conversion of LDM into programming models mainly considers relationships at the data structure level. The data structures accepted by currently popular programming models such as MapReduce, Spark SQL, and Pregel are mainly arrays, tables, and graphs. The rules for converting LDM to these three data structures are therefore given in Table 2.

TABLE 2 conversion rules of LDM to array, table, graph

3) LDM operation rule

The operation rules of LDM are defined on the basis of the operations of the relational model, key-value model, document model, and property graph model. These include the set, join, selection, and projection operations of the relational model; the get operation of the key-value model; the selection and projection operations of the document model; and the traversal and selection operations of the property graph model. The operations of the LDM model fall into three main categories: set operations, association operations, and document operations. The specific operation rules are shown in Table 3.

TABLE 3 operational rules of LDM

4) LDM data access rules

Since LDM links databases to a network, we can use a URL-like format to reference data in the network. This URL is of the form:

Here, datasource represents a data source, such as MySQL or MongoDB; document represents a document class mapped from the data source to the LDM; link represents an association mapped from the data source to the LDM; identity represents the primary key of a document; and propertyName represents an attribute name of the document.

Data can be referenced at different levels, for example, a reference to the name property of a person document in a MySQL database can be expressed as:

MySQL.person.name

A reference to the father association of a person document in a MongoDB database can be expressed as:

MongoDB.person.father

the association represents the document set corresponding to the association, and further reference can be carried out, such as

MongoDB.person.father.name

The data corresponding to the data reference URL is actually the result after the relational operation and the projection operation of the LDM. For example, the data represented by mongodb.

2. SimbaQL grammar structure

Just as SQL is based on the relational model, SimbaQL is based on the Linked Document model: each SimbaQL statement can be converted into an operation formula over Linked Documents, composed of the following operations from Table 3: "establish association" and "selection" among the association operations, and "selection" and "projection" among the document operations. A SimbaQL query statement mainly comprises four clauses, FIND, WITH, WHERE, and RETURN, with the following syntax structure:

FIND <documents>
WITH <variables>
WHERE <conditions>
RETURN <urls>

The FIND clause determines the basic variables of the query, and each variable must correspond to a class of documents in the Linked Document; the WITH clause defines intermediate variables used in the matching condition syntax, which may be of various data types and are not limited to documents; the WHERE clause specifies the conditions that returned results must satisfy; and the RETURN clause contains the data references to be returned to the user. The LDM operation process corresponding to SimbaQL is given below.
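The four-clause structure can be made concrete with a small Python sketch that splits a statement into its clause bodies. This is only an illustration and is not the Java and ANTLR 4 based parsing program described later; the split_clauses helper is hypothetical, and it deliberately ignores the case where a keyword occurs inside a quoted string.

```python
import re

# Clause keywords, matched case-insensitively on word boundaries.
CLAUSE = re.compile(r"\b(FIND|WITH|WHERE|RETURN)\b", re.IGNORECASE)


def split_clauses(statement):
    """Split a SimbaQL statement into its FIND/WITH/WHERE/RETURN clause bodies."""
    # With a capturing group, re.split keeps the keywords:
    # [prefix, keyword1, body1, keyword2, body2, ...]
    parts = CLAUSE.split(statement)
    clauses = {}
    for keyword, body in zip(parts[1::2], parts[2::2]):
        clauses[keyword.upper()] = body.strip()
    return clauses


print(split_clauses(
    "FIND person p WITH $soft = p.invent WHERE $soft.name = 'simba' RETURN p.name"
))
```

The WITH and WHERE clauses are simply absent from the returned dictionary when a statement omits them, as in FIND person p RETURN p.name.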

First, the basic query space in FIND consists of one or more classes of documents, and SimbaQL requires that LDM cannot make comparisons between two classes of documents without associations. For example, querying data of a person object and a software object can be expressed as:

FIND MySQL.person p,MySQL.software s

If there is no association between the person documents and the software documents, selection operations can only be performed on person and software separately; a selection such as p.inventid = s.id is not allowed.

The variables defined in the WITH implicitly define the extension of the documents and associations in the basic query space, such as:

FIND person p WITH $soft = p.invent

The above statement indicates that the Linked Document being searched also contains the software documents and the invent association. It is equivalent to:

FIND person p, software s WITH $soft = p.invent

The WHERE statement can not only implicitly define the document and the association for expanding the query space, but also can perform LDM selection operation. Such as:

FIND person p WHERE p.invent.name = 'simba'

This implicitly determines that the LDM includes the document sets (person, software) and the association set (invent), and requires the software documents to satisfy the condition that the name attribute is 'simba'.

The RETURN statement may contain a URL of a document, a link, an attribute hierarchy, or a variable representing a URL. The statement mainly executes the projection operation of the LDM, and the returned result is a Linked Document.

In summary, the execution of SimbaQL is divided into four steps: determining the documents, establishing the associations among the documents, selection, and projection. Assume that the basic search space in FIND is A, B; that C and the association L1 between A and C are implicitly determined in the WHERE clause, with selection condition condition; that the projection space in the RETURN clause is space; and that the other documents obtained through the operation are doc. Then the LDM operation corresponding to the SimbaQL statement is:

\[ result = \sigma_{space}\,\pi_{condition}\bigl((A \times_{d} B)\;{}_{A}{\times}_{L1}\;C\bigr) \]

For example, for the SimbaQL statement:

FIND person p, software s WHERE p.name = 'bluejoe' AND p.invent.name = 'simbaql' RETURN p.name

the corresponding LDM operation is:

\[ result = \sigma_{p.name}\,\pi_{p.name = \text{'bluejoe'}\ \mathrm{and}\ software.name = \text{'simbaql'}}\bigl(Person \times_{invent} Software\bigr) \]
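The four execution steps for this example can be traced with a minimal Python sketch over in-memory data. The sample documents, the invent pairs, and the variable names are assumptions; the sketch shows the order of operations only and is not an implementation of the LDM operators themselves.

```python
persons = {"p1": {"name": "bluejoe"}, "p2": {"name": "alice"}}
softwares = {"s1": {"name": "simbaql"}, "s2": {"name": "other"}}
invent = [("p1", "s1"), ("p2", "s2"), ("p1", "s2")]  # (from, to) primary keys

# Steps 1 and 2: determine the documents and establish the invent association.
joined = [(persons[p], softwares[s]) for p, s in invent]

# Step 3: selection on attribute values.
selected = [
    (p, s) for p, s in joined
    if p["name"] == "bluejoe" and s["name"] == "simbaql"
]

# Step 4: projection onto p.name.
result = [{"p.name": p["name"]} for p, _ in selected]
print(result)  # [{'p.name': 'bluejoe'}]
```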

3. Data reference syntax (also called attribute expression syntax, as shown in Fig. 3)

Here, datasource represents a data source registered in the associated document model; document represents a document class in the associated document model; link represents an association set in the associated document model, and a URL may contain multiple links; identity represents the id of a document; and propertyName represents an attribute of the document. Examples:

MySQL.person.name

MongoDB.person.father

MongoDB.person.father.name
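The three references above can be decomposed into their components with a short Python sketch. Because a link and an attribute look the same in a dotted reference, the sketch assumes that the set of link names is known from the data source mapping. The parse_reference helper is hypothetical, and the identity component is left out because its concrete syntax is not shown in the examples.

```python
def parse_reference(ref, link_names):
    """Split 'datasource.document[.link...][.property]' using a known set of link names."""
    datasource, document, *rest = ref.split(".")
    links = []
    # Leading segments that name a known association are treated as links.
    while rest and rest[0] in link_names:
        links.append(rest.pop(0))
    if len(rest) > 1:
        raise ValueError(f"unexpected segments in reference: {rest}")
    return {
        "datasource": datasource,
        "document": document,
        "links": links,
        "property": rest[0] if rest else None,
    }


print(parse_reference("MongoDB.person.father.name", {"father"}))
```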

4. Intermediate variable syntax

The intermediate variables may represent document sets, numeric values, or strings related to the basic search space. A variable is written as a $ symbol followed by an identifier and is defined using the WITH clause, for example:

WITH $c1 = p.knows.knows (document set)

WITH $c2 = 123 (value)

WITH $c3 = 'bluejoe' (string)

The intermediate variables are used in the matching grammar, and corresponding condition matching operation is carried out according to the types of the intermediate variables. When the intermediate variable is a document set, its main role is to replace a part of the content of the data reference URL.
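How the type of an intermediate variable might be determined from its definition can be sketched in Python as follows. The classify_variable helper and its regular expressions are assumptions made for illustration; they accept only the three forms shown in the examples above.

```python
import re

# "$name = right-hand side", with optional surrounding whitespace.
DEFINITION = re.compile(r"^\s*(\$\w+)\s*=\s*(.+?)\s*$")


def classify_variable(definition):
    """Return (name, kind, value) for a WITH body such as "$c2 = 123"."""
    match = DEFINITION.match(definition)
    if match is None:
        raise ValueError(f"not a variable definition: {definition!r}")
    name, raw = match.groups()
    if re.fullmatch(r"-?\d+(\.\d+)?", raw):
        return name, "value", float(raw) if "." in raw else int(raw)
    if len(raw) >= 2 and raw[0] == raw[-1] == "'":
        return name, "string", raw[1:-1]
    # Anything else is treated as a data reference denoting a document set.
    return name, "document set", raw


for body in ["$c1 = p.knows.knows", "$c2 = 123", "$c3 = 'bluejoe'"]:
    print(classify_variable(body))
```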

5. Matching condition grammar

A matching condition is an expression in the WHERE clause that evaluates to a Boolean value. The syntax rules of the expression are as follows:

1) Joint filtering of document sets A and B: (<document A>.link | <document A>) = <document B>

2) Filtering of a document set: (<document>.attribute | <association>.attribute) <operator> <basic data type value>

3) <expression> AND | OR <expression>

The operators currently supported are comparison operators such as = and >. Where <association>.attribute or <document A>.link represents a set of documents, the "=" operator means "exists". For example, p.knows.name = 'bluejoe' means that a person named 'bluejoe' exists among the persons p knows, and p.knows = p1 means that p1 exists among the persons p knows.
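The "exists" reading of the = operator on a document set can be sketched in Python as follows. The data and the helper names known_by, knows_someone_named, and knows_document are hypothetical; the point is only that the comparison succeeds if any member of the set satisfies it.

```python
people = {
    "p": {"name": "bluejoe"},
    "p1": {"name": "alice"},
    "p2": {"name": "bob"},
}
knows = [("p", "p1"), ("p1", "p2")]  # (from, to) primary keys


def known_by(person_id):
    """Document set denoted by <person>.knows."""
    return {to for frm, to in knows if frm == person_id}


def knows_someone_named(person_id, name):
    """<person>.knows.name = name: true if any known person has that name."""
    return any(people[k]["name"] == name for k in known_by(person_id))


def knows_document(person_id, other_id):
    """<person>.knows = <other>: true if <other> is among the known persons."""
    return other_id in known_by(person_id)


print(knows_someone_named("p", "alice"), knows_document("p", "p2"))  # True False
```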

It should be noted that although the WHERE clause corresponds to the selection operation of the LDM, the selection operation in SimbaQL selects only on the attribute values of documents. For example:

FIND person p, software s WHERE p.invent = s AND s.name = 'SimbaQL' RETURN p.name

Although the condition includes p.invent = s, the actual selection condition is s.name = 'SimbaQL'.

6. SimbaQL parsing program

The SimbaQL parsing program mainly includes the following classes:

Overall syntax structure classes: Statement, SearchSpace, VariableDefines, Conditions, SubSpace; their meanings are given in Table 4 below.

Syntax tree abstract classes and interfaces: Node, Condition, Variable, Document, AttributeDocument (documents derived from attributes, such as p.knows or $p.knows), and ValueExpression (value-typed expressions).

Syntax tree concrete classes: RawDocument, RawAttribute, RawVariable, WithVariable, StringValue, IntegerValue, TerminalCondition, And, Or, Not, VaribleAttribute, DocumentReference, Operator. RawDocument represents a document such as person p in FIND person p; RawAttribute represents an attribute such as p.name; VaribleAttribute represents an attribute derived from a variable, such as $k.name; WithVariable represents a variable such as $k in WITH $k = p.knows; RawVariable represents a document variable defined by FIND, such as p; StringValue and IntegerValue represent strings and integers respectively; And, Or, and Not represent the logical connectives And, Or, and Not in an expression; DocumentReference represents a knows connection such as those in p.knows.knows.name; Operator represents a comparison operator; and TerminalCondition represents an indivisible expression such as p.age > 30.

The basic information of the above classes is shown in table 4:

TABLE 4 Abstract syntax Tree basic information
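The roles of the condition classes can be illustrated with a Python analogue. The parsing program described here is Java-based, so the sketch below is not its code: the class names are borrowed from the text, while the fields, the evaluate method, the OPS table (including the < operator), and the flat row representation are assumptions.

```python
import operator
from dataclasses import dataclass

# Comparison operators assumed for this sketch.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


@dataclass
class TerminalCondition:
    attribute: str   # e.g. "p.age"
    op: str          # comparison operator symbol
    value: object

    def evaluate(self, row):
        return OPS[self.op](row[self.attribute], self.value)


@dataclass
class And:
    left: object
    right: object

    def evaluate(self, row):
        return self.left.evaluate(row) and self.right.evaluate(row)


@dataclass
class Or:
    left: object
    right: object

    def evaluate(self, row):
        return self.left.evaluate(row) or self.right.evaluate(row)


@dataclass
class Not:
    operand: object

    def evaluate(self, row):
        return not self.operand.evaluate(row)


# p.age > 30 AND NOT p.name = 'bluejoe'
condition = And(
    TerminalCondition("p.age", ">", 30),
    Not(TerminalCondition("p.name", "=", "bluejoe")),
)
print(condition.evaluate({"p.age": 41, "p.name": "alice"}))  # True
```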

In addition to the Java classes of the abstract syntax tree above, the parser also contains the lexical and syntactic parsing classes of ANTLR 4: SimbaQLLexer, SimbaQLParser, SimbaQLBaseListener, SimbaQLBaseVisitor, and SimbaQLVisitor. SimbaQLLexer is the lexical analyzer for SimbaQL statements generated by ANTLR 4 and checks whether the tokens in a statement conform to the lexical rules; SimbaQLParser is the syntactic parser for SimbaQL; SimbaQLBaseListener and SimbaQLBaseVisitor are the base classes for accessing the syntax tree in listener mode and visitor mode, respectively; and SimbaQLVisitor, which inherits from SimbaQLBaseVisitor, is used to access the syntax tree in visitor mode.

Abstract syntax tree building class: AstBuilder, which constructs the abstract syntax tree; given an input SimbaQL statement, it outputs a syntax tree rooted at Statement.

Syntax error checking classes: AstChecker, DBchecker. AstChecker detects query statements that do not conform to the syntax; DBchecker detects content in a query statement that conflicts with the data source, for example when the query statement contains p.knows but the data source has no knows link.
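The kind of check attributed to DBchecker can be sketched as below. This is a minimal sketch under stated assumptions, not the DBchecker of the invention: the class name LinkCheckSketch, the schema representation (document type mapped to the link names the data source provides) and the worksAt link are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the DBchecker idea: every link referenced in a
// query (such as p.knows) must exist for that document type in the data source.
class LinkCheckSketch {
    // schema: document type -> link names provided by the data source (assumed shape).
    // refs: each entry is {documentType, linkName} as referenced in the query.
    static List<String> check(Map<String, Set<String>> schema, List<String[]> refs) {
        List<String> errors = new ArrayList<>();
        for (String[] ref : refs) {
            Set<String> links = schema.get(ref[0]);
            if (links == null) {
                errors.add("unknown document type: " + ref[0]);
            } else if (!links.contains(ref[1])) {
                errors.add("no link '" + ref[1] + "' on " + ref[0]);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // The data source only offers a hypothetical worksAt link on Person.
        Map<String, Set<String>> schema = Map.of("Person", Set.of("worksAt"));
        List<String[]> refs = new ArrayList<>();
        refs.add(new String[] {"Person", "knows"});
        System.out.println(check(schema, refs)); // reports the missing knows link
    }
}
```

An empty result list means that every referenced link is present in the assumed schema.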

Syntax parsing examples (which output the syntax parse tree): SimbaParser, TreePrinter. SimbaParser is an example program that constructs the syntax parse tree of a query statement and prints the structure of the parse tree; TreePrinter is a program that prints a syntax parse tree.
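Printing a parse tree can be sketched with a depth-first walk that indents each level. The following is a minimal sketch, assuming a generic labelled tree node; the classes TreeNodeSketch and TreePrinterSketch and the output layout are hypothetical, since the text names TreePrinter but does not describe its code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical labelled tree node used only for this illustration.
class TreeNodeSketch {
    final String label;
    final List<TreeNodeSketch> children = new ArrayList<>();

    TreeNodeSketch(String label) {
        this.label = label;
    }

    // Adds a child and returns this node so that calls can be chained.
    TreeNodeSketch add(TreeNodeSketch child) {
        children.add(child);
        return this;
    }
}

class TreePrinterSketch {
    // Depth-first walk; each level is indented by two spaces.
    static String print(TreeNodeSketch root) {
        StringBuilder out = new StringBuilder();
        walk(root, 0, out);
        return out.toString();
    }

    private static void walk(TreeNodeSketch node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.label).append('\n');
        for (TreeNodeSketch child : node.children) {
            walk(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        TreeNodeSketch root = new TreeNodeSketch("Statement")
                .add(new TreeNodeSketch("SearchSpace")
                        .add(new TreeNodeSketch("RawDocument: Person p")))
                .add(new TreeNodeSketch("Conditions")
                        .add(new TreeNodeSketch("TerminalCondition: p.age > 30")));
        System.out.print(print(root));
    }
}
```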

The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit the same, and a person skilled in the art can modify the technical solution of the present invention or substitute the same without departing from the spirit and scope of the present invention, and the scope of the present invention should be determined by the claims.

Claims

1. An interactive query method suitable for a plurality of big data management systems comprises the following steps:

1) establishing an associated document model comprising a document set and an association set, wherein the association set is a set formed by the associations between documents;

2) converting different original data models into the associated document model, and connecting different data sources into a whole through the associated document model;

3) establishing, based on the associated document model, a unified query language suitable for multivariate data;

4) unified query of a relational database, a graph database and a file system is realized by using a unified query language suitable for multivariate data;

each document in the document set of the associated document model is a set formed by a group of attributes, and each attribute is a set formed by data of the same type; each document contains by default a primary key attribute, which is a globally unique identifier; the document set and the association set each have a name identifier that describes the semantics of the documents and associations in the set;

the unified query language suitable for the multivariate data management system comprises four clauses: FIND, WITH, WHERE and RETURN; the FIND statement determines the basic variables of the query, which must represent documents; the WITH statement determines the intermediate variables used in the conditional matching syntax; the WHERE statement determines the conditions that the returned results must satisfy; the RETURN statement contains the data references to be returned to the user;

the basic query space in the FIND statement consists of one or more types of documents, and the associated document model does not permit comparison between two types of documents that have no association; the WITH statement implicitly defines the expansion of the documents and associations in the basic query space; the WHERE statement can implicitly define documents and associations that expand the query space, and can also perform the selection operation on the intermediate associated document model; the RETURN statement comprises a URL at the document, link or attribute level, or a variable representing such a URL; this statement mainly performs the projection operation of the associated document model, and the returned result is an associated document;

different data sources are connected into a whole through the associated document model to form a network, and the data reference syntax of the unified query language takes a form similar to a URL (uniform resource locator) so that data in the network can be accessed uniformly; the URL is of the form:

wherein datasource denotes a data source, document denotes a document mapped from the data source into the associated document model, link denotes an association mapped from the data source into the associated document model, identity denotes the primary key of the document, and propertyName denotes the attribute name of the document.

2. The method of claim 1, wherein the execution of the unified query language is divided into four steps: determining documents, establishing relations among the documents, selecting and projecting.
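For illustration, the four execution steps can be traced on a small in-memory example. This is a minimal sketch under stated assumptions, not the execution engine of the invention: documents are modelled as plain maps, the "id" key stands in for the primary key attribute, the knows associations are stored as id pairs, and the sample persons are invented data. The logical query is "return the name of every Person who knows someone older than a given age".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class FourStepSketch {
    // Hypothetical Person documents; "id" stands in for the primary key attribute.
    static final List<Map<String, Object>> PERSONS = List.of(
            Map.of("id", 1, "name", "Ann", "age", 35),
            Map.of("id", 2, "name", "Bob", "age", 28),
            Map.of("id", 3, "name", "Cid", "age", 41));

    // Each pair {from, to} is one "knows" association between two documents.
    static final int[][] KNOWS = { {1, 2}, {2, 3} };

    static Map<String, Object> byId(int id) {
        for (Map<String, Object> doc : PERSONS) {
            if ((Integer) doc.get("id") == id) {
                return doc;
            }
        }
        throw new IllegalArgumentException("no document " + id);
    }

    // Names of persons who know someone older than minAge.
    static List<String> run(int minAge) {
        List<String> result = new ArrayList<>();
        for (Map<String, Object> p : PERSONS) {            // 1) determine the documents
            for (int[] edge : KNOWS) {                     // 2) establish relations among documents
                if (edge[0] != (Integer) p.get("id")) {
                    continue;
                }
                Map<String, Object> k = byId(edge[1]);
                if ((Integer) k.get("age") > minAge) {     // 3) selection
                    result.add((String) p.get("name"));    // 4) projection
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(30));
    }
}
```

With the sample data above, only Bob knows a person older than 30, so run(30) yields a single name.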

3. The method of claim 1, wherein intermediate variables in the unified query language represent document sets, values, strings associated with the basic search space, the intermediate variables being used in the matching syntax, and corresponding conditional matching operations being performed according to the type of the intermediate variables.

4. The method of claim 1, wherein the matching condition in the unified query language is an expression of Boolean type returned by the WHERE statement, and the syntax rules of the expression are as follows:

1) set-based screening of document sets A and B: (<document A>.link | <document A>) <document B>;

2) screening of a document set: (<document>.attribute | <association>.attribute) operator <basic data type>;

3) <expression> AND | OR <expression>.
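Rules 2) and 3) can be illustrated with Boolean predicates over a document. This is a minimal sketch under stated assumptions: the class ConditionSketch is hypothetical, a document is modelled as a map from attribute name to value, only integer literals are handled, and the operator set (>, <, =) is assumed for the example rather than taken from the claims. Rule 3) is mapped onto the standard Predicate.and and Predicate.or combinators.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

class ConditionSketch {
    // Rule 2): <document>.attribute operator <basic data type>, integers only here.
    static Predicate<Map<String, Object>> compare(String attribute, String op, int literal) {
        return doc -> {
            Object value = doc.get(attribute);
            if (!(value instanceof Integer)) {
                return false; // a missing or non-integer attribute never matches
            }
            int x = (Integer) value;
            switch (op) {
                case ">":
                    return x > literal;
                case "<":
                    return x < literal;
                case "=":
                    return x == literal;
                default:
                    throw new IllegalArgumentException("unsupported operator: " + op);
            }
        };
    }

    public static void main(String[] args) {
        // Rule 3): <expression> AND <expression>
        Predicate<Map<String, Object>> cond =
                compare("age", ">", 30).and(compare("age", "<", 40));

        Map<String, Object> person = new HashMap<>();
        person.put("age", 35);
        System.out.println(cond.test(person));
    }
}
```

Because each rule yields a Boolean-valued predicate, conditions built by rule 3) can themselves be combined again, which matches the recursive form of the expression grammar.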

5. The method of claim 1, wherein the parser of the unified query language comprises: overall grammar structure classes, syntax tree abstract classes and interfaces, and syntax tree concrete classes.

6. The method of claim 1, wherein, in practical applications, either an SDK for the database system is developed for the unified query language, with some compensation operations performed on top of the native database query language, so that a client program operates the database with the unified query language by calling an API in the SDK; or a communication protocol is designed for the database directly on the basis of the unified query language, and the client program obtains the required query result by sending a network request.