CN107818181A - Indexing method and system based on the Plcient interactive engine - Google Patents
Indexing method and system based on the Plcient interactive engine
- Publication number
- CN107818181A CN107818181A CN201711203695.3A CN201711203695A CN107818181A CN 107818181 A CN107818181 A CN 107818181A CN 201711203695 A CN201711203695 A CN 201711203695A CN 107818181 A CN107818181 A CN 107818181A
- Authority
- CN
- China
- Prior art keywords
- task
- hiveql
- query
- sentences
- plcient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to an indexing method and system based on the Plcient interactive engine. The method includes: obtaining a HiveQL statement; compiling the HiveQL statement with Plcient to obtain an execution task; submitting the execution task to a control node; the control node handing the execution task to an execution process engine, obtaining metadata information; submitting the metadata information to a task tracker or resource manager for execution; and reading the corresponding files in HDFS, obtaining and returning the execution result. The invention defines grammar rules with the open-source Antlr tool, which greatly simplifies the lexical and syntactic compilation and parsing process, while the staged design keeps the whole compiler code easy to maintain. Each logical operator completes only a single function, simplifying the whole MapReduce program. The result is enhanced timeliness of big-data retrieval, more flexible query modes, and higher execution efficiency.
Description
Technical field
The present invention relates to indexing methods, and more specifically to an indexing method and system based on the Plcient interactive engine.
Background technology
At present, data need the assistance of an index during query and processing, but the indexes of traditional relational databases suffer from the following problems. First, the index is stored on local hard disks, which makes it hard to manage; disaster recovery and high availability are costly to implement, and the migration cost of the index together with the capacity of a single machine's disks constrains the index's scale and size. If data disaster recovery is achieved through redundancy ("master/slave" or "dual write"), designing for data consistency is difficult; if a "bad block" problem occurs, a segment of data read at some moment contains abnormal byte values that the business system cannot detect in time, which may even corrupt the pointers of the whole index and make query results inaccurate. Second, the management of tables, indexes, and scheduling used to be mixed together: the scheduling system manages too many things — it must manage indexes, manage heartbeats, and also maintain disaster recovery — so the machine scale of the scheduling system cannot keep up, and assigning the same computing resources only to fixed index data wastes a great deal of computing power. Third, the hardware requirements are too high: data must stay resident in memory, otherwise they cannot be loaded and queried quickly. This typically demands large memory (128 GB or more) and SSD disks; data at the ten-billion-row scale may even need hundreds of machines to support fast queries, and for trillion-scale data the cost is prohibitive. Fourth, when computing with Spark, long-running jobs often fail because of code-quality problems; architecturally, because massive data are cached in RAM, slow Java garbage collection becomes a serious issue and makes Spark's performance unstable. Spark cannot handle data too large for a single machine: when the data go wrong or intermediate results exceed the size of RAM, insufficient memory or missing results are common; complex SQL statistics are not supported, and the degree of SQL support Spark currently integrates cannot be applied to complex data analysis. In terms of manageability, Spark's integration with YARN is imperfect, which plants hidden worries for production use, where various problems easily arise. Fifth, when using Hive, because the Hive architecture is built on the MapReduce framework, the flexibility of execution plans is poor and the optimizer has few choices it can make.
Therefore, it is necessary to design an indexing method based on the Plcient interactive engine that enhances the timeliness of big-data retrieval, makes query modes more flexible, and raises execution efficiency.
Summary of the invention
The object of the present invention is to overcome the defects of the prior art and to provide an indexing method and system based on the Plcient interactive engine.
To achieve the above object, the present invention adopts the following technical scheme: an indexing method based on the Plcient interactive engine, the method comprising:
obtaining a HiveQL statement;
compiling the HiveQL statement with Plcient to obtain an execution task;
submitting the execution task to a control node;
the control node handing the execution task to an execution process engine, obtaining metadata information;
submitting the metadata information to a task tracker or resource manager for execution;
reading the corresponding files in HDFS and operating on them, obtaining and returning the execution result.
In a further technical scheme, the step of obtaining a HiveQL statement comprises the following steps:
submitting a query task to the control node;
obtaining the query task;
obtaining the corresponding Hive metadata information from the metadata repository according to the query task, forming the HiveQL statement.
In a further technical scheme, the step of compiling the HiveQL statement with Plcient to obtain an execution task comprises the following steps:
converting the HiveQL statement into an abstract syntax tree;
converting the abstract syntax tree into query blocks;
converting the query blocks into a logical query plan and rewriting the logical query plan;
converting the logical plan into a physical plan, forming the execution task.
In a further technical scheme, the step of converting the HiveQL statement into an abstract syntax tree comprises the following steps:
defining the grammar rules of HiveQL statements with Antlr;
performing lexical and syntactic parsing of the HiveQL statement according to the grammar rules, forming the abstract syntax tree.
In a further technical scheme, the step of converting the abstract syntax tree into query blocks comprises the following steps:
traversing the abstract syntax tree in order;
obtaining the nodes with different token names and saving them into the corresponding attributes, forming the outer query block and the subquery blocks.
In a further technical scheme, the step of converting the query blocks into the logical query plan and rewriting the logical query plan comprises the following steps:
traversing the query blocks and translating them into an operator tree;
transforming the operator tree and merging operators;
traversing the operator tree and translating it into MapReduce tasks, forming the logical query plan and rewriting the logical query plan.
The present invention also provides an indexing system based on the Plcient interactive engine, comprising a statement acquiring unit, a compiling unit, a submitting unit, a handover unit, an execution unit, and an operation reading unit;
the statement acquiring unit, for obtaining a HiveQL statement;
the compiling unit, for compiling the HiveQL statement with Plcient to obtain an execution task;
the submitting unit, for submitting the execution task to a control node;
the handover unit, for the control node to hand the execution task to the execution process engine for execution, obtaining metadata information;
the execution unit, for submitting the metadata information to a task tracker or resource manager for execution;
the operation reading unit, for reading the corresponding files in HDFS and operating on them, obtaining and returning the execution result.
In a further technical scheme, the statement acquiring unit comprises a task submitting module, a task acquiring module, and an information acquiring module;
the task submitting module, for submitting a query task to the control node;
the task acquiring module, for obtaining the query task;
the information acquiring module, for obtaining the corresponding Hive metadata information from the metadata repository according to the query task, forming the HiveQL statement.
In a further technical scheme, the compiling unit comprises a statement converting module, an abstract-syntax converting module, a query-block converting module, and a physical converting module;
the statement converting module, for converting the HiveQL statement into an abstract syntax tree;
the abstract-syntax converting module, for converting the abstract syntax tree into query blocks;
the query-block converting module, for converting the query blocks into a logical query plan and rewriting the logical query plan;
the physical converting module, for converting the logical plan into a physical plan, forming the execution task.
In a further technical scheme, the statement converting module comprises a defining submodule and a parsing submodule;
the defining submodule, for defining the grammar rules of HiveQL statements with Antlr;
the parsing submodule, for performing lexical and syntactic parsing of HiveQL statements according to the grammar rules, forming the abstract syntax tree.
Compared with the prior art, the invention has the following advantages: the indexing method based on the Plcient interactive engine of the present invention obtains a HiveQL statement, compiles the statement with Plcient to form an execution task, executes the task, and obtains the execution result. Defining the grammar rules with the open-source Antlr tool greatly simplifies the lexical and syntactic compilation and parsing process; the staged design keeps the whole compiler code easy to maintain; each logical operator completes only a single function, simplifying the whole MapReduce program. The result is enhanced timeliness of big-data retrieval, more flexible query modes, and higher execution efficiency.
The invention will be further described below in conjunction with the accompanying drawings and specific embodiments.
Brief description of the drawings
Fig. 1 is a flow chart of the indexing method based on the Plcient interactive engine provided by a specific embodiment of the invention;
Fig. 2 is a flow chart of obtaining a HiveQL statement provided by a specific embodiment of the invention;
Fig. 3 is a flow chart of compiling a HiveQL statement with Plcient to obtain an execution task, provided by a specific embodiment of the invention;
Fig. 4 is a structural block diagram of the indexing system based on the Plcient interactive engine provided by a specific embodiment of the invention.
Embodiment
In order to better understand the technical content of the present invention, the technical scheme is further introduced and explained below with reference to specific embodiments, but the invention is not limited to them.
As shown in Figs. 1-4, the indexing method based on the Plcient interactive engine provided by this embodiment, when used in the process of data indexing, enhances the timeliness of big-data retrieval, makes query modes more flexible, and raises execution efficiency.
Plclient is analysis software developed by Prolong Cloud to meet users' needs for heuristic, ad-hoc analysis of big data. Plclient applies traditional database indexing technology to big data technology, breaking the deadlock of current big-data computing and evolving big-data retrieval toward stronger timeliness, more flexible query modes, and higher execution efficiency. Plclient is written in Java, is practical, and offers SQL interfaces, so users find it easy to pick up; at the same time it can meet the needs of high-end users with daily increments of tens of billions of rows and total data volumes in the trillions. Plclient's main technical advantage lies in massive indexing: the massive index accelerates retrieval, reduces the time spent on grouping, statistics, and sorting in queries, and saves resources by improving system performance and response time. The capability of the massive-indexing technology lets Plclient keep query response times within a few seconds and data-import latency within a few minutes even at such large data scales. Plclient's full name is Prolong Cloud Plclient; it is a real-time, multidimensional, interactive query, statistics, and analysis engine based on the Hadoop distributed architecture, delivering second-level performance at trillion-row scales with enterprise-grade reliability and stability. Plclient is a fine-grained index: data are imported immediately, the index is generated in real time, and the index efficiently locates the relevant data. Plclient integrates deeply with Spark, letting Spark compute directly on Plclient retrieval result sets, which can accelerate Spark performance a hundredfold in the same scenario. Its applicable objects are as follows:
First, users of traditional relational databases that can no longer hold more data and whose retrieval efficiency suffers badly;
Second, users currently doing full-text search with SOLR or ES who find the analytic functions provided by SOLR and ES too limited to implement complex business logic, or for whom SOLR and ES become unstable as data volume grows — shard loss and rebalancing enter a continuous vicious cycle, service cannot recover automatically, and operations staff frequently have to get up at midnight to restart the cluster;
Third, users whose business depends on the analysis of massive data but whose existing offline computing platforms cannot meet business needs in speed and response time;
Fourth, users who need multidimensional targeted analysis of user-profile and behavior data;
Fifth, users who need to retrieve from large amounts of UGC (User Generated Content) data;
Sixth, when fast, interactive queries over large data sets are needed;
Seventh, when data analysis beyond simple key-value storage is needed;
Eighth, when data must be analyzed in real time as they are produced.
As shown in Fig. 1, this embodiment provides an indexing method based on the Plcient interactive engine. The method comprises:
S1, obtaining a HiveQL statement;
S2, compiling the HiveQL statement with Plcient to obtain an execution task;
S3, submitting the execution task to a control node;
S4, the control node handing the execution task to the execution process engine, obtaining metadata information;
S5, submitting the metadata information to a task tracker or resource manager for execution;
S6, reading the corresponding files in HDFS and operating on them, obtaining and returning the execution result.
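Steps S1-S6 can be sketched end to end as a small pipeline. This is a hypothetical illustration only: all function names, the mock metastore, and the mock HDFS dictionary are invented for the sketch and are not the actual Plcient API.

```python
# Hypothetical sketch of the S1-S6 pipeline; names are illustrative.

def obtain_statement(query_task, metastore):
    """S1: build a HiveQL statement from the task and Hive metadata (S13)."""
    meta = metastore[query_task["table"]]
    return f"SELECT {', '.join(meta['columns'])} FROM {query_task['table']}"

def compile_statement(stmt):
    """S2: stand-in for the Plcient compilation, yielding an execution task."""
    return {"plan": "mapreduce", "sql": stmt}

def run_pipeline(query_task, metastore, hdfs):
    """S3-S6: submit to the control node, execute, read HDFS, return result."""
    stmt = obtain_statement(query_task, metastore)
    task = compile_statement(stmt)
    # S4/S5: the control node hands the task to the execution engine, which
    # here simply scans the table's file in the mock HDFS (S6).
    path = metastore[query_task["table"]]["hdfs_path"]
    return {"sql": task["sql"], "rows": hdfs[path]}

metastore = {"orders": {"columns": ["dealid", "uid"],
                        "hdfs_path": "/warehouse/orders"}}
hdfs = {"/warehouse/orders": [(1, "u1"), (2, "u2")]}
result = run_pipeline({"table": "orders"}, metastore, hdfs)
```

The sketch only shows the data flow between the six steps; the real compilation (S2) is the multi-stage process detailed below.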
In certain embodiments, the above step S1, obtaining a HiveQL statement, comprises the following steps:
S11, submitting a query task to the control node;
S12, obtaining the query task;
S13, obtaining the corresponding Hive metadata information from the metadata repository according to the query task, forming the HiveQL statement.
Specifically, the user submits a query or similar task to the control node; after the compiler obtains the user's query task, it fetches the needed Hive metadata information from the metadata repository according to the user's task.
Further, in certain embodiments, the above step S2, compiling the HiveQL statement with Plcient to obtain an execution task, comprises the following steps:
S21, converting the HiveQL statement into an abstract syntax tree;
S22, converting the abstract syntax tree into query blocks;
S23, converting the query blocks into a logical query plan and rewriting the logical query plan;
S24, converting the logical plan into a physical plan, forming the execution task.
Specifically, the control node first receives an SQL string, which the parser turns into an abstract syntax tree. This is accomplished with Antlr: Antlr converts the SQL into an abstract syntax tree according to the grammar file, and the abstract syntax tree then becomes query blocks. A query block is the simplest query unit; generally, one From clause generates one query block. Generating query blocks is a recursive process. The generated query blocks pass through the logical-query-plan stage and become an execution graph — a directed acyclic graph. This operator DAG passes through the logical optimizer, which adjusts the edges and nodes of the graph and revises their order, producing an optimized directed acyclic graph. These optimizations may include predicate pushdown, partition pruning, join reordering, and so on. Even after logical optimization, this directed acyclic graph still cannot be executed directly; hence the process of generating the physical execution plan. Hive's practice is to cut the plan wherever distribution is needed, generating one MapReduce job per cut — for example at the Group By part, the Join part, the Distribute By part, and the Distinct part. After so many cuts, the earlier logical execution plan — that logical directed acyclic graph — has been diced into many subgraphs, each forming a node. These nodes are linked into an execution plan graph, the Task Tree. The Task Tree is further optimized and adjusted, for example by choosing execution paths based on the input or adding backup jobs; this optimization is completed by the physical-plan transformation. After the physical-plan transformation, each node is a MapReduce job or a local job, and the plan can be executed.
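The cutting step described above — one MapReduce job per distribution point — can be shown with a toy sketch. This is not Hive's actual code; the operator names and the linear plan are simplified stand-ins for the DAG.

```python
# Illustrative sketch: cut a linear operator plan into MapReduce jobs at the
# operators that require data redistribution (Group By, Join, etc.).

DISTRIBUTION_OPS = {"GROUPBY", "JOIN", "DISTRIBUTEBY", "DISTINCT"}

def cut_into_jobs(operators):
    """Each cut before a distribution operator starts a new job."""
    jobs, current = [], []
    for op in operators:
        if op in DISTRIBUTION_OPS and current:
            jobs.append(current)      # close the previous job at the cut
            current = []
        current.append(op)
    if current:
        jobs.append(current)
    return jobs

plan = ["TABLESCAN", "FILTER", "JOIN", "SELECT", "GROUPBY", "SELECT"]
jobs = cut_into_jobs(plan)
# -> [["TABLESCAN", "FILTER"], ["JOIN", "SELECT"], ["GROUPBY", "SELECT"]]
```

Each resulting sublist corresponds to one node of the Task Tree, i.e. one MapReduce job.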
The Join referred to above tags the data of different tables in the map output value, and the reduce phase judges the data source by the tag. Group By combines the GroupBy fields as the map output key, uses the MapReduce sort, and in the reduce phase saves the last key seen in order to distinguish different keys. When there is only one distinct field — leaving aside the map-phase hash GroupBy — it suffices to output the combination of the GroupBy fields and the Distinct field as the map key, use the MapReduce sort, take the GroupBy fields as the reduce key, and save the last key in the reduce phase to complete deduplication. When there are multiple Distinct fields, for example the following SQL: select dealid, count(distinct uid), count(distinct date) from order group by dealid; deduplication is carried out in one of the following two ways:
First, still following the single-Distinct-field method above: this implementation cannot sort by uid and date separately, so it cannot deduplicate by saving the last key, and must instead deduplicate by hash values in memory during the reduce phase;
Second, numbering all the Distinct fields and generating n rows of data for each input row; rows with the same field then sort together, and the reduce phase only needs to record the last key to deduplicate. This implementation makes good use of the MapReduce sort and saves the memory otherwise consumed by reduce-phase deduplication.
It should be noted that when generating the reduce value, only the row of the first Distinct field needs to keep the value fields; the value fields of the remaining Distinct rows can be empty.
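The second scheme — numbering the distinct fields and emitting n rows per input row — can be sketched as follows. This is an illustrative simulation of the MapReduce sort-and-reduce behavior, not Hive's generated code; function names are invented.

```python
# Sketch of the row-expansion deduplication for multiple Distinct fields:
# key = (group fields, distinct-field number, distinct value), so the sort
# groups duplicates together and the reducer only remembers the last key.

def map_expand(rows, group_field, distinct_fields):
    """Map side: emit one record per (row, distinct field)."""
    out = []
    for row in rows:
        for i, field in enumerate(distinct_fields):
            out.append((row[group_field], i, row[field]))
    return out

def reduce_count_distinct(records, n_fields):
    """Reduce side: after sorting, a change of key marks a new distinct value."""
    counts, last = {}, None
    for key in sorted(records):
        group, field_no, _ = key
        if key != last:                       # duplicate keys sort adjacently
            counts.setdefault(group, [0] * n_fields)[field_no] += 1
        last = key
    return counts

rows = [
    {"dealid": 1, "uid": "a", "date": "d1"},
    {"dealid": 1, "uid": "a", "date": "d2"},
    {"dealid": 1, "uid": "b", "date": "d1"},
]
# count(distinct uid), count(distinct date) grouped by dealid
result = reduce_count_distinct(map_expand(rows, "dealid", ["uid", "date"]), 2)
# -> {1: [2, 2]}
```

No per-group hash table is needed in the reducer, which is exactly the memory saving the text describes.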
Further, in certain embodiments, the above step S21, converting the HiveQL statement into an abstract syntax tree, comprises the following steps:
S211, defining the grammar rules of HiveQL statements with Antlr;
S212, performing lexical and syntactic parsing of the HiveQL statement according to the grammar rules, forming the abstract syntax tree.
For the above step S211: before Hive version 0.10, the grammar rules were defined in a single file, Hive.g. As the grammar rules grew more and more complex, the Java parser classes generated from them could exceed the maximum size of a Java class file, so version 0.11 split Hive.g into five files: the lexical rules HiveLexer.g and the grammar rules SelectClauseParser.g, FromClauseParser.g, IdentifiersParser.g, and HiveParser.g.
The following grammar fragment is the rule for SelectStatement in Hive SQL, from which it can be seen that a SelectStatement contains the select, from, where, groupby, having, and orderby clauses. The code by which Antlr parses Hive SQL is as follows:
HiveLexerX and HiveParser are the lexical-analysis and syntax-analysis classes automatically generated when Antlr compiles the grammar file Hive.g; the complex parsing is carried out in these two classes. An inner subquery also generates a TOK_DESTINATION node. This node is added specially during grammar rewriting, because in Hive all query data are stored in temporary HDFS files: whether for an intermediate subquery or the final query result, an Insert statement eventually writes the data into the HDFS directory where the table resides.
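The Antlr-generated HiveLexerX/HiveParser classes are not reproduced in this text. The following toy stand-in (invented for illustration — it is not Antlr and its node layout is not Hive's) shows the same idea: turning a SELECT ... FROM ... WHERE ... string into a small abstract syntax tree whose node labels follow the Hive token-name style.

```python
# Toy parser producing a Hive-style AST; illustrative only.
import re

def parse_select(sql):
    m = re.match(r"select\s+(?P<sel>.+?)\s+from\s+(?P<tab>\w+)"
                 r"(?:\s+where\s+(?P<whr>.+))?$", sql.strip(), re.I)
    if not m:
        raise ValueError("not a recognized SELECT statement")
    ast = ("TOK_QUERY",
           ("TOK_FROM", ("TOK_TABREF", m.group("tab"))),
           ("TOK_INSERT",
            ("TOK_SELECT", tuple(c.strip() for c in m.group("sel").split(",")))))
    if m.group("whr"):
        ast += (("TOK_WHERE", m.group("whr")),)  # where clause becomes a child
    return ast

tree = parse_select("SELECT dealid, uid FROM order WHERE dealid > 0")
```

A real Antlr grammar would generate this tree from declarative rules rather than a regular expression; the sketch only shows the shape of the output.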
For the above step S212, lexical analysis is carried out based on the SQL lexical analyzer, specifically as follows:
The source program text is input. In many cases, to identify word symbols better, the input string is preprocessed: preprocessing mainly filters whitespace and skips comments, newlines, and the like. During lexical analysis, determining the word class sometimes requires scanning several characters ahead. In a formula-oriented language such as FORTRAN, keywords are not reserved words and may be used as identifiers, and spaces carry no meaning; to determine the word class, several characters must be pre-scanned. Consider the following FORTRAN statements:
1 DO99K=1,10;
2 IF (5.EQ.M) I=10;
3 DO99K=1.10;
4 IF (5)=55;
Statements 1 and 2 are a DO statement and an IF statement respectively, while statements 3 and 4 are assignment statements. To distinguish statements 1 and 3, and 2 and 4, correctly, several characters must be pre-scanned. The difference between statements 1 and 3 lies in the first delimiter after the equal sign: one is a comma, the other the end-of-statement symbol. The main difference between statements 2 and 4 is the first character after the right parenthesis: one is a letter, the other an equal sign. To recognize the keywords in 1 and 2, multiple characters must be pre-scanned — ahead until the point where the word class can be affirmed. To distinguish 1 from 3, pre-scanning must reach the first delimiter after the equal sign; for statements 2 and 4, pre-scanning must reach the first character after the right parenthesis matching the left parenthesis that follows IF.
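The pre-scanning (lookahead) idea for the DO case can be sketched as follows. This is a deliberately simplified classifier invented for illustration — real FORTRAN lexers handle far more cases.

```python
# Sketch of lookahead: "DO99K=1,10" is a DO loop, "DO99K=1.10" an assignment.
# The scanner cannot decide at "DO"; it must look ahead to the first
# delimiter after the "=".

def classify(stmt):
    """Return 'DO' for a DO-loop header, 'ASSIGN' otherwise (simplified;
    the IF case described in the text is analogous)."""
    s = stmt.replace(" ", "")             # FORTRAN spaces are insignificant
    if s.upper().startswith("DO") and "=" in s:
        after_eq = s.split("=", 1)[1]
        # lookahead: a comma after the "=" means a real DO loop
        return "DO" if "," in after_eq else "ASSIGN"
    return "ASSIGN"
```

`classify("DO99K=1,10")` yields `"DO"` while `classify("DO99K=1.10")` yields `"ASSIGN"`, exactly the distinction statements 1 and 3 require.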
A state transition diagram is used to recognize word symbols; a state transition diagram is a finite directed graph with one initial state and at least one final state. Here state 0 is the initial state and state 2 is a final state. The process by which this diagram recognizes (accepts) an identifier is: starting from initial state 0, if the input character in state 0 is a letter, it is read in and the diagram moves to state 1. In state 1, if the next input character is a letter or digit, it is read in and state 1 is re-entered. This process repeats until state 1 finds that the input character is no longer a letter or digit (that character has by then already been read in), whereupon state 2 is entered. State 2 is a final state, which means that an identifier has been recognized and the recognition process ends. The asterisk on the final state means that one character too many has been read — a character that does not belong to the identifier — and it should be returned to the input. If the input character in state 0 is not a letter, no identifier is recognized; in other words, the recognition fails.
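The transition diagram just described can be transcribed directly into code. The sketch below is illustrative (function name and return convention are invented); it implements exactly the 0 → 1 → 2 automaton, including the over-read character that is "returned to the input" by reporting the position it should be re-read from.

```python
# Direct transcription of the identifier transition diagram: state 0 initial,
# a letter moves to state 1, letters/digits loop in state 1, anything else
# enters final state 2 with one character over-read.

def recognize_identifier(text, pos=0):
    """Return (identifier, next_pos) on success, or (None, pos) on failure."""
    state, start = 0, pos
    while True:
        ch = text[pos] if pos < len(text) else ""
        if state == 0:
            if ch.isalpha():
                state, pos = 1, pos + 1       # 0 --letter--> 1
            else:
                return None, start            # no identifier recognized
        elif state == 1:
            if ch.isalnum():
                pos += 1                      # 1 --letter/digit--> 1
            else:
                # enter final state 2; ch is the over-read character,
                # "returned" by leaving next_pos pointing at it
                return text[start:pos], pos
```

For example, `recognize_identifier("abc1 = 5")` recognizes `"abc1"` and leaves the scanner positioned at the space that was over-read.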
A regular expression is an important notation for describing words and a tool for defining regular sets. In lexical analysis, regular expressions are used to describe the forms that identifiers may take. Definition (regular expressions and the regular sets they denote): let the alphabet be Σ. ε and ∅ are regular expressions over Σ, and the regular sets they denote are {ε} and {} respectively; for any a ∈ Σ, a is a regular expression over Σ, and the regular set it denotes is {a}; supposing U and V are regular expressions over Σ denoting the regular sets L(U) and L(V) respectively, then (U), U|V, UV, and U* are also regular expressions, and the regular sets they denote are L(U), L(U)∪L(V), L(U)L(V), and (L(U))* respectively. Only the expressions obtained by finitely many applications of the above three rules are regular expressions over Σ, and only the word sets denoted by these regular expressions are regular sets over Σ. The operator "|" of a regular expression is read "or"; "·" is read "concatenation"; "*" is read "closure" (that is, any finite number of self-concatenations). Where no confusion arises, parentheses may be omitted, with operator precedence ordered as "(", ")", "*", "·", "|". The concatenation operator "·" is usually omitted. "*", "·", and "|" are all left-associative.
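The three operators defined above — union "|", concatenation, and closure "*" — can be demonstrated with Python's `re` module, whose syntax for these three operators matches the definition:

```python
# Union, concatenation, and closure, as defined above, via Python's re module.
import re

assert re.fullmatch(r"ab|cd", "cd")          # L(U|V) = L(U) ∪ L(V)
assert re.fullmatch(r"(ab)(cd)", "abcd")     # L(UV)  = L(U)L(V)
assert re.fullmatch(r"(ab)*", "ababab")      # L(U*)  = (L(U))*
assert re.fullmatch(r"(ab)*", "")            # zero repetitions: the empty word

# Identifiers as in the transition diagram: letter (letter | digit)*
ident = re.compile(r"[A-Za-z][A-Za-z0-9]*")
assert ident.fullmatch("K99")
assert not ident.fullmatch("9K")
```

Note how the identifier pattern `letter (letter|digit)*` is the regular expression corresponding exactly to the 0 → 1 → 2 state transition diagram.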
Further, in certain embodiments, the above step S22, converting the abstract syntax tree into query blocks, comprises the following steps:
S221, traversing the abstract syntax tree in order;
S222, obtaining the nodes with different token names and saving them into the corresponding attributes, forming the outer query block and the subquery blocks.
A query block is the most basic component unit of an SQL query, consisting of three parts: input source, computation process, and output. Simply put, a query block is one subquery.
QB#aliasToSubq (the aliasToSubq attribute of the QB class) stores the query-block objects of subqueries; the keys of aliasToSubq are the subquery aliases.
QB#qbp, i.e. QBParseInfo, stores the abstract-syntax-tree structure of each operation part within one basic SQL unit. The HashMap QBParseInfo#nameToDest stores the outputs of the query unit; its keys have the form inclause-i (since Hive supports Multi Insert statements, there may be multiple outputs), and the values are the corresponding ASTNode nodes, i.e. TOK_DESTINATION nodes. The remaining HashMap attributes of the QBParseInfo class store the correspondence between each output and the ASTNode nodes of each operation.
QBParseInfo#joinExpr stores the TOK_JOIN node.
QB#qbJoinTree is the structured form of the Join syntax tree.
QB#qbm stores the meta-information of each input table, such as the table's path on HDFS and the storage format of the table data.
The QBExpr object represents a Union operation.
The process of generating query blocks from the abstract syntax tree is recursive: the abstract syntax tree is traversed in preorder, and nodes with different token names are saved into the corresponding attributes. It mainly comprises the following steps:
TOK_QUERY => create the query-block object and recurse into the child nodes;
TOK_FROM => save the table-name syntax part into the query-block object;
TOK_INSERT => recurse into the child nodes;
TOK_DESTINATION => save the syntax part of the output target into the nameToDest attribute of the QBParseInfo object;
TOK_SELECT => save the syntax parts of the query expressions into destToAggregationExprs and related attributes;
TOK_WHERE => save the syntax of the Where part into the destToWhereExpr attribute of the QBParseInfo object.
Through this process, the final sample SQL generates two query-block objects.
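The recursive preorder dispatch just listed can be sketched with a toy walker. The node layout (nested tuples) and the QB dictionary are invented for illustration; only the token names follow the text.

```python
# Sketch of the preorder AST walk: token names such as TOK_FROM and
# TOK_WHERE are dispatched into the attributes of a toy query-block object.

def gen_query_block(node, qb=None):
    qb = qb if qb is not None else {"tables": [], "dests": [], "where": None}
    name, children = node[0], node[1:]
    if name == "TOK_FROM":
        qb["tables"].append(children[0])      # table-name syntax part
    elif name == "TOK_DESTINATION":
        qb["dests"].append(children[0])       # output target -> nameToDest
    elif name == "TOK_WHERE":
        qb["where"] = children[0]             # -> destToWhereExpr
    for child in children:                    # recurse into child nodes
        if isinstance(child, tuple):
            gen_query_block(child, qb)
    return qb

ast = ("TOK_QUERY",
       ("TOK_FROM", "orders"),
       ("TOK_INSERT",
        ("TOK_DESTINATION", "inclause-0"),
        ("TOK_WHERE", "dealid > 0")))
qb = gen_query_block(ast)
```

A subquery under TOK_QUERY would simply trigger the same function recursively, producing a nested query-block object, mirroring QB#aliasToSubq.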
Further, the above step S23, converting the query blocks into the logical query plan and rewriting the logical query plan, comprises the following steps:
S231, traversing the query blocks and translating them into an operator tree;
S232, transforming the operator tree and merging operators;
S233, traversing the operator tree and translating it into MapReduce tasks, forming the logical query plan and rewriting the logical query plan.
In the above steps S231 to S233, the operator tree is composed of a Map phase and a Reduce phase. Several logical operators are arranged in the operator tree, each completing one single, specific operation in the Map phase or the Reduce phase. The basic operators include TableScanOperator, SelectOperator, FilterOperator, JoinOperator, GroupByOperator, and ReduceSinkOperator. TableScanOperator reads the table's data from the Map interface of the MapReduce framework, controls the number of rows scanned, and marks the data as coming from the source table; JoinOperator completes Join operations; FilterOperator completes filtering; ReduceSinkOperator serializes the sorted field combination at the Map end into the key and partition values — it can only appear in the Map phase, and it also marks the end of the Map phase in the MapReduce program generated by Hive.
The data transfer of the logical operators between the Map and Reduce phases is a streaming process: after each logical operator finishes its operation on a row of data, it passes the data on to its child logical operators for further computation.
The basic attributes and methods of a logical operator are as follows:
rowSchema describes the output fields of the operator;
inputObjInspector and outputObjInspector parse the input and output fields;
processOp receives a row from the parent logical operator, and forward passes the processed row on to the child logical operators;
Hive may renumber the fields of each row after a logical operator has processed it; colExprMap records, for each expression, the correspondence between its names before and after the current logical operator, which is used to trace field names back during the subsequent logical optimization phase.
Because a Hive MapReduce program is a generic, data-driven program (it is not known in advance whether a given MapReduce task will perform a Join or a GroupBy), the parameters required by every operation are kept in the logical operator itself; the logical operators are serialized to HDFS when the task is submitted, and are read back from HDFS and deserialized before the MapReduce task executes. The operator tree of the Map stage is stored on HDFS at Job.getConf("hive.exec.plan") + "/map.xml".
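The streaming operator protocol described above can be sketched in Python (a minimal illustration only; Hive's operators are Java classes, and the class and method names here merely mirror the attributes named in the text):

```python
class Operator:
    def __init__(self, row_schema, children=()):
        self.row_schema = row_schema      # output fields of this operator
        self.children = list(children)    # child logical operators

    def process_op(self, row):
        raise NotImplementedError

    def forward(self, row):
        # pass the processed row to every child operator (streaming)
        for child in self.children:
            child.process_op(row)

class TableScanOperator(Operator):
    def process_op(self, row):
        self.forward(row)                 # rows enter the tree unchanged

class FilterOperator(Operator):
    def __init__(self, row_schema, predicate, children=()):
        super().__init__(row_schema, children)
        self.predicate = predicate

    def process_op(self, row):
        if self.predicate(row):           # drop rows failing the predicate
            self.forward(row)

class CollectOperator(Operator):
    def __init__(self):
        super().__init__(row_schema=None)
        self.rows = []

    def process_op(self, row):
        self.rows.append(row)

sink = CollectOperator()
scan = TableScanOperator(["uid", "age"],
                         [FilterOperator(["uid", "age"],
                                         lambda r: r["age"] > 30, [sink])])
for r in [{"uid": 1, "age": 25}, {"uid": 2, "age": 40}]:
    scan.process_op(r)
print(sink.rows)  # rows surviving the filter
```

Each row flows through the tree one at a time, which is why no operator ever needs to hold the whole input in memory.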
In the above step S23, the attributes in which the generated query block and the QBParseInfo object store the grammar are traversed one by one, comprising the following mappings:
QB#aliasToSubq=>there is a subquery; recurse;
QB#aliasToTabs=>TableScanOperator;
QBParseInfo#joinExpr=>QBJoinTree=>ReduceSinkOperator+JoinOperator;
QBParseInfo#destToWhereExpr=>FilterOperator;
QBParseInfo#destToGroupby=>ReduceSinkOperator+GroupByOperator;
QBParseInfo#destToOrderby=>ReduceSinkOperator+ExtractOperator.
Since Join, GroupBy and OrderBy all have to be completed in the Reduce stage, a ReduceSinkOperator is always generated before the logical operator of the corresponding operation, combining and serializing the fields into the sort key or partition key.
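The attribute-to-operator mapping above can be written as a small dispatch table (an illustration only; the attribute names come from the text, the table itself is not Hive's API):

```python
# which logical operators each saved query-block attribute gives rise to
QB_ATTR_TO_OPERATORS = {
    "QB#aliasToSubq":              ["<recurse into the subquery>"],
    "QB#aliasToTabs":              ["TableScanOperator"],
    "QBParseInfo#joinExpr":        ["ReduceSinkOperator", "JoinOperator"],
    "QBParseInfo#destToWhereExpr": ["FilterOperator"],
    "QBParseInfo#destToGroupby":   ["ReduceSinkOperator", "GroupByOperator"],
    "QBParseInfo#destToOrderby":   ["ReduceSinkOperator", "ExtractOperator"],
}

def operators_for(attr):
    # every Reduce-side operation (Join, GroupBy, OrderBy) is preceded by a
    # ReduceSinkOperator, which serializes the fields into sort/partition keys
    return QB_ATTR_TO_OPERATORS.get(attr, [])
```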
Specifically, for the above step S231, the subquery block is processed first: QBJoinTree is traversed in preorder; the class QBJoinTree keeps the ASTNode of the left and right tables and the alias of this query. Preorder traversal of detail.usersequence_client generates the Join operator tree between the intermediate table and dim.user; a FilterOperator is generated from query block 2, at which point the traversal of query block 2 is complete. In some scenarios, the SelectOperator decides from certain conditions whether the fields need to be parsed. A ReduceSinkOperator+GroupByOperator is generated from the QBParseInfo#destToGroupby of query block 1, and after parsing is finished a FileSinkOperator is generated, which writes the data to HDFS.
In the above step S232, the operator tree is transformed by the logical-layer optimizer. By traversing the operator tree, the sort keys and partition keys output by two successive ReduceSinkOperators (RS) can be found. The optimizer checks that: 1. the sort keys output by the parent RS completely contain the sort keys of the child RS, in a consistent order; and 2. the partition keys of the parent RS completely contain the partition keys of the child RS. If these optimization conditions are met, the execution plan can be optimized: the child RS, and the logical operators between the parent RS and the child RS, are deleted; the retained RS uses the sort keys as its key and value fields and the partition keys as its key fields.
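The two merge conditions above can be sketched as a single predicate (a simplified illustration under the stated conditions; the parameter names are hypothetical, not Hive's API):

```python
def can_merge(parent_sort_keys, child_sort_keys,
              parent_part_keys, child_part_keys):
    # condition 1: the parent RS's sort keys completely contain the child's,
    # in a consistent order, i.e. the child's keys form a prefix
    prefix_ok = parent_sort_keys[:len(child_sort_keys)] == child_sort_keys
    # condition 2: the parent RS's partition keys completely contain the child's
    partition_ok = set(child_part_keys) <= set(parent_part_keys)
    return prefix_ok and partition_ok
```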
For the above step S233, the following steps are specifically included:
S2331, generating the output table, and moving the finally generated HDFS temporary files under the target table directory;
S2332, depth-first traversal downwards from one of the root nodes of the operator tree;
S2333, marking the Map/Reduce boundaries, i.e. the boundaries between tasks;
S2334, traversing the other root nodes, and merging MapReduce tasks when a Join logical operator is encountered;
S2335, generating a signal task to update the metadata;
S2336, cutting the links between the logical operators of the Map and Reduce stages.
For the above step S2332, specifically, all root nodes of the operator tree are stored in a toWalk array, and the elements of the array are taken out in a loop. The last element TS[p] is taken out and pushed onto the stack opStack {TS[p]}. When the elements in the stack satisfy a configured rule, such as "".join([t+"%" for t in opStack]) == "TS%", a MapReduceTask [Stage-1] object is generated. The traversal continues with the child logical operators of TS[p], which are pushed onto opStack. After the first RS is pushed, i.e. when the stack is opStack = {TS[p], FIL[18], RS[4]}, a rule such as "".join([t+"%" for t in opStack]) == "TS%.*RS%" is satisfied, and this is saved in the task resolution attribute. The traversal continues with the child logical operators of JOIN[5], which are pushed onto opStack. When the second RS is pushed, i.e. when the stack satisfies "".join([t+"%" for t in opStack]) == "RS%.*RS%" while looping over each suffix of opStack, a new JOIN[5] is created, JOIN[5] generates a child logical operator RS[6], and the TS[20] referenced by the MapReduceTask [Stage-2] object is generated. The traversal continues with the child operators of RS[6], which are pushed onto opStack. Finally, after all child logical operators have been pushed, when the rule "".join([t+"%" for t in opStack]) == "FS%" is satisfied, MapReduceTask [Stage-3] is linked up and a merge stage is generated; the opStack stack is emptied and the second element of toWalk is pushed onto the stack. When opStack = {TS[du], RS[7]}, rule R2 "TS%.*RS%" is satisfied, and MapReduceTask [Stage-2] is then found in the Map<logical operator, MapReduceWork> object of MapReduceTask [Stage-5].
For the above step S2333, the operator trees of the Map tasks and the Reduce tasks are cut apart at the RS boundaries, and the operator tree generates the overall picture of the MapReduce tasks; in this embodiment, three MapReduce tasks are generated in total.
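The rule tests quoted in step S2332 ("".join([t+"%" for t in opStack]) == "TS%" and so on) amount to matching a "%"-joined signature of the operator stack against a regular expression. A minimal sketch (simplified and hypothetical helper names; real Hive also matches rules against stack suffixes):

```python
import re

def stack_signature(op_stack):
    # operator names on the walk stack are joined with "%" to form a signature
    return "".join(t + "%" for t in op_stack)

def matches(rule, op_stack):
    # a rule such as "TS%.*RS%" fires when the signature matches it from the start
    return re.match(rule, stack_signature(op_stack)) is not None
```

For example, pushing TS, then FIL, then RS produces the signature "TS%FIL%RS%", which satisfies the rule "TS%.*RS%" and triggers the creation of a MapReduceTask object.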
Further, in the above step S24, the logical plan is converted into a physical plan by the physical-layer optimizer to form the execution task; in short, the small table is read into memory in the Map stage, and the big table is scanned sequentially to complete the Join.
The conversion of the logical plan by the physical-layer optimizer is divided into two stages. First, a local MapReduce task reads the small table into memory and generates a hash table, which is uploaded to the distributed cache; this process needs to compress the hash table. Then the MapReduce task reads the hash table from the distributed cache in the Map stage, scans the big table sequentially, joins the rows directly in memory in the Map stage, and passes the data on to the next MapReduce task.
If one of the two tables of the Join is a temporary table, a ConditionalTask is generated, and whether to use MapJoin is decided at runtime; in that case the optimizer needs to convert the common Join into a MapJoin. The conversion process is as follows:
traverse the task tree depth-first;
find the Join logical operators and judge the data volumes of the left and right tables;
for small table + big table=>MapJoinTask; for small/big table + intermediate table=>ConditionalTask. The MapReduce tasks generated in the previous stage are traversed; one table in JOIN[8] is found to be a temporary table, so Stage-2 is first deep-copied (because the original execution plan needs to be retained as a backup task, the execution plan is copied); a MapJoin logical operator is generated to replace the Join logical operator, and then a local MapReduce task is generated to read the small table, generate the hash table and upload it to the distributed cache.
During this process the optimizer also needs to traverse the task tree and split every local MapReduce task into two tasks.
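The decision between MapJoinTask, ConditionalTask and a common join can be sketched as follows (an illustration under the assumptions stated in the text; the function and constant names are hypothetical, not Hive's API):

```python
def plan_join(left_size, right_size, threshold, is_temp):
    """Pick the join strategy for one Join operator."""
    if is_temp:
        # an intermediate (temporary) table has no size known at compile
        # time, so the decision is deferred to runtime via a ConditionalTask
        return "ConditionalTask"
    small = min(left_size, right_size)
    if small <= threshold:
        return "MapJoinTask"      # small table fits memory: broadcast it
    return "CommonJoinTask"       # fall back to a shuffle (reduce-side) join
```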
In the above step S2, the syntax rules are defined with the Antlr open-source software, which greatly simplifies the compilation and parsing of the lexicon and grammar: only one grammar file needs to be maintained. The staged design keeps the code of the whole compilation process easy to maintain, so that subsequent optimizers can be switched in a pluggable manner; for example, the newest Hive 0.13 features, vectorization and support for the Tez engine, are both pluggable. Each logical operator completes only a single function, which simplifies the whole MapReduce program.
With the above indexing method based on the Plcient interactive engine, a HiveQL statement is obtained, the statement is compiled by Plcient to form an execution task, the task is executed, and the execution result is obtained. The syntax rules are defined with the Antlr open-source software, which greatly simplifies the compilation and parsing of the lexicon and grammar; the staged design keeps the code of the whole compilation process easy to maintain; each logical operator completes only a single function, simplifying the whole MapReduce program. This enhances the timeliness of big-data retrieval, makes the query mode more flexible and the execution more efficient.
As shown in Figure 4, this embodiment further provides an indexing system based on the Plcient interactive engine, which comprises a statement acquiring unit 1, a compilation unit 2, a submission unit 3, a handover unit 4, an execution unit 5 and an operation reading unit 6.
The statement acquiring unit 1 is used to obtain HiveQL statements.
The compilation unit 2 is used to carry out Plcient compilation on the HiveQL statements and obtain the execution task.
The submission unit 3 is used to submit the execution task to the control node.
The handover unit 4 is used by the control node to hand the execution task over to the executive process engine for execution and obtain the metadata information.
The execution unit 5 is used to submit the metadata information to the task tracker or resource manager for execution.
The operation reading unit 6 is used to read the files in HDFS, perform the corresponding operations, and obtain and return the execution result.
In certain embodiments, the above statement acquiring unit 1 comprises a task submission module, a task acquisition module and an information acquisition module.
The task submission module is used to submit query tasks to the control node.
The task acquisition module is used to obtain the query tasks.
The information acquisition module is used to obtain the corresponding Hive metadata information from the metadata repository according to the query task and form the HiveQL statement.
Specifically, the user submits a task such as a query to the control node; after the compiler obtains the user's query task, it fetches the required Hive metadata information from the metadata repository according to the user task.
Further, in certain embodiments, the above compilation unit 2 comprises a statement converter module, an abstract conversion module, a query block conversion module and a physical conversion module.
The statement converter module is used to convert the HiveQL statements into an abstract syntax tree.
The abstract conversion module is used to convert the abstract syntax tree into query blocks.
The query block conversion module is used to convert the query blocks into a logical query plan and rewrite the logical query plan.
The physical conversion module is used to convert the logical plan into a physical plan and form the execution task.
Specifically, the control node first receives an SQL character string, which is turned into an abstract syntax tree by the parser. This is done by Antlr: Antlr turns the SQL into an abstract syntax tree according to the grammar file, and the abstract syntax tree is in turn turned into query blocks. In the simplest case, one From clause generally generates one query block. Generating query blocks is a recursive procedure; the generated query blocks pass through the logical-query-plan process and become an execution graph, a directed acyclic graph of operators. This operator DAG passes through the logical optimizer, which adjusts the edges and nodes of the graph and revises their order, yielding an optimized directed acyclic graph. These optimizations may include predicate pushdown, partition pruning, join reordering and so on. After the logical optimization, this directed acyclic graph still has to be made executable, hence the process of generating the physical execution plan. The practice in Hive is that wherever a distribution (shuffle) is needed, a cut is made, producing one MapReduce job: for example at the GroupBy part, the Join part, the Distribute By part and the Distinct part. After so many cuts, the logical execution plan, that logical directed acyclic graph, has been diced into many subgraphs, each of which forms a node. These nodes are linked again into an execution plan graph, i.e. the task tree. The task tree is further optimized and adjusted, for example by selecting the execution path according to the input or adding backup jobs; this optimization is completed by the physical plan conversion. After the physical plan conversion, each node is a MapReduce job or a local job, and can be executed.
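The pipeline described in the paragraph above can be summarized as a function chain. A hedged sketch: every stage below is an illustrative stub that merely tags its input so the flow is visible; none of these names is Hive's real API.

```python
# hypothetical stage stubs: each one tags its input with the stage it produced
def parse(sql):                return ("AST", sql)        # Antlr: SQL -> AST
def gen_query_block(ast):      return ("QB", ast)         # AST -> query blocks
def gen_logical_plan(qb):      return ("OP_DAG", qb)      # blocks -> operator DAG
def optimize_logical(dag):     return ("OPT_DAG", dag)    # pushdown, pruning, ...
def gen_physical_plan(dag):    return ("TASK_TREE", dag)  # cut at shuffles
def optimize_physical(tasks):  return ("OPT_TASKS", tasks)  # map join, backups

def compile_hiveql(sql):
    # parser -> query block -> logical plan -> logical optimizer
    # -> physical plan -> physical optimizer, as described above
    result = parse(sql)
    for stage in (gen_query_block, gen_logical_plan, optimize_logical,
                  gen_physical_plan, optimize_physical):
        result = stage(result)
    return result
```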
The Join referred to above means that the data of the different tables are tagged in the map output value, and the data source is judged from the tag in the reduce stage. For GroupBy, the combination of the GroupBy fields is used as the map output key, the MapReduce sort is exploited, and the reduce stage keeps the list of keys and distinguishes the different keys. When there is only one Distinct field, and disregarding the hash GroupBy of the Map stage, it suffices to output the combination of the GroupBy field and the Distinct field as the map key, exploit the MapReduce sort, and use the GroupBy field as the reduce key; de-duplication can then be completed by keeping the list of keys in the reduce stage. If there are multiple Distinct fields, as in the following SQL: select dealid, count(distinct uid), count(distinct date) from order group by dealid; then de-duplication is carried out in one of the following two ways:
First, still following the method for a single Distinct field above: this implementation cannot sort by uid and date separately, and thus cannot de-duplicate through the list of keys; in the reduce stage it still needs to de-duplicate through hash values in memory.
Second, the Distinct fields can be numbered, and each row of data generates n rows of data; rows with the same field are then sorted together, and only the list of keys needs to be recorded in the reduce stage to de-duplicate. This implementation makes good use of the MapReduce sort and saves the memory consumed by de-duplication in the reduce stage.
It should be noted that when the reduce values are generated, only the row of the first Distinct field needs to retain the key value; the value fields of the remaining Distinct data rows can be empty.
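The second strategy (numbering the Distinct fields and expanding each input row into n rows) can be sketched as follows. The field names reuse the sample SQL above; the in-memory sort stands in for the MapReduce shuffle, so this is an illustration of the idea rather than a distributed implementation:

```python
from itertools import groupby

rows = [
    {"dealid": 1, "uid": "u1", "date": "d1"},
    {"dealid": 1, "uid": "u1", "date": "d2"},
    {"dealid": 1, "uid": "u2", "date": "d2"},
]
distinct_fields = ["uid", "date"]

# map phase: each input row emits one record per Distinct field,
# keyed by (group-by value, field number, field value) for the sort
expanded = sorted(
    (row["dealid"], i, row[f])
    for row in rows
    for i, f in enumerate(distinct_fields)
)

# reduce phase: records arrive sorted, so consecutive duplicates collapse
# into one group and no in-memory hash table is needed for de-duplication
counts = {}
for key, _ in groupby(expanded):
    dealid, i, _value = key
    field = distinct_fields[i]
    counts[(dealid, field)] = counts.get((dealid, field), 0) + 1

print(counts)  # distinct counts per (dealid, field)
```

Because the sort already brings equal (dealid, field, value) records together, the reduce side only has to compare each record with the previous one, which is exactly the memory saving claimed in the text.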
Further, in certain embodiments, the above statement converter module comprises a definition submodule and a parsing submodule.
The definition submodule is used to define the syntax rules of the HiveQL statements using Antlr.
The parsing submodule is used to carry out lexical and syntax parsing on the HiveQL statements according to the syntax rules and form the abstract syntax tree.
In addition, in certain embodiments, the above abstract conversion module further comprises a first traversal module and a node acquisition module.
The first traversal module is used to traverse the abstract syntax tree in order.
The node acquisition module is used to obtain the different token nodes, save them into the corresponding attributes, and form the outer query block and the subquery blocks.
A query block is the most basic component unit of an SQL statement and comprises three parts: the input source, the computation process and the output. Simply put, a query block is a subquery.
QB#aliasToSubq (denoting the aliasToSubq attribute of the query block class) keeps the query block objects of the subqueries; the key of aliasToSubq is the alias of the subquery.
QB#qbp, i.e. QBParseInfo, keeps the abstract-syntax-tree structure of one operation part in a basic SQL unit. The HashMap QBParseInfo#nameToDest keeps the output of the query unit; the form of the key is insclause-i (because Hive supports Multi Insert statements, there may be multiple outputs), and the value is the corresponding ASTNode node, i.e. the TOK_DESTINATION node. The remaining HashMap attributes of the class QBParseInfo respectively keep the correspondence between the output and the ASTNode nodes of each operation.
QBParseInfo#joinExpr is used to keep the TOK_JOIN node.
QB#qbJoinTree is the structured form of the Join syntax tree.
QB#qbm keeps the meta-information of each input table, such as the path of the table on HDFS and the storage format of the table data.
The QBExpr object represents the Union operation.
The process of generating query blocks from the abstract syntax tree is a recursive process: the abstract syntax tree is traversed in preorder, and when the different token nodes are encountered they are saved into the corresponding attributes. It mainly comprises the following process:
TOK_QUERY=>create a query block object and recurse over the child nodes in a loop;
TOK_FROM=>save the table-name syntax subtree into the query block object;
TOK_INSERT=>recurse over the child nodes in a loop;
TOK_DESTINATION=>the syntax subtree of the output target is stored in the nameToDest attribute of the QBParseInfo object;
TOK_SELECT=>the syntax subtrees of the query expressions are stored in destToAggregationExprs and related attributes;
TOK_WHERE=>the syntax subtree of the Where clause is stored in the destToWhereExpr attribute of the QBParseInfo object.
Through the above process, the sample SQL finally generates two query block objects.
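The preorder traversal above can be sketched as a recursive dispatch on the token type (a minimal illustration with a dictionary standing in for the QB/QBParseInfo objects; node layout and attribute names are simplified assumptions):

```python
def process_node(node, qb):
    """Preorder walk: handle this node's token type, then recurse."""
    t = node["type"]
    if t == "TOK_FROM":
        qb["from"] = node["text"]                      # table-name subtree
    elif t == "TOK_DESTINATION":
        qb.setdefault("nameToDest", {})["insclause-0"] = node["text"]
    elif t == "TOK_SELECT":
        qb.setdefault("destToSelExpr", []).append(node["text"])
    elif t == "TOK_WHERE":
        qb["destToWhereExpr"] = node["text"]
    # TOK_QUERY and TOK_INSERT have no attribute of their own: just recurse
    for child in node.get("children", []):
        process_node(child, qb)

ast = {"type": "TOK_QUERY", "children": [
    {"type": "TOK_FROM", "text": "src"},
    {"type": "TOK_INSERT", "children": [
        {"type": "TOK_SELECT", "text": "col"},
        {"type": "TOK_WHERE", "text": "col > 0"},
    ]},
]}
qb = {}
process_node(ast, qb)
```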
Further, in certain embodiments, the above query block conversion module comprises a second traversal module, a transformation module and a third traversal module.
The second traversal module is used to traverse the query blocks and translate them into an operator tree.
The transformation module is used to transform the operator tree and merge operators.
The third traversal module is used to traverse the operator tree, translate it into MapReduce tasks, form the logical query plan and rewrite the logical query plan.
The operator tree is composed of a Map stage and a Reduce stage, and several logical operators are arranged in the operator tree; each operator completes a single specific operation in the Map stage or the Reduce stage. The basic operators include TableScanOperator, SelectOperator, FilterOperator, JoinOperator, GroupByOperator and ReduceSinkOperator. TableScanOperator reads the table data from the Map interface of the MapReduce framework, controls the number of rows scanned, and marks the data as coming from the source table; JoinOperator performs the Join operation; FilterOperator performs filtering; ReduceSinkOperator serializes the combined fields on the Map side into the sort key and partition key, can only appear in the Map stage, and also marks the end of the Map stage in the MapReduce program generated by Hive.
The transfer of data between logical operators in the Map and Reduce stages is a streaming process: after processing a row of data, each logical operator passes the row on to its child logical operators for further computation.
The basic attributes and methods of a logical operator are as follows:
rowSchema describes the output fields of the operator;
inputObjInspector and outputObjInspector parse the input and output fields;
processOp receives a row from the parent logical operator, and forward passes the processed row on to the child logical operators;
Hive may renumber the fields of each row after a logical operator has processed it; colExprMap records, for each expression, the correspondence between its names before and after the current logical operator, which is used to trace field names back during the subsequent logical optimization phase.
Because a Hive MapReduce program is a generic, data-driven program (it is not known in advance whether a given MapReduce task will perform a Join or a GroupBy), the parameters required by every operation are kept in the logical operator itself; the logical operators are serialized to HDFS when the task is submitted, and are read back from HDFS and deserialized before the MapReduce task executes. The operator tree of the Map stage is stored on HDFS at Job.getConf("hive.exec.plan") + "/map.xml".
The above physical conversion module reads the small table into memory in the Map stage and scans the big table sequentially to complete the Join.
The conversion of the logical plan by the physical-layer optimizer is divided into two stages. First, a local MapReduce task reads the small table into memory and generates a hash table, which is uploaded to the distributed cache; this process needs to compress the hash table. Then the MapReduce task reads the hash table from the distributed cache in the Map stage, scans the big table sequentially, joins the rows directly in memory in the Map stage, and passes the data on to the next MapReduce task.
If one of the two tables of the Join is a temporary table, a ConditionalTask is generated, and whether to use MapJoin is decided at runtime; in that case the optimizer needs to convert the common Join into a MapJoin. The conversion process is as follows:
traverse the task tree depth-first;
find the Join logical operators and judge the data volumes of the left and right tables;
for small table + big table=>MapJoinTask; for small/big table + intermediate table=>ConditionalTask. The MapReduce tasks generated in the previous stage are traversed; one table in JOIN[8] is found to be a temporary table, so Stage-2 is first deep-copied (because the original execution plan needs to be retained as a backup task, the execution plan is copied); a MapJoin logical operator is generated to replace the Join logical operator, and then a local MapReduce task is generated to read the small table, generate the hash table and upload it to the distributed cache.
During this process the optimizer also needs to traverse the task tree and split every local MapReduce task into two tasks.
In summary, for the compilation unit 2, the syntax rules are defined with the Antlr open-source software, which greatly simplifies the compilation and parsing of the lexicon and grammar: only one grammar file needs to be maintained. The staged design keeps the code of the whole compilation process easy to maintain, so that subsequent optimizers can be switched in a pluggable manner; for example, the newest Hive 0.13 features, vectorization and support for the Tez engine, are both pluggable. Each logical operator completes only a single function, which simplifies the whole MapReduce program.
With the above indexing system based on the Plcient interactive engine, a HiveQL statement is obtained, the statement is compiled by Plcient to form an execution task, the task is executed, and the execution result is obtained. The syntax rules are defined with the Antlr open-source software, which greatly simplifies the compilation and parsing of the lexicon and grammar; the staged design keeps the code of the whole compilation process easy to maintain; each logical operator completes only a single function, simplifying the whole MapReduce program. This enhances the timeliness of big-data retrieval, makes the query mode more flexible and the execution more efficient.
The above merely further illustrates the technical content of the present invention with embodiments, so that readers can understand it more easily, but it does not mean that the embodiments of the present invention are limited thereto; any technical extension or re-creation made according to the present invention falls under the protection of the present invention. The protection scope of the present invention is defined by the claims.
Claims (10)
1. An indexing method based on a Plcient interactive engine, characterized in that the method comprises:
obtaining a HiveQL statement;
carrying out Plcient compilation on the HiveQL statement to obtain an execution task;
submitting the execution task to a control node;
handing, by the control node, the execution task over to an executive process engine for execution to obtain metadata information;
submitting the metadata information to a task tracker or resource manager for execution;
reading files in HDFS to perform the corresponding operations, and obtaining and returning the execution result.
2. The indexing method based on a Plcient interactive engine according to claim 1, characterized in that the step of obtaining a HiveQL statement comprises the following specific steps:
submitting a query task to the control node;
obtaining the query task;
obtaining the corresponding Hive metadata information from the metadata repository according to the query task, and forming the HiveQL statement.
3. The indexing method based on a Plcient interactive engine according to claim 1 or 2, characterized in that the step of carrying out Plcient compilation on the HiveQL statement to obtain an execution task comprises the following specific steps:
converting the HiveQL statement into an abstract syntax tree;
converting the abstract syntax tree into query blocks;
converting the query blocks into a logical query plan and rewriting the logical query plan;
converting the logical plan into a physical plan to form the execution task.
4. The indexing method based on a Plcient interactive engine according to claim 3, characterized in that the step of converting the HiveQL statement into an abstract syntax tree comprises the following specific steps:
defining the syntax rules of the HiveQL statement using Antlr;
carrying out lexical and syntax parsing on the HiveQL statement according to the syntax rules to form the abstract syntax tree.
5. The indexing method based on a Plcient interactive engine according to claim 4, characterized in that the step of converting the abstract syntax tree into query blocks comprises the following specific steps:
traversing the abstract syntax tree in order;
obtaining the different token nodes, saving them into the corresponding attributes, and forming the outer query block and the subquery blocks.
6. The indexing method based on a Plcient interactive engine according to claim 5, characterized in that the step of converting the query blocks into a logical query plan and rewriting the logical query plan comprises the following specific steps:
traversing the query blocks and translating them into an operator tree;
transforming the operator tree and merging operators;
traversing the operator tree, translating it into MapReduce tasks, forming the logical query plan and rewriting the logical query plan.
7. An indexing system based on a Plcient interactive engine, characterized by comprising a statement acquiring unit, a compilation unit, a submission unit, a handover unit, an execution unit and an operation reading unit;
the statement acquiring unit is used for obtaining HiveQL statements;
the compilation unit is used for carrying out Plcient compilation on the HiveQL statements to obtain an execution task;
the submission unit is used for submitting the execution task to a control node;
the handover unit is used by the control node to hand the execution task over to an executive process engine for execution and obtain metadata information;
the execution unit is used for submitting the metadata information to a task tracker or resource manager for execution;
the operation reading unit is used for reading files in HDFS to perform the corresponding operations, and obtaining and returning the execution result.
8. The indexing system based on a Plcient interactive engine according to claim 7, characterized in that the statement acquiring unit comprises a task submission module, a task acquisition module and an information acquisition module;
the task submission module is used for submitting query tasks to the control node;
the task acquisition module is used for obtaining the query tasks;
the information acquisition module is used for obtaining the corresponding Hive metadata information from the metadata repository according to the query task and forming the HiveQL statement.
9. The indexing system based on a Plcient interactive engine according to claim 8, characterized in that the compilation unit comprises a statement converter module, an abstract conversion module, a query block conversion module and a physical conversion module;
the statement converter module is used for converting the HiveQL statements into an abstract syntax tree;
the abstract conversion module is used for converting the abstract syntax tree into query blocks;
the query block conversion module is used for converting the query blocks into a logical query plan and rewriting the logical query plan;
the physical conversion module is used for converting the logical plan into a physical plan to form the execution task.
10. The indexing system based on a Plcient interactive engine according to claim 9, characterized in that the statement converter module comprises a definition submodule and a parsing submodule;
the definition submodule is used for defining the syntax rules of the HiveQL statements using Antlr;
the parsing submodule is used for carrying out lexical and syntax parsing on the HiveQL statements according to the syntax rules to form the abstract syntax tree.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711203695.3A CN107818181A (en) | 2017-11-27 | 2017-11-27 | Indexing means and its system based on Plcient interactive mode engines |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711203695.3A CN107818181A (en) | 2017-11-27 | 2017-11-27 | Indexing means and its system based on Plcient interactive mode engines |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107818181A true CN107818181A (en) | 2018-03-20 |
Family
ID=61610223
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711203695.3A Pending CN107818181A (en) | 2017-11-27 | 2017-11-27 | Indexing means and its system based on Plcient interactive mode engines |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107818181A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110851452A (en) * | 2020-01-16 | 2020-02-28 | 医渡云(北京)技术有限公司 | Data table connection processing method and device, electronic equipment and storage medium |
CN112181704A (en) * | 2020-09-28 | 2021-01-05 | 京东数字科技控股股份有限公司 | Big data task processing method and device, electronic equipment and storage medium |
CN113438275A (en) * | 2021-05-27 | 2021-09-24 | 众安在线财产保险股份有限公司 | Data migration method and device, storage medium and data migration equipment |
CN113887251A (en) * | 2021-09-29 | 2022-01-04 | 内蒙古工业大学 | Mongolian Chinese machine translation method combining Meta-KD framework and fine-grained compression |
CN117648341A (en) * | 2023-11-21 | 2024-03-05 | 上海金仕达卫宁软件科技有限公司 | Method and system for quickly assembling data based on disk memory in limited resources |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103761080A (en) * | 2013-12-25 | 2014-04-30 | 中国农业大学 | Structured query language (SQL) based MapReduce operation generating method and system |
US20140280030A1 (en) * | 2013-03-12 | 2014-09-18 | Microsoft Corporation | Method of converting query plans to native code |
CN104298771A (en) * | 2014-10-30 | 2015-01-21 | 南京信息工程大学 | Massive web log data query and analysis method |
US20150379426A1 (en) * | 2014-06-30 | 2015-12-31 | Amazon Technologies, Inc. | Optimized decision tree based models |
CN105279286A (en) * | 2015-11-27 | 2016-01-27 | 陕西艾特信息化工程咨询有限责任公司 | Interactive large data analysis query processing method |
CN107122443A (en) * | 2017-04-24 | 2017-09-01 | 中国科学院软件研究所 | A kind of distributed full-text search system and method based on Spark SQL |
2017
- 2017-11-27 CN CN201711203695.3A patent/CN107818181A/en active Pending
Non-Patent Citations (4)
Title |
---|
DIZAOXN729021: "The compilation process of Hive SQL", 《HTTPS://BLOG.CSDN.NET/DIZAOXN729021/ARTICLE/DETAILS/102452617》 * |
WEIXIN_37242857: "Yanyun YDB fundamentals", 《HTTPS://BLOG.CSDN.NET/WEIXIN_37242857/ARTICLE/DETAILS/57123190》 * |
扫大街的程序员: "In-depth analysis of HiveSQL execution plans", 《HTTPS://BLOG.CSDN.NET/MOON_YANG_BJ/ARTICLE/DETAILS/31744381》 * |
杨卓荦: "[Big Data Micro-course Review] Yang Zhuoluo: Hive principles and query optimization", 《HTTP://WWW.360DOC.COM/CONTENT/16/0803/14/29157075_580488232.SHTML》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10521427B2 (en) | Managing data queries | |
CN107122443B (en) | Distributed full-text retrieval system and method based on Spark SQL | |
US8332389B2 (en) | Join order for a database query | |
US9195712B2 (en) | Method of converting query plans to native code | |
US9152697B2 (en) | Real-time search of vertically partitioned, inverted indexes | |
US10762087B2 (en) | Database search | |
US20130006968A1 (en) | Data integration system | |
EP3671526B1 (en) | Dependency graph based natural language processing | |
CN110019314B (en) | Dynamic data packaging method based on data item analysis, client and server | |
CN108009270A (en) | Text search method based on distributed in-memory computing | |
CN114265945A (en) | Blood relationship extraction method and device and electronic equipment | |
CN114461603A (en) | Multi-source heterogeneous data fusion method and device | |
CN107818181A (en) | Indexing means and its system based on Plcient interactive mode engines | |
KR101955376B1 (en) | Processing method for a relational query in distributed stream processing engine based on shared-nothing architecture, recording medium and device for performing the method | |
CN117421302A (en) | Data processing method and related equipment | |
CN110008448B (en) | Method and device for automatically converting SQL code into Java code | |
US20090132473A1 (en) | Apparatus, method, and computer program product for processing databases | |
CN115857918A (en) | Data processing method and device, electronic equipment and storage medium | |
Solodovnikova et al. | Handling evolution in big data architectures | |
Fan et al. | TwigStack-MR: An approach to distributed XML twig query using MapReduce | |
KR102599008B1 (en) | Method for processing multi-queries based on multi-query scheduler and data processing system providing the method | |
Tian | Accelerating data preparation for big data analytics | |
KR102605929B1 (en) | Method for processing structured data and unstructured data by allocating different processor resource and data processing system providing the method | |
JP2000163446A (en) | Extendable inquiry processor | |
Schäfer | On Enabling Efficient and Scalable Processing of Semi-Structured Data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | | |
| SE01 | Entry into force of request for substantive examination | | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180320 |