CN109101468B - Execution optimization method of text data conversion script - Google Patents

Execution optimization method of text data conversion script

Info

Publication number
CN109101468B
CN109101468B CN201810873554.0A CN201810873554A CN109101468B CN 109101468 B CN109101468 B CN 109101468B CN 201810873554 A CN201810873554 A CN 201810873554A CN 109101468 B CN109101468 B CN 109101468B
Authority
CN
China
Prior art keywords
data
execution
cost
conversion script
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810873554.0A
Other languages
Chinese (zh)
Other versions
CN109101468A (en
Inventor
江大伟
陈珂
魏嘉荣
寿黎但
陈刚
胡天磊
伍赛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201810873554.0A
Publication of CN109101468A
Application granted
Publication of CN109101468B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/103: Formatting, i.e. changing of presentation of documents

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an execution optimization method for text data conversion scripts. For a text data conversion script executed through networked distributed processing, the script is parsed to generate an execution plan tree; a tuple-based multiset is used as the data model of the text data, and the text data conversion script comprises data operations that modify and convert the structure and content of the multiset; an execution optimization method is selected according to the execution scenario of the conversion script; and a logic program is generated and run according to the optimized execution plan, so that data are converted and processed efficiently on a big data platform. The method can be applied to processing massive text data in the data preparation stage; by applying execution optimization oriented to text data conversion scripts, it effectively reduces the space-time cost of executing such scripts and improves the efficiency of the data preparation stage.

Description

Execution optimization method of text data conversion script
Technical Field
The invention relates to an optimization method for processing mass text data, in particular to an execution optimization method for a text data conversion script.
Background
With the rapid development of the related fields such as the mobile internet, the internet of things and the like, data shows an explosive growth trend, and the types of the data which can be utilized are more and more abundant while the data volume is larger and larger. Through data analysis, people can extract key information from the data, and find rules to make decisions.
Data preparation is the first step in data analysis. The traditional data preparation mainly adopts an ETL technology, and requires a user to realize the extraction, conversion and loading processes of data in a manual program coding mode or by using program logic preset in a third-party ETL tool. The emerging self-service data preparation technology provides an interactive data conversion processing method based on a graphical interface. Through a data visualization technology and a machine learning technology, the self-service data preparation method visually displays data to a user, simultaneously conjectures the data conversion intention of the user and generates data conversion operation according to interactive operation such as mouse click and the like of the user in a graphical interface, and finally processes the data. The self-service data preparation method avoids program coding of data conversion logic, reduces the technical threshold of data preparation, and effectively improves the efficiency of the data preparation stage.
The data targeted by the current data analysis is not limited to traditional structured data, but also covers semi/unstructured text data such as XML, JSON and logs. Emerging big data platforms can effectively store large-scale semi/unstructured text data. The self-service data preparation technology facing the big data increases the capacity of processing mass data on the basis of the self-service data preparation technology.
In self-service data preparation technology, a text data conversion language is a language that models the user's interactive operations in a graphical interface, and a text data conversion script is a program script written in the text data conversion language. A text data conversion script can convert and process massive text data.
Disclosure of Invention
To solve the problems described in the background, the invention provides an execution optimization method for text data conversion scripts. It addresses the processing of massive text data by text data conversion scripts by generating and executing program logic that processes massive text data efficiently and scalably.
The technical scheme adopted by the invention for solving the technical problems is as follows:
The invention adopts the execution optimization method that matches the execution scenario of the conversion script, namely the single data conversion script scenario or the multi-data conversion script scenario, thereby generating efficient and scalable program processing logic for massive text data.
For a text data conversion script executed through networked distributed processing, the following method steps are used:
(1) parsing the text data conversion script to generate an execution plan tree, and checking the legality and validity of the nodes in the execution plan tree; using a tuple-based multiset (a two-dimensional table formed by rows and columns) as the data model of the text data, wherein the text data conversion script comprises data operations that modify and convert the structure and content of the multiset;
the text data conversion script is a program script described using a text data conversion language.
The text data conversion language is a language for modeling user interaction operations in a graphical interface.
An execution plan tree is an abstract data structure based on trees.
In the execution plan tree, one node is a data operation, the parent-child relationship of the nodes in the tree represents the execution sequence of the data operation, and the data operation represented by the parent node can be executed only after the data operation represented by all the child nodes is completed.
(2) adopting the execution optimization method that corresponds to the execution scenario of the conversion script;
(3) generating and running a logic program targeting the big data platform according to the optimized execution plan (execution plan tree/graph), thereby efficiently converting and processing the data on the big data platform.
In the execution plan tree of the step (1), one node is a data operation, the parent-child relationship of the nodes in the tree represents the execution sequence of the data operation, and the data operation represented by the parent node can be executed only after the data operation represented by all the child nodes is completed.
The execution scenarios of the conversion script are divided into the single data conversion script and the multi-data conversion script.
For a single data conversion script, i.e. the single-script execution scenario, the following step-by-step optimization method oriented to the execution plan tree is used:
1.1) operation push-down optimization: without changing the result of the data conversion, the space-time cost of execution is reduced by changing the execution order of nodes in the execution plan tree. Specifically: if the tuples in two multisets have repeated values under a certain attribute, a filter operation on that attribute that would be performed after the join operation on the multisets is moved ahead of the join operation, changing the execution order of the filter operation relative to the join operation;
1.2) operation merging: the space-time cost of execution is reduced by merging two adjacent nodes in the execution plan tree. Specifically: if two adjacent nodes both operate on columns of the multiset, the two nodes are merged, where a node is the node corresponding to a data operation in the execution plan tree;
1.3) join optimization: when several multisets are joined, the best join method is selected according to the features of the multisets. Specifically: the number of tuples in a multiset is taken as its feature, and the following two cases are handled:
if two multisets need to be joined and their features differ by no more than 30%, a filter operation is applied to one multiset before it is joined with the other;
if two multisets need to be joined and their features differ by more than a factor of 3, the multiset with the smaller feature is transmitted to the node of the distributed network where the multiset with the larger feature is located.
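The push-down rule of 1.1) can be illustrated with a minimal Python sketch. The `Node` class, its fields and the `push_down` function are hypothetical names introduced here for illustration only, and the sketch assumes an inner equi-join on the attributes the two inputs share, so that a filter on an attribute can be applied to whichever input carries that attribute without changing the result.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                       # "input", "filter" or "join" (hypothetical op names)
    columns: frozenset            # attributes produced by this node
    children: list = field(default_factory=list)
    attr: str = ""                # attribute tested by a filter node

def push_down(node):
    """Rewrite filter(join(a, b)) so the filter runs below the join.

    Assumption: the join is an inner equi-join on shared attributes.
    """
    node.children = [push_down(child) for child in node.children]
    if node.op != "filter" or node.children[0].op != "join":
        return node
    join = node.children[0]
    pushed = False
    for i, side in enumerate(join.children):
        if node.attr in side.columns:
            # Wrap this join input in a copy of the filter and keep pushing.
            join.children[i] = push_down(
                Node("filter", side.columns, [side], node.attr))
            pushed = True
    # If the filter was moved below the join, the original filter node disappears.
    return join if pushed else node

users = Node("input", frozenset({"id", "name"}))
visits = Node("input", frozenset({"id", "city"}))
joined = Node("join", users.columns | visits.columns, [users, visits])
plan = Node("filter", joined.columns, [joined], attr="city")
optimized = push_down(plan)
```

In this example the filter on `city` ends up directly above the `visits` input, so fewer tuples reach the join.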
For multiple data conversion scripts, i.e. the multi-script execution scenario, a cost-based graph optimization method is used:
2.1) constructing an execution plan graph: the execution plan trees corresponding to the data conversion scripts are merged by merging their common child nodes, yielding an execution plan graph; the execution plan graph is likewise a graph structure made up of nodes, and common child nodes are nodes with the same operation semantics;
2.2) cost-based operation merging: nodes in the execution plan graph are merged under a specially designed cost model, reducing the space-time cost of execution.
The invention aims to realize the minimum and the optimal space-time cost, wherein the space-time cost refers to the sum of the spent execution time and the occupied physical resources.
In step 2.2) above, the invention merges data operations on the basis of input sharing, following the idea of optimization by operation merging, and measures the execution cost of each data operation by establishing a cost model.
Therefore, for a group of data operations in the execution plan graph that share the same input, the invention compares the cost of executing them independently with the cost of executing them merged, and thereby determines whether merging improves execution efficiency.
2.2.1) establishing the following cost model for independent data operations, where independent data operations are data operations at mutually independent nodes that share the input from the same child node; that is, their nodes are all connected to the same child node and are not connected to one another;
for example, the data operations $J_1, J_2, \dots, J_n$ all share the same data input and are divided into $m$ independent, disjoint groups; each group $G_i$ ($1 \le i \le m$) contains several operations that are merged into one operation for execution.
The cost model calculates the sum of the costs of several independent data operations, and the cost of the data operation $J^*$ obtained by merging them, as follows:
for the $n$ independent data operations $J_1, J_2, \dots, J_n$, the sum of the costs of all data operations,
[formula image GDA0002427886700000031]
is calculated as:
[formula image GDA0002427886700000032]
where
[formula image GDA0002427886700000033]
is the read cost of data in distributed processing, $C_t$ is the network transmission cost, $C_l$ is the local read-write cost of the data,
[formula image GDA0002427886700000041]
is the size of the intermediate result of data operation $J_i$, and
[formula image GDA0002427886700000042]
is the number of merge passes in the external sort of data operation $J_i$;
2.2.2) for the $n$ independent data operations $J_1, J_2, \dots, J_n$, the cost of the data operation $J^*$ obtained by merging them is calculated as:
[formula image GDA0002427886700000043]
where
[formula image GDA0002427886700000044]
is the size of the intermediate result of the merged data operation $J^*$, and
[formula image GDA0002427886700000045]
is the number of merge passes in the external sort of the merged data operation $J^*$;
either all independent data operations are merged into a single data operation, which constitutes one grouping scheme, or the independent data operations are divided into different groups, each merged into one data operation, which yields further grouping schemes; no node appears in more than one group. In other words, the independent data operations are partitioned in every possible way. The cost of each candidate scheme is calculated with the cost model, and a dynamic programming algorithm searches for the grouping scheme with the lowest cost, which is taken as the optimal scheme.
The invention solves the optimized grouping problem by combining the cost model and the dynamic programming algorithm, can obtain the data operation grouping combination scheme with the minimum execution cost by the mode, and can ensure that the total cost after the operation grouping combination execution is minimum.
The invention has the beneficial effects that:
According to the two scenarios of single data conversion script execution and multi-data conversion script execution, the method adopts the corresponding optimization method, and finally turns the text data conversion script into program logic that processes massive text data efficiently and scalably, and executes it.
The method designed by the invention can be applied to processing mass text data in the data preparation stage, and can effectively reduce the time-space cost of the text data conversion script in the execution process and improve the efficiency of the data preparation stage by applying the execution optimization method facing the text data conversion script.
Drawings
FIG. 1 is a flow chart of the steps performed by the present invention.
FIG. 2 is a schematic diagram of an execution plan tree.
FIG. 3 is a schematic diagram of an execution plan tree conversion to big data processing jobs.
Detailed Description
The technical solution of the present invention will now be further explained with reference to specific embodiments and examples.
Referring to fig. 1, the specific implementation process and the working principle of the present invention are as follows:
Step 1: parse the text data conversion script to generate an execution plan tree.
The execution plan tree may be abstractly represented as a tree structure composed of data operations as nodes, which corresponds to the data conversion flow described by the text data conversion script. In the execution plan tree, the data input operation is used as a leaf node, the data output operation is used as a root node of the tree, and the data conversion operation is used as an internal node in the tree.
The parent-child relationship of the node represents the dependency relationship between the data operations corresponding to the node due to the input and output of data, that is, the output data generated by the data operation corresponding to the child node is used as the input data of the data operation corresponding to the parent node. FIG. 2 is an execution plan tree containing 11 data operations.
After the execution plan tree is generated, the legality and validity of its nodes need to be checked.
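The tree structure of Step 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: `PlanNode`, `check_plan` and `execution_order` are hypothetical names, the operation names are invented, and the check shown is purely structural (root is an output, every leaf is an input), since the text does not spell out what the legality and validity check covers.

```python
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    op: str                       # kind of data operation, e.g. "input", "join", "output"
    name: str                     # label used when listing the execution order
    children: list = field(default_factory=list)

def check_plan(root):
    """Structural check only: the root is an output and every leaf is an input."""
    if root.op != "output":
        return False
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children and node.op != "input":
            return False
        stack.extend(node.children)
    return True

def execution_order(root):
    """Post-order walk: a parent runs only after all of its children have finished."""
    order = []

    def visit(node):
        for child in node.children:
            visit(child)
        order.append(node.name)

    visit(root)
    return order

plan = PlanNode("output", "write", [
    PlanNode("join", "join_ab", [
        PlanNode("split", "split_a", [PlanNode("input", "read_a")]),
        PlanNode("input", "read_b"),
    ]),
])
order = execution_order(plan)
```

The post-order walk reflects the rule that the output of a child node's operation is the input of its parent's operation.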
Step 2: according to the execution scenario of the text data conversion script, execution is divided into single data conversion script execution and multi-data conversion script execution, which are optimized separately.
1) In the single data conversion script execution scenario, a step-by-step optimization method oriented to the execution plan tree is used. It comprises the following three steps:
1.1) operation push-down optimization: without changing the result of the data conversion, the space-time cost of execution is reduced by changing the execution order of nodes in the execution plan tree. Specifically: if the tuples in two multisets have repeated values under a certain attribute, a filter operation on that attribute that would be performed after the join operation on the multisets is moved ahead of the join operation, changing the execution order of the filter operation relative to the join operation.
1.2) operation merging: the space-time cost of execution is reduced by merging two adjacent nodes in the execution plan tree. Specifically: if two adjacent nodes both operate on rows of the multiset, the two nodes are merged, where a node is the node corresponding to a data operation in the execution plan tree.
1.3) join optimization: when several multisets are joined, the best join method is selected according to the features of the multisets. Specifically: the number of tuples in a multiset is taken as its feature, and the following two cases are handled:
if two multisets need to be joined and their features differ by no more than 30%, a filter operation is applied to one multiset before it is joined with the other;
if two multisets need to be joined and their features differ by more than a factor of 3, the multiset with the smaller feature is transmitted to the node of the distributed network where the multiset with the larger feature is located.
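The two cases of 1.3) can be read as a simple decision on tuple counts. The sketch below is one possible reading, not the patented implementation: the function name and the returned labels are hypothetical, the 30% difference is assumed to be measured relative to the larger multiset, and the `"default"` branch covers size ratios that the text leaves unspecified.

```python
def choose_join_strategy(n_left, n_right):
    """Pick a join strategy from the tuple counts of the two multisets."""
    small, large = sorted((n_left, n_right))
    if large > 3 * small:
        # Sizes differ by more than a factor of 3: ship the smaller multiset
        # to the node holding the larger one.
        return "send_smaller_to_larger"
    if large - small <= 0.30 * large:
        # Sizes differ by at most 30% (assumed relative to the larger side):
        # filter one multiset first, then join.
        return "filter_then_join"
    # Not covered by the two cases in the text; fall back to a plain join.
    return "default"

similar = choose_join_strategy(1000, 900)
skewed = choose_join_strategy(1000, 100)
between = choose_join_strategy(1000, 500)
```

Sending the smaller multiset across the network avoids moving the larger one, which is the usual motivation for this kind of rule.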
2) In the multi-data conversion script execution scenario, a cost-based graph optimization method is used. It consists of two steps:
2.1) constructing an execution plan graph: the execution plan trees corresponding to the data conversion scripts are merged by merging their common child nodes, yielding an execution plan graph; the execution plan graph is likewise a graph structure made up of nodes, and common child nodes are nodes with the same operation semantics.
Based on the idea of data operation lineage, data operations with the same operation semantics are identified by computing hash values of the data operations, which eliminates duplicate operation logic. After the execution plan graph has been built, it can be further optimized with the optimization method for a single data conversion script.
Starting from the output nodes of the execution plan graph (the nodes with out-degree 0), the nodes are traversed and merged in turn, the merged nodes replace the corresponding nodes of the original execution plan graph, and the further optimized nodes are finally obtained.
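The hash-based merging of common child nodes described for 2.1) can be sketched as follows. The tuple encoding of a plan tree, the `signature` and `add_tree` helpers and the use of SHA-256 are assumptions made for this illustration; the text only states that hash values of data operations are used to recognize operations with the same semantics.

```python
import hashlib

def signature(op, params, child_sigs):
    """Hash an operation together with the signatures of its inputs."""
    text = repr((op, params, tuple(child_sigs)))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def add_tree(graph, tree):
    """Insert a plan tree (op, params, children) into the shared plan graph.

    `graph` maps a signature to (op, params, child signatures). Nodes with
    the same operation and the same inputs collapse into a single entry.
    """
    op, params, children = tree
    child_sigs = [add_tree(graph, child) for child in children]
    sig = signature(op, params, child_sigs)
    graph.setdefault(sig, (op, params, child_sigs))
    return sig

# Two hypothetical scripts that read and split the same file, then diverge.
shared = ("split", ",", [("input", "log.txt", [])])
script_a = ("output", "a.csv", [("filter", "x > 1", [shared])])
script_b = ("output", "b.csv", [("filter", "y = 2", [shared])])

graph = {}
root_a = add_tree(graph, script_a)
root_b = add_tree(graph, script_b)
```

Because a node's signature includes the signatures of its children, two nodes are merged only when their whole input lineage matches; here the eight nodes of the two trees collapse into six graph nodes.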
2.2) cost-based operation merging: nodes in the execution plan graph are merged under a custom cost model, reducing the space-time cost of execution.
2.2.1) establishing the following cost model for independent data operations, where independent data operations are data operations at mutually independent nodes that share the input from the same child node; that is, their nodes are all connected to the same child node and are not connected to one another;
for example, the data operations $J_1, J_2, \dots, J_n$ all share the same data input and are divided into $m$ independent, disjoint groups; each group $G_i$ ($1 \le i \le m$) contains several operations that are merged into one operation for execution.
The cost model calculates the sum of the costs of several independent data operations, and the cost of the data operation $J^*$ obtained by merging them, as follows:
for the $n$ independent data operations $J_1, J_2, \dots, J_n$, the sum of the costs of all data operations,
[formula image GDA0002427886700000061]
is calculated as:
[formula image GDA0002427886700000062]
where
[formula image GDA0002427886700000063]
is the read cost of data in distributed processing, $C_t$ is the network transmission cost, $C_l$ is the local read-write cost of the data,
[formula image GDA0002427886700000064]
is the size of the intermediate result of data operation $J_i$, and
[formula image GDA0002427886700000065]
is the number of merge passes in the external sort of data operation $J_i$;
2.2.2) for the $n$ independent data operations $J_1, J_2, \dots, J_n$, the cost of the data operation $J^*$ obtained by merging them is calculated as:
[formula image GDA0002427886700000066]
where
[formula image GDA0002427886700000067]
is the size of the intermediate result of the merged data operation $J^*$, and
[formula image GDA0002427886700000068]
is the number of merge passes in the external sort of the merged data operation $J^*$;
either all independent data operations are merged into a single data operation, which constitutes one grouping scheme, or the independent data operations are divided into different groups, each merged into one data operation, which yields further grouping schemes; no node appears in more than one group. In other words, the independent data operations are partitioned in every possible way. The cost of each candidate scheme is calculated with the cost model, and a dynamic programming algorithm searches for the grouping scheme with the lowest cost, which is taken as the optimal scheme.
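The search for the cheapest grouping can be sketched with a dynamic program over subsets of operations. Everything concrete below is an assumption: the formulas of the cost model are not reproduced in this text, so `toy_cost` is a stand-in (one shared read per group plus a sort term that needs a second pass once the merged intermediate result exceeds a threshold), and `best_grouping`, `SIZES`, `READ` and `ONE_PASS_LIMIT` are hypothetical names and values. The dynamic program itself is a generic subset-partition search, not necessarily the algorithm used by the invention.

```python
from functools import lru_cache

SIZES = [4, 4, 50]        # hypothetical intermediate result sizes of J_1..J_3
READ = 10                 # hypothetical cost of scanning the shared input once
ONE_PASS_LIMIT = 52       # above this size the sort is assumed to need 2 passes

def toy_cost(group):
    """Stand-in cost of running the operations in bitmask `group` as one job."""
    total = sum(size for i, size in enumerate(SIZES) if group >> i & 1)
    passes = 1 if total <= ONE_PASS_LIMIT else 2
    return READ + total * passes

def best_grouping(n, group_cost):
    """Return (cost, groups) of the cheapest partition of operations 0..n-1."""
    @lru_cache(maxsize=None)
    def solve(mask):
        if mask == 0:
            return 0, ()
        low = mask & -mask            # lowest remaining operation anchors a group
        rest = mask ^ low
        best = None
        sub = rest
        while True:                   # enumerate every subset of `rest`
            group = sub | low
            tail_cost, tail_groups = solve(mask ^ group)
            total = group_cost(group) + tail_cost
            if best is None or total < best[0]:
                best = (total, (group,) + tail_groups)
            if sub == 0:
                break
            sub = (sub - 1) & rest
        return best

    cost, groups = solve((1 << n) - 1)
    return cost, [[i for i in range(n) if g >> i & 1] for g in groups]

cost, groups = best_grouping(len(SIZES), toy_cost)
```

With these made-up numbers, merging the two small operations saves one scan of the shared input, while adding the large one would push the merged job into a second sort pass, so the search keeps it separate. The search visits every subset pair and therefore grows as 3^n, which is acceptable only for small groups of operations.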
Step 3: according to the optimized execution plan tree/graph, program processing logic targeting the big data platform is generated and run, so that data are converted and processed efficiently on the big data platform.
Typically, an execution plan tree/graph is converted into a set of one or more big data processing jobs. In FIG. 3, converting the execution plan tree yields such a set of big data processing jobs: the execution plan tree contains 11 data operations and the job set contains 8 jobs in total, where $J_1$ and $J_4$ contain the processing logic of the data input operation and the line-splitting operation, and $J_8$ contains the processing logic of the group-aggregation operation and the data output operation.
For a plurality of big data processing jobs generated by the execution plan tree, the jobs are topologically ordered according to the dependency relationship among different jobs to determine the execution order corresponding to the jobs.
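Ordering jobs by their dependencies can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The job names and dependencies below are hypothetical and are not taken from FIG. 3; each key maps a job to the set of jobs whose output it consumes.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: J3 consumes the outputs of J1 and J2, J4 consumes J3.
job_deps = {
    "J1": set(),
    "J2": set(),
    "J3": {"J1", "J2"},
    "J4": {"J3"},
}

# static_order() yields every job after all of the jobs it depends on.
job_order = list(TopologicalSorter(job_deps).static_order())
```

Any order produced this way runs a job only after the jobs it depends on; jobs without mutual dependencies, such as `J1` and `J2` here, may appear in either order.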

Claims (3)

1. An execution optimization method for a text data conversion script, characterized in that: for a text data conversion script executed through networked distributed processing, the following method steps are used:
(1) parsing the text data conversion script to generate an execution plan tree; using a tuple-based multiset as the data model of the text data, wherein the text data conversion script comprises data operations that modify and convert the structure and content of the multiset;
(2) adopting the execution optimization method that corresponds to the execution scenario of the conversion script;
(3) generating and running a logic program according to the optimized execution plan, thereby efficiently converting and processing data on a big data platform;
the execution scenarios of the conversion script are divided into the single data conversion script and the multi-data conversion script;
for a single data conversion script, the following step-by-step optimization method oriented to the execution plan tree is used:
1.1) operation push-down optimization: without changing the result of the data conversion, the space-time cost of execution is reduced by changing the execution order of nodes in the execution plan tree, specifically: if the tuples in two multisets have repeated values under a certain attribute, a filter operation on that attribute that would be performed after the join operation on the multisets is moved ahead of the join operation;
1.2) operation merging: the space-time cost of execution is reduced by merging two adjacent nodes in the execution plan tree, specifically: if two adjacent nodes both operate on rows of the multiset, the two nodes are merged;
1.3) join optimization: when several multisets are joined, the best join method is selected according to the features of the multisets, specifically: the number of tuples in a multiset is taken as its feature, and the following two cases are handled:
if two multisets need to be joined and their features differ by no more than 30%, a filter operation is applied to one multiset before it is joined with the other;
if two multisets need to be joined and their features differ by more than a factor of 3, the multiset with the smaller feature is transmitted to the node of the distributed network where the multiset with the larger feature is located;
for the multi-data conversion script, a cost-based graph optimization method is used:
2.1) constructing an execution plan graph: the execution plan trees corresponding to the data conversion scripts are merged by merging their common child nodes, yielding an execution plan graph, wherein common child nodes are nodes with the same operation semantics;
2.2) cost-based operation merging: nodes in the execution plan graph are merged under a specially designed cost model, reducing the space-time cost of execution.
2. The method of claim 1, wherein the method comprises: in the execution plan tree of the step (1), one node is a data operation, the parent-child relationship of the nodes in the tree represents the execution sequence of the data operation, and the data operation represented by the parent node can be executed only after the data operation represented by all the child nodes is completed.
3. The method of claim 1, wherein the method comprises: the step 2.2) is specifically as follows:
2.2.1) establishing the following cost model for independent data operations, where independent data operations are data operations at mutually independent nodes that share the input from the same child node;
the cost model calculates the sum of the costs of several independent data operations, and the cost of the data operation $J^*$ obtained by merging them, as follows:
for the $n$ independent data operations $J_1, J_2, \dots, J_n$, the sum of the costs of all data operations,
[formula image FDA0002440048290000021]
is calculated as:
[formula image FDA0002440048290000022]
where
[formula image FDA0002440048290000023]
is the read cost of data in distributed processing, $C_t$ is the network transmission cost, $C_l$ is the local read-write cost of the data,
[formula image FDA0002440048290000024]
is the size of the intermediate result of data operation $J_i$, and
[formula image FDA0002440048290000025]
is the number of merge passes in the external sort of data operation $J_i$;
2.2.2) for the $n$ independent data operations $J_1, J_2, \dots, J_n$, the cost of the data operation $J^*$ obtained by merging them is calculated as:
[formula image FDA0002440048290000026]
where
[formula image FDA0002440048290000027]
is the size of the intermediate result of the merged data operation $J^*$, and
[formula image FDA0002440048290000028]
is the number of merge passes in the external sort of the merged data operation $J^*$;
either all independent data operations are merged into a single data operation, which constitutes one grouping scheme, or the independent data operations are divided into different groups, each merged into one data operation, which yields further grouping schemes; the cost of each candidate scheme is calculated with the cost model, and a dynamic programming algorithm searches for the grouping scheme with the lowest cost.
CN201810873554.0A 2018-08-02 2018-08-02 Execution optimization method of text data conversion script Active CN109101468B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810873554.0A CN109101468B (en) 2018-08-02 2018-08-02 Execution optimization method of text data conversion script

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810873554.0A CN109101468B (en) 2018-08-02 2018-08-02 Execution optimization method of text data conversion script

Publications (2)

Publication Number Publication Date
CN109101468A (en) 2018-12-28
CN109101468B (en) 2020-07-03

Family

ID=64848295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810873554.0A Active CN109101468B (en) 2018-08-02 2018-08-02 Execution optimization method of text data conversion script

Country Status (1)

Country Link
CN (1) CN109101468B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103631870B (en) * 2013-11-06 2017-02-01 广东电子工业研究院有限公司 System and method used for large-scale distributed data processing
CN103761080B (en) * 2013-12-25 2017-02-15 中国农业大学 Structured query language (SQL) based MapReduce operation generating method and system
JP6516110B2 (en) * 2014-12-01 2019-05-22 日本電気株式会社 Multiple Query Optimization in SQL-on-Hadoop System
CN105005606B (en) * 2015-07-03 2018-06-29 华南理工大学 XML data query method and system based on MapReduce
CN107315743A (en) * 2016-04-26 2017-11-03 上海赢华软件科技有限公司 A kind of big data conversion method and system based on adapter

Also Published As

Publication number Publication date
CN109101468A (en) 2018-12-28

Similar Documents

Publication Publication Date Title
US10769147B2 (en) Batch data query method and apparatus
CN104573124B (en) A kind of education cloud application statistical method based on parallelization association rule algorithm
CN103678520B (en) A kind of multi-dimensional interval query method and its system based on cloud computing
CN109815283B (en) Heterogeneous data source visual query method
US9953069B2 (en) Business intelligence document
CN110909039A (en) Big data mining tool and method based on drag type process
US20160292167A1 (en) Multi-system query execution plan
US11416473B2 (en) Using path encoding method and relational set operations for search and comparison of hierarchial structures
CN106897322A (en) The access method and device of a kind of database and file system
CN102033748A (en) Method for generating data processing flow codes
CN106547882A (en) A kind of real-time processing method and system of big data of marketing in intelligent grid
US11841839B1 (en) Preprocessing and imputing method for structural data
CN113741883B (en) RPA lightweight data middling station system
CN103336791A (en) Hadoop-based fast rough set attribute reduction method
CN114820279B (en) Distributed deep learning method and device based on multiple GPUs and electronic equipment
Wang et al. TSMH Graph Cube: A novel framework for large scale multi-dimensional network analysis
CN109933589B (en) Data structure conversion method for data summarization based on ElasticSearch aggregation operation result
CN108073582B (en) Computing framework selection method and device
TWI436222B (en) Real - time multi - dimensional analysis system and method on cloud
US20190258634A1 (en) Data stream connection method and apparatus
Senthilkumar et al. An efficient FP-Growth based association rule mining algorithm using Hadoop MapReduce
CN109101468B (en) Execution optimization method of text data conversion script
Bai et al. Association rule mining algorithm based on Spark for pesticide transaction data analyses
US10922278B2 (en) Systems and methods for database compression and evaluation
CN106933844A (en) Towards the construction method of the accessibility search index of extensive RDF data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant