CN108256028A - Multi-dimensional dynamic sampling method for approximate query in cloud computing environment - Google Patents

Multi-dimensional dynamic sampling method for approximate query in cloud computing environment

Info

Publication number
CN108256028A
Authority
CN
China
Prior art keywords
sample
collection
stratified
row
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810025016.6A
Other languages
Chinese (zh)
Other versions
CN108256028B (en)
Inventor
史英杰
刘怡
郭飞
刘昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute Fashion Technology
Original Assignee
Beijing Institute Fashion Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute Fashion Technology
Priority to CN201810025016.6A
Publication of CN108256028A
Application granted
Publication of CN108256028B
Current legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A multi-dimensional dynamic sampling method for approximate query in a cloud computing environment comprises the following steps. The dynamic sampling system comprises an offline processing stage for creating stratified samples and an online processing stage for dynamically selecting samples. In the offline processing stage, a workload column set parsing module parses the workload query statements; a data feature analysis module analyzes the data features; a coverage index calculation module calculates the total coverage index; a stratification column set determination module selects the stratification column sets used to create the stratified samples; and a stratified sample data creation module creates the stratified samples. In the online processing stage, a query parsing module parses the user's query statement; a sample selection module selects the stratified sample data with the lowest sampling cost; and a sample size determination module determines the sample size to be drawn from each sample stratum. The invention effectively solves the problem of inaccurate estimation for small groups caused by data skew in approximate query, and reduces the sampling cost under the constraint of limited sample storage space.

Description

Multi-dimensional dynamic sampling method for approximate query in a cloud computing environment
Technical field
The present invention relates to a data sampling method for approximate query, and in particular to a dynamic sampling method oriented to multi-query workloads in a cloud computing environment.
Background
Cloud computing environments provide a highly scalable and cost-effective way of managing big data and have become the mainstream platform for big data management. However, even in a cloud computing environment, big data queries cannot be processed in real time or at the speed required for user interaction. For ad hoc queries and exploratory data analysis applications, obtaining an estimated result quickly is more useful than spending a large amount of time and computing resources to obtain a completely accurate result. Approximate query processing estimates the query result from sample data, which greatly reduces query execution time and is therefore of great significance for big data analysis.
Approximate query processing based on sample data was proposed by Acharya et al. It uses uniform random sampling, in which every tuple is drawn with equal probability. Uniform random sampling suits uniformly distributed data and has the advantage of being simple to implement, but when data skew produces small groups in a group-by aggregation query, it seriously degrades the accuracy of the estimate, to the point that the estimate loses its meaning. The weighted sampling method was proposed by Surajit et al. It counts, for each tuple, the number of workload queries whose predicates the tuple satisfies, and uses that count as the tuple's sampling weight: the more query predicates a tuple satisfies, the higher its probability of being sampled. Weighted sampling can alleviate, to some extent, the inaccuracy that data skew causes under uniform random sampling, but its effectiveness depends entirely on the workload used to calculate the sampling weights; when the actual query statements differ from that workload, the weights become meaningless. The Congress sampling method was proposed by Swarup et al. It creates one general-purpose sample for all possible grouping columns and queries. However, the effectiveness of that sample gradually decreases as the number of queries grows, and its preprocessing time grows exponentially with the number of columns, so it cannot cope with application scenarios involving many query statements. In general, the above techniques work only when the types of query statements are few and fixed, so they scale poorly in practical applications. In addition, they were proposed for relational databases and are not applicable to cloud computing environments.
Summary of the invention
In view of the problems in the prior art, the object of the present invention is to provide a multi-dimensional dynamic sampling method for approximate query in a cloud computing environment. The method is used in the data preprocessing stage of approximate query processing: the raw data set is preprocessed to generate multiple stratified sample data sets; when a query statement arrives, a sample data set is dynamically selected according to the content of the query statement and its desired sample size, and the sample size to be drawn from each sample stratum is given. The method effectively solves the problem of inaccurate estimation for small groups caused by data skew in approximate query, and reduces the sampling cost under the constraint of limited sample storage space.
To achieve the above object, the technical solution of the present invention is as follows:
A multi-dimensional dynamic sampling method for approximate query in a cloud computing environment, the method comprising the following steps:
1) the dynamic sampling system comprises an offline processing stage for creating stratified samples and an online processing stage for dynamically selecting samples;
2) a workload column set parsing module, a data feature analysis module, a coverage index calculation module, a stratification column set determination module and a stratified sample data creation module are provided in the offline processing stage;
3) the workload column set parsing module parses the workload query statements, extracts the grouping column set of each workload query statement, counts the number of occurrences of each grouping column set, generates the collection $CS$ of candidate stratification column sets, analyzes the relationships between the candidate stratification column sets $CS_i$ in it, and outputs the result to the data feature analysis module;
4) the data feature analysis module starts a MapReduce job to scan the raw data set, and outputs the data distribution of the raw data set to the coverage index calculation module;
5) the coverage index calculation module uses the data distribution to calculate the total coverage index obtained when stratified sampling is performed on each candidate stratification column set $CS_i$;
6) the stratification column set determination module uses information such as the coverage index and the sample storage space to select the stratification column sets used to create the stratified samples;
7) a MapReduce job is started in the stratified sample data creation module to create the stratified samples: the Map function scans the raw data set and sends each tuple to the corresponding Reduce function according to the tuple's values on each stratification column set used to create the stratified samples; the Reduce function updates the statistics and outputs the tuple data to the stratified sample data set;
8) a query parsing module, a sample selection module and a sample size determination module are provided in the online processing stage;
9) the query parsing module parses the query statement entered online by the user and extracts the grouping column set $CS_q$ of each user query statement;
10) the sample selection module selects, from the stratified sample data sets, the stratified sample data with the lowest sampling cost according to the grouping column set $CS_q$ of the user query statement;
11) the sample size determination module determines the sample size to be drawn from each sample stratum according to the sample size of the approximate query statement.
The present invention has the following advantages:
1. By analyzing the workload characteristics and the data distribution characteristics, the invention determines the stratification column sets to be used, and creates multiple multi-dimensional stratified samples based on those column sets and the sample storage space, thereby solving the problem of inaccurate estimates caused by data skew in approximate query.
2. When determining the column sets used to create the stratified samples, the invention uses the coverage index to express the cost incurred by different query statements when they perform stratified sampling on a given stratified sample; the invention therefore lays a foundation for extending the query workload.
3. Given a stratified sample and a total sample size, when determining the sample size to draw from each sample stratum, the invention provides a separate solution for each relationship between the query grouping column set $CS_q$ and the sample stratification column set $CS_s$: (1) when $CS_s = CS_q$, the larger of the sample size obtained by dividing the total equally among the sample strata and the sample size proportional to the stratum size is taken as the sample size of the corresponding stratum, which solves the problem of the sample size being too small for either small groups or large groups; (2) when $CS_q \subset CS_s$, the invention first merges the sample strata that have the same value on the column set $CS_q$ into one large sample stratum and determines its sample size, and then performs stratified sampling again within each large sample stratum, so that the sample size of each sample stratum is determined dynamically.
Brief description of the drawings
Fig. 1 is a framework diagram of the multi-dimensional dynamic sampling for approximate query in a cloud computing environment.
Detailed description of the embodiments
The present invention is further described below with reference to the embodiment and the accompanying drawing.
A multi-dimensional dynamic sampling method for approximate query in a cloud computing environment comprises the following steps:
1) the dynamic sampling system comprises an offline processing stage for creating stratified samples and an online processing stage for dynamically selecting samples;
2) a workload column set parsing module, a data feature analysis module, a coverage index calculation module, a stratification column set determination module and a stratified sample data creation module are provided in the offline processing stage;
3) the workload column set parsing module parses the workload query statements, extracts the grouping column set of each query statement, counts the number of occurrences of each column set, analyzes the relationships between the column sets, and outputs the result to the data feature analysis module;
4) the data feature analysis module starts a MapReduce job to scan the raw data set, and outputs the data distribution to the coverage index calculation module;
5) the coverage index calculation module uses the data distribution information to calculate the total coverage index obtained when stratified sampling is performed on each candidate stratification column set;
6) the stratification column set determination module uses information such as the coverage index and the sample storage space to select the stratification column sets used to create the stratified samples;
7) a MapReduce job is started in the stratified sample data creation module: the Map function scans the raw data set and sends each tuple to the corresponding Reduce function according to the tuple's values on each stratification column set; the Reduce function updates the statistics and outputs the tuple data to the stratified sample data set;
8) a query parsing module, a sample selection module and a sample size determination module are provided in the online processing stage;
9) the query parsing module parses the query statement entered by the user and extracts its grouping column set;
10) the sample selection module selects the stratified sample data with the lowest sampling cost according to the grouping column set of the query statement;
11) the sample size determination module determines the sample size to be drawn from each sample stratum according to the sample size of the approximate query statement.
In step 3), the workload column set parsing module proceeds as follows: (1) all SQL query statements in the workload are parsed and the corresponding grouping column sets are extracted; (2) the number of occurrences of each grouping column set is counted, and the collection of candidate stratification column sets $CS = \{CS_1, CS_2, \ldots, CS_M\}$ is generated; (3) the relationship between any two stratification column sets $CS_i$ and $CS_j$ in $CS$ is analyzed: if $CS_i \subset CS_j$, then $CS_j - CS_i$ is stored in the set $RS$, and the result is output to the data feature analysis module.
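The three sub-steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the regular expression only handles simple single-block SQL, the function names `group_by_columns` and `analyse_workload` and the sample workload are hypothetical, and the subset test $CS_i \subset CS_j$ is an assumption inferred from the use of the difference $CS_j - CS_i$.

```python
import re
from collections import Counter
from itertools import permutations


def group_by_columns(sql):
    """Return the GROUP BY columns of a simple SQL statement as a frozenset."""
    match = re.search(
        r"group\s+by\s+(.+?)(?:\s+having\b|\s+order\s+by\b|\s+limit\b|;|$)",
        sql,
        flags=re.IGNORECASE | re.DOTALL,
    )
    if not match:
        return frozenset()
    return frozenset(col.strip().lower() for col in match.group(1).split(","))


def analyse_workload(queries):
    """Count each grouping column set and collect the differences CS_j - CS_i."""
    counts = Counter()
    for sql in queries:
        cols = group_by_columns(sql)
        if cols:
            counts[cols] += 1
    # Assumption: a difference is recorded only when CS_i is a proper subset of CS_j.
    rs = {cs_j - cs_i for cs_i, cs_j in permutations(counts, 2) if cs_i < cs_j}
    return counts, rs


# Hypothetical workload used only for illustration.
workload = [
    "SELECT city, COUNT(*) FROM orders GROUP BY city",
    "SELECT city, brand, SUM(price) FROM orders GROUP BY city, brand",
    "SELECT city, AVG(price) FROM orders GROUP BY city ORDER BY city",
]
counts, rs = analyse_workload(workload)
```

Here `counts` plays the role of the occurrence counts of the candidate sets in $CS$, and `rs` plays the role of $RS$. A production parser would use a real SQL grammar rather than a regular expression.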
In step 4), a MapReduce job is started to scan the raw data and analyze the data features by counting, for each column set in $RS$, the number of tuples of the raw data set that take each distinct value. The specific steps are as follows: (1) the Map function of the Map stage analyzes each tuple $r$ of the raw data set and forms key-value pairs, with the name of each column set in $RS$ as the key and the tuple's grouping attribute value on that column set as the value; (2) the Combine function of the Map stage merges the key-value pairs that belong to the same column set into one new key-value pair for output; (3) all key-value pairs belonging to the same column set are sent to the same Reduce function, which merges their values and counts the number of distinct attribute values on the column set, thereby producing the number of distinct values of the raw data set on each column set in $RS$.
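The Map, Combine and Reduce roles described above can be simulated in plain Python as a sketch. This runs in a single process rather than on a MapReduce cluster, and the function names, the two input splits and the column names are hypothetical illustrations.

```python
from collections import defaultdict


def map_phase(rows, rs):
    """Emit (column set, value on that column set) for every tuple and every set in RS."""
    for row in rows:
        for cols in rs:
            yield cols, tuple(row[c] for c in cols)


def combine(pairs):
    """Local merge: collapse all pairs with the same key into one (key, set of values) pair."""
    merged = defaultdict(set)
    for key, value in pairs:
        merged[key].add(value)
    return merged.items()


def reduce_phase(combined_outputs):
    """Merge the value sets per column set and count the distinct values."""
    values = defaultdict(set)
    for output in combined_outputs:
        for key, vals in output:
            values[key] |= vals
    return {key: len(vals) for key, vals in values.items()}


# Two hypothetical input splits of the raw data set.
split1 = [{"city": "Beijing", "brand": "A"}, {"city": "Beijing", "brand": "B"}]
split2 = [{"city": "Shanghai", "brand": "A"}, {"city": "Beijing", "brand": "A"}]
rs = [("brand",), ("city", "brand")]

distinct = reduce_phase([combine(map_phase(split, rs)) for split in (split1, split2)])
```

The combiner reduces the data shuffled between stages because each split forwards at most one pair per column set instead of one pair per tuple.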
In step 5), the coverage index calculation module calculates, for the case where a stratified sample is created with any column set $CS_i$ in $CS$ as the stratification column set, the coverage index $CI_{i,j}$ of each candidate stratification column set $CS_j$ in $CS$, as follows: if $CS_j = CS_i$, then $CI_{i,j} = 1$; if $CS_i \subset CS_j$, then $CI_{i,j} = 1/v_{i,j}$, where $v_{i,j}$ is the number of distinct values of the raw data set on the column set $CS_j - CS_i$; in all other cases, $CI_{i,j} = 0$.
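The three-case rule for the coverage index can be written as a small Python function. This is a sketch under one assumption: the condition for the middle case is taken to be that $CS_i$ is a proper subset of $CS_j$, which is what makes the difference $CS_j - CS_i$ non-empty. The name `cover_index` and the example counts are hypothetical.

```python
def cover_index(cs_i, cs_j, distinct_counts):
    """Coverage of grouping column set cs_j by a sample stratified on cs_i.

    distinct_counts maps a frozenset of columns to the number of distinct
    values the raw data set takes on those columns.
    """
    cs_i, cs_j = frozenset(cs_i), frozenset(cs_j)
    if cs_i == cs_j:
        return 1.0
    if cs_i < cs_j:  # assumed condition: cs_i is a proper subset of cs_j
        return 1.0 / distinct_counts[cs_j - cs_i]
    return 0.0


# Hypothetical statistics: the raw data set has 4 distinct brands.
distinct_counts = {frozenset({"brand"}): 4}
ci_same = cover_index({"city"}, {"city"}, distinct_counts)
ci_partial = cover_index({"city"}, {"city", "brand"}, distinct_counts)
ci_none = cover_index({"city", "brand"}, {"city"}, distinct_counts)
```

With these numbers a sample stratified on `city` covers a query grouped by `city, brand` with index 1/4, because each `city` stratum may contain up to four `brand` groups.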
In step 6), the stratification column sets are determined as follows: (1) for any candidate stratification column set $CS_i$, the total coverage index $f_i$ obtained when the stratified sample is created on $CS_i$ is calculated as $f_i = \sum_{j=1}^{M} P_j \cdot CI_{i,j}$, where $P_j$ is the probability that $CS_j$ occurs in the workload, calculated as $P_j = N_j / \sum_{k=1}^{M} N_k$; $N_j$ is the number of times $CS_j$ occurs in the workload; and $CI_{i,j}$ is the coverage index of the column set $CS_j$ when the stratified sample is created on $CS_i$; (2) the total coverage indices of all candidate stratification column sets are sorted in descending order, and the $X$ candidate stratification column sets with the largest total coverage index are selected as the column sets finally used to create the stratified samples, where $X$ is determined by the amount of space the system has for storing samples.
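The ranking step can be sketched in Python as follows. The formulas are an assumption rather than a quotation: the total coverage index is taken to be the workload-probability-weighted sum of the per-set coverage indices, and the probability of a column set is taken to be its share of all occurrences in the workload. The helper names and the example numbers are hypothetical.

```python
def cover_index(cs_i, cs_j, distinct_counts):
    """Coverage index CI_{i,j}, assuming the partial case is cs_i being a proper subset of cs_j."""
    if cs_i == cs_j:
        return 1.0
    if cs_i < cs_j:
        return 1.0 / distinct_counts[cs_j - cs_i]
    return 0.0


def total_cover_index(cs_i, counts, distinct_counts):
    """Assumed form: f_i = sum over j of P_j * CI_{i,j}, with P_j = N_j / sum of all N_k."""
    total = sum(counts.values())
    return sum(
        (n_j / total) * cover_index(cs_i, cs_j, distinct_counts)
        for cs_j, n_j in counts.items()
    )


def choose_stratification_sets(counts, distinct_counts, x):
    """Return the x candidate column sets with the largest total coverage index."""
    ranked = sorted(
        counts,
        key=lambda cs: total_cover_index(cs, counts, distinct_counts),
        reverse=True,
    )
    return ranked[:x]


# Hypothetical workload counts N_j and distinct-value statistics.
city, brand = frozenset({"city"}), frozenset({"brand"})
city_brand = frozenset({"city", "brand"})
counts = {city: 6, city_brand: 3, brand: 1}
distinct_counts = {brand: 4, city: 5}

chosen = choose_stratification_sets(counts, distinct_counts, x=2)
f_city = total_cover_index(city, counts, distinct_counts)
```

In this example `f_city` is 0.6 + 0.3/4 = 0.675, so `city` ranks first. The value of `x` stands in for the limit that the sample storage space imposes.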
In step 7), a MapReduce job is started to create the stratified samples, as follows: (1) the Map function of the Map stage scans the raw data set, analyzes each tuple $r$ and generates key-value pairs, where the key is a structure composed of the column set name and the tuple's value on that column set, the column set names come from the output of step 6), and the value is the entire tuple; (2) key-value pairs that belong to the same column set and have the same value on that column set are sent to the same Reduce function, which counts the number of tuples belonging to the same sample stratum and outputs the tuples to a file, forming the stratified sample file.
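The key structure and the per-stratum reduce can be sketched in single-process Python. This is an illustration only: an in-memory dictionary stands in for the shuffle and for the output files, and the function names and rows are hypothetical.

```python
from collections import defaultdict


def map_tuple(row, stratification_sets):
    """Emit ((column set, value on the column set), whole tuple) for each chosen column set."""
    for cols in stratification_sets:
        key = (cols, tuple(row[c] for c in cols))
        yield key, row


def build_stratified_samples(rows, stratification_sets):
    """Group tuples by stratum key, then record each stratum's size and tuples."""
    strata = defaultdict(list)  # stands in for the shuffle between Map and Reduce
    for row in rows:
        for key, value in map_tuple(row, stratification_sets):
            strata[key].append(value)
    # Reduce: one call per stratum, keeping the tuple count as the stratum statistic.
    return {key: {"count": len(tuples), "tuples": tuples} for key, tuples in strata.items()}


# Hypothetical raw data set and stratification column sets.
rows = [
    {"city": "Beijing", "brand": "A", "price": 10},
    {"city": "Beijing", "brand": "B", "price": 20},
    {"city": "Shanghai", "brand": "A", "price": 30},
]
samples = build_stratified_samples(rows, [("city",), ("city", "brand")])
```

Each tuple is emitted once per stratification column set, so the stored data grows with the number of column sets chosen in step 6).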
In step 9), the query statement entered online by the user is parsed and the grouping column set $CS_q$ is extracted. Step 10) then selects the stratified sample data with the lowest sampling cost, as follows: if there is a sample $S(CS_s)$ whose stratification column set satisfies $CS_s = CS_q$, that sample is selected; otherwise, the sample $S(CS_s)$ is selected, where $CS_s$ is the smallest column set satisfying $CS_q \subset CS_s$. Given the total sample size $N$ of the user's approximate query statement, the sample size determination module of step 11) determines the sample size to be drawn from each sample stratum. If $CS_s = CS_q$, the sample size drawn from each sample stratum is
$$n_j = \max\left(\frac{N}{T},\ N \cdot \frac{|G_j|}{|R|}\right)$$
where $T$ is the number of sample strata, $|G_j|$ is the size of each sample stratum, and $|R|$ is the size of the raw data set. If $CS_q \subset CS_s$, the sample size drawn from each sample stratum is determined in the following steps: (1) the sample strata that have the same value on the column set $CS_q$ are merged and treated as one large sample stratum, and the sample size drawn from each large sample stratum is [formula not reproduced];
(2) within each large sample stratum $G_i$, the sample size drawn from each small sample stratum $G_{ij}$ is [formula not reproduced].
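The sample selection rule and the sizing rule for the case $CS_s = CS_q$ can be sketched in Python. Several points are assumptions rather than statements of the patent: "smallest" column set is read as the covering column set with the fewest columns, the per-stratum size is taken as the larger of the equal share $N/T$ and the proportional share $N \cdot |G_j| / |R|$, and the rounding up and the cap at the stratum size are added for practicality. The case where $CS_q$ is a proper subset of $CS_s$ is not sketched because its formulas are not available. All names and numbers are hypothetical.

```python
import math


def select_sample(cs_q, available):
    """Pick the stratification column set that covers cs_q with the fewest columns."""
    cs_q = frozenset(cs_q)
    candidates = [frozenset(cs) for cs in available if cs_q <= frozenset(cs)]
    if not candidates:
        return None
    # An exact match, if present, is automatically the smallest covering set.
    return min(candidates, key=len)


def stratum_sample_sizes(n_total, stratum_sizes, raw_size):
    """Per-stratum size: max of equal share and proportional share (assumed rule)."""
    t = len(stratum_sizes)
    sizes = {}
    for name, size in stratum_sizes.items():
        wanted = max(n_total / t, n_total * size / raw_size)
        # Added assumption: round up, and never draw more than the stratum holds.
        sizes[name] = min(size, math.ceil(wanted))
    return sizes


available = [("city",), ("city", "brand"), ("city", "brand", "year")]
picked = select_sample({"brand"}, available)

sizes = stratum_sample_sizes(
    n_total=100,
    stratum_sizes={"a": 900, "b": 90, "c": 10},
    raw_size=1000,
)
```

In the example the large stratum `a` receives its proportional share of 90, while the small strata `b` and `c` are lifted to the equal share, which is the behaviour that protects small groups under data skew.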
The above embodiment is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make several improvements and equivalent substitutions without departing from the principle of the present invention, and technical solutions obtained by making such improvements and equivalent substitutions to the claims of the present invention all fall within the protection scope of the present invention.

Claims (7)

1. A multi-dimensional dynamic sampling method for approximate query in a cloud computing environment, characterized in that the method comprises the following steps:
1) the dynamic sampling system comprises an offline processing stage for creating stratified samples and an online processing stage for dynamically selecting samples;
2) a workload column set parsing module, a data feature analysis module, a coverage index calculation module, a stratification column set determination module and a stratified sample data creation module are provided in the offline processing stage;
3) the workload column set parsing module parses the workload query statements, extracts the grouping column set of each workload query statement, counts the number of occurrences of each grouping column set, generates the collection $CS$ of candidate stratification column sets, analyzes the relationships between the candidate stratification column sets $CS_i$ in it, and outputs the result to the data feature analysis module;
4) the data feature analysis module starts a MapReduce job to scan the raw data set, and outputs the data distribution of the raw data set to the coverage index calculation module;
5) the coverage index calculation module uses the data distribution to calculate the total coverage index obtained when stratified sampling is performed on each candidate stratification column set $CS_i$;
6) the stratification column set determination module uses information such as the coverage index and the sample storage space to select the stratification column sets used to create the stratified samples;
7) a MapReduce job is started in the stratified sample data creation module to create the stratified samples: the Map function scans the raw data set and sends each tuple to the corresponding Reduce function according to the tuple's values on each stratification column set used to create the stratified samples; the Reduce function updates the statistics and outputs the tuple data to the stratified sample data set;
8) a query parsing module, a sample selection module and a sample size determination module are provided in the online processing stage;
9) the query parsing module parses the query statement entered online by the user and extracts the grouping column set $CS_q$ of each user query statement;
10) the sample selection module selects, from the stratified sample data sets, the stratified sample data with the lowest sampling cost according to the grouping column set $CS_q$ of the user query statement;
11) the sample size determination module determines the sample size to be drawn from each sample stratum according to the sample size of the approximate query statement.
2. The method according to claim 1, characterized in that in step 3), the workload column set parsing module parses the workload query statements through the following steps:
(1) all SQL query statements in the workload query statements are parsed and the corresponding grouping column sets are extracted;
(2) the number of occurrences of each grouping column set is counted, and the collection of candidate stratification column sets $CS = \{CS_1, CS_2, \ldots, CS_M\}$ is generated;
(3) the relationship between any two candidate stratification column sets $CS_i$ and $CS_j$ in $CS$ is analyzed: if $CS_i \subset CS_j$, then $CS_j - CS_i$ is stored in the set $RS$, and the result is output to the data feature analysis module.
3. The method according to claim 1, characterized in that in step 4), starting a MapReduce job to scan the raw data and analyze the data features comprises the following steps:
(1) the Map function of the Map stage analyzes each tuple $r$ of the raw data set and forms key-value pairs, with the name of each column set in $RS$ as the key and the tuple's grouping attribute value on the corresponding column set as the value;
(2) the Combine function of the Map stage merges the key-value pairs that belong to the same column set into one new key-value pair for output;
(3) all key-value pairs belonging to the same column set are sent to the same Reduce function, which merges their values and counts the number of distinct attribute values on the column set, thereby producing the number of distinct values of the raw data set on each column set in $RS$.
4. The method according to claim 1, characterized in that in step 5), the coverage index calculation module calculates, for the case where a stratified sample is created with any column set $CS_i$ in $CS$ as the stratification column set, the coverage index $CI_{i,j}$ of each candidate stratification column set $CS_j$ in $CS$, as follows: if $CS_j = CS_i$, then $CI_{i,j} = 1$; if $CS_i \subset CS_j$, then $CI_{i,j} = 1/v_{i,j}$, where $v_{i,j}$ is the number of distinct values of the raw data set on $CS_j - CS_i$; in all other cases, $CI_{i,j} = 0$.
5. The method according to claim 1, characterized in that in step 6), the stratification column set determination module using information such as the coverage index and the sample storage space to select the stratification column sets used to create the stratified samples comprises the following steps:
(1) for any candidate stratification column set $CS_i$, the total coverage index $f_i$ obtained when the stratified sample is created on $CS_i$ is calculated as $f_i = \sum_{j=1}^{M} P_j \cdot CI_{i,j}$, where $P_j$ is the probability that $CS_j$ occurs in the workload query statements, $P_j = N_j / \sum_{k=1}^{M} N_k$; $N_j$ is the number of times $CS_j$ occurs in the workload query statements; and $CI_{i,j}$ is the coverage index of $CS_j$ when the stratified sample is created on $CS_i$;
(2) the total coverage indices of all candidate stratification column sets are sorted in descending order, and the $X$ candidate stratification column sets with the largest total coverage index are selected as the grouping column sets finally used to create the stratified samples, where $X$ is determined by the amount of space the dynamic sampling system has for storing samples.
6. The method according to claim 1, characterized in that in step 7), starting a MapReduce job to create the stratified samples comprises the following steps:
(1) the Map function of the Map stage scans the raw data set, analyzes each tuple $r$ and generates key-value pairs, where the key is a structure composed of the column set name and the tuple's value on that column set, the column set names come from the output of step 6), and the value is the entire tuple;
(2) key-value pairs that belong to the same column set and have the same value on that grouping column set are sent to the same Reduce function, which counts the number of tuples belonging to the same sample stratum and outputs the tuples to a file, forming the stratified sample file.
7. The method according to claim 1, characterized in that in step 9), the query statement entered online by the user is parsed and the grouping column set $CS_q$ is extracted; step 10) then selects the stratified sample data with the lowest sampling cost, as follows: if there is a sample $S(CS_s)$ whose stratification column set satisfies $CS_s = CS_q$, that sample is selected; otherwise, the sample $S(CS_s)$ is selected, where $CS_s$ is the smallest column set satisfying $CS_q \subset CS_s$; given the total sample size $N$ of the user's approximate query statement, the sample size determination module of step 11) determines the sample size to be drawn from each sample stratum; if $CS_s = CS_q$, the sample size drawn from each sample stratum is
$$n_j = \max\left(\frac{N}{T},\ N \cdot \frac{|G_j|}{|R|}\right)$$
where $T$ is the number of groups, $|G_j|$ is the size of each sample stratum, and $|R|$ is the size of the raw data set; if $CS_q \subset CS_s$, the sample size drawn from each sample stratum is determined in the following steps: (1) the sample strata that have the same value on the column set $CS_q$ are merged and treated as one large sample stratum, and the sample size drawn from each large sample stratum is [formula not reproduced];
(2) within each large sample stratum $G_i$, the sample size drawn from each small sample stratum $G_{ij}$ is [formula not reproduced].
CN201810025016.6A 2018-01-11 2018-01-11 Multi-dimensional dynamic sampling method for approximate query in cloud computing environment Active CN108256028B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810025016.6A CN108256028B (en) 2018-01-11 2018-01-11 Multi-dimensional dynamic sampling method for approximate query in cloud computing environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810025016.6A CN108256028B (en) 2018-01-11 2018-01-11 Multi-dimensional dynamic sampling method for approximate query in cloud computing environment

Publications (2)

Publication Number Publication Date
CN108256028A (en) 2018-07-06
CN108256028B (en) 2021-09-28

Family

ID=62726068

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810025016.6A Active CN108256028B (en) 2018-01-11 2018-01-11 Multi-dimensional dynamic sampling method for approximate query in cloud computing environment

Country Status (1)

Country Link
CN (1) CN108256028B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1081610A2 (en) * 1999-09-03 2001-03-07 Cognos Incorporated Methods for transforming metadata models
CN102521386A (en) * 2011-12-22 2012-06-27 清华大学 Method for grouping space metadata based on cluster storage
EP3035211A1 (en) * 2014-12-18 2016-06-22 Business Objects Software Ltd. Visualizing large data volumes utilizing initial sampling and multi-stage calculations
CN106095951A (en) * 2016-06-13 2016-11-09 哈尔滨工程大学 Data space multi-dimensional indexing method based on load balancing and inquiry log
CN106528815A (en) * 2016-11-14 2017-03-22 中国人民解放军理工大学 Method and system for probabilistic aggregation query of road network moving objects

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YINGJIE SHI: "You Can Stop Early with COLA: Online Processing of Aggregate Queries in the Cloud", 《CIKM '12: PROCEEDINGS OF THE 21ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE 》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117435647A (en) * 2023-12-20 2024-01-23 北京遥感设备研究所 Approximate query method, device and equipment based on incremental sampling
CN117435647B (en) * 2023-12-20 2024-03-29 北京遥感设备研究所 Approximate query method, device and equipment based on incremental sampling

Also Published As

Publication number Publication date
CN108256028B (en) 2021-09-28

Similar Documents

Publication Publication Date Title
US10354170B2 (en) Method and apparatus of establishing image search relevance prediction model, and image search method and apparatus
US10216793B2 (en) Optimization of continuous queries in hybrid database and stream processing systems
CN103927346B (en) Query connection method on basis of data volumes
CN107193967A (en) A kind of multi-source heterogeneous industry field big data handles full link solution
US20180276264A1 (en) Index establishment method and device
US20120117054A1 (en) Query Analysis in a Database
CN105550374A (en) Random forest parallelization machine studying method for big data in Spark cloud service environment
CN110888859B (en) Connection cardinality estimation method based on combined deep neural network
WO2022257436A1 (en) Data warehouse construction method and system based on wireless communication network, and device and medium
CN110909066B (en) Streaming data processing method based on SparkSQL and RestAPI
CN111259933B (en) High-dimensional characteristic data classification method and system based on distributed parallel decision tree
CN109815283A (en) A kind of heterogeneous data source visual inquiry method
CN109033314A (en) The Query method in real time and system of extensive knowledge mapping in the case of memory-limited
US10467276B2 (en) Systems and methods for merging electronic data collections
CN113868230B (en) Large-scale connection optimization method based on Spark computing framework
CN111046059B (en) Low-efficiency SQL statement analysis method and system based on distributed database cluster
CN110597876B (en) Approximate query method for predicting future query based on offline learning historical query
CN102222108A (en) Scripting method and device
JP2019527441A (en) Distributed Computing Framework and Distributed Computing Method (DISTRIBUTED COMPUTING FRAMEWORK AND DISTRIBUTED COMPUTING METHOD)
CN108256028A (en) Multi-dimensional dynamic sampling method for approximate query in cloud computing environment
CN107679097A (en) A kind of distributed data processing method, system and storage medium
CN116090413A (en) Serialization-based general RDF data compression method
CN110347755A (en) A kind of big data multidimensional data analysis method and system based on Hadoop and HBase
CN111629217B (en) XGboost algorithm-based VOD (video on demand) service cache optimization method in edge network environment
CN113836395B (en) Service developer on-demand recommendation method and system based on heterogeneous information network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant