CN108256028A - Multi-dimensional dynamic sampling method for approximate query in cloud computing environment - Google Patents
- Publication number
- CN108256028A
- Authority
- CN
- China
- Prior art keywords
- sample
- collection
- stratified
- row
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Fuzzy Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A multi-dimensional dynamic sampling method for approximate query in a cloud computing environment, comprising the following steps. The dynamic sampling system comprises an offline processing stage for creating stratified samples and an online processing stage for dynamically selecting a sample. In the offline processing stage, a workload column set parsing module parses the workload query statements; a data feature analysis module analyzes the data features; a cover index computing module computes the total cover index; a stratification column set determining module selects the stratification column sets used to create stratified samples; and a stratified sample data creation module creates the stratified samples. In the online processing stage, a query parsing module parses the user's query statement; a sample selection module selects the stratified sample data with the lowest sampling cost; and a sample size determining module determines the sample size to draw from each sample stratum. The invention effectively solves the problem of inaccurate estimates for small groups caused by data skew in approximate query, and reduces sampling cost under a limited sample storage space.
Description
Technical field
The present invention relates to a data sampling method for approximate query, and in particular to a dynamic sampling method for multi-query workloads in a cloud computing environment.
Background technology
Cloud computing environments offer a highly scalable and cost-effective way to manage big data and have become the mainstream platform for doing so. Even in a cloud computing environment, however, queries over big data cannot be processed in real time or at the speed that interactive use requires. For ad hoc queries and exploratory data analysis, obtaining an estimated result quickly is more useful than spending large amounts of time and computing resources to obtain a fully accurate one. Approximate query processing estimates query results from sample data, which greatly reduces query execution time and is therefore important for big data analysis.
Approximate query processing based on sample data was proposed by Acharya et al. and uses uniform random sampling, in which every tuple is drawn with equal probability. Uniform random sampling suits uniformly distributed data and has the advantage of being simple to apply, but when data skew produces small groups in a group-by aggregate query, it severely reduces the accuracy of the estimate, to the point where the estimate loses its meaning. Surajit et al. proposed a weighted sampling method that counts how many query predicates each tuple satisfies and uses that count as the tuple's sampling weight: the more predicates a tuple satisfies, the higher its probability of being sampled. Weighted sampling alleviates, to some extent, the inaccuracy that data skew causes under uniform random sampling, but its effectiveness depends entirely on the workload used to compute the sampling weights; when the actual query statements differ from that workload, the weights become meaningless. Swarup et al. proposed congressional sampling, which builds one general-purpose sample for all possible grouping columns and queries. The effectiveness of that sample, however, declines as the number of queries grows, and its preprocessing time grows exponentially with the number of columns, so it cannot cope with multi-query scenarios. In general, the above techniques assume that the types of query statements are few and fixed, so they scale poorly in practical applications. Moreover, they were proposed for relational databases and are not suited to cloud computing environments.
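The small-group problem described above can be made concrete with a short calculation. The sketch below assumes independent (Bernoulli) uniform sampling at a fixed rate, which is a simplifying assumption and not something the text specifies; the function name `miss_probability` is hypothetical.

```python
def miss_probability(group_size: int, rate: float) -> float:
    """Probability that uniform sampling at `rate` draws no tuple at all
    from a group containing `group_size` tuples.

    Assumes each tuple is kept independently with probability `rate`.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    return (1.0 - rate) ** group_size


if __name__ == "__main__":
    # A small group is very likely to be missed, a large one is not.
    for size in (10, 100, 1000):
        print(size, miss_probability(size, 0.01))
```

Under this assumption, a group that is missed entirely yields no estimate at all for its aggregate, which is the failure mode that stratified sampling is meant to avoid.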
Summary of the invention
In view of the problems in the prior art, the object of the present invention is to provide a multi-dimensional dynamic sampling method for approximate query in a cloud computing environment. The method is used in the data preprocessing stage of approximate query processing: it preprocesses the raw data set and generates multiple stratified sample data sets. When a query statement arrives, it dynamically selects a sample data set according to the content of the query statement and the sample size the query requires, and gives the sample size to draw from each sample stratum. The method effectively solves the problem of inaccurate estimates for small groups caused by data skew in approximate query, and reduces sampling cost under a limited sample storage space.
To achieve the above object, the technical solution of the present invention is as follows:
A multi-dimensional dynamic sampling method for approximate query in a cloud computing environment, comprising the following steps:
1) the dynamic sampling system comprises an offline processing stage for creating stratified samples and an online processing stage for dynamically selecting a sample;
2) a workload column set parsing module, a data feature analysis module, a cover index computing module, a stratification column set determining module, and a stratified sample data creation module are provided in the offline processing stage;
3) the workload column set parsing module parses the workload query statements, extracts the grouping column set of each workload query statement, counts the number of times each grouping column set occurs, generates the set $CS$ of candidate stratification column sets, analyzes the relationships between the candidate stratification column sets $CS_i$ in it, and outputs the result to the data feature analysis module;
4) the data feature analysis module starts a MapReduce job to scan the raw data set and outputs the data distribution of the raw data set to the cover index computing module;
5) the cover index computing module uses the data distribution to compute the total cover index obtained when stratified sampling is performed on each candidate stratification column set $CS_i$;
6) the stratification column set determining module uses the cover index, the sample storage space, and other information to select the stratification column sets used to create stratified samples;
7) the stratified sample data creation module starts a MapReduce job to create the stratified samples: the Map function scans the raw data set and sends each tuple to the corresponding Reduce function according to the tuple's values on each stratification column set used to create stratified samples, and the Reduce function updates the statistics and writes the tuple data to the stratified sample data set;
8) a query parsing module, a sample selection module, and a sample size determining module are provided in the online processing stage;
9) the query parsing module parses the query statement that the user enters online and extracts the grouping column set $CS_q$ of each user query statement;
10) the sample selection module selects, from the stratified sample data sets, the stratified sample data with the lowest sampling cost according to the grouping column set $CS_q$ of the user query statement;
11) the sample size determining module determines the sample size to draw from each sample stratum according to the sample size of the approximate query statement.
The present invention has the following advantages:
1. By analyzing workload features and data distribution features, the invention determines the column sets used for stratification and, based on those stratification column sets and the sample storage space, creates multiple multi-dimensional stratified samples, thereby solving the problem of inaccurate estimates caused by data skew in approximate query.
2. When determining the column sets used to create stratified samples, the invention uses the cover index to express the cost that different query statements incur when they perform stratified sampling on a given stratified sample; the invention thereby lays a foundation for extending the query workload.
3. Given a stratified sample and a total sample size, when determining the sample size to draw from each sample stratum, the invention proposes a solution for each relationship between the query's grouping column set $CS_q$ and the sample's stratification column set $CS_s$: (1) when $CS_s = CS_q$, for each sample stratum it takes the larger of the evenly divided sample size and the sample size proportional to the stratum size, which avoids undersized samples for both small groups and large groups; (2) when $CS_q \subset CS_s$, it first merges the sample strata that share the same values on $CS_q$ into one large stratum and determines that stratum's sample size, then performs stratified sampling again within each large stratum, thereby dynamically determining the sample size of each sample stratum.
Description of the drawings
Fig. 1 is a framework diagram of the multi-dimensional dynamic sampling method for approximate query in a cloud computing environment.
Detailed description
The present invention is further described below with reference to an embodiment and the accompanying drawing.
A multi-dimensional dynamic sampling method for approximate query in a cloud computing environment comprises the following steps:
1) the dynamic sampling system comprises an offline processing stage for creating stratified samples and an online processing stage for dynamically selecting a sample;
2) a workload column set parsing module, a data feature analysis module, a cover index computing module, a stratification column set determining module, and a stratified sample data creation module are provided in the offline processing stage;
3) the workload column set parsing module parses the workload query statements, extracts the grouping column set of each query statement, counts the number of times each column set occurs, analyzes the relationships between the column sets, and outputs the result to the data feature analysis module;
4) the data feature analysis module starts a MapReduce job to scan the raw data set and outputs the data distribution to the cover index computing module;
5) the cover index computing module uses the data distribution information to compute the total cover index obtained when stratified sampling is performed on each candidate stratification column set;
6) the stratification column set determining module uses the cover index, the sample storage space, and other information to select the stratification column sets used to create stratified samples;
7) the stratified sample creation module starts a MapReduce job: the Map function scans the raw data set and sends each tuple to the corresponding Reduce function according to the tuple's values on each stratification column set, and the Reduce function updates the statistics and writes the tuple data to the stratified sample data set;
8) a query parsing module, a sample selection module, and a sample size determining module are provided in the online processing stage;
9) the query parsing module parses the query statement entered by the user and extracts its grouping column set;
10) the sample selection module selects the stratified sample data with the lowest sampling cost according to the grouping column set of the query statement;
11) the sample size determining module determines the sample size to draw from each sample stratum according to the sample size of the approximate query statement.
In step 3), the workload column set parsing module proceeds as follows: (1) it parses all SQL query statements in the workload and extracts the grouping column set of each; (2) it counts the number of times each grouping column set occurs and generates the set of candidate stratification column sets $CS = \{CS_1, CS_2, \ldots, CS_M\}$; (3) it analyzes the relationship between any two stratification column sets $CS_i$ and $CS_j$ in $CS$: if $CS_i \subset CS_j$, it stores $CS_j - CS_i$ in the set $RS$, and it outputs the result to the data feature analysis module.
In step 4), a MapReduce job is started to scan the raw data and analyze the data features, counting the number of distinct values that the raw data set takes on each column set in $RS$. The specific steps are as follows: (1) the Map function of the Map stage analyzes each tuple $r$ of the raw data set and emits key-value pairs, with the name of each column set in $RS$ as the key and the tuple's grouping attribute values on that column set as the value; (2) the Combine function of the Map stage merges the key-value pairs that belong to the same column set into one new key-value pair and outputs it; (3) all key-value pairs that belong to the same column set are sent to the same Reduce function, which merges their values and counts the number of distinct attribute values on that column set, thereby producing the number of distinct values that the raw data set takes on each column set in $RS$.
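The Map, Combine, and Reduce roles of step 4) can be imitated in plain Python to show the data flow. This is only an in-memory sketch under stated assumptions: rows are dictionaries, each input split is a list of rows, and the names `map_phase`, `combine`, and `reduce_phase` are hypothetical rather than Hadoop API calls.

```python
from collections import defaultdict


def map_phase(rows, column_sets):
    """Emit (column-set name, value tuple) for every row and column set in RS."""
    for row in rows:
        for cols in column_sets:
            key = tuple(sorted(cols))  # canonical column-set name
            yield key, tuple(row[c] for c in key)


def combine(pairs):
    """Combiner: merge the pairs of one split into one pair per column set,
    whose value is the set of value tuples seen locally."""
    merged = defaultdict(set)
    for key, value in pairs:
        merged[key].add(value)
    return list(merged.items())


def reduce_phase(combined_outputs):
    """Reducer: union the local sets and count distinct values per column set."""
    distinct = defaultdict(set)
    for output in combined_outputs:
        for key, values in output:
            distinct[key] |= values
    return {key: len(values) for key, values in distinct.items()}


if __name__ == "__main__":
    rows = [{"a": 1, "b": "x"}, {"a": 1, "b": "y"}, {"a": 2, "b": "x"}]
    rs = [frozenset({"b"})]
    splits = [rows[:2], rows[2:]]
    print(reduce_phase(combine(map_phase(s, rs)) for s in splits))
```

Merging into sets in the combiner keeps the shuffled data small when a column set has few distinct values, which is presumably the purpose of the Combine step in the text.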
In step 5), the cover index computing module computes, for the case where any column set $CS_i$ in $CS$ is used as the stratification column set to create a stratified sample, the cover index $CI_{i,j}$ of each candidate stratification column set $CS_j$ in $CS$, as follows: if $CS_j = CS_i$, then $CI_{i,j} = 1$; if $CS_i \subset CS_j$, then $CI_{i,j} = 1/v_{i,j}$, where $v_{i,j}$ is the number of distinct values that the raw data set takes on the column set $CS_j - CS_i$; otherwise, $CI_{i,j} = 0$.
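The three cases of the cover index translate directly into a small function. The sketch below assumes the middle case applies when $CS_i$ is a proper subset of $CS_j$, and that the distinct-value counts are available in a dictionary keyed by the sorted column names of $CS_j - CS_i$; the name `cover_index` and that dictionary layout are illustrative choices.

```python
def cover_index(cs_i: frozenset, cs_j: frozenset, distinct_counts: dict) -> float:
    """Cover index CI_{i,j} of column set cs_j when the sample is stratified on cs_i.

    distinct_counts maps a sorted tuple of column names to the number of
    distinct values the raw data set takes on those columns.
    """
    if cs_j == cs_i:
        return 1.0
    if cs_i < cs_j:  # cs_i is a proper subset of cs_j
        v = distinct_counts[tuple(sorted(cs_j - cs_i))]
        return 1.0 / v
    return 0.0
```

For example, with a sample stratified on `{a}` and a query grouping on `{a, b}` where `b` takes 4 distinct values, the function returns 0.25.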
In step 6), the stratification column sets are determined as follows: (1) for any candidate stratification column set $CS_i$, the total cover index $f_i$ obtained when a stratified sample is created on $CS_i$ is computed as $f_i = \sum_{j=1}^{M} P_j \, CI_{i,j}$, where $P_j = N_j / \sum_{k=1}^{M} N_k$ is the probability that $CS_j$ occurs in the workload, $N_j$ is the number of times $CS_j$ occurs in the workload, and $CI_{i,j}$ is the cover index of column set $CS_j$ when the stratified sample is created on $CS_i$; (2) the total cover indices of all candidate stratification column sets are sorted in descending order, and the X candidate stratification column sets with the largest total cover index are selected as the column sets finally used to create stratified samples, where X is determined by the amount of space the system has for storing samples.
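The selection in step 6) can be sketched as a weighted sum followed by a top-X cut. The formulas are not legible in this text, so the sketch assumes the total cover index is the workload-probability-weighted sum of cover indices, with each probability taken as the column set's share of all occurrences. The names `total_cover_indices` and `select_top` are hypothetical, and the cover index is passed in as a callable so the block stays self-contained.

```python
def total_cover_indices(counts: dict, ci) -> dict:
    """Return f_i for every candidate column set.

    counts maps each candidate column set to its number of occurrences N_j;
    ci(cs_i, cs_j) returns the cover index CI_{i,j}.
    Assumes f_i = sum_j P_j * CI_{i,j} with P_j = N_j / sum_k N_k.
    """
    total = sum(counts.values())
    return {
        cs_i: sum(counts[cs_j] / total * ci(cs_i, cs_j) for cs_j in counts)
        for cs_i in counts
    }


def select_top(counts: dict, ci, x: int) -> list:
    """Return the x candidates with the largest total cover index, best first."""
    f = total_cover_indices(counts, ci)
    return sorted(f, key=f.get, reverse=True)[:x]
```

In this sketch X is simply an argument; how the storage budget is converted into X is not described in the text and is left to the caller.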
In step 7), a MapReduce job is started to create the stratified samples, as follows: (1) the Map function of the Map stage scans the raw data set, analyzes each tuple $r$, and emits a key-value pair whose key is a structure composed of a column set name and the tuple's values on that column set, where the column set names come from the output of step 6), and whose value is the entire tuple; (2) key-value pairs that belong to the same column set and have the same values on that column set are sent to the same Reduce function, which counts the number of tuples belonging to the same sample stratum and writes the tuples to a file, forming the stratified sample file.
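The key structure used in step 7) can be illustrated with an in-memory stand-in for the MapReduce job. This sketch assumes rows are dictionaries and keeps every tuple of each stratum in a list instead of writing files; whether the real job caps the tuples stored per stratum is not stated in the text. The names `map_phase` and `reduce_phase` are hypothetical.

```python
from collections import defaultdict


def map_phase(rows, column_sets):
    """Emit ((column-set name, values on that column set), whole row)."""
    for row in rows:
        for cols in column_sets:
            names = tuple(sorted(cols))
            key = (names, tuple(row[c] for c in names))
            yield key, row


def reduce_phase(pairs):
    """Group rows into strata and record the tuple count of each stratum."""
    strata = defaultdict(list)
    for key, row in pairs:
        strata[key].append(row)
    sizes = {key: len(rows) for key, rows in strata.items()}
    return dict(strata), sizes


if __name__ == "__main__":
    rows = [{"a": 1, "b": "x"}, {"a": 1, "b": "y"}, {"a": 2, "b": "x"}]
    strata, sizes = reduce_phase(map_phase(rows, [frozenset({"a"})]))
    print(sizes)
```

The per-stratum counts returned here correspond to the statistics that the online stage later needs when it allocates a sample size to each stratum.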
In step 9), the query statement that the user enters online is parsed and its grouping column set $CS_q$ is extracted. Step 10) then selects the stratified sample data with the lowest sampling cost, as follows: if there is a sample $S(CS_s)$ whose stratification column set satisfies $CS_s = CS_q$, that sample is selected; otherwise, the sample $S(CS_s)$ is selected where $CS_s$ is the smallest column set satisfying $CS_q \subset CS_s$. Given the total sample size $N$ of the user's approximate query statement, the sample size determining module of step 11) determines the sample size to draw from each sample stratum. If $CS_s = CS_q$, the sample size drawn from each sample stratum $G_j$ is
$$n_j = \max\left(\frac{N}{T},\; N \cdot \frac{|G_j|}{|R|}\right),$$
where $T$ is the number of sample strata, $|G_j|$ is the size of each sample stratum, and $|R|$ is the size of the raw data set. If $CS_q \subset CS_s$, the sample size drawn from each sample stratum is determined in two steps: (1) the sample strata that share the same values on $CS_q$ are merged and treated as one large stratum, and the sample size drawn from each large stratum is [formula missing]; (2) within each large stratum $G_i$, the sample size drawn from each small stratum $G_{ij}$ is [formula missing].
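Steps 10) and 11) for the equal-column-set case can be sketched as follows. Several points are assumptions rather than statements from the text: the superset condition is taken to be $CS_q \subset CS_s$, "smallest" is read as "fewest columns", and the per-stratum size is taken as the larger of the even share $N/T$ and the proportional share $N\,|G_j|/|R|$, as described in the advantages section. The names `select_sample` and `allocate_equal_case` are hypothetical. The superset case is omitted because its formulas are not available.

```python
def select_sample(cs_q: frozenset, available):
    """Pick the stratification column set of the sample to use for a query.

    Returns cs_q itself if a sample stratified on exactly cs_q exists,
    otherwise the proper superset with the fewest columns, otherwise None.
    """
    available = list(available)
    if cs_q in available:
        return cs_q
    supersets = [cs for cs in available if cs_q < cs]
    return min(supersets, key=len) if supersets else None


def allocate_equal_case(n: float, stratum_sizes: dict, population_size: int) -> dict:
    """Sample size per stratum when CS_s = CS_q.

    Each stratum gets max(n / T, n * |G_j| / |R|), unrounded.
    """
    t = len(stratum_sizes)
    return {
        key: max(n / t, n * size / population_size)
        for key, size in stratum_sizes.items()
    }
```

Because the maximum is taken stratum by stratum, the allocations in this sketch can add up to more than $N$; for instance `allocate_equal_case(100, {"g1": 900, "g2": 100}, 1000)` gives 90 and 50.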
The above embodiment is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make improvements and equivalent substitutions without departing from the principle of the invention, and technical solutions obtained by applying such improvements and equivalent substitutions to the claims of the invention all fall within the scope of protection of the invention.
Claims (7)
1. A multi-dimensional dynamic sampling method for approximate query in a cloud computing environment, characterized in that the method comprises the following steps:
1) the dynamic sampling system comprises an offline processing stage for creating stratified samples and an online processing stage for dynamically selecting a sample;
2) a workload column set parsing module, a data feature analysis module, a cover index computing module, a stratification column set determining module, and a stratified sample data creation module are provided in the offline processing stage;
3) the workload column set parsing module parses the workload query statements, extracts the grouping column set of each workload query statement, counts the number of times each grouping column set occurs, generates the set $CS$ of candidate stratification column sets, analyzes the relationships between the candidate stratification column sets $CS_i$ in it, and outputs the result to the data feature analysis module;
4) the data feature analysis module starts a MapReduce job to scan the raw data set and outputs the data distribution of the raw data set to the cover index computing module;
5) the cover index computing module uses the data distribution to compute the total cover index obtained when stratified sampling is performed on each candidate stratification column set $CS_i$;
6) the stratification column set determining module uses the cover index, the sample storage space, and other information to select the stratification column sets used to create stratified samples;
7) the stratified sample data creation module starts a MapReduce job to create the stratified samples: the Map function scans the raw data set and sends each tuple to the corresponding Reduce function according to the tuple's values on each stratification column set used to create stratified samples, and the Reduce function updates the statistics and writes the tuple data to the stratified sample data set;
8) a query parsing module, a sample selection module, and a sample size determining module are provided in the online processing stage;
9) the query parsing module parses the query statement that the user enters online and extracts the grouping column set $CS_q$ of each user query statement;
10) the sample selection module selects, from the stratified sample data sets, the stratified sample data with the lowest sampling cost according to the grouping column set $CS_q$ of the user query statement;
11) the sample size determining module determines the sample size to draw from each sample stratum according to the sample size of the approximate query statement.
2. The method according to claim 1, characterized in that, in step 3), the workload column set parsing module parses the workload query statements through the following steps:
(1) parsing all SQL query statements among the workload query statements and extracting the corresponding grouping column sets;
(2) counting the number of times each grouping column set occurs and generating the set of candidate stratification column sets $CS = \{CS_1, CS_2, \ldots, CS_M\}$;
(3) analyzing the relationship between any two candidate stratification column sets $CS_i$ and $CS_j$ in $CS$: if $CS_i \subset CS_j$, storing $CS_j - CS_i$ in the set $RS$, and outputting the result to the data feature analysis module.
3. The method according to claim 1, characterized in that, in step 4), a MapReduce job is started to scan the raw data and analyze the data features through the following steps:
(1) the Map function of the Map stage analyzes each tuple $r$ of the raw data set and emits key-value pairs, with the name of each column set in $RS$ as the key and the tuple's grouping attribute values on the corresponding column set as the value;
(2) the Combine function of the Map stage merges the key-value pairs that belong to the same column set into one new key-value pair and outputs it;
(3) all key-value pairs that belong to the same column set are sent to the same Reduce function, which merges their values and counts the number of distinct attribute values on that column set, thereby producing the number of distinct values that the raw data set takes on each column set in $RS$.
4. The method according to claim 1, characterized in that, in step 5), the cover index computing module computes, for the case where any column set $CS_i$ in $CS$ is used as the stratification column set to create a stratified sample, the cover index $CI_{i,j}$ of each candidate stratification column set $CS_j$ in $CS$, as follows: if $CS_j = CS_i$, then $CI_{i,j} = 1$; if $CS_i \subset CS_j$, then $CI_{i,j} = 1/v_{i,j}$, where $v_{i,j}$ is the number of distinct values that the raw data set takes on $CS_j - CS_i$; otherwise, $CI_{i,j} = 0$.
5. The method according to claim 1, characterized in that, in step 6), the stratification column set determining module uses the cover index, the sample storage space, and other information to select the stratification column sets used to create stratified samples through the following steps:
(1) for any candidate stratification column set $CS_i$, computing the total cover index $f_i$ obtained when a stratified sample is created on $CS_i$, $f_i = \sum_{j=1}^{M} P_j \, CI_{i,j}$, where $P_j = N_j / \sum_{k=1}^{M} N_k$ is the probability that $CS_j$ occurs among the workload query statements, $N_j$ is the number of times $CS_j$ occurs among the workload query statements, and $CI_{i,j}$ is the cover index of $CS_j$ when the stratified sample is created on $CS_i$;
(2) sorting the total cover indices of all candidate stratification column sets in descending order and selecting the X candidate stratification column sets with the largest total cover index as the column sets finally used to create stratified samples, where X is determined by the amount of space the dynamic sampling system has for storing samples.
6. The method according to claim 1, characterized in that, in step 7), a MapReduce job is started to create the stratified samples through the following steps:
(1) the Map function of the Map stage scans the raw data set, analyzes each tuple $r$, and emits a key-value pair whose key is a structure composed of a column set name and the tuple's values on that column set, where the column set names come from the output of step 6), and whose value is the entire tuple;
(2) key-value pairs that belong to the same column set and have the same values on that column set are sent to the same Reduce function, which counts the number of tuples belonging to the same sample stratum and writes the tuples to a file, forming the stratified sample file.
7. The method according to claim 1, characterized in that, in step 9), the query statement that the user enters online is parsed and its grouping column set $CS_q$ is extracted, and step 10) then selects the stratified sample data with the lowest sampling cost, as follows: if there is a sample $S(CS_s)$ whose stratification column set satisfies $CS_s = CS_q$, that sample is selected; otherwise, the sample $S(CS_s)$ is selected where $CS_s$ is the smallest column set satisfying $CS_q \subset CS_s$; given the total sample size $N$ of the user's approximate query statement, the sample size determining module of step 11) determines the sample size to draw from each sample stratum; if $CS_s = CS_q$, the sample size drawn from each sample stratum $G_j$ is
$$n_j = \max\left(\frac{N}{T},\; N \cdot \frac{|G_j|}{|R|}\right),$$
where $T$ is the number of groups, $|G_j|$ is the size of each sample stratum, and $|R|$ is the size of the raw data set; if $CS_q \subset CS_s$, the sample size drawn from each sample stratum is determined in two steps: (1) the sample strata that share the same values on $CS_q$ are merged and treated as one large stratum, and the sample size drawn from each large stratum is [formula missing]; (2) within each large stratum $G_i$, the sample size drawn from each small stratum $G_{ij}$ is [formula missing].
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810025016.6A CN108256028B (en) | 2018-01-11 | 2018-01-11 | Multi-dimensional dynamic sampling method for approximate query in cloud computing environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810025016.6A CN108256028B (en) | 2018-01-11 | 2018-01-11 | Multi-dimensional dynamic sampling method for approximate query in cloud computing environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108256028A (en) | 2018-07-06 |
CN108256028B (en) | 2021-09-28 |
Family
ID=62726068
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810025016.6A Active CN108256028B (en) | 2018-01-11 | 2018-01-11 | Multi-dimensional dynamic sampling method for approximate query in cloud computing environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108256028B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117435647A (en) * | 2023-12-20 | 2024-01-23 | 北京遥感设备研究所 | Approximate query method, device and equipment based on incremental sampling |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1081610A2 (en) * | 1999-09-03 | 2001-03-07 | Cognos Incorporated | Methods for transforming metadata models |
CN102521386A (en) * | 2011-12-22 | 2012-06-27 | 清华大学 | Method for grouping space metadata based on cluster storage |
EP3035211A1 (en) * | 2014-12-18 | 2016-06-22 | Business Objects Software Ltd. | Visualizing large data volumes utilizing initial sampling and multi-stage calculations |
CN106095951A (en) * | 2016-06-13 | 2016-11-09 | 哈尔滨工程大学 | Data space multi-dimensional indexing method based on load balancing and inquiry log |
CN106528815A (en) * | 2016-11-14 | 2017-03-22 | 中国人民解放军理工大学 | Method and system for probabilistic aggregation query of road network moving objects |
Non-Patent Citations (1)
Title |
---|
YINGJIE SHI: "You Can Stop Early with COLA: Online Processing of Aggregate Queries in the Cloud", 《CIKM '12: PROCEEDINGS OF THE 21ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE 》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117435647A (en) * | 2023-12-20 | 2024-01-23 | 北京遥感设备研究所 | Approximate query method, device and equipment based on incremental sampling |
CN117435647B (en) * | 2023-12-20 | 2024-03-29 | 北京遥感设备研究所 | Approximate query method, device and equipment based on incremental sampling |
Also Published As
Publication number | Publication date |
---|---|
CN108256028B (en) | 2021-09-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10354170B2 (en) | Method and apparatus of establishing image search relevance prediction model, and image search method and apparatus | |
US10216793B2 (en) | Optimization of continuous queries in hybrid database and stream processing systems | |
CN103927346B (en) | Query connection method on basis of data volumes | |
CN107193967A (en) | A kind of multi-source heterogeneous industry field big data handles full link solution | |
US20180276264A1 (en) | Index establishment method and device | |
US20120117054A1 (en) | Query Analysis in a Database | |
CN105550374A (en) | Random forest parallelization machine studying method for big data in Spark cloud service environment | |
CN110888859B (en) | Connection cardinality estimation method based on combined deep neural network | |
WO2022257436A1 (en) | Data warehouse construction method and system based on wireless communication network, and device and medium | |
CN110909066B (en) | Streaming data processing method based on SparkSQL and RestAPI | |
CN111259933B (en) | High-dimensional characteristic data classification method and system based on distributed parallel decision tree | |
CN109815283A (en) | A kind of heterogeneous data source visual inquiry method | |
CN109033314A (en) | The Query method in real time and system of extensive knowledge mapping in the case of memory-limited | |
US10467276B2 (en) | Systems and methods for merging electronic data collections | |
CN113868230B (en) | Large-scale connection optimization method based on Spark computing framework | |
CN111046059B (en) | Low-efficiency SQL statement analysis method and system based on distributed database cluster | |
CN110597876B (en) | Approximate query method for predicting future query based on offline learning historical query | |
CN102222108A (en) | Scripting method and device | |
JP2019527441A (en) | Distributed Computing Framework and Distributed Computing Method (DISTRIBUTED COMPUTING FRAMEWORK AND DISTRIBUTED COMPUTING METHOD) | |
CN108256028A (en) | Multi-dimensional dynamic sampling method for approximate query in cloud computing environment | |
CN107679097A (en) | A kind of distributed data processing method, system and storage medium | |
CN116090413A (en) | Serialization-based general RDF data compression method | |
CN110347755A (en) | A kind of big data multidimensional data analysis method and system based on Hadoop and HBase | |
CN111629217B (en) | XGboost algorithm-based VOD (video on demand) service cache optimization method in edge network environment | |
CN113836395B (en) | Service developer on-demand recommendation method and system based on heterogeneous information network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |