CN108256028B - Multi-dimensional dynamic sampling method for approximate query in cloud computing environment - Google Patents
- Publication number
- CN108256028B CN108256028B CN201810025016.6A CN201810025016A CN108256028B CN 108256028 B CN108256028 B CN 108256028B CN 201810025016 A CN201810025016 A CN 201810025016A CN 108256028 B CN108256028 B CN 108256028B
- Authority
- CN
- China
- Prior art keywords
- sample
- hierarchical
- data
- column set
- query
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Fuzzy Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A multi-dimensional dynamic sampling method for approximate queries in a cloud computing environment comprises an offline processing stage that creates hierarchical samples and an online processing stage that dynamically selects samples. In the offline stage, a workload column-set analysis module parses the workload query statements; a data characteristic analysis module analyzes the data distribution; a coverage index calculation module computes the total coverage index; a hierarchical column-set determination module selects the column sets used to create hierarchical samples; and a hierarchical sample creation module builds the hierarchical sample data. In the online stage, a query analysis module parses the user's query statement; a sample selection module selects the hierarchical sample data with the minimum sampling cost; and a sample size determination module determines the size of the sample drawn from each sample layer. The invention effectively alleviates the inaccurate small-group estimation caused by data skew in approximate queries, and reduces sampling cost under a limited sample storage space.
Description
Technical Field
The invention relates to a data sampling method for approximate queries, and in particular to a dynamic sampling method oriented to multi-query workloads in a cloud computing environment.
Background
The cloud computing environment provides a highly scalable and cost-effective way to manage big data and has become the mainstream platform for doing so. However, even in a cloud computing environment, queries over big data cannot meet the speed requirements of real-time processing and user interaction. For ad hoc queries and exploratory data analysis, it is often more useful to obtain an estimated result quickly than to spend substantial time and computational resources on a fully accurate one. Approximate query processing estimates the query result from sample data, greatly reducing query execution time, which is of great significance for big data analysis.
Acharya et al. proposed an approximate query processing technique based on sample data that uses uniform random sampling, i.e., every tuple is drawn with equal probability. Uniform random sampling suits uniformly distributed data and is simple to implement, but when data skew produces small groups in a group-by aggregation query, uniform random sampling severely degrades the accuracy of the estimates for those groups, rendering them meaningless. Chaudhuri et al. proposed weighted sampling, which counts the number of query predicates each tuple satisfies and uses that count as the tuple's sampling weight: the more predicates a tuple satisfies, the higher its probability of being sampled. Weighted sampling alleviates, to some extent, the inaccuracy that data skew causes under uniform random sampling, but its effectiveness depends entirely on the workload from which the weights were computed; when a new query differs from that workload, the weights are meaningless. Acharya et al. also proposed congressional sampling, which creates a single shared sample for all possible grouping columns and queries. However, the effectiveness of that sample degrades as the number of queries grows, and the preprocessing time grows exponentially with the number of columns, so it cannot handle multi-query workloads. In general, the above techniques assume a small, fixed set of query statements and do not extend well in practice.
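A minimal Python sketch (with an entirely hypothetical data set) illustrates the small-group problem described above: under uniform random sampling, a skewed group contributes almost nothing to the sample, so per-group estimates for it rest on nearly no evidence.

```python
import random
from collections import Counter

# Hypothetical skewed data set: group "A" has 9,900 tuples, group "B" only 100.
data = [("A", 10.0)] * 9900 + [("B", 20.0)] * 100

random.seed(1)
sample = random.sample(data, 100)  # uniform random sample at a 1% rate

counts = Counter(group for group, _ in sample)
# Group "B" is expected to contribute only ~1 tuple to the sample, so any
# aggregate estimated for "B" is essentially noise, while a stratified
# (hierarchical) sample would reserve sample slots for "B" explicitly.
print(counts["A"], counts["B"])
```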
In addition, the above techniques were all proposed for relational databases and cannot be applied directly in a cloud computing environment.
Disclosure of Invention
The method is used in the data preprocessing stage of approximate query processing. It preprocesses the original data set to generate several hierarchical (stratified) sample data sets; when a query statement arrives, it dynamically selects a sample data set according to the query content and requested sample size, and determines the number of tuples to draw from each sample layer. The method effectively alleviates the inaccurate small-group estimation caused by data skew in approximate queries, and reduces sampling cost under a limited sample storage space.
In order to achieve the purpose, the technical scheme of the invention is as follows:
a multi-dimensional dynamic sampling method for approximate queries in a cloud computing environment, the method comprising the following steps:
1) the dynamic sampling system comprises an offline processing stage for creating hierarchical samples and an online processing stage for dynamically selecting samples;
2) the offline processing stage contains a workload column-set analysis module, a data characteristic analysis module, a coverage index calculation module, a hierarchical column-set determination module and a hierarchical sample creation module;
3) the workload column-set analysis module parses the workload query statements, extracts the grouping column set of each statement, counts the occurrences of each grouping column set, generates the candidate hierarchical column-set collection CS, analyzes the relationships among the candidate sets CS_i, and outputs the result to the data characteristic analysis module;
4) the data characteristic analysis module starts a MapReduce job to scan the original data set and outputs the data distribution of the original data set to the coverage index calculation module;
5) combining the data distribution, the coverage index calculation module calculates the total coverage index obtained when hierarchical samples are created based on each candidate hierarchical column set CS_i;
6) the hierarchical column-set determination module selects the hierarchical column sets used to create hierarchical samples, based on the coverage index and the sample storage space;
7) the hierarchical sample creation module starts a MapReduce job to create the hierarchical samples: the Map function scans the original data set and routes each tuple to the corresponding Reduce function according to its values on each chosen hierarchical column set; the Reduce function updates the statistics and writes the tuples into the hierarchical sample data set;
8) the online processing stage contains a query analysis module, a sample selection module and a sample size determination module;
9) the query analysis module parses the query statement submitted by the user online and extracts its grouping column set CS_q;
10) the sample selection module, according to the grouping column set CS_q of the user query statement, selects the hierarchical sample data with the minimum sampling cost;
11) the sample size determination module determines the number of tuples drawn from each sample layer according to the sample size of the approximate query statement.
The invention has the following advantages:
1. by analyzing workload characteristics and data distribution, the method determines the hierarchical column sets and creates several multi-dimensional hierarchical sample data sets within the available sample storage space, thereby alleviating the inaccurate estimates caused by data skew in approximate queries;
2. in determining the column sets used to create hierarchical samples, the coverage index characterizes how well different query statements can perform stratified sampling over a given hierarchical sample, which lays the foundation for extending to new query workloads;
3. given the total sample size, when determining the number of tuples drawn from each sample layer, the invention distinguishes the relationship between the query's grouping column set CS_q and the sample's hierarchical column set CS_s and proposes a solution for each case: (1) when CS_s = CS_q, the sampling size of each sample layer is the larger of the average sample size per layer and the size proportional to the layer's size, which alleviates the problem of drawing too few samples from both small and large groups; (2) when CS_q ⊂ CS_s, the invention first merges the sample layers that share the same value on the CS_q columns into large sample layers to determine their sample sizes, and then performs stratified sampling inside each large sample layer, thereby dynamically determining the size of each sample layer.
Drawings
FIG. 1 is a diagram of a multi-dimensional dynamic sampling framework for approximating queries in a cloud computing environment.
Detailed Description
The invention is further described with reference to the following examples and the accompanying drawings.
A multi-dimensional dynamic sampling method for approximate queries in a cloud computing environment comprises the following steps: 1) the dynamic sampling system comprises an offline processing stage for creating hierarchical samples and an online processing stage for dynamically selecting samples; 2) the offline processing stage contains a workload column-set analysis module, a data characteristic analysis module, a coverage index calculation module, a hierarchical column-set determination module and a hierarchical sample creation module; 3) the workload column-set analysis module parses the workload query statements, extracts the grouping column set of each statement, counts the occurrences of each column set, analyzes the relationships among the column sets, and outputs the results to the data characteristic analysis module; 4) the data characteristic analysis module starts a MapReduce job to scan the original data set and outputs the data distribution to the coverage index calculation module; 5) combining the data distribution, the coverage index calculation module calculates the total coverage index obtained when hierarchical samples are created based on each candidate hierarchical column set; 6) the hierarchical column-set determination module selects the hierarchical column sets used to create hierarchical samples, based on the coverage index and the sample storage space; 7) the hierarchical sample creation module starts a MapReduce job: the Map function scans the original data set and routes each tuple to the corresponding Reduce function according to its values on each hierarchical column set; the Reduce function updates the statistics and writes the tuples into the hierarchical sample data set; 8) the online processing stage contains a query analysis module, a sample selection module and a sample size determination module; 9) the query analysis module parses the query statement input by the user and extracts its grouping column set; 10) the sample selection module selects the hierarchical sample data with the minimum sampling cost according to the grouping column set of the query statement; 11) the sample size determination module determines the number of tuples drawn from each sample layer according to the sample size of the approximate query statement.
In step 3), the workload column-set analysis module performs the following steps: (1) parse all SQL query statements in the workload and extract their grouping column sets; (2) count the occurrences of each grouping column set and generate the candidate hierarchical column-set collection CS = {CS_1, CS_2, ..., CS_M}; (3) analyze the relationship between any two hierarchical column sets CS_i and CS_j in CS: if CS_i ⊂ CS_j, store CS_j − CS_i in the set RS and output it to the data characteristic analysis module.
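The workload column-set analysis of step 3) can be sketched as follows. This is an illustrative Python simulation: the SQL statements, table, and column names are hypothetical, and the naive regex parse stands in for a real SQL parser.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical workload of GROUP BY queries (names are illustrative only).
workload = [
    "SELECT city, SUM(sales) FROM t GROUP BY city",
    "SELECT city, year, AVG(sales) FROM t GROUP BY city, year",
    "SELECT city, year, AVG(sales) FROM t GROUP BY city, year",
    "SELECT year, COUNT(*) FROM t GROUP BY year",
]

def grouping_columns(sql):
    """Extract the grouping column set of one statement (naive regex parse)."""
    m = re.search(r"GROUP BY\s+(.+)$", sql, re.IGNORECASE)
    return frozenset(c.strip() for c in m.group(1).split(",")) if m else frozenset()

# Steps (1)-(2): candidate column sets CS with occurrence counts N_j.
counts = Counter(grouping_columns(q) for q in workload)
CS = list(counts)

# Step (3): for every strict-subset pair CS_i ⊂ CS_j, keep the difference
# CS_j − CS_i in RS for the later distinct-value analysis.
RS = {b - a for a, b in combinations(CS, 2) if a < b} | \
     {a - b for a, b in combinations(CS, 2) if b < a}

print(sorted(sorted(s) for s in RS))
```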
In step 4), a MapReduce job is started to scan the original data and analyze its characteristics, counting the number of distinct values the original data set takes on each column set in RS. The specific steps are: (1) in the Map stage, the Map function parses each tuple r of the original data set into a key-value pair, with the name of each column set in RS as the key and the tuple's grouping attribute values on that column set as the value; (2) the combine function in the Map stage merges key-value pairs belonging to the same column set into a new key-value pair; (3) all key-value pairs belonging to the same column set are sent to the same Reduce function, which merges their values and counts the distinct attribute values on that column set, yielding the number of distinct values of the original data set on each column set in RS.
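The distinct-value counting job of step 4) can be sketched as a single-process MapReduce simulation. The rows, column names, and the in-memory shuffle dictionary are illustrative stand-ins for a real cluster run.

```python
from collections import defaultdict

def map_fn(row, rs_sets):
    # Emit (column-set name, attribute values of this tuple on that set).
    for cs in rs_sets:
        key = ",".join(sorted(cs))
        value = tuple(row[c] for c in sorted(cs))
        yield key, value

def reduce_fn(key, values):
    # Count the distinct grouping values seen for this column set.
    return key, len(set(values))

rows = [{"city": "NY", "year": 2017}, {"city": "NY", "year": 2018},
        {"city": "LA", "year": 2018}]
rs_sets = [frozenset({"city"}), frozenset({"year"})]

shuffle = defaultdict(list)          # shuffle phase: group values by key
for row in rows:
    for k, v in map_fn(row, rs_sets):
        shuffle[k].append(v)

distinct = dict(reduce_fn(k, vs) for k, vs in shuffle.items())
print(distinct)
```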
In step 5), the coverage index calculation module computes, for any column set CS_i in CS taken as the hierarchical column set for creating hierarchical samples, the coverage index CI_i,j of every candidate hierarchical column set CS_j in CS, as follows: if CS_j = CS_i, then CI_i,j = 1; if CS_j ⊂ CS_i, then CI_i,j = 1/v_i,j, where v_i,j is the number of distinct values of the original data set on CS_i − CS_j; otherwise CI_i,j = 0.
In step 6), the hierarchical column sets are determined as follows: (1) for each candidate hierarchical column set CS_i, compute the total coverage index f_i obtained when hierarchical samples are created based on CS_i: f_i = Σ_{j=1}^{M} P_j · CI_i,j, where P_j denotes the probability that CS_j occurs in the workload, computed as P_j = N_j / Σ_{k=1}^{M} N_k, N_j being the number of occurrences of CS_j in the workload, and CI_i,j is the coverage index of CS_j when hierarchical samples are created based on CS_i; (2) sort the total coverage indexes of all candidate hierarchical column sets in descending order and select the X candidates with the largest total coverage indexes as the column sets finally used to create hierarchical samples, where X is determined by the space the system reserves for storing samples.
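Steps 5) and 6) can be sketched together: the snippet below computes coverage indexes CI_i,j and total coverage indexes f_i from hypothetical workload counts and distinct-value counts, then keeps the top X candidates. All concrete numbers are illustrative.

```python
# Hypothetical inputs: candidate sets CS_j, their workload counts N_j, and
# v[(i, j)] = number of distinct data values on CS_i − CS_j.
CS = [frozenset({"city"}), frozenset({"year"}), frozenset({"city", "year"})]
N = {CS[0]: 1, CS[1]: 1, CS[2]: 2}
v = {(2, 0): 5, (2, 1): 40}          # e.g. 5 years, 40 cities in the data

def coverage_index(i, j):
    if CS[j] == CS[i]:
        return 1.0
    if CS[j] < CS[i]:                 # CS_j ⊂ CS_i
        return 1.0 / v[(i, j)]
    return 0.0

total = sum(N.values())
P = {s: N[s] / total for s in CS}     # P_j = N_j / Σ_k N_k

# Total coverage index f_i = Σ_j P_j · CI_i,j for each candidate CS_i.
f = [sum(P[CS[j]] * coverage_index(i, j) for j in range(len(CS)))
     for i in range(len(CS))]

X = 1                                 # storage budget allows one sample
chosen = sorted(range(len(CS)), key=lambda i: f[i], reverse=True)[:X]
```

Here the superset {city, year} wins because it fully covers its own queries and partially covers both single-column grouping sets.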
In step 7), a MapReduce job is started to create the hierarchical samples, with the following steps: (1) in the Map stage, the Map function scans the original data set, parses each tuple r, and generates a key-value pair whose key is a structure consisting of a column-set name (taken from the output of step 6)) and the tuple's values on that column set, and whose value is the whole tuple; (2) key-value pairs that belong to the same column set and share the same values on it are sent to the same Reduce function, which counts the number of tuples in each sample layer and writes the tuples out to form a hierarchical sample file.
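The sample-creation job of step 7) can likewise be simulated in one process; the rows and the chosen column set are hypothetical, and on a real cluster the per-key grouping below would be done by the MapReduce shuffle.

```python
from collections import defaultdict

# Chosen hierarchical column sets from the previous step (illustrative).
chosen_sets = [frozenset({"city"})]

rows = [{"city": "NY", "sales": 10}, {"city": "NY", "sales": 12},
        {"city": "LA", "sales": 7}]

shuffle = defaultdict(list)
for row in rows:                      # Map: key = (set name, values on set)
    for cs in chosen_sets:
        name = ",".join(sorted(cs))
        key = (name, tuple(row[c] for c in sorted(cs)))
        shuffle[key].append(row)      # value = the whole tuple

strata, stats = {}, {}
for key, tuples in shuffle.items():   # Reduce: one sample layer per key
    strata[key] = tuples              # the layer's tuples (the sample file)
    stats[key] = len(tuples)          # per-layer statistics

print(stats)
```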
In step 9), the query statement submitted by the user online is parsed and its grouping column set CS_q is extracted. Step 10) then selects the hierarchical sample data with the minimum sampling cost, as follows: if there is a sample S(CS_s) whose hierarchical column set satisfies CS_s = CS_q, that sample is selected; otherwise the sample S(CS_s) is selected whose CS_s is the minimal column set satisfying CS_q ⊂ CS_s. Given the total sample size N of the user's approximate query statement, the sample size determination module of step 11) determines the number of tuples drawn from each sample layer. If CS_s = CS_q, the number of tuples drawn from sample layer G_j is n_j = max(N/T, N · |G_j| / |R|), where T is the number of sample layers, |G_j| is the size of sample layer G_j, and |R| is the size of the original data set. If CS_q ⊂ CS_s, the sample sizes are determined as follows: (1) merge the sample layers that share the same value on the CS_q columns into large sample layers, and draw from each large sample layer G_i a sample of size n_i = max(N/T', N · |G_i| / |R|), where T' is the number of large sample layers; (2) within each large sample layer G_i, draw from each small sample layer G_ij a sample of size n_ij = n_i · |G_ij| / |G_i|.
The above examples are only preferred embodiments of the present invention, it should be noted that: it will be apparent to those skilled in the art that various modifications and equivalents can be made without departing from the spirit of the invention, and it is intended that all such modifications and equivalents fall within the scope of the invention as defined in the claims.
Claims (5)
1. A multi-dimensional dynamic sampling method for approximate queries in a cloud computing environment, the method comprising the steps of:
1) the dynamic sampling system comprises an offline processing stage for creating hierarchical samples and an online processing stage for dynamically selecting samples;
2) the offline processing stage contains a workload column-set analysis module, a data characteristic analysis module, a coverage index calculation module, a hierarchical column-set determination module and a hierarchical sample creation module;
3) the workload column-set analysis module parses the workload query statements, extracts the grouping column set of each statement, counts the occurrences of each grouping column set, generates the candidate hierarchical column-set collection CS, analyzes the relationships among the candidate sets CS_i, and outputs the result to the data characteristic analysis module;
4) the data characteristic analysis module starts a MapReduce job to scan the original data set and outputs the data distribution of the original data set to the coverage index calculation module;
5) combining the data distribution, the coverage index calculation module calculates the total coverage index obtained when hierarchical samples are created based on each candidate hierarchical column set CS_i; for any column set CS_i in CS taken as the hierarchical column set, the module computes the coverage index CI_i,j of every candidate hierarchical column set CS_j in CS as follows: if CS_j = CS_i, then CI_i,j = 1; if CS_j ⊂ CS_i, then CI_i,j = 1/v_i,j, wherein v_i,j is the number of distinct values of the original data set on CS_i − CS_j; otherwise CI_i,j = 0;
6) the hierarchical column-set determination module selects the hierarchical column sets used to create hierarchical samples, based on the coverage index and the sample storage space, comprising the steps of:
(1) for each candidate hierarchical column set CS_i, computing the total coverage index f_i when creating hierarchical samples based on CS_i, f_i = Σ_{j=1}^{M} P_j · CI_i,j, wherein P_j denotes the probability that CS_j occurs in the workload query statements, P_j = N_j / Σ_{k=1}^{M} N_k, N_j is the number of occurrences of CS_j in the workload query statements, and CI_i,j is the coverage index of CS_j when hierarchical samples are created based on CS_i;
(2) sorting the total coverage indexes of all candidate hierarchical column sets in descending order and selecting the X candidates with the largest total coverage indexes as the grouping column sets finally used to create hierarchical samples, wherein X is determined by the space the dynamic sampling system reserves for storing samples;
7) the hierarchical sample creation module starts a MapReduce job to create the hierarchical samples: the Map function scans the original data set and routes each tuple to the corresponding Reduce function according to its values on each chosen hierarchical column set; the Reduce function updates the statistics and writes the tuples into the hierarchical sample data set;
8) the online processing stage contains a query analysis module, a sample selection module and a sample size determination module;
9) the query analysis module parses the query statement submitted by the user online and extracts its grouping column set CS_q;
10) the sample selection module, according to the grouping column set CS_q of the user query statement, selects the hierarchical sample data with the minimum sampling cost from the hierarchical sample data sets;
11) the sample size determination module determines the number of tuples drawn from each sample layer according to the sample size of the approximate query statement.
2. The method as claimed in claim 1, wherein in step 3) the workload column-set analysis module parses the workload query statements by the following steps:
(1) parsing all SQL query statements in the workload and extracting their grouping column sets;
(2) counting the occurrences of each grouping column set and generating the candidate hierarchical column-set collection CS = {CS_1, CS_2, ..., CS_M};
(3) analyzing the relationship between any two hierarchical column sets CS_i and CS_j in CS: if CS_i ⊂ CS_j, storing CS_j − CS_i in the set RS and outputting it to the data characteristic analysis module.
3. The method as claimed in claim 1, wherein in step 4) a MapReduce job is started to scan the raw data and analyze its characteristics, comprising the following steps:
(1) in the Map stage, the Map function parses each tuple r of the original data set into a key-value pair, with the name of each column set in RS as the key and the tuple's grouping attribute values on that column set as the value;
(2) the combine function in the Map stage merges the key-value pairs belonging to the same column set into a new key-value pair;
(3) all key-value pairs belonging to the same column set are sent to the same Reduce function, which merges their values and counts the distinct attribute values on that column set, yielding the number of distinct values of the original data set on each column set in RS.
4. The method as claimed in claim 1, wherein in step 7) a MapReduce job is started to create the hierarchical samples, comprising the following steps:
(1) in the Map stage, the Map function scans the original data set, parses each tuple r, and generates a key-value pair whose key is a structure consisting of a column-set name (taken from the output of step 6)) and the tuple's values on that column set, and whose value is the whole tuple;
(2) key-value pairs that belong to the same column set and share the same values on the grouping column set are sent to the same Reduce function, which counts the number of tuples in each sample layer and writes the tuples out to form a hierarchical sample file.
5. The method as claimed in claim 1, wherein in step 9) the query statement input by the user online is parsed and its grouping column set CS_q is extracted; step 10) then selects the hierarchical sample data with the minimum sampling cost, as follows: if there is a sample S(CS_s) whose hierarchical column set satisfies CS_s = CS_q, that sample is selected; otherwise the sample S(CS_s) is selected whose CS_s is the minimal column set satisfying CS_q ⊂ CS_s; given the total sample size N of the user's approximate query statement, the sample size determination module of step 11) determines the number of tuples drawn from each sample layer: if CS_s = CS_q, the number of tuples drawn from sample layer G_j is n_j = max(N/T, N · |G_j| / |R|), wherein T is the number of sample layers, |G_j| is the size of each sample layer, and |R| is the size of the original data set; if CS_q ⊂ CS_s, the sample sizes are determined by: (1) merging the sample layers that share the same value on the CS_q columns into large sample layers, the number of tuples drawn from each large sample layer G_i being n_i = max(N/T', N · |G_i| / |R|), wherein T' is the number of large sample layers; (2) within each large sample layer G_i, drawing from each small sample layer G_ij a sample of size n_ij = n_i · |G_ij| / |G_i|.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810025016.6A CN108256028B (en) | 2018-01-11 | 2018-01-11 | Multi-dimensional dynamic sampling method for approximate query in cloud computing environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108256028A CN108256028A (en) | 2018-07-06 |
CN108256028B true CN108256028B (en) | 2021-09-28 |
Family
ID=62726068
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810025016.6A Active CN108256028B (en) | 2018-01-11 | 2018-01-11 | Multi-dimensional dynamic sampling method for approximate query in cloud computing environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108256028B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117435647B (en) * | 2023-12-20 | 2024-03-29 | 北京遥感设备研究所 | Approximate query method, device and equipment based on incremental sampling |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1081610A2 (en) * | 1999-09-03 | 2001-03-07 | Cognos Incorporated | Methods for transforming metadata models |
CN102521386A (en) * | 2011-12-22 | 2012-06-27 | 清华大学 | Method for grouping space metadata based on cluster storage |
EP3035211A1 (en) * | 2014-12-18 | 2016-06-22 | Business Objects Software Ltd. | Visualizing large data volumes utilizing initial sampling and multi-stage calculations |
CN106095951A (en) * | 2016-06-13 | 2016-11-09 | 哈尔滨工程大学 | Data space multi-dimensional indexing method based on load balancing and inquiry log |
CN106528815A (en) * | 2016-11-14 | 2017-03-22 | 中国人民解放军理工大学 | Method and system for probabilistic aggregation query of road network moving objects |
Non-Patent Citations (1)
Title |
---|
You Can Stop Early with COLA: Online Processing of Aggregate Queries in the Cloud; Yingjie Shi; CIKM '12: Proceedings of the 21st ACM International Conference on Information and Knowledge Management; 2012-10-31; 1223-1232 *
Also Published As
Publication number | Publication date |
---|---|
CN108256028A (en) | 2018-07-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11687801B2 (en) | Knowledge graph data structures and uses thereof | |
CN103955489B (en) | Based on the Massive short documents of Information Entropy Features weight quantization this distributed KNN sorting algorithms and system | |
US11003649B2 (en) | Index establishment method and device | |
US9798831B2 (en) | Processing data in a MapReduce framework | |
CN110990638A (en) | Large-scale data query acceleration device and method based on FPGA-CPU heterogeneous environment | |
US20200059689A1 (en) | Query processing in data analysis | |
JP2017188137A (en) | Method, program and system for automatic discovery of relationship between fields in environment where different types of data sources coexist | |
CN110569289B (en) | Column data processing method, equipment and medium based on big data | |
Yun et al. | Fastraq: A fast approach to range-aggregate queries in big data environments | |
CN118210908B (en) | Retrieval enhancement method and device, electronic equipment and storage medium | |
JP6159908B6 (en) | Method, program, and system for automatic discovery of relationships between fields in a heterogeneous data source mixed environment | |
US8073834B2 (en) | Efficient handling of multipart queries against relational data | |
Wan et al. | LKAQ: Large-scale knowledge graph approximate query algorithm | |
US11782991B2 (en) | Accelerated large-scale similarity calculation | |
JPWO2017170459A6 (en) | Method, program, and system for automatic discovery of relationships between fields in a heterogeneous data source mixed environment | |
WO2018053889A1 (en) | Distributed computing framework and distributed computing method | |
CN108256028B (en) | Multi-dimensional dynamic sampling method for approximate query in cloud computing environment | |
CN110704515B (en) | Two-stage online sampling method based on MapReduce model | |
Zhao et al. | Parallel K-Medoids Improved Algorithm Based on MapReduce | |
Li | Collaborative filtering recommendation algorithm based on cluster | |
CN112650770B (en) | MySQL parameter recommendation method based on query work load analysis | |
Fu | An improved parallel collaborative filtering algorithm based on Hadoop | |
Li et al. | Heterogeneous embeddings for relational data integration tasks | |
Ni et al. | Approximate Query Processing with Error Guarantees | |
Zhang et al. | DATA MINING TECHNOLOGY BASED ON ASSOCIATION RULES ALGORITHM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||