CN110245022B - Parallel Skyline processing method and system under mass data - Google Patents


Info

Publication number
CN110245022B
Authority
CN
China
Prior art keywords
skyline
data
area
calculation
local
Prior art date
Legal status
Active
Application number
CN201910543347.3A
Other languages
Chinese (zh)
Other versions
CN110245022A
Inventor
鲁芹
梁心美
李名玉
Current Assignee
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date
Filing date
Publication date
Application filed by Qilu University of Technology
Priority to CN201910543347.3A
Publication of CN110245022A
Application granted
Publication of CN110245022B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3818Decoding for concurrent execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)

Abstract

The disclosure provides a parallel Skyline processing method and system under mass data, comprising: distributing web data to worker nodes: the web data are uploaded to HDFS (the Hadoop Distributed File System), split by HDFS, and the resulting data blocks are distributed to worker nodes for parallel computation; Skyline computation on the worker nodes: local candidate Skyline services are obtained in a local Skyline computation stage, each local candidate Skyline service is then transmitted over the network to the master node, and the master node's Skyline computation finally yields the global Skyline services; Skyline computation on the master node: the candidate Skyline services of all worker nodes are aggregated, and an improved Skyline algorithm divides all data into 4 regions to obtain the global Skyline services. The traditional Skyline algorithm is improved by partitioning the service set into regions, which removes a large number of data points that have no dominance relation and saves memory, and the improved Skyline algorithm is parallelized on the Spark platform.

Description

Parallel Skyline processing method and system under mass data
Technical Field
The disclosure relates to the technical field of computers, in particular to a parallel Skyline processing method and system under mass data.
Background
With the rapid development of service computing, more and more services share the same functional attributes but differ in their non-functional attributes. Traditional web service selection methods face great challenges when processing mass data, so quickly and effectively finding web services that satisfy different user requirements in massive web service data has become an urgent problem.
In addition, the rapid development of Internet technology generates a large amount of data, and mining valuable data from it has become an urgent problem. The Hadoop and Spark big data platforms were born in this context. Hadoop mainly operates on disk and stores intermediate results on disk, whereas Spark mainly computes in memory and introduces the RDD data model, so Spark iterates algorithms far faster than Hadoop; since its birth, Spark has quickly become a research focus.
Disclosure of Invention
The purpose of the embodiments of this specification is to provide a parallel Skyline processing method under mass data. The obtained global Skyline services are not dominated by any other data and reflect the distribution of the entire data set well, so web services meeting different requirements can be selected from these global Skyline services.
The embodiment of the specification provides a parallel Skyline processing method under mass data, which is realized by the following technical scheme:
the method comprises the following steps:
distributing web data to worker nodes: uploading the web data to HDFS (the Hadoop Distributed File System), splitting the data through HDFS, and distributing the resulting data blocks to worker nodes for parallel computation;
performing Skyline computation on the worker nodes: obtaining local candidate Skyline services in a local Skyline computation stage, transmitting each local candidate Skyline service to the master node over the network, and finally obtaining the global Skyline services through the master node's Skyline computation;
performing Skyline computation on the master node: aggregating the candidate Skyline services of all worker nodes, dividing all data into 4 regions with the improved Skyline algorithm, merging the data points of region 1 and region 3, implementing the Bitmap algorithm logic with Spark operators, and computing the final Skyline points of region 1 and region 3, thereby obtaining the global Skyline services.
In a further technical scheme, a QoS vector set is obtained by parsing the web service data, keys corresponding to the web services are then generated according to a certain distribution strategy, the whole web service data set is divided into different groups, and the web data of groups with the same key value are distributed to the same node for Skyline point calculation.
In a further technical scheme, the local Skyline computation processes the web service data distributed to the node, finds the point with the minimum QoS attribute in the local data through a Spark operator — this point is necessarily a Skyline point — and then performs region division only once, at that minimum point.
In a further technical scheme, the data set is divided into 4 regions, the data of region 1 and region 3 dominate region 2 and region 4, and the final computation regions are merged: the data points of region 1 and region 3 are combined, the Bitmap algorithm logic is implemented with Spark operators, and the final Skyline points of region 1 and region 3 are computed.
According to the further technical scheme, the global Skyline service is output to the user for selection.
The embodiments of this specification provide a parallel Skyline processing system under mass data, which is realized by the following technical scheme:
the system comprises:
a web data distribution module, which uploads web data to HDFS (the Hadoop Distributed File System), splits the data through HDFS, and distributes the resulting data blocks to worker nodes for parallel computation;
a worker-node Skyline computation module, which obtains local candidate Skyline services in a local Skyline computation stage, transmits each local candidate Skyline service to the master node over the network, and finally obtains the global Skyline services through the master node's Skyline computation;
a master-node Skyline computation module, which aggregates the candidate Skyline services of all worker nodes, divides all data into 4 regions with the improved Skyline algorithm, merges the data points of region 1 and region 3, implements the Bitmap algorithm logic with Spark operators, and computes the final Skyline points of region 1 and region 3, thereby obtaining the global Skyline services.
Compared with the prior art, the beneficial effects of this disclosure are:
the method improves the traditional Skyline algorithm by partitioning the service set into regions, which greatly reduces the number of data points without a dominance relation and saves memory; the improved Skyline algorithm is parallelized on the Spark platform, and experiments show that the parallelized Skyline algorithm handles massive web service data well.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and, together with the description, serve to explain the disclosure; they do not limit the disclosure.
FIG. 1 is a schematic diagram illustrating data region partitioning according to an embodiment of the present disclosure;
FIG. 2 is a diagram of a Hadoop web management page, according to an example embodiment of the present disclosure;
FIG. 3 is a diagram of a web management page for Spark, according to an example of the present disclosure;
FIG. 4 is a process diagram of a master node of an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the Spark UI monitoring the running state of the cluster according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of Spark Jobs in accordance with an exemplary embodiment of the disclosure;
FIG. 7 is a diagram of the experimental results obtained with the third and ninth columns of the QWS data, according to an example of the present disclosure;
FIG. 8 is a diagram of the experimental results obtained with the second and third columns of the QWS data, according to an example of the present disclosure;
FIG. 9 is a graph of the results of the parallelized algorithm on a data volume of 5 million records, according to an embodiment of the disclosure;
FIG. 10 is a graph of the results of the parallelized algorithm on a data volume of 10 million records, according to an embodiment of the disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example I
This embodiment discloses a parallel Skyline processing method under mass data. It addresses the problem that when the volume of processed web service data reaches hundreds of thousands of records, the computation time of the improved Skyline algorithm becomes very long, and when the data volume grows into the millions and beyond, a single computer stalls or cannot even compute a final result; computation therefore has to rely on a big data computing framework, and the Skyline algorithm is designed to run in parallel.
The improved Skyline algorithm in this embodiment performs region division only once, at the minimum point, in order to identify the dominated regions. (A service dominates another if it is at least as good in every QoS attribute and strictly better in at least one; Skyline points are the services dominated by no other service.)
The whole parallel computation process of this embodiment is divided into three stages: distributing the web data to the worker nodes, worker-node Skyline computation, and global Skyline computation.
In a specific implementation example, distributing web data to worker nodes: this is the first stage of parallel computation. The web data are uploaded to HDFS (the Hadoop Distributed File System), split by HDFS, and the resulting data blocks are distributed to worker nodes for parallel computation. A QoS vector set is obtained by parsing the web service data, keys corresponding to the web services are then generated according to a certain distribution strategy so that the whole web service data set is divided into different groups, and the web data of groups with the same key value are distributed to the same node for Skyline point calculation.
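As an illustration of this first stage, the sketch below (Scala on Spark) reads the service data from HDFS, parses each line into a QoS vector, derives a grouping key, and partitions the records so that services with the same key land on the same worker. The HDFS path, the comma-separated input format, the number of groups, and the hash-based key strategy are assumptions made only for this example; the patent merely states that keys are generated according to a certain distribution strategy.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

object DistributeStage {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("SkylineDistribute").getOrCreate()
    val sc = spark.sparkContext

    val numGroups = 8 // hypothetical number of groups / worker partitions

    // Parse each HDFS line into a QoS vector (input assumed to be comma-separated numbers).
    val qosVectors = sc.textFile("hdfs:///data/web-services/qos.csv") // illustrative path
      .map(_.split(",").map(_.trim.toDouble).toVector)

    // Generate a key per service (here a simple hash) and partition by key so that
    // records with the same key value end up on the same worker node, ready for
    // the local Skyline computation of the next stage.
    val grouped = qosVectors
      .map(v => ((v.hashCode & Int.MaxValue) % numGroups, v))
      .partitionBy(new HashPartitioner(numGroups))

    grouped.persist()
    println(s"number of partitions: ${grouped.getNumPartitions}")
    spark.stop()
  }
}
```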
In a specific implementation example, the worker node performs Skyline computation: this is the second stage of parallel computation. The local Skyline computation processes the web service data distributed to the node and finds the point with the minimum QoS attribute in the local data through a Spark operator — this point is necessarily a Skyline point. Region division is then performed only once, at that minimum point, dividing the data set into 4 regions as shown in FIG. 1, so that the dominated regions can be identified.
The data of region 1 and region 3 dominate region 2 and region 4, which effectively filters out dominance checks between regions that have no dominance relation. The final computation regions are then merged: the data points of region 1 and region 3 are combined, the Bitmap algorithm logic is implemented with Spark operators, and the final Skyline points of region 1 and region 3 are computed.
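The following sketch spells out that local computation for the data held by one partition: a pivot with minimal QoS values is selected, every other point is assigned to one of the four regions relative to the pivot, the dominated region is pruned, and the Skyline of the remaining candidates is computed. Two simplifications are assumed and are not taken from the patent: the attributes are treated as cost-type (smaller is better) with the pivot chosen as the point of minimal attribute sum, and the final step uses a plain pairwise dominance check in place of the Bitmap encoding; the quadrant numbering of FIG. 1 is only approximated.

```scala
object LocalSkyline {
  type QoS = Vector[Double] // cost-type QoS attributes: smaller is better

  /** p dominates q iff p is no worse in every attribute and strictly better in at least one. */
  def dominates(p: QoS, q: QoS): Boolean =
    p.zip(q).forall { case (a, b) => a <= b } && p.zip(q).exists { case (a, b) => a < b }

  /** Region of a point relative to the pivot (2-D illustration).
    * The two "mixed" quadrants play the role of regions 1 and 3 (candidates);
    * the quadrant dominated by the pivot plays the role of region 2 and is pruned;
    * the remaining quadrant (better in both attributes) cannot occur when the
    * pivot is a point of minimal attribute sum. */
  def region(p: QoS, pivot: QoS): Int = (p(0) < pivot(0), p(1) < pivot(1)) match {
    case (true, false)  => 1
    case (false, true)  => 3
    case (false, false) => 2
    case (true, true)   => 4
  }

  /** Pairwise Skyline check (stand-in for the Bitmap algorithm of the patent). */
  def skyline(points: Seq[QoS]): Seq[QoS] =
    points.filter(p => !points.exists(q => q != p && dominates(q, p)))

  /** Local Skyline of one partition: one-time region division at the pivot,
    * pruning of the dominated region, Skyline of the merged candidate regions. */
  def localSkyline(points: Seq[QoS]): Seq[QoS] = {
    if (points.isEmpty) return Seq.empty
    val pivot = points.minBy(_.sum)          // minimal-sum point, always a Skyline point
    val candidates = points.filter(p => p != pivot && region(p, pivot) != 2)
    pivot +: skyline(candidates)
  }
}
```

The pairwise check is quadratic in the number of candidates, which is tolerable here only because the one-time region division has already discarded the region dominated by the pivot.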
The local Skyline computation stage yields the local candidate Skyline services; each local candidate Skyline service is then transmitted to the master node over the network, and the master node's Skyline computation finally yields the global Skyline services.
In a specific implementation example, the master node performs Skyline computation: this is the third stage of parallel computation. It aggregates the candidate Skyline services of all worker nodes, divides all data into 4 regions as shown in FIG. 1 with the improved Skyline algorithm, merges the data points of region 1 and region 3, implements the Bitmap algorithm logic with Spark operators, and computes the final Skyline points of region 1 and region 3. This yields the global Skyline services, which may be output to the user for selection. The global Skyline services are not dominated by any other data and reflect the distribution of the whole data set well, so web services meeting different requirements can be selected from them.
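Continuing the two sketches above, the wiring of the three stages could look as follows: each partition emits its local candidate Skyline services with `mapPartitions`, the candidates are gathered on the master (the Spark driver), and the same region-dividing routine is applied once more to obtain the global Skyline services. Collecting to the driver is an assumption made here because the candidate set is expected to be small; the patent does not prescribe this particular operator choice.

```scala
// Stage 2: local candidate Skyline services, computed independently on each worker partition
// of the keyed RDD `grouped` from the first sketch.
val localCandidates = grouped.mapPartitions { it =>
  LocalSkyline.localSkyline(it.map(_._2).toSeq).iterator
}

// Stage 3: gather all local candidates on the master node and apply the same
// improved (region-dividing) Skyline computation once more; the result is the
// set of global Skyline services, which can be output for the user to choose from.
val globalSkyline = LocalSkyline.localSkyline(localCandidates.collect().toSeq)
globalSkyline.foreach(println)
```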
In this method, when the data volume is small, the algorithm takes longer than on a single machine because starting up the Spark cluster is comparatively time-consuming; when the data volume is very large, the parallelized Skyline algorithm can still compute the result. The time taken by the parallelized algorithm was measured for data sets of different sizes.
Table 1: algorithmic temporal comparison
The experiment has two parts. The data set of the first part is the QWS Dataset, which contains 2507 real, valid web services spanning multiple functional domains. The QWS Dataset has 11 QoS attributes in total, and the first 8 QoS attributes were selected for the experiment. The experiments mainly verify the results on a single machine and in distributed mode.
The data set of the second part of the experiment is a large simulated data set; when the data volume reaches more than ten million records, a single machine runs slowly or cannot run at all, and this part verifies how the algorithm performs in distributed mode.
Experimental analysis: first, a Hadoop cluster of three nodes was built, and a Spark cluster was built on top of it. After Hadoop is started, the interface shown in FIG. 2 can be seen on the web management page; after Spark is started, the page shown in FIG. 3 can be seen. On the master node, the processes shown in FIG. 4 can be observed.
Meanwhile, FIG. 5 shows the improved Skyline algorithm submitted to the Spark cluster for execution; during execution, the running state of the cluster is monitored through the Spark UI.
The Spark jobs are shown in FIG. 6. The first part of the experiment was then run, selecting two QoS attributes of the QWS dataset: the third column (response time) and the ninth column (latency). The experimental results are shown in FIG. 7; the Skyline points obtained by the improved Skyline algorithm are the same on a single machine and in distributed mode and are marked by circles in the figure, namely: (0.4000, 6.0000), (0.7000, 2.0000), (0.3000, 64.0000), (1.0000).
Two QoS attributes of the QWS dataset, the second column (availability) and the third column (reliability), were selected for the following experiment. Since these two columns are benefit-type data, larger values are better. The experimental results are shown in FIG. 8; the Skyline points obtained by the improved Skyline algorithm are the same on a single machine and in distributed mode and are marked by circles in the figure, namely: (67.0000, 0.5000), (48.0000, 0.8000), (13.0000, 0.9000), (83.0000, 0.3000), (12.0000, 1.7000).
Experiment one verifies the correctness of the parallelized Skyline algorithm.
Next, the second part of the experiments was run. The experimental data are large simulated data sets with 5 million and 10 million records respectively; at these sizes a single machine locks up and cannot compute a final result, whereas under distributed parallelization the final result is computed easily. FIG. 9 shows the result for 5 million records, and FIG. 10 shows the result for 10 million records.
Processing massive data on a single machine places high demands on computer hardware, occupies system resources, reduces algorithm efficiency, and can even jam the computer so that no final result can be computed; the parallelized method described above avoids these problems.
Example II
The embodiments of this specification provide a parallel Skyline processing system under mass data, which is realized by the following technical scheme:
the system comprises:
a web data distribution module, which uploads web data to HDFS (the Hadoop Distributed File System), splits the data through HDFS, and distributes the resulting data blocks to worker nodes for parallel computation;
a worker-node Skyline computation module, which obtains local candidate Skyline services in a local Skyline computation stage, transmits each local candidate Skyline service to the master node over the network, and finally obtains the global Skyline services through the master node's Skyline computation;
a master-node Skyline computation module, which aggregates the candidate Skyline services of all worker nodes, divides all data into 4 regions with the improved Skyline algorithm, merges the data points of region 1 and region 3, implements the Bitmap algorithm logic with Spark operators, and computes the final Skyline points of region 1 and region 3, thereby obtaining the global Skyline services.
For the implementation of the specific modules in this embodiment, refer to the parallel Skyline processing method under mass data in Example I; a detailed description is not repeated here.
Example III
This embodiment provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the parallel Skyline processing method under mass data of Example I.
Example IV
This embodiment provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the parallel Skyline processing method under mass data of Example I.
It is to be understood that throughout the description of the present specification, reference to the term "one embodiment", "another embodiment", "other embodiments", or "first through nth embodiments", etc., is intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, or materials described may be combined in any suitable manner in any one or more embodiments or examples.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (6)

1. A parallel Skyline processing method under mass data, characterized by comprising the following steps:
distributing web data to worker nodes: uploading the web data to HDFS (the Hadoop Distributed File System), splitting the data through HDFS, and distributing the resulting data blocks to worker nodes for parallel computation;
performing Skyline computation on the worker nodes: obtaining local candidate Skyline services in a local Skyline computation stage, transmitting each local candidate Skyline service to the master node over the network, and finally obtaining the global Skyline services through the master node's Skyline computation; the local Skyline computation processes the distributed web service data, finds the point with the minimum QoS attribute in the local data through a Spark operator, this point being a Skyline point, and then performs region division only once at that minimum point; the data set is divided into 4 regions, wherein the data of region 1 and region 3 dominate region 2 and region 4, the final computation regions are merged, the data points of region 1 and region 3 are combined, the Bitmap algorithm logic is implemented with Spark operators, and the final Skyline points of region 1 and region 3 are computed;
performing Skyline computation on the master node: aggregating the candidate Skyline services of all worker nodes, dividing all data into 4 regions with the improved Skyline algorithm, merging the data points of region 1 and region 3, implementing the Bitmap algorithm logic with Spark operators, and computing the final Skyline points of region 1 and region 3, thereby obtaining the global Skyline services.
2. The parallel Skyline processing method under mass data according to claim 1, wherein a QoS vector set is obtained by parsing, keys corresponding to the web services are then generated according to a certain distribution strategy, the whole web service data set is divided into different groups, and the web data of groups with the same key value are distributed to the same node for Skyline point calculation.
3. The parallel Skyline processing method under mass data according to claim 1, wherein the global Skyline services are output to the user for selection.
4. A parallel Skyline processing system under mass data, characterized by comprising:
a web data distribution module, which uploads web data to HDFS (the Hadoop Distributed File System), splits the data through HDFS, and distributes the resulting data blocks to worker nodes for parallel computation;
a worker-node Skyline computation module, which obtains local candidate Skyline services in a local Skyline computation stage, transmits each local candidate Skyline service to the master node over the network, and finally obtains the global Skyline services through the master node's Skyline computation; the local Skyline computation processes the distributed web service data, finds the point with the minimum QoS attribute in the local data through a Spark operator, this point being a Skyline point, and then performs region division only once at that minimum point; the data set is divided into 4 regions, wherein the data of region 1 and region 3 dominate region 2 and region 4, the final computation regions are merged, the data points of region 1 and region 3 are combined, the Bitmap algorithm logic is implemented with Spark operators, and the final Skyline points of region 1 and region 3 are computed;
a master-node Skyline computation module, which aggregates the candidate Skyline services of all worker nodes, divides all data into 4 regions with the improved Skyline algorithm, merges the data points of region 1 and region 3, implements the Bitmap algorithm logic with Spark operators, and computes the final Skyline points of region 1 and region 3, thereby obtaining the global Skyline services.
5. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the parallel Skyline processing method under mass data according to any one of claims 1-3.
6. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the parallel Skyline processing method under mass data according to any one of claims 1-3.
CN201910543347.3A 2019-06-21 2019-06-21 Parallel Skyline processing method and system under mass data Active CN110245022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910543347.3A CN110245022B (en) 2019-06-21 2019-06-21 Parallel Skyline processing method and system under mass data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910543347.3A CN110245022B (en) 2019-06-21 2019-06-21 Parallel Skyline processing method and system under mass data

Publications (2)

Publication Number Publication Date
CN110245022A CN110245022A (en) 2019-09-17
CN110245022B true CN110245022B (en) 2021-11-12

Family

ID=67888762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910543347.3A Active CN110245022B (en) 2019-06-21 2019-06-21 Parallel Skyline processing method and system under mass data

Country Status (1)

Country Link
CN (1) CN110245022B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110688993B (en) * 2019-12-10 2020-04-17 中国人民解放军国防科技大学 Spark operation-based computing resource determination method and device
CN112787870B (en) * 2021-02-25 2021-11-02 苏州大学 Parallel flexible Skyline service discovery method with service quality perception


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7707207B2 (en) * 2006-02-17 2010-04-27 Microsoft Corporation Robust cardinality and cost estimation for skyline operator
US20130132148A1 (en) * 2011-11-07 2013-05-23 Ecole Polytechnique Federale De Lausanne (Epfl) Method for multi-objective quality-driven service selection
CN104809210B (en) * 2015-04-28 2017-12-26 东南大学 One kind is based on magnanimity data weighting top k querying methods under distributed computing framework

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254016A (en) * 2011-07-22 2011-11-23 中国人民解放军国防科学技术大学 Cloud-computing-environment-oriented fault-tolerant parallel Skyline inquiry method
CN106055674A (en) * 2016-06-03 2016-10-26 东南大学 top-k arrangement query method based on metric space in distributed environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Efficient Hybrid QoS-Aware Web Service Selection Methods; 张晓侠; China Masters' Theses Full-text Database, Information Science and Technology; 2016-08-15; main text, pages 1 and 33-55 *

Also Published As

Publication number Publication date
CN110245022A (en) 2019-09-17


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant