CN106657099B - Spark data analysis service publishing system - Google Patents
- Publication number
- CN106657099B
- Authority
- CN
- China
- Prior art keywords
- service
- data analysis
- spark
- module
- standard
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/40—Support for services or applications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
Abstract
The invention provides a data analysis service distribution system comprising a Spark data analysis module, a service scheduling module, and a service standard formulation module. The service standard formulation module formulates a unified service release standard; the service scheduling module receives service requests and sends each request to an idle service; the Spark data analysis module constructs a service container and analyzes and processes the service request according to the service release standard. Because the service standard is unified, a third-party client or business system performs big data analysis simply by calling the data analysis service, which effectively isolates the business system from the big data analysis and reduces the development cost of the business system. The service operating environment uses the Spark distributed computing system, which greatly improves the speed and efficiency of data analysis.
Description
Technical Field
The invention relates to the technical field of data analysis and mining, in particular to a Spark data analysis service release system.
Background
With the advent of the information age, the volume of accumulated data has grown geometrically, and various data analysis algorithms have emerged to mine valid information from the existing mass of data. In practice, the most suitable algorithm cannot be determined immediately; different algorithms or combinations of algorithms must be tried repeatedly, and their results compared, to arrive at the best algorithm scheme and analysis result and thus the most useful feedback from the data.
Data analysts need to understand both the principles of the algorithms and their specific code implementations. This places high demands on technical personnel, and whenever different algorithms are combined to analyze data the code must be adjusted repeatedly, which is cumbersome. With the rapid growth of data, companies and research institutions attach increasing importance to mining effective information from existing data, and various data mining system architectures have appeared.
Traditional business systems rarely involve data mining, so traditional software companies must spend a great deal of time and money building an analysis and mining platform in order to keep up with the development of big data.
Disclosure of Invention
The invention provides a data analysis service distribution system that overcomes, or at least partially solves, the above problems. It unifies the form of services, makes reasonable use of cluster resources, and builds low-cost big data analysis services on a Spark distributed architecture.
According to one aspect of the invention, the system comprises a Spark data analysis module, a service scheduling module and a service standard formulation module; the service standard formulating module is used for formulating a uniform service release standard; the service scheduling module is used for receiving a service request and sending the service request to an idle service; the Spark data analysis module is used for constructing a service container and analyzing and processing the service request according to the service release standard.
Preferably, the system adopts a B/S (browser/server) architecture, through which the user views service information, adjusts the service state, and sets the service execution form and the service scale in a browser.
Preferably, the service standard formulation module specifies a unified service standard for different algorithms, specifically including a service parameter, a service result combination mode, and a service invocation mode.
Preferably, the service scheduling module is further configured to expose the data analysis function as an HTTP interface of an open API.
Preferably, the Spark data analysis module comprises a Spark data analysis unit and a distributed cluster;
the Spark data analysis unit is used for analyzing and calculating the distributed service request through a Spark distributed computing system;
the distributed cluster is used for providing a distributed computing running environment for the Spark data analysis unit.
Preferably, the distributed clusters include Spark clusters and Hadoop clusters.
Preferably, the Spark data analysis unit includes a service subunit and a process issuing subunit;
the business subunit is used for arbitrarily combining the algorithms that implement the service request and drawing them as a flowchart according to the service release standard;
the flow issuing subunit is used for combining all the nodes of the flow chart to generate a task, making the task into a service and analyzing and processing the service request.
Preferably, the service scheduling module is configured to send the service request to an idle service according to a random load-balancing algorithm, using cluster data provided by the distributed cluster.
Preferably, the service scheduling module communicates with the service through a socket, and the communication content includes service request data, service result data, service state data, and service calculation process data.
According to the data analysis service distribution system provided by the invention, by formulating a uniform service standard, a third-party client or a business system carries out big data analysis by calling a data analysis service, so that the business system and the big data analysis can be effectively isolated, and the development cost of the business system is reduced; and a Spark distributed computing system is adopted in the service operation environment, so that the speed and the efficiency of data analysis are greatly improved.
Drawings
Fig. 1 is a block diagram of a data analysis service distribution system according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Fig. 1 shows a data analysis service distribution system, which includes a Spark data analysis module, a service scheduling module, and a service standard formulation module. The service standard formulation module formulates a unified service release standard, specifically comprising a service production standard, a parameter transmission standard, and a result return standard; the standard ensures the uniformity of services and makes them easier to use. The service scheduling module receives service requests, sends each request to an idle service, allocates data analysis tasks, balances cluster resources, manages the task execution cycle, and starts and stops services. The Spark data analysis module constructs a service container and analyzes and processes the service request according to the service release standard. The operating environment of the service is the Spark distributed computing system, one of the mainstream cloud computing frameworks. Adopting this cloud computing mode greatly improves the speed and efficiency of data analysis, and it also allows algorithms to be combined in different orders to analyze and process data, which diversifies the analysis process.
In this embodiment, the system adopts a B/S (browser/server) architecture: through a browser, the user views service information (such as the service parameters, the combination form of service return values, the service state, the flowchart, and the service call log), adjusts the service state, and sets the service execution form (such as timed or periodic execution) and the service scale (such as the number of concurrent calls).
Preferably, the service standard formulation module specifies a unified service standard for different algorithms, specifically comprising the service parameters, the service result combination mode, and the service calling mode. This standard ensures the uniformity of services, lowers the difficulty of use, and improves both the availability of the service and the reusability of business system code.
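As a rough illustration of what such a unified standard could look like in code, the following Python sketch describes a service by its parameters, its result combination mode, and its calling mode. The class `ServiceStandard`, its fields, and the helper `call_service` are hypothetical names introduced here for illustration only; the patent does not prescribe any concrete data structure.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ServiceStandard:
    """Hypothetical description of one published service (not from the patent)."""
    name: str
    parameters: Dict[str, type]   # service parameters: name -> expected Python type
    result_fields: List[str]      # result combination mode: fields returned to the caller
    invocation: str = "sync"      # service calling mode, e.g. "sync" or "async" (assumed values)

    def validate(self, request: Dict[str, Any]) -> None:
        """Reject requests that do not follow the declared parameter standard."""
        missing = [p for p in self.parameters if p not in request]
        if missing:
            raise ValueError(f"missing parameters: {missing}")
        for key, expected in self.parameters.items():
            if not isinstance(request[key], expected):
                raise TypeError(f"{key} must be of type {expected.__name__}")

    def combine(self, raw: Dict[str, Any]) -> Dict[str, Any]:
        """Keep only the declared result fields, in the declared order."""
        return {f: raw.get(f) for f in self.result_fields}


def call_service(std: ServiceStandard,
                 algorithm: Callable[..., Dict[str, Any]],
                 request: Dict[str, Any]) -> Dict[str, Any]:
    """Validate the request, run the algorithm, and shape the result per the standard."""
    std.validate(request)
    return std.combine(algorithm(**request))


def mean_algorithm(values: list) -> Dict[str, Any]:
    """Toy stand-in for a real analysis algorithm."""
    return {"mean": sum(values) / len(values), "count": len(values), "debug": "internal"}


mean_standard = ServiceStandard("mean", {"values": list}, ["mean", "count"])
print(call_service(mean_standard, mean_algorithm, {"values": [1, 2, 3]}))
```

Under this sketch, every algorithm is reached through the same `call_service` entry point, so a caller only needs to know the declared parameters and result fields, not the algorithm's internals.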
Preferably, the Spark data analysis module comprises a Spark data analysis unit and a distributed cluster;
the Spark data analysis unit is used for analyzing and calculating the distributed service request through a Spark distributed computing system;
the distributed cluster is used for providing a distributed computing running environment for the Spark data analysis unit.
Preferably, the distributed clusters include Spark clusters and Hadoop clusters.
Preferably, the Spark data analysis unit further includes a service subunit and a process issuing subunit;
the business subunit is used for arbitrarily combining the algorithms that implement the service request and drawing them as a flowchart according to the service standard; the flowchart comprises algorithm instance nodes and the relationships between them, and those relationships are determined by the connecting lines between the algorithms.
The flow issuing subunit is used for combining all the nodes of the flow chart to generate a task and making the task into a service.
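One way to picture the step from flowchart to task is the Python sketch below. It assumes the flowchart is a directed acyclic graph whose nodes are plain functions and whose connecting lines are `(source, target)` pairs; the function `build_task` and this representation are illustrative assumptions, not the patent's implementation. The standard-library `graphlib.TopologicalSorter` (Python 3.9+) supplies the execution order.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple


def build_task(nodes: Dict[str, Callable[..., Any]],
               edges: List[Tuple[str, str]]) -> Callable[[Any], Dict[str, Any]]:
    """Combine all flowchart nodes into a single callable task.

    nodes: algorithm instance nodes, name -> function (hypothetical representation)
    edges: connecting lines between algorithms, as (source, target) pairs
    """
    # Map each node to the list of nodes whose output it consumes.
    preds: Dict[str, List[str]] = {name: [] for name in nodes}
    for src, dst in edges:
        preds[dst].append(src)

    # static_order() yields predecessors first and raises CycleError on a cyclic chart.
    order = list(TopologicalSorter(preds).static_order())
    sinks = [n for n in order if all(n not in preds[m] for m in nodes)]

    def task(data: Any) -> Dict[str, Any]:
        results: Dict[str, Any] = {}
        for name in order:
            # A node without predecessors receives the request data itself.
            inputs = [results[p] for p in preds[name]] or [data]
            results[name] = nodes[name](*inputs)
        # The task returns the outputs of the terminal nodes of the chart.
        return {n: results[n] for n in sinks}

    return task


flow_nodes = {
    "clean": lambda xs: [x for x in xs if x is not None],
    "scale": lambda xs: [x * 2 for x in xs],
    "total": lambda xs: sum(xs),
}
flow_edges = [("clean", "scale"), ("scale", "total")]
demo_task = build_task(flow_nodes, flow_edges)
print(demo_task([1, None, 2]))
```

The returned `task` is what would then be wrapped and published as a service; reordering or swapping nodes in `flow_edges` yields a different task without touching the algorithm code.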
When a service request arrives, the service scheduling module sends it to an idle service according to a random load-balancing algorithm, using cluster resource data provided by the distributed cluster. The service scheduling module records the current state of each service and uses the random algorithm to pick one of the idle background services. Under the same execution environment, as the number of requests grows, each service is, probabilistically, called roughly the same number of times, so the load is balanced.
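The random selection among idle services can be sketched in a few lines of Python. The class `RandomScheduler`, the `"idle"`/`"busy"` state strings, and the `simulate` helper are assumptions made for this illustration; the patent only states that service states are recorded and that an idle service is chosen at random.

```python
import random
from collections import Counter
from typing import Dict, Iterable, Optional


class RandomScheduler:
    """Hypothetical scheduler: tracks service state and picks an idle service at random."""

    def __init__(self, service_ids: Iterable[str], rng: Optional[random.Random] = None):
        self.state: Dict[str, str] = {sid: "idle" for sid in service_ids}
        self.rng = rng or random.Random()

    def dispatch(self) -> Optional[str]:
        """Return the id of a randomly chosen idle service, or None if all are busy."""
        idle = [sid for sid, status in self.state.items() if status == "idle"]
        if not idle:
            return None
        chosen = self.rng.choice(idle)
        self.state[chosen] = "busy"
        return chosen

    def release(self, service_id: str) -> None:
        """Mark a service as idle again once it has finished its request."""
        self.state[service_id] = "idle"


def simulate(n_requests: int, service_ids: Iterable[str], seed: int = 0) -> Counter:
    """Send n_requests one after another and count how often each service is chosen."""
    scheduler = RandomScheduler(service_ids, rng=random.Random(seed))
    counts: Counter = Counter()
    for _ in range(n_requests):
        sid = scheduler.dispatch()
        counts[sid] += 1
        scheduler.release(sid)
    return counts


print(simulate(30000, ["s1", "s2", "s3"]))
```

With three identical services and many requests, the counts printed by `simulate` should come out close to one third each, which is the probabilistic balancing argument made above.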
Preferably, the service scheduling module communicates with the service through a socket, and the communication content includes service request data, service result data, service state data, and service calculation process data.
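The patent does not specify a wire format for this socket communication. The Python sketch below assumes one possible format, a 4-byte length prefix followed by a JSON body, and uses the hypothetical type tags `request`, `result`, `state`, and `progress` for the four kinds of content listed above.

```python
import json
import socket
import struct
from typing import Any, Dict

# Assumed tags for request data, result data, state data, and calculation process data.
MESSAGE_TYPES = {"request", "result", "state", "progress"}


def send_message(sock: socket.socket, kind: str, payload: Dict[str, Any]) -> None:
    """Send one length-prefixed JSON message."""
    if kind not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {kind}")
    body = json.dumps({"type": kind, "payload": payload}).encode("utf-8")
    # 4-byte big-endian length header, then the JSON body.
    sock.sendall(struct.pack("!I", len(body)) + body)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_message(sock: socket.socket) -> Dict[str, Any]:
    """Receive one length-prefixed JSON message."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


def demo() -> Dict[str, Any]:
    """Exchange a request and a state update over a local socket pair."""
    scheduler_end, service_end = socket.socketpair()
    try:
        send_message(scheduler_end, "request", {"service": "mean", "values": [1, 2, 3]})
        request = recv_message(service_end)
        send_message(service_end, "state", {"service": "mean", "status": "busy"})
        state = recv_message(scheduler_end)
        return {"request": request, "state": state}
    finally:
        scheduler_end.close()
        service_end.close()


print(demo())
```

The length prefix lets the receiver find message boundaries on a stream socket, so request, result, state, and progress messages can share one connection.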
The invention provides a Spark data analysis service release system. Specifying a unified service release standard broadens the applicability of services and reduces both errors and the complexity of using services; a data analysis platform built on the Spark data analysis architecture carries out the analysis calculations and analysis processes, and the cloud computing mode greatly improves the speed and efficiency of data analysis. The business system is effectively isolated from the big data analysis, which reduces the development cost of the business system, and the data analysis function is exposed as an HTTP interface of an open API so that third parties can call it conveniently.
Finally, the above is only a preferred embodiment of the present application and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
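A minimal sketch of such an HTTP interface, using only Python's standard `http.server` module, is given below. The URL layout (`/services/<name>`), the JSON request and response shapes, the `SERVICES` registry, and the port are all assumptions for illustration; a real deployment would forward the call to the service scheduling module rather than run a toy function.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry of published services; the toy "mean" stands in for a Spark job.
SERVICES: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "mean": lambda params: {"mean": sum(params["values"]) / len(params["values"])},
}


def handle_call(path: str, body: bytes) -> Tuple[int, Dict[str, Any]]:
    """Map a request path and JSON body to an HTTP status code and a JSON-ready dict."""
    name = path.rsplit("/", 1)[-1]
    service = SERVICES.get(name)
    if service is None:
        return 404, {"error": f"unknown service: {name}"}
    try:
        params = json.loads(body or b"{}")
        return 200, {"result": service(params)}
    except (ValueError, KeyError, TypeError, ZeroDivisionError) as exc:
        # Malformed JSON or parameters that do not fit the service.
        return 400, {"error": str(exc)}


class ServiceHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around handle_call for POST /services/<name>."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_call(self.path, self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8080) -> None:
    """Start the HTTP interface; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ServiceHandler).serve_forever()


print(handle_call("/services/mean", b'{"values": [1, 2, 3]}'))
```

Calling `serve()` starts the listener; a third-party client could then POST a JSON body to `/services/mean` without knowing anything about the cluster behind it, which is the isolation the description aims for.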
Claims (6)
1. A data analysis service distribution system is characterized by comprising a Spark data analysis module, a service scheduling module and a service standard formulation module; the service standard formulating module is used for formulating a uniform service release standard; the service scheduling module is used for receiving a service request and sending the service request to an idle service; the Spark data analysis module is used for constructing a service container and analyzing and processing the service request according to the service release standard;
the system also comprises a B/S framework, wherein a user checks service information and adjusts the service state through a browser by adopting the B/S framework, and sets a service execution form and a service scale;
the Spark data analysis module comprises a Spark data analysis unit and a distributed cluster;
the Spark data analysis unit is used for analyzing and calculating the distributed service request through a Spark distributed computing system;
the distributed cluster is used for providing a distributed computing operation environment for the Spark data analysis unit;
the Spark data analysis unit also comprises a service subunit and a process issuing subunit;
the business subunit is used for arbitrarily combining the algorithms that implement the service request and drawing them as a flowchart according to the service release standard;
the flow issuing subunit is used for combining all the nodes of the flow chart to generate a task, making the task into a service and analyzing and processing the service request.
2. The data analysis service distribution system of claim 1, wherein the service standard formulation module specifies a unified service standard for different algorithms, specifically comprising a service parameter, a service result combination mode, and a service invocation mode.
3. The data analytics service distribution system of claim 1, wherein the service scheduling module is further configured to make data analytics functions as an HTTP interface to an open API.
4. The data analysis service distribution system of claim 1, wherein the distributed clusters comprise Spark clusters and Hadoop clusters.
5. The data analysis service distribution system of claim 1, wherein the service scheduling module is configured to send the service request to the idle service according to a load balancing-random algorithm through the cluster profile data provided by the distributed cluster.
6. The data analysis service distribution system of claim 1, wherein the service scheduling module communicates with the service through a socket, and the communication content includes service request data, service result data, service status data, and service calculation process data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611248761.4A CN106657099B (en) | 2016-12-29 | 2016-12-29 | Spark data analysis service publishing system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611248761.4A CN106657099B (en) | 2016-12-29 | 2016-12-29 | Spark data analysis service publishing system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106657099A (en) | 2017-05-10 |
CN106657099B (en) | 2020-06-16 |
Family
ID=58836389
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611248761.4A Active CN106657099B (en) | 2016-12-29 | 2016-12-29 | Spark data analysis service publishing system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106657099B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108427992B (en) * | 2018-03-16 | 2020-09-01 | 济南飞象信息科技有限公司 | Machine learning training system and method based on edge cloud computing |
CN109729086B (en) * | 2018-12-28 | 2021-02-23 | 奇安信科技集团股份有限公司 | Policy management method, system, device, and medium |
CN110288104A (en) * | 2019-07-04 | 2019-09-27 | 北京百佑科技有限公司 | O&M flow system, O&M workflow management method and device |
CN111031123B (en) * | 2019-12-10 | 2022-06-03 | 中盈优创资讯科技有限公司 | Spark task submission method, system, client and server |
CN112115202A (en) * | 2020-09-18 | 2020-12-22 | 北京人大金仓信息技术股份有限公司 | Task distribution method and device in cluster environment |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120173476A1 (en) * | 2011-01-04 | 2012-07-05 | Nasir Rizvi | System and Method for Rule-Based Asymmetric Data Reporting |
CN105608160A (en) * | 2015-12-21 | 2016-05-25 | 浪潮软件股份有限公司 | Distributed big data analysis method |
CN105930460A (en) * | 2016-04-21 | 2016-09-07 | 重庆邮电大学 | Multi-algorithm-integrated big data analysis middleware platform |
-
2016
- 2016-12-29 CN CN201611248761.4A patent/CN106657099B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN106657099A (en) | 2017-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106657099B (en) | Spark data analysis service publishing system | |
US20200285514A1 (en) | Automated reconfiguration of real time data stream processing | |
de Assuncao et al. | Distributed data stream processing and edge computing: A survey on resource elasticity and future directions | |
Yang et al. | On 3G mobile e-commerce platform based on cloud computing | |
CN109074377B (en) | Managed function execution for real-time processing of data streams | |
Zhang et al. | Toward transcoding as a service: energy-efficient offloading policy for green mobile cloud | |
CN105045607A (en) | Method for achieving uniform interface of multiple big data calculation frames | |
CN103777950B (en) | Gridding method for resolving AOS (Advanced Orbiting System) telemetering data | |
Ning et al. | Mobile storm: Distributed real-time stream processing for mobile clouds | |
CN111414381B (en) | Data processing method and device, electronic equipment and storage medium | |
CN103023980B (en) | A kind of method and system of cloud platform processes user service request | |
CN104735095A (en) | Method and device for job scheduling of cloud computing platform | |
CN110781180B (en) | Data screening method and data screening device | |
CN103235835A (en) | Inquiry implementation method for database cluster and device | |
Jiang et al. | Towards max-min fair resource allocation for stream big data analytics in shared clouds | |
US10489179B1 (en) | Virtual machine instance data aggregation based on work definition metadata | |
CN111200606A (en) | Deep learning model task processing method, system, server and storage medium | |
CN112104679B (en) | Method, apparatus, device and medium for processing hypertext transfer protocol request | |
CN111818131A (en) | Message pushing and scheduling system and method | |
CN108259605B (en) | Data calling system and method based on multiple data centers | |
CN108540439B (en) | Data analysis method, system, device and storage medium | |
CN111190731A (en) | Cluster task scheduling system based on weight | |
Huang et al. | Communication, computing, and learning on the edge | |
US9264506B2 (en) | Pull data transfer method in request-response models | |
EP2622499B1 (en) | Techniques to support large numbers of subscribers to a real-time event |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||