CN106657099B - Spark data analysis service publishing system - Google Patents

Info

Publication number
CN106657099B
Authority
CN
China
Prior art keywords
service
data analysis
spark
module
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611248761.4A
Other languages
Chinese (zh)
Other versions
CN106657099A (en)
Inventor
王莹
张立军
孙丙聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tianyuan Innovation Technology Co ltd
Original Assignee
Beijing Tianyuan Innovation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Abstract

The invention provides a data analysis service distribution system comprising a Spark data analysis module, a service scheduling module, and a service standard formulation module. The service standard formulation module formulates a uniform service release standard; the service scheduling module receives a service request and sends it to an idle service; the Spark data analysis module constructs a service container and analyzes and processes the service request according to the service release standard. Because a uniform service standard is formulated, a third-party client or a business system performs big data analysis by calling the data analysis service, so the business system is effectively isolated from the big data analysis and its development cost is reduced. The services run on a Spark distributed computing system, which greatly improves the speed and efficiency of data analysis.

Description

Spark data analysis service publishing system
Technical Field
The invention relates to the technical field of data analysis and mining, and in particular to a Spark data analysis service publishing system.
Background
With the advent of the information age, the volume of accumulated data has grown geometrically, and various data analysis algorithms have emerged to mine useful information from massive existing data. In practice, the most suitable algorithm cannot be determined immediately: different algorithms or combinations of algorithms must be tried repeatedly, and their results compared, to arrive at the best algorithm scheme and analysis result and thus the most useful feedback from the data.
Data analysts therefore need to understand both the principles of the algorithms and their specific code implementations, which places high demands on technical personnel. When different algorithms are combined to analyze data, the code must be adjusted continually, which is cumbersome. As data grows rapidly, companies and scientific research institutions attach increasing importance to mining useful information from existing data, and various data mining system architectures have appeared.
Traditional business systems rarely involve data mining, and traditional software companies must spend a great deal of time and money building an analysis and mining platform to keep pace with the development of big data.
Disclosure of Invention
The invention provides a data analysis service distribution system that overcomes, or at least partially solves, the above problems. It unifies the form of services, makes reasonable use of cluster resources, and builds low-cost big data analysis services on a Spark distributed architecture.
According to one aspect of the invention, the system comprises a Spark data analysis module, a service scheduling module and a service standard formulation module; the service standard formulating module is used for formulating a uniform service release standard; the service scheduling module is used for receiving a service request and sending the service request to an idle service; the Spark data analysis module is used for constructing a service container and analyzing and processing the service request according to the service release standard.
Preferably, the system adopts a B/S (browser/server) architecture, through which a user views service information, adjusts the service state, and sets the service execution form and the service scale in a browser.
Preferably, the service standard formulation module specifies a unified service standard for different algorithms, specifically including a service parameter, a service result combination mode, and a service invocation mode.
Preferably, the service scheduling module is further configured to expose the data analysis function as an HTTP interface of an open API.
Preferably, the Spark data analysis module comprises a Spark data analysis unit and a distributed cluster;
the Spark data analysis unit is used for analyzing and calculating the distributed service request through a Spark distributed computing system;
the distributed cluster is used for providing a distributed computing running environment for the Spark data analysis unit.
Preferably, the distributed clusters include Spark clusters and Hadoop clusters.
Preferably, the Spark data analysis unit includes a business subunit and a flow issuing subunit;
the business subunit is used for arbitrarily combining the algorithms that implement the service request and drawing them as a flow chart according to the service release standard;
the flow issuing subunit is used for combining all the nodes of the flow chart to generate a task, making the task into a service, and analyzing and processing the service request.
Preferably, the service scheduling module is configured to send the service request to an idle service according to a random load-balancing algorithm, using cluster data provided by the distributed cluster.
Preferably, the service scheduling module communicates with the service through a socket, and the communication content includes service request data, service result data, service state data, and service calculation process data.
In the data analysis service distribution system provided by the invention, a uniform service standard is formulated, and a third-party client or a business system performs big data analysis by calling the data analysis service. The business system is thereby effectively isolated from the big data analysis, and its development cost is reduced. The services run on a Spark distributed computing system, which greatly improves the speed and efficiency of data analysis.
Drawings
Fig. 1 is a block diagram of a data analysis service distribution system according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
Fig. 1 shows a data analysis service distribution system, which includes a Spark data analysis module, a service scheduling module, and a service standard formulation module. The service standard formulation module formulates a uniform service release standard, specifically comprising a service production standard, a parameter transmission standard, and a result return standard; the standard ensures the uniformity of services and makes them easier to use. The service scheduling module receives service requests and sends them to idle services, allocates data analysis tasks, balances cluster resources, executes task cycles, and starts and stops services. The Spark data analysis module constructs a service container and analyzes and processes the service request according to the service release standard. The services run on a Spark distributed computing system, one of the mainstream cloud computing frameworks; this cloud computing mode greatly improves the speed and efficiency of data analysis. It also allows algorithms to be combined in different orders to analyze and process data, which diversifies the analysis process.
In this embodiment, the system adopts a B/S (browser/server) architecture. Through a browser, a user views service information, such as the service parameters, the combination form of the service return values, the service state, the flow chart, and the service call log; adjusts the service state; and sets the service execution form, such as timed or periodic execution, and the service scale, such as the number of concurrent executions.
Preferably, the service standard formulation module specifies a uniform service standard for different algorithms, specifically comprising service parameters, a service result combination mode, and a service invocation mode. The standard ensures the uniformity of services, makes them easier to use, and improves the availability of services and the reusability of business system code.
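As an illustration only, the three elements of the service standard named above (service parameters, result combination mode, and invocation mode) can be sketched as a small descriptor. The patent does not disclose a concrete data format, so the class name ServiceStandard, its fields, and the invocation values "http" and "socket" are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ServiceStandard:
    """Hypothetical descriptor for one published service (not from the patent)."""

    name: str
    parameters: Dict[str, type]   # service parameters: name -> expected Python type
    result_fields: List[str]      # result combination mode: fields returned, in order
    invocation: str = "http"      # service invocation mode (assumed: "http" or "socket")

    def validate(self, request: Dict[str, Any]) -> None:
        """Reject a request that does not follow the declared parameters."""
        missing = [p for p in self.parameters if p not in request]
        if missing:
            raise ValueError(f"missing parameters: {missing}")
        for key, expected in self.parameters.items():
            if not isinstance(request[key], expected):
                raise TypeError(f"parameter {key!r} must be {expected.__name__}")

    def combine(self, raw: Dict[str, Any]) -> Dict[str, Any]:
        """Keep only the declared result fields, so every service answers uniformly."""
        return {name: raw.get(name) for name in self.result_fields}


# Usage: an assumed clustering service with two parameters and two result fields.
standard = ServiceStandard("kmeans", {"k": int, "table": str}, ["centers", "cost"])
standard.validate({"k": 3, "table": "orders"})
reply = standard.combine({"centers": [[0.1, 0.2]], "cost": 0.5, "debug": "dropped"})
```

Under these assumptions, a caller needs to know only the descriptor, not the algorithm behind it, which is the isolation between business system and analysis that the description aims for.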
Preferably, the Spark data analysis module comprises a Spark data analysis unit and a distributed cluster;
the Spark data analysis unit is used for analyzing and calculating the distributed service request through a Spark distributed computing system;
the distributed cluster is used for providing a distributed computing running environment for the Spark data analysis unit.
Preferably, the distributed clusters include Spark clusters and Hadoop clusters.
Preferably, the Spark data analysis unit further includes a business subunit and a flow issuing subunit;
the business subunit is used for arbitrarily combining the algorithms that implement the service request and drawing them as a flow chart according to the service standard; the flow chart comprises algorithm instance nodes and the relationships between them, which are determined by the connecting lines between the algorithms.
The flow issuing subunit is used for combining all the nodes of the flow chart to generate a task and making the task into a service.
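One plausible way to combine the nodes of such a flow chart into a task is to order the algorithm instance nodes so that each runs after the nodes connected into it. The sketch below assumes that reading; the patent does not state how the combination is performed. The function name build_task and the example node names are hypothetical, and the ordering uses the Python standard library's graphlib.TopologicalSorter.

```python
from graphlib import TopologicalSorter
from typing import Iterable, List, Tuple


def build_task(nodes: Iterable[str], edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn a flow chart into an execution order.

    nodes: algorithm instance nodes drawn in the flow chart.
    edges: connecting lines as (source, target) pairs, meaning the source
           algorithm must finish before the target algorithm starts.
    """
    predecessors = {node: set() for node in nodes}
    for source, target in edges:
        if source not in predecessors or target not in predecessors:
            raise KeyError(f"connecting line refers to an unknown node: {source} -> {target}")
        predecessors[target].add(source)
    # static_order() raises graphlib.CycleError if the lines form a loop.
    return list(TopologicalSorter(predecessors).static_order())


# Usage: a linear flow chart of four assumed algorithm nodes.
order = build_task(
    ["load", "clean", "cluster", "report"],
    [("load", "clean"), ("clean", "cluster"), ("cluster", "report")],
)
```

For this linear chart the only valid order is load, clean, cluster, report; a chart with branches would admit several valid orders.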
When a service request arrives, the service scheduling module uses the cluster resource data provided by the distributed cluster to send the request to an idle service according to a random load-balancing algorithm. The service scheduling module records the current state of each service and uses a random algorithm to select one of the idle background services. In the same execution environment, as the number of requests increases, each service is, probabilistically speaking, called roughly the same number of times.
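The random selection among idle services can be sketched as follows. This is a minimal illustration under assumptions: the class RandomScheduler, the state labels "idle" and "busy", and the helper simulate are hypothetical names, and each simulated request completes before the next one arrives.

```python
import random
from collections import Counter
from typing import Iterable, Optional


class RandomScheduler:
    """Records the state of each service and picks an idle one at random."""

    def __init__(self, service_ids: Iterable[str], rng: Optional[random.Random] = None):
        self.state = {sid: "idle" for sid in service_ids}
        self.rng = rng or random.Random()

    def dispatch(self) -> str:
        """Choose one idle service uniformly at random and mark it busy."""
        idle = [sid for sid, status in self.state.items() if status == "idle"]
        if not idle:
            raise RuntimeError("no idle service available")
        chosen = self.rng.choice(idle)
        self.state[chosen] = "busy"
        return chosen

    def release(self, service_id: str) -> None:
        """Mark a service idle again once its request has been processed."""
        self.state[service_id] = "idle"


def simulate(n_requests: int, service_ids: Iterable[str], seed: int = 0) -> Counter:
    """Count how often each service is called when requests are served one by one."""
    scheduler = RandomScheduler(service_ids, random.Random(seed))
    calls = Counter()
    for _ in range(n_requests):
        sid = scheduler.dispatch()
        calls[sid] += 1
        scheduler.release(sid)
    return calls


calls = simulate(30000, ["service-1", "service-2", "service-3"])
```

With three services and 30000 requests, each count lands close to 10000, which matches the statement that the calls even out as the number of requests grows.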
Preferably, the service scheduling module communicates with the service through a socket, and the communication content includes service request data, service result data, service state data, and service calculation process data.
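The four kinds of communication content can be illustrated with a simple message format over a socket. The patent does not specify a wire format, so the length-prefixed JSON framing, the type labels in MESSAGE_TYPES, and the function names below are assumptions for the sake of the sketch.

```python
import json
import socket
import struct

# Assumed labels for the four kinds of content named in the description:
# request data, result data, state data, and calculation process data.
MESSAGE_TYPES = {"request", "result", "state", "progress"}


def send_message(sock: socket.socket, msg_type: str, payload: dict) -> None:
    """Send one message as a 4-byte big-endian length followed by a JSON body."""
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg_type}")
    body = json.dumps({"type": msg_type, "payload": payload}).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def _recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, since recv() may return fewer."""
    buffer = bytearray()
    while len(buffer) < size:
        chunk = sock.recv(size - len(buffer))
        if not chunk:
            raise ConnectionError("socket closed before the message was complete")
        buffer.extend(chunk)
    return bytes(buffer)


def recv_message(sock: socket.socket) -> dict:
    """Receive one length-prefixed JSON message and decode it."""
    (length,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


# Usage: a connected socket pair stands in for the scheduler and one service.
scheduler_end, service_end = socket.socketpair()
send_message(scheduler_end, "request", {"service": "kmeans", "k": 3})
received = recv_message(service_end)
send_message(service_end, "state", {"status": "busy"})
status = recv_message(scheduler_end)
scheduler_end.close()
service_end.close()
```

The length prefix lets the receiver find message boundaries on a stream socket, so request, result, state, and progress messages can share one connection.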
The invention provides a Spark data analysis service publishing system. Specifying a uniform service release standard broadens the applicability of services and reduces errors and the complexity of using services. A data analysis platform built on the Spark data analysis architecture carries out the analysis computation and the analysis process, and the cloud computing mode greatly improves the speed and efficiency of data analysis. The business system is effectively isolated from the big data analysis, which reduces the development cost of the business system, and the data analysis function is exposed as an HTTP interface of an open API that third parties can call conveniently.
Finally, the method of the present application is only a preferred embodiment and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. A data analysis service distribution system is characterized by comprising a Spark data analysis module, a service scheduling module and a service standard formulation module; the service standard formulating module is used for formulating a uniform service release standard; the service scheduling module is used for receiving a service request and sending the service request to an idle service; the Spark data analysis module is used for constructing a service container and analyzing and processing the service request according to the service release standard;
the system also comprises a B/S framework, wherein a user checks service information and adjusts the service state through a browser by adopting the B/S framework, and sets a service execution form and a service scale;
the Spark data analysis module comprises a Spark data analysis unit and a distributed cluster;
the Spark data analysis unit is used for analyzing and calculating the distributed service request through a Spark distributed computing system;
the distributed cluster is used for providing a distributed computing operation environment for the Spark data analysis unit;
the Spark data analysis unit also comprises a business subunit and a flow issuing subunit;
the business subunit is used for arbitrarily combining the algorithms that implement the service request and drawing them as a flow chart according to the service release standard;
the flow issuing subunit is used for combining all the nodes of the flow chart to generate a task, making the task into a service and analyzing and processing the service request.
2. The data analysis service distribution system of claim 1, wherein the service standard formulation module specifies a unified service standard for different algorithms, specifically comprising a service parameter, a service result combination mode, and a service invocation mode.
3. The data analysis service distribution system of claim 1, wherein the service scheduling module is further configured to expose the data analysis function as an HTTP interface of an open API.
4. The data analysis service distribution system of claim 1, wherein the distributed clusters comprise Spark clusters and Hadoop clusters.
5. The data analysis service distribution system of claim 1, wherein the service scheduling module is configured to send the service request to the idle service according to a load balancing-random algorithm through the cluster profile data provided by the distributed cluster.
6. The data analysis service distribution system of claim 1, wherein the service scheduling module communicates with the service through a socket, and the communication content includes service request data, service result data, service status data, and service calculation process data.
CN201611248761.4A 2016-12-29 2016-12-29 Spark data analysis service publishing system Active CN106657099B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611248761.4A CN106657099B (en) 2016-12-29 2016-12-29 Spark data analysis service publishing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611248761.4A CN106657099B (en) 2016-12-29 2016-12-29 Spark data analysis service publishing system

Publications (2)

Publication Number Publication Date
CN106657099A CN106657099A (en) 2017-05-10
CN106657099B CN106657099B (en) 2020-06-16

Family

ID=58836389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611248761.4A Active CN106657099B (en) 2016-12-29 2016-12-29 Spark data analysis service publishing system

Country Status (1)

Country Link
CN (1) CN106657099B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427992B (en) * 2018-03-16 2020-09-01 济南飞象信息科技有限公司 Machine learning training system and method based on edge cloud computing
CN109729086B (en) * 2018-12-28 2021-02-23 奇安信科技集团股份有限公司 Policy management method, system, device, and medium
CN110288104A (en) * 2019-07-04 2019-09-27 北京百佑科技有限公司 O&M flow system, O&M workflow management method and device
CN111031123B (en) * 2019-12-10 2022-06-03 中盈优创资讯科技有限公司 Spark task submission method, system, client and server
CN112115202A (en) * 2020-09-18 2020-12-22 北京人大金仓信息技术股份有限公司 Task distribution method and device in cluster environment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120173476A1 (en) * 2011-01-04 2012-07-05 Nasir Rizvi System and Method for Rule-Based Asymmetric Data Reporting
CN105608160A (en) * 2015-12-21 2016-05-25 浪潮软件股份有限公司 Distributed big data analysis method
CN105930460A (en) * 2016-04-21 2016-09-07 重庆邮电大学 Multi-algorithm-integrated big data analysis middleware platform

Also Published As

Publication number Publication date
CN106657099A (en) 2017-05-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant