CN106815019B - WEB interface integration method and device of Hadoop distributed algorithm - Google Patents

WEB interface integration method and device of Hadoop distributed algorithm

Info

Publication number
CN106815019B
Authority
CN
China
Prior art keywords
component
data processing
data
web interface
data acquisition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201611253462.XA
Other languages
Chinese (zh)
Other versions
CN106815019A (en)
Inventor
金暐
高昕
邹潇湘
董琳
彭义刚
李佳
王锟
云晓春
舒敏
李海灵
王中华
侯美佳
曹强
王坤
徐娟娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Computer Network and Information Security Management Center
Priority to CN201611253462.XA
Publication of CN106815019A
Application granted
Publication of CN106815019B
Legal status: Expired - Fee Related

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44: Arrangements for executing specific programs
    • G06F9/451: Execution arrangements for user interfaces

Abstract

The invention discloses a WEB interface integration method and device for a Hadoop distributed algorithm. The method comprises the following steps: when a data acquisition component is triggered, configuring the input of the data acquisition component and selecting one or more data processing components as its output; configuring the input of each selected data processing component and selecting one or more other data processing components as its output, thereby forming a component relation network; and, after receiving an operation instruction, processing the input data of the triggered data acquisition component with the components of the component relation network to obtain a data processing result. With this technical scheme, the selected data acquisition components and data processing components form a component relation network in the WEB interface, and the input data of the triggered data acquisition component is processed by the components of that network. No programming is needed, and the network can be executed immediately to view the result.

Description

WEB interface integration method and device of Hadoop distributed algorithm
Technical Field
The invention relates to the field of mobile communication, in particular to a WEB interface integration method and device of a Hadoop distributed algorithm.
Background
Hadoop has a rich ecosystem of components. Among them, Mahout is a powerful data mining tool and a collection of distributed machine learning algorithms, including an implementation of distributed collaborative filtering known as Taste, as well as classification and clustering algorithms. The greatest advantage of Mahout is that it is built on Hadoop: many algorithms that previously ran on a single machine have been converted to the MapReduce model, which greatly increases the data volume they can handle and their processing performance. Table 1 lists the machine learning algorithms implemented in Mahout.
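[TABLE 1: machine learning algorithms implemented in Mahout; the table was published as images BDA0001198454020000011 and BDA0001198454020000021, which are not reproduced here]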
When an enterprise builds a big data platform on open-source Hadoop and uses Hadoop distributed algorithms, it usually has to assemble software personnel familiar with the Hadoop architecture to develop various MapReduce programs. On the one hand, the development cycle of such programs is long; on the other hand, the programs run on a Linux operating system and can only be managed in a rudimentary way through Crontab scheduling. A simple, easy-to-use approach is therefore needed to shield enterprises from the complexity of the underlying Hadoop technology, so that their personnel can focus only on data and business, the difficulty of program development and algorithm use is reduced as far as possible, and the goal of building a big data platform is reached quickly.
Disclosure of Invention
To shield enterprises from the complexity of the underlying Hadoop technology, let their personnel focus only on data and business, and reduce the difficulty of program development and algorithm use as far as possible, the invention provides a WEB interface integration method and a WEB interface integration device for a Hadoop distributed algorithm.
The invention provides a WEB interface integration method of a Hadoop distributed algorithm, wherein a plurality of data acquisition components and a plurality of data processing components are loaded in a WEB interface, and the method comprises the following steps:
when a data acquisition component is triggered, configuring the input of the data acquisition component, and selecting one or more data processing components as the output of the data acquisition component;
configuring the input of each selected data processing component, and selecting one or more other data processing components as the output of that data processing component, to form a component relation network;
and after receiving an operation instruction, processing the input data of the triggered data acquisition component by using the components of the component relation network to obtain a data processing result.
The invention also provides a WEB interface integration device for a Hadoop distributed algorithm, wherein a plurality of data acquisition components and a plurality of data processing components are loaded in a WEB interface, and the device comprises a first configuration module, a second configuration module and a processing module;
the first configuration module is used for configuring the input of a data acquisition component after the data acquisition component is triggered, and selecting one or more data processing components as the output of the data acquisition component;
the second configuration module is used for configuring the input of each selected data processing component, and selecting one or more other data processing components as the output of that data processing component, to form a component relation network;
and the processing module is used for processing, after receiving an operation instruction, the input data of the triggered data acquisition component by using the components of the component relation network to obtain a data processing result.
The invention has the following beneficial effects:
According to the WEB interface integration method for a Hadoop distributed algorithm, the selected data acquisition components and data processing components form a component relation network in the WEB interface, and the input data of the triggered data acquisition component is processed by the components of that network. No programming is needed, and the network can be executed immediately to view the result, which facilitates exploratory analysis.
Drawings
FIG. 1 is a flowchart of a WEB interface integration method for a Hadoop distributed algorithm according to an embodiment of the present invention;
FIG. 2 is a schematic view of the WEB interface of the data mining component;
FIG. 3 is a schematic view of the WEB interface for configuring the data input of the FTP data acquisition component;
FIG. 4 is a schematic view of the WEB interface after the FTP data acquisition component is connected to the Kmeans algorithm component;
FIG. 5 is a schematic view of the WEB interface for configuring the data input of the Kmeans algorithm component;
FIG. 6 is a schematic view of the WEB interface for configuring the data output of the Kmeans algorithm component;
FIG. 7 is a schematic structural diagram of a WEB interface integration device for a Hadoop distributed algorithm according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
To shield enterprises from the complexity of the underlying Hadoop technology, let their personnel focus only on data and business, and reduce the difficulty of program development and algorithm use as far as possible, the invention provides a WEB interface integration method and a WEB interface integration device for a Hadoop distributed algorithm. The present invention is described in further detail below with reference to the drawings and examples. It should be understood that the specific embodiments described herein merely illustrate the invention and do not limit it.
According to a method embodiment of the present invention, a WEB interface integration method for a Hadoop distributed algorithm is provided, wherein a plurality of data acquisition components and a plurality of data processing components are loaded in the WEB interface. FIG. 1 is a flowchart of this method. As shown in FIG. 1, the method includes the following steps:
Step 101: When a data acquisition component is triggered, configure the input of the data acquisition component, and select one or more data processing components as the output of the data acquisition component.
Specifically, the WEB interface integration method for the Hadoop distributed algorithm of the embodiment of the present invention further includes the following steps:
packaging programs related to a data source to obtain a plurality of data acquisition components; and packaging the program related to the data processing to obtain a plurality of data processing components.
Specifically, the data acquisition component includes an FTP acquisition component, a MySQL acquisition component, a URL acquisition component, an HDFS data acquisition component, a network disk data acquisition component, and the like.
Specifically, the input configuration of the data acquisition component includes a name, a period mode, and the like of the component, and the output configuration of the data acquisition component includes a data output format, and the like.
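As an illustration of the input and output configuration described above, the following minimal Python sketch models a packaged data acquisition component. The class name `AcquisitionComponent` and the field names `period_mode`, `output_format` and `outputs` are hypothetical; the patent does not specify a data model.

```python
from dataclasses import dataclass, field


@dataclass
class AcquisitionComponent:
    """Hypothetical model of a packaged data acquisition component."""
    kind: str                  # e.g. "FTP", "MySQL", "URL", "HDFS"
    name: str = ""             # input configuration: component name
    period_mode: str = ""      # input configuration: period mode
    output_format: str = ""    # output configuration: data output format
    outputs: list = field(default_factory=list)  # names of downstream components

    def connect(self, processing_component_name: str) -> None:
        """Select a data processing component as an output of this component."""
        if processing_component_name not in self.outputs:
            self.outputs.append(processing_component_name)


ftp = AcquisitionComponent(kind="FTP", name="ftp_logs",
                           period_mode="daily", output_format="csv")
ftp.connect("kmeans")
ftp.connect("kmeans")  # selecting the same output twice has no effect
```

Keeping the downstream names on the component itself is only one possible design; the connections could equally be stored in a separate edge list.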
Step 102: Configure the input of each selected data processing component, and select one or more other data processing components as the output of that data processing component, to form a component relation network.
Specifically, the data processing components include a basic tool component, a conventional statistics application component, a data mining application component, a custom process component, a custom application component, a warehousing application component, and the like.
Specifically, the input configuration of the data processing component includes configuration parameters and the like related to a specific application, and the output configuration of the data processing component includes a data output format and the like.
Preferably, the WEB interface integration method for the Hadoop distributed algorithm according to the embodiment of the present invention further includes the following steps:
displaying the output result of the selected data acquisition component through the WEB interface so that a user can judge and adjust the input of the data acquisition component; displaying the output result of the selected data processing component through the WEB interface so that a user can judge and adjust the input of the data processing component; and displaying the component relation network through the WEB interface.
Specifically, the method further comprises the following steps before the component relation network is formed:
judging whether the configuration of the data acquisition component and the data processing component is complete or not;
if the configuration of the data acquisition component and the data processing component is complete, a component relation network is formed;
and if the configuration of the data acquisition component and the data processing component is incomplete, displaying the incomplete configuration items through the WEB interface, and receiving the re-input of the incomplete configuration items by the user until the configuration is complete.
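The completeness check described above could be sketched as follows. This is a minimal illustration, not the patented implementation: the required item names in `REQUIRED_FIELDS` and the dictionary layout of a component are assumptions made for the example.

```python
# Hypothetical required configuration items per component type.
REQUIRED_FIELDS = {
    "acquisition": ("name", "period_mode", "output_format"),
    "processing": ("parameters", "output_format"),
}


def incomplete_items(component: dict) -> list:
    """Return the required configuration items that are missing or empty."""
    required = REQUIRED_FIELDS[component["type"]]
    config = component.get("config", {})
    return [key for key in required if not config.get(key)]


def check_network(components: list) -> dict:
    """Map each incompletely configured component name to its missing items."""
    report = {}
    for component in components:
        missing = incomplete_items(component)
        if missing:
            report[component["name"]] = missing
    return report


components = [
    {"name": "ftp_source", "type": "acquisition",
     "config": {"name": "ftp_source", "period_mode": "daily",
                "output_format": "csv"}},
    {"name": "kmeans", "type": "processing",
     "config": {"parameters": {"k": 3}, "output_format": ""}},
]
report = check_network(components)  # non-empty: the network is not formed yet
```

An empty report would mean the configuration is complete and the component relation network can be formed; otherwise the listed items would be shown to the user for re-input.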
Step 103: After receiving an operation instruction, process the input data of the triggered data acquisition component by using the components of the component relation network to obtain a data processing result.
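One way to picture Step 103 is to treat the component relation network as a directed acyclic graph and run each component after all of its upstream components. The sketch below assumes exactly that; plain Python functions stand in for the MapReduce tasks, and the function `run_network` and its arguments are hypothetical names introduced for illustration.

```python
from graphlib import TopologicalSorter


def run_network(edges, functions, source, source_data):
    """Run each component after all of its upstream components.

    edges: dict mapping a component name to the names it outputs to.
    functions: dict mapping each processing component name to a callable
               that takes the list of its upstream outputs.
    """
    # TopologicalSorter expects node -> predecessors, so invert the edges.
    predecessors = {name: set() for name in edges}
    for upstream, downstreams in edges.items():
        for downstream in downstreams:
            predecessors.setdefault(downstream, set()).add(upstream)

    results = {source: source_data}
    for name in TopologicalSorter(predecessors).static_order():
        if name == source:
            continue
        inputs = [results[p] for p in sorted(predecessors[name])]
        results[name] = functions[name](inputs)
    return results


# Stand-in network: one acquisition component feeding three processing steps.
edges = {"ftp": ["clean"], "clean": ["count", "total"], "count": [], "total": []}
functions = {
    "clean": lambda ins: [x for x in ins[0] if x is not None],
    "count": lambda ins: len(ins[0]),
    "total": lambda ins: sum(ins[0]),
}
results = run_network(edges, functions, "ftp", [1, None, 2, 3])
```

`graphlib` is part of the Python standard library from version 3.9 onward; a cycle in the network would raise `graphlib.CycleError`, which matches the assumption that the network is acyclic.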
Specifically, after obtaining the data processing result, the method further includes:
receiving viewing operation, editing operation, copying operation and deleting operation input by a user; the viewing operation comprises viewing a data processing cycle and a release state of the data processing application; the editing operation comprises changing the name, description and data cycle period of the data processing application; the copying operation comprises copying the data processing result; the deleting operation comprises deleting the data processing result.
To illustrate the method embodiments of the present invention in more detail, a specific embodiment is given.
To improve the usability of Mahout during development and use, the invention provides a pure WEB interface solution in which each Mahout algorithm is packaged as an independent data mining component. FIG. 2 is a schematic view of the WEB interface of the data mining component.
To use a component, a data input component, such as the FTP data acquisition component, is selected first, and its data input and output are configured. FIG. 3 is a schematic view of the WEB interface for configuring the data input of the FTP data acquisition component.
A Kmeans algorithm component is then dragged from the menu and the two components are connected, so that the output of the FTP data acquisition component becomes the input of the Kmeans component. FIG. 4 is a schematic view of the WEB interface after the FTP data acquisition component is connected to the Kmeans algorithm component.
The input parameters of the Kmeans algorithm, such as the grouping column, and the task scheduling period are then configured. FIG. 5 is a schematic view of the WEB interface for configuring the data input of the Kmeans algorithm component.
The output data format of the Kmeans algorithm is then set. FIG. 6 is a schematic view of the WEB interface for configuring the data output of the Kmeans algorithm component.
After the configuration is completed, clicking "immediate execution" on the right side of the component schedules the Hadoop MapReduce task, and the task is executed automatically in the background.
In this way, the embodiment of the invention performs distributed processing of data through drag-and-drop operations in a WEB interface, without programming. The result can be viewed immediately after execution, which is beneficial to exploratory analysis.
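The FTP-to-Kmeans example above can be pictured as a small pipeline description that the WEB interface might hand to the background scheduler when "immediate execution" is clicked. The following sketch is purely illustrative: the JSON layout, the identifiers `ftp_1` and `kmeans_1`, the parameter `k` and the action name `execute_immediately` are all assumptions, since the patent does not describe a wire format.

```python
import json


def build_pipeline():
    """Assemble a hypothetical description of the FTP -> Kmeans pipeline."""
    return {
        "components": [
            {"id": "ftp_1", "type": "FTP",
             "input": {"name": "ftp_source", "period_mode": "daily"},
             "output": {"format": "csv"}},
            {"id": "kmeans_1", "type": "Kmeans",
             "input": {"parameters": {"k": 3}, "schedule_period": "daily"},
             "output": {"format": "csv"}},
        ],
        # Each connection makes the output of "from" the input of "to".
        "connections": [{"from": "ftp_1", "to": "kmeans_1"}],
    }


def to_request_body(pipeline: dict) -> str:
    """Serialize the pipeline as the JSON body of an 'execute' request."""
    return json.dumps({"action": "execute_immediately", "pipeline": pipeline},
                      sort_keys=True)


body = to_request_body(build_pipeline())
decoded = json.loads(body)  # what a backend would see after parsing
```

A declarative description like this keeps the drag-and-drop front end independent of how the background actually schedules the MapReduce tasks.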
According to a device embodiment of the present invention, a WEB interface integration device for a Hadoop distributed algorithm is provided. FIG. 7 is a schematic structural diagram of this device. As shown in FIG. 7, the device includes a first configuration module 70, a second configuration module 72, and a processing module 74, each of which is described in detail below.
Specifically, the first configuration module 70 is configured to configure the input of a certain data acquisition component after the data acquisition component is triggered, and select one or more data processing components as the output of the data acquisition component;
the second configuration module 72 is configured to configure the input of the selected data processing component, and select one or more of the other data processing components as the output of the data processing component, so as to form a component relationship network;
and the processing module 74 is configured to, after receiving the operation instruction, process the input data of the triggered data acquisition component by using each component of the component relationship network to obtain a data processing result.
Specifically, the WEB interface integrated device of the Hadoop distributed algorithm further comprises a data acquisition component packaging module and a data processing component packaging module;
the data acquisition component packaging module is used for packaging programs related to data sources;
the data processing component packaging module is used for packaging programs related to data processing.
Preferably, the WEB interface is further configured to display an output of the selected data acquisition component, so that a user can determine and adjust an input of the data acquisition component; displaying the output of the selected data processing component for the user to judge and adjust the input of the data processing component; and displaying the component relation network.
Specifically, the WEB interface integration device of the Hadoop distributed algorithm further comprises a judgment module: the judging module is used for judging whether the configuration of the data acquisition component and the data processing component is complete or not; if the configuration of the data acquisition component and the data processing component is complete, a component relation network is formed; and if the configuration of the data acquisition component and the data processing component is incomplete, outputting the incomplete configuration item to a WEB interface.
Specifically, the WEB interface is further configured to receive a viewing operation, an editing operation, a copying operation, and a deleting operation, which are input by a user; the viewing operation comprises viewing a data processing cycle and a release state of the data processing application; the editing operation comprises changing the name, description and data cycle period of the data processing application; the copying operation comprises copying the data processing result; the deleting operation comprises deleting the data processing result.
The above description is only an example of the present invention, and is not intended to limit the present invention, and it is obvious to those skilled in the art that various modifications and variations can be made in the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (8)

1. A WEB interface integration method of a Hadoop distributed algorithm is characterized in that a plurality of data acquisition components and a plurality of data processing components are loaded in a WEB interface, and the method comprises the following steps:
when a certain data acquisition component is triggered, configuring the input of the data acquisition component, and selecting one or more data processing components as the output of the data acquisition component;
configuring the input of the selected data processing component, and selecting one or more of the other data processing components as the output of the data processing component to form a component relation network;
after receiving an operation instruction, processing the input data of the triggered data acquisition component by utilizing each component of the component relation network to obtain a data processing result;
further comprising the steps of:
displaying the output result of the selected data acquisition component through the WEB interface so that a user can judge and adjust the input of the data acquisition component;
displaying the output result of the selected data processing component through the WEB interface so that a user can judge and adjust the input of the data processing component;
and displaying the component relation network through the WEB interface.
2. The WEB interface integration method of claim 1, further comprising the steps of:
packaging programs related to a data source to obtain a plurality of data acquisition components;
and packaging the program related to the data processing to obtain a plurality of data processing components.
3. The WEB interface integration method of claim 1, wherein the forming of the component relationship network further comprises the following steps;
judging whether the configuration of the data acquisition component and the data processing component is complete or not;
if the configuration of the data acquisition component and the data processing component is complete, a component relation network is formed;
and if the configuration of the data acquisition component and the data processing component is incomplete, displaying the incomplete configuration items through the WEB interface, and receiving the re-input of the incomplete configuration items by the user until the configuration is complete.
4. The WEB interface integration method of claim 1, wherein after obtaining the data processing result, the method further comprises:
receiving viewing operation, editing operation, copying operation and deleting operation input by a user; the viewing operation comprises viewing the data processing cycle and the release state of the data processing component; the editing operation comprises changing the name, description and data cycle of the data processing component; the copying operation comprises copying the data processing result; the deleting operation comprises deleting the data processing result.
5. A WEB interface integrated device of a Hadoop distributed algorithm is characterized by comprising a first configuration module, a second configuration module and a processing module, wherein the WEB interface is loaded with a plurality of data acquisition components and a plurality of data processing components;
the first configuration module is used for configuring the input of a data acquisition component after the data acquisition component is triggered, and selecting one or more data processing components as the output of the data acquisition component;
the second configuration module is used for configuring the input of the selected data processing component and selecting one or more of the other data processing components as the output of the data processing component to form a component relation network;
the processing module is used for processing the input data of the triggered data acquisition component by utilizing each component of the component relation network after receiving the operation instruction to obtain a data processing result;
the WEB interface is also used for displaying the output of the selected data acquisition component so as to allow a user to judge and adjust the input of the data acquisition component; displaying the output of the selected data processing component for the user to judge and adjust the input of the data processing component; and displaying the component relation network.
6. The WEB interface integration apparatus according to claim 5, further comprising a data acquisition component encapsulation module and a data processing component encapsulation module;
the data acquisition component packaging module is used for packaging programs related to data sources;
the data processing component packaging module is used for packaging programs related to data processing.
7. The WEB interface integration apparatus according to claim 5, further comprising a determining module:
the judging module is used for judging whether the configuration of the data acquisition component and the data processing component is complete or not;
if the configuration of the data acquisition component and the data processing component is complete, a component relation network is formed;
and if the configuration of the data acquisition component and the data processing component is incomplete, outputting the incomplete configuration item to a WEB interface.
8. The WEB interface integration apparatus according to claim 5, wherein:
the WEB interface is also used for receiving the viewing operation, the editing operation, the copying operation and the deleting operation input by a user; the viewing operation comprises viewing the data processing cycle and the release state of the data processing component; the editing operation comprises changing the name, description and data cycle of the data processing component; the copying operation comprises copying the data processing result; the deleting operation comprises deleting the data processing result.
CN201611253462.XA 2016-12-30 2016-12-30 WEB interface integration method and device of Hadoop distributed algorithm Expired - Fee Related CN106815019B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611253462.XA CN106815019B (en) 2016-12-30 2016-12-30 WEB interface integration method and device of Hadoop distributed algorithm

Publications (2)

Publication Number Publication Date
CN106815019A (en) 2017-06-09
CN106815019B (en) 2020-09-01

Family

ID=59109611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611253462.XA Expired - Fee Related CN106815019B (en) 2016-12-30 2016-12-30 WEB interface integration method and device of Hadoop distributed algorithm

Country Status (1)

Country Link
CN (1) CN106815019B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704283A (en) * 2017-09-15 2018-02-16 深圳市诚壹科技有限公司 A kind of method and apparatus for configuring Gitlab components in a distributed system
CN109165055B (en) * 2018-08-30 2022-09-06 百度在线网络技术(北京)有限公司 Unmanned system component loading method and device, computer equipment and medium
CN111221839A (en) * 2018-11-23 2020-06-02 北京京东金融科技控股有限公司 Data processing method, system, electronic device and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102033748A (en) * 2010-12-03 2011-04-27 中国科学院软件研究所 Method for generating data processing flow codes
CN103345400A (en) * 2013-07-24 2013-10-09 百度在线网络技术(北京)有限公司 Method and device for processing data
CN104573063A (en) * 2015-01-23 2015-04-29 四川中科腾信科技有限公司 Data analysis method based on big data
US20150121233A1 (en) * 2013-10-31 2015-04-30 Google Inc. Synchronized Distributed Networks with Frictionless Application Installation
CN106156307A (en) * 2016-06-30 2016-11-23 北京奇虎科技有限公司 The data handling system of a kind of real-time calculating platform and method

Also Published As

Publication number Publication date
CN106815019A (en) 2017-06-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 20200901; termination date: 20201230)