CN115686702A - Data pulling method, system, storage medium and electronic equipment

Info

Publication number: CN115686702A
Application number: CN202211019572.5A
Authority: CN
Inventors: 黄进成
Original assignee: Shumei Tianxia Beijing Technology Co ltd; Beijing Nextdata Times Technology Co ltd
Current assignee: Shumei Tianxia Beijing Technology Co ltd; Beijing Nextdata Times Technology Co ltd
Priority date: 2022-08-24
Filing date: 2022-08-24
Publication date: 2023-02-03

Abstract

The invention relates to the technical field of data pulling, and in particular to a data pulling method, a data pulling system, a storage medium and electronic equipment. The method comprises the following steps: receiving a task request input by a user on a front-end interface according to a preset data format; and submitting the task request to the HADOOP cluster through the Livy plug-in, so that the HADOOP cluster pulls the data corresponding to the task request and returns the data. A user who does not understand programming can thus write a task request into the front-end interface according to the preset data format and have it submitted to the HADOOP cluster through the Livy plug-in, which lowers the technical threshold and improves data pulling efficiency.

Description

Data pulling method, system, storage medium and electronic equipment

Technical Field

The present invention relates to the field of data pulling technologies, and in particular, to a data pulling method, a data pulling system, a storage medium, and an electronic device.

Background

Data extraction refers to the process of extracting required information from original data for a given purpose, so that the information can be further stored, converted and analyzed. When the data is stored in a HADOOP cluster, it is mostly extracted manually. Usually, data department personnel and the demand side need to confirm the data extraction requirement, and the corresponding data is then specified and extracted according to the requirement of the demand side. The process is not complicated, but for simple or routine data extraction requirements, manual extraction is time-consuming and labor-intensive. Moreover, because extraction requires programming, it is very difficult for personnel outside the data department to extract data themselves, and there are many daily data extraction requirements from inside the company and from customers. How to meet these data extraction requirements is therefore a problem that urgently needs to be solved.

Disclosure of Invention

The invention provides a data pulling method, a system, a storage medium and an electronic device, aiming at the defects of the prior art.

The technical scheme of the data pulling method of the invention is as follows:

receiving a task request input by a user on a front-end interface according to a preset data format;

and submitting the task request to an HADOOP cluster through a Livy plug-in, so that the HADOOP cluster pulls and returns data corresponding to the task request.

The beneficial effects of the data pulling method of the invention are as follows:

the method has the advantages that a user who does not understand programming can write a task request into a front-end interface according to a preset data format, and then submit the task request to the HADOOP cluster through the Livy plug-in, so that the HADOOP cluster pulls data corresponding to the task request and returns the data, the technical threshold can be lowered, and the data pulling efficiency is improved.

On the basis of the above scheme, the data pulling method of the present invention may be further improved as follows.

Further, the task request includes a data source and a pull condition, so that the HADOOP cluster pulls data corresponding to the task request, including:

and enabling the HADOOP cluster to pull the data corresponding to the task request from the data source according to the pull condition.

Further, the returning of the data corresponding to the task request by the HADOOP cluster includes:

and returning a download link for downloading the data corresponding to the task request by the HADOOP cluster, so that the user can acquire the data corresponding to the task request through the download link.

Further, the method also includes:

according to the sequence of the input time of each task request, placing a plurality of task requests in a task queue;

and sequentially sending each task request to the Livy plug-in unit through the task queue so as to sequentially submit each task request to the HADOOP cluster through the Livy plug-in unit.

The beneficial effect of adopting the above further scheme is: the method can process a large number of task requests, improves the throughput of data pulling, and is convenient for a plurality of users to use simultaneously.

The technical scheme of the data pulling system is as follows:

the system comprises a receiving module and a pulling return module;

the receiving module is used for: receiving a task request input by a user on a front-end interface according to a preset data format;

the pull return module is to: and submitting the task request to an HADOOP cluster through a Livy plug-in, and enabling the HADOOP cluster to pull and return data corresponding to the task request.

The beneficial effects of the data pulling system of the invention are as follows:

a user who does not understand programming can write a task request in the front-end interface according to the preset data format, and the system then submits the task request to the HADOOP cluster through the Livy plug-in, so that the HADOOP cluster pulls and returns the data corresponding to the task request; the technical threshold is thereby lowered and the data pulling efficiency is improved.

On the basis of the scheme, the data pulling system can be further improved as follows.

Further, the task request includes a data source and a pull condition, and the pull return module causes the HADOOP cluster to pull data corresponding to the task request, including:

Further, the process of the pull return module enabling the HADOOP cluster to return the data corresponding to the task request includes:

Further, the system also comprises a task queue module, wherein the task queue module is used for:

The beneficial effect of adopting the above further scheme is: a large number of task requests can be processed, the throughput of data pulling is improved, and a plurality of users can use the data at the same time conveniently.

A storage medium of the present invention stores therein instructions that, when read by a computer, cause the computer to execute a data pull method of any one of the above.

An electronic device of the present invention includes a processor and the storage medium, where the processor executes instructions in the storage medium.

Drawings

Fig. 1 is a flowchart illustrating a data pulling method according to an embodiment of the present invention;

fig. 2 is a second flowchart illustrating a data pull method according to an embodiment of the invention;

fig. 3 is a schematic structural diagram of a data pulling system according to an embodiment of the present invention.

Detailed Description

As shown in fig. 1, a data pulling method according to an embodiment of the present invention includes the following steps:

S1, receiving a task request input by a user on a front-end interface according to a preset data format;

the preset data format includes operators such as AND and OR, and the user combines the keywords to be searched and the data source through operators such as AND and OR to form a task request.

S2, submitting the task request to the HADOOP cluster through the Livy plug-in, so that the HADOOP cluster pulls the data corresponding to the task request and returns the data;

the data pulled by the HADOOP cluster for the task request is returned through the Livy plug-in.
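Step S2 can be illustrated with a minimal sketch, assuming that the Livy plug-in is reached through the batch endpoint of the Apache Livy REST interface (POST /batches). The endpoint address, the job file path, the task name and the function name build_livy_batch_request are hypothetical and are not taken from the embodiment. The sketch only builds the HTTP request and does not send it.

```python
import json
import urllib.request

# Hypothetical address of the Livy service; not specified in the embodiment.
LIVY_URL = "http://livy.example.internal:8998"


def build_livy_batch_request(pull_condition: str, data_source: str) -> urllib.request.Request:
    """Build (but do not send) a POST /batches request for the Livy service."""
    payload = {
        # Hypothetical extraction job stored on the cluster.
        "file": "hdfs:///jobs/extract_data.py",
        # The job receives the data source and the pull condition as arguments.
        "args": [data_source, pull_condition],
        "name": "data-pull-task",
    }
    return urllib.request.Request(
        url=f"{LIVY_URL}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_livy_batch_request(
    '"model of rice cooker" AND "production place of rice cooker"',
    "product information data source",
)
print(request.get_method(), request.full_url)
print(json.loads(request.data))
```

Passing the request to urllib.request.urlopen would submit it to a running Livy service; the sketch stops before that step so that it can be run without a cluster.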

Optionally, in the above technical solution, in S2, the task request includes a data source and a pull condition, so that the HADOOP cluster pulls data corresponding to the task request, including:

and S21, enabling the HADOOP cluster to pull the data corresponding to the task request from the data source according to the pulling condition.

The user combines the keywords to be searched through operators such as AND and OR to form a pull condition.

The names of a plurality of data sources can be set on the front-end interface, and a user can determine the data sources in the task request in a selection mode and the like.

For example, a user wants to look up information about a product such as a rice cooker and inputs a task request on the front-end interface according to the preset data format. The pull condition in the task request is "model of rice cooker" AND "production place of rice cooker", and the data source in the task request is a preset "product information data source", so the specific format of the task request can be "model of rice cooker" AND "production place of rice cooker" AND "product information data source". The user can set a plurality of data sources according to actual conditions, with different data stored in each data source; specifically, the data sources may be set in the cloud or in a database. The task request is submitted to the HADOOP cluster through the Livy plug-in, so that the HADOOP cluster pulls the data corresponding to the task request, such as pictures of the rice cooker and product information of the rice cooker, from the data source, namely the product information data source, according to the pull condition, and provides the data to the user.
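The task request format in the example above can be parsed as in the following sketch. It assumes that every keyword and the data source are written in double quotes, that only the operators AND and OR are used, and that the last quoted term joined by AND names the data source; the embodiment does not fix these details, and the names TaskRequest and parse_task_request are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# A token is either a double-quoted term or one of the operators AND / OR.
TOKEN = re.compile(r'"([^"]*)"|\b(AND|OR)\b')


@dataclass
class TaskRequest:
    pull_condition: List[str]  # alternating keywords and operators
    data_source: str


def parse_task_request(text: str) -> TaskRequest:
    tokens = [
        ("term", m.group(1)) if m.group(1) is not None else ("op", m.group(2))
        for m in TOKEN.finditer(text)
    ]
    # Expect: term (op term)* with at least one keyword plus the data source.
    if len(tokens) < 3 or len(tokens) % 2 == 0:
        raise ValueError("task request must look like: term OP term ... AND source")
    for index, (kind, _) in enumerate(tokens):
        if kind != ("term" if index % 2 == 0 else "op"):
            raise ValueError("terms and operators must alternate")
    # Assumption: the final term is the data source and is joined with AND.
    if tokens[-2] != ("op", "AND"):
        raise ValueError("the data source must be joined with AND")
    return TaskRequest(
        pull_condition=[value for _, value in tokens[:-2]],
        data_source=tokens[-1][1],
    )


parsed = parse_task_request(
    '"model of rice cooker" AND "production place of rice cooker" '
    'AND "product information data source"'
)
print(parsed.pull_condition)
print(parsed.data_source)
```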

Optionally, in the foregoing technical solution, in S2, the causing the HADOOP cluster to return the data corresponding to the task request includes:

and S22, enabling the HADOOP cluster to return a download link for downloading the data corresponding to the task request, so that the user can acquire the data corresponding to the task request through the download link.
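One possible form of the download link in S22 is sketched below, assuming that the result file is exposed through the WebHDFS REST interface of the HADOOP cluster (the OPEN operation). The embodiment does not state how the link is produced, so the host name, port, result path and the function name build_download_link are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_download_link(namenode: str, result_path: str, user: Optional[str] = None) -> str:
    """Return a WebHDFS URL that opens (downloads) the given result file."""
    query = {"op": "OPEN"}
    if user is not None:
        # WebHDFS accepts the caller's user name as the user.name parameter.
        query["user.name"] = user
    # quote() keeps "/" unescaped, so the HDFS path structure is preserved.
    return f"http://{namenode}/webhdfs/v1{quote(result_path)}?{urlencode(query)}"


link = build_download_link(
    "namenode.example.internal:9870",  # hypothetical NameNode address
    "/results/task-42/part-00000.csv",  # hypothetical result file
)
print(link)
```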

Optionally, in the above technical solution, the method further includes:

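placing a plurality of task requests in a task queue according to the order of the input time of each task request;

and sequentially sending each task request to the Livy plug-in through the task queue, so as to sequentially submit each task request to the HADOOP cluster through the Livy plug-in.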

The method can process a large number of task requests, improves the throughput of data pulling, and is convenient for a plurality of users to use simultaneously.
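The ordering of task requests by input time can be sketched as follows. This is only an illustration under the assumption that the queue is kept in memory; the class names QueuedTask and TaskQueue and the submit callback, which stands in for the call to the Livy plug-in, are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(order=True)
class QueuedTask:
    input_time: float               # time at which the user entered the request
    sequence: int                   # tie-breaker for requests with equal input time
    request: str = field(compare=False)


class TaskQueue:
    def __init__(self) -> None:
        self._heap: List[QueuedTask] = []
        self._counter = itertools.count()

    def put(self, request: str, input_time: float) -> None:
        heapq.heappush(self._heap, QueuedTask(input_time, next(self._counter), request))

    def drain(self, submit: Callable[[str], None]) -> None:
        """Send every queued request to `submit`, earliest input time first."""
        while self._heap:
            submit(heapq.heappop(self._heap).request)


queue = TaskQueue()
queue.put("request B", input_time=2.0)
queue.put("request A", input_time=1.0)
queue.put("request C", input_time=3.0)

submitted: List[str] = []
queue.drain(submitted.append)  # stand-in for submitting through the Livy plug-in
print(submitted)
```

A heap is used here so that requests arriving out of order are still sent in order of input time; a plain first-in-first-out queue would suffice if requests always arrive in input order.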

A data pulling method according to the present invention is described below by an embodiment, as shown in fig. 2, and specifically includes:

S100, constructing a front-end interface and drawing up the data format of the front-end interface, namely the preset data format;

S101, the back end receives a task request input by a user on the front-end interface according to the preset data format, parses the pull condition and the data source in the task request, assembles the task request into a data extraction task, and stores the request parameters, namely the pull condition and the data source in the task request, in a database;

S102, constructing a task scheduler, putting the data extraction task corresponding to the task request into the current task queue, and updating the state of the data extraction task to "to be processed" while it waits to be sent;

S103, sending the data extraction task corresponding to the task request to the Livy service interface, namely to the Livy plug-in; the Livy plug-in interacts with the HADOOP cluster, and the task state is returned through the Livy plug-in and updated in real time;

S104, when the pulling operation of the data extraction task corresponding to the task request is finished, updating the task data and sending a corresponding email; at this point, the state of the data extraction task is "processing complete".
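The state updates in S102 to S104 can be sketched as a small mapping from the state reported by the Livy service to the state recorded by the platform. The Livy state strings below are those commonly reported for batch jobs and are treated here as an assumption; the FAILED state, the names TaskState and update_task_state, and the send_mail callback are hypothetical, since the embodiment only names the states "to be processed" and "processing complete".

```python
from enum import Enum
from typing import Callable, Dict


class TaskState(Enum):
    PENDING = "to be processed"
    RUNNING = "processing"
    DONE = "processing complete"
    FAILED = "failed"  # assumption: the embodiment names no failure state


# Assumed mapping from Livy batch states to platform task states.
LIVY_TO_TASK_STATE: Dict[str, TaskState] = {
    "not_started": TaskState.PENDING,
    "starting": TaskState.RUNNING,
    "running": TaskState.RUNNING,
    "success": TaskState.DONE,
    "dead": TaskState.FAILED,
    "killed": TaskState.FAILED,
}


def update_task_state(task: dict, livy_state: str, send_mail: Callable[[str], None]) -> None:
    """Record the latest state and send one email when the task first completes."""
    # Unknown Livy states leave the recorded state unchanged.
    new_state = LIVY_TO_TASK_STATE.get(livy_state, task["state"])
    finished_now = new_state is TaskState.DONE and task["state"] is not TaskState.DONE
    task["state"] = new_state
    if finished_now:
        send_mail(f"Task {task['id']} finished; the download link is ready.")


task = {"id": 1, "state": TaskState.PENDING}
mails = []
for reported in ["starting", "running", "success", "success"]:
    update_task_state(task, reported, mails.append)
print(task["state"].value, len(mails))
```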

the technical scheme of the invention has the following characteristics:

1) The data extraction operation is made visual and can be carried out by clicking through an interface. Personnel outside the data department can independently complete data extraction through the interface by setting the extraction category, selecting filter conditions, selecting the required fields and so on, without the participation of data department personnel;

2) The back end of the platform processes the data extraction conditions set by the user and assembles them into an HTTP request. The back end records a task template and sends the request to the Livy service, namely the Livy plug-in, and the data extraction task is submitted through the interaction between the Livy plug-in and the HADOOP cluster;

3) The platform constructs a database record for each task, recording information such as the task extraction time, the task name and the task template, interacts with the Livy service in real time to acquire the latest state of the task, reminds the person who submitted the task by email after the task is executed, and provides a download link on the task record interface;

4) The task data of the platform is stored in the HADOOP cluster for a certain retention time, during which the download link remains valid. The task records of the platform are stored in the database, and a task can be re-extracted from the task template information in its task record; when the task data exceeds the retention time or is deleted by mistake, the data can be recovered by re-extracting the task from the task template.
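Characteristics 3) and 4) can be illustrated with a sketch of a task record and a check for whether the data must be re-extracted from the task template. The retention period of seven days, the field names and the function name needs_reextraction are hypothetical; the embodiment only states that the data is kept for a certain time and that the task record holds the task template.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical retention period; the embodiment only says "a certain time".
RETENTION = timedelta(days=7)


@dataclass
class TaskRecord:
    name: str
    template: dict               # pull condition and data source, kept for re-extraction
    extracted_at: datetime       # task extraction time
    result_exists: bool = True   # False if the stored data was deleted by mistake


def needs_reextraction(record: TaskRecord, now: datetime) -> bool:
    """True when the stored data has expired or is missing."""
    expired = now - record.extracted_at > RETENTION
    return expired or not record.result_exists


record = TaskRecord(
    name="rice cooker report",
    template={
        "data_source": "product information data source",
        "pull_condition": '"model of rice cooker" AND "production place of rice cooker"',
    },
    extracted_at=datetime(2022, 8, 1),
)
print(needs_reextraction(record, datetime(2022, 8, 3)))   # within the retention time
print(needs_reextraction(record, datetime(2022, 8, 24)))  # past the retention time
```

When the check returns True, the platform would resubmit the task using record.template, which is why the template is stored in the database rather than only with the task data on the cluster.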

The invention solves the problem of extracting data for non-data personnel who are unfamiliar with writing code, since data extraction is simplified through the interface. The data is stored in the HADOOP cluster and can be downloaded at any time through the interface. Because data extraction tasks are submitted through the Livy plug-in, personnel in the data department are relieved of simple, routine and tedious requests, which improves the efficiency of the department. The task state can be acquired promptly through interaction with the Livy plug-in, which ensures that the state of the data extraction task is updated in time.

In the above embodiments, the steps are numbered S1, S2 and so on, but these are only specific embodiments given in the present application. A person skilled in the art may adjust the execution order of S1, S2, etc. according to the actual situation, which also falls within the protection scope of the present invention. It is understood that some embodiments may include some or all of the above embodiments.

As shown in fig. 3, a data pulling system 200 according to an embodiment of the present invention includes a receiving module 210 and a pull return module 220;

the receiving module 210 is configured to: receiving a task request input by a user on a front-end interface according to a preset data format;

the pull return module 220 is configured to: submit the task request to the HADOOP cluster through the Livy plug-in, so that the HADOOP cluster pulls the data corresponding to the task request and returns the data.

Optionally, in the foregoing technical solution, the task request includes a data source and a pull condition, and the pull returning module 220 enables the HADOOP cluster to pull data corresponding to the task request, including:

and the HADOOP cluster pulls the data corresponding to the task request from the data source according to the pulling condition.

Optionally, in the foregoing technical solution, the process of the pull return module 220 causing the HADOOP cluster to return the data corresponding to the task request includes:

Optionally, in the above technical solution, the system further includes a task queue module, where the task queue module is configured to:

For the steps by which the parameters and unit modules in the data pulling system 200 of the present invention implement their corresponding functions, reference may be made to the parameters and steps in the above embodiments of the data pulling method, which are not described herein again.

The storage medium of an embodiment of the present invention stores instructions, and when the instructions are read by a computer, the computer is caused to execute any one of the above data pulling methods.

An electronic device according to an embodiment of the present invention includes a processor and the storage medium, where the processor executes instructions in the storage medium. The electronic device can be a computer, a mobile phone or the like.

As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product.

Accordingly, the present disclosure may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.) or in a combination of hardware and software, any of which may be referred to herein generally as a "circuit," "module" or "system." Furthermore, in some embodiments, the invention may also be embodied in the form of a computer program product in one or more computer-readable media having computer-readable program code embodied in the medium.

Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A method for pulling data, comprising:

and submitting the task request to an HADOOP cluster through a Livy plug-in, and enabling the HADOOP cluster to pull and return data corresponding to the task request.

2. The method according to claim 1, wherein the task request includes a data source and a pull condition, and the causing the HADOOP cluster to pull data corresponding to the task request includes:

3. The method of claim 1, wherein returning the data corresponding to the task request to the HADOOP cluster comprises:

4. A data pulling method according to any one of claims 1 to 3, further comprising:

5. A data pulling system is characterized by comprising a receiving module and a pulling return module;

the pull return module is to: and submitting the task request to an HADOOP cluster through a Livy plug-in, so that the HADOOP cluster pulls and returns data corresponding to the task request.

6. The data pulling system of claim 5, wherein the task request comprises a data source and a pulling condition, and the pulling return module causes the HADOOP cluster to pull data corresponding to the task request, comprising:

7. The data pulling system of claim 5, wherein the process of the pull return module causing the HADOOP cluster to return the data corresponding to the task request comprises:

8. A data pulling system according to any one of claims 5 to 7, further comprising a task queue module, the task queue module being configured to:

9. A storage medium having stored therein instructions, which when read by a computer, cause the computer to execute a data pulling method according to any one of claims 1 to 4.

10. An electronic device comprising a processor and the storage medium of claim 9, wherein the processor executes instructions in the storage medium.