CN115686702A - Data pulling method, system, storage medium and electronic equipment - Google Patents

Publication number
CN115686702A
Authority
CN
China
Prior art keywords
data
task request
task
hadoop cluster
pull
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211019572.5A
Other languages
Chinese (zh)
Inventor
黄进成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shumei Tianxia Beijing Technology Co ltd
Beijing Nextdata Times Technology Co ltd
Original Assignee
Shumei Tianxia Beijing Technology Co ltd
Beijing Nextdata Times Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shumei Tianxia Beijing Technology Co ltd, Beijing Nextdata Times Technology Co ltd
Priority to CN202211019572.5A
Publication of CN115686702A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data pulling, and in particular to a data pulling method, a data pulling system, a storage medium and an electronic device. The method comprises the following steps: receiving a task request input by a user on a front-end interface according to a preset data format; and submitting the task request to a HADOOP cluster through a Livy plug-in, so that the HADOOP cluster pulls and returns the data corresponding to the task request. Because a user who does not understand programming can write the task request in the front-end interface according to the preset data format, the technical threshold is lowered and the data pulling efficiency is improved.

Description

Data pulling method, system, storage medium and electronic equipment
Technical Field
The present invention relates to the field of data pulling technologies, and in particular, to a data pulling method, a data pulling system, a storage medium, and an electronic device.
Background
Data extraction refers to the process of extracting required information from original data for a given purpose, so that the information can be further stored, converted and analyzed. When data is stored in a HADOOP cluster, it is mostly extracted manually. Usually, data department personnel must confirm the extraction requirement with the requesting party and then extract the corresponding data as specified. The process is not complicated, but manual extraction is time-consuming and labor-intensive for simple or routine requirements. Moreover, because extraction requires programming, it is very difficult for personnel outside the data department to extract data themselves, while day-to-day extraction requests from inside the company and from customers are numerous. How to meet these data extraction requirements is therefore a problem that urgently needs to be solved.
Disclosure of Invention
The invention provides a data pulling method, a system, a storage medium and an electronic device, aiming at the defects of the prior art.
The technical scheme of the data pulling method of the invention is as follows:
receiving a task request input by a user on a front-end interface according to a preset data format;
and submitting the task request to an HADOOP cluster through a Livy plug-in, so that the HADOOP cluster pulls and returns data corresponding to the task request.
The beneficial effects of the data pulling method of the invention are as follows:
A user who does not understand programming can write a task request in the front-end interface according to the preset data format; the task request is then submitted to the HADOOP cluster through the Livy plug-in, so that the HADOOP cluster pulls and returns the data corresponding to the task request. This lowers the technical threshold and improves the data pulling efficiency.
On the basis of the above scheme, the data pulling method of the present invention may be further improved as follows.
Further, the task request includes a data source and a pull condition, so that the HADOOP cluster pulls data corresponding to the task request, including:
and enabling the HADOOP cluster to pull the data corresponding to the task request from the data source according to the pull condition.
Further, the returning of the data corresponding to the task request by the HADOOP cluster includes:
causing the HADOOP cluster to return a download link for the data corresponding to the task request, so that the user can acquire that data through the download link.
Further, the method also includes:
placing a plurality of task requests in a task queue in order of the input time of each task request;
and sequentially sending each task request to the Livy plug-in through the task queue, so that each task request is submitted in turn to the HADOOP cluster through the Livy plug-in.
The beneficial effect of adopting the above further scheme is: the method can process a large number of task requests, improves the throughput of data pulling, and is convenient for a plurality of users to use simultaneously.
The technical scheme of the data pulling system is as follows:
the system comprises a receiving module and a pulling return module;
the receiving module is used for: receiving a task request input by a user on a front-end interface according to a preset data format;
the pull return module is to: and submitting the task request to an HADOOP cluster through a Livy plug-in, and enabling the HADOOP cluster to pull and return data corresponding to the task request.
The beneficial effects of the data pulling system of the invention are as follows:
A user who does not understand programming can write a task request in the front-end interface according to the preset data format; the task request is then submitted to the HADOOP cluster through the Livy plug-in, so that the HADOOP cluster pulls and returns the data corresponding to the task request. This lowers the technical threshold and improves the data pulling efficiency.
On the basis of the scheme, the data pulling system can be further improved as follows.
Further, the task request includes a data source and a pull condition, and the pull return module causes the HADOOP cluster to pull data corresponding to the task request, including:
and enabling the HADOOP cluster to pull the data corresponding to the task request from the data source according to the pull condition.
Further, the process of the pull return module enabling the HADOOP cluster to return the data corresponding to the task request includes:
causing the HADOOP cluster to return a download link for the data corresponding to the task request, so that the user can acquire that data through the download link.
Further, the system also comprises a task queue module, wherein the task queue module is used for:
according to the sequence of the input time of each task request, placing a plurality of task requests in a task queue;
and sequentially sending each task request to the Livy plug-in through the task queue, so that each task request is submitted in turn to the HADOOP cluster through the Livy plug-in.
The beneficial effect of adopting the above further scheme is: a large number of task requests can be processed, the throughput of data pulling is improved, and a plurality of users can use the data at the same time conveniently.
A storage medium of the present invention stores therein instructions that, when read by a computer, cause the computer to execute a data pull method of any one of the above.
An electronic device of the present invention includes a processor and the storage medium, where the processor executes instructions in the storage medium.
Drawings
Fig. 1 is a flowchart illustrating a data pulling method according to an embodiment of the present invention;
Fig. 2 is a second flowchart illustrating a data pulling method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a data pulling system according to an embodiment of the present invention.
Detailed Description
As shown in Fig. 1, a data pulling method according to an embodiment of the present invention includes the following steps:
S1, receiving a task request input by a user on a front-end interface according to a preset data format;
The preset data format includes connectives such as AND and OR; the user uses these connectives to combine the keywords to be searched and the data source into a task request.
S2, submitting the task request to a HADOOP cluster through the Livy plug-in, so that the HADOOP cluster pulls and returns the data corresponding to the task request;
The data that the HADOOP cluster pulls for the task request is returned through the Livy plug-in.
A user who does not understand programming can write a task request in the front-end interface according to the preset data format; the task request is then submitted to the HADOOP cluster through the Livy plug-in, so that the HADOOP cluster pulls and returns the data corresponding to the task request. This lowers the technical threshold and improves the data pulling efficiency.
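Steps S1 and S2 can be sketched as follows. This is only an illustration under stated assumptions: it assumes the Livy plug-in exposes Apache Livy's REST batch endpoint (`POST /batches`, whose JSON body carries a `file` and optional `args`); the host name, the job file `hdfs:///jobs/pull_data.py` and the field names of the task request are hypothetical and do not come from the text. The function only builds the HTTP request and does not send it.

```python
import json
import urllib.request

LIVY_URL = "http://livy.example.internal:8998"  # hypothetical Livy endpoint
PULL_JOB = "hdfs:///jobs/pull_data.py"          # hypothetical Spark job that performs the pull


def build_livy_request(task_request: dict) -> urllib.request.Request:
    """Wrap a task request (S1) in a Livy batch submission (S2)."""
    body = {
        "file": PULL_JOB,
        # The whole task request is handed to the job as one JSON argument.
        "args": [json.dumps(task_request, ensure_ascii=False)],
    }
    return urllib.request.Request(
        url=f"{LIVY_URL}/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_livy_request(
    {
        "data_source": "product information data source",
        "pull_condition": ["model of rice cooker", "production place of rice cooker"],
    }
)
# urllib.request.urlopen(request) would submit the batch to a running Livy server.
```

Keeping request construction separate from sending makes the assembled payload easy to inspect or log before anything reaches the cluster.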
Optionally, in the above technical solution, in S2, the task request includes a data source and a pull condition, so that the HADOOP cluster pulls data corresponding to the task request, including:
and S21, enabling the HADOOP cluster to pull the data corresponding to the task request from the data source according to the pulling condition.
The user combines the keywords to be searched using connectives such as AND and OR to form the pull condition.
The names of a plurality of data sources can be set on the front-end interface, and a user can determine the data sources in the task request in a selection mode and the like.
For example, a user who wants to look up information about a product such as a rice cooker inputs a task request on the front-end interface according to the preset data format. The pull condition in the task request is "model of rice cooker" AND "production place of rice cooker", and the data source in the task request is a preset "product information data source", so the task request can take the specific form "model of rice cooker" AND "production place of rice cooker" AND "product information data source". The user can set up a plurality of data sources according to actual conditions, each storing different data; the data sources may be set up in the cloud or in a database. The task request is submitted to the HADOOP cluster through the Livy plug-in, so that the HADOOP cluster pulls the data corresponding to the task request, such as pictures of the rice cooker and product information of the rice cooker, from the data source (the product information data source) according to the pull condition and provides the data to the user.
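The rice cooker example can be sketched as a small parser. The text does not define the grammar of the preset data format, so this sketch assumes that terms are double-quoted, that terms are joined by AND or OR, and that the last quoted term names the data source and is joined by AND. The function name `parse_task_request` and the output keys are hypothetical.

```python
import re

# A token is either a double-quoted term or one of the connectives AND / OR.
TOKEN = re.compile(r'"([^"]*)"|\b(AND|OR)\b')


def parse_task_request(text: str) -> dict:
    """Split a task request into its pull condition and data source.

    Assumption: the last quoted term is the data source and is joined to the
    pull condition by AND. Text outside quotes and connectives is ignored.
    """
    tokens = [
        ("OP", connective) if connective else ("TERM", quoted)
        for quoted, connective in TOKEN.findall(text)
    ]
    kinds = [kind for kind, _ in tokens]
    expected = ["TERM" if i % 2 == 0 else "OP" for i in range(len(tokens))]
    if len(tokens) < 3 or len(tokens) % 2 == 0 or kinds != expected:
        raise ValueError(f"malformed task request: {text!r}")
    if tokens[-2][1] != "AND":
        raise ValueError("the data source must be joined with AND")
    return {
        "pull_condition": [value for _, value in tokens[:-2]],
        "data_source": tokens[-1][1],
    }


parsed = parse_task_request(
    '"model of rice cooker" AND "production place of rice cooker" '
    'AND "product information data source"'
)
```

Here `parsed["pull_condition"]` keeps the connectives in order, so a later step could translate the condition into whatever query the cluster job expects.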
Optionally, in the foregoing technical solution, in S2, the causing the HADOOP cluster to return the data corresponding to the task request includes:
and S22, enabling the HADOOP cluster to return a download link for downloading the data corresponding to the task request, so that the user can acquire the data corresponding to the task request through the download link.
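Step S22 can be illustrated with a minimal sketch of how a download link might be formed. The text does not say what the link looks like, so the base URL, the task identifier and the result path below are all hypothetical; only the standard-library URL encoding is real.

```python
from urllib.parse import quote, urlencode

DOWNLOAD_BASE = "https://pull.example.internal/download"  # hypothetical platform URL


def build_download_link(task_id: str, result_path: str) -> str:
    """Return a link through which the user can fetch the pulled data."""
    # The result path is percent-encoded so it survives as one query parameter.
    query = urlencode({"path": result_path})
    return f"{DOWNLOAD_BASE}/{quote(task_id, safe='')}?{query}"


link = build_download_link("task-42", "/pull_results/task-42/part-00000.csv")
```

Returning a link instead of the data itself keeps large results in the cluster until the user actually asks for them.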
Optionally, in the above technical solution, the method further includes:
according to the sequence of the input time of each task request, placing a plurality of task requests in a task queue;
and sequentially sending each task request to the Livy plug-in through the task queue so as to sequentially submit each task request to the HADOOP cluster through the Livy plug-in.
The method can process a large number of task requests, improves the throughput of data pulling, and is convenient for a plurality of users to use simultaneously.
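The queueing step can be sketched with a priority queue keyed on input time. The class name `TaskQueue` and the `submit` callback, which stands in for the call to the Livy plug-in, are assumptions made for illustration.

```python
import heapq
import itertools
from typing import Callable


class TaskQueue:
    """Orders task requests by input time and dispatches them one by one."""

    def __init__(self) -> None:
        self._heap: list = []
        # Tie-breaker so requests with equal input times keep arrival order.
        self._counter = itertools.count()

    def put(self, input_time: float, task_request: dict) -> None:
        heapq.heappush(self._heap, (input_time, next(self._counter), task_request))

    def dispatch_all(self, submit: Callable[[dict], None]) -> None:
        """Send every queued request to `submit`, earliest input time first."""
        while self._heap:
            _, _, task_request = heapq.heappop(self._heap)
            submit(task_request)


queue = TaskQueue()
queue.put(10.0, {"id": "b"})
queue.put(5.0, {"id": "a"})
queue.put(10.0, {"id": "c"})

submitted: list = []
queue.dispatch_all(submitted.append)  # stand-in for submission via the Livy plug-in
```

After the call, `submitted` holds the requests in the order a, b, c, which is the order of their input times with ties broken by arrival.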
A data pulling method according to the present invention is described below by way of an embodiment which, as shown in Fig. 2, specifically includes:
S100, constructing a front-end interface and defining its data format, namely the preset data format;
S101, the back end receives a task request input by a user on the front-end interface according to the preset data format, parses the pull condition and the data source in the task request, assembles them into a data extraction task, and stores the request parameters, namely the pull condition and the data source in the task request, in a database;
S102, constructing a task scheduler, placing the data extraction task corresponding to the task request in the current task queue, and updating the state of the data extraction task while it waits to be sent; at this point the state of the data extraction task is "to be processed";
S103, sending the data extraction task corresponding to the task request to the Livy service interface, namely the Livy plug-in; the Livy plug-in interacts with the HADOOP cluster, and the task state is returned through the Livy plug-in and updated in real time;
S104, when the pull operation of the data extraction task corresponding to the task request finishes, updating the task data and sending a corresponding mail; at this point the state of the data extraction task is "processing completed".
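The task states that steps S102 to S104 move through can be sketched as a small state machine. Only "to be processed" and the completed state are named in the text; the intermediate `RUNNING` state and the `FAILED` state are assumptions added so the sketch is complete.

```python
from enum import Enum


class TaskState(Enum):
    PENDING = "to be processed"          # S102: queued, waiting to be sent
    RUNNING = "processing"               # S103: sent to Livy (assumed name)
    COMPLETED = "processing completed"   # S104: pull finished, mail sent
    FAILED = "failed"                    # assumed; not named in the text


# Transitions the scheduler is allowed to make.
ALLOWED = {
    TaskState.PENDING: {TaskState.RUNNING},
    TaskState.RUNNING: {TaskState.COMPLETED, TaskState.FAILED},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
}


def advance(current: TaskState, new: TaskState) -> TaskState:
    """Return the new state, rejecting transitions the flow does not allow."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.name} to {new.name}")
    return new


state = TaskState.PENDING
state = advance(state, TaskState.RUNNING)
state = advance(state, TaskState.COMPLETED)
```

Making the allowed transitions explicit means a stale status update from Livy cannot move a finished task back to an earlier state.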
The technical scheme of the invention has the following characteristics:
1) The data extraction operation is made visual and can be performed by clicking in an interface. Personnel outside the data department can independently complete data extraction through the interface by setting the extraction category, selecting filter conditions, choosing the required fields and so on, without the participation of data department personnel;
2) The back end of the platform processes the data extraction conditions set by the user and assembles them into an HTTP request. The back end records a task template, sends the request to the Livy service, namely the Livy plug-in, and submits the data extraction task through the interaction between the Livy plug-in and the HADOOP cluster;
3) The platform creates a database record for each task, recording information such as the task extraction time, task name and task template; it interacts with the Livy service in real time to obtain the latest task state, notifies the person who submitted the task by email once the task has been executed, and provides a download link on the task record interface;
4) The task data of the platform is stored in the HADOOP cluster for a certain retention time, during which the download link remains valid. The platform's task records are stored in the database, and a task can be re-extracted from the task template information in its record: when the task data exceeds the retention time or is deleted by mistake, the data is recovered by re-running the extraction from the task template.
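Characteristics 3) and 4) can be sketched as a task record with a retention check and a recovery path. The seven-day retention period, the field names and the `resubmit` callback are hypothetical; the text only says that data is kept "for a certain time" and can be re-extracted from the task template.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable

RETENTION = timedelta(days=7)  # hypothetical retention period


@dataclass
class TaskRecord:
    name: str
    template: dict              # pull condition and data source, kept for re-extraction
    extracted_at: datetime
    result_deleted: bool = False

    def link_is_valid(self, now: datetime) -> bool:
        """The download link works only while the result is still retained."""
        return not self.result_deleted and now - self.extracted_at <= RETENTION


def recover(record: TaskRecord, now: datetime,
            resubmit: Callable[[dict], None]) -> TaskRecord:
    """Re-run the extraction from the template if the result is gone."""
    if record.link_is_valid(now):
        return record
    resubmit(record.template)
    # Simplification: the new extraction time is taken to be the resubmission time.
    return TaskRecord(record.name, record.template, extracted_at=now)


resubmitted: list = []
old = TaskRecord("rice cooker pull", {"data_source": "product information data source"},
                 extracted_at=datetime(2023, 1, 1))
now = datetime(2023, 1, 10)
fresh = recover(old, now, resubmitted.append)
```

Because the template is stored with the record, recovery needs no input from the user who originally submitted the task.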
The invention solves the problem of extracting data for non-data personnel who are unfamiliar with code: data extraction is simplified through the interface; the data is stored in the HADOOP cluster and can be downloaded at any time through the interface; data extraction tasks are submitted through the Livy plug-in, which frees data department personnel from simple, routine and tedious requests and improves the department's efficiency; and the task state can be obtained promptly through interaction with the Livy plug-in, which ensures that the state of the data extraction task is updated in time.
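Keeping the task state up to date through the Livy plug-in can be sketched as a polling loop. The sketch assumes the state is read by some `fetch_state` callable (for Apache Livy this could wrap `GET /batches/{batchId}/state`, which returns a JSON object with a `state` field); the set of terminal state names follows Livy's documented states and should be checked against the deployed version. The polling interval and the `on_update` callback are hypothetical.

```python
import time
from typing import Callable

# Assumed terminal states; verify against the Livy version in use.
TERMINAL_STATES = {"success", "dead", "killed", "error"}


def poll_until_done(fetch_state: Callable[[], str],
                    on_update: Callable[[str], None],
                    interval_s: float = 5.0,
                    max_polls: int = 720) -> str:
    """Poll the task state, reporting each change, until a terminal state."""
    last = None
    for _ in range(max_polls):
        state = fetch_state()
        if state != last:
            on_update(state)  # e.g. write the new state to the task record
            last = state
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval_s)
    raise TimeoutError("task did not reach a terminal state")


# Demonstration with a scripted sequence of states instead of a live Livy server.
scripted = iter(["starting", "running", "running", "success"])
updates: list = []
final = poll_until_done(lambda: next(scripted), updates.append, interval_s=0.0)
```

Reporting only state changes keeps the database from being rewritten on every poll while the task is still running.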
In the above embodiments, the steps are numbered S1, S2 and so on, but these are only the specific embodiments given in the present application; a person skilled in the art may adjust the execution order of S1, S2, etc. according to the actual situation, which also falls within the protection scope of the present invention. It is understood that some embodiments may include some or all of the above embodiments.
As shown in Fig. 3, a data pulling system 200 according to an embodiment of the present invention includes a receiving module 210 and a pull return module 220;
the receiving module 210 is configured to: receive a task request input by a user on a front-end interface according to a preset data format;
the pull return module 220 is configured to: submit the task request to the HADOOP cluster through the Livy plug-in, so that the HADOOP cluster pulls and returns the data corresponding to the task request.
A user who does not understand programming can write a task request in the front-end interface according to the preset data format; the task request is then submitted to the HADOOP cluster through the Livy plug-in, so that the HADOOP cluster pulls and returns the data corresponding to the task request. This lowers the technical threshold and improves the data pulling efficiency.
Optionally, in the foregoing technical solution, the task request includes a data source and a pull condition, and the process by which the pull return module 220 causes the HADOOP cluster to pull data corresponding to the task request includes:
causing the HADOOP cluster to pull the data corresponding to the task request from the data source according to the pull condition.
Optionally, in the foregoing technical solution, the process by which the pull return module 220 causes the HADOOP cluster to return data corresponding to the task request includes:
causing the HADOOP cluster to return a download link for the data corresponding to the task request, so that the user can acquire that data through the download link.
Optionally, in the above technical solution, the system further includes a task queue module, where the task queue module is configured to:
according to the sequence of the input time of each task request, placing a plurality of task requests in a task queue;
and sequentially sending each task request to the Livy plug-in through the task queue so as to sequentially submit each task request to the HADOOP cluster through the Livy plug-in.
The method can process a large number of task requests, improves the throughput of data pulling, and is convenient for a plurality of users to use simultaneously.
For the steps by which each parameter and each unit module in the data pulling system 200 of the present invention implements its corresponding function, reference may be made to the parameters and steps in the above embodiments of the data pulling method, which are not described again here.
The storage medium of an embodiment of the present invention stores instructions, and when the instructions are read by a computer, the computer is caused to execute any one of the above data pulling methods.
An electronic device according to an embodiment of the present invention includes a processor and the storage medium, where the processor executes instructions in the storage medium. The electronic device can be a computer, a mobile phone or the like.
As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product.
Accordingly, the present disclosure may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of hardware and software, any of which may be referred to herein generally as a "circuit," "module" or "system." Furthermore, in some embodiments, the invention may also be embodied in the form of a computer program product in one or more computer-readable media having computer-readable program code embodied in the medium.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A method for pulling data, comprising:
receiving a task request input by a user on a front-end interface according to a preset data format;
and submitting the task request to an HADOOP cluster through a Livy plug-in, and enabling the HADOOP cluster to pull and return data corresponding to the task request.
2. The method according to claim 1, wherein the task request includes a data source and a pull condition, and the causing the HADOOP cluster to pull data corresponding to the task request includes:
and enabling the HADOOP cluster to pull the data corresponding to the task request from the data source according to the pull condition.
3. The method of claim 1, wherein causing the HADOOP cluster to return the data corresponding to the task request comprises:
causing the HADOOP cluster to return a download link for downloading the data corresponding to the task request, so that the user can acquire the data corresponding to the task request through the download link.
4. A data pulling method according to any one of claims 1 to 3, further comprising:
according to the sequence of the input time of each task request, placing a plurality of task requests in a task queue;
and sequentially sending each task request to the Livy plug-in through the task queue so as to sequentially submit each task request to the HADOOP cluster through the Livy plug-in.
5. A data pulling system is characterized by comprising a receiving module and a pulling return module;
the receiving module is used for: receiving a task request input by a user on a front-end interface according to a preset data format;
the pull return module is to: and submitting the task request to an HADOOP cluster through a Livy plug-in, so that the HADOOP cluster pulls and returns data corresponding to the task request.
6. The data pulling system of claim 5, wherein the task request comprises a data source and a pulling condition, and the pulling return module causes the HADOOP cluster to pull data corresponding to the task request, comprising:
and enabling the HADOOP cluster to pull the data corresponding to the task request from the data source according to the pull condition.
7. The data pulling system of claim 5, wherein the process of the pull return module causing the HADOOP cluster to return the data corresponding to the task request comprises:
and returning a download link for downloading the data corresponding to the task request by the HADOOP cluster, so that the user can acquire the data corresponding to the task request through the download link.
8. A data pulling system according to any one of claims 5 to 7, further comprising a task queue module, the task queue module being configured to:
according to the sequence of the input time of each task request, placing a plurality of task requests in a task queue;
and sequentially sending each task request to the Livy plug-in through the task queue so as to sequentially submit each task request to the HADOOP cluster through the Livy plug-in.
9. A storage medium having stored therein instructions, which when read by a computer, cause the computer to execute a data pulling method according to any one of claims 1 to 4.
10. An electronic device comprising a processor and the storage medium of claim 9, wherein the processor executes instructions in the storage medium.
CN202211019572.5A 2022-08-24 2022-08-24 Data pulling method, system, storage medium and electronic equipment Pending CN115686702A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211019572.5A CN115686702A (en) 2022-08-24 2022-08-24 Data pulling method, system, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211019572.5A CN115686702A (en) 2022-08-24 2022-08-24 Data pulling method, system, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN115686702A true CN115686702A (en) 2023-02-03

Family

ID=85060548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211019572.5A Pending CN115686702A (en) 2022-08-24 2022-08-24 Data pulling method, system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115686702A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682147A (en) * 2016-12-22 2017-05-17 北京锐安科技有限公司 Mass data based query method and device
CN107515875A (en) * 2016-06-16 2017-12-26 阿里巴巴集团控股有限公司 Data query method and device
CN113821560A (en) * 2020-06-18 2021-12-21 中兴通讯股份有限公司 DAP platform-based big data processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230203
