CN109635022B - Visual ElasticSearch data acquisition method and device

Info

Publication number: CN109635022B
Application number: CN201811290888.1A
Authority: CN
Inventors: 杨耀; 王纯斌; 钟武; 李森林
Original assignee: Chengdu Sefon Software Co Ltd
Current assignee: Chengdu Sefon Software Co Ltd
Priority date: 2018-10-31
Filing date: 2018-10-31
Publication date: 2021-04-13
Anticipated expiration: 2038-10-31
Also published as: CN109635022A

Abstract

The invention discloses a visual ElasticSearch data acquisition method and device. The method comprises the following steps: creating a visual component comprising an input plug-in, an output plug-in and a scheduling plug-in; creating a task by associating the input plug-in, the output plug-in and the scheduling plug-in; configuring the parameters of the input plug-in, the output plug-in and the scheduling plug-in respectively to obtain an input plug-in configuration file, an output plug-in configuration file and a scheduling plug-in configuration file; configuring the running node and the task strategy of the task; loading the task strategy and acquiring node information of a target operation node; and sending the task to the target operation node according to that node information, so that the target operation node acquires and parses the input plug-in configuration file, the output plug-in configuration file and the scheduling plug-in configuration file and executes data acquisition. The invention simplifies the configuration process, supports concurrent multi-task and multi-node acquisition, improves the efficiency of data acquisition and reduces the cost of use.

Description

Visual ElasticSearch data acquisition method and device

Technical Field

The invention relates to the technical field of data acquisition, and in particular to a visual ElasticSearch data acquisition method and device.

Background

ElasticSearch is a search server based on Lucene. It provides a distributed, multi-user full-text search engine with a RESTful web interface, is developed in Java, and is released as open source under the terms of the Apache license. It is a popular enterprise-level search engine designed for cloud computing; it supports real-time search and is stable, reliable, fast, and convenient to install and use.

Efficient searching with ElasticSearch has a precondition: the data must first be collected into ElasticSearch. At present, data acquisition is usually completed with a third-party plug-in, which requires a command line or complex configuration, so the learning cost and the threshold for use are high. In addition, acquisition can only be executed as a single task, so the acquisition speed and efficiency are low.

Disclosure of Invention

In order to solve the problems, the invention provides a visual ElasticSearch data acquisition method and device, which define the input and output of structured data and create a scheduling task through a graphical operation mode, realize data acquisition based on the ElasticSearch, simplify the acquisition process and improve the acquisition rate.

In order to achieve the purpose, the invention adopts the following technical scheme:

Specifically, a visual ElasticSearch data acquisition method is applied to a user terminal in communication connection with a node server, and the method comprises the following steps:

creating a visualization component comprising an input plug-in, an output plug-in and a scheduling plug-in;

creating a task by associating the input plug-in, the output plug-in and the scheduling plug-in;

respectively carrying out parameter configuration on the input plug-in, the output plug-in and the scheduling plug-in to obtain an input plug-in configuration file, an output plug-in configuration file and a scheduling plug-in configuration file;

configuring the running node and the task strategy of the task;

loading the task strategy and acquiring node information of a target operation node;

and sending the task to the target operation node according to the node information of the target operation node, so that the target operation node acquires and parses the input plug-in configuration file, the output plug-in configuration file and the scheduling plug-in configuration file and executes data acquisition.

Further, the input plug-in configuration file comprises data source information and a query script, wherein the data source information comprises the IP address and port information of a data source database.

Further, the output plug-in configuration file comprises data target information, the data target is an ElasticSearch server, and the data target information comprises an IP, a port, an index name and a type name of the ElasticSearch server.

Further, the configuration file of the scheduling plug-in includes a scheduling type, a scheduling time and associated input and output.

Further, configuring the running node of the task includes configuring a name, an IP, and port information of the running node server.

Further, configuring the task strategy of the task includes configuring an execution mode, a target running node server, a task log level, and a scheduling task, where the scheduling task associates the task with the target running node server.

Further, the input plug-in configuration file and the output plug-in configuration file are both saved as ktr files, and the scheduling plug-in configuration file is saved as a kjb file.

Further, the specific steps by which the target operation node executes data acquisition are as follows: the target operation node parses the scheduling plug-in configuration file to acquire the scheduling type, the scheduling time and the associated input and output information of the task; acquires the input plug-in configuration file and the output plug-in configuration file according to the associated input and output information; acquires the data source information and the data target information by parsing the input plug-in configuration file and the output plug-in configuration file; and, using the data source information and the data target information, executes data acquisition according to the scheduling type and the scheduling time.

Specifically, a visual ElasticSearch data acquisition device comprises a designer and a manager, wherein the designer is used for creating tasks through a visual component, and the manager is used for configuring operation nodes, allocating target operation nodes to the tasks, and sending the tasks to the target operation nodes for execution.

Compared with the prior art, the invention has the beneficial effects that:

The invention defines the input and output of structured data and creates the scheduling task through graphical operation. For different business requirements, acquisition of structured data can be started with only simple configuration, which improves the usability of data acquisition. Multi-task and multi-node concurrent acquisition can be carried out simultaneously, which improves the efficiency of data acquisition.

Drawings

FIG. 1 is a flow chart of a visualized ElasticSearch data acquisition method of the present invention;

FIG. 2 is a flow chart of a data collection process according to embodiment 1 of the present invention;

FIG. 3 is a block diagram of a visual ElasticSearch data acquisition device of the present invention.

Description of reference numerals: 101-designer, 102-manager.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Example 1

As shown in fig. 1, a visual ElasticSearch data acquisition method is applied to a user terminal in communication connection with a node server, and the method includes:

Creating a visual component that comprises an input plug-in, an output plug-in and a scheduling plug-in. Data acquisition is fully visual: a user can start data acquisition with only simple configuration, without abstract, hard-to-remember commands or complex and tedious configuration operations, which improves user experience and reduces the learning cost and the threshold for use. The acquisition process is controlled by a scheduling center without manual intervention, and acquisition details can be monitored in real time.

Creating a task by associating an input plug-in, an output plug-in and a scheduling plug-in. Each plug-in is independent, and a complete task chain consists of an input plug-in, an output plug-in and a scheduling plug-in.

Configuring an input plug-in, wherein the configuration information comprises an input plug-in name, data source information and a query script, and saving the configured input plug-in configuration file as a ktr file. The data source information includes information such as the IP and port of the data source database, and the query script is a script for querying data from the data source database. Taking Oracle as the data source database as an example: to read data from Oracle, the data source information must contain the IP, port and similar information of the Oracle instance, and the query script is an SQL statement that queries data from Oracle.
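As an illustration of what such an input plug-in configuration carries, the following is a minimal sketch in Python. The class name, the XML element names and the sample address are assumptions made for this example; the source does not disclose the internal layout of the ktr file, so this is not that layout.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class InputPluginConfig:
    name: str   # input plug-in name
    host: str   # IP of the data source database
    port: int   # port of the data source database
    query: str  # query script, e.g. an SQL statement

    def to_xml(self) -> str:
        # Element names are hypothetical, chosen only for this sketch.
        root = ET.Element("input_plugin", {"name": self.name})
        source = ET.SubElement(root, "data_source")
        ET.SubElement(source, "ip").text = self.host
        ET.SubElement(source, "port").text = str(self.port)
        ET.SubElement(root, "query_script").text = self.query
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "InputPluginConfig":
        # Parse the same hypothetical layout back into a config object.
        root = ET.fromstring(text)
        return cls(
            name=root.get("name"),
            host=root.findtext("data_source/ip"),
            port=int(root.findtext("data_source/port")),
            query=root.findtext("query_script"),
        )


# Example values only: an Oracle source queried with a plain SQL statement.
cfg = InputPluginConfig("oracle_orders", "192.0.2.10", 1521,
                        "SELECT id, amount FROM orders")
restored = InputPluginConfig.from_xml(cfg.to_xml())
```

The round trip through `to_xml` and `from_xml` mirrors the idea that the terminal saves the configuration to a file and the operation node later parses it.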

Configuring an output plug-in, wherein the configuration information comprises an output plug-in name and data target information, the data target being the target database into which the data is to be extracted, and saving the configured output plug-in configuration file as a ktr file. In this embodiment, the data target is an ElasticSearch server, and the data target information includes the IP, port, index name and type name of the ElasticSearch server.
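The data target information can be pictured with a short Python sketch. It assumes the rows are written through the ElasticSearch bulk REST endpoint addressed by index name and type name; the source does not say which write API is used, and later ElasticSearch releases deprecated and then removed mapping types. The class and function names are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class OutputPluginConfig:
    name: str      # output plug-in name
    host: str      # IP of the ElasticSearch server
    port: int      # port of the ElasticSearch server
    index: str     # index name
    doc_type: str  # type name

    def bulk_url(self) -> str:
        # Assumed endpoint shape: index and type are given in the path.
        return f"http://{self.host}:{self.port}/{self.index}/{self.doc_type}/_bulk"


def bulk_body(rows, id_field):
    """Build a newline-delimited bulk payload: one action line, then one source line per row."""
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_id": str(row[id_field])}}))
        lines.append(json.dumps(row))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Example values only.
target = OutputPluginConfig("es_orders", "192.0.2.20", 9200, "orders", "order")
body = bulk_body([{"id": 1, "amount": 9.5}, {"id": 2, "amount": 20.0}], "id")
```

Only the payload and the URL are built here; actually sending the request is left out so that the sketch stays independent of any particular HTTP client.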

Configuring a scheduling plug-in, wherein the configuration information comprises whether to repeat acquisition, a scheduling type, a scheduling time and the associated input and output, and saving the configured scheduling plug-in configuration file as a kjb file. The scheduling types comprise immediate execution, execution after a specified interval, and execution at a certain time of each day, week or month; the scheduling time is the specific time that must be specified once the scheduling type is configured; the associated input and output are the input plug-in and the output plug-in to be called by the current scheduling plug-in.
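How a scheduling type and scheduling time might translate into a concrete run time can be sketched as follows. This is a hedged Python illustration: the enum values, field names and the `next_run` helper are assumptions for this example, and the monthly case is simplified to days 1 to 28 so that every month contains the chosen day.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from enum import Enum


class ScheduleType(Enum):
    IMMEDIATE = "immediate"
    DELAY = "delay"
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"


@dataclass
class SchedulePluginConfig:
    schedule_type: ScheduleType
    input_file: str                 # associated input plug-in configuration
    output_file: str                # associated output plug-in configuration
    repeat: bool = False            # whether acquisition is repeated
    delay: timedelta = timedelta(0) # used by DELAY
    at: time = time(0, 0)           # time of day for DAILY/WEEKLY/MONTHLY
    weekday: int = 0                # Monday=0, used by WEEKLY
    day: int = 1                    # day of month (1-28), used by MONTHLY


def next_run(cfg: SchedulePluginConfig, now: datetime) -> datetime:
    """Return the first run time strictly after `now` (or `now` itself for IMMEDIATE)."""
    if cfg.schedule_type is ScheduleType.IMMEDIATE:
        return now
    if cfg.schedule_type is ScheduleType.DELAY:
        return now + cfg.delay
    candidate = datetime.combine(now.date(), cfg.at)
    if cfg.schedule_type is ScheduleType.DAILY:
        return candidate if candidate > now else candidate + timedelta(days=1)
    if cfg.schedule_type is ScheduleType.WEEKLY:
        candidate += timedelta(days=(cfg.weekday - now.weekday()) % 7)
        return candidate if candidate > now else candidate + timedelta(days=7)
    # MONTHLY: try this month first, otherwise roll over to the next month.
    candidate = candidate.replace(day=cfg.day)
    if candidate > now:
        return candidate
    year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
    return candidate.replace(year=year, month=month)


now = datetime(2018, 10, 31, 12, 0)
daily = SchedulePluginConfig(ScheduleType.DAILY, "in.ktr", "out.ktr", at=time(2, 0))
first_daily_run = next_run(daily, now)
```

When `repeat` is set, a runner would call `next_run` again after each execution; that loop is omitted here.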

Configuring the operation node, wherein the configuration information comprises the name, IP and port information of the operation node server.

Configuring a task strategy, namely allocating the target operation node of a task, wherein the configuration information comprises an execution mode, a target operation node server, a task log level and a scheduling task, the scheduling task associating the task with the allocated target operation node server. Node clustering is adopted: tasks can be executed on multiple nodes, multiple tasks can be executed simultaneously, and the nodes support horizontal scaling. This effectively improves data acquisition efficiency, and when a performance bottleneck occurs it can be resolved simply by adding nodes.
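The relationship between operation nodes, task strategies and concurrent dispatch can be sketched in Python as below. All names, the default values for execution mode and log level, and the `send` callback are hypothetical; the source does not describe the transport between the manager and the nodes, so the callback here only formats a string.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class RunNode:
    name: str  # name of the operation node server
    host: str  # IP of the operation node server
    port: int  # port of the operation node server


@dataclass(frozen=True)
class TaskStrategy:
    task: str                       # scheduling task bound to the node
    target_node: str                # name of the target operation node
    execution_mode: str = "remote"  # hypothetical value
    log_level: str = "basic"        # hypothetical value


def dispatch(strategies, nodes, send):
    """Resolve each strategy's target node and hand the task to `send` concurrently."""
    registry = {node.name: node for node in nodes}

    def dispatch_one(strategy):
        node = registry[strategy.target_node]  # KeyError for an unknown node
        return send(node, strategy)

    # One worker per task models simultaneous multi-task, multi-node dispatch.
    with ThreadPoolExecutor(max_workers=max(1, len(strategies))) as pool:
        return list(pool.map(dispatch_one, strategies))


nodes = [RunNode("node-a", "192.0.2.31", 8081), RunNode("node-b", "192.0.2.32", 8081)]
strategies = [TaskStrategy("orders.kjb", "node-a"), TaskStrategy("users.kjb", "node-b")]
results = dispatch(strategies, nodes,
                   lambda node, s: f"{s.task} -> {node.host}:{node.port}")
```

Adding capacity in this model amounts to appending another `RunNode` and pointing new strategies at it, which is one way to read the horizontal scaling described above.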

Loading the task strategy, acquiring node information of the target operation node, and sending the task to the target operation node according to that node information, so that the target operation node acquires and parses the input plug-in configuration file, the output plug-in configuration file and the scheduling plug-in configuration file and executes data acquisition.

As shown in fig. 2, the specific steps by which the target operation node executes data acquisition are as follows: the target operation node parses the scheduling plug-in configuration file to acquire the scheduling type, the scheduling time and the associated input and output information of the task; acquires the input plug-in configuration file and the output plug-in configuration file according to the associated input and output information; acquires the data source information and the data target information by parsing the input plug-in configuration file and the output plug-in configuration file; and, using the data source information and the data target information, performs data acquisition according to the scheduling type and the scheduling time.
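The resolution chain on the target operation node (scheduling file, then associated input and output files, then source and target information) can be sketched in Python. Several simplifications are assumptions of this example: the configuration files are held as JSON strings in a dictionary rather than as real ktr and kjb files, an in-memory SQLite database stands in for the data source, a Python list stands in for the ElasticSearch server, and the scheduling type and time are read but not enforced (the task runs once).

```python
import json
import sqlite3


def run_task(schedule_path, files, connect, write):
    """Parse the scheduling file, follow its input/output links, then copy rows."""
    schedule = json.loads(files[schedule_path])
    source = json.loads(files[schedule["input"]])    # data source information
    target = json.loads(files[schedule["output"]])   # data target information
    conn = connect(source)
    try:
        cursor = conn.execute(source["query"])       # run the query script
        columns = [col[0] for col in cursor.description]
        rows = [dict(zip(columns, record)) for record in cursor]
    finally:
        conn.close()
    write(target, rows)
    return len(rows)


# Hypothetical file contents; the real files are not JSON.
files = {
    "job.kjb": json.dumps({"type": "immediate", "input": "in.ktr", "output": "out.ktr"}),
    "in.ktr": json.dumps({"ip": "192.0.2.10", "port": 1521,
                          "query": "SELECT id, amount FROM orders"}),
    "out.ktr": json.dumps({"ip": "192.0.2.20", "port": 9200,
                           "index": "orders", "type": "order"}),
}


def connect(source):
    # Stand-in for the real data source: an in-memory table with two rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])
    return conn


sink = []  # stand-in for the ElasticSearch server
count = run_task("job.kjb", files, connect,
                 lambda target, rows: sink.extend((target["index"], r) for r in rows))
```

Passing `connect` and `write` as parameters keeps the chain itself separate from any particular database driver or ElasticSearch client.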

Example 2

As shown in fig. 3, a visual ElasticSearch data acquisition device comprises a designer and a manager. The designer is used for creating tasks through a visual component; a complete task chain is composed of three plug-ins, namely an input plug-in, an output plug-in and a scheduling plug-in. The visual operation proceeds as follows: select a plug-in on the designer page, press the left mouse button and drag the plug-in into the editing area; double-click the plug-in to edit its detailed configuration information; then point the mouse at one plug-in, press and hold the left button, and drag to another plug-in to connect and associate the two. The manager is used for allocating target operation nodes to the tasks and sending the created tasks to the target operation nodes, and the target operation nodes perform data acquisition according to the configuration information of the input plug-ins, the output plug-ins and the scheduling plug-ins.
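The drag, configure and connect steps of the designer can be modelled with a small Python sketch. The `Canvas` class and its methods are hypothetical and stand in for the editing area only; the check treats a task chain as complete when the connected plug-ins include one of each of the three kinds.

```python
from dataclasses import dataclass, field

KINDS = {"input", "output", "schedule"}


@dataclass
class Canvas:
    plugins: dict = field(default_factory=dict)  # plug-in name -> kind
    links: set = field(default_factory=set)      # (from_name, to_name) pairs

    def drop(self, name, kind):
        """Model dragging a plug-in into the editing area."""
        if kind not in KINDS:
            raise ValueError(f"unknown plug-in kind: {kind}")
        self.plugins[name] = kind

    def connect(self, src, dst):
        """Model dragging a connection from one plug-in to another."""
        if src not in self.plugins or dst not in self.plugins:
            raise KeyError("both plug-ins must be on the canvas")
        self.links.add((src, dst))

    def is_complete_chain(self):
        """True when the connected plug-ins cover input, output and scheduling."""
        linked = {name for link in self.links for name in link}
        return {self.plugins[name] for name in linked} == KINDS


canvas = Canvas()
canvas.drop("job", "schedule")
canvas.drop("src", "input")
canvas.drop("dst", "output")
complete_before = canvas.is_complete_chain()
canvas.connect("job", "src")
canvas.connect("src", "dst")
complete_after = canvas.is_complete_chain()
```

A designer built along these lines would only hand a task to the manager once `is_complete_chain` reports true.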

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims

1. A visual ElasticSearch data acquisition method is applied to a user terminal in communication connection with a node server, and is characterized by comprising the following steps:

configuring the running node and the task strategy of the task;

2. The visual ElasticSearch data acquisition method of claim 1, wherein the input plug-in configuration file comprises data source information and a query script, and the data source information comprises IP and port information of a data source database.

3. The visual ElasticSearch data acquisition method of claim 2, wherein the output plug-in configuration file comprises data target information, the data target is an ElasticSearch server, and the data target information comprises an IP, a port, an index name and a type name of the ElasticSearch server.

4. The visual ElasticSearch data acquisition method of claim 1, wherein the scheduling plug-in configuration file comprises a scheduling type, a scheduling time and associated input and output.

5. The visual ElasticSearch data acquisition method of claim 1, wherein configuring the running node of the task comprises configuring name, IP and port information of a running node server.

6. The visual ElasticSearch data acquisition method of claim 1, wherein configuring the task strategy of the task comprises configuring an execution mode, a target running node server, a task log level and a scheduling task, wherein the scheduling task associates the task with the target running node server.

7. The visual ElasticSearch data acquisition method of claim 1, wherein the input plug-in configuration file and the output plug-in configuration file are both saved as ktr files, and the scheduling plug-in configuration file is saved as a kjb file.

8. The visual ElasticSearch data acquisition method according to claim 3, wherein the specific steps by which the target operation node executes data acquisition are as follows: the target operation node parses the scheduling plug-in configuration file to acquire the scheduling type, the scheduling time and the associated input and output information of the task; acquires the input plug-in configuration file and the output plug-in configuration file according to the associated input and output information; acquires the data source information and the data target information by parsing the input plug-in configuration file and the output plug-in configuration file; and, using the data source information and the data target information, executes data acquisition according to the scheduling type and the scheduling time.

9. A visual ElasticSearch data acquisition device, applied to the visual ElasticSearch data acquisition method according to any one of claims 1 to 8, wherein the device comprises a designer and a manager, the designer is used for creating tasks through a visual component, and the manager is used for configuring operation nodes, allocating target operation nodes to the tasks, and sending the tasks to the target operation nodes for execution.