CN111221744B - Data acquisition method and device and electronic equipment - Google Patents

Info

Publication number
CN111221744B
Authority
CN
China
Prior art keywords
task
data acquisition
script
target
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010329205.XA
Other languages
Chinese (zh)
Other versions
CN111221744A (en
Inventor
周少鹏
王滨
万里
毕志城
田启航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202010329205.XA
Publication of CN111221744A
Application granted
Publication of CN111221744B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging

Abstract

The application provides a data acquisition method, a data acquisition device and electronic equipment. The method comprises the following steps: when a data acquisition task creation instruction is detected, generating a target data acquisition script in an online editing mode based on task target information of data acquisition; generating a debugging task based on the target data acquisition script and the task target information; executing the debugging task, and debugging the target data acquisition script based on the execution result so that the debugged target data acquisition script meets a preset condition; and when a data acquisition task execution instruction is detected, executing a data acquisition task for the task target information based on the debugged target data acquisition script. The method can simplify the development process of the data acquisition script, shorten the development time and improve the development efficiency.

Description

Data acquisition method and device and electronic equipment
Technical Field
The present application relates to the field of computer data mining technologies, and in particular, to a data acquisition method and apparatus, and an electronic device.
Background
With the continuous development of network technology, the internet becomes a carrier of a large amount of information, and how to effectively extract and utilize the information becomes a great challenge.
A network data collector is a program that automatically downloads web pages; it selectively accesses web pages and related links on the Internet according to a set capture target in order to acquire the required information. The network data collector aims to capture web pages related to specific subject content and to prepare data resources for subject-oriented user queries.
Because data acquisition tasks are often numerous and resource-intensive, a single device cannot meet practical requirements; distributed deployment makes it easier to use resources effectively and to scale.
At present, a distributed data acquisition framework implementation scheme is to decouple each module of the distributed data acquisition framework through a message queue to achieve distributed deployment of a system; another implementation scheme of the distributed data acquisition framework is to realize module multiplexing of multiple data acquisition tasks by defining a plurality of responsibility chains, wherein each responsibility chain corresponds to one task name.
However, practice shows that in current data acquisition schemes the data acquisition script is developed offline. As a result, scripts cannot be developed quickly, the need to collect data for a large number of tasks in a short time cannot be met, and no online debugging function is provided for the data acquisition script.
Disclosure of Invention
In view of this, the present application provides a data acquisition method, an apparatus and an electronic device.
Specifically, the method is realized through the following technical scheme:
according to a first aspect of embodiments of the present application, there is provided a data acquisition method, including:
when a data acquisition task creating instruction is detected, generating a target data acquisition script in an online editing mode based on task target information of data acquisition;
generating a debugging task based on the target data acquisition script and the task target information;
executing the debugging task, and debugging the target data acquisition script based on an execution result so as to enable the debugged target data acquisition script to meet a preset condition;
and when a data acquisition task execution instruction is detected, executing a data acquisition task for the task target information based on the debugged target data acquisition script.
According to a second aspect of embodiments of the present application, there is provided a data acquisition apparatus comprising:
the editing unit is used for generating a target data acquisition script in an online editing mode based on task target information of data acquisition when a data acquisition task creating instruction is detected;
the generating unit is used for generating a debugging task based on the target data acquisition script and the task target information;
the debugging unit is used for executing the debugging task and debugging the target data acquisition script based on an execution result so as to enable the debugged target data acquisition script to meet a preset condition;
and the acquisition unit is used for executing the data acquisition task for the task target information based on the debugged target data acquisition script when the data acquisition task execution instruction is detected.
According to a third aspect of embodiments of the present application, there is provided an electronic apparatus including:
a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; the processor is used for executing machine executable instructions to realize the data acquisition method.
According to the data acquisition method, when a data acquisition task creation instruction is detected, a target data acquisition script is generated in an online editing mode based on task target information of data acquisition, and a debugging task is generated based on the target data acquisition script and the task target information; the debugging task is executed, and the target data acquisition script is debugged based on the execution result so that the debugged target data acquisition script meets the preset condition; and when a data acquisition task execution instruction is detected, a data acquisition task for the task target information is executed based on the debugged target data acquisition script. This simplifies the development flow of the data acquisition script, shortens the development time and improves the development efficiency, so that large-scale data acquisition script development can be completed in a short time and efficient, rapid data capture is achieved.
Drawings
FIG. 1 is a schematic flowchart of a data acquisition method according to an exemplary embodiment of the present application;
FIG. 2 is a schematic flowchart of executing a data acquisition task for the task target information based on the debugged target data acquisition script, according to an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram illustrating a method for constructing a distributed data collection framework based on online script editing according to an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a process of data acquisition script development and data capture based on a distributed data acquisition framework for online script editing according to an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of a data acquisition device according to an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of a hardware structure of the apparatus shown in FIG. 5 according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to make the technical solutions provided in the embodiments of the present application better understood and make the above objects, features and advantages of the embodiments of the present application more comprehensible, the technical solutions in the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, which is a schematic flowchart of a data acquisition method according to an embodiment of the present application, the data acquisition method may include the following steps:
Step S100, when a data acquisition task creation instruction is detected, generating a target data acquisition script in an online editing mode based on task target information of data acquisition.
In the embodiment of the application, in order to simplify the development process of the data acquisition script, online script editing can be realized and the script development process can be simplified based on an online editing technology.
In the embodiment of the application, when a data acquisition task creation instruction is detected, a process of editing a data acquisition script on line may be triggered, and a data acquisition script (referred to as a target data acquisition script herein) corresponding to the data acquisition task is generated in an on-line editing manner based on task target information of data acquisition.
For example, the task target information may include a task name and address information of the task target, such as the URL (Uniform Resource Locator) or IP (Internet Protocol) address of the target website from which data is to be collected.
Step S110, generating a debugging task based on the target data acquisition script and the task target information.
In the embodiment of the present application, in order to implement online debugging of the data acquisition script, when the target data acquisition script is generated in the manner described in step S100, a debugging task may be generated based on the target data acquisition script and task target information, and the debugging task is used for online debugging of the data acquisition script in a subsequent process.
Step S120, executing the debugging task, and debugging the target data acquisition script based on the execution result so that the debugged target data acquisition script meets the preset condition.
In this embodiment of the application, once the debugging task has been generated in the manner described in step S110, the target data acquisition script can be debugged by executing the debugging task. That is, the target data acquisition script is executed, data is captured from the target website based on the task target information, and the script is debugged based on the capture result until the debugged target data acquisition script satisfies the preset condition, i.e., the data captured from the target website by the debugged script meets the requirement.
For example, when the debugging task is generated in the manner described in step S110, the debugging task may be issued to the message queue, the debugging task may be obtained from the message queue by the script execution engine, and the obtained debugging task may be executed.
For example, the debugging tasks in the message queue may be scheduled based on the priority of each debugging task in the message queue.
Step S130, when a data acquisition task execution instruction is detected, executing a data acquisition task for the task target information based on the debugged target data acquisition script.
In the embodiment of the application, when the debugging of the target data acquisition script is completed, the data acquisition task for the task target information can be executed based on the debugged target data acquisition script.
Therefore, in the method flow shown in fig. 1, through online editing and debugging of the data acquisition script, the development flow of the data acquisition script is simplified, the development time is shortened, the development efficiency is improved, and large-scale data acquisition script development can be completed in a short time, so that efficient and rapid data capture is realized.
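The debugging loop of steps S110 and S120 can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the data acquisition script is Python source that defines a function `extract(html)`, uses an in-process `queue.Queue` in place of the message queue, uses `exec` as a stand-in script execution engine, and takes "the result is non-empty" as the preset condition. All names (`run_script`, `debug_task`, `SAMPLE_PAGE`) are hypothetical.

```python
import queue

# Stand-in for a page fetched from the target website (assumption).
SAMPLE_PAGE = "<html><h1 class='title'>Hello</h1></html>"

def run_script(source, html):
    """Stand-in script execution engine: run the script and return its result."""
    namespace = {}
    exec(source, namespace)              # the script must define extract(html)
    return namespace["extract"](html)

def meets_preset_condition(result):
    """Assumed preset condition: the script extracted at least one item."""
    return bool(result)

def debug_task(source, html, message_queue):
    """Publish a debugging task, execute it, and report whether it passes."""
    message_queue.put({"type": "debug", "script": source, "page": html})
    task = message_queue.get()
    result = run_script(task["script"], task["page"])
    return meets_preset_condition(result), result

# First draft of the script: the rule looks for the wrong element.
DRAFT = '''
import re
def extract(html):
    return re.findall(r"<h2>(.*?)</h2>", html)
'''

# Script after one round of debugging: the rule now matches the page.
FIXED = '''
import re
def extract(html):
    return re.findall(r"<h1 class='title'>(.*?)</h1>", html)
'''

mq = queue.Queue()
ok_draft, _ = debug_task(DRAFT, SAMPLE_PAGE, mq)
ok_fixed, data = debug_task(FIXED, SAMPLE_PAGE, mq)
```

Note that `exec` runs arbitrary code in the current process; a real engine would need isolation, which is consistent with the later remark that the script execution engine is built on virtualization technology.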
In one embodiment of the present application, in step S100, generating a target data collection script in an online editing manner based on task target information of data collection includes:
outputting a script online editing interface based on the task target information of data acquisition;
and generating a target data acquisition script based on a script editing instruction input through a script online editing interface.
For example, when a data collection task creation instruction is detected, a script online editing interface may be output based on task target information of data collection, the task target information may be displayed in the script online editing interface, and a user may perform online editing of a data collection script in a manner of inputting a script editing instruction through the script online editing interface.
Accordingly, the target data collection script may be generated based on a script editing instruction input through the script online editing interface.
In one example, different data collection scripts usually differ mainly in their data collection rules. The data collection rules are used to extract the required data from a web page and include regular expressions, element class names, element IDs, and the like. To improve the efficiency of online script editing, a data collection script template may be displayed in the script online editing interface during online editing, and the template may be edited based on a received editing instruction for the template, such as an editing instruction for the collection rules, to generate the corresponding data collection script, such as the target data collection script.
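As an illustration of the template idea, the following sketch fills a script template with a collection rule. It is only one possible reading: the template text, the use of `string.Template`, and the helper names `build_script` and `rule_for_class` are assumptions, and the class-name rule is a deliberately simple regular expression rather than a real HTML parser.

```python
import re
from string import Template

# Hypothetical script template: only the collection rule differs between scripts.
SCRIPT_TEMPLATE = Template('''
import re
def extract(html):
    return re.findall($pattern, html)
''')

def build_script(pattern):
    """Fill the template with a regular-expression collection rule."""
    # repr() turns the pattern into a valid Python string literal.
    return SCRIPT_TEMPLATE.substitute(pattern=repr(pattern))

def rule_for_class(class_name):
    """Build a simple regex rule for the text of elements with a given class name."""
    return r'class="%s"[^>]*>([^<]*)<' % re.escape(class_name)

# Generate a script from the template and run it on a small sample page.
script = build_script(rule_for_class("price"))
namespace = {}
exec(script, namespace)
prices = namespace["extract"](
    '<span class="price">9.90</span><span class="name">tea</span>'
)
```

Editing only the rule keeps every generated script structurally identical, which is what makes a shared template worthwhile.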
In an embodiment of the present application, as shown in fig. 2, the executing of the data collection task for the task target information based on the debugged target data collection script may be implemented by the following steps:
Step S131, issuing the task to be executed for the task target information to a message queue, and storing task information of the task to be executed.
Step S132, when the task to be executed for the task target information is scheduled, calling the debugged target data acquisition script to capture data based on the task information of the task to be executed.
For example, when the debugging of the target data collection script is completed in the manner described in step S120, a task to be executed for the task target information may be issued to the message queue, and the task information of the task to be executed may be stored.
For example, the task information may include task target information and task associated data collection script information (i.e., identification information of a data collection script used to execute the task, such as a script number), and the like.
For example, the to-be-executed task and the task information may be associated by a task name.
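A minimal sketch of this association is shown below. The field names, the in-memory dictionary standing in for storage, and the list standing in for the message queue are all assumptions chosen for illustration; the rate and timing fields reflect task information that the description mentions elsewhere.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class TaskInfo:
    name: str                        # task name, used as the association key
    target_url: str                  # address information of the task target
    script_id: int                   # identifies the associated data collection script
    rate_per_second: float = 1.0     # data collection rate
    timing: Optional[str] = None     # timing information, for timed tasks only

task_store = {}   # stand-in for stored task information
pending = []      # stand-in for the message queue

def publish(info):
    """Store the task information and enqueue the task under its name."""
    task_store[info.name] = asdict(info)
    pending.append(info.name)

publish(TaskInfo("Test data acquisition", "www.test.com", script_id=1, rate_per_second=3))

# When the task is scheduled, its information is looked up by task name.
name = pending.pop(0)
info = task_store[name]
```

Only the task name travels through the queue here; the full record stays in storage and is fetched when the task is scheduled.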
In one example, the debugging task and the task to be executed (including the instant acquisition task or/and the timing acquisition task) may be respectively issued to different message queues.
For example, the debugging task is issued to a debugging task message queue (which may be referred to as a first message queue), and the task to be executed is issued to a task to be executed message queue (which may be referred to as a second message queue).
In another example, the debugging task and the task to be executed may be issued to the same message queue.
Illustratively, the debugging task has a higher priority than the task to be executed in the same message queue.
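When both kinds of task share one queue, the priority relation can be sketched with a heap. This is an illustrative assumption about the data structure, not something the source specifies; the numeric priority encoding and the names `publish` and `schedule` are hypothetical.

```python
import heapq
import itertools

DEBUG, EXECUTE = 0, 1            # lower number = higher priority (assumed encoding)
_counter = itertools.count()     # tie-breaker: earlier publication is scheduled first

def publish(heap, kind, name):
    """Add a task to the shared queue with its priority class."""
    heapq.heappush(heap, (kind, next(_counter), name))

def schedule(heap):
    """Remove and return the name of the highest-priority task."""
    kind, _, name = heapq.heappop(heap)
    return name

heap = []
publish(heap, EXECUTE, "crawl-A")
publish(heap, DEBUG, "debug-B")
publish(heap, EXECUTE, "crawl-C")
order = [schedule(heap) for _ in range(3)]
```

The debugging task is scheduled first even though it was published second, and the two execution tasks keep their publication order.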
Illustratively, because web page operations are I/O (input/output) operations with long response times, data capture may be performed asynchronously to improve capture efficiency and save capture time.
Correspondingly, when tasks need to be executed, the tasks to be executed in the message queue can be scheduled through asynchronous calls based on the priority of each task to be executed in the message queue.
For any scheduled task to be executed, the debugged data acquisition script is called to capture data based on the task information.
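The benefit of asynchronous capture for I/O-bound work can be sketched with Python's `asyncio`. The fetch function is simulated (`asyncio.sleep` stands in for network latency), and the URLs are hypothetical paths under the example host; the source does not prescribe a particular asynchronous library.

```python
import asyncio
import time

async def fetch(url, delay=0.1):
    """Simulated page fetch; asyncio.sleep stands in for network I/O."""
    await asyncio.sleep(delay)
    return f"<html>{url}</html>"

async def fetch_all(urls):
    """Start all fetches concurrently and collect results in input order."""
    return await asyncio.gather(*(fetch(u) for u in urls))

urls = [f"www.test.com/page{i}" for i in range(5)]
start = time.perf_counter()
pages = asyncio.run(fetch_all(urls))
elapsed = time.perf_counter() - start   # close to one delay, not five
```

Because the waits overlap, the total time is roughly that of a single request rather than the sum of all requests.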
In addition to the task target and the information on the data collection script associated with the task, the task information may include the data collection rate and, for timed collection tasks, timing information.
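One simple way to honor a configured data collection rate is to space request start times evenly. The sketch below is an assumption about how such a limit could be enforced, not the source's mechanism; the `RateLimiter` class is hypothetical, and a fake clock is injected so the example runs instantly.

```python
import time

class RateLimiter:
    """Space requests evenly so that at most `rate` requests start per second."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_time = None        # earliest allowed start of the next request

    def wait(self):
        """Block until the next request may start; return its start time."""
        now = self.clock()
        if self.next_time is not None and now < self.next_time:
            self.sleep(self.next_time - now)
            now = self.next_time
        self.next_time = now + self.interval
        return now

# A fake clock keeps the example instantaneous and deterministic.
fake_now = [0.0]

def fake_sleep(seconds):
    fake_now[0] += seconds

limiter = RateLimiter(3, clock=lambda: fake_now[0], sleep=fake_sleep)
starts = [limiter.wait() for _ in range(4)]   # four requests at 3 per second
```

With the default arguments the limiter uses the real monotonic clock and `time.sleep`, so calling `wait()` before each request is enough to cap the rate.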
In an example, in step S132, after invoking the debugged data acquisition script to perform data capture, the method may further include:
and filtering the captured data, performing data format standardization processing on the filtered data, and storing the processed data.
For example, to reduce the difficulty of data analysis, after the debugged data acquisition script is called to capture data, the captured data may be filtered to remove dirty data.
In addition to filtering, the filtered data may be converted to a standardized data format, which further reduces the difficulty of data analysis and improves the efficiency of data analysis.
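The filter-then-normalize step can be sketched as follows. What counts as dirty data and what the standard format looks like are not defined in the source, so both are assumptions here: a record is treated as dirty if its title or price is missing or blank, and normalization trims the title and converts the price to a number.

```python
def is_dirty(record):
    """Assumed definition of dirty data: a missing or blank title or price."""
    title = (record.get("title") or "").strip()
    price = (record.get("price") or "").strip()
    return not title or not price

def normalize(record):
    """Assumed standard format: trimmed title, price as a float without '$'."""
    return {
        "title": record["title"].strip(),
        "price": float(record["price"].strip().lstrip("$")),
    }

def process(records):
    """Filter out dirty records, then normalize the remaining ones."""
    return [normalize(r) for r in records if not is_dirty(r)]

raw = [
    {"title": " Tea ", "price": "$9.90"},
    {"title": "", "price": "1"},          # dirty: blank title
    {"title": "Cup", "price": None},      # dirty: missing price
]
clean = process(raw)
```

Filtering first means the normalization code never has to handle missing fields.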
In order to enable those skilled in the art to better understand the technical solutions provided by the embodiments of the present application, the technical solutions provided by the embodiments of the present application are described below with reference to specific examples.
Taking the distributed data acquisition framework system based on online script editing as an example, the framework system will be explained first.
In this embodiment, the distributed data collection framework system based on online script editing may include:
the front-end web task console module, which provides functions such as task creation, task detail display, task state control (running, stopping, debugging, editing and timing), rate control and running data statistics;
the online editing and debugging module, which comprises an online editor (for writing data acquisition scripts), a dynamic debugger (for dynamically debugging data acquisition scripts) and a web page element CSS (Cascading Style Sheets) selector (for assisting in writing data acquisition rules);
the result display module, which displays the data capture results of data acquisition tasks;
the message queue module, which transmits data among the modules of the system, helping to decouple the modules from one another and enabling distributed deployment;
illustratively, the message queue supports message middleware such as Redis, RabbitMQ or Kafka;
the script execution engine module, which executes the script code written in the online editor and returns an execution result;
the task scheduling module, which determines task priority and performs asynchronous scheduling, timed scheduling, exception handling and the like of tasks according to the priority, improving the concurrency of the system;
the data capture module, which asynchronously captures target data; because web page operations are I/O operations with long response times, asynchronous capture can greatly reduce data capture time;
the data processing module, which filters dirty data out of the captured data and reduces the difficulty of data analysis;
the data storage module, which stores task data, running data, result data and captured data;
illustratively, the data storage module supports databases such as MySQL, SQLite, MongoDB, Redis or Elasticsearch;
and the system monitoring module, which monitors the real-time running status of tasks and performs exception alarms and handling.
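The decoupling role of the message queue module can be illustrated with two components that share nothing but a queue. This sketch uses Python's in-process `queue.Queue` purely as a stand-in for message middleware; the `console` and `worker` functions and the `None` sentinel are illustrative assumptions.

```python
import queue
import threading

task_queue = queue.Queue()   # in-process stand-in for the message middleware
results = []

def console():
    """Producer side: the task console publishes tasks and then a stop marker."""
    for name in ["task-1", "task-2", "task-3"]:
        task_queue.put(name)
    task_queue.put(None)     # sentinel telling the worker to stop

def worker():
    """Consumer side: an execution component that only knows the queue."""
    while True:
        task = task_queue.get()
        if task is None:
            break
        results.append(f"done:{task}")

producer = threading.Thread(target=console)
consumer = threading.Thread(target=worker)
consumer.start()
producer.start()
producer.join()
consumer.join()
```

Neither function calls the other, so either side could be moved to a different process or machine once the queue is replaced by real middleware.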
The following describes a method for constructing a distributed data acquisition framework based on online script editing.
In this embodiment, the method for constructing a distributed data collection framework based on online script editing may include the following steps:
step 1, constructing a front-end web task management and control interface that provides functions such as task creation, task detail display, task state control (running, stopping, debugging, editing and timing), rate control and running data statistics;
for example, in addition to task creation, task detail display and task state control, the task management and control interface in step 1 may provide functions such as task grouping control and task deletion, so as to improve the controllability of data acquisition tasks.
Step 2, constructing a front-end data acquisition script online editing and debugging page, comprising: an online editor (which may also be referred to as an online code editor, for writing data acquisition scripts), a dynamic debugger (for dynamically debugging data acquisition scripts) and a web page element CSS selector (for assisting in writing parsing rules);
for example, the online editing and debugging interface in step 2 may include, in addition to an online editor, a dynamic debugger, and a webpage element CSS selector, a micro browser (for displaying a captured target webpage and assisting development of a data acquisition script), a data stream displayer (for displaying a data stream during running of a written data acquisition script), and the like, so that a real-time process of data acquisition may be better known in the data acquisition process, and further, a progress of data acquisition may be known in time and an abnormal situation occurring in the data acquisition process may be discovered.
Step 3, constructing a front-end result display page for displaying data acquisition task data capture results;
illustratively, the result presentation page in step 3 may include multi-dimensional task data besides the data capture result, and the result data is statistically analyzed to optimize the task result presentation effect.
Step 4, constructing a message queue module for data transmission among all modules of the system, helping all modules to decouple mutually and realizing distribution;
step 5, constructing a script execution engine for executing the script code written in the online editor and returning an execution result;
step 6, constructing a task scheduling module, which is used for determining task priority, performing asynchronous scheduling, timing scheduling, exception handling and the like of the tasks according to the priority, and improving the concurrency characteristic of the system;
for example, tasks in a message queue may be partitioned into multiple different sub-queues based on task attributes.
For example, the instant collection task, the timed collection task, and the debugging task may be divided into different sub-queues, where the priority of the debugging task is the highest, the priority of the instant collection task is the next highest, and the priority of the timed collection task is the lowest.
The priority of each debugging task can be determined according to the time sequence of tasks issued to the message queue, for example, the priority of the debugging task issued to the message queue first is higher than the priority of the debugging task issued to the message queue later, or determined according to other strategies.
The priority of each instant acquisition task may be determined according to the time sequence of the tasks issued to the message queue, for example, the priority of the instant acquisition task issued to the message queue first is higher than the priority of the instant acquisition task issued to the message queue later, or determined according to other strategies.
The priority of each timing acquisition task can be determined according to timing information, for example, the earlier the timing time arrives, the higher the priority of the timing acquisition task.
The task scheduling module can adopt a multi-process or/and multi-thread mode to realize asynchronous scheduling of each task in the message queue.
The exception handling of the task scheduling module mainly includes rescheduling a task whose scheduling has failed, until the task is scheduled successfully or the number of scheduling attempts reaches the preset maximum number of scheduling times.
Illustratively, the task scheduling module in step 6 may include functions such as task speed scheduling, in addition to task priority scheduling, timing scheduling, exception handling, and the like.
Step 7, constructing a data capture module for asynchronously capturing target data, wherein the response time is long due to the fact that the webpage operation is an I/O operation, and the data capture time can be greatly saved by the asynchronous capture;
step 8, constructing a data processing module for filtering dirty data in the captured data and simplifying the data analysis difficulty;
illustratively, the data processing module in step 8 may include functions such as data format normalization, in addition to the data filtering function.
Step 9, constructing a data storage module for storing task data, operation data, result data and capture data;
and step 10, constructing a system monitoring module for monitoring the real-time running condition of the task and performing abnormal alarm and processing.
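The sub-queue arrangement described under step 6 can be sketched as three containers with a fixed class order. This is an illustrative assumption about the data structures: debugging and instant tasks are kept first in, first out, and timed tasks are ordered by due time. The class name `SubQueueScheduler` is hypothetical, and the sketch ignores waiting until a timed task is actually due.

```python
from collections import deque
import heapq

class SubQueueScheduler:
    """Three sub-queues served in the order debug > instant > timed."""

    def __init__(self):
        self.debug = deque()     # FIFO: earlier publication first
        self.instant = deque()   # FIFO: earlier publication first
        self.timed = []          # heap: earlier due time first

    def publish(self, kind, name, due=None):
        """Place a task in the sub-queue that matches its kind."""
        if kind == "debug":
            self.debug.append(name)
        elif kind == "instant":
            self.instant.append(name)
        elif kind == "timed":
            heapq.heappush(self.timed, (due, name))
        else:
            raise ValueError(f"unknown task kind: {kind}")

    def next_task(self):
        """Return the next task by class priority, or None if all are empty."""
        if self.debug:
            return self.debug.popleft()
        if self.instant:
            return self.instant.popleft()
        if self.timed:
            return heapq.heappop(self.timed)[1]
        return None

scheduler = SubQueueScheduler()
scheduler.publish("timed", "nightly", due=200)
scheduler.publish("instant", "crawl-A")
scheduler.publish("timed", "hourly", due=100)
scheduler.publish("debug", "debug-B")
order = [scheduler.next_task() for _ in range(4)]
```

Keeping a separate container per class lets each class use its own ordering rule, which a single flat queue would not express as directly.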
In one example, as shown in fig. 3, the method for constructing a distributed data collection framework based on online script editing may include the following steps:
step 1, constructing a front-end web task management and control interface based on web front-end technology, providing functions such as task creation, task detail display, task state control (running, stopping, debugging, editing and timing), rate control and running data statistics;
step 2, constructing a front-end data acquisition script online editing and debugging page based on rich text technology and web front-end technology, comprising an online code editor (for writing data acquisition scripts), a dynamic debugger (for dynamically debugging data acquisition scripts) and a web page element CSS selector (for assisting in writing parsing rules);
step 3, constructing a front-end result display page based on web front-end technology, for displaying the data capture results of data acquisition tasks;
step 4, constructing a message queue module based on various message middleware, for data transmission among the modules of the system, helping to decouple the modules from one another and enabling distributed deployment;
step 5, constructing a script execution engine based on virtualization technology, for executing the script code written in the online editor and returning an execution result;
step 6, constructing a task scheduling module based on asynchronous scheduling technology, for determining task priority and performing asynchronous scheduling, timed scheduling, exception handling and the like of tasks according to the priority, improving the concurrency of the system;
step 7, constructing a data capture module based on multi-process, multi-thread and coroutine technologies, for asynchronously capturing target data; because web page operations are I/O operations with long response times, asynchronous capture can greatly reduce data capture time;
step 8, constructing a data processing module based on data cleaning technology, for filtering dirty data out of the captured data and reducing the difficulty of data analysis;
step 9, constructing a data storage module based on various databases, for storing task data, running data, result data and captured data;
and step 10, constructing a system monitoring module for monitoring the real-time running status of tasks and performing exception alarms and handling.
The following describes in detail the process of developing a data acquisition script and capturing data by using a system device deployed with a distributed data acquisition framework based on online script editing, where a specific implementation process of the process is shown in fig. 4, and may include the following steps:
1. assume that the target platform is www.test.com;
2. the system provides a web console; in the web console, click 'new task', fill in the task name 'Test data acquisition' and the start URL 'www.test.com', and click 'new' to create the task;
3. while writing the script, click 'run' to send a debugging task to the message queue; the script is executed by the script execution engine and the result is returned for dynamic debugging of the script; click 'CSS selector' to enable parsing rule assistance, click 'stream' to observe the URLs captured by the data acquisition script, and click 'html' to observe the captured target page;
4. the script execution engine acquires the debugging task from the message queue, executes the data acquisition script and returns the result;
5. after the data acquisition script has been written, click 'save' to store it;
6. return to the web console, click the task state of the 'Test data acquisition' task and select 'running' (5 optional states in total), then click 'rate control' and select '3 requests per second';
7. after the task parameters are configured, click 'start'; the task to be executed is issued to the message queue, and the task information is stored in the data storage module;
by way of example, the task information may include, but is not limited to, the task target, the data acquisition rate, timing information (for timed acquisition tasks), and the data acquisition script associated with the task (which may be identified by a script number);
8. the asynchronous task scheduling module acquires the task to be executed, determines its priority and schedules it;
9. the data capture module calls the data acquisition script to capture data according to the task information;
10. the data processing module cleans and parses the data, extracts the required data and stores it in the data storage module;
11. the operation monitoring module acquires task running state data from the data storage module, displays it and performs exception handling;
for example, when network traffic is heavy, access may be delayed and access to the target page may time out; in this case, the access may be retried until the target page is accessed successfully or the number of timed-out retries reaches the maximum retry number;
12. the result display module acquires the data capture result from the data storage module, and displays and statistically analyzes it.
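The retry-on-timeout behaviour mentioned in the exception handling example can be sketched as follows. This is only an illustration under stated assumptions: `PageTimeout`, `fetch_with_retry` and the `fetch` callable are hypothetical names, and the sketch interprets the maximum retry number as the number of re-accesses allowed after the first attempt.

```python
class PageTimeout(Exception):
    """Hypothetical error raised when access to the target page times out."""


def fetch_with_retry(fetch, url, max_retries=3):
    """Call fetch(url); on timeout, retry until success or until the number
    of timed-out retries reaches max_retries, then re-raise the timeout."""
    retries = 0
    while True:
        try:
            return fetch(url)
        except PageTimeout:
            if retries >= max_retries:
                raise  # retry budget exhausted; report the exception upward
            retries += 1
```

Any other kind of exception is deliberately not caught here, so that only timeouts trigger a retry; a monitoring component could catch the final `PageTimeout` to raise an alarm.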
In the embodiment of the application, when a data acquisition task creating instruction is detected, a target data acquisition script is generated in an online editing mode based on task target information of data acquisition, and a debugging task is generated based on the target data acquisition script and the task target information; the debugging task is executed, and the target data acquisition script is debugged based on the execution result so that the debugged target data acquisition script meets the preset condition; when a data acquisition task execution instruction is detected, a data acquisition task aiming at the task target information is executed based on the debugged target data acquisition script. This simplifies the development flow of the data acquisition script, shortens the development time and improves the development efficiency, so that large-scale data acquisition script development can be completed in a short time and efficient, rapid data capture is achieved.
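The debugging step summarized above can be sketched as running the script once against a sample page and checking the result against a preset condition. The sketch makes several assumptions that are not stated in the present application: the script is Python source defining a `parse(html)` function, the preset condition is "every required field is non-empty", and the names `run_debug_task` and `meets_preset_condition` are hypothetical. Executing user-supplied source with `exec` is unsafe outside a sandbox and is used here only to keep the example short.

```python
def run_debug_task(script_source: str, sample_html: str) -> dict:
    """Execute the script once and return its result or the error raised."""
    namespace = {}
    try:
        exec(script_source, namespace)  # assumption: a trusted or sandboxed script
        return {"ok": True, "result": namespace["parse"](sample_html)}
    except Exception as exc:  # syntax errors and runtime errors alike
        return {"ok": False, "error": repr(exc)}


def meets_preset_condition(outcome: dict, required_fields: list[str]) -> bool:
    """Assumed condition: the run succeeded and every required field is non-empty."""
    if not outcome["ok"] or not isinstance(outcome["result"], dict):
        return False
    return all(outcome["result"].get(name) for name in required_fields)
```

An editor front end could call these two functions each time the user triggers a debugging run, and allow the task to be started only once the condition holds.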
The methods provided herein are described above. The following describes the apparatus provided in the present application:
Referring to fig. 5, which is a schematic structural diagram of a data acquisition device according to an embodiment of the present application, the data acquisition device may include:
the editing unit is used for generating a target data acquisition script in an online editing mode based on task target information of data acquisition when a data acquisition task creating instruction is detected;
the generating unit is used for generating a debugging task based on the target data acquisition script and the task target information;
the debugging unit is used for executing the debugging task and debugging the target data acquisition script based on an execution result so as to enable the debugged target data acquisition script to meet a preset condition;
and the acquisition unit is used for executing the data acquisition task aiming at the task target information based on the debugged target data acquisition script when the data acquisition task execution instruction is detected.
In one embodiment, the editing unit generating a target data acquisition script in an online editing mode based on task target information of data acquisition includes:
outputting a script online editing interface based on the task target information of data acquisition;
and generating a target data acquisition script based on a script editing instruction input through the script online editing interface.
In one embodiment, the script online editing interface comprises a data acquisition script template;
the editing unit generating a target data acquisition script based on a script editing instruction input through the script online editing interface includes:
and editing the data acquisition script template based on the received editing instruction aiming at the data acquisition script template so as to generate the target data acquisition script.
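Template-based script generation can be sketched with Python's standard `string.Template`. The template text, its placeholders (`start_url`, `title_expr`) and the function `generate_script` are all hypothetical; the present application does not specify what the data acquisition script template contains.

```python
from string import Template

# Hypothetical script template offered by the online editing interface.
SCRIPT_TEMPLATE = Template('''\
START_URL = "$start_url"

def parse(html):
    # Extraction rule supplied through the editing instruction.
    return {"title": $title_expr}
''')


def generate_script(start_url: str, title_expr: str) -> str:
    """Fill the template's placeholders to produce the target script source."""
    source = SCRIPT_TEMPLATE.substitute(start_url=start_url, title_expr=title_expr)
    compile(source, "<target_script>", "exec")  # fail early on syntax errors
    return source
```

For example, `generate_script("www.test.com", "html[:10]")` yields a script whose `parse` function returns the first ten characters of the page as the title.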
In one embodiment, the acquisition unit executing a data acquisition task for the task target information based on the debugged target data acquisition script includes:
sending the task to be executed aiming at the task target information to a message queue, and storing task information of the task to be executed, wherein the task information comprises task target information and task-associated data acquisition script information, and each task to be executed in the message queue is scheduled in an asynchronous calling mode according to priority;
and when a task to be executed aiming at the task target information is scheduled, calling a debugged target data acquisition script to capture data based on the task information of the task to be executed.
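Priority-based asynchronous scheduling from a message queue can be sketched with `asyncio.PriorityQueue`. The sketch rests on assumptions: an in-process queue stands in for the message queue, a lower number means a higher priority, and `Task`, `SCRIPTS`, `worker` and `run_tasks` are hypothetical names. The `SCRIPTS` registry plays the role of stored, debugged scripts looked up by script number.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    priority: int                          # only field used for ordering
    name: str = field(compare=False)
    target: str = field(compare=False)     # task target information
    script_id: int = field(compare=False)  # identifies the associated script


# Hypothetical registry of debugged scripts, keyed by script number.
SCRIPTS = {1: lambda target: f"captured:{target}"}


async def worker(queue: asyncio.PriorityQueue, results: list) -> None:
    while True:
        task = await queue.get()  # highest-priority task comes out first
        try:
            results.append((task.name, SCRIPTS[task.script_id](task.target)))
        finally:
            queue.task_done()


async def run_tasks(tasks: list, n_workers: int = 1) -> list:
    queue = asyncio.PriorityQueue()
    for task in tasks:
        queue.put_nowait(task)
    results = []
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(n_workers)]
    await queue.join()  # wait until every queued task has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results
```

With a single worker, `asyncio.run(run_tasks(tasks))` returns the results in priority order regardless of the order in which the tasks were submitted.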
In one embodiment, after the acquisition unit executes the data acquisition task based on the debugged target data acquisition script, the following is further included:
and filtering the captured data, performing data format standardization processing on the filtered data, and storing the processed data.
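Filtering and format standardization can be sketched as below. The record layout (`title` and `url` fields) and the specific rules (dropping records with missing fields, dropping duplicate URLs, collapsing whitespace, lower-casing URLs) are assumptions chosen for illustration; the present application does not define what counts as dirty data.

```python
def clean_records(records: list[dict]) -> list[dict]:
    """Filter out dirty records and normalize the format of the rest."""
    cleaned = []
    seen_urls = set()
    for record in records:
        title = (record.get("title") or "").strip()
        url = (record.get("url") or "").strip().lower()
        if not title or not url:
            continue  # dirty: a required field is missing or empty
        if url in seen_urls:
            continue  # dirty: duplicate of an already kept record
        seen_urls.add(url)
        # Standardize: collapse internal whitespace in the title.
        cleaned.append({"title": " ".join(title.split()), "url": url})
    return cleaned
```

The cleaned list could then be written to whichever database backs the data storage module.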
Correspondingly, the application also provides a hardware structure of the device shown in fig. 5. Referring to fig. 6, the hardware structure may include: a processor and a machine-readable storage medium having stored thereon machine-executable instructions executable by the processor; the processor is configured to execute machine-executable instructions to implement the methods disclosed in the above examples of the present application.
Based on the same application concept as the method, embodiments of the present application further provide a machine-readable storage medium, where several computer instructions are stored, and when the computer instructions are executed by a processor, the method disclosed in the above example of the present application can be implemented.
The machine-readable storage medium may be, for example, any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (9)

1. A method of data acquisition, comprising:
when a data acquisition task creating instruction is detected, generating a target data acquisition script in an online editing mode based on task target information of data acquisition;
generating a debugging task based on the target data acquisition script and the task target information;
executing the debugging task, and debugging the target data acquisition script based on an execution result so as to enable the debugged target data acquisition script to meet a preset condition;
when a data acquisition task execution instruction is detected, executing a data acquisition task aiming at the task target information based on the debugged target data acquisition script;
wherein the generating of a target data acquisition script in an online editing mode based on task target information of data acquisition comprises:
outputting a script online editing interface based on the task target information of data acquisition;
and generating a target data acquisition script based on a script editing instruction input through the script online editing interface.
2. The method of claim 1, wherein the script online editing interface comprises a data collection script template;
the generating of the target data acquisition script based on the script editing instruction input through the script online editing interface comprises the following steps:
and editing the data acquisition script template based on the received editing instruction aiming at the data acquisition script template so as to generate the target data acquisition script.
3. The method of claim 1, wherein performing the data collection task for the task target information based on the debugged target data collection script comprises:
sending the task to be executed aiming at the task target information to a message queue, and storing task information of the task to be executed, wherein the task information comprises task target information and task-associated data acquisition script information, and each task to be executed in the message queue is scheduled in an asynchronous calling mode according to priority;
and when a task to be executed aiming at the task target information is scheduled, calling a debugged target data acquisition script to capture data based on the task information of the task to be executed.
4. The method of claim 1, wherein after performing the data collection task based on the debugged target data collection script, further comprising:
and filtering the captured data, performing data format standardization processing on the filtered data, and storing the processed data.
5. A data acquisition device, comprising:
the editing unit is used for generating a target data acquisition script in an online editing mode based on task target information of data acquisition when a data acquisition task creating instruction is detected;
the generating unit is used for generating a debugging task based on the target data acquisition script and the task target information;
the debugging unit is used for executing the debugging task and debugging the target data acquisition script based on an execution result so as to enable the debugged target data acquisition script to meet a preset condition;
the acquisition unit is used for executing a data acquisition task aiming at the task target information based on a debugged target data acquisition script when a data acquisition task execution instruction is detected;
wherein the editing unit generating a target data acquisition script in an online editing mode based on the task target information of data acquisition comprises:
outputting a script online editing interface based on the task target information of data acquisition;
and generating a target data acquisition script based on a script editing instruction input through the script online editing interface.
6. The device of claim 5, wherein the script online editing interface comprises a data acquisition script template;
the editing unit generating a target data acquisition script based on a script editing instruction input through the script online editing interface comprises:
and editing the data acquisition script template based on the received editing instruction aiming at the data acquisition script template so as to generate the target data acquisition script.
7. The apparatus of claim 5, wherein the acquisition unit executing a data acquisition task for the task target information based on the debugged target data acquisition script comprises:
sending the task to be executed aiming at the task target information to a message queue, and storing task information of the task to be executed, wherein the task information comprises task target information and task-associated data acquisition script information, and each task to be executed in the message queue is scheduled in an asynchronous calling mode according to priority;
and when a task to be executed aiming at the task target information is scheduled, calling a debugged target data acquisition script to capture data based on the task information of the task to be executed.
8. The apparatus of claim 5, wherein after the acquisition unit executes the data acquisition task based on the debugged target data acquisition script, the apparatus further comprises:
and filtering the captured data, performing data format standardization processing on the filtered data, and storing the processed data.
9. An electronic device, comprising:
a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; the processor is configured to execute machine executable instructions to implement the method steps of any of claims 1-4.
CN202010329205.XA 2020-04-23 2020-04-23 Data acquisition method and device and electronic equipment Active CN111221744B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010329205.XA CN111221744B (en) 2020-04-23 2020-04-23 Data acquisition method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111221744A (en) 2020-06-02
CN111221744B (en) 2020-08-04

Family

ID=70831707

Country Status (1)

Country Link
CN (1) CN111221744B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112764908B (en) * 2021-01-26 2024-01-26 北京鼎普科技股份有限公司 Network data acquisition processing method and device and electronic equipment
CN115422305A (en) * 2022-11-04 2022-12-02 暨南大学 Network social media data management method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102801559B (en) * 2012-08-03 2015-02-18 南京富士通南大软件技术有限公司 Intelligent local area network data collecting method
WO2015063259A1 (en) * 2013-11-01 2015-05-07 Kapow Technologies Determining web page processing state
CN106354723B (en) * 2015-07-15 2019-06-04 北京中电普华信息技术有限公司 A kind of on-line data acquisition system
CN108256340B (en) * 2017-12-22 2020-06-12 中国平安人寿保险股份有限公司 Data acquisition method and device, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant