CN116383002A - Data acquisition system and method - Google Patents

Publication number
CN116383002A
Authority
CN
China
Prior art keywords
data
acquisition
data acquisition
time
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310136852.2A
Other languages
Chinese (zh)
Inventor
张钧涛
张猛
陈艺方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yishang Huiping Network Technology Co ltd
Original Assignee
Beijing Yishang Huiping Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yishang Huiping Network Technology Co ltd
Priority to CN202310136852.2A
Publication of CN116383002A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a data acquisition system and a data acquisition method. The system comprises: a data source end, which performs unified processing on data from different storage media and/or of different types; a data acquisition center, which constructs a data acquisition channel to interface with the data source end, wherein the data acquisition center is bound to a master node of a distributed file system cluster and manages data acquisition tasks for different acquisition scenarios; and an acquisition channel, which serves as a unified channel for data flow. The application abstracts and classifies the specific scenarios of data acquisition, and integrates and uniformly manages the different scenarios, thereby reducing manual intervention.

Description

Data acquisition system and method
Technical Field
The application relates to the field of big data processing, in particular to a data acquisition system and method.
Background
In the big data field, there are many technical solutions for collecting data from different storage media at different frequencies. In some solutions, a stable, independent data acquisition system is built for each data retrieval scenario; a typical example is the acquisition of real-time log data. In other solutions, data acquisition scripts are written and a task scheduling system manages the data acquisition tasks; typical examples are database data synchronization and file acquisition. In still other solutions, third-party tools are used, with different components interfacing with the big data cluster for different scenarios; typical examples are DataX, Flume and the like.
In the related art, a dedicated system must be deployed for each specific scenario, and different data acquisition technologies may be adopted for different data acquisition scenarios, which makes the data acquisition tasks inconvenient to manage. In other words, the scheduled data acquisition tasks must be set up manually according to the different statistical frequencies and modes, so too many steps require manual intervention.
No effective solution has yet been proposed for the problem in the related art that different data acquisition scenarios require too much manual intervention.
Disclosure of Invention
The primary objective of the present application is to provide a data acquisition system to solve the problem that too much human intervention is required for different data acquisition scenarios.
To achieve the above object, according to one aspect of the present application, a data acquisition system is provided.
The data acquisition system according to the present application comprises:
a data source end, which performs unified processing on data from different storage media and/or of different types;
a data acquisition center, which constructs a data acquisition channel to interface with the data source end, wherein the data acquisition center is bound to a master node of a distributed file system cluster and manages data acquisition tasks for different acquisition scenarios;
and an acquisition channel, which serves as a unified channel for data flow.
In some embodiments, the system further comprises:
the data source end registers acquisition tasks with the data acquisition center and reports acquisition frequency information, while the data acquisition center receives the acquisition tasks registered by the data source end and adds them to the task list of each time period according to the reported acquisition frequency information;
and/or,
the data acquisition center checks the cluster state, while the data source end checks the data to be collected and updates its collectable state;
and/or,
the data acquisition center initiates a channel construction request; after the channel is constructed, the data source end reports data, and data circulation and processing are performed through the data source end.
In some embodiments, the data source end comprises a data receiving module, a message queue reading module, a database acquisition module, a file acquisition module, an interface calling module and a real-time data caching module, wherein:
according to a preset data acquisition mode, the data receiving module is started to receive real-time data and write it into the real-time data caching module;
and/or,
according to a preset data acquisition mode, the message queue reading module is started to connect to the message queue, consume its data and write the data into the real-time data caching module;
and/or,
according to a preset data acquisition mode, the database acquisition module reads the tables of a database;
and/or,
according to a preset data acquisition mode, the file acquisition module reads the files under a specified directory;
and/or,
according to a preset data acquisition mode, the interface calling module is started and writes the data into the real-time data caching module.
The real-time data caching module uses time windows by default and places data into different time windows according to the timeliness of the data.
In some embodiments, the preset data acquisition mode includes at least one of the following:
collection of server port data, collection of event-tracking (buried-point) log data, collection of message queue data, collection of database data, collection of file data, and collection of interface data.
In some embodiments, the data source further comprises:
a state management module, used for managing whether the data/program state of the current data source end is ready;
a data reporting module, used for interfacing with the real-time data caching module, the file acquisition module and the database acquisition module, taking data out of the corresponding module and sending it to the data acquisition channel.
In some embodiments, the data acquisition center comprises:
the registration center is used for passively receiving the task report of the data source end, managing the state information of the data source and transmitting the configuration data of the acquired task to the task management center;
the task management center is used for generating different task lists according to time, and adding corresponding tasks into the different task lists according to task configuration data issued by the registration center;
the heartbeat communication module is used for interactive communication with the data source end, judging the state of the data source end and reporting the state to the registration center;
the clock management module is used for matching natural time against the task lists, obtaining the corresponding task list, parsing its tasks and issuing them to the channel management center;
and the channel management center is used for creating a permanent or time-limited data acquisition channel according to the type of the task and managing the data acquisition channel.
In some embodiments, the data acquisition center is further configured to: manage the task lists according to time and, each time a trigger time point is reached, traverse the task list corresponding to that time point and establish data acquisition channels to receive data;
and, after the data from a data acquisition channel has been received, start the data parsing and cleaning module to process it, and report the processed data to the distributed file system cluster.
In some embodiments, the data acquisition type of the data acquisition center includes at least one of:
strong real-time data acquisition, for which a permanent data acquisition channel is established;
near real-time data acquisition, for which a data acquisition channel is established periodically at preset intervals measured in minutes;
hourly data acquisition, for which a data acquisition channel is established periodically at preset intervals measured in hours;
daily data acquisition, for which a data acquisition channel is established periodically at preset intervals measured in days;
and weekly data acquisition, for which a data acquisition channel is established periodically at preset intervals measured in weeks.
In some embodiments, the system further comprises: the task monitoring and early warning module is used for monitoring and finding out the data circulation problem and giving an alarm in time.
To achieve the above object, according to another aspect of the present application, there is provided a data acquisition method applied to the system as described above.
The data acquisition method comprises the following steps:
after the acquisition task is registered and the acquisition frequency information is reported, the data acquisition center is started to establish the acquisition channel, which is used to acquire the data at the data source end.
According to the data acquisition system and method of the present application, the data source end performs unified processing on data from different storage media and/or of different types; the data acquisition center constructs a data acquisition channel to interface with the data source end, wherein the data acquisition center is bound to a master node of a distributed file system cluster and manages data acquisition tasks for different acquisition scenarios; and the acquisition channel serves as a unified channel for data flow. The data source end handles the different data acquisition scenarios in a unified way, so that they are all processed by the same data acquisition architecture. The data acquisition center manages the data acquisition tasks, turning the former passive reception of data into active acquisition. In addition, the data source can be monitored and its state managed, which avoids problems caused by poor communication between upstream and downstream. Specifically, "poor communication between upstream and downstream" mainly covers cases such as a database table structure being changed, a database being migrated, or an interface's data structure being iterated without the acquiring party being notified; monitoring and state management of the data source prevent these situations.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, are included to provide a further understanding of the application, so that its other features, objects and advantages become more apparent. The drawings of the illustrative embodiments of the present application and their descriptions are for the purpose of illustrating the present application and are not to be construed as unduly limiting the present application. In the drawings:
FIG. 1 is a schematic diagram of the basic architecture of a data acquisition system according to an embodiment of the present application;
FIG. 2 is a flow chart of data acquisition of a data acquisition system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the architecture of a data source of a data acquisition system according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the architecture of a data acquisition center of a data acquisition system according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a data acquisition system in establishing a data acquisition channel according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a data acquisition system according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will be made in detail and with reference to the accompanying drawings in the embodiments of the present application, it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the present application described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the present application, the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "lateral", "longitudinal" and the like indicate an azimuth or a positional relationship based on that shown in the drawings. These terms are used primarily to better describe the present application and its embodiments and are not intended to limit the indicated device, element or component to a particular orientation or to be constructed and operated in a particular orientation.
Also, some of the terms described above may be used to indicate other meanings in addition to orientation or positional relationships, for example, the term "upper" may also be used to indicate some sort of attachment or connection in some cases. The specific meaning of these terms in this application will be understood by those of ordinary skill in the art as appropriate.
Furthermore, the terms "mounted," "configured," "provided," "connected," "coupled," and "sleeved" are to be construed broadly. For example, it may be a fixed connection, a removable connection, or a unitary construction; may be a mechanical connection, or an electrical connection; may be directly connected, or indirectly connected through intervening media, or may be in internal communication between two devices, elements, or components. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art as the case may be.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 is a schematic diagram of a basic architecture of a data acquisition system according to an embodiment of the present application, where the whole system is abstracted into three modules of a data source end, an acquisition channel and a data acquisition center.
Specifically, the acquisition center is bound to HDFS (Hadoop Distributed File System) and is mainly responsible for managing data acquisition tasks. The acquisition center manages task lists by time point; when a given time point is reached, it processes all tasks in the corresponding task list, actively constructs a data acquisition channel to interface with the data source end, receives the data, performs unified preliminary cleaning and parsing, and stores the data at a designated location in the cluster.
The data acquisition channel is the channel through which data flows. The acquisition center actively initiates the connection to the data source end, and the data source end transmits data to the acquisition center through the acquisition channel. The data stream is formatted as a JSON array whose first element is the metadata of the data, so that the acquisition center can parse it easily. The data source end is mainly responsible for unifying data from different media and in different forms.
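The payload layout described above (a JSON array whose first element is metadata) can be illustrated with a minimal sketch. The application does not specify the metadata fields, so the keys `source`, `table` and `fields`, and the function names `encode_payload` and `decode_payload`, are hypothetical and chosen only for illustration.

```python
import json


def encode_payload(metadata, records):
    """Serialize records as a JSON array whose first element is the metadata."""
    return json.dumps([metadata, *records], ensure_ascii=False)


def decode_payload(text):
    """Split a payload back into (metadata, records) on the acquisition center side."""
    items = json.loads(text)
    if not isinstance(items, list) or not items:
        raise ValueError("payload must be a non-empty JSON array")
    # By convention, element 0 describes the data; the rest are the data rows.
    return items[0], items[1:]


# Hypothetical metadata keys; the application only states that metadata comes first.
payload = encode_payload(
    {"source": "orders_db", "table": "orders", "fields": ["id", "amount"]},
    [[1, 9.5], [2, 12.0]],
)
meta, rows = decode_payload(payload)
```

Placing the metadata at a fixed position lets the receiver interpret the remaining elements without a separate schema exchange.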
Fig. 2 is a schematic flow chart of data collection of a data collection system according to an embodiment of the present application, including a data source end and a data collection center, where a collection task is registered with the data collection center by the data source end and collection frequency information is reported, and meanwhile, the collection task registered with the data source end is received at the data collection center and added into a task list of each time period according to the reported collection frequency information.
In some embodiments, cluster status inspection is performed by the data acquisition center while the acquired data is inspected and the collectable status updated at the data source.
In some embodiments, a channel construction request is initiated through the data acquisition center, meanwhile, after the channel is constructed, data is reported at the data source end, and data circulation and processing are performed through the data source end.
Fig. 3 is a schematic architecture diagram of a data source of a data acquisition system according to an embodiment of the present application. Data acquisition is based on different data storage media, different data acquisition frequencies, different data acquisition modes, and is roughly divided into the following categories:
Collection of server port data: the corresponding acquisition type is real-time, non-retained data.
It should be noted that "retained" refers to whether the data persists at the data source location. For example, intercepting data on a server port means that, after the data arriving at the port is intercepted, it flows on into the corresponding process as usual and is not retained at the port. As another example, log data is typically recorded in a database; because log data is not subject to modification or deletion and is only used for analysis and queries, the best approach is to write it directly into the HDFS cluster, so the data source of log data retains no data. In other cases, such as database data and file data, the data remains at the data source.
Collection of event-tracking (buried-point) log data: the corresponding acquisition type is real-time, non-retained data.
Collection of message queue data: the corresponding acquisition type is real-time, retained data.
Collection of database data: the corresponding acquisition type is non-real-time, retained data.
Collection of file data: the corresponding acquisition type is non-real-time, retained data.
Collection of interface data: the corresponding acquisition type is near real-time, passively acquired data. It will be appreciated that this near real-time type is driven by business needs; for example, the data collector may require the data to be acquired once every 1 minute.
For collection of server port data and of event-tracking (buried-point) log data, the data receiving module is started; it exposes an interface to the port listening program or the service program, and the received data is written into the real-time data caching module.
For collection of message queue data, the message queue reading module is started; it connects to the message queue, consumes the data and writes it into the real-time data caching module.
For collection of database data, the database acquisition module reads the tables of the database directly.
For collection of file data, the file acquisition module reads the files under the specified directory.
For collection of interface data, the interface calling module is started; a data acquisition script is written according to the docking rules of the third-party interface, and the data is uniformly written into the real-time data caching module.
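The six acquisition scenarios and the modules that serve them can be summarized as a lookup table. The following is only a sketch: the identifiers (`SourceProfile`, `PROFILES`, the string keys and module names) are hypothetical, and the retention flag for interface data is left as `None` because the description does not state it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Timeliness(Enum):
    REAL_TIME = "real-time"
    NEAR_REAL_TIME = "near real-time"
    NON_REAL_TIME = "non-real-time"


@dataclass(frozen=True)
class SourceProfile:
    timeliness: Timeliness
    retained_at_source: Optional[bool]  # None: not stated in the description
    module: str                         # module started at the data source end
    uses_realtime_cache: bool           # whether data goes through the real-time cache


# Hypothetical keys and module names mirroring the scenarios listed above.
PROFILES = {
    "server_port": SourceProfile(Timeliness.REAL_TIME, False, "data_receiving", True),
    "buried_point_log": SourceProfile(Timeliness.REAL_TIME, False, "data_receiving", True),
    "message_queue": SourceProfile(Timeliness.REAL_TIME, True, "message_queue_reading", True),
    "database": SourceProfile(Timeliness.NON_REAL_TIME, True, "database_acquisition", False),
    "file": SourceProfile(Timeliness.NON_REAL_TIME, True, "file_acquisition", False),
    "interface": SourceProfile(Timeliness.NEAR_REAL_TIME, None, "interface_calling", True),
}


def module_for(scenario):
    """Return the name of the module that handles a given acquisition scenario."""
    return PROFILES[scenario].module
```

Database and file data bypass the real-time cache in this table because, per the description, their modules read the tables or files directly.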
Real-time data is cached in time windows by default. Data is placed into different time windows according to its timeliness, the acquisition frequency is reported to the acquisition center according to the size of the time window, and the acquisition center decides when to establish a data acquisition channel to receive the data. For example, if the timeliness requirement is not strict, the default time window is 1 minute; for a big data center, acquiring data once per minute can be regarded approximately as real-time acquisition. If the timeliness requirement is strict, the time window is set to 0, in which case the acquisition center establishes a permanent data acquisition channel to receive the real-time data.
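A minimal sketch of such a time-window cache follows, assuming records carry a timestamp in seconds and that a window is handed over once a later window has begun. The class and function names (`TimeWindowCache`, `channel_kind`) and the draining rule are illustrative assumptions, not details taken from the application.

```python
from collections import defaultdict


def channel_kind(window_seconds):
    """A window of 0 means strict timeliness, i.e. a permanent channel."""
    return "permanent" if window_seconds == 0 else "periodic"


class TimeWindowCache:
    """Buffers records in fixed-size time windows (default: 1 minute)."""

    def __init__(self, window_seconds=60):
        if window_seconds <= 0:
            raise ValueError("use a permanent channel instead of a cache when the window is 0")
        self.window_seconds = window_seconds
        self._windows = defaultdict(list)  # window index -> buffered records

    def _index(self, timestamp):
        return int(timestamp // self.window_seconds)

    def put(self, record, timestamp):
        """Place a record into the window that covers its timestamp."""
        self._windows[self._index(timestamp)].append(record)

    def drain_closed(self, now):
        """Return (window_start, records) for every window that ended before `now`."""
        current = self._index(now)
        closed = sorted(i for i in self._windows if i < current)
        return [(i * self.window_seconds, self._windows.pop(i)) for i in closed]


cache = TimeWindowCache(window_seconds=60)
cache.put("a", timestamp=5)
cache.put("b", timestamp=59)
cache.put("c", timestamp=61)
first_batch = cache.drain_closed(now=65)    # only the first window has closed
second_batch = cache.drain_closed(now=130)  # the second window has closed by now
```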
Preferably, the system further comprises a state management module, which manages whether the data/program state of the current data source end is ready, reports the information about its own data acquisition to the data acquisition center when it is first started, and returns whether a channel can be created each time a channel is about to be established.
Preferably, the system further comprises a data reporting module, which interfaces with the real-time data caching module, the file acquisition module and the database acquisition module; when the data acquisition channel is connected, it takes data out of the corresponding module according to the nature of the data source's task and sends the data to the data acquisition channel.
Fig. 4 is a schematic architecture diagram of a data acquisition center of a data acquisition system according to an embodiment of the present application. Data acquisition tasks are added to task lists, and, based on past data acquisition experience, data acquisition is divided into the following types:
Strong real-time data acquisition: a permanent data acquisition channel is established.
Near real-time data acquisition: a data acquisition channel is established periodically at preset intervals measured in minutes; for example, a channel is established once every 1 minute.
Hourly data acquisition: a data acquisition channel is established periodically at preset intervals measured in hours; for example, a channel is established once every hour.
Daily data acquisition: a data acquisition channel is established periodically at preset intervals measured in days.
Weekly data acquisition: a data acquisition channel is established periodically at preset intervals measured in weeks; for example, a channel is established at a certain time each week.
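The acquisition types listed above map naturally onto a channel policy: permanent for strong real-time, periodic otherwise. The sketch below assumes one unit of each frequency as the interval (1 minute, 1 hour, 1 day, 1 week); the application only says the intervals are preset, so these values and the names `AcquisitionType` and `channel_policy` are illustrative.

```python
from datetime import timedelta
from enum import Enum


class AcquisitionType(Enum):
    STRONG_REAL_TIME = "strong real-time"
    NEAR_REAL_TIME = "near real-time"
    HOURLY = "hourly"
    DAILY = "daily"
    WEEKLY = "weekly"


# Assumed default intervals; in practice these would come from task configuration.
_INTERVALS = {
    AcquisitionType.NEAR_REAL_TIME: timedelta(minutes=1),
    AcquisitionType.HOURLY: timedelta(hours=1),
    AcquisitionType.DAILY: timedelta(days=1),
    AcquisitionType.WEEKLY: timedelta(weeks=1),
}


def channel_policy(kind):
    """Return (channel lifetime, re-establishment interval) for an acquisition type."""
    if kind is AcquisitionType.STRONG_REAL_TIME:
        return ("permanent", None)
    return ("time-limited", _INTERVALS[kind])
```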
The data acquisition center manages the task lists according to time; each time a trigger time point is reached, it traverses the task list corresponding to that time point and establishes data acquisition channels to receive data.
In a specific implementation, the data acquisition center comprises a registration center, which passively receives task reports from the data source end, manages the state information of the data sources, and issues the configuration data of the acquisition tasks to the task management center.
The data acquisition center comprises a task management center, which generates different task lists according to time and adds tasks to the corresponding lists according to the task configuration data issued by the registration center. The task management center manages the task lists, dynamically creating, destroying and modifying them.
The data acquisition center comprises a heartbeat communication module, which is responsible for interactive communication with the data source end, determines the state of the data source end and reports it to the registration center; if heartbeat communication with a data source end is lost, an alarm is raised automatically. The module is also responsible for querying the readiness of the data source end when a channel is created.
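The heartbeat check can be sketched as a table of last-seen times compared against a timeout. The 30-second timeout, the class name `HeartbeatMonitor` and its methods are assumptions for illustration; the application does not specify how loss of heartbeat is detected.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat of each data source end and flags the silent ones."""

    def __init__(self, timeout_seconds=30.0):
        self.timeout_seconds = timeout_seconds  # assumed value, not from the application
        self._last_seen = {}                    # source id -> time of last heartbeat

    def beat(self, source_id, now):
        """Record a heartbeat received from a data source end."""
        self._last_seen[source_id] = now

    def lost_sources(self, now):
        """Return the ids of sources whose last heartbeat is older than the timeout."""
        return sorted(
            source_id
            for source_id, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )


monitor = HeartbeatMonitor(timeout_seconds=30.0)
monitor.beat("source-a", now=0.0)
monitor.beat("source-b", now=20.0)
overdue = monitor.lost_sources(now=40.0)  # candidates for an automatic alarm
```

In a fuller implementation, the result of `lost_sources` would be reported to the registration center and would trigger the alarm mentioned above.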
The data acquisition center comprises a clock management module, which matches natural time against the task lists, obtains the corresponding task list, parses its tasks and issues them to the channel management center.
The data acquisition center comprises a channel management center, which creates and manages permanent or time-limited data acquisition channels according to the task type; for example, when a channel closes abnormally, this module is responsible for disaster recovery, including but not limited to restarting the channel.
The data acquisition center comprises a data acquisition channel, which is purely a stable data flow channel that carries data transmission and provides no other function.
After data parsing and cleaning in the data acquisition center are complete, the cluster reporting module uploads the data to the big data cluster. The cluster reporting module is also used to handle the HDFS small-file problem.
As an example of acquisition task management, each hour of the day has a list of tasks to be started. If the task information reported by a data source end specifies synchronizing data at 1 o'clock every day, the synchronization frequency is 1 day, and the data acquisition task is added to the daily 1 o'clock task list.
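The hourly task lists in this example can be sketched as a mapping from hour of day to task names, which the clock management module consults when a trigger time is reached. The class `TaskManagementCenter`, its methods and the task name `sync_orders` are hypothetical; only daily tasks are modeled here.

```python
from collections import defaultdict
from datetime import datetime


class TaskManagementCenter:
    """Keeps one task list per hour of the day (daily tasks only, for brevity)."""

    def __init__(self):
        self._lists = defaultdict(list)  # hour (0-23) -> names of tasks to start

    def register_daily(self, task_name, hour):
        """Add a task that should run once a day at the given hour."""
        if not 0 <= hour <= 23:
            raise ValueError("hour must be between 0 and 23")
        self._lists[hour].append(task_name)

    def tasks_due(self, now):
        """Return the task list matching the current natural time."""
        return list(self._lists.get(now.hour, []))


center = TaskManagementCenter()
center.register_daily("sync_orders", hour=1)  # reported as "synchronize at 1 o'clock daily"

due_at_one = center.tasks_due(datetime(2023, 2, 20, 1, 0))
due_at_two = center.tasks_due(datetime(2023, 2, 20, 2, 0))
```

For each task returned by `tasks_due`, the channel management center would then establish a data acquisition channel.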
It is to be understood that the foregoing is illustrative and not restrictive of the scope of protection herein.
Fig. 5 is a schematic diagram of a data acquisition system establishing a data acquisition channel according to an embodiment of the present application. The flow shown in Fig. 5 is followed whenever a data acquisition channel is established. In addition, after receiving the data from the data acquisition channel, the data acquisition center starts the data parsing and cleaning module to process it and reports the processed data to the HDFS cluster.
Fig. 6 is a schematic structural diagram of a data acquisition system according to an embodiment of the present application. The system in Fig. 6 includes: a data source end 610, which performs unified processing on data from different storage media and/or of different types; a data acquisition center 620, which constructs a data acquisition channel to interface with the data source end, wherein the data acquisition center is bound to a master node of the distributed file system cluster and manages data acquisition tasks for different acquisition scenarios; and an acquisition channel 630, which serves as a unified channel for data flow.
Different data sources and acquisition tasks are managed by the data acquisition center, so that the problem of task deployment dispersion caused by different data acquisition forms is avoided, and the management efficiency of the data acquisition tasks is improved.
The data acquisition center is deeply bound to the cluster and is embedded into the Master node of HDFS as a self-developed plug-in. The former chain of data source end, data acquisition program and cluster end is simplified to data source end and cluster end, so that the cluster itself gains the ability to actively acquire data; this simplifies the data acquisition structure and avoids unexpected risks from developing and deploying a separate data acquisition program. In addition, data from the different acquisition scenarios undergoes unified preparation and then flows through the unified data acquisition channel architecture, which saves the maintenance cost of running different data acquisition schemes. Different acquisition frequencies need to be set for different types of data.
By contrast, passive data reception in the related art means that a task scheduling system periodically executes a data synchronization script, and the script connects the data source to the cluster; the script is responsible for collecting, transmitting and submitting the data. The data acquisition system in the embodiments of the present application acquires data actively; specifically, the acquisition system is bound to, or is part of, the distributed file system in the narrow sense, and together they form the distributed file system in the broad sense referred to in the embodiments of the present application.
Preferably, in this embodiment, the system further provides that: the data source end registers acquisition tasks with the data acquisition center and reports acquisition frequency information, while the data acquisition center receives the acquisition tasks registered by the data source end and adds them to the task list for each time period according to the reported acquisition frequency information; and/or the data acquisition center checks the cluster state, while the data source end checks the collected data and updates its collectable state; and/or the data acquisition center initiates a channel construction request, the data source end reports data after the channel is constructed, and data circulation and processing are performed through the data source end.
In a specific implementation, the data acquisition center and the data source end interact several times. First, the acquisition tasks are registered. The acquisition center then adds each task to the task list for the relevant time period according to the reported acquisition frequency information. Finally, the task list is issued to the data source end, a data channel is constructed, and data is acquired.
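The registration step described above can be illustrated with a minimal sketch. The class names, field names and frequency labels are hypothetical and chosen only for illustration; the application does not specify a data model.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical frequency labels; the application names these acquisition
# types but does not define identifiers for them.
FREQUENCIES = ("permanent", "minute", "hour", "day", "week")


@dataclass(frozen=True)
class AcquisitionTask:
    source_id: str   # data source end that registered the task
    name: str        # task name chosen by the data source end
    frequency: str   # reported acquisition frequency


class AcquisitionCenter:
    def __init__(self):
        # One task list per time period, keyed by frequency label.
        self.task_lists = defaultdict(list)

    def register(self, task):
        """Receive a registered task and file it under its reported frequency."""
        if task.frequency not in FREQUENCIES:
            raise ValueError(f"unknown frequency: {task.frequency}")
        self.task_lists[task.frequency].append(task)

    def tasks_due(self, frequency):
        """Return a copy of the task list to issue when this period triggers."""
        return list(self.task_lists[frequency])


center = AcquisitionCenter()
center.register(AcquisitionTask("orders-db", "orders_table", "hour"))
center.register(AcquisitionTask("web-logs", "click_events", "minute"))
center.register(AcquisitionTask("crm-files", "daily_export", "hour"))
hourly_names = [t.name for t in center.tasks_due("hour")]
```

Keying the lists by reported frequency means the center only needs to look up one list when a period triggers, rather than scanning every registered task.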
Preferably, in this embodiment, the data source end includes: a data receiving module, a message queue reading module, a database acquisition module, a file acquisition module, an interface calling module and a real-time data caching module, wherein, according to a preset data acquisition mode, the data receiving module is started to receive real-time data and write it into the real-time data caching module; and/or, according to a preset data acquisition mode, the message queue reading module is started to consume data from the message queue and write it into the real-time data caching module; and/or, according to a preset data acquisition mode, the database acquisition module reads a table of a database; and/or, according to a preset data acquisition mode, the file acquisition module reads a file under a specified directory; and/or, according to a preset data acquisition mode, the interface calling module is started and the data is written into the real-time data caching module. The real-time data caching module uses a time window form by default and places data into different time windows according to the timeliness of the data.
In a specific implementation, real-time data is cached in time windows by default: data is placed into different time windows according to its timeliness, the acquisition frequency is reported to the acquisition center according to the size of the time window, and the acquisition center decides when to establish a data acquisition channel to receive the data. The state management module manages whether the data and program state of the current data source end are ready, reports the information related to its own data acquisition to the data acquisition center when first started, and returns whether a channel can be created each time a channel is to be established. Finally, the data reporting module interfaces with the real-time caching module, the file acquisition module and the database acquisition module; when the data acquisition channel is connected, it takes data out of the corresponding module according to the nature of the data source's task and sends it to the data acquisition channel.
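As an illustration of the time-window cache, the following is a minimal sketch assuming fixed-size windows keyed by their start time. The class and method names are hypothetical, and the rule that the reported acquisition interval equals the window size is one possible reading of the text.

```python
from collections import defaultdict


class TimeWindowCache:
    """Buckets records into fixed-size time windows by their timestamp."""

    def __init__(self, window_seconds):
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        self.window_seconds = window_seconds
        self.windows = defaultdict(list)  # window start -> records

    def window_start(self, timestamp):
        # Align the timestamp down to the start of its window.
        return int(timestamp // self.window_seconds) * self.window_seconds

    def put(self, timestamp, record):
        self.windows[self.window_start(timestamp)].append(record)

    def reported_frequency(self):
        """Acquisition interval reported to the center (assumed: window size)."""
        return self.window_seconds

    def drain_closed(self, now):
        """Remove and return the windows that ended at or before `now`."""
        closed = sorted(s for s in self.windows if s + self.window_seconds <= now)
        return [(s, self.windows.pop(s)) for s in closed]


cache = TimeWindowCache(window_seconds=60)
cache.put(5, "a")
cache.put(59, "b")
cache.put(61, "c")
drained = cache.drain_closed(now=60)
```

Only closed windows are handed to the channel, so a record arriving late in a window is never split from the rest of that window's data.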
Preferably, in this embodiment, the preset data acquisition mode includes at least one of the following: acquisition of server port data, acquisition of event-tracking (buried-point) log data, acquisition of message queue data, acquisition of database data, acquisition of file data, and acquisition of interface data.
In a specific implementation, for the acquisition of server port data and of event-tracking (buried-point) log data, the data receiving module is started; it receives real-time data from the port listening program or from the interface exposed by the service program and writes it into the real-time data caching module. For the acquisition of message queue data, the message queue reading module is started to consume data from the message queue and write it into the real-time data caching module. For the acquisition of database data, the database acquisition module reads the tables of the database directly. For the acquisition of file data, the file acquisition module reads the files under the specified directory. For the acquisition of interface data, the interface calling module is started, a data acquisition script is written according to the docking rules of the third-party interface, and the data is written uniformly into the real-time data caching module. Real-time data is cached in time windows by default: data is placed into different time windows according to its timeliness, the acquisition frequency is reported to the acquisition center according to the size of the time window, and the acquisition center decides when to establish a data acquisition channel to receive the data.
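The mapping from acquisition mode to module can be summarized in a short sketch. The mode and module identifiers below are hypothetical labels derived from the prose; the application does not define them.

```python
# Hypothetical mode -> module mapping, following the cases listed above.
MODE_TO_MODULE = {
    "server_port": "data_receiving",
    "event_log": "data_receiving",
    "message_queue": "message_queue_reading",
    "database": "database_acquisition",
    "file": "file_acquisition",
    "interface": "interface_calling",
}

# Modules that the text describes as writing into the real-time data cache.
WRITES_TO_CACHE = {"data_receiving", "message_queue_reading", "interface_calling"}


def plan(mode):
    """Return (module to start, whether its data goes through the real-time cache)."""
    module = MODE_TO_MODULE[mode]
    return module, module in WRITES_TO_CACHE


event_plan = plan("event_log")
database_plan = plan("database")
```

Under this reading, database and file acquisition are read directly by the data reporting module, while the other modes are buffered in time windows first.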
Preferably, in this embodiment, the data source end further includes: a state management module, which manages whether the data and program state of the current data source end are ready; and a data reporting module, which interfaces with the real-time caching module, the file acquisition module and the database acquisition module, takes data out of the corresponding module, and sends it to the data acquisition channel.
In a specific implementation, after data analysis and cleaning are finished, the cluster reporting module uploads the data to the big data cluster. The cluster reporting module also handles the HDFS small-file problem. For administration of acquisition tasks, e.g. daily holidays, having a list of tasks to be initiated
Preferably, in this embodiment, the data acquisition center includes: a registration center, which passively receives task reports from the data source end, manages the state information of the data source, and issues the configuration data of the acquisition tasks to the task management center; a task management center, which generates different task lists according to time and adds the corresponding tasks to those lists according to the task configuration data issued by the registration center; a heartbeat communication module, which communicates interactively with the data source end, judges the state of the data source end, and reports that state to the registration center; a clock management module, which matches natural (wall-clock) time against the task lists, obtains the corresponding task list, and issues the task analysis to the channel management center; and a channel management center, which creates a permanent or time-limited data acquisition channel according to the type of the task and manages the data acquisition channel.
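The state judgment performed by the heartbeat communication module could be sketched as follows. The timeout rule, the status labels and the class name are assumptions made for illustration; the application does not state how the state is judged.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat per data source end (hypothetical design)."""

    def __init__(self, timeout_seconds):
        # Assumed rule: a source is unreachable once it has been silent
        # for longer than timeout_seconds.
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}

    def beat(self, source_id, now):
        """Record a heartbeat received from a data source end."""
        self.last_seen[source_id] = now

    def status(self, source_id, now):
        """Return the state that would be reported to the registration center."""
        last = self.last_seen.get(source_id)
        if last is None:
            return "unregistered"
        if now - last <= self.timeout_seconds:
            return "ready"
        return "unreachable"


monitor = HeartbeatMonitor(timeout_seconds=30)
monitor.beat("src-1", now=100)
fresh = monitor.status("src-1", now=120)
stale = monitor.status("src-1", now=131)
```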
In a specific implementation, the data acquisition center manages task lists by time according to the type of data. Each time a trigger time point is reached, the task list corresponding to that time point is traversed and a data acquisition channel is established to receive data. The data acquisition types include, but are not limited to, the following. Strong real-time data acquisition requires a permanent data acquisition channel. Near real-time data acquisition requires a data acquisition channel to be established once every minute. Hourly data acquisition requires a data acquisition channel to be established once per hour. Daily data acquisition requires a data acquisition channel to be established once per day at a certain hour. Weekly data acquisition requires a data acquisition channel to be established once per week at a certain time.
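The clock-driven matching of trigger time points to task lists can be sketched as below. The specific trigger rules (hourly lists fire at minute 0, the daily list at a preset hour, the weekly list at a preset weekday and hour) and all parameter names are assumptions; permanent channels are left out because they are not re-established by the clock.

```python
from datetime import datetime


def due_lists(now, daily_hour=2, weekly_weekday=0, weekly_hour=3):
    """Return the task-list labels whose trigger point is reached at `now`.

    Assumed to be called once per minute by the clock management module.
    weekly_weekday follows datetime.weekday(): Monday is 0.
    """
    due = ["minute"]  # near real-time tasks trigger every minute
    if now.minute == 0:
        due.append("hour")
        if now.hour == daily_hour:
            due.append("day")
        if now.weekday() == weekly_weekday and now.hour == weekly_hour:
            due.append("week")
    return due


at_daily = due_lists(datetime(2023, 2, 13, 2, 0))    # a Monday, 02:00
at_weekly = due_lists(datetime(2023, 2, 13, 3, 0))   # a Monday, 03:00
mid_hour = due_lists(datetime(2023, 2, 13, 10, 15))
```

Each returned label selects one task list to traverse, and a time-limited channel would then be established for every task in that list.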
Preferably, in this embodiment, the data acquisition center is further configured to: manage the task lists according to time and, each time a trigger time point is reached, traverse the task list corresponding to that time point and establish a data acquisition channel to receive data; and, after the data from the data channel is received, start the data analysis and cleaning module to process it and report the processed data to the distributed file system cluster.
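The receive, clean and report sequence can be illustrated with a small sketch. The record format (comma-separated key and value) and the cleaning rules are invented for illustration, and the upload to the cluster is represented by a caller-supplied function rather than any real HDFS client call.

```python
def parse_and_clean(raw_records):
    """Parse 'key,value' lines; drop blank or malformed records (assumed rules)."""
    cleaned = []
    for raw in raw_records:
        line = raw.strip()
        if not line:
            continue  # skip empty records
        parts = line.split(",")
        if len(parts) != 2:
            continue  # skip records that do not have exactly two fields
        cleaned.append({"key": parts[0].strip(), "value": parts[1].strip()})
    return cleaned


def run_channel(raw_records, report):
    """Clean the data received on a channel, then hand it to `report`.

    `report` stands in for the upload to the distributed file system cluster.
    """
    cleaned = parse_and_clean(raw_records)
    report(cleaned)
    return len(cleaned)


uploaded = []
count = run_channel(["a, 1", "", "bad", " b,2 \n"], uploaded.extend)
```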
Preferably, in this embodiment, the data acquisition type of the data acquisition center includes at least one of the following: strong real-time data acquisition, for which a permanent data acquisition channel is established; near real-time data acquisition, for which a data acquisition channel is established once per minute; hourly data acquisition, for which a data acquisition channel is established once per hour; daily data acquisition, for which a data acquisition channel is established once per day at a preset hour; and weekly data acquisition, for which a data acquisition channel is established once per week at a preset time.
Preferably, in this embodiment, the system further comprises: a task monitoring and early warning module, which monitors for and detects data circulation problems and raises an alarm in time.
In a specific implementation, the unified task monitoring and early warning module monitors and detects data circulation problems, so that problems caused by poor communication between upstream and downstream data flows are found and avoided in time.
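One way such monitoring might detect a stalled data flow is sketched below. The detection rule (no data for longer than a multiple of the task's expected interval), the tolerance factor and the function name are assumptions, not the application's stated method.

```python
def find_stalled(last_received, expected_interval, now, tolerance=2.0):
    """Return the tasks whose data flow looks stalled, sorted by name.

    last_received:     task -> time the last data arrived (missing if never)
    expected_interval: task -> expected seconds between arrivals
    A task is flagged if it never reported, or has been silent for more
    than tolerance * expected interval (assumed rule).
    """
    stalled = []
    for task, interval in expected_interval.items():
        last = last_received.get(task)
        if last is None or now - last > interval * tolerance:
            stalled.append(task)
    return sorted(stalled)


alerts = find_stalled(
    last_received={"a": 100, "b": 950},
    expected_interval={"a": 60, "b": 60, "c": 3600},
    now=1000,
)
```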
It will be apparent to those skilled in the art that the modules or steps of the application described above may be implemented in a general purpose computing device, they may be centralized on a single computing device, or distributed across a network of computing devices, or they may alternatively be implemented in program code executable by computing devices, such that they may be stored in a memory device and executed by computing devices, or individually fabricated as individual integrated circuit modules, or multiple modules or steps within them may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
Embodiments of the present application also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store a computer program for performing the steps of:
after the acquisition task is registered, reporting acquisition frequency information, and starting the data acquisition center to establish the acquisition channel for acquiring data from the data source end.
Alternatively, in the present embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing a computer program.
Embodiments of the present application also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
after the acquisition task is registered, reporting acquisition frequency information, and starting the data acquisition center to establish the acquisition channel for acquiring data from the data source end.
Alternatively, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiments and optional implementations, which are not repeated here.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the same, but rather, various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims (10)

1. A data acquisition system, the system comprising:
a data source end, which performs unified processing on different storage media and/or different types of data;
a data acquisition center, which constructs a data acquisition channel to interface with the data source end, wherein the data acquisition center is bound to a master node of a distributed file system cluster and manages data acquisition tasks for different acquisition scenarios;
and an acquisition channel, which serves as a unified channel for data flow.
2. The data acquisition system of claim 1, wherein:
the data source end registers acquisition tasks with the data acquisition center and reports acquisition frequency information, while the data acquisition center receives the acquisition tasks registered by the data source end and adds them to the task list for each time period according to the reported acquisition frequency information;
and/or,
the data acquisition center checks the cluster state, while the data source end checks the collected data and updates its collectable state;
and/or,
the data acquisition center initiates a channel construction request, the data source end reports data after the channel is constructed, and data circulation and processing are performed through the data source end.
3. The data acquisition system of claim 1, wherein the data source end comprises: a data receiving module, a message queue reading module, a database acquisition module, a file acquisition module, an interface calling module and a real-time data caching module, wherein:
the data receiving module is started according to a preset data acquisition mode to receive real-time data and write it into the real-time data caching module;
and/or,
the message queue reading module is started according to a preset data acquisition mode to interface with a message queue, consume its data, and write the data into the real-time data caching module;
and/or,
the database acquisition module is used according to a preset data acquisition mode to read a table of a database;
and/or,
the file acquisition module is used according to a preset data acquisition mode to read a file under a designated directory;
and/or,
the interface calling module is started according to a preset data acquisition mode, and the data is written into the real-time data caching module;
the real-time data caching module uses a time window form by default and places data into different time windows according to the timeliness of the data.
4. The data acquisition system of claim 3, wherein the preset data acquisition mode comprises at least one of:
acquisition of server port data, acquisition of event-tracking (buried-point) log data, acquisition of message queue data, acquisition of database data, acquisition of file data, and acquisition of interface data.
5. The data acquisition system of claim 3, wherein the data source end further comprises:
a state management module, which manages whether the data and program state of the current data source end are ready;
and a data reporting module, which interfaces with the real-time caching module, the file acquisition module and the database acquisition module, takes data out of the corresponding module, and sends it to the data acquisition channel.
6. The data acquisition system of claim 1, wherein the data acquisition center comprises:
the registration center is used for passively receiving the task report of the data source end, managing the state information of the data source and transmitting the configuration data of the acquired task to the task management center;
the task management center is used for generating different task lists according to time, and adding corresponding tasks into the different task lists according to task configuration data issued by the registration center;
the heartbeat communication module is used for interactive communication with the data source end, judging the state of the data source end and reporting the state to the registration center;
the clock management module is used for managing the matching of the natural time and the task list, acquiring a corresponding task list, and issuing task analysis to the channel management center;
and the channel management center is used for creating a permanent or time-limited data acquisition channel according to the type of the task and managing the data acquisition channel.
7. The data acquisition system of claim 6, wherein the data acquisition center is further configured to: manage the task lists according to time and, each time a trigger time point is reached, traverse the task list corresponding to that time point and establish a data acquisition channel to receive data;
and, after the data from the data channel is received, start the data analysis and cleaning module to process it and report the processed data to the distributed file system cluster.
8. The data acquisition system of claim 6, wherein the data acquisition type of the data acquisition center comprises at least one of:
strong real-time data acquisition, for which a permanent data acquisition channel is established;
near real-time data acquisition, for which a data acquisition channel is established periodically at preset intervals measured in minutes;
hourly data acquisition, for which a data acquisition channel is established periodically at preset intervals measured in hours;
daily data acquisition, for which a data acquisition channel is established periodically at preset intervals measured in days;
and weekly data acquisition, for which a data acquisition channel is established periodically at preset intervals measured in weeks.
9. The data acquisition system of claim 1, wherein the system further comprises: a task monitoring and early warning module, which monitors for and detects data circulation problems and raises an alarm in time.
10. A data acquisition method applied to the system according to any one of claims 1 to 9, the method comprising:
after the acquisition task is registered, reporting acquisition frequency information, and starting the data acquisition center to establish the acquisition channel for acquiring data from the data source end.
CN202310136852.2A 2023-02-13 2023-02-13 Data acquisition system and method Pending CN116383002A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310136852.2A CN116383002A (en) 2023-02-13 2023-02-13 Data acquisition system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310136852.2A CN116383002A (en) 2023-02-13 2023-02-13 Data acquisition system and method

Publications (1)

Publication Number Publication Date
CN116383002A true CN116383002A (en) 2023-07-04

Family

ID=86960513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310136852.2A Pending CN116383002A (en) 2023-02-13 2023-02-13 Data acquisition system and method

Country Status (1)

Country Link
CN (1) CN116383002A (en)

Similar Documents

Publication Publication Date Title
CN109460349B (en) Test case generation method and device based on log
CN106844198B (en) Distributed dispatching automation test platform and method
CN109460841B (en) User account opening method, system and storage medium
CN112506870B (en) Data warehouse increment updating method and device and computer equipment
CN109902028A (en) Automated testing method, device, equipment and the storage medium of ACL characteristic
CN116204438A (en) Test case generation method, automatic test method and related device
CN105467907A (en) Automatic inspection system and method
CN114490053A (en) Context awareness strategy recommendation system based on edge calculation and supervised learning method
CN106656592B (en) Service management method and device based on role configuration
CN113419872A (en) Application system interface integration system, integration method, equipment and storage medium
CN102521339A (en) System and method for dynamic access of data sources
CN112417050A (en) Data synchronization method and device, system, storage medium and electronic device
CN116383002A (en) Data acquisition system and method
CN115460072A (en) Log processing system integrating log collection, analysis, storage and service
CN109921963B (en) Network state inspection method and system
CN112286918B (en) Method and device for fast access conversion of data, electronic equipment and storage medium
CN113627963B (en) Electric power refined operation rule base creation method
CN113407415A (en) Log management method and device of intelligent terminal
CN107330089B (en) Cross-network structured data collection system
CN112650815A (en) Method and device for synchronizing environmental data, storage medium and electronic device
CN112818059B (en) Information real-time synchronization method and device based on container release platform
CN112564953B (en) Method, device and equipment for managing remote equipment of office
CN115348185B (en) Control method and control device of distributed query engine
CN109995617A (en) Automated testing method, device, equipment and the storage medium of Host Administration characteristic
CN109684158A (en) Method for monitoring state, device, equipment and the storage medium of distributed coordination system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination