CN111881153A

CN111881153A - Data processing method and device, electronic equipment and machine-readable storage medium

Info

Publication number: CN111881153A
Application number: CN202010727480.7A
Authority: CN
Inventors: 赵宇; 徐寅斐; 柴瑜轩; 侯雪峰
Original assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Current assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date: 2020-07-24
Filing date: 2020-07-24
Publication date: 2020-11-03

Abstract

The invention provides a data processing method, a data processing apparatus, an electronic device and a machine-readable storage medium. Within a preset timing duration, specified data is extracted from initial data of a target data source and aggregated by a preset aggregation algorithm to obtain an aggregation result; when the timing duration is reached, the aggregation result is output to a data analysis system. Because the data is aggregated before it enters the data analysis system, and the aggregation result is output to the data analysis system periodically on a timer, the volume of data received by the data analysis system is reduced, which lowers the load on that system and the data transmission pressure on the transmission network.

Description

Data processing method and device, electronic equipment and machine-readable storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, an electronic device, and a machine-readable storage medium.

Background

A big data platform is mainly used for data analysis; it collects data from data sources through a data acquisition system, and then analyzes and stores the collected data. In a data acquisition system in the related art, data acquired from a data source is generally transmitted to the big data platform in full, or is first subjected to simple processing such as filtering and formatting and then transmitted to the big data platform; this approach tends to place a heavy load on the big data platform and a heavy data transmission pressure on the transmission network.

Disclosure of Invention

In view of the above, the present invention provides a data processing method, an apparatus, an electronic device and a machine-readable storage medium, so as to reduce the load on the big data platform and the data transmission pressure on the transmission network.

In a first aspect, an embodiment of the present invention provides a data processing method, where the method is applied to an electronic device running a data acquisition system, or to an electronic device communicatively connected to a data output end of an electronic device running a data acquisition system; the electronic device is communicatively connected to an electronic device running a data analysis system; the method comprises the following steps: within a preset timing duration, extracting specified data from initial data of a target data source, and aggregating the specified data by a preset aggregation algorithm to obtain an aggregation result; and when the timing duration is reached, outputting the aggregation result to the data analysis system.

Further, the step of extracting the specified data from the initial data of the target data source includes: reading initial data from the target data source one by one; and, for each piece of initial data, extracting specified data from the piece of initial data, and aggregating the specified data with the data in a preset storage space by the aggregation algorithm to obtain an aggregation result.

Further, the specified data comprises at least one keyword and a keyword value corresponding to each keyword; the step of aggregating the specified data with the data in the preset storage space by the aggregation algorithm to obtain an aggregation result includes: for each keyword, aggregating the keyword value corresponding to the keyword in the specified data with the keyword value corresponding to the keyword in the storage space to obtain an aggregation value corresponding to the keyword; and determining each keyword and the aggregation value corresponding to the keyword as the aggregation result.

Further, the step of extracting the specified data from the initial data of the target data source includes: reading initial data from the target data source one by one; and, for each piece of initial data, extracting specified data from the piece of initial data and storing the specified data in a specified storage space; before the step of outputting the aggregation result to the data analysis system, the method further includes: when the timing duration is reached, aggregating the specified data in the storage space by the aggregation algorithm to obtain the aggregation result.

Further, the specified data comprises at least one keyword and a keyword value corresponding to each keyword; the step of aggregating the specified data in the storage space by the aggregation algorithm to obtain an aggregation result includes: for each keyword, aggregating the keyword values corresponding to the keyword in the storage space to obtain an aggregation value corresponding to the keyword; and determining each keyword and the aggregation value corresponding to the keyword as the aggregation result.

Further, after the step of outputting the aggregation result to the data analysis system, the method further includes: when the timing duration is reached, emptying the data in the storage space.

Further, the target data source is configured with a configuration file in advance; the configuration file includes: the plug-in identification of the plug-in corresponding to the target data source; the plug-ins comprise a data acquisition plug-in, a data aggregation plug-in and a data output plug-in which are sequentially connected; wherein, the data acquisition plug-in is used for: reading initial data from a target data source; the data aggregation plug-in is used for: extracting specified data from the initial data, and carrying out aggregation processing on the specified data through an aggregation algorithm to obtain an aggregation result; the data output plug-in is used for: and outputting the aggregation result to a data analysis system.

Further, the configuration file also comprises plug-in parameters of the plug-ins; wherein the plug-in parameters of the data acquisition plug-in include: a source identification of the target data source and the separator of each piece of initial data; the plug-in parameters of the data aggregation plug-in include: the timing duration, the keywords of the specified data, and the aggregation algorithm corresponding to the keyword value of each keyword; the plug-in parameters of the data output plug-in include: address information of the data analysis system.

Further, the configuration file further comprises: queue identification of data transit queues between connected plug-ins.

In a second aspect, an embodiment of the present invention provides a data processing apparatus, where the apparatus is disposed in an electronic device running a data acquisition system, or in an electronic device communicatively connected to a data output end of an electronic device running a data acquisition system; the electronic device is communicatively connected to an electronic device running a data analysis system; the apparatus comprises: an extraction module, configured to extract specified data from initial data of a target data source within a preset timing duration, and to aggregate the specified data by a preset aggregation algorithm to obtain an aggregation result; and an output module, configured to output the aggregation result to the data analysis system when the timing duration is reached.

In a third aspect, an embodiment of the present invention provides an electronic device, which includes a processor and a memory, where the memory stores machine executable instructions capable of being executed by the processor, and the processor executes the machine executable instructions to implement the data processing method in any one of the first aspect.

In a fourth aspect, embodiments of the invention provide a machine-readable storage medium having stored thereon machine-executable instructions that, when invoked and executed by a processor, cause the processor to carry out the data processing method of any one of the first aspects.

The embodiment of the invention has the following beneficial effects:

The embodiment of the invention provides a data processing method, a data processing apparatus, an electronic device and a machine-readable storage medium. Within a preset timing duration, specified data is extracted from initial data of a target data source and aggregated by a preset aggregation algorithm to obtain an aggregation result; when the timing duration is reached, the aggregation result is output to a data analysis system. Because the data is aggregated before it enters the data analysis system, and the aggregation result is output to the data analysis system periodically on a timer, the volume of data received by the data analysis system is reduced, which lowers the load on that system and the data transmission pressure on the transmission network.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is a schematic structural diagram of a data acquisition program according to an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of another data acquisition program according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a data processing method according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of another data processing method according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of another data processing method according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a data flow direction according to an embodiment of the present invention;

FIG. 7 is a schematic diagram of a specific data processing method according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;

FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Data access for a big data platform requires a data acquisition system that supports multiple data sources, multiple formats and distributed deployment. Relatively mature open-source data collection systems currently include Flume (a log collection system): Flume is a highly available, highly reliable, distributed system for collecting, aggregating and transmitting massive volumes of logs; it supports customizing various data senders in a logging system to collect data, and it also provides the ability to perform simple processing on data and write it to various (customizable) data receivers. There are also data acquisition systems such as Logstash (a data processing pipeline), whose basic implementation is similar to that of Flume. As shown in FIG. 1, a data acquisition program in the prior art is mainly divided into three parts: data reception, data formatting and data transmission. The main role of data formatting is to perform simple processing on the data, such as filtering and formatting, after which the processed data is transmitted to the big data platform; this approach easily places a heavy load on the big data platform and, at the same time, places high demands on the transmission network.

Based on this, the embodiments of the present invention provide a data processing method, apparatus, electronic device and machine-readable storage medium, and the technology can be applied to an electronic device with a data processing function. To facilitate understanding of the embodiment, a detailed description will be given to a data processing method disclosed in the embodiment of the present invention.

The embodiment of the invention provides a data processing method, which is applied to an electronic device running a data acquisition system, or to an electronic device communicatively connected to a data output end of an electronic device running a data acquisition system; the electronic device is communicatively connected to an electronic device running a data analysis system.

The electronic device running the data acquisition system may be a device dedicated to data acquisition; the data acquisition system may also run on the electronic device that hosts the data source. The electronic device may include the three modules shown in FIG. 2: data receiving, aggregation calculation and data transmission. The electronic device may be a computer, a server, a terminal device or the like. Compared with the prior art, the invention enhances the data formatting module: in addition to filtering and formatting, the module accepts incoming data streams and can perform aggregation calculation on them.

Specifically, the data sources of a big data platform may be of many kinds: data may be received over network protocols such as HTTP (HyperText Transfer Protocol), TCP (Transmission Control Protocol) and FTP (File Transfer Protocol), read from a file or a disk, or received from the output of other software, such as Kafka (Apache Kafka, an open-source stream processing platform) or an MQ (Message Queue). Furthermore, the data sources may be distributed across many electronic devices; for example, the logs of a distributed cluster are spread over different machines. Therefore, data access for a big data platform requires a data acquisition system that supports multiple data sources, multiple formats and distributed deployment.

As shown in FIG. 3, the method comprises the following steps:

Step S302, extracting specified data from initial data of a target data source within a preset timing duration, and aggregating the specified data by a preset aggregation algorithm to obtain an aggregation result.

The preset timing duration may be measured by a timer, which may be a periodic timer; the duration can be set according to actual data acquisition needs, for example thirty seconds, one minute or two minutes. The target data source may include data sources in various formats, such as text, Parquet, JSON and XML; which types of data source the electronic device extracts data from may be configured in advance. The initial data generally refers to the original data in a data source and may include invalid data, such as illegal fields, format errors and mismatched values. Generally, when the specified data is extracted, invalid data can be filtered out and data in different formats can be converted to a unified format. The specified data may include a specified field in the initial data together with the data corresponding to that field, for example the time field or the IP (Internet Protocol) field in log data, or the age field in text data; which specified data the electronic device extracts from the initial data may also be configured in advance. The aggregation algorithm may be an operation that performs a calculation on a set of values and returns a single value, such as the sum, mean, maximum or minimum of the specified data; it may also compute statistics on the specified data over a certain time period, for example counting the number of lines of log data. Aggregation reduces the volume of data flowing into the big data platform; which aggregation algorithm the electronic device applies to the extracted specified data may also be configured in advance.
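As a minimal sketch of the extraction and aggregation described above, assuming tab-separated text records; the names `extract_specified` and `AGGREGATORS` and the sample fields are hypothetical and do not come from the disclosure:

```python
from statistics import mean

# Hypothetical table of aggregation algorithms: each reduces a list of values to one value.
AGGREGATORS = {
    "sum": sum,
    "mean": mean,
    "max": max,
    "min": min,
    "count": len,
}

def extract_specified(raw_line, field_names, wanted, sep="\t"):
    """Return {field: value} for the wanted fields, or None if the record is invalid."""
    parts = raw_line.rstrip("\n").split(sep)
    if len(parts) != len(field_names):  # format error: filter the record out
        return None
    record = dict(zip(field_names, parts))
    return {name: record[name] for name in wanted}

# Assumed sample input: "name<TAB>age" lines, one of them malformed.
lines = ["alice\t12\n", "broken line\n", "bob\t30\n"]
extracted = [extract_specified(line, ["name", "age"], ["age"]) for line in lines]
ages = [int(item["age"]) for item in extracted if item is not None]
result = {name: fn(ages) for name, fn in AGGREGATORS.items()}
```

Here the malformed line is dropped during extraction, and each configured algorithm reduces the remaining `age` values to a single value.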

In actual implementation, there are two options. In the first, specified data is extracted from the initial data of the target data source according to preset fields during the timing period and stored in memory or a cache; when the timer fires at the end of the preset timing duration, the stored specified data is aggregated by the preset aggregation algorithm to obtain the aggregation result. In the second, specified data is extracted from the initial data of the target data source piece by piece according to the preset fields during the timing period, and each extracted piece is aggregated by the preset aggregation algorithm as it arrives to obtain the aggregation result. In either case, the aggregation result contains less data than the initial data.

It should be noted that the electronic device extracts the specified data mainly through streaming, real-time data collection; non-real-time (offline) data collection is also supported.

Step S304, outputting the aggregation result to the data analysis system when the timing duration is reached.

The data analysis system may include a big data platform in which various big data components are deployed to receive the aggregation result; for example, the aggregation result is output to Kafka, and data analysis is performed on the received aggregation result. The address of the data analysis system to which the electronic device outputs the aggregation result can be configured in advance. Specifically, when the preset timing duration is reached, the aggregation result calculated within that duration is output to the preset address; or, when the preset timing duration is reached, the specified data received within that duration is aggregated and the calculated aggregation result is output to the preset address. The aggregation result may be formatted before it is output to the data analysis system.
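The periodic output in step S304 can be sketched as follows. This is only an illustration under stated assumptions: the class name `PeriodicEmitter` is hypothetical, the sink is a plain callable standing in for the data analysis system, and timestamps are passed in explicitly instead of using a real timer so that the behavior is deterministic.

```python
class PeriodicEmitter:
    """Counts records and emits the count each time the timing duration elapses."""

    def __init__(self, duration_s, sink, start=0.0):
        self.duration_s = duration_s
        self.sink = sink  # callable standing in for the data analysis system (assumption)
        self.window_end = start + duration_s
        self.count = 0

    def on_record(self, now, _record):
        self._flush_if_due(now)
        self.count += 1

    def _flush_if_due(self, now):
        # Emit one result per elapsed window, then start the next window.
        while now >= self.window_end:
            self.sink({"window_end": self.window_end, "count": self.count})
            self.count = 0
            self.window_end += self.duration_s

outputs = []
emitter = PeriodicEmitter(duration_s=60, sink=outputs.append)
for t in (5, 20, 59, 61, 130):  # assumed arrival times in seconds
    emitter.on_record(t, "log line")
```

Five records enter, but only one small result per elapsed window reaches the sink, which is the data reduction the method aims for.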

The embodiment of the invention provides a data processing method in which, within a preset timing duration, specified data is extracted from initial data of a target data source and aggregated by a preset aggregation algorithm to obtain an aggregation result; when the timing duration is reached, the aggregation result is output to a data analysis system. Because the data is aggregated before it enters the data analysis system, and the aggregation result is output to the data analysis system periodically on a timer, the volume of data received by the data analysis system is reduced, which lowers the load on that system and the data transmission pressure on the transmission network.

The embodiment of the invention provides another data processing method, which is implemented on the basis of the method of the foregoing embodiment; this embodiment focuses on the implementation of the step of extracting the specified data from the initial data of the target data source (implemented by steps S402 to S404). As shown in FIG. 4, the method includes the following steps:

Step S402, reading initial data from a target data source one by one within a preset timing duration.

The electronic device running the data acquisition system may read initial data from the target data source one by one through the data receiving module within the preset timing duration.

Step S404, for each piece of initial data, extracting specified data from the piece of initial data, and aggregating the specified data with the data in a preset storage space by the aggregation algorithm to obtain an aggregation result.

The initial data may include data of a plurality of fields; for example, for log data in the electronic device, each piece of initial data includes field information such as access time and access address. The preset storage space can store data in various forms, such as a map or JSON; the data in the preset storage space may be specified data extracted from the initial data within the preset timing duration, or data that has already been aggregated. For example, within the timing duration, initial data is received piece by piece. In the initial state, a first piece of specified data is extracted from the first piece of initial data according to a preset extraction field and stored in the storage space in a preset storage form. When the next piece of initial data is read, a second piece of specified data is extracted according to the preset extraction field and aggregated with the first piece of specified data in the storage space by the preset aggregation algorithm to obtain a first aggregation result; the first aggregation result is stored in the storage space, and the first and second pieces of specified data are deleted. The third piece of initial data is then read, and a third piece of specified data is extracted according to the preset extraction field and aggregated with the first aggregation result in the storage space to obtain a second aggregation result; the second aggregation result is stored in the storage space and the first aggregation result is deleted. This continues until the timing duration is reached. Multiple pieces of initial data are usually read within a preset timing duration, and the final aggregation result is obtained once the timing duration is reached. Because the storage space holds only the current aggregation result rather than every extracted piece, this approach saves memory.
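The piece-by-piece procedure above amounts to folding each new value into a single stored aggregate. A minimal sketch, assuming numeric values and a hypothetical `fold` helper (the disclosure does not name one):

```python
def fold(state, value, algorithm):
    """Combine the stored aggregate with one new value; state is None before the first piece."""
    if state is None:
        return 1 if algorithm == "count" else value
    if algorithm == "sum":
        return state + value
    if algorithm == "max":
        return max(state, value)
    if algorithm == "min":
        return min(state, value)
    if algorithm == "count":
        return state + 1
    raise ValueError(f"unsupported algorithm: {algorithm}")

storage = None  # the preset storage space holds only the running aggregate
for value in [7, 3, 9]:
    # The previous aggregate is replaced by the new one, as in the text above.
    storage = fold(storage, value, "sum")
```

Sum, maximum, minimum and count can all be folded this way with constant memory; a mean would need the stored state to carry both a running sum and a count.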

The specified data comprises at least one keyword and a keyword value corresponding to each keyword; a preferred implementation of step S404 is as follows:

Step A1, for each keyword, aggregating the keyword value corresponding to the keyword in the specified data with the keyword value corresponding to the keyword in the storage space to obtain an aggregation value corresponding to the keyword.

The keyword may be understood as a field in the initial data, and the keyword value as the field information corresponding to that field; for example, the keyword in log data may be time, and the keyword value the specific time value; for another example, the keyword in text data may be age, and the keyword value the specific age value. Specifically, the keywords that need to be aggregated and their corresponding keyword values are extracted from the specified data; for example, key1, key2 and key3 may be the keywords, and value1, value2 and value3 the corresponding keyword values. The keywords and keyword values are stored in the storage space; if the next piece of specified data contains the same keyword, the keyword value corresponding to that keyword in the piece of specified data is aggregated with the keyword value corresponding to that keyword in the storage space to obtain the aggregation value corresponding to the keyword.

Step A2, determining each keyword and the aggregation value corresponding to the keyword as the aggregation result.

Since the specified data includes at least one keyword, the aggregation processing in step A1 may yield a plurality of aggregation values corresponding to a plurality of keywords; therefore, the aggregation result may include a plurality of keywords and the aggregation values corresponding to those keywords.
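Steps A1 and A2 can be sketched with a dictionary as the storage space. The keyword names (`bytes`, `latency_ms`), the per-keyword algorithms and the helper `aggregate_into` are illustrative assumptions, not part of the disclosure:

```python
def aggregate_into(storage, specified, algorithms):
    """Step A1: merge one piece of specified data into the storage space, keyword by keyword."""
    for key, value in specified.items():
        combine = algorithms[key]
        # First occurrence of a keyword is stored as-is; later ones are aggregated.
        storage[key] = combine(storage[key], value) if key in storage else value
    return storage

# Assumed configuration: sum the bytes, keep the largest latency.
algorithms = {"bytes": lambda a, b: a + b, "latency_ms": max}

storage = {}
pieces = [
    {"bytes": 100, "latency_ms": 12},
    {"bytes": 250, "latency_ms": 40},
    {"bytes": 50},
]
for piece in pieces:
    aggregate_into(storage, piece, algorithms)

aggregation_result = dict(storage)  # Step A2: each keyword with its aggregation value
```

Each keyword keeps exactly one aggregation value in the storage space, regardless of how many pieces of specified data arrive during the timing duration.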

Step S406, outputting the aggregation result to the data analysis system when the timing duration is reached.

In this way, initial data is read from a target data source one by one; for each piece of initial data, specified data is extracted from the piece of initial data and aggregated with the data in a preset storage space by the aggregation algorithm to obtain an aggregation result; and when the timing duration is reached, the aggregation result is output to the data analysis system. Compared with the initial data, the volume of data output by the data processing method of this embodiment is greatly reduced, which reduces the load on the big data platform and the data transmission pressure on the transmission network.

The embodiment of the invention provides another data processing method, which is implemented on the basis of the method of the foregoing embodiment; this embodiment focuses on the implementation of the step of extracting the specified data from the initial data of the target data source (implemented by step S502), on a step preceding the step of outputting the aggregation result to the data analysis system (implemented by step S504), and on a step following the step of outputting the aggregation result to the data analysis system (implemented by step S508). As shown in FIG. 5, the method includes the following steps:

Step S502, reading initial data from a target data source one by one within a preset timing duration; and, for each piece of initial data, extracting specified data from the piece of initial data and storing the specified data in a specified storage space.

The electronic device running the data acquisition system may read initial data from the target data source one by one through the data receiving module within the preset timing duration. Each piece of initial data read may include a plurality of fields; for example, for log data in the electronic device, each piece of initial data includes field information such as access time and access address. The specified storage space may store data in various forms, such as a map or JSON. In actual implementation, specified data is extracted from the initial data received one by one according to preset extraction fields and stored in the storage space in a preset storage form; for example, the extracted specified data is stored in the specified storage space as the preset fields and the field information corresponding to those fields.

Step S504, when the timing duration is reached, aggregating the specified data in the storage space by the aggregation algorithm to obtain an aggregation result.

Within a timing duration, a plurality of pieces of specified data are usually extracted; when the timing duration is reached, a preset aggregation algorithm may be applied to the field information corresponding to each field across all the specified data in the storage space, so as to obtain aggregation results corresponding to a plurality of fields.

The specified data comprises at least one keyword and a keyword value corresponding to each keyword; a preferred implementation of step S504 is as follows:

Step B1, for each keyword, aggregating the keyword values corresponding to the keyword in the storage space to obtain an aggregation value corresponding to the keyword.

The keyword may be understood as a field in the initial data, and the keyword value as the field information corresponding to that field; for example, the keyword in log data may be time, and the keyword value the specific time value; for another example, the keyword in text data may be age, and the keyword value the specific age value. Specifically, since the specified data includes at least one keyword, when the timing duration is reached, the keyword values stored in the storage space during that duration may be aggregated separately for each keyword, so as to obtain a plurality of aggregation values corresponding to a plurality of keywords; the aggregation processing may be set in advance according to the characteristics of the specified data.

Step B2, determining each keyword and the aggregation value corresponding to the keyword as the aggregation result.

The aggregation value corresponding to the keyword calculated after the aggregation processing in the step B1 may be a plurality of aggregation values corresponding to a plurality of keywords; therefore, the aggregation result may include a plurality of keywords and a plurality of aggregation values corresponding to the plurality of keywords.
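Steps B1 and B2 differ from the incremental variant in that every piece of specified data is kept until the timer fires. A minimal sketch, with a hypothetical helper `aggregate_stored` and assumed keywords `age` and `visits`:

```python
from collections import defaultdict

def aggregate_stored(stored_pieces, algorithms):
    """Step B1: group stored keyword values by keyword, then reduce each group."""
    values_by_key = defaultdict(list)
    for piece in stored_pieces:
        for key, value in piece.items():
            values_by_key[key].append(value)
    # Step B2: each keyword paired with its aggregation value is the result.
    return {key: algorithms[key](values) for key, values in values_by_key.items()}

# Specified data accumulated in the storage space during one timing duration (assumed).
stored = [
    {"age": 30, "visits": 1},
    {"age": 40, "visits": 1},
    {"age": 50, "visits": 1},
]
algorithms = {"age": lambda values: sum(values) / len(values), "visits": sum}
result = aggregate_stored(stored, algorithms)
```

Because all values are available at once, algorithms such as the mean need no special running state, at the cost of holding every extracted piece in memory until the timing duration is reached.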

Step S506, outputting the aggregation result to the data analysis system.

Step S508, emptying the data in the storage space.

The aggregation calculation module collects data during each timing period and performs aggregation calculation on it; once the aggregation result is obtained, it is output to the data analysis system and the data in the storage space is emptied, so that the storage space can hold the specified data collected in the next period.
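The full cycle of steps S502 to S508 can be sketched as below. The class `BufferingAggregator`, its method names and the choice of count and sum as the aggregation are assumptions for illustration; the timer is represented by calling `on_timer` directly.

```python
class BufferingAggregator:
    """Buffers specified data for one timing period, then aggregates, outputs and clears."""

    def __init__(self, sink):
        self.sink = sink      # callable standing in for the data analysis system (assumption)
        self.storage = []     # the specified storage space

    def collect(self, value):
        # S502: store each extracted value.
        self.storage.append(value)

    def on_timer(self):
        # S504: aggregate everything stored during the period.
        result = {"count": len(self.storage), "sum": sum(self.storage)}
        # S506: output the aggregation result.
        self.sink(result)
        # S508: empty the storage space for the next period.
        self.storage.clear()
        return result

sent = []
agg = BufferingAggregator(sent.append)
for value in (1, 2, 3):
    agg.collect(value)
agg.on_timer()
agg.collect(10)
agg.on_timer()
```

Clearing the storage space after output ensures the second period's result reflects only the data collected in that period.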

In this way, initial data is read from a target data source one by one; for each piece of initial data, specified data is extracted from the piece of initial data and stored in a specified storage space; when the timing duration is reached, the specified data in the storage space is aggregated by the aggregation algorithm to obtain an aggregation result; the aggregation result is output to the data analysis system; and the data in the storage space is emptied. This implementation supports aggregation calculation during acquisition, acquisition from multiple data sources, distributed deployment and multiple log formats; it performs the calculation during data acquisition and transmits only the result, thereby reducing the load on the data analysis system, the calculation and storage pressure on the big data platform, and the data transmission pressure on the transmission network.

A configuration file is a computer file that configures parameters and initial settings for a computer program; it may be written in formats such as JSON, XML, or YAML. In the present invention, a configuration file is used to configure the corresponding plug-ins for a target data source in advance. The plug-ins corresponding to the target data source are identified by plug-in identifiers, and include a data acquisition plug-in, a data aggregation plug-in, and a data output plug-in. The data acquisition plug-in is chosen according to the type of the target data source: for example, TCP_INPUT_PLUGIN may be used to acquire TCP (Transmission Control Protocol) data and FILE_INPUT_PLUGIN to acquire text data, and other data acquisition plug-ins can be developed according to service requirements. The data aggregation plug-in is likewise chosen according to the characteristics of the target data source and the service requirements: for example, AGG_COMPUTE_PLUGIN performs aggregation calculation on a specified field with five minutes as the timing duration, and the data aggregation plug-in can of course be customized for other requirements. The data output plug-in is chosen according to the output address and the characteristics of the output data: for example, FILE_OUTPUT_PLUGIN outputs to a file, and KAFKA_OUTPUT_PLUGIN outputs to Kafka.

In addition, the plug-ins are connected in sequence, and data flows through the sequentially connected plug-ins, as shown in FIG. 6. The association between the plug-ins is established by the acquisition program kernel together with the configuration file, which indicates the upstream and downstream relationship of the plug-ins. For example, the data acquisition plug-in may read initial data from a target data source in file format (each line of the file has the form ip\tcurrent time, where "\t" is the separator); the data aggregation plug-in may count the number of accesses for each ip in each minute; and the data output plug-in may output the aggregation result to Kafka in the data analysis system. In this embodiment, taking a configuration file in TOML (Tom's Obvious, Minimal Language) format as an example, the plug-ins included in the configuration file may be configured as follows:

[FILE_INPUT_PLUGIN]

file = "/data/iponline.log"

split = "\t"

[AGG_COMPUTE_PLUGIN]

interval = "60s"

sum_value = count(*)

sum_key = ip

[KAFKA_OUTPUT_PLUGIN]

address = "127.0.0.1:9200"

The above configuration describes the data flow of the whole acquisition process, namely input, compute, and output, together with the parameter configuration of each step: the data acquisition plug-in reads initial data from a file and splits each piece with the "\t" separator; the data aggregation plug-in counts the number of occurrences of each ip every 60 seconds; and the data output plug-in specifies the address of Kafka.

The configuration file also comprises plug-in parameters of the plug-ins. The plug-in parameters of the data acquisition plug-in include: a source identifier of the target data source and a separation identifier of each piece of initial data. The plug-in parameters of the data aggregation plug-in include: the timing duration, the keywords of the specified data, and the aggregation algorithm corresponding to the keyword values of the keywords. The plug-in parameters of the data output plug-in include: address information of the data analysis system.

The plug-in parameters of the data acquisition plug-in can be set according to the type, data characteristics, data format, and so on of the target data source. They include a source identifier of the target data source and a separation identifier of each piece of initial data. The source identifier generally refers to the name of the target data source; for example, when data is read from a file, the source identifier may be file = "/data/iponline.log". The separation identifier is used to separate each piece of initial data that is read; for example, the data acquisition plug-in splits each piece of initial data with the "\t" separator.

The plug-in parameters of the data aggregation plug-in can likewise be set according to the characteristics of the acquired initial data, the fields of the initial data, the service requirements, and so on. They generally comprise the timing duration, the keywords of the specified data, and the aggregation algorithm corresponding to the keyword values of the keywords. The parameter values are set according to actual service requirements. For example, if the number of occurrences of each ip is to be calculated every 60 seconds, the timing duration is set to 60 seconds, that is, interval = "60s"; the keyword of the specified data is ip, that is, sum_key = ip; and the aggregation algorithm corresponding to the keyword value counts the occurrences of the ip, that is, sum_value = count(*).

The plug-in parameters of the data output plug-in can be set according to the address to which the service requires the data to be output, and usually comprise address information of the data analysis system. The address information may be the address of a data processing platform deployed in the data analysis system; for example, if the data is to be input into Kafka in the data analysis system, address = "127.0.0.1:9200".
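The aggregation parameters described above could be interpreted along the following lines. This is only a sketch: the helper names `parse_interval` and `build_aggregation`, the set of unit suffixes, and the lookup table of aggregation expressions are assumptions made for illustration; the source shows only the values "60s", ip, and count(*).

```python
import re

# Assumed unit suffixes; the source only shows "60s".
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(text):
    """Convert a duration string such as "60s" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


# Hypothetical table mapping a sum_value expression to a function that
# folds one more record into the running aggregation value.
AGGREGATORS = {
    "count(*)": lambda running, _value: running + 1,
}


def build_aggregation(params):
    """Turn the plug-in parameters into settings the aggregation step can use."""
    return {
        "interval_seconds": parse_interval(params["interval"]),
        "key": params["sum_key"],
        "step": AGGREGATORS[params["sum_value"]],
    }


settings = build_aggregation(
    {"interval": "60s", "sum_key": "ip", "sum_value": "count(*)"}
)
```

Keeping the expression-to-function mapping in a table means a new aggregation algorithm can be supported by adding one entry rather than changing the parsing code.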

The configuration file further includes a queue identifier of the data transfer queue between connected plug-ins. Data messages need not flow between plug-ins only by the current direct transmission; a buffer queue or another message transfer mechanism can be added. The queue identifier can therefore be used to select how data messages flow between the plug-ins.
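A buffer queue between two connected plug-ins might look like the sketch below, which uses Python's standard `queue.Queue` as the transfer queue. The function names, the `SENTINEL` end-of-stream marker, and the queue size are hypothetical; in the scheme described above, the queue identifier in the configuration file would select a channel of this kind instead of direct transmission.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream in this sketch


def collector(lines, out_q):
    """Upstream plug-in: pushes each piece of initial data into the queue."""
    for line in lines:
        out_q.put(line)
    out_q.put(SENTINEL)


def aggregator(in_q, results):
    """Downstream plug-in: pulls from the queue and counts occurrences per ip."""
    counts = {}
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        ip = item.split("\t")[0]
        counts[ip] = counts.get(ip, 0) + 1
    results.append(counts)


transit = queue.Queue(maxsize=100)  # bounded, so a slow consumer applies back-pressure
results = []
lines = ["10.0.0.1\t12:00:01", "10.0.0.2\t12:00:02", "10.0.0.1\t12:00:30"]

consumer = threading.Thread(target=aggregator, args=(transit, results))
consumer.start()
collector(lines, transit)
consumer.join()
```

The queue decouples the two plug-ins: the collector keeps reading while the aggregator works, and blocks only when the buffer is full.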

It should be noted that the plug-ins may be implemented through the Java reflection mechanism: the data acquisition system can obtain a specific class by reflection according to the name of the plug-in, such as FILE_INPUT_PLUGIN, and the class can be used as long as it implements the processing methods required by the data acquisition system. In addition, with plug-ins, the downstream of a data receiving module need not be an aggregation calculation module; the relationship between modules can be many-to-many, which allows multiple message sources and multiple outlets.

In this method, data acquisition, data aggregation, and data output are all implemented as plug-ins. Because plug-ins are supported, docking a new target data source or a new aggregation calculation method during data processing is quick and convenient. In addition, each plug-in can define its own configuration in the configuration file, so when a new task arises, a new plug-in can be developed directly according to the task requirements and installed dynamically without restarting the main program, which makes data acquisition, calculation, and output more convenient.
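The name-to-class lookup described above is attributed to Java reflection; the sketch below shows an analogous idea in Python, used here only for consistency with the other examples. The base class `Plugin`, its `process` method, and the `load_plugin` helper are hypothetical, and class discovery through `__subclasses__()` is a Python stand-in for the reflection step, not the mechanism the source describes.

```python
class Plugin:
    """Hypothetical base interface that every plug-in implements."""

    def process(self, item):
        raise NotImplementedError


class FILE_INPUT_PLUGIN(Plugin):
    def process(self, item):
        # Split one line of the file into its fields.
        return item.rstrip("\n").split("\t")


class AGG_COMPUTE_PLUGIN(Plugin):
    def __init__(self):
        self.counts = {}

    def process(self, item):
        # Count occurrences of the first field (the ip).
        self.counts[item[0]] = self.counts.get(item[0], 0) + 1
        return self.counts


def load_plugin(name):
    """Instantiate the plug-in class whose class name equals the plug-in identifier."""
    classes = {cls.__name__: cls for cls in Plugin.__subclasses__()}
    try:
        return classes[name]()
    except KeyError:
        raise ValueError(f"no plug-in class named {name!r}") from None


reader = load_plugin("FILE_INPUT_PLUGIN")
fields = reader.process("10.0.0.1\t12:00:01\n")
```

Any class that derives from the base interface becomes loadable by its name, so adding a plug-in does not require changing the loader.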

This embodiment provides a specific implementation. Referring to FIG. 7, the aggregation calculation module in the electronic device running the data acquisition system first obtains, from the plug-in parameters of the data aggregation plug-in in the configuration file, the keywords to be used as keys. It then receives initial data piece by piece and extracts from each piece the keywords and the keyword values on which aggregation calculation is to be performed. For example, if there are three keywords, key1, key2, and key3 are used as the key and value1, value2, and value3 as the value, and they are stored in memory in the form of a map (a container that associates key objects with value objects). If the keywords of the next piece received are the same as an existing key, the corresponding keyword values are added to the value held in memory, and the keyword values for that keyword are aggregated to obtain the aggregation value corresponding to the keyword; each keyword and its aggregation value are determined as the aggregation result. The module then judges whether the timing duration has been reached: if not (N in the figure), it continues to receive the next piece of initial data; if so (Y in the figure), it outputs the calculated aggregation result and empties the memory. The calculated aggregation result may be formatted before being output.
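The flow of FIG. 7 can be sketched as follows, under stated assumptions: the class name `IncrementalAggregator`, the field names in the usage example, and the injectable `clock` are hypothetical, and summation is assumed as the aggregation algorithm applied when a key is seen again.

```python
import time


class IncrementalAggregator:
    """Aggregates each record on arrival into a map keyed by the configured keywords."""

    def __init__(self, key_fields, value_fields, interval_seconds, output,
                 clock=time.monotonic):
        self.key_fields = key_fields      # e.g. key1, key2, key3
        self.value_fields = value_fields  # e.g. value1, value2, value3
        self.interval = interval_seconds
        self.output = output              # hypothetical stand-in for the data analysis system
        self.clock = clock                # injectable so the timer can be tested
        self.memory = {}                  # the map held in memory
        self.started = clock()

    def receive(self, record):
        key = tuple(record[f] for f in self.key_fields)
        values = [record[f] for f in self.value_fields]
        if key in self.memory:
            # Same key as before: fold the new values into the stored ones (assumed: sum).
            running = self.memory[key]
            for i, v in enumerate(values):
                running[i] += v
        else:
            self.memory[key] = values
        # Judge whether the timing duration is reached (Y/N in the figure).
        if self.clock() - self.started >= self.interval:
            self.flush()

    def flush(self):
        self.output({k: tuple(v) for k, v in self.memory.items()})
        self.memory = {}  # empty the memory for the next period
        self.started = self.clock()


now = [0.0]  # fake clock value, advanced manually
sent = []
agg = IncrementalAggregator(["ip", "path"], ["bytes"], 60, sent.append,
                            clock=lambda: now[0])
agg.receive({"ip": "10.0.0.1", "path": "/a", "bytes": 100})
agg.receive({"ip": "10.0.0.1", "path": "/a", "bytes": 50})
now[0] = 60.0
agg.receive({"ip": "10.0.0.2", "path": "/b", "bytes": 7})
```

Because the timer is checked only when a record arrives, a period with no incoming data would not flush in this sketch; a separate timer thread would be needed for that case.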

Corresponding to the above data processing method embodiment, an embodiment of the present invention provides a data processing apparatus, as shown in fig. 8, where the apparatus is located in an electronic device running with a data acquisition system, or is located in an electronic device communicatively connected to a data output end of an electronic device running with a data acquisition system; the electronic equipment is in communication connection with the electronic equipment running with the data analysis system; the device includes:

the extracting module 81 is configured to extract specified data from initial data of the target data source within a preset timing duration, so as to perform aggregation processing on the specified data through a preset aggregation algorithm to obtain an aggregation result;

and the output module 82 is used for outputting the aggregation result to the data analysis system when the time duration is reached.

The data processing device provided by the embodiment of the present invention extracts specified data from the initial data of a target data source within a preset timing duration, and aggregates the specified data through a preset aggregation algorithm to obtain an aggregation result; when the timing duration is reached, it outputs the aggregation result to the data analysis system. In this approach, data is aggregated before it is input to the data analysis system, and the aggregation result is output to the data analysis system periodically. Because the data is aggregated, the volume of data received by the data analysis system is reduced, which lowers the load on the system and the data transmission pressure on the transmission network.

Further, the extracting module is further configured to: reading initial data from a target data source one by one; and aiming at each piece of initial data, extracting specified data from the piece of initial data, and performing aggregation processing on the specified data and data in a preset storage space through an aggregation algorithm to obtain an aggregation result.

Further, the specified data includes at least one keyword and a keyword value corresponding to each keyword; the extraction module is further configured to: for each keyword, carrying out aggregation processing on a keyword value corresponding to the keyword in the specified data and a keyword value corresponding to the keyword in a storage space to obtain an aggregate value corresponding to the keyword; and determining each key and the aggregation value corresponding to the key as an aggregation result.

Further, the extracting module is further configured to: read initial data from the target data source piece by piece; and, for each piece of initial data, extract specified data from the piece of initial data and store the specified data into a specified storage space. The above apparatus is also configured to: when the timing duration is reached, aggregate the specified data in the storage space through the aggregation algorithm to obtain an aggregation result.

Further, the specified data includes at least one keyword and a keyword value corresponding to each keyword; the extraction module is further configured to: for each keyword, carrying out aggregation processing on a keyword value corresponding to the keyword in a storage space to obtain an aggregate value corresponding to the keyword; and determining each key and the aggregation value corresponding to the key as an aggregation result.

Further, the above apparatus is further configured to: empty the data in the storage space when the timing duration is reached.

Further, the configuration file further comprises plug-in parameters of the plug-ins. The plug-in parameters of the data acquisition plug-in include: a source identifier of the target data source and a separation identifier of each piece of initial data. The plug-in parameters of the data aggregation plug-in include: the timing duration, the keywords of the specified data, and the aggregation algorithm corresponding to the keyword values of the keywords. The plug-in parameters of the data output plug-in include: address information of the data analysis system.

Further, the configuration file further includes: queue identification of data transit queues between connected plug-ins.

The data processing device provided by the embodiment of the invention has the same technical characteristics as the data processing method provided by the embodiment, so that the same technical problems can be solved, and the same technical effects can be achieved.

An embodiment of the present invention further provides an electronic device, as shown in fig. 9, the electronic device includes a processor 90 and a memory 91, the memory 91 stores machine executable instructions capable of being executed by the processor 90, and the processor 90 executes the machine executable instructions to implement the data processing method.

Further, the electronic device shown in fig. 9 further includes a bus 92 and a communication interface 93, and the processor 90, the communication interface 93, and the memory 91 are connected by the bus 92.

The Memory 91 may include a high-speed Random Access Memory (RAM) and may further include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 93 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used. Bus 92 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 9, but this does not indicate only one bus or one type of bus.

The processor 90 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 90 or by instructions in the form of software. The processor 90 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 91; the processor 90 reads the information in the memory 91 and performs the steps of the method of the foregoing embodiments in combination with its hardware.

The embodiment of the present invention further provides a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions, and when the machine-executable instructions are called and executed by a processor, the machine-executable instructions cause the processor to implement the data processing method.

The computer program product of the data processing method, the data processing apparatus, the electronic device, and the machine-readable storage medium provided by the embodiments of the present invention includes a computer-readable storage medium storing program code. The instructions included in the program code may be used to execute the method described in the foregoing method embodiments; for the specific implementation, refer to the method embodiments, which are not repeated here.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may, for example, be fixed, removable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal communication between two elements. Those skilled in the art can understand the specific meaning of these terms in the present invention according to the specific case.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

Finally, it should be noted that the foregoing embodiments are merely specific implementations of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that anyone skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention and shall be covered by its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A data processing method is characterized in that the method is applied to electronic equipment operating with a data acquisition system or electronic equipment in communication connection with a data output end of the electronic equipment operating with the data acquisition system; the electronic equipment is in communication connection with the electronic equipment running with the data analysis system; the method comprises the following steps:

within a preset timing duration, extracting specified data from initial data of a target data source, and carrying out aggregation processing on the specified data through a preset aggregation algorithm to obtain an aggregation result;

and when the timing duration is reached, outputting the aggregation result to the data analysis system.

2. The method of claim 1, wherein the step of extracting the specified data from the initial data of the target data source comprises:

reading initial data from the target data source one by one;

and aiming at each piece of initial data, extracting specified data from the piece of initial data, and performing aggregation processing on the specified data and data in a preset storage space through the aggregation algorithm to obtain an aggregation result.

3. The method according to claim 2, wherein the specified data includes at least one keyword, and a keyword value corresponding to each of the keywords;

the step of aggregating the specified data and the data in the preset storage space through the aggregation algorithm to obtain an aggregation result includes:

for each keyword, performing aggregation processing on a keyword value corresponding to the keyword in the specified data and a keyword value corresponding to the keyword in the storage space to obtain an aggregate value corresponding to the keyword;

and determining each keyword and the aggregation value corresponding to the keyword as an aggregation result.

4. The method of claim 1, wherein the step of extracting the specified data from the initial data of the target data source comprises: reading initial data from the target data source one by one; for each piece of initial data, extracting specified data from the piece of initial data, and storing the specified data into a specified storage space;

prior to the step of outputting the aggregation result to the data analysis system, the method further comprises: when the timing duration is reached, performing aggregation processing on the specified data in the storage space through the aggregation algorithm to obtain an aggregation result.

5. The method according to claim 4, wherein the specified data includes at least one keyword, and a keyword value corresponding to each of the keywords;

the step of performing aggregation processing on the specified data in the storage space through the aggregation algorithm to obtain an aggregation result includes:

for each keyword, performing aggregation processing on a keyword value corresponding to the keyword in the storage space to obtain an aggregate value corresponding to the keyword;

6. The method of any of claims 2-5, wherein after the step of outputting the aggregated results to the data analysis system, the method further comprises: and emptying the data in the storage space.

7. The method of claim 1, wherein the target data source is preconfigured with a configuration file; the configuration file includes: the plug-in identification of the plug-in corresponding to the target data source; the plug-ins comprise a data acquisition plug-in, a data aggregation plug-in and a data output plug-in which are sequentially connected;

wherein the data collection plug-in is configured to: reading initial data from the target data source;

the data aggregation plugin is to: extracting specified data from the initial data, and carrying out aggregation processing on the specified data through the aggregation algorithm to obtain an aggregation result;

the data output plug-in is used for: and outputting the aggregation result to the data analysis system.

8. The method of claim 7, wherein the configuration file further comprises plug-in parameters for the plug-in;

wherein the plug-in parameters of the data acquisition plug-in include: a source identification of the target data source and a separation identifier for each of the initial data;

the plug-in parameters of the data aggregation plug-in include: the timing duration, the keywords of the specified data and an aggregation algorithm corresponding to the keyword values of the keywords;

the plug-in parameters of the data output plug-in include: address information of the data analysis system.

9. The method of claim 7 or 8, wherein the configuration file further comprises: a queue identifier of the data transfer queue between connected plug-ins.

10. A data processing device is characterized in that the device is arranged on electronic equipment running with a data acquisition system or on electronic equipment in communication connection with a data output end of the electronic equipment running with the data acquisition system; the electronic equipment is in communication connection with the electronic equipment running with the data analysis system; the device comprises:

the extraction module is used for extracting specified data from initial data of a target data source within a preset timing duration, and carrying out aggregation processing on the specified data through a preset aggregation algorithm to obtain an aggregation result;

and the output module is used for outputting the aggregation result to the data analysis system when the timing duration is reached.

11. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor, the processor executing the machine executable instructions to implement the data processing method of any one of claims 1 to 9.

12. A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the data processing method of any of claims 1 to 9.