CN114205654A - Data processing system, method, apparatus, computer-readable storage medium, and device

Info

Publication number
CN114205654A
Authority
CN
China
Prior art keywords
data
incremental
cache
plaintext
preset
Legal status
Pending
Application number
CN202111535736.5A
Other languages
Chinese (zh)
Inventor
孙梓洋
余波
俞翔
张永民
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to application CN202111535736.5A
Publication of CN114205654A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/262Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
    • H04N21/26283Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists for associating distribution time parameters to content, e.g. to generate electronic program guide data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/231Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion
    • H04N21/23106Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion involving caching operations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/231Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion
    • H04N21/23109Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion by placing content in organized collections, e.g. EPG data repository
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/231Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion
    • H04N21/23113Content storage operation, e.g. caching movies for short term storage, replicating data over plural servers, prioritizing data for deletion involving housekeeping operations for stored content, e.g. prioritizing content for deletion because of storage space restrictions

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application provides a data processing system, method, and apparatus, a computer-readable storage medium, and an electronic device, relating to the field of computer technology. In the system, the coordinator is used for acquiring incremental plaintext data to be processed and determining the current load of the central processing unit when a trigger starting instruction is received, and for determining a target data collection mode from at least two data collection modes according to the current load of the central processing unit and the incremental plaintext data. The collector is used for reading the incremental plaintext data into the cache according to the target data collection mode, and, when detecting that the incremental plaintext data has the preset identification, transmitting all data in the cache to the sorter and emptying the cache. The sorter is used for cutting the incremental plaintext data into a plurality of data pages to be processed according to a preset cutting rule and generating structured data conforming to a preset format from the plurality of data pages to be processed. The application can thereby improve data acquisition efficiency and data processing efficiency and reduce the difficulty of equipment maintenance.

Description

Data processing system, method, apparatus, computer-readable storage medium, and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing system, a data processing method, a data processing apparatus, a computer-readable storage medium, and an electronic device.
Background
Interactive IP network television (IPTV) is a technology that uses a broadband cable television network and integrates Internet, multimedia, communication, and other technologies to provide home users with a variety of interactive services, including digital television. An Electronic Program Guide (EPG) is the entry interface through which these service capabilities are exposed, supporting user interaction and program selection. Indexing and navigation of the various services provided by IPTV are usually handled by an EPG system; the IPTV EPG can in fact be understood as the portal system of IPTV. The interface of the EPG system is similar to a Web page: it typically presents menus, buttons, links, and other components that the user can click directly when selecting a program, and the user can also browse various dynamic or static multimedia content.
In general, an EPG device provides EPG business services to users, and the data it generates can be used to analyze user behavior or for other analysis purposes. At present, most data generated by EPG devices is stored in the service log of an external container, and retrieving the data also depends on several external devices, which easily leads to high maintenance difficulty and low data acquisition efficiency.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present application and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.
Disclosure of Invention
An object of the present application is to provide a data processing system, a data processing method, a data processing apparatus, a computer-readable storage medium, and an electronic device that rely on a cache to perform real-time data acquisition and data structuring, so that data can be processed more quickly without relying on external devices, thereby improving data acquisition efficiency and data processing efficiency and reducing device maintenance difficulty.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of the present application, there is provided a data processing system comprising at least a coordinator, a collector, and a sorter, wherein:
the coordinator is used for acquiring incremental plaintext data to be processed and determining the current load of the central processing unit when a trigger starting instruction is received; determining a target data acquisition mode from at least two data acquisition modes according to the current load of the central processing unit and the incremental plaintext data;
the collector is used for reading the incremental plaintext data into the cache according to the target data collection mode; when detecting that the incremental plaintext data has the preset identification, transmitting all data in the cache to a sorter, and emptying the cache;
and the sorter is used for cutting the incremental plaintext data into a plurality of data pages to be processed according to a preset cutting rule and generating the structured data which accords with a preset format according to the plurality of data pages to be processed.
In an exemplary embodiment of the present application, the system further includes a data completer. If the collector detects that the cache space occupancy rate is greater than the preset occupancy rate and does not detect the preset identifier, then:
the collector is also used for transmitting at least one piece of complete data to the sorter and emptying the cache when detecting that only at least one piece of complete data exists in the cache;
and the collector is also used for transmitting at least one piece of complete data to the sorter, transmitting the incomplete data to the data completer and emptying the cache when detecting that at least one piece of complete data and incomplete data exist in the cache.
In an exemplary embodiment of the present application, wherein:
the collector is also used for acquiring incomplete data from the data complementer when a new round of cache reading is carried out, and reading the incomplete data and unprocessed residual data in the incremental plaintext data into the cache;
and the collector is also used for transmitting all the data in the cache to the sorter and emptying the cache when detecting that the preset identification exists in the residual data.
In an exemplary embodiment of the present application, the system further comprises a monitor and a recorder, wherein:
the monitor is used for acquiring the data processing progress and the total data amount in the recorder;
the monitor is also used for determining data increment according to the data processing progress and the data total amount;
and the monitor is also used for determining the corresponding relation between the incremental plaintext data corresponding to the data increment and the specific mark, and if the incremental plaintext data does not have the corresponding relation with the specific mark, sending a trigger starting instruction to the coordinator.
In an exemplary embodiment of the present application, wherein,
the coordinator is also used for, before determining a target data acquisition mode from at least two data acquisition modes according to the current load of the central processing unit and the incremental plaintext data, segmenting the incremental plaintext data according to a preset threshold value when detecting that the incremental plaintext data is larger than or equal to the preset threshold value, and determining an initial processing position and a cut-off processing position according to the segmented incremental plaintext data corresponding to the preset threshold value; generating a trigger starting instruction according to the initial processing position and the cut-off processing position, and sending the trigger starting instruction to the collector;
and the collector reads the incremental plaintext data into the cache according to the target data collection mode, and the method comprises the following steps:
the collector reads the incremental plaintext data into the cache based on the trigger starting instruction and according to the target data collection mode.
in an exemplary embodiment of the present application, the coordinator determines a target data collection mode from at least two data collection modes according to the current load of the central processor and incremental plaintext data, including:
when the coordinator detects that the current load of the central processing unit is greater than or equal to a preset load threshold value or the incremental plaintext data is greater than or equal to a preset processing amount, determining a single-thread acquisition mode as a target data acquisition mode;
and when detecting that the incremental plaintext data is less than or equal to the preset processing amount and the current load of the central processing unit is less than the preset load threshold value, the coordinator determines the multithreading collection mode as the target data collection mode.
In an exemplary embodiment of the present application, if the target data collection mode is a multi-thread collection mode, wherein:
the coordinator is also used for determining the number of bytes to be processed corresponding to each thread in the multiple threads according to the number of bytes of the incremental plaintext data;
the coordinator is also used for dividing the incremental plaintext data into a plurality of byte blocks according to the number of bytes to be processed corresponding to each thread and determining byte offset; wherein the number of byte blocks corresponds to the number of threads;
and the collector reads the incremental plaintext data into the cache according to the target data collection mode, and the method comprises the following steps: the collector starts a plurality of threads to read the byte blocks corresponding to the threads into the cache according to the byte offset; and respectively carrying out preset identification detection on the byte blocks corresponding to the threads.
In an exemplary embodiment of the present application, the system further comprises a recorder, wherein:
and the recorder is used for storing the data processing progress and the byte number of the incremental plaintext data and updating the data processing progress after the plurality of threads finish the detection of the preset identification of the corresponding byte block.
In an exemplary embodiment of the present application, wherein:
the recorder is also used for storing the current load of the central processing unit when the existing storage space has the residual space;
and the recorder is also used for covering the load record of a specific position in the existing storage space through the current load of the central processing unit when the existing storage space has no residual space.
According to an aspect of the present application, there is provided a data processing method, characterized in that the method includes:
when a trigger starting instruction is received, obtaining incremental plaintext data to be processed, and determining the current load of a central processing unit;
determining a target data acquisition mode from at least two data acquisition modes according to the current load of the central processing unit and the incremental plaintext data;
reading the incremental plaintext data into a cache according to a target data acquisition mode;
when detecting that the incremental plaintext data has the preset identification, transmitting all data in the cache to a sorter, and emptying the cache;
and cutting the incremental plaintext data into a plurality of data pages to be processed according to a preset cutting rule, and generating structured data conforming to a preset format according to the plurality of data pages to be processed.
According to an aspect of the present application, there is provided a data processing apparatus including:
the data acquisition unit is used for acquiring incremental plaintext data to be processed and determining the current load of the central processing unit when a trigger starting instruction is received;
the data acquisition mode determining unit is used for determining a target data acquisition mode from at least two data acquisition modes according to the current load of the central processing unit and the incremental plaintext data;
the read-in cache unit is used for reading the incremental plaintext data into a cache according to the target data acquisition mode;
the data transmission unit is used for transmitting all data in the cache to the sorter and emptying the cache when detecting that the preset identification exists in the incremental plaintext data;
and the structured data generation unit is used for cutting the incremental plaintext data into a plurality of data pages to be processed according to a preset cutting rule and generating structured data conforming to a preset format according to the plurality of data pages to be processed.
According to an aspect of the present application, there is provided an electronic device including: a processor; and a memory for storing executable instructions for the processor; wherein the processor is configured to perform the method of any of the above via execution of the executable instructions.
According to an aspect of the application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of the above.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations described above.
The exemplary embodiments of the present application may have some or all of the following advantages:
the data processing system provided by an example embodiment of the application at least comprises a coordinator, a collector and a sorter, wherein the coordinator is used for acquiring incremental plaintext data to be processed and determining the current load of a central processing unit when a trigger starting instruction is received; and determining a target data acquisition mode from at least two data acquisition modes according to the current load of the central processing unit and the incremental plaintext data. The collector is used for reading the incremental plaintext data into the cache according to the target data collection mode; and when detecting that the increment plaintext data has the preset identification, transmitting all the data in the cache to the sorter, and emptying the cache. And the sorter is used for cutting the incremental plaintext data into a plurality of data pages to be processed according to a preset cutting rule and generating the structured data which accords with a preset format according to the plurality of data pages to be processed. According to the technical description, on one hand, the instant data acquisition and data structuring are carried out by depending on the cache, so that the data can be processed more quickly under the condition of not depending on external equipment, the data acquisition efficiency and the data processing efficiency can be improved, and the equipment maintenance difficulty is reduced. In another aspect of the present application, consumption of network resources may also be reduced based on the use of cache.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 illustrates a schematic structural diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present application;
FIG. 2 schematically shows a block diagram of a data processing system according to an embodiment of the present application;
FIG. 3 schematically illustrates a block diagram of a data processing system according to another embodiment of the present application;
FIG. 4 schematically shows a sequence diagram of a data processing system according to an embodiment of the present application;
FIG. 5 schematically shows a flow diagram of a data processing method according to an embodiment of the present application;
fig. 6 schematically shows a block diagram of a data processing apparatus in an embodiment according to the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present application.
Furthermore, the drawings are merely schematic illustrations of the present application and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating a computer system suitable for implementing an electronic device according to an embodiment of the present disclosure. The electronic device may be an EPG device.
It should be noted that the computer system 100 of the electronic device shown in fig. 1 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 1, the computer system 100 includes a Central Processing Unit (CPU) 101 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 102 or a program loaded from a storage section 108 into a Random Access Memory (RAM) 103. In the RAM 103, various programs and data necessary for system operation are also stored. The CPU 101, the ROM 102, and the RAM 103 are connected to each other via a bus 104. An input/output (I/O) interface 105 is also connected to the bus 104.
The following components are connected to the I/O interface 105: an input portion 106 including a keyboard, a mouse, and the like; an output section 107 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 108 including a hard disk and the like; and a communication section 109 including a network interface card such as a LAN card, a modem, or the like. The communication section 109 performs communication processing via a network such as the internet. A drive 110 is also connected to the I/O interface 105 as needed. A removable medium 111 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 110 as necessary, so that a computer program read out therefrom is mounted into the storage section 108 as necessary.
In particular, according to embodiments of the present application, the processes described below with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 109, and/or installed from the removable medium 111. The computer program performs various functions defined in the method and apparatus of the present application when executed by a Central Processing Unit (CPU) 101.
Turning to FIG. 2, FIG. 2 schematically illustrates a block diagram of a data processing system according to an embodiment of the present application. As shown in FIG. 2, data processing system 200 may include at least: coordinator 210, collector 220, and sorter 230, wherein:
the coordinator 210 is configured to, when a trigger start instruction is received, obtain incremental plaintext data to be processed, and determine a current load of the central processing unit; and determining a target data acquisition mode from at least two data acquisition modes according to the current load of the central processing unit and the incremental plaintext data.
The collector 220 is used for reading the incremental plaintext data into the cache according to the target data collection mode; when detecting that the preset identification exists in the incremental plaintext data, all data in the cache is transmitted to the sorter 230, and the cache is emptied.
The sorter 230 is configured to cut the incremental plaintext data into a plurality of to-be-processed data pages according to a preset cutting rule, and generate structured data conforming to a preset format according to the plurality of to-be-processed data pages.
Specifically, the coordinator 210, the collector 220, and the sorter 230 shown in fig. 2 may be disposed in an EPG device or in other electronic devices, and the embodiment of the present application is not limited thereto. The trigger starting instruction received by the coordinator 210 may be generated and issued by the monitor and is used to trigger the coordinator 210 to start. After being started, the coordinator 210 acquires the incremental plaintext data to be processed, determines the current load of the central processing unit, and determines a target data acquisition mode from at least two data acquisition modes according to the current load of the central processing unit and the incremental plaintext data. The incremental plaintext data may be various types of log data, may be used to indicate user behavior, and may include one or more pieces of line data, which is not limited in this embodiment of the application.
In addition, the collector 220 may read the incremental plaintext data into the cache according to the target data collection mode; upon detecting the presence of a preset identification (e.g., 0x0a) in the incremental plaintext data, all of the data in the cache is transferred to the sorter 230 and the cache is emptied. The incremental plaintext data may comprise multiple lines of data.
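As an illustrative sketch only (the application publishes no source code), the flush-on-identification behaviour of the collector described above might look roughly like the following Python fragment; the class name, the callback into the sorter, and the constant name are assumptions introduced for illustration, while 0x0a is the example preset identification given in the text.

```python
# Hypothetical sketch of the collector's flush-on-identification behaviour.
# PRESET_ID models the preset identification 0x0a (a line feed) from the text;
# every other name here is an illustrative assumption.
PRESET_ID = 0x0a

class Collector:
    def __init__(self, deliver_to_sorter):
        self.deliver_to_sorter = deliver_to_sorter  # callback into the sorter
        self.cache = bytearray()                    # the in-memory cache

    def read_chunk(self, chunk: bytes) -> None:
        """Read a slice of incremental plaintext data into the cache; when the
        preset identification appears, hand all cached data to the sorter and
        empty the cache."""
        self.cache.extend(chunk)
        if PRESET_ID in chunk:
            self.deliver_to_sorter(bytes(self.cache))
            self.cache.clear()

# Usage example (sorter replaced by print for illustration):
# collector = Collector(deliver_to_sorter=print)
# collector.read_chunk(b"192.168.0.1|...|watch_movie\n")
```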
Furthermore, the sorter 230 may cut the incremental plaintext data into a plurality of to-be-processed data pages according to a preset cutting rule, and generate the structured data conforming to the preset format according to the plurality of to-be-processed data pages. The preset cutting rule may define a cutting mode for the incremental plaintext data, and the preset cutting rule may be represented by a character string, a text, or the like, which is not limited in the embodiment of the present application. In addition, the plurality of data pages to be processed may conform to a preset data page representation form, and the structured data conforming to the preset format may be represented in a form of a character string, a table, and the like, which is not limited in the embodiment of the present application; for example, the structured data may be represented in the form [metadata, data content]. The preset format may include one or more types, and optionally, before the sorter 230 generates the structured data conforming to the preset format according to the plurality of data pages to be processed, the sorter may be further configured to select a corresponding preset format from the plurality of preset formats based on the data type of the incremental plaintext data.
In addition, optionally, the sorter 230 generates structured data conforming to a preset format according to a plurality of data pages to be processed, which specifically includes: the sorter 230 acquires key information from a plurality of data pages to be processed according to a first preset rule; the sorter 230 deletes redundant information in the plurality of data pages to be processed according to a second preset rule; the sorter 230 generates a character string array including an index, and acquires data corresponding to each metadata from a plurality of to-be-processed data pages obtained after processing based on a first preset rule and a second preset rule according to a metadata correspondence relationship, so as to obtain structured data conforming to a preset format.
For example, incremental plaintext data may be represented as follows:
[Example incremental plaintext data (shown only as an image in the original publication)]
furthermore, the sorter 230 may cut the incremental plaintext data into a plurality of to-be-processed data pages according to a preset cutting rule, where the plurality of to-be-processed data pages may be represented as follows:
[Example data pages to be processed (shown only as an image in the original publication)]
further, the sorter 230 may acquire key information from the plurality of data sheets to be processed according to a first preset rule. For example, the first preset rule may include: the user IP is obtained from page 0, the user account is obtained from page 4, and the contents of all pages after page 7 are spliced. The key information may be expressed as follows:
[Example key information (shown only as an image in the original publication)]
furthermore, the sorter 230 may delete redundant information in the plurality of data pages to be processed according to a second preset rule. For example, the second preset rule may include: the deleted characters pos [0] and pos [ strlen (str) ] are written back again to page 7. Furthermore, the sorter 230 may generate a character string array including an index, and obtain data corresponding to each metadata from a plurality of to-be-processed data pages obtained after processing based on a first preset rule and a second preset rule according to a metadata correspondence relationship, so as to obtain structured data conforming to a preset format, where the structured data may be represented as follows:
[Example structured data (shown only as an image in the original publication)]
further, the generated structured data may be persisted for subsequent data analysis.
Therefore, by implementing the data processing system shown in fig. 2, the cache is relied on for real-time data acquisition and data structuring, so that data can be processed more quickly without relying on external equipment, the data acquisition efficiency and the data processing efficiency can be improved, and the equipment maintenance difficulty can be reduced. In addition, consumption of network resources can be reduced based on the use of the cache.
Referring to fig. 3, fig. 3 schematically shows a block diagram of a data processing system according to another embodiment of the present application. As shown in fig. 3, data processing system 300 may include at least: coordinator 320, collector 340, sorter 330, data completer 350, monitor 310, and recorder 360. It should be noted that the monitor 310 may rely on a monitoring instance (watch) to perform the corresponding steps, while the coordinator 320, the collector 340, the sorter 330, the data completer 350 and the recorder 360 may rely on a data processing instance (Worker) to execute the corresponding steps; the monitoring instance (watch) is resident in memory and can start the coordinator 320 of the Worker at a preset unit time interval (for example, 30 s).
A monitor 310 for acquiring the progress of data processing and the total amount of data in the recorder 360; determining data increment according to the data processing progress and the data total amount; and determining the corresponding relationship between the incremental plaintext data corresponding to the data increment and the specific mark, and if the incremental plaintext data does not have the corresponding relationship with the specific mark, sending a trigger starting instruction to the coordinator 320.
The coordinator 320 is used for acquiring incremental plaintext data to be processed and determining the current load of the central processing unit when a trigger starting instruction is received; when the current load of the central processing unit is detected to be greater than or equal to a preset load threshold value or the incremental plaintext data is detected to be greater than or equal to a preset processing amount, determining a single-thread acquisition mode as a target data acquisition mode; and when the incremental plaintext data is detected to be less than or equal to the preset processing amount and the current load of the central processing unit is detected to be less than the preset load threshold value, determining the multithreading collection mode as the target data collection mode.
The collector 340 is used for reading the incremental plaintext data into the cache according to the target data collection mode; when detecting that the preset identification exists in the incremental plaintext data, all data in the cache is transmitted to the sorter 330, and the cache is emptied.
The sorter 330 is configured to cut the incremental plaintext data into a plurality of to-be-processed data pages according to a preset cutting rule, and generate structured data conforming to a preset format according to the plurality of to-be-processed data pages.
The recorder 360 is configured to store the data processing progress and the number of bytes of incremental plaintext data and update the data processing progress after the multiple threads complete the preset identification detection for the respective corresponding byte block; when the existing storage space has residual space, storing the current load of the central processing unit; and when the existing storage space has no residual space, covering the load record of a specific position in the existing storage space by the current load of the central processing unit.
Specifically, the coordinator 320, the collector 340, the sorter 330, the data completer 350, the monitor 310, and the recorder 360 shown in fig. 3 may be disposed in an EPG device or in other electronic devices, and the embodiment of the present application is not limited thereto. The monitor 310 may obtain the data processing progress and the total amount of data in the recorder 360, specifically as follows: the monitor 310 obtains the data processing progress and the total amount of data in the recorder 360 at a preset time interval WorkerInterval, measured in milliseconds (ms); the data processing progress can be determined from the byte position of the pointer, and the total amount of data can be represented as the current total number of bytes.
Based on this, the monitor 310 determines the data increment according to the data processing progress and the total amount of data, namely: the monitor 310 calculates the data increment according to the expression (data increment = total data amount - data processing progress). Further, the monitor 310 may also determine the correspondence between the incremental plaintext data corresponding to the data increment and the specific mark; the specific mark may be used to indicate whether the data is locked, and optionally may also be used to indicate whether the data corresponds to a specific user or the like, which is not limited in the embodiments of the present application. When the incremental plaintext data does not correspond to the specific mark, the monitor 310 may send a trigger starting instruction to the coordinator 320. When the incremental plaintext data corresponds to the specific mark, or the data increment is 0, the process ends.
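A minimal sketch of the monitor's trigger decision, assuming simple Recorder and Coordinator interfaces and treating the specific mark as a lock-style flag; only the increment formula (total data amount minus data processing progress) comes from the text, and every name below is an assumption.

```python
# Hypothetical sketch of the monitor's decision in one watch round.
def watch_once(recorder, coordinator, is_marked) -> None:
    progress = recorder.data_processing_progress  # byte position of the pointer
    total = recorder.total_bytes                  # current total number of bytes
    increment = total - progress                  # data increment
    if increment <= 0:
        return                                    # nothing new: end this round
    if not is_marked(progress, increment):        # no specific mark (e.g. lock)
        coordinator.start(progress, increment)    # send the trigger instruction
```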
Further, after receiving the trigger starting instruction, the coordinator 320 may obtain the incremental plaintext data to be processed based on a preset file log path and determine the current load of the central processing unit (i.e., the current CPU load). When detecting that the current load of the central processing unit is greater than or equal to a preset load threshold value, or that the incremental plaintext data is greater than or equal to a preset processing amount (BurstLimitSize), the coordinator determines the single-thread collection mode (Lazy mode) as the target data collection mode; when detecting that the incremental plaintext data is less than or equal to the preset processing amount (BurstLimitSize) and the current load of the central processing unit is less than the preset load threshold value, it determines the multithread collection mode (Burst mode) as the target data collection mode. The preset load threshold and the preset processing amount can be any preset values, and the preset processing amount (BurstLimitSize) is measured in megabytes (MB). Optionally, the coordinator 320 may further be configured to calculate a target value according to the expression (first target value = [current CPU load / 100 x 10]); it acquires a preset number (for example, 30) of target values in time order and determines the mode of these target values. If the detected mode is less than the preset load threshold, it is judged that the current load of the central processing unit is less than the preset load threshold; if the detected mode is greater than or equal to the preset load threshold, it is judged that the current load of the central processing unit is greater than or equal to the preset load threshold.
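The mode-selection logic might be sketched as follows; the load threshold, the BurstLimitSize default, the use of statistics.mode over the last 30 samples, and the upward rounding of the target value are assumptions standing in for preset values that the publication does not specify.

```python
# Sketch of the coordinator's mode selection under stated assumptions.
import math
import statistics

def first_target_value(cpu_load_percent: float) -> int:
    # first target value = [current CPU load / 100 x 10]; rounding up is assumed
    return math.ceil(cpu_load_percent / 100 * 10)

def choose_mode(recent_loads: list[float], increment_mb: float,
                load_threshold: int = 7, burst_limit_mb: float = 64.0) -> str:
    """Return 'lazy' (single-thread) or 'burst' (multi-thread) collection."""
    samples = [first_target_value(x) for x in recent_loads[-30:]]
    load_mode = statistics.mode(samples)        # mode of the last 30 target values
    if load_mode >= load_threshold or increment_mb >= burst_limit_mb:
        return "lazy"                           # single-thread collection mode
    return "burst"                              # multi-thread collection mode
```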
Optionally, before determining the target data collection mode from the at least two data collection modes according to the current load of the central processing unit and the incremental plaintext data, the coordinator 320 may be further configured to, when detecting that the incremental plaintext data is greater than or equal to a preset threshold (InputMultipleMax), segment the incremental plaintext data according to the preset threshold and determine an initial processing position and a cut-off processing position according to the segmented incremental plaintext data corresponding to the preset threshold; the preset threshold may be the maximum processing data amount (BurstLimitSize) or any preset value, and the preset threshold (InputMultipleMax) is measured in megabytes (MB). Further, a trigger starting instruction is generated according to the initial processing position and the cut-off processing position and sent to the collector 340, so that memory overflow caused by overly large data, and the energy consumption impact of long-running data processing, can be avoided; the initial processing position and the cut-off processing position indicate where the data is stored.
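A sketch of the segmentation step under assumptions: InputMultipleMax is treated as a per-round byte budget, and the initial and cut-off processing positions are expressed as byte offsets into the log; the constant value and the MB conversion are illustrative, not values published in the application.

```python
# Sketch of segmenting an oversized increment into one processing window.
INPUT_MULTIPLE_MAX_MB = 32  # assumed value of the preset threshold

def segment_increment(start_offset: int, increment_bytes: int,
                      max_mb: int = INPUT_MULTIPLE_MAX_MB) -> tuple[int, int]:
    """Return (initial processing position, cut-off processing position)."""
    max_bytes = max_mb * 1024 * 1024
    if increment_bytes >= max_bytes:
        increment_bytes = max_bytes   # cap one round to avoid memory overflow
    return start_offset, start_offset + increment_bytes
```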
If the target data acquisition mode is the single-thread acquisition mode (Lazy mode), the collector 340 reads the incremental plaintext data into the cache according to the target data acquisition mode as follows: the data to be processed is determined according to the initial processing position and the cut-off processing position and read into the cache according to the target data acquisition mode, which reduces the occupancy of computer resources; the data to be processed may be a part of the incremental plaintext data or all of it, and the embodiment of the present application is not limited in this respect. Upon detecting the presence of the preset identification in the incremental plaintext data, the collector 340 may transmit all data in the cache to the sorter 330 and empty the cache.
If the target data acquisition mode is the multi-thread acquisition mode (Burst mode): the coordinator 320 is further configured to determine, according to the number of bytes of the incremental plaintext data, the number of bytes to be processed by each of the multiple threads; the coordinator 320 is further configured to divide the incremental plaintext data into a plurality of byte blocks according to the number of bytes to be processed by each thread and to determine the byte offsets, so that data processing efficiency can be improved; the number of byte blocks corresponds to the number of threads. Each thread is assigned a different number (e.g., 1, 2, 3, ...), the threads are ordered by these numbers, and each thread processes the byte block corresponding to its own number.
Based on this, the collector 340 further reads the incremental plaintext data into the cache according to the target data collection mode, and may perform: the collector 340 obtains byte blocks to be processed based on a preset file log path, and starts a plurality of threads to read the respective corresponding byte blocks into the cache at the same time according to the byte offset; and respectively carrying out preset identification detection on the byte blocks corresponding to the threads.
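A sketch of the Burst-mode collection under assumptions about the thread count and the log being a plain file readable by byte offset: the increment is divided into one byte block per thread, and each thread reads its block into its own slot of the cache at its byte offset. Function and parameter names are illustrative.

```python
# Sketch of the multi-thread (Burst mode) byte-block split and read.
import threading

def split_into_blocks(start: int, total_bytes: int,
                      threads: int) -> list[tuple[int, int]]:
    """Return (byte offset, block length) pairs, one per thread, in number order."""
    base, extra = divmod(total_bytes, threads)
    blocks, offset = [], start
    for i in range(threads):
        length = base + (1 if i < extra else 0)
        blocks.append((offset, length))
        offset += length
    return blocks

def burst_read(log_path: str, start: int, total_bytes: int,
               threads: int = 4) -> list[bytes]:
    """Each thread reads its byte block into its slot of a shared cache list;
    the preset-identification scan of each block would follow."""
    blocks = split_into_blocks(start, total_bytes, threads)
    cache = [b""] * threads

    def worker(i: int, offset: int, length: int) -> None:
        with open(log_path, "rb") as f:
            f.seek(offset)
            cache[i] = f.read(length)

    workers = [threading.Thread(target=worker, args=(i, off, ln))
               for i, (off, ln) in enumerate(blocks)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return cache
```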
Further, the collector 340 may also transmit all data in the cache to the sorter 330 and empty the cache when detecting that the incremental plaintext data has a preset identifier (e.g., 0x0a).
In the process of reading the incremental plaintext data into the cache according to the target data acquisition mode, if the collector 340 detects that the cache space occupancy rate is greater than the preset occupancy rate without detecting the preset identifier, the collector 340 is further configured to transmit at least one piece of complete data to the sorter 330 and empty the cache when detecting that only at least one piece of complete data exists in the cache; and, when detecting that at least one piece of complete data and a piece of incomplete data exist in the cache, to transmit the at least one piece of complete data to the sorter 330, transmit the incomplete data to the data completer 350, and empty the cache. Further optionally, when a new round of cache reading is performed, the collector 340 obtains the incomplete data from the data completer 350 and reads the incomplete data and the unprocessed remaining data in the incremental plaintext data into the cache; when detecting that the preset identifier exists in the remaining data, the collector 340 transmits all data in the cache to the sorter 330 and empties the cache.
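A sketch of this overflow handling, assuming the preset identification is a newline byte and that the data completer simply holds the pending tail between rounds; the names and the handling of a cache with no complete data at all are assumptions.

```python
# Sketch of draining the cache when its occupancy passes the preset occupancy
# rate without the preset identification having been seen.
PRESET_ID = b"\n"  # assumed preset identification (0x0a)

class DataCompleter:
    """Holds incomplete data until the next round of cache reading."""
    def __init__(self):
        self.pending = b""

def drain_on_overflow(cache: bytearray, send_to_sorter,
                      completer: DataCompleter) -> None:
    data = bytes(cache)
    cut = data.rfind(PRESET_ID)
    if cut >= 0:
        send_to_sorter(data[:cut + 1])      # at least one piece of complete data
        completer.pending = data[cut + 1:]  # incomplete tail, if any
    else:
        completer.pending = data            # nothing complete yet (assumed case)
    cache.clear()                           # empty the cache

def next_round_prefix(completer: DataCompleter) -> bytes:
    """Incomplete data fetched back from the completer before new reading."""
    pending, completer.pending = completer.pending, b""
    return pending
```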
Furthermore, the sorter 330 may cut the incremental plaintext data into a plurality of data pages to be processed according to a preset cutting rule, and generate structured data conforming to a preset format (e.g., a key-value format) from the plurality of data pages to be processed. For example, the size of a data page to be processed may be 1 KB. The sorter 330 may also output the structured data conforming to the preset format to a data store (e.g., Redis/DB/HDFS, etc.).
In addition, after the multiple threads complete the preset identification detection for the respective corresponding byte block, the recorder 360 stores the data processing progress and the byte number of the incremental plaintext data and updates the data processing progress; when the existing storage space has residual space, storing the current load of the central processing unit; and when the existing storage space has no residual space, covering the load record of a specific position in the existing storage space by the current load of the central processing unit.
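A sketch of the recorder under the assumption of a fixed-capacity in-memory load history: when there is residual space the current CPU load is appended, otherwise it overwrites the record at a specific slot (here, the oldest). The capacity and the oldest-slot policy are assumptions.

```python
# Sketch of the recorder's progress bookkeeping and load-record overwriting.
class Recorder:
    def __init__(self, load_capacity: int = 30):
        self.data_processing_progress = 0
        self.total_bytes = 0
        self.load_records: list[float] = []
        self._capacity = load_capacity
        self._next_slot = 0

    def update_progress(self, progress: int, total_bytes: int) -> None:
        """Store the progress and byte count once all threads have finished the
        preset-identification detection of their byte blocks."""
        self.data_processing_progress = progress
        self.total_bytes = total_bytes

    def record_load(self, cpu_load: float) -> None:
        if len(self.load_records) < self._capacity:        # residual space left
            self.load_records.append(cpu_load)
        else:                                              # no residual space:
            self.load_records[self._next_slot] = cpu_load  # overwrite one slot
            self._next_slot = (self._next_slot + 1) % self._capacity
```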
Therefore, by implementing the data processing system shown in fig. 3, the data processing system relies on the cache to perform real-time data acquisition and data structuring, so that the data can be processed more quickly without relying on external equipment, the data acquisition efficiency and the data processing efficiency can be improved, and the equipment maintenance difficulty can be reduced. In addition, consumption of network resources can be reduced based on the use of the cache.
Referring to fig. 4, fig. 4 schematically shows a sequence diagram of a data processing system according to an embodiment of the present application. As shown in fig. 4, the sequence diagram may include: step S410 to step S470.
Step S410: the monitor acquires the data processing progress and the total data amount in the recorder; determining data increment according to the data processing progress and the data total amount; and determining the corresponding relation between the incremental plaintext data corresponding to the data increment and the specific mark, and if the incremental plaintext data does not have the corresponding relation with the specific mark, sending a trigger starting instruction to the coordinator.
Step S420: when receiving a trigger starting instruction, the coordinator acquires incremental plaintext data to be processed and determines the current load of the central processing unit; when the current load of the central processing unit is detected to be greater than or equal to a preset load threshold value or the incremental plaintext data is detected to be greater than or equal to a preset processing amount, determining a single-thread acquisition mode as a target data acquisition mode; and when the incremental plaintext data is detected to be less than or equal to the preset processing amount and the current load of the central processing unit is detected to be less than the preset load threshold value, determining the multithreading collection mode as the target data collection mode.
Step S430: the collector reads the incremental plaintext data into the cache according to the target data collection mode; and when detecting that the incremental plaintext data has the preset identification, transmits all the data in the cache to the sorter and empties the cache.
Step S440: the sorter cuts the incremental plaintext data into a plurality of data pages to be processed according to a preset cutting rule, and generates structured data conforming to a preset format according to the plurality of data pages to be processed.
Step S450: when detecting that only at least one piece of complete data exists in the cache, the collector transmits the at least one piece of complete data to the sorter and clears the cache; when detecting that at least one piece of complete data and incomplete data exist in the cache, transmitting the at least one piece of complete data to the sorter, transmitting the incomplete data to the data completer, and emptying the cache.
Step S460: when a new round of cache reading is carried out, the collector acquires incomplete data from the data complementer and reads the incomplete data and unprocessed residual data in the incremental plaintext data into the cache; and when detecting that the preset identification exists in the residual data, transmitting all the data in the cache to the sorter, and emptying the cache.
Step S470: after the plurality of threads finish the preset identification detection for their corresponding byte blocks, the recorder stores the data processing progress and the byte number of the incremental plaintext data and updates the data processing progress; when the existing storage space has residual space, the recorder stores the current load of the central processing unit; and when the existing storage space has no residual space, the recorder covers the load record of a specific position in the existing storage space with the current load of the central processing unit.
It should be noted that steps S410 to S470 correspond to the steps implemented by the system shown in fig. 3, and for the specific implementation of steps S410 to S470, please refer to the steps implemented by the system shown in fig. 3 and the embodiments thereof, which are not described herein again.
Therefore, by implementing the data processing system shown in fig. 4, the cache is relied on for real-time data acquisition and data structuring, so that data can be processed more quickly without relying on external equipment, the data acquisition efficiency and the data processing efficiency can be improved, and the equipment maintenance difficulty can be reduced. In addition, consumption of network resources can be reduced based on the use of the cache.
Referring to fig. 5, fig. 5 schematically shows a flow chart of a data processing method according to an embodiment of the present application. As shown in fig. 5, the data processing method may include: step S510 to step S550.
Step S510: and when a trigger starting instruction is received, acquiring incremental plaintext data to be processed, and determining the current load of the central processing unit.
Step S520: and determining a target data acquisition mode from at least two data acquisition modes according to the current load of the central processing unit and the incremental plaintext data.
Step S530: and reading the incremental plaintext data into a cache according to the target data acquisition mode.
Step S540: when detecting that the incremental plaintext data has the preset identification, transmitting all the data in the cache to the sorter, and emptying the cache.
Step S550: and cutting the incremental plaintext data into a plurality of data pages to be processed according to a preset cutting rule, and generating structured data conforming to a preset format according to the plurality of data pages to be processed.
It should be noted that steps S510 to S550 correspond to the steps implemented by the system shown in fig. 2, and for the specific implementation of steps S510 to S550, please refer to the steps implemented by the system shown in fig. 2 and the embodiment thereof, which are not described herein again.
Therefore, by implementing the data processing method shown in fig. 5, the cache is relied on for real-time data acquisition and data structuring, so that data can be processed more quickly without relying on external equipment, the data acquisition efficiency and the data processing efficiency can be improved, and the equipment maintenance difficulty can be reduced. In addition, consumption of network resources can be reduced based on the use of the cache.
Referring to fig. 6, fig. 6 schematically shows a block diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 6, the data processing apparatus 600 may include: the device comprises a data acquisition unit 610, a data acquisition mode determination unit 620, a read-in buffer unit 630, a data transmission unit 640 and a structured data generation unit 650.
The data acquisition unit 610 is used for acquiring incremental plaintext data to be processed and determining the current load of the central processing unit when a trigger starting instruction is received;
a data acquisition mode determining unit 620, configured to determine a target data acquisition mode from at least two data acquisition modes according to the current load of the central processing unit and the incremental plaintext data;
a read-in cache unit 630, configured to read the incremental plaintext data into a cache according to the target data acquisition mode;
the data transmission unit 640 is configured to transmit all data in the cache to the sorter and empty the cache when detecting that the incremental plaintext data has the preset identifier;
the structured data generating unit 650 is configured to cut the incremental plaintext data into a plurality of data pages to be processed according to a preset cutting rule, and generate structured data conforming to a preset format according to the plurality of data pages to be processed.
Therefore, by implementing the data processing device shown in fig. 6, the data processing device relies on the cache to perform real-time data acquisition and data structuring, so that the data can be processed more quickly without relying on external equipment, the data acquisition efficiency and the data processing efficiency can be improved, and the equipment maintenance difficulty can be reduced. In addition, consumption of network resources can be reduced based on the use of the cache.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Since the functional modules of the data processing apparatus of the exemplary embodiment of the present application correspond to the steps of the exemplary embodiment of the data processing system described above, for details not disclosed in the apparatus embodiments of the present application, reference is made to the embodiments of the data processing system described above.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
It should be noted that the computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of these units do not, in any case, constitute a limitation on the units themselves.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (13)

1. A data processing system, characterized in that the system comprises at least a coordinator, a collector and a sorter, wherein:
the coordinator is used for acquiring incremental plaintext data to be processed and determining the current load of the central processing unit when a trigger starting instruction is received; determining a target data acquisition mode from at least two data acquisition modes according to the current load of the central processing unit and the incremental plaintext data;
the collector is used for reading the incremental plaintext data into a cache according to the target data acquisition mode; and, when detecting that the incremental plaintext data has a preset identifier, transmitting all data in the cache to the sorter and emptying the cache;
and the sorter is used for cutting the incremental plaintext data into a plurality of data pages to be processed according to a preset cutting rule and generating structured data conforming to a preset format according to the plurality of data pages to be processed.
2. The system of claim 1, further comprising a data completer, wherein, if the collector detects that the occupancy rate of the cache space is greater than a preset occupancy rate and the preset identifier has not been detected:
the collector is further configured to, when it is detected that only complete data (at least one piece) exists in the cache, transmit the at least one piece of complete data to the sorter and empty the cache;
the collector is further configured to, when it is detected that the cache contains at least one piece of complete data together with incomplete data, transmit the at least one piece of complete data to the sorter, transmit the incomplete data to the data completer, and empty the cache.
3. The system of claim 2, wherein:
the collector is further configured to obtain the incomplete data from the data completer when a new round of cache reading is performed, and read the incomplete data and unprocessed remaining data in the incremental plaintext data into the cache;
the collector is further configured to transmit all data in the cache to the sorter and empty the cache when detecting that the preset identifier exists in the remaining data.
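Purely as an illustration of the behaviour recited in claims 2 and 3, and not as the claimed implementation, the following Python sketch assumes newline-terminated records; the delimiter, the function names, and the handling of a cache that holds no complete piece of data at all are assumptions.

```python
# Hypothetical sketch of the cache-flush behaviour of claims 2 and 3; not taken from the application.
RECORD_DELIMITER = b"\n"   # assumption: one complete piece of data ends with a newline


def flush_on_high_occupancy(cache: bytearray, sorter_inbox: list, completer: bytearray) -> None:
    """Send complete data to the sorter, park the incomplete tail in the data completer, empty the cache."""
    last = cache.rfind(RECORD_DELIMITER)
    if last == -1:
        completer.extend(cache)                        # assumed extra case: nothing complete yet
    else:
        sorter_inbox.append(bytes(cache[:last + 1]))   # at least one piece of complete data
        completer.extend(cache[last + 1:])             # incomplete data goes to the data completer
    cache.clear()


def start_new_round(cache: bytearray, completer: bytearray, remaining: bytes) -> None:
    """Claim 3: read the parked incomplete data plus the unprocessed remainder into the cache."""
    cache.extend(completer)
    completer.clear()
    cache.extend(remaining)
```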
4. The system of claim 1, further comprising a monitor and a recorder, wherein:
the monitor is used for acquiring the data processing progress and the total data amount in the recorder;
the monitor is further used for determining data increment according to the data processing progress and the data total amount;
the monitor is further configured to determine whether there is a corresponding relationship between the incremental plaintext data corresponding to the data increment and a specific mark, and to send the trigger starting instruction to the coordinator if the incremental plaintext data has no corresponding relationship with the specific mark.
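Read purely as an illustration of the monitor check in claim 4 (the mark registry, all names, and the callback are assumptions, not limitations of the claim), the logic resembles:

```python
# Hypothetical sketch of the monitor check in claim 4; the mark registry and all names are assumptions.
def monitor_tick(total_amount: int, processing_progress: int,
                 increment_id: str, marked_ids: set, send_trigger) -> None:
    """Derive the data increment and trigger the coordinator only for unmarked increments."""
    data_increment = total_amount - processing_progress   # increment from progress vs. total amount
    if data_increment > 0 and increment_id not in marked_ids:
        send_trigger(increment_id)   # no corresponding relationship with the specific mark yet
```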
5. The system of claim 1, wherein,
the coordinator is further configured to, before determining a target data acquisition mode from at least two data acquisition modes according to the current load of the central processing unit and the incremental plaintext data, segment the incremental plaintext data according to a preset threshold when it is detected that the incremental plaintext data is greater than or equal to the preset threshold, and determine a start processing position and a truncation processing position according to the segmented incremental plaintext data corresponding to the preset threshold; and to generate a trigger starting instruction according to the start processing position and the truncation processing position and send the trigger starting instruction to the collector;
wherein the collector reading the incremental plaintext data into a cache according to the target data acquisition mode comprises:
the collector reads the incremental plaintext data into the cache based on the trigger starting instruction and according to the target data acquisition mode;
and the monitor generates a trigger starting instruction according to the start processing position and the truncation processing position and sends the trigger starting instruction to the coordinator.
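As a non-limiting illustration of the threshold-based segmentation in claim 5 (the threshold value and all names below are hypothetical assumptions), one possible sketch:

```python
# Hypothetical sketch of the threshold-based segmentation in claim 5; threshold and names are assumptions.
from typing import List, Tuple

PRESET_THRESHOLD = 128 * 1024 * 1024   # assumed preset threshold: 128 MiB


def plan_trigger_positions(total_bytes: int, threshold: int = PRESET_THRESHOLD) -> List[Tuple[int, int]]:
    """Return (start_processing_position, truncation_processing_position) pairs.

    Each pair would back one trigger starting instruction sent to the collector.
    """
    if total_bytes < threshold:
        return [(0, total_bytes)]      # below the threshold: no segmentation needed
    positions, start = [], 0
    while start < total_bytes:
        end = min(start + threshold, total_bytes)
        positions.append((start, end))
        start = end
    return positions
```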
6. The system of claim 1, wherein the coordinator determining a target data acquisition mode from at least two data acquisition modes according to the current load of the central processing unit and the incremental plaintext data comprises:
when the coordinator detects that the current load of the central processing unit is greater than or equal to a preset load threshold, or that the incremental plaintext data is greater than or equal to a preset processing amount, the coordinator determines a single-thread acquisition mode as the target data acquisition mode;
and when detecting that the incremental plaintext data is less than or equal to the preset processing amount and the current load of the central processing unit is less than the preset load threshold, the coordinator determines a multi-thread acquisition mode as the target data acquisition mode.
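Purely for illustration of the selection rule recited in claim 6 (the threshold values, names, and return labels below are hypothetical assumptions, not limitations of the claim), the rule can be sketched as:

```python
# Hypothetical sketch of the acquisition-mode selection in claim 6; thresholds are assumptions.
PRESET_LOAD_THRESHOLD = 0.75                  # assumed preset load threshold (75% CPU load)
PRESET_PROCESSING_AMOUNT = 256 * 1024 * 1024  # assumed preset processing amount, in bytes


def determine_target_mode(cpu_load: float, incremental_bytes: int) -> str:
    """Return the target data acquisition mode chosen by the coordinator."""
    if cpu_load >= PRESET_LOAD_THRESHOLD or incremental_bytes >= PRESET_PROCESSING_AMOUNT:
        return "single-thread"   # heavy CPU load or a large increment
    return "multi-thread"        # light load and a small increment
```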
7. The system of claim 6, wherein, if the target data acquisition mode is the multi-thread acquisition mode:
the coordinator is further configured to determine, according to the number of bytes of the incremental plaintext data, the number of bytes to be processed corresponding to each thread of a plurality of threads;
the coordinator is further configured to divide the incremental plaintext data into a plurality of byte blocks according to the number of bytes to be processed corresponding to each thread, and to determine a byte offset, wherein the number of the plurality of byte blocks is the same as the number of the plurality of threads;
and the collector reading the incremental plaintext data into the cache according to the target data acquisition mode comprises: the collector starts the plurality of threads to read the byte blocks corresponding to the threads into the cache according to the byte offset, and preset identifier detection is respectively performed on the byte blocks corresponding to the threads.
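As a non-limiting illustration of the per-thread byte-block division in claim 7 (the even-split policy and all names are assumptions), one possible sketch:

```python
# Hypothetical sketch of the per-thread byte-block division in claim 7; names and policy are assumptions.
from typing import List, Tuple


def divide_into_byte_blocks(total_bytes: int, num_threads: int) -> List[Tuple[int, int]]:
    """Return (byte_offset, block_length) pairs, one block per thread."""
    base, remainder = divmod(total_bytes, num_threads)
    blocks, offset = [], 0
    for i in range(num_threads):
        length = base + (1 if i < remainder else 0)   # spread the remainder over the first threads
        blocks.append((offset, length))
        offset += length
    return blocks


# Example: a 10-byte increment over 3 threads -> [(0, 4), (4, 3), (7, 3)]
assert divide_into_byte_blocks(10, 3) == [(0, 4), (4, 3), (7, 3)]
```

Each thread would then read its block into the cache at the stored byte offset and run the preset identifier detection on that block only.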
8. The system of claim 7, further comprising a recorder, wherein:
and the recorder is used for storing the data processing progress and the number of bytes of the incremental plaintext data, and for updating the data processing progress after the plurality of threads finish the preset identifier detection for their corresponding byte blocks.
9. The system of claim 8, wherein:
the recorder is further configured to store the current load of the central processing unit when the existing storage space has remaining space;
the recorder is further configured to overwrite the load record at a specific location in the existing storage space with the current load of the central processing unit when the existing storage space has no remaining space.
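Read purely as an illustration (the capacity and the choice of which "specific location" to overwrite are assumptions, not limitations of the claim), the recorder behaviour of claim 9 resembles a fixed-size ring buffer:

```python
# Hypothetical sketch of the load-record storage in claim 9; capacity and overwrite policy are assumptions.
class LoadRecorder:
    def __init__(self, capacity: int = 1024):
        self.capacity = capacity
        self.records: list = []
        self._next = 0   # the "specific location" to overwrite once no space remains

    def record(self, cpu_load: float) -> None:
        if len(self.records) < self.capacity:
            self.records.append(cpu_load)            # remaining space: simply store the load
        else:
            self.records[self._next] = cpu_load      # no remaining space: overwrite a specific location
            self._next = (self._next + 1) % self.capacity
```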
10. A method of data processing, the method comprising:
when a trigger starting instruction is received, obtaining incremental plaintext data to be processed, and determining the current load of a central processing unit;
determining a target data acquisition mode from at least two data acquisition modes according to the current load of the central processing unit and the incremental plaintext data;
reading the incremental plaintext data into a cache according to the target data acquisition mode;
when detecting that the incremental plaintext data has a preset identifier, transmitting all data in the cache to a sorter, and emptying the cache;
and cutting the incremental plaintext data into a plurality of data pages to be processed according to a preset cutting rule, and generating structured data conforming to a preset format according to the plurality of data pages to be processed.
11. A data processing apparatus, characterized in that the apparatus comprises:
the data acquisition unit is used for acquiring incremental plaintext data to be processed and determining the current load of the central processing unit when a trigger starting instruction is received;
the data acquisition mode determining unit is used for determining a target data acquisition mode from at least two data acquisition modes according to the current load of the central processing unit and the incremental plaintext data;
the read-in cache unit is used for reading the incremental plaintext data into a cache according to the target data acquisition mode;
the data transmission unit is used for transmitting all data in the cache to a sorter and emptying the cache when detecting that a preset identifier exists in the incremental plaintext data;
and the structured data generation unit is used for cutting the incremental plaintext data into a plurality of data pages to be processed according to a preset cutting rule and generating structured data conforming to a preset format according to the plurality of data pages to be processed.
12. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps performed by the system according to any one of claims 1-9 or the steps of the method according to claim 10.
13. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the steps performed by the system of any one of claims 1-9 or the method of claim 10 via execution of the executable instructions.
CN202111535736.5A 2021-12-15 2021-12-15 Data processing system, method, apparatus, computer-readable storage medium, and device Pending CN114205654A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111535736.5A CN114205654A (en) 2021-12-15 2021-12-15 Data processing system, method, apparatus, computer-readable storage medium, and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111535736.5A CN114205654A (en) 2021-12-15 2021-12-15 Data processing system, method, apparatus, computer-readable storage medium, and device

Publications (1)

Publication Number Publication Date
CN114205654A true CN114205654A (en) 2022-03-18

Family

ID=80654038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111535736.5A Pending CN114205654A (en) 2021-12-15 2021-12-15 Data processing system, method, apparatus, computer-readable storage medium, and device

Country Status (1)

Country Link
CN (1) CN114205654A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101958934A (en) * 2010-09-21 2011-01-26 中兴通讯股份有限公司 Electronic program guide incremental content synchronization method, device and system
CN104090891A (en) * 2013-12-12 2014-10-08 深圳市腾讯计算机系统有限公司 Method and device for data processing and server and system for data processing
US20170091217A1 (en) * 2015-09-27 2017-03-30 International Business Machines Corporation Parallel processing of large data files on distributed file systems with dynamic workload balancing
CN107609159A (en) * 2017-09-26 2018-01-19 恒生电子股份有限公司 Method, apparatus and computer-readable medium for data loading
CN111787105A (en) * 2020-07-01 2020-10-16 深圳市有方科技股份有限公司 File transmission method and device, computer equipment and storage medium
CN112866339A (en) * 2020-12-30 2021-05-28 金蝶软件(中国)有限公司 Data transmission method and device, computer equipment and storage medium
CN112905677A (en) * 2019-12-03 2021-06-04 京东数字科技控股有限公司 Data processing method and device, service processing system and computer equipment
CN113687964A (en) * 2021-09-09 2021-11-23 腾讯科技(深圳)有限公司 Data processing method, data processing apparatus, electronic device, storage medium, and program product
CN113722623A (en) * 2021-09-03 2021-11-30 锐掣(杭州)科技有限公司 Data processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107506451B (en) Abnormal information monitoring method and device for data interaction
CN112925661B (en) Message processing method, device, computer equipment and storage medium
US9355250B2 (en) Method and system for rapidly scanning files
CN110196927B (en) Multi-round man-machine conversation method, device and equipment
CN109039876B (en) Mail processing method and device
CN113286173A (en) Video editing method and device
CN111427899A (en) Method, device, equipment and computer readable medium for storing file
CN114490715A (en) Data extraction method and device, electronic equipment and storage medium
CN110198481B (en) Program updating method and device, electronic equipment and storage medium
CN114205654A (en) Data processing system, method, apparatus, computer-readable storage medium, and device
CN113760977A (en) Information query method, device, equipment and storage medium
CN117271584A (en) Data processing method and device, computer readable storage medium and electronic equipment
AU2018403361B2 (en) Data transmission
CN114880498B (en) Event information display method and device, equipment and medium
CN108989902B (en) Barrage message processing method and device, terminal and storage medium
CN114817590A (en) Path storage method, path query method and device, medium and electronic equipment
CN114297211A (en) Data online analysis system, method, equipment and storage medium
CN114546780A (en) Data monitoring method, device, equipment, system and storage medium
CN112749204B (en) Method and device for reading data
CN112784195A (en) Page data publishing method and system
CN110766498A (en) Method and device for recommending commodities
CN114584616B (en) Message pushing method and device, electronic equipment and storage medium
US11144337B2 (en) Implementing interface for rapid ground truth binning
CN108536362B (en) Method and device for identifying operation and server
CN114997118A (en) Document processing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination