CN110990438A - Data processing method and device, electronic equipment and storage medium - Google Patents

Data processing method and device, electronic equipment and storage medium

Info

Publication number
CN110990438A
Authority
CN
China
Prior art keywords
data
delayed
time
time window
delay
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911257201.9A
Other languages
Chinese (zh)
Inventor
李尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd filed Critical Beijing Mininglamp Software System Co ltd
Priority to CN201911257201.9A priority Critical patent/CN110990438A/en
Publication of CN110990438A publication Critical patent/CN110990438A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24568Data stream processing; Continuous queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477Temporal data queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a data processing method and apparatus, an electronic device, and a storage medium, and relate to the technical field of data processing. Data is read from a data cache region, a time window is set, and whether delayed data exists is determined based on the time window. Delayed data is then written into a delay cache region, and non-delayed data is processed. Because the same logic is used throughout the processing flow, the coupling of data processing is reduced, delayed data is not lost, and the accuracy of data calculation is improved.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
With the rapid development of stream computing in the big data era, increasingly high requirements are placed on the real-time performance, quality, reliability and availability of streaming data, and a stream computing engine can perform large amounts of real-time computation and read massive amounts of data. At present, many scenarios use a lambda architecture for streaming data computation. The lambda architecture is divided into three layers: a batch layer (Batch Layer), a service layer (Serving Layer) and an acceleration layer (Speed Layer). When streaming data is processed through the lambda architecture, a change in the logic of one layer requires the logic of one or both of the other layers to change as well. This results in tight coupling between the layers, and data may be lost, which makes the data calculation results inaccurate.
Disclosure of Invention
Based on the above research, the present invention provides a data processing method, apparatus, electronic device, and storage medium to improve the above problems.
Embodiments of the invention may be implemented as follows:
in a first aspect, an embodiment of the present invention provides a data processing method, where the method includes:
reading data from a data cache region, setting a time window, and judging whether delayed data exists or not based on the time window;
and writing the delayed data into a delay buffer area, and processing the undelayed data.
In an optional embodiment, the step of determining whether there is delayed data based on the time window includes:
acquiring a time stamp of the read data, and judging whether the time stamp is in a time window of the current moment;
and if the timestamp is in the time window of the current moment, determining that the data is not delayed, and if the timestamp is not in the time window of the current moment and the timestamp is in the time window of the previous moment, determining that the data is delayed.
In an optional embodiment, the step of determining whether there is delayed data based on the time window includes:
acquiring a time stamp of the read data, and judging whether the time stamp is within the delay time of the current moment; the delay time is the sum of a delay time period after a time window of the current moment and the time window of the current moment;
if the time stamp is within the delay time, the data is judged not to be delayed, and if the time stamp is not within the delay time and the time stamp is within the delay time of the last moment, the data is judged to be delayed.
In an alternative embodiment, the method further comprises:
after the delayed data is written into the delay buffer area, reading the data in the data buffer area and the delay data in the delay buffer area, and processing the data in the data buffer area and the delay data in the delay buffer area based on the time windows with the same length.
In an alternative embodiment, the method further comprises:
after processing the data which is not delayed, outputting the processing result to a storage medium for storage; or, after processing the data in the data buffer area and the delay data in the delay buffer area, outputting the processing result to a storage medium for storage.
In an optional embodiment, when there is delayed data, the method further comprises:
and marking the delayed data, and writing the marked data into a delay cache region.
In a second aspect, an embodiment of the present invention provides a data processing apparatus, where the data processing apparatus includes a reading module and a writing module;
the reading module is used for reading data from the data cache region, setting a time window and judging whether delayed data exists or not based on the time window;
the writing module is used for writing the delayed data into the delay cache region and processing the non-delayed data.
In an alternative embodiment, the reading module is configured to:
acquiring a time stamp of the read data, and judging whether the time stamp is in a time window of the current moment;
and if the timestamp is in the time window of the current moment, determining that the data is not delayed, and if the timestamp is not in the time window of the current moment and the timestamp is in the time window of the previous moment, determining that the data is delayed.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a processor and a non-volatile memory storing computer instructions, where the computer instructions, when executed by the processor, cause the electronic device to perform the data processing method described in any one of the foregoing embodiments.
In a fourth aspect, an embodiment of the present invention provides a storage medium, in which a computer program is stored, and the computer program, when executed, implements the data processing method described in any one of the foregoing embodiments.
According to the data processing method and apparatus, the electronic device, and the storage medium provided by the embodiments of the invention, data is read from the data cache region, a time window is set, and whether delayed data exists is determined based on the time window. Delayed data is then written into the delay cache region, and non-delayed data is processed. Because the same logic is used throughout the processing flow, the coupling of data processing is reduced, delayed data is not lost, and the accuracy of data calculation is improved.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope. Those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic diagram of data processing in the prior art.
Fig. 2 is a block diagram of an electronic device according to an embodiment of the present invention.
Fig. 3 is a schematic flow chart of a data processing method according to an embodiment of the present invention.
Fig. 4 is a first schematic flow chart of sub-steps of the data processing method according to an embodiment of the present invention.
Fig. 5 is a second schematic flow chart of sub-steps of the data processing method according to an embodiment of the present invention.
Fig. 6 is a schematic application diagram of a data processing method according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of another application of the data processing method according to the embodiment of the present invention.
Fig. 8 is a block diagram of a data processing apparatus according to an embodiment of the present invention.
Reference numerals: 100-electronic device; 10-data processing apparatus; 11-reading module; 12-writing module; 20-memory; 30-processor; 40-communication unit.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that terms such as "upper", "lower", "inside" and "outside", if used, indicate an orientation or positional relationship based on that shown in the drawings or on that in which the product of the present invention is conventionally placed in use. They are used only for convenience and simplification of the description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
At present, many scenarios use a lambda architecture for streaming data computation. As shown in fig. 1, the lambda architecture is divided into three layers: a batch layer (Batch Layer), a service layer (Serving Layer) and an acceleration layer (Speed Layer). The batch layer stores the data set and pre-computes query functions on it to construct the view corresponding to each query. It processes the full data set and derives the Batch View directly from the complete offline data. The acceleration layer processes incremental real-time data and computes the Real-time View; it handles the latest incremental data stream and continuously updates the Real-time View as new data arrives. The service layer responds to user query requests and merges the result sets of the Batch View and the Real-time View into a final data set. The view is a core concept of the lambda architecture: it is optimized for queries, so query results can be obtained quickly through it.
When streaming data is processed through the lambda architecture, the batch layer, the acceleration layer and the service layer may use different frameworks. For example, the batch layer may compute with frameworks such as Hadoop, Spark and Flink, while the acceleration layer may compute with frameworks such as Storm, Spark Streaming and Flink. This raises development cost, and the results produced by the batch layer and the acceleration layer may be inconsistent, which makes problems hard to trace and degrades the user experience.
Based on the above research, the present embodiment provides a data processing method to improve the above problems.
The data processing method provided by this embodiment is applied to the electronic device 100 shown in fig. 2, and the electronic device 100 executes the data processing method provided by this embodiment. In the embodiment, the electronic device 100 may be, but is not limited to, an electronic device 100 with a processing capability, such as a Personal Computer (PC), a notebook Computer, a Personal Digital Assistant (PDA), or a server.
The electronic device 100 comprises a data processing apparatus 10, a memory 20, a processor 30 and a communication unit 40; the various elements of the memory 20, processor 30 and communication unit 40 are electrically connected to each other, directly or indirectly, to enable the transfer or interaction of data. For example, the components may be directly electrically connected to each other via one or more communication buses or signal lines. The data processing apparatus 10 includes at least one software functional module which can be stored in the memory 20 in the form of software or Firmware (Firmware), and the processor 30 executes various functional applications and data processing by running software programs and modules stored in the memory 20.
The memory 20 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 30 may be an integrated circuit chip having signal processing capabilities. The processor 30 may be a general-purpose processor including a Central Processing Unit (CPU), a Network Processor (NP), and the like.
The communication unit 40 is configured to establish a communication connection between the electronic device 100 and another external device through a network, and perform data transmission through the network.
It is to be understood that the configuration shown in fig. 2 is merely exemplary, and that the electronic device 100 may include more or fewer components than shown in fig. 2, or have a different configuration than shown in fig. 2. The components shown in fig. 2 may be implemented in hardware, software, or a combination thereof.
Based on the implementation architecture of the electronic device 100, please refer to fig. 3, fig. 3 is a schematic flow chart of the data processing method provided in this embodiment, and a detailed flow of the data processing method shown in fig. 3 is described in detail below.
Step S10: reading data from the data cache region, setting a time window, and determining whether delayed data exists based on the time window.
Step S20: writing the delayed data into the delay cache region, and processing the non-delayed data.
In an optional implementation, this embodiment may cache data based on a Kafka cluster. Kafka is a distributed messaging system characterized by horizontal scalability and high throughput. Messages in a Kafka cluster are organized by topic (Topic), and the message data in one topic is organized into multiple partitions. A partition is the smallest unit of organization in a Kafka message queue, and one partition can be regarded as a first-in first-out (FIFO) queue. Optionally, in this embodiment, the stream data is written to input-cache in the Kafka cluster for caching, where input-cache is the data cache region, and the delayed data is written to late-cache in the Kafka cluster for caching, where late-cache is the delay cache region.
As another optional implementation, this embodiment may also cache data using data storage tools such as Hive or a relational database management system such as MySQL. The stream data and the delayed data are cached separately to construct a data cache region and a delay cache region: the stream data is cached in the data cache region, and the delayed data is cached in the delay cache region.
In the data processing method provided in this embodiment, after determining whether delayed data exists based on a time window, on one hand, the delayed data is written into a delay cache region for caching, and on the other hand, an operator is triggered to process non-delayed data based on a set time window. Optionally, in this implementation, the processing of the undelayed data may be, but is not limited to, performing processing such as mean calculation, normalization calculation, summation on the undelayed data.
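The split between caching delayed data and processing non-delayed data can be sketched as follows. This is a minimal illustration, not the patented implementation: the two cache regions are modelled as in-memory queues, timestamps are plain minute counts, every record outside the current window is treated as delayed for simplicity, and the names `input_cache`, `late_cache` and `run_step` are hypothetical.

```python
from collections import deque
from statistics import mean

# Hypothetical in-memory stand-ins for the data cache region and the
# delay cache region; each record is (timestamp_in_minutes, value).
input_cache = deque([(5, 1.0), (7, 3.0), (-1, 9.0), (9, 5.0)])
late_cache = deque()


def run_step(window_start, window_end):
    """Read every cached record, cache delayed ones, process the rest."""
    on_time = []
    while input_cache:
        ts, value = input_cache.popleft()
        if window_start <= ts < window_end:
            on_time.append(value)
        else:
            # Simplification: anything outside the window counts as delayed.
            late_cache.append((ts, value))
    # Example operators named in the text: summation and mean calculation.
    return {"sum": sum(on_time), "mean": mean(on_time) if on_time else None}


result = run_step(0, 10)
```

Here the record with timestamp -1 ends up in `late_cache` rather than being dropped, while the other three records are aggregated.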
In the data processing method provided by this embodiment, the same logic is used throughout the processing flow: whether delayed data exists is determined by the set time window, the delayed data is written into the delay cache region, and the non-delayed data is processed. This addresses the problem of tight coupling in the data processing flow, simplifies coding, and yields a high code reuse rate. In addition, because the delayed data is written into the delay cache region, it is not lost during processing, which improves the accuracy of data processing.
In a specific application scenario, the time at which data occurs, the time at which it enters the computing engine, and the time at which an operator processes it all differ. Data may pass through a complex network before entering the computing engine, and when a problem or a transmission delay occurs at some link along the way, the data arrives late. Therefore, in stream computing, the arrival time and arrival order of data cannot be determined in advance, and not all data can be stored. To determine whether data is delayed, in this embodiment each piece of data carries a timestamp, and whether the data is delayed is determined from that timestamp.
In an exemplary embodiment, referring to fig. 4, the step of determining whether delayed data exists based on the time window includes steps S11 to S13.
Step S11: acquiring the timestamp of the read data, and determining whether the timestamp is within the time window of the current moment.
Step S12: if the timestamp is within the time window of the current moment, determining that the data is not delayed.
Step S13: if the timestamp is not within the time window of the current moment and the timestamp is within the time window of the previous moment, determining that the data is delayed.
In this embodiment, the time window may be a rolling (tumbling) time window, a sliding time window, or a session window. Taking the rolling time window as an example, if the window length is 1 hour and the current time is 2:00, then the time window of the current moment is 2:00 to 3:00, the time window of the previous moment is 1:00 to 2:00, and the time window of the next moment is 3:00 to 4:00. Taking the sliding time window as an example, if the sliding step is half an hour, the window length is 1 hour, and the current time is 2:00, then the time window of the current moment is 2:00 to 3:00, the time window of the previous moment is 1:30 to 2:30, and the time window of the next moment is 2:30 to 3:30; adjacent sliding windows overlap. Optionally, the type and length of the time window may be set according to actual requirements and are not limited in this embodiment.
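The window arithmetic in these two examples can be sketched as follows, taking the current window to start at 2:00. The helper `neighbor_windows` is a hypothetical name, the calendar date is arbitrary, and a rolling window is modelled as a sliding window whose step equals its length; none of this is prescribed by the embodiment.

```python
from datetime import datetime, timedelta


def neighbor_windows(current_start, length, step):
    """Return the (previous, current, next) windows as (start, end) pairs."""
    def window(start):
        return (start, start + length)

    return (
        window(current_start - step),
        window(current_start),
        window(current_start + step),
    )


# The date is arbitrary; only the time of day matters for the example.
two_oclock = datetime(2019, 12, 10, 2, 0)
hour = timedelta(hours=1)

# Rolling window: the step equals the window length, so windows do not overlap.
rolling = neighbor_windows(two_oclock, hour, hour)

# Sliding window: a half-hour step makes neighbouring windows overlap.
sliding = neighbor_windows(two_oclock, hour, timedelta(minutes=30))
```

With these inputs, `rolling` holds 1:00 to 2:00, 2:00 to 3:00 and 3:00 to 4:00, and `sliding` holds 1:30 to 2:30, 2:00 to 3:00 and 2:30 to 3:30.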
Based on the time window setting, after data is read from the data cache region, the timestamp of the read data is acquired and it is determined whether the timestamp is within the time window of the current moment. If it is, the data is determined not to be delayed. If the timestamp is not within the time window of the current moment but is within the time window of the previous moment, the data is determined to be delayed.
For example, with a rolling time window of length 10 minutes and a current time of 2:00, the time window of the current moment is 2:00 to 2:10 and the time window of the previous moment is 1:50 to 2:00. If the timestamp of the read data is 2:05, it falls within the time window of the current moment, so the data is not delayed. If the timestamp is 1:59, it is not within the time window of the current moment but is within the time window of the previous moment, so the data is delayed.
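Steps S11 to S13 can be sketched for a rolling window as follows. The function name `classify`, the half-open interval convention, the arbitrary calendar date, and the `"out_of_range"` label for timestamps older than the previous window are assumptions made for illustration; the text does not say how such older data is handled.

```python
from datetime import datetime, timedelta


def classify(ts, window_start, length):
    """Classify a timestamp against the current and previous rolling windows."""
    window_end = window_start + length
    # S12: timestamp inside the current window means the data is not delayed.
    if window_start <= ts < window_end:
        return "not_delayed"
    # S13: timestamp inside the previous window means the data is delayed.
    if window_start - length <= ts < window_start:
        return "delayed"
    # Not covered by the described steps (assumption of this sketch).
    return "out_of_range"


current_start = datetime(2019, 12, 10, 2, 0)   # arbitrary date, 2:00
ten_minutes = timedelta(minutes=10)

status_on_time = classify(datetime(2019, 12, 10, 2, 5), current_start, ten_minutes)
status_late = classify(datetime(2019, 12, 10, 1, 59), current_start, ten_minutes)
```

For the two timestamps in the example, 2:05 is classified as not delayed and 1:59 as delayed.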
If statistics were computed using the access time or the operator execution time, the results would deviate from the time at which the data was actually generated, introducing an artificial error. To reduce this error, this embodiment optionally uses the occurrence time of the data as its timestamp.
In an alternative embodiment, in order to improve the integrity and accuracy of data reading, please refer to fig. 5, the step of determining whether there is delayed data based on the time window further includes steps S14 to S16.
Step S14: acquiring the timestamp of the read data, and determining whether the timestamp is within the delay time of the current moment.
Step S15: if the timestamp is within the delay time, determining that the data is not delayed.
Step S16: if the timestamp is not within the delay time and the timestamp is within the delay time of the previous moment, determining that the data is delayed.
The delay time is the sum of the time window of the current moment and the deferrable time period that follows it. For example, with a rolling time window of length 1 hour, a deferrable time period of half an hour, and a current time of 2:00, the delay time of the current moment is 2:00 to 3:30, the delay time of the previous moment is 0:30 to 2:00, and the delay time of the next moment is 3:30 to 5:00.
In this embodiment, a delay time is set, and if the timestamp of the read data is within the delay time, the data is not delayed. For example, with a rolling time window of length 10 minutes, a deferrable time period of 5 minutes, and a current time of 2:00, the time window of the current moment is 2:00 to 2:10 and the delay time of the current moment is 2:00 to 2:15. If the timestamps of the read data all fall within 2:00 to 2:15, none of the read data is delayed.
In this embodiment, setting the delay time extends the period during which data is accepted: during the deferrable time period after the time window, data that is read with a timestamp in the time window is also non-delayed data, which improves the completeness and accuracy of data reading.
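Steps S14 to S16 can be sketched in the same way. This follows the steps literally by comparing the timestamp with the delay time, and it assumes, following the hour-long example, that the delay time of the previous moment is the span of the same length that ends where the current delay time starts. The name `classify_with_delay`, the half-open intervals, the arbitrary date, and the `"out_of_range"` label are hypothetical.

```python
from datetime import datetime, timedelta


def classify_with_delay(ts, window_start, length, deferrable):
    """Classify a timestamp against the current and previous delay times."""
    span = length + deferrable          # delay time = window + deferrable period
    # S15: timestamp inside the current delay time means not delayed.
    if window_start <= ts < window_start + span:
        return "not_delayed"
    # S16: timestamp inside the previous delay time means delayed.
    if window_start - span <= ts < window_start:
        return "delayed"
    # Not covered by the described steps (assumption of this sketch).
    return "out_of_range"


start = datetime(2019, 12, 10, 2, 0)    # arbitrary date, 2:00
length = timedelta(minutes=10)
deferrable = timedelta(minutes=5)

in_delay_time = classify_with_delay(datetime(2019, 12, 10, 2, 12), start, length, deferrable)
before_window = classify_with_delay(datetime(2019, 12, 10, 1, 50), start, length, deferrable)
```

A timestamp of 2:12 lies inside the 2:00 to 2:15 delay time and is treated as not delayed, whereas 1:50 lies in the previous delay time and is treated as delayed.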
In this embodiment, a timestamp is set for the data, the delayed data can be found according to the timestamp, and after the delayed data is found, the delayed data is written into the delay buffer area for buffering. And for non-delayed data, performing calculation processing based on the time window.
Because delayed data exists, there is no guarantee that the data being read arrives in order, yet a time window cannot wait indefinitely before it is computed. A mechanism is therefore needed to ensure that the computation of a time window is triggered after a specific time. Optionally, this embodiment uses a watermark mechanism (watermark) to handle out-of-order stream data.
In this embodiment, after data is read from the data cache region, a timestamp containing a watermark time is generated based on the occurrence time of the read data. The watermark time may be generated by the data source or by a watermark generator. A watermark with time t means that all data with timestamps less than or equal to t (that is, at or before t) is assumed, with reasonable probability, to have arrived, that is, to have been read. If the watermark time in the timestamp of the read data equals the end time of the time window of the current moment, the time window is triggered for computation: the data read within the time window is processed, and the processing result is output to a storage medium for storage.
For example, suppose the time window of the current moment is 1:00 to 1:10. If the watermark time in the timestamp of the read data is 1:10, the electronic device 100 assumes that all data up to 1:10 has arrived, that is, has been read. The electronic device 100 then triggers the time window for computation, that is, it computes over the data read from 1:00 to 1:10 and outputs the calculation result to the storage medium for storage. Data read after that point whose timestamp falls within the 1:00 to 1:10 time window is delayed data.
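The watermark trigger can be sketched as follows for a single window. This is an illustrative sketch under stated assumptions: times are minutes since midnight, each event carries its own watermark, the window fires once the watermark reaches or passes the window end (the text describes the case where it equals the end), summation stands in for the operator, and `process_stream` is a hypothetical name.

```python
def process_stream(events, window_start, window_end):
    """Process (timestamp, watermark, value) events in arrival order.

    Returns the window result and the records that arrived after the
    window had already been triggered.
    """
    buffered, late, result = [], [], None
    for ts, watermark, value in events:
        in_window = window_start <= ts < window_end
        if result is None:
            if in_window:
                buffered.append(value)
            # The watermark says everything up to window_end has arrived.
            if watermark >= window_end:
                result = sum(buffered)
        elif in_window:
            # The window already fired, so this record is delayed data.
            late.append((ts, value))
    return result, late


# Window 1:00 to 1:10 expressed in minutes since midnight: [60, 70).
events = [(61, 61, 2), (65, 65, 3), (70, 70, 10), (63, 70, 4)]
window_result, delayed = process_stream(events, 60, 70)
```

The third event carries the watermark 1:10 and fires the window over the first two records; the fourth event, stamped 1:03, arrives afterwards and is collected as delayed data.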
If only non-delayed data were processed, the delayed data would be lost, leaving the data incomplete and reducing the accuracy of the processing result. To improve the accuracy of data processing, after the delayed data is written into the delay cache region, the data processing method provided by this embodiment further includes the following process:
and reading the data in the data cache region and the delay data in the delay cache region, and processing the data in the data cache region and the delay data in the delay cache region based on time windows with the same length.
In this embodiment, in addition to caching newly generated stream data, the data cache region may retain stream data that has already been read for a set time period, which can be chosen as required. For example, if the set time period is one month, the data cache region is updated on a one-month cycle, that is, stream data that has been retained for one month is purged. Because the read stream data is retained for the set time period, the stream data in the data cache region can be reused: when the data needs to be recalculated, the stream data only needs to be read from the data cache region again.
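The retention rule can be sketched as a simple filter. The function name `purge_expired`, the use of day counts as timestamps, and the choice to measure age from the record timestamp are assumptions of this sketch rather than details given in the embodiment.

```python
def purge_expired(records, now, retention):
    """Keep only records younger than the retention period."""
    return [(ts, value) for ts, value in records if now - ts < retention]


# Records are (day_written, payload); retention is 30 days.
cached = [(0, "a"), (10, "b"), (29, "c")]
retained = purge_expired(cached, now=31, retention=30)
```

On day 31, the record written on day 0 has exceeded the 30-day retention and is dropped, while the other two remain available for recalculation.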
It can be understood that, while the delay buffer buffers the newly generated delayed data, the delay buffer may also retain the delayed data according to a set time period. Therefore, in order to improve the accuracy of data processing, after the delayed data is written into the delay buffer, the read data in the data buffer and the delayed data in the delay buffer can be simultaneously read again by a new task, and the data in the data buffer and the delayed data in the delay buffer are processed together.
The data in the data cache region and the delayed data in the delay cache region are then processed based on time windows of the same length. For example, to process the data of the previous month, the data in the data cache region and the delayed data in the delay cache region for that month are read together. If the set time window length is 10 minutes, the data read from both regions for the same time period is processed using 10-minute time windows. The calculation logic is the same as that used for the non-delayed data, so the whole processing flow uses the same logic and the code reuse rate is improved.
In this embodiment, after the delayed data is written into the delay buffer, the data in the data buffer and the delay data in the delay buffer are read again, and the data in the data buffer and the delay data in the delay buffer are processed based on the time windows with the same length, so that the delayed data is included in the processing process, the integrity of the data is ensured, and the accuracy of the data processing result is improved.
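The reuse of one calculation logic for both passes can be sketched as follows. The sketch assumes that the sample `data_cache` holds only records that were processed on time and that `delay_cache` holds the late ones; timestamps are minute counts, summation stands in for the operator, and all names are hypothetical.

```python
from collections import defaultdict


def aggregate(records, window_length):
    """Sum values per rolling window; used unchanged by both passes."""
    totals = defaultdict(float)
    for ts, value in records:
        window_start = ts - ts % window_length
        totals[window_start] += value
    return dict(totals)


# Records are (timestamp_in_minutes, value).
data_cache = [(0, 1.0), (5, 2.0), (12, 4.0)]
delay_cache = [(8, 3.0)]          # arrived after its window was computed

# First pass: non-delayed data only.
first_pass = aggregate(data_cache, 10)

# Second pass: both cache regions, same window length, same logic.
second_pass = aggregate(data_cache + delay_cache, 10)
```

The second pass corrects the total for the window starting at minute 0 from 3.0 to 6.0 because it includes the delayed record, while the window starting at minute 10 is unchanged.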
Optionally, in this embodiment, two processing procedures may be performed in parallel, as shown in fig. 6. In fig. 6, job1 is the procedure of reading data from the data buffer, determining whether delayed data exists based on the set time window, writing the delayed data into the delay buffer, and processing the non-delayed data. job2 is the procedure of reading the data in the data buffer and the delayed data in the delay buffer again, and processing the data in the data buffer and the delayed data in the delay buffer based on time windows of the same length. job1 and job2 are processed in parallel.
Based on the parallelism of the processing procedures, optionally, in this embodiment, if the processing progress of job2 catches up with the processing progress of job1, the processing result of job1 may be replaced by the processing result of job2.
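One possible reading of this replacement step is sketched below. The text does not define how progress is measured or at what granularity results are replaced, so the sketch assumes that progress is the end time of the last window each job has processed and that results are keyed by window start; `merge_results` is a hypothetical name.

```python
def merge_results(job1_results, job2_results, job1_progress, job2_progress):
    """Replace job1's per-window results with job2's once job2 has caught up."""
    if job2_progress >= job1_progress:
        merged = dict(job1_results)
        merged.update(job2_results)  # job2 values overwrite job1 values
        return merged
    return dict(job1_results)  # job2 is still behind: keep job1's results


job1 = {0: 2, 600: 1}  # computed without the delayed data
job2 = {0: 3, 600: 1}  # computed with the delayed data included

print(merge_results(job1, job2, job1_progress=1200, job2_progress=1200))
print(merge_results(job1, job2, job1_progress=1200, job2_progress=600))
```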
The data processing method provided in this embodiment may further add other tasks according to task requirements and process the multiple tasks in parallel. For example, as shown in fig. 7, a job3 may be added for another new task, and job3 may recalculate the data in the data buffer. job1, job2, and job3 may be processed in parallel.
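The parallel arrangement of job1, job2, and job3 can be illustrated with Python's standard `concurrent.futures` module. This is only a sketch under simplifying assumptions: threads stand in for what would normally be separate jobs, every job performs a trivial count, and the delay buffer is assumed to be populated already.

```python
from concurrent.futures import ThreadPoolExecutor


def count_records(records):
    """Placeholder calculation shared by all jobs."""
    return len(records)


def run_jobs(data_buffer, delay_buffer):
    jobs = {
        "job1": lambda: count_records(data_buffer),                 # data buffer only
        "job2": lambda: count_records(data_buffer + delay_buffer),  # both buffers
        "job3": lambda: count_records(data_buffer),                 # recalculation
    }
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = {name: pool.submit(fn) for name, fn in jobs.items()}
        return {name: future.result() for name, future in futures.items()}


print(run_jobs([1, 2, 3], [4]))
```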
Optionally, in this embodiment, just as the processing result is output to a storage medium for storage after the non-delayed data is processed, the processing result may also be output to the storage medium for storage after the data in the data buffer and the delayed data in the delay buffer are processed, and the user may then obtain the two processing results from the storage medium. Optionally, in this embodiment, the storage medium may be, but is not limited to, a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or another medium capable of storing data.
As an optional implementation manner, in this embodiment, the processing result of processing the non-delayed data and the processing result of processing the data in the data buffer and the delayed data in the delayed buffer may also be directly output to the user side used by the user.
As an optional implementation manner, in this embodiment, the delayed data buffered in the delay buffer may be reused, that is, processed together with the data in the data buffer, processed separately, or output directly. The specific processing manner is set according to the service requirement.
In an optional embodiment, in order to improve the processing efficiency of the data, when there is delayed data, the method further includes:
and marking the delayed data, and writing the marked data into a delay cache region.
The delayed data is marked, and the marked data is then written into the delay buffer according to the mark. When the data in the data buffer and the delayed data in the delay buffer are read again at the same time, the delayed data can be distinguished by the mark, which improves the data processing efficiency. Optionally, in this embodiment, the mark may be a tag, and the tag name may be user-defined.
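Marking could look like the following sketch, in which records are plain dictionaries and the mark is a `tag` field. The field name, the tag value `"delayed"`, and both function names are hypothetical choices for the example.

```python
DELAY_TAG = "delayed"  # hypothetical, user-defined tag name


def mark_and_write(record, delay_buffer, tag=DELAY_TAG):
    """Attach the tag to a delayed record and write it to the delay buffer."""
    marked = dict(record, tag=tag)
    delay_buffer.append(marked)
    return marked


def split_by_tag(records, tag=DELAY_TAG):
    """Separate records read back from both buffers using the tag."""
    delayed = [r for r in records if r.get("tag") == tag]
    on_time = [r for r in records if r.get("tag") != tag]
    return on_time, delayed


delay_buffer = []
mark_and_write({"ts": 5, "value": 1}, delay_buffer)
on_time, delayed = split_by_tag([{"ts": 700, "value": 2}] + delay_buffer)
print(on_time, delayed)
```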
In an optional implementation manner, if the amount of data to be processed is large, the data processing method provided in this embodiment may also be executed on a cluster. The cluster may be, but is not limited to, YARN, k8s (Kubernetes), Mesos, or the like.
In the data processing method provided by this embodiment, data is read from the data buffer, a time window is set, whether delayed data exists is determined based on the time window, then the delayed data is written into the delay buffer, and the non-delayed data is processed.
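The overall flow summarised above can be sketched as a single routing loop. The delay test and the calculation are passed in as callables because both are application-specific; `run_once`, `is_delayed`, and `process` are hypothetical names, and the integer records are stand-ins for timestamped stream data.

```python
def run_once(data_buffer, delay_buffer, is_delayed, process):
    """Route delayed records to the delay buffer and process the rest."""
    results = []
    for record in data_buffer:
        if is_delayed(record):
            delay_buffer.append(record)      # set aside for later reprocessing
        else:
            results.append(process(record))  # normal path for non-delayed data
    return results


delay_buffer = []
results = run_once(
    data_buffer=[1, 12, 3, 15],
    delay_buffer=delay_buffer,
    is_delayed=lambda r: r < 10,  # toy rule: values below 10 count as delayed
    process=lambda r: r * 2,
)
print(results, delay_buffer)
```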
On this basis, referring to fig. 8, the present embodiment further provides a data processing apparatus 10, where the data processing apparatus 10 includes a reading module 11 and a writing module 12.
The reading module 11 is configured to read data from a data buffer, set a time window, and determine whether delayed data exists based on the time window.
The writing module 12 is configured to write the delayed data into the delay buffer and process the non-delayed data.
The reading module 11 is further configured to:
acquire the timestamp of the read data, and determine whether the timestamp is within the time window of the current moment; and
if the timestamp is within the time window of the current moment, determine that the data is not delayed, and if the timestamp is not within the time window of the current moment but is within the time window of the previous moment, determine that the data is delayed.
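The determination performed by the reading module can be sketched as below, reading it as: a timestamp inside the current window means the data is not delayed, and a timestamp inside the previous window means the data is delayed. The sketch assumes fixed, non-overlapping windows aligned to multiples of the window length. How a timestamp outside both windows is handled is not stated here, so the sketch reports it as unspecified; the `Status` enum and `classify` function are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    ON_TIME = "on_time"
    DELAYED = "delayed"
    UNSPECIFIED = "unspecified"  # case not covered by the description


def classify(timestamp, now, window_length):
    """Classify a record by comparing its timestamp with the current and previous windows."""
    current_start = (now // window_length) * window_length
    previous_start = current_start - window_length
    if current_start <= timestamp < current_start + window_length:
        return Status.ON_TIME
    if previous_start <= timestamp < current_start:
        return Status.DELAYED
    return Status.UNSPECIFIED


print(classify(1205, now=1210, window_length=600))
print(classify(1199, now=1210, window_length=600))
```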
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the data processing apparatus 10 described above may refer to the corresponding process in the foregoing method, and will not be described in too much detail herein.
On this basis, an embodiment of the present invention further provides an electronic device, which includes a processor and a non-volatile memory storing computer instructions, where when the computer instructions are executed by the processor, the electronic device executes the data processing method described in any one of the foregoing embodiments.
On this basis, an embodiment of the present invention further provides a storage medium having a computer program stored therein, where the computer program, when executed, implements the data processing method according to any one of the foregoing embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the electronic device and the storage medium described above may refer to the corresponding processes in the foregoing method, and will not be described in too much detail herein.
In summary, in the data processing method, the data processing apparatus, the electronic device, and the storage medium provided in this embodiment, data is read from the data buffer, a time window is set, whether delayed data exists is determined based on the time window, the delayed data is then written into the delay buffer, and the non-delayed data is processed.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A method of data processing, the method comprising:
reading data from a data cache region, setting a time window, and judging whether delayed data exists or not based on the time window;
and writing the delayed data into a delay buffer area, and processing the undelayed data.
2. The data processing method of claim 1, wherein the step of determining whether delayed data exists based on the time window comprises:
acquiring a time stamp of the read data, and judging whether the time stamp is in a time window of the current moment;
and if the timestamp is in the time window of the current moment, judging that the data is not delayed, and if the timestamp is not in the time window of the current moment and the timestamp is in the time window of the previous moment, judging that the data is delayed.
3. The data processing method of claim 1, wherein the step of determining whether delayed data exists based on the time window comprises:
acquiring a time stamp of the read data, and judging whether the time stamp is within the delay time of the current moment; the delay time is the sum of a delay time period after a time window of the current moment and the time window of the current moment;
if the time stamp is within the delay time, the data is judged not to be delayed, and if the time stamp is not within the delay time and the time stamp is within the delay time of the last moment, the data is judged to be delayed.
4. The data processing method of claim 1, wherein the method further comprises:
after the delayed data is written into the delay buffer area, reading the data in the data buffer area and the delay data in the delay buffer area, and processing the data in the data buffer area and the delay data in the delay buffer area based on the time windows with the same length.
5. The data processing method of claim 4, wherein the method further comprises:
after processing the data which is not delayed, outputting the processing result to a storage medium for storage; or, after processing the data in the data buffer area and the delay data in the delay buffer area, outputting the processing result to a storage medium for storage.
6. The data processing method of claim 1, wherein in the presence of delayed data, the method further comprises:
and marking the delayed data, and writing the marked data into a delay cache region.
7. A data processing device is characterized by comprising a reading module and a writing module;
the reading module is used for reading data from the data cache region, setting a time window and judging whether delayed data exists or not based on the time window;
the writing module is used for writing the delayed data into the delay cache region and processing the non-delayed data.
8. The data processing apparatus of claim 7, wherein the read module is configured to:
acquiring a time stamp of the read data, and judging whether the time stamp is in a time window of the current moment;
and if the timestamp is in the time window of the current moment, judging that the data is not delayed, and if the timestamp is not in the time window of the current moment and the timestamp is in the time window of the previous moment, judging that the data is delayed.
9. An electronic device comprising a processor and a non-volatile memory having stored thereon computer instructions, which, when executed by the processor, perform the data processing method of any one of claims 1 to 6.
10. A storage medium, characterized in that a computer program is stored in the storage medium, which computer program, when executed, implements the data processing method of any one of claims 1 to 6.
CN201911257201.9A 2019-12-09 2019-12-09 Data processing method and device, electronic equipment and storage medium Pending CN110990438A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911257201.9A CN110990438A (en) 2019-12-09 2019-12-09 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110990438A true CN110990438A (en) 2020-04-10

Family

ID=70091835

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911257201.9A Pending CN110990438A (en) 2019-12-09 2019-12-09 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110990438A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090271529A1 (en) * 2008-04-25 2009-10-29 Hitachi, Ltd. Stream data processing method and computer systems
CN107704373A (en) * 2017-10-31 2018-02-16 北京奇艺世纪科技有限公司 A kind of data processing method and device
CN110334145A (en) * 2018-02-24 2019-10-15 北京京东尚科信息技术有限公司 The method and apparatus of data processing
CN109861878A (en) * 2019-01-17 2019-06-07 平安科技(深圳)有限公司 The monitoring method and relevant device of the topic data of kafka cluster
CN110391840A (en) * 2019-09-17 2019-10-29 中国人民解放军国防科技大学 Method and system for judging abnormality of telemetry parameters of sun synchronous orbit satellite

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680065A (en) * 2020-05-25 2020-09-18 泰康保险集团股份有限公司 Processing system, equipment and method for lag data in streaming computation
CN111680065B (en) * 2020-05-25 2023-11-10 泰康保险集团股份有限公司 Processing system, equipment and method for hysteresis data in stream type calculation
CN112000703B (en) * 2020-10-27 2021-02-05 港胜技术服务(深圳)有限公司 Data warehousing processing method and device, computer equipment and storage medium
CN112651772A (en) * 2020-12-18 2021-04-13 浙江同花顺智能科技有限公司 Event touch method, device, equipment and storage medium
CN113204387A (en) * 2021-05-21 2021-08-03 珠海金山网络游戏科技有限公司 Method and device for processing data overtime in real-time calculation
CN113626447A (en) * 2021-10-12 2021-11-09 民航成都信息技术有限公司 Civil aviation data management platform and method

Similar Documents

Publication Publication Date Title
CN110990438A (en) Data processing method and device, electronic equipment and storage medium
US9684562B2 (en) Automatic serial starting of resource groups on failover based on the prediction of aggregate resource usage
US20110153603A1 (en) Time series storage for large-scale monitoring system
US20080010497A1 (en) Selecting a Logging Method via Metadata
JP5245711B2 (en) Distributed data processing system, distributed data processing method, and distributed data processing program
CN110825731A (en) Data storage method and device, electronic equipment and storage medium
CN111831383A (en) Window splicing method, device, equipment and storage medium
US10359936B2 (en) Selecting a primary storage device
CN115328741A (en) Exception handling method, device, equipment and storage medium
US20130125139A1 (en) Logging In A Computer System
CN113360581A (en) Data processing method, device and storage medium
CN111078418B (en) Operation synchronization method, device, electronic equipment and computer readable storage medium
US10223189B1 (en) Root cause detection and monitoring for storage systems
US20200117640A1 (en) Method, device and computer program product for managing storage system
CN114625805B (en) Return test configuration method, device, equipment and medium
CN113342744B (en) Parallel construction method, device and equipment of call chain and storage medium
CN115329143A (en) Directed acyclic graph evaluation method, device, equipment and storage medium
CN110968405A (en) Method and device for detecting planned tasks
CN117093335A (en) Task scheduling method and device for distributed storage system
US9898357B1 (en) Root cause detection and monitoring for storage systems
CN112667614A (en) Data processing method and device and computer equipment
CN101894119B (en) Mass data storage system for monitoring
CN111984202A (en) Data processing method and device, electronic equipment and storage medium
CN114116790A (en) Data processing method and device
CN109614249B (en) Method, device and computer readable storage medium for simulating multi-core communication

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200410