CN112328660A - Stream data processing method and device - Google Patents

Info

Publication number
CN112328660A
Authority
CN
China
Prior art keywords
data
stream
main
streams
tuple
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011216552.8A
Other languages
Chinese (zh)
Inventor
张顺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hangyun Iot Information Technology Co ltd
Original Assignee
Beijing Hangyun Iot Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Hangyun Iot Information Technology Co ltd filed Critical Beijing Hangyun Iot Information Technology Co ltd
Priority to CN202011216552.8A priority Critical patent/CN112328660A/en
Publication of CN112328660A publication Critical patent/CN112328660A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24568Data stream processing; Continuous queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2477Temporal data queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a stream data processing device and method. The device comprises: a data acquisition module, used for acquiring the stream data of at least two data streams and sending the stream data of each data stream to a data generation module; the data generation module, used for receiving the stream data sent by the data acquisition module, integrating the stream data of the at least two data streams into one or more stream data tuples, and sending the stream data tuples to a data output module; and the data output module, used for outputting the stream data according to the stream data tuples integrated by the data generation module. By acquiring the stream data of each data stream, integrating the stream data of all the data streams into one or more stream data tuples, and outputting the stream data in the form of stream data tuples, the invention obtains a data set that is relatively aligned in time. This solves the problem of time misalignment of the data, so that the stream data can be used directly for calculation and a more rigorous calculation result can be obtained.

Description

Stream data processing method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for processing stream data.
Background
With the rise of the Internet of Things and stream computing technology, more and more sensor data is fed into stream computing platforms for real-time analysis and processing. When a calculation needs the data of several sensors at a given time point, the data of the sensors are difficult to align to that time point, because different sensors acquire data at different times and with different periods.
For example, given sensors A and B, where A acquires data every 10 seconds starting at second 0 and B acquires data every 20 seconds starting at second 5, the data arrive as shown in Table 1 below.
TABLE 1
Time (s)    0    5    10   15   20   25   30   35   40   45
Sensor A    A1   -    A2   -    A3   -    A4   -    A5   -
Sensor B    -    B1   -    -    -    B2   -    -    -    B3
To simplify the programming model in stream computing, the data are in many cases converted into a two-dimensional table on which the computational analysis is performed. Assuming that the sensor A data form input stream A and the sensor B data form input stream B, if nothing is done, the above input data accumulate as shown in Table 2 below.
TABLE 2
Stream A    A1   A2   A3   A4   A5
Stream B    B1   B2   B3   -    -
It can be seen that the data A1 acquired by sensor A at second 0 is aligned with the data B1 acquired by sensor B at second 5, and the subsequent data are misaligned in turn. If the data are accumulated directly as input streams, the data become misaligned in time; they either cannot be used directly for computational analysis based on a two-dimensional table, or they lead to imprecise calculation results.
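The misalignment described above can be made concrete with a short sketch. It is only an illustration: the `(label, time)` representation and the function names `naive_rows` and `time_offsets` are assumptions introduced here, and the acquisition times are those of Table 1.

```python
# Acquisition times in seconds, taken from Table 1.
stream_a = [("A1", 0), ("A2", 10), ("A3", 20), ("A4", 30), ("A5", 40)]
stream_b = [("B1", 5), ("B2", 25), ("B3", 45)]


def naive_rows(a, b):
    """Pair the i-th item of each stream, ignoring acquisition time (Table 2)."""
    return list(zip(a, b))


def time_offsets(rows):
    """Absolute acquisition-time gap, in seconds, inside each row."""
    return [abs(ta - tb) for (_, ta), (_, tb) in rows]


rows = naive_rows(stream_a, stream_b)
offsets = time_offsets(rows)
print(offsets)  # [5, 15, 25]
```

The gap inside each row grows from 5 to 25 seconds, which is the time misalignment that the tuple-based integration is meant to reduce.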
Disclosure of Invention
In order to solve at least one of the above technical problems, the present disclosure provides a stream data processing method and apparatus, and a readable storage medium and a computing device.
In a first aspect, an embodiment of the present invention provides a stream data processing apparatus, including: a data acquisition module, a data generation module and a data output module, wherein,
the data acquisition module is used for acquiring the stream data of at least two data streams and sending the stream data of each data stream to the data generation module;
the data generation module is used for receiving the stream data sent by the data acquisition module, integrating the stream data of at least two data streams into one or more stream data tuples and sending the stream data tuples to the data output module; wherein the number of elements of the stream data tuple is consistent with the number of the data streams;
and the data output module is used for receiving the stream data tuple sent by the data generation module and outputting the stream data according to the stream data tuple integrated by the data generation module.
Optionally, the data generating module is specifically configured to integrate, according to the acquisition time at which the data acquisition module acquires the stream data in each data stream, the stream data acquired earliest in each data stream into a current stream data tuple, send the current stream data tuple to the data output module, and repeat this process for the stream data subsequently acquired by the data acquisition module.
Optionally, the data generating module comprises a frequency determining unit, a data accepting and rejecting unit and a data integrating unit, wherein,
the frequency determining unit is used for determining a main data stream in all the data streams and the main frequency of data acquired by the main data stream;
the data accepting and rejecting unit is used for performing flow data compensation on the data stream of which the frequency of the acquired data is lower than the main frequency in the data streams except the main data stream, and/or performing flow data discarding on the data stream of which the frequency of the acquired data is higher than the main frequency in the data streams except the main data stream;
the data integration unit is configured to integrate the stream data of the main data stream and the stream data of the data streams other than the main data stream, as obtained by the data accepting and rejecting unit, into a stream data tuple.
Optionally, the data generating module comprises a time determining unit, a data selecting unit and a data grouping unit, wherein,
the time determining unit is used for determining a main data stream among all the data streams and the acquisition time of the stream data in all the data streams;
the data selecting unit is used for selecting, from each remaining data stream, the stream data with the minimum time interval from the acquisition time of the main data stream, taking the acquisition time of the main data stream as a reference;
and the data grouping unit is used for integrating the stream data of the main data stream and the stream data selected from each of the remaining data streams into a stream data tuple.
In a second aspect, an embodiment of the present invention provides a stream data processing method, where the stream data processing method includes:
acquiring stream data of at least two data streams;
integrating the stream data of the at least two data streams into one or more stream data tuples; wherein the number of elements of the stream data tuple is consistent with the number of the data streams;
and outputting the stream data of the at least two data streams according to the stream data tuple.
Optionally, the integrating of the stream data of the at least two data streams into one or more stream data tuples includes:
integrating the stream data acquired earliest in each data stream into a stream data tuple according to the acquisition time of the stream data in each data stream.
Optionally, the integrating of the stream data of the at least two data streams into one or more stream data tuples includes:
S1: determining a main data stream among all the data streams and a main frequency at which the main data stream acquires data;
S2: performing stream data compensation on each data stream, other than the main data stream, whose data acquisition frequency is lower than the main frequency, and/or performing stream data discarding on each data stream, other than the main data stream, whose data acquisition frequency is higher than the main frequency;
S3: integrating the stream data of the main data stream and the stream data of each of the remaining data streams obtained in S2 into a stream data tuple.
Optionally, the integrating of the stream data of the at least two data streams into one or more stream data tuples includes:
determining a main data stream in all the data streams and the acquisition time of stream data in all the data streams;
selecting stream data with the minimum time interval with the acquisition time of the main data stream from each remaining data stream by taking the acquisition time of the main data stream as a reference;
and integrating the stream data of the main data stream and the stream data selected from each remaining data stream into a stream data tuple.
In a third aspect, the present invention provides a readable storage medium having executable instructions thereon, which when executed, cause a computer to perform any of the methods included in the second aspect.
In a fourth aspect, an embodiment of the present invention provides a computing device, including: one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors to perform any of the methods included in the second aspect.
Compared with the prior art, the invention has at least the following beneficial effects:
The invention acquires the stream data of each data stream, integrates the stream data of all the data streams into one or more stream data tuples, and outputs the stream data in the form of stream data tuples, so that a data set that is relatively aligned in time is obtained. This solves the problem of time misalignment of the data, so that the stream data can be used directly for calculation and a more rigorous calculation result can be obtained.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a block diagram showing a configuration of a stream data processing apparatus according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a stream data processing method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention, and based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative efforts belong to the scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a stream data processing apparatus including: a data acquisition module, a data generation module and a data output module, wherein,
the data acquisition module is used for acquiring the stream data of at least two data streams and sending the stream data of each data stream to the data generation module;
the data generation module is used for receiving the stream data sent by the data acquisition module, integrating the stream data of at least two data streams into one or more stream data tuples and sending the stream data tuples to the data output module; wherein the number of elements of the stream data tuple is consistent with the number of the data streams;
and the data output module is used for receiving the stream data tuple sent by the data generation module and outputting the stream data according to the stream data tuple integrated by the data generation module.
In this embodiment, a data set that is relatively aligned in time is obtained by acquiring the stream data of each data stream, integrating the stream data of all the data streams into one or more stream data tuples, and outputting the stream data in the form of stream data tuples. This solves the problem of time misalignment of the data, so that the stream data can be used directly for calculation and a more rigorous calculation result can be obtained.
It should be noted that the number of elements of a stream data tuple is the same as the number of data streams, and a stream data tuple contains exactly one item of stream data from each data stream. For example, given data stream A, data stream B and data stream C, a stream data tuple is (An, Bn, Cn), where An is one item of stream data of data stream A, Bn is one item of stream data of data stream B, and Cn is one item of stream data of data stream C.
In an embodiment of the present invention, the data generating module is specifically configured to integrate, according to the acquisition time at which the data acquisition module acquires the stream data in each data stream, the stream data acquired earliest in each data stream into a current stream data tuple, send the current stream data tuple to the data output module, and repeat this process for the stream data subsequently acquired by the data acquisition module.
In this embodiment, when the acquisition time points and the acquisition frequencies of all the data streams are the same, the stream data of the respective data streams acquired at the same acquisition time point can be integrated directly into a stream data tuple. Three data streams are used as an example here, but the number of data streams is not limited in actual data acquisition work. The earliest acquired stream data is determined relative to a set reference time, and stream data that has already been integrated into a stream data tuple is not counted. For example, suppose data stream A, data stream B and data stream C all start acquisition at 00:00:00 on November 1, 2020, each with an acquisition frequency of once every 10 seconds. At 00:00:00 on November 1, 2020, stream data A1, stream data B1 and stream data C1 of data stream A, data stream B and data stream C are acquired respectively, integrated into the stream data tuple (A1, B1, C1), and sent to the data output module. At 00:00:10 on November 1, 2020, stream data A2, stream data B2 and stream data C2 are acquired respectively, integrated into the stream data tuple (A2, B2, C2), and sent to the data output module. These steps are repeated until the acquisition task is finished.
In actual work, however, the stream data of different data streams are often acquired at different time points. When the acquisition time points of the data streams differ but the acquisition frequency is the same, the stream data acquired earliest in each data stream can be integrated into the current stream data tuple according to the acquisition time of the stream data. As before, the earliest acquired stream data is determined relative to the set reference time, and stream data that has already been integrated into a stream data tuple is not counted. For example, suppose data stream E starts acquisition at 00:00:00 on November 1, 2020, data stream F at 00:00:02, and data stream G at 00:00:03, each with an acquisition frequency of once every 10 seconds, and 00:00:00 on November 1, 2020 is taken as the reference time. Stream data E1 of data stream E is acquired first, at 00:00:00; since no stream data of data stream F or data stream G has been acquired yet, the device waits. At 00:00:02, stream data F1 of data stream F is acquired; stream data E1 and stream data F1 are now available, but no stream data of data stream G has been acquired, so the device continues to wait. At 00:00:03, stream data G1 of data stream G is acquired; stream data of data stream E, data stream F and data stream G are now all available, so they are integrated into the stream data tuple (E1, F1, G1) and sent to the data output module. At 00:00:10, 00:00:12 and 00:00:13, stream data E2, stream data F2 and stream data G2 of data stream E, data stream F and data stream G are acquired respectively. Although stream data E1, F1 and G1 were acquired earlier than stream data E2, F2 and G2, they have already been integrated into the stream data tuple (E1, F1, G1) and sent to the data output module. The earliest acquired stream data in each data stream is therefore now E2, F2 and G2, which are integrated into the stream data tuple (E2, F2, G2) and sent to the data output module. These steps are repeated until the acquisition task is finished.
It should be noted that, when both the acquisition time points and the acquisition frequencies of the data streams differ in actual work (whether only some of the data streams differ from the others or all of them differ from one another), the stream data acquired earliest in each data stream can likewise be integrated into the current stream data tuple according to the acquisition time of the stream data.
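The waiting behaviour described for data streams E, F and G can be sketched as one first-in, first-out queue per data stream. This is a minimal sketch rather than the patented implementation: the class name `EarliestTupleAssembler` and its `push` method are hypothetical, and the sketch assumes that the items of each stream arrive in order of acquisition time.

```python
from collections import deque


class EarliestTupleAssembler:
    """Emit a tuple once every stream holds at least one unconsumed item."""

    def __init__(self, stream_names):
        self.names = list(stream_names)
        self.queues = {name: deque() for name in self.names}

    def push(self, name, item):
        """Buffer one item; return a completed tuple, or None while waiting."""
        self.queues[name].append(item)
        if all(self.queues[n] for n in self.names):
            # Take the earliest unconsumed item of every stream.
            return tuple(self.queues[n].popleft() for n in self.names)
        return None


assembler = EarliestTupleAssembler(["E", "F", "G"])
arrivals = [("E", "E1"), ("F", "F1"), ("G", "G1"),
            ("E", "E2"), ("F", "F2"), ("G", "G2")]
emitted = []
for name, item in arrivals:
    result = assembler.push(name, item)
    if result is not None:
        emitted.append(result)
print(emitted)  # [('E1', 'F1', 'G1'), ('E2', 'F2', 'G2')]
```

Items already placed in a tuple are removed from their queue, which corresponds to the rule that stream data already integrated into a stream data tuple is not counted again.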
In another embodiment of the present invention, the data generation module includes a frequency determining unit, a data accepting and rejecting unit, and a data integration unit, wherein,
the frequency determining unit is used for determining a main data stream in all the data streams and the main frequency of data acquired by the main data stream;
the data accepting and rejecting unit is used for performing flow data compensation on the data stream of which the frequency of the acquired data is lower than the main frequency in the data streams except the main data stream, and/or performing flow data discarding on the data stream of which the frequency of the acquired data is higher than the main frequency in the data streams except the main data stream;
the data integration unit is configured to integrate the stream data of the main data stream and the stream data of the data streams other than the main data stream, as obtained by the data accepting and rejecting unit, into a stream data tuple.
This embodiment addresses the case in which both the acquisition time points and the acquisition frequencies of the data streams differ in actual work (whether only some of the data streams differ from the others or all of them differ from one another). In that case, integrating the stream data acquired earliest in each data stream into the current stream data tuple may cause calculation errors, because the data streams differ in the number of data items and in data time. For different acquisition frequencies, the stream data of the other data streams can instead be compensated or discarded according to the main acquisition frequency of the main data stream. The main data stream can be set according to service requirements. For example, suppose data stream O is acquired starting at 00:00:00 on November 1, 2020, once every 5 seconds; data stream P is acquired starting at 00:00:01 on November 1, 2020, once every 8 seconds; data stream Q is acquired starting at 00:00:02 on November 1, 2020, once every 12 seconds; and data stream P is the main data stream according to the actual service requirement. Then, as shown in Table 3, stream data O1, O2 and O3 of data stream O are acquired at 00:00:00, 00:00:05 and 00:00:10 on November 1, 2020 respectively, and so on; stream data P1, P2 and P3 of data stream P are acquired at 00:00:01, 00:00:09 and 00:00:17 respectively, and so on; and stream data Q1, Q2 and Q3 of data stream Q are acquired at 00:00:02, 00:00:14 and 00:00:26 respectively, and so on.
TABLE 3
Time (s)        0    1    2    5    9    10   14   15   17   20   25   26
Data stream O   O1   -    -    O2   -    O3   -    O4   -    O5   -    -
Data stream P   -    P1   -    -    P2   -    -    -    P3   -    P4   -
Data stream Q   -    -    Q1   -    -    -    Q2   -    -    -    -    Q3
In this embodiment, data stream P is the main data stream, so its acquisition frequency of once every 8 seconds serves as the reference. Data stream O is acquired once every 5 seconds, which is higher than the acquisition frequency of data stream P; data stream O therefore yields too much stream data, and some of it needs to be discarded. Data stream Q is acquired once every 12 seconds, which is lower than the acquisition frequency of data stream P, so the stream data of data stream Q needs to be compensated. The discarding or compensation strategy can be set according to the actual situation. For example, O2 can be discarded and Q2 compensated, giving the stream data tuples (O1, P1, Q1), (O3, P2, Q2), (O3, P3, Q2), (O5, P4, Q3); alternatively, O5 can be discarded and Q3 compensated, giving the stream data tuples (O1, P1, Q1), (O2, P2, Q2), (O3, P3, Q3), (O4, P4, Q3). In different embodiments, different strategies can be set according to the actual situation to compensate or discard selectively, so as to obtain different stream data tuples that suit different stream computations.
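Since the text leaves the discarding and compensation strategy open, the sketch below shows just one possible strategy, chosen here for illustration and not stated in the source: for each item of the main data stream, take the most recent item of every other stream acquired at or before that time, falling back to the stream's first item if none exists yet. A faster stream then has its intermediate items discarded, and a slower stream has its last item repeated as compensation. The function name `align_to_main` and the `(label, time)` layout are hypothetical; the times come from Table 3.

```python
from bisect import bisect_right


def align_to_main(streams, main_name):
    """Build one tuple per main-stream item using a hold-last-value strategy.

    streams maps a stream name to a list of (label, time) sorted by time.
    Tuple elements follow the insertion order of the streams dict.
    """
    names = list(streams)
    result = []
    for _, t in streams[main_name]:
        row = []
        for name in names:
            series = streams[name]
            times = [ts for _, ts in series]
            # Index of the latest item acquired at or before t;
            # if there is none yet, fall back to the first item.
            i = max(bisect_right(times, t) - 1, 0)
            row.append(series[i][0])
        result.append(tuple(row))
    return result


table3 = {
    "O": [("O1", 0), ("O2", 5), ("O3", 10), ("O4", 15), ("O5", 20)],
    "P": [("P1", 1), ("P2", 9), ("P3", 17), ("P4", 25)],
    "Q": [("Q1", 2), ("Q2", 14), ("Q3", 26)],
}
aligned = align_to_main(table3, "P")
for row in aligned:
    print(row)
```

With the Table 3 data this strategy drops O3 and reuses Q1 and Q2, so its output differs from the two example tuple sets in the text; it is meant only to show how a concrete policy turns unequal frequencies into one tuple per main-stream item.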
In one embodiment of the invention, the data generation module comprises a time determination unit, a data selection unit and a data grouping unit, wherein,
the time determining unit is used for determining a main data stream among all the data streams and the acquisition time of the stream data in all the data streams;
the data selecting unit is used for selecting, from each remaining data stream, the stream data with the minimum time interval from the acquisition time of the main data stream, taking the acquisition time of the main data stream as a reference;
and the data grouping unit is used for integrating the stream data of the main data stream and the stream data selected from each of the remaining data streams into a stream data tuple.
In the invention, for different acquisition frequencies, the stream data of the other data streams can be compensated or discarded according to the main acquisition frequency of the main data stream, and the compensation or discarding method can be set according to the actual situation. In this embodiment, taking the acquisition time of the main data stream as a reference, the stream data with the smallest time interval from the acquisition time of the main data stream is selected from each remaining data stream to form a stream data tuple. Take the example shown in Table 3 above, starting with stream data P1 of the main data stream P, acquired at 00:00:01. In data stream O, stream data O1 is acquired at 00:00:00 and stream data O2 at 00:00:05, so the interval between the acquisition times of O1 and P1 is 1 second and the interval between the acquisition times of O2 and P1 is 4 seconds. The interval between O1 and P1 is smaller than that between O2 and P1 and is the minimum, so stream data O1 and stream data P1 are placed in the same stream data tuple. In data stream Q, stream data Q1 is acquired at 00:00:02 and stream data Q2 at 00:00:14, so the interval between the acquisition times of Q1 and P1 is 1 second and the interval between the acquisition times of Q2 and P1 is 13 seconds. The interval between Q1 and P1 is smaller than that between Q2 and P1 and is the minimum, so stream data Q1 and stream data P1 are placed in the same stream data tuple. The stream data tuple (O1, P1, Q1) is thus obtained and sent to the data output module. Similarly, for stream data P2 of the main data stream P, acquired at 00:00:09, the 4-second interval between O2 and P2 is larger than the 1-second interval between O3 and P2, which is the minimum, so stream data O3 and stream data P2 are placed in the same stream data tuple. The 7-second interval between Q1 and P2 is larger than the 5-second interval between Q2 and P2, which is the minimum, so stream data Q2 and stream data P2 are placed in the same stream data tuple. The stream data tuple (O3, P2, Q2) is thus obtained and sent to the data output module. Similarly, the stream data tuples (O4, P3, Q2) and (O5, P4, Q3) can be obtained, and so on until the acquisition task is completed.
It should be noted that, if two items of stream data in a data stream are equally distant in acquisition time from an item of stream data in the main data stream, and that distance is the minimum time interval, either the earlier acquired or the later acquired stream data may be selected. Continuing the example of Table 3, if stream data O6 of data stream O is acquired at 00:00:30 on November 1, 2020, the acquisition time interval between stream data O5 and stream data P4 is 5 seconds, equal to the 5-second interval between stream data O6 and stream data P4. The intervals of O5 and O6 from P4 are equal and minimal, so either the earlier acquired stream data O5 or the later acquired stream data O6 can be integrated into the same stream data tuple as stream data P4. Whether the earlier or the later acquired stream data is selected can be set according to different service requirements.
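The nearest-time selection and its tie-breaking rule can be sketched as follows. The function name `nearest_tuples` and the `prefer_later` flag are hypothetical names introduced for this illustration; the acquisition times are those of Table 3, with O6 at second 30 added as in the tie example above.

```python
def nearest_tuples(streams, main_name, prefer_later=False):
    """For each main-stream item, pick the closest-in-time item of every
    other stream. Ties go to the earlier item unless prefer_later is set.

    streams maps a stream name to a list of (label, time).
    Tuple elements follow the insertion order of the streams dict.
    """
    names = list(streams)
    result = []
    for main_label, t in streams[main_name]:
        row = []
        for name in names:
            if name == main_name:
                row.append(main_label)
                continue
            best = min(
                streams[name],
                # Smallest gap first; equal gaps are ordered by time.
                key=lambda item: (abs(item[1] - t),
                                  (-item[1] if prefer_later else item[1])),
            )
            row.append(best[0])
        result.append(tuple(row))
    return result


table3 = {
    "O": [("O1", 0), ("O2", 5), ("O3", 10), ("O4", 15), ("O5", 20)],
    "P": [("P1", 1), ("P2", 9), ("P3", 17), ("P4", 25)],
    "Q": [("Q1", 2), ("Q2", 14), ("Q3", 26)],
}
tuples = nearest_tuples(table3, "P")
print(tuples)

# Tie example: O5 (second 20) and O6 (second 30) are both 5 s from P4.
with_o6 = dict(table3, O=table3["O"] + [("O6", 30)])
earlier = nearest_tuples(with_o6, "P")[-1]
later = nearest_tuples(with_o6, "P", prefer_later=True)[-1]
print(earlier, later)
```

On the Table 3 data the sketch yields (O1, P1, Q1), (O3, P2, Q2), (O4, P3, Q2) and (O5, P4, Q3), matching the tuples worked out in the text; the flag switches the last tuple between O5 and O6.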
In addition, in practical applications, the data generation module can obtain the corresponding stream data tuples using different stream data integration methods, depending on the service requirements and on the acquisition time points and acquisition frequencies of the stream data. In the form of stream data tuples, the data streams form a two-dimensional data table. Because the Structured Query Language (SQL) can insert, delete, update and query the rows of a two-dimensional table, calculations can be expressed as SQL statements, which reduces the development complexity of stream computing.
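As an illustration of querying stream data tuples with SQL, the sketch below stores three tuples in an in-memory SQLite table through Python's standard `sqlite3` module. The table name `stream_tuples`, the column names and the numeric readings are all made up for this example; the source does not name a database or a schema.

```python
import sqlite3

# Hypothetical numeric readings, one row per stream data tuple (O, P, Q).
readings = [
    (1.0, 10.0, 100.0),
    (2.0, 20.0, 200.0),
    (3.0, 30.0, 300.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stream_tuples ("
    "seq INTEGER PRIMARY KEY, o REAL, p REAL, q REAL)"
)
conn.executemany(
    "INSERT INTO stream_tuples (o, p, q) VALUES (?, ?, ?)", readings
)

# A per-tuple calculation and an aggregate, both expressed in SQL.
sums = [row[0] for row in conn.execute(
    "SELECT o + p + q FROM stream_tuples ORDER BY seq")]
avg_p = conn.execute("SELECT AVG(p) FROM stream_tuples").fetchone()[0]
conn.close()

print(sums, avg_p)  # [111.0, 222.0, 333.0] 20.0
```

Because each row already holds one time-aligned value per data stream, cross-stream expressions such as `o + p + q` need no join or window logic.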
As shown in fig. 2, an embodiment of the present invention provides a stream data processing method, where the stream data processing method includes:
acquiring stream data of at least two data streams;
integrating the stream data of the at least two data streams into one or more stream data tuples; wherein the number of elements of the stream data tuple is consistent with the number of the data streams;
and outputting the stream data of the at least two data streams according to the stream data tuple.
In an embodiment of the present invention, said integrating stream data of at least two of the stream data into one or more stream data tuples comprises:
and integrating the stream data collected earliest in each data stream into a stream data tuple according to the collection time of the stream data in each data stream.
In an embodiment of the present invention, said integrating the stream data of the at least two data streams into one or more stream data tuples comprises:
S1: determining a main data stream in all the data streams and a main frequency at which the main data stream collects data;
S2: performing stream data compensation on each data stream, other than the main data stream, whose data acquisition frequency is lower than the main frequency, and/or performing stream data discarding on each data stream, other than the main data stream, whose data acquisition frequency is higher than the main frequency;
S3: integrating the stream data of the main data stream and the stream data of each of the remaining data streams obtained in S2 into a stream data tuple.
In this embodiment, the operation performed in S2 depends on the actual situation. If the acquisition frequencies of all the remaining data streams are lower than the main frequency of the main data stream, stream data compensation is performed on the remaining data streams. If the acquisition frequencies of all the remaining data streams are higher than the main frequency, stream data is discarded from the remaining data streams. If some of the remaining data streams have acquisition frequencies higher than the main frequency and others have acquisition frequencies lower than the main frequency, the remaining data streams are distinguished: stream data compensation is performed on the data streams below the main frequency, and stream data is discarded from the data streams above the main frequency.
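Steps S1 to S3 can be sketched as follows. This section does not specify how compensation or discarding is carried out, so the sketch assumes that compensation repeats each collected value, that discarding keeps every k-th value, and that frequencies are integers with an integer ratio to the main frequency. The function name `align_to_main` and the sample data are hypothetical.

```python
def align_to_main(values, freq, main_freq):
    """Resample `values` (collected at `freq`) to the main frequency.

    Lower frequency: compensate by repeating each value (assumed policy).
    Higher frequency: discard surplus values, keeping every k-th one (assumed policy).
    """
    if freq == main_freq:
        return list(values)
    if freq < main_freq:
        if main_freq % freq:
            raise ValueError("this sketch requires an integer frequency ratio")
        repeat = main_freq // freq
        return [v for v in values for _ in range(repeat)]
    if freq % main_freq:
        raise ValueError("this sketch requires an integer frequency ratio")
    return list(values[::freq // main_freq])

# S1: the main data stream collects data at 2 Hz (hypothetical data).
main = ["M1", "M2", "M3", "M4"]
# S2: compensate the 1 Hz stream, discard from the 4 Hz stream.
low = align_to_main(["L1", "L2"], freq=1, main_freq=2)
high = align_to_main([f"H{i}" for i in range(1, 9)], freq=4, main_freq=2)
# S3: integrate into tuples with one element per data stream.
stream_tuples = list(zip(main, low, high))
```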
In an embodiment of the present invention, said integrating the stream data of the at least two data streams into one or more stream data tuples comprises:
determining a main data stream in all the data streams and the acquisition time of stream data in all the data streams;
selecting stream data with the minimum time interval with the acquisition time of the main data stream from each remaining data stream by taking the acquisition time of the main data stream as a reference;
and integrating the stream data of the main data stream and the stream data selected from each remaining data stream into a stream data tuple.
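The time-based integration above can be sketched with a binary search over sorted acquisition times. The sketch assumes each data stream is a non-empty list of `(acquisition_time, value)` pairs sorted by time, and it resolves ties in favour of the earlier-collected item; the function name `integrate_nearest` and the sample data are hypothetical.

```python
from bisect import bisect_left

def integrate_nearest(main, others):
    """For each main-stream item, pick the time-nearest item of every other stream.

    All streams are non-empty lists of (acquisition_time, value) sorted by time.
    When two items are equally near, the earlier-collected one is chosen.
    """
    time_index = [[t for t, _ in stream] for stream in others]
    tuples = []
    for t_main, v_main in main:
        row = [v_main]
        for stream, times in zip(others, time_index):
            i = bisect_left(times, t_main)
            # Only the neighbours around the insertion point can be nearest.
            neighbours = stream[max(i - 1, 0):i + 1]
            nearest = min(neighbours, key=lambda item: abs(item[0] - t_main))
            row.append(nearest[1])
        tuples.append(tuple(row))
    return tuples

# Hypothetical data: P is the main data stream, O is a remaining data stream.
stream_p = [(5, "P1"), (15, "P2"), (25, "P3")]
stream_o = [(0, "O1"), (4, "O2"), (14, "O3"), (20, "O4"), (30, "O5")]
nearest_tuples = integrate_nearest(stream_p, [stream_o])
```

In this example P3 is equally distant from O4 and O5, so the earlier-collected O4 is selected.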
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other similar elements in a process, method, article, or apparatus that comprises the element.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it is to be noted that: the above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A stream data processing apparatus, characterized by comprising: a data acquisition module, a data generation module and a data output module, wherein,
the data acquisition module is used for acquiring the stream data of at least two data streams and sending the stream data of each data stream to the data generation module;
the data generation module is used for receiving the stream data sent by the data acquisition module, integrating the stream data of at least two data streams into one or more stream data tuples and sending the stream data tuples to the data output module; wherein the number of elements of the stream data tuple is consistent with the number of the data streams;
and the data output module is used for receiving the stream data tuple sent by the data generation module and outputting stream data according to the stream data tuple integrated by the data generation module.
2. The stream data processing apparatus according to claim 1, wherein the data generation module is specifically configured to integrate the stream data collected earliest in each of the data streams into a current stream data tuple according to the collection time at which the data acquisition module collects the stream data in each of the data streams, send the current stream data tuple to the data output module, and repeatedly send the stream data collected by the data acquisition module to the data output module.
3. The stream data processing apparatus according to claim 1, wherein the data generation module includes a frequency determining unit, a data accepting and rejecting unit, and a data integration unit, wherein,
the frequency determining unit is used for determining a main data stream in all the data streams and the main frequency at which the main data stream collects data;
the data accepting and rejecting unit is used for performing stream data compensation on each data stream, other than the main data stream, whose data acquisition frequency is lower than the main frequency, and/or performing stream data discarding on each data stream, other than the main data stream, whose data acquisition frequency is higher than the main frequency;
the data integration unit is configured to integrate the stream data of the main data stream and the stream data of the data streams other than the main data stream, as obtained by the data accepting and rejecting unit, into a stream data tuple.
4. The stream data processing apparatus according to claim 1, wherein the data generation module includes a time determining unit, a data selecting unit, and a data grouping unit, wherein,
the time determining unit is used for determining a main data stream in all the data streams and the acquisition time of stream data in all the data streams;
the data selecting unit is used for selecting stream data with the minimum time interval with the acquisition time of the main data stream from each remaining data stream by taking the acquisition time of the main data stream as a reference;
and the data grouping unit is used for integrating the stream data of the main data stream and the stream data selected from each of the remaining data streams into a stream data tuple.
5. A stream data processing method, characterized by comprising:
collecting stream data of at least two data streams;
integrating the stream data of the at least two data streams into one or more stream data tuples; wherein the number of elements of the stream data tuple is consistent with the number of the data streams;
and outputting stream data of at least two data streams according to the stream data tuple.
6. The method for processing stream data according to claim 5, wherein said integrating the stream data of the at least two data streams into one or more stream data tuples comprises:
and integrating the stream data collected earliest in each data stream into a stream data tuple according to the collection time of the stream data in each data stream.
7. The method for processing stream data according to claim 5, wherein said integrating the stream data of the at least two data streams into one or more stream data tuples comprises:
S1: determining a main data stream in all the data streams and a main frequency at which the main data stream collects data;
S2: performing stream data compensation on each data stream, other than the main data stream, whose data acquisition frequency is lower than the main frequency, and/or performing stream data discarding on each data stream, other than the main data stream, whose data acquisition frequency is higher than the main frequency;
S3: integrating the stream data of the main data stream and the stream data of the data streams obtained in S2 into a stream data tuple.
8. The method for processing stream data according to claim 5, wherein said integrating the stream data of the at least two data streams into one or more stream data tuples comprises:
determining a main data stream in all the data streams and the acquisition time of stream data in all the data streams;
selecting stream data with the minimum time interval with the acquisition time of the main data stream from each remaining data stream by taking the acquisition time of the main data stream as a reference;
and integrating the stream data of the main data stream and the stream data selected from each remaining data stream into a stream data tuple.
9. A readable storage medium having executable instructions thereon, which when executed, cause a computer to perform the method as included in any one of claims 5-8.
10. A computing device, comprising: one or more processors, memory, and programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors to perform the method as recited in any of claims 5-8.
CN202011216552.8A 2020-11-04 2020-11-04 Stream data processing method and device Pending CN112328660A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011216552.8A CN112328660A (en) 2020-11-04 2020-11-04 Stream data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011216552.8A CN112328660A (en) 2020-11-04 2020-11-04 Stream data processing method and device

Publications (1)

Publication Number Publication Date
CN112328660A true CN112328660A (en) 2021-02-05

Family

ID=74324717

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011216552.8A Pending CN112328660A (en) 2020-11-04 2020-11-04 Stream data processing method and device

Country Status (1)

Country Link
CN (1) CN112328660A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718412A (en) * 2016-01-14 2016-06-29 深圳市同创国芯电子有限公司 Channel frequency difference compensation method, and channel control method, device and system
CN112328597A (en) * 2020-11-06 2021-02-05 北京航云物联信息技术有限公司 Flow calculation method and device based on table

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
用户1621951: "干货 | 时间序列数据的对齐和数据库的分批查询", pages 4 - 5, Retrieved from the Internet <URL:https://cloud.tencent.com/developer/article/1442989> *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114756322A (en) * 2022-05-09 2022-07-15 北京航云物联信息技术有限公司 Picture processing method and device, computer equipment and storage medium
CN114756322B (en) * 2022-05-09 2024-02-20 北京航云物联信息技术有限公司 Picture processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination