CN113703874A - Data stream processing method, device, equipment and readable storage medium - Google Patents

Info

Publication number
CN113703874A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111040434.0A
Other languages
Chinese (zh)
Other versions
CN113703874B (en)
Inventor
杨春龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd filed Critical Sangfor Technologies Co Ltd
Priority to CN202111040434.0A priority Critical patent/CN113703874B/en
Publication of CN113703874A publication Critical patent/CN113703874A/en
Application granted granted Critical
Publication of CN113703874B publication Critical patent/CN113703874B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a data stream processing method, apparatus, device, and readable storage medium. The method comprises: asynchronously loading a data stream and a configuration stream; determining whether the target configuration stream associated with a loaded target data stream has finished loading; if not, storing the data objects of the target data stream in a life cycle cache mapping table and, once the target configuration stream is determined to have finished loading, configuring the target data stream using it; and if so, configuring the target data stream using the target configuration stream directly. By using a caching technique, the application solves the out-of-order problem in which the data stream reaches the processing system before the configuration stream under Flink's dual-stream asynchronous loading, and it achieves the correct ordering in which the configuration stream has finished loading by the time processing of the data stream begins.

Description

Data stream processing method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of computer application technologies, and in particular, to a data stream processing method, apparatus, device, and readable storage medium.
Background
When Flink (an open source stream processing framework) loads a data stream and a broadcast configuration stream, the two are loaded asynchronously, so the order in which their data arrives in Flink's internal data processing system is not guaranteed. Because the data of the data stream must read the configuration data of the configuration stream immediately after being loaded in order to complete its enrichment, the correct loading order under this requirement is for the configuration stream to be loaded before the data stream.
However, this order cannot be fully guaranteed under Flink's dual-stream asynchronous loading mechanism. When a Flink app (an application based on the open source stream processing framework) has just started, the data stream is very likely to reach the processing system before the configuration stream, which puts the data out of order. As a result, the data stream is not enriched with configuration during the short period right after the app starts, and the data ends up in an abnormal state.
In summary, how to effectively solve the out-of-order arrival of the data stream and the configuration stream is a technical problem that those skilled in the art urgently need to solve.
Disclosure of Invention
The application aims to provide a data stream processing method, apparatus, device, and readable storage medium that solve the out-of-order problem in which the data stream reaches the processing system before the configuration stream under Flink's dual-stream asynchronous loading, and that achieve the correct ordering in which the configuration stream has finished loading by the time processing of the data stream begins.
In order to solve the technical problem, the application provides the following technical scheme:
a method of data stream processing, comprising:
asynchronously loading the data stream and the configuration stream;
determining whether the target configuration stream associated with the loaded target data stream has finished loading;
if not, storing the data object of the target data stream into a life cycle cache mapping table, and configuring the target data stream by using the target configuration stream after the target configuration stream is determined to be loaded;
and if so, configuring the target data stream by using the target configuration stream.
Preferably, determining whether the target configuration stream associated with the loaded target data stream has finished loading includes:
determining whether the read-only broadcast state object obtained from the read-only context is non-empty;
if so, determining whether the configuration set of the read-only broadcast state object has loaded the data of the required configuration dimensions;
and if so, determining that the target configuration stream has finished loading.
Preferably, storing the data object of the target data stream into a lifecycle cache mapping table, includes:
judging whether the life cycle cache mapping table reaches a cache threshold value;
if not, the data object is placed into the life cycle cache mapping table, and the life cycle is set for the data object.
Preferably, the method further comprises the following steps:
traversing the life cycle cache mapping table, and finding out a target data object exceeding the life cycle;
and cleaning the target data object.
Preferably, traversing the lifecycle cache mapping table to find the target data object exceeding the lifecycle comprises:
traversing the life cycle cache mapping table to obtain the cache time of each data object;
and determining the data object with the difference value between the cache time and the current system time larger than the corresponding life cycle as the target data object.
Preferably, the method further comprises the following steps:
and releasing the resources occupied by the life cycle cache mapping table in the application exit process.
Preferably, the method further comprises the following steps:
using the life cycle cache mapping table to determine whether any data stream remains to be processed;
if not, determining that all data streams have been configured with their associated configuration.
A data stream processing apparatus comprising:
the data loading module is used for asynchronously loading the data stream and the configuration stream;
the judging module is used for determining whether the target configuration stream associated with the loaded target data stream has finished loading;
a cache configuration module, configured to store the data object of the target data stream into a lifecycle cache mapping table if the target configuration stream is not completely loaded, and configure the target data stream using the target configuration stream after it is determined that the target configuration stream is completely loaded;
and the data configuration module is used for configuring the target data stream by using the target configuration stream if the target configuration stream is loaded completely.
An electronic device, comprising:
a memory for storing a computer program;
and the processor is used for realizing the steps of the data stream processing method when executing the computer program.
A readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the above-mentioned data stream processing method.
By applying the method provided by the embodiments of the application, the data stream and the configuration stream are loaded asynchronously; it is determined whether the target configuration stream associated with the loaded target data stream has finished loading; if not, the data objects of the target data stream are stored in a life cycle cache mapping table, and the target data stream is configured using the target configuration stream once that stream is determined to have finished loading; and if so, the target data stream is configured using the target configuration stream.
In the application, data streams and configuration streams are loaded asynchronously. Once a target data stream has been loaded, it is determined whether the target configuration stream associated with it has finished loading. If it has, the target data stream can be configured directly based on the target configuration stream. If it has not, configuration of the target data stream is suspended and the data objects of the target data stream are stored in the life cycle cache mapping table, that is, the target data stream is cached; after the target configuration stream is determined to have finished loading, the target data stream is configured using it. In other words, a caching technique is used to solve the out-of-order problem in which the data stream reaches the processing system before the configuration stream under Flink's dual-stream asynchronous loading, achieving the correct ordering in which the configuration stream has finished loading by the time processing of the data stream begins.
Accordingly, embodiments of the present application further provide a data stream processing apparatus, a device, and a readable storage medium corresponding to the data stream processing method, which have the above technical effects and are not described herein again.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an implementation of a data stream processing method in an embodiment of the present application;
fig. 2 is a schematic diagram illustrating an embodiment of a data stream processing method according to the present application;
fig. 3 is a schematic structural diagram of a data stream processing apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a data stream processing method in an embodiment of the present application, where the method may be specifically used in a Flink-based data processing system, and the method includes the following steps:
S101, asynchronously loading the data stream and the configuration stream.
Specifically, an asynchronous loading mechanism can be used directly to load the data stream and the configuration stream. That is, the data stream may finish loading before the configuration stream or after it. For the specific implementation of asynchronous loading, reference may be made to the definition and implementation of the asynchronous loading mechanism itself, which are not repeated here.
S102, determining whether the target configuration stream associated with the loaded target data stream has finished loading.
The data stream may be associated with the configuration stream using the connect method; a process operator is then applied to the connected stream, and the process operator is passed an object that extends BroadcastProcessFunction.
In this embodiment, any loaded data stream may be the target data stream.
The target configuration stream corresponding to the target data stream may be identified from the association relationship between configuration streams and data streams, after which it may be determined whether the target configuration stream has finished loading.
Specifically, step S102, determining whether the target configuration stream associated with the loaded target data stream has finished loading, may include:
step one, determining whether the read-only broadcast state object obtained from the read-only context is non-empty;
step two, if so, determining whether the configuration set of the read-only broadcast state object has loaded the data of the required configuration dimensions;
and step three, if so, determining that the target configuration stream has finished loading.
For convenience of description, the above three steps will be described in combination.
It is first determined whether the read-only broadcast state object obtained from the read-only context is non-empty. If it is non-empty, it is further determined whether the configuration set of the read-only broadcast state object has finished loading the data of the required configuration dimensions; if it has, the target configuration stream can be determined to have finished loading. Conversely, if the read-only broadcast state object is empty, the target configuration stream is determined not to have finished loading; likewise, if the configuration set of the read-only broadcast state object has not finished loading the data of the required configuration dimensions, the target configuration stream is also determined not to have finished loading.
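The two-part check described above can be sketched in a framework-independent way. The following is a minimal illustration only, not the patented implementation: the function name `config_loaded`, the dimension names in `REQUIRED_DIMENSIONS`, and the use of a plain mapping in place of Flink's read-only broadcast state are all assumptions made for the example.

```python
from typing import Mapping, Optional, Set

# Hypothetical dimension names; the source does not list concrete ones.
REQUIRED_DIMENSIONS: Set[str] = {"asset", "geo"}


def config_loaded(broadcast_state: Optional[Mapping[str, dict]],
                  required: Set[str] = REQUIRED_DIMENSIONS) -> bool:
    """Return True only if the state exists and holds every required dimension."""
    # Step one: the broadcast state object itself must be non-empty (not None).
    if broadcast_state is None:
        return False
    # Step two: every required configuration dimension must already hold data.
    # A missing or empty dimension means the configuration stream is still loading.
    return all(broadcast_state.get(dim) for dim in required)
```

In this sketch a dimension that is present but still empty counts as not loaded, which is one reasonable reading of "has loaded the data of the required configuration dimensions".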
After determining whether the target configuration stream having the association relation with the target data stream is loaded, the corresponding subsequent operation can be executed according to the specific loading condition. Specifically, if the target configuration stream is loaded, the operation of step S104 is executed; if the target configuration stream is not loaded, the operation of step S103 is performed.
S103, storing the data object of the target data stream into a life cycle cache mapping table, and configuring the target data stream by using the target configuration stream after the target configuration stream is completely loaded.
When it is determined that the target configuration stream of the target data stream has not finished loading, the data stream has arrived before the configuration stream, and configuring the data stream at this point could corrupt the data. Therefore, in this embodiment, when the target configuration stream of the target data stream has not finished loading, the target data stream is not configured; instead, its data objects are stored in the life cycle cache mapping table. After the target configuration stream finishes loading, the target data stream can then be configured using it.
The storing of the data object of the target data stream into the life cycle cache mapping table specifically includes:
step one, judging whether a life cycle cache mapping table reaches a cache threshold value;
and step two, if not, the data object is placed into a life cycle cache mapping table, and the life cycle is set for the data object.
For convenience of description, the above two steps will be described in combination.
In this embodiment, to prevent the life cycle cache mapping table from caching too many data streams and from holding stale data streams beyond their useful time, a cache threshold may be set for the table. The cache threshold is the maximum number of stored data items, for example one million, and may be set and adjusted according to actual conditions. The data objects in the life cycle cache mapping table are thereby kept within the cache threshold. Specifically, before a data object of the target data stream is stored in the life cycle cache mapping table, it is determined whether the table has reached the cache threshold. If so, the table may be cleaned and the data object then placed in it; if the table has not reached the cache threshold, the data object may be placed in it directly.
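The threshold check before insertion might look like the sketch below. It is illustrative only: the class layout, the method names, and the `now` parameter (added so the behavior can be exercised without waiting) are assumptions. The source also does not say what happens if the table is still full after cleaning; rejecting the object in that case is an assumption of this sketch.

```python
import time
from typing import Any, Dict, Optional, Tuple


class TTLCacheMap:
    """Size-bounded cache; each entry is (cached_at, ttl_seconds, value)."""

    def __init__(self, max_size: int = 1_000_000) -> None:
        self.max_size = max_size  # cache threshold: maximum number of entries
        self._entries: Dict[str, Tuple[float, float, Any]] = {}

    def __len__(self) -> int:
        return len(self._entries)

    def evict_expired(self, now: float) -> None:
        """Remove every entry whose age exceeds its own life cycle."""
        expired = [k for k, (t, ttl, _) in self._entries.items() if now - t > ttl]
        for key in expired:
            del self._entries[key]

    def put(self, key: str, value: Any, ttl_seconds: float,
            now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if len(self._entries) >= self.max_size:
            self.evict_expired(now)  # threshold reached: clean the table first
        if len(self._entries) >= self.max_size:
            return False  # assumption: reject when cleaning freed no space
        self._entries[key] = (now, ttl_seconds, value)  # store with its life cycle
        return True
```

For example, with `max_size=2`, a third `put` succeeds only if at least one of the two cached entries has already outlived its life cycle.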
It should be noted that, to manage the life cycle cache mapping table, a life cycle, that is, a time to live (TTL), may also be set for the data object after it is placed in the table. The life cycle is the longest time the object may persist, counted from its initialization: once the object has existed for the TTL, it is cleared.
The life cycles of all data objects may be identical, such as 5 minutes or 6 minutes; alternatively, different life cycles may be set for different data objects, e.g. a life cycle of 4 minutes for data object A and a life cycle of 6 minutes for data object B.
Correspondingly, the specific cleaning process of the data object in the life cycle cache mapping table comprises the following steps:
step one, traversing a life cycle cache mapping table, and finding out a target data object exceeding the life cycle;
and step two, cleaning the target data object.
For convenience of description, the above two steps will be described in combination.
In this embodiment, each data object in the life cycle cache mapping table may be traversed to compare its storage time against its life cycle: if the time elapsed from the storage time to the current time exceeds the life cycle, the data object has outlived its life cycle and may be cleaned. For ease of distinction, a data object that has exceeded its life cycle and needs to be cleaned is referred to in this embodiment as the target data object.
The first step of traversing the life cycle cache mapping table to find out the target data object exceeding the life cycle may specifically include:
step 1, traversing a life cycle cache mapping table to obtain the cache time of each data object;
and 2, determining the data object with the difference value between the cache time and the current system time larger than the corresponding life cycle as a target data object.
That is, the cache time of each data object is first obtained by traversal, and the difference between the cache time and the current system time is then calculated. If the difference is greater than the corresponding life cycle, the data object is determined to be a target data object.
After the target data object is determined, it may be cleaned; that is, the target data object is removed from the life cycle cache mapping table and the corresponding data stream data is deleted.
For example, with a life cycle of 5 minutes: if data object A was stored in the life cycle cache mapping table more than 5 minutes ago, it needs to be cleaned; if it was stored within the last 5 minutes, it does not need to be cleaned and remains in the cache, waiting to be configured with its associated configuration.
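The traversal and cleaning steps can be illustrated with per-object life cycles. This is a minimal sketch under assumptions: `CachedObject`, `find_expired`, and `clean` are hypothetical names, times are plain seconds, and a dictionary stands in for the life cycle cache mapping table.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class CachedObject:
    value: Any
    cached_at: float    # time (seconds) at which the object entered the cache
    ttl_seconds: float  # life cycle of this particular object


def find_expired(table: Dict[str, CachedObject], now: float) -> List[str]:
    """Step 1 and step 2: collect keys whose age exceeds their own life cycle."""
    return [key for key, obj in table.items()
            if now - obj.cached_at > obj.ttl_seconds]


def clean(table: Dict[str, CachedObject], now: float) -> int:
    """Remove the expired (target) objects and report how many were removed."""
    expired = find_expired(table, now)
    for key in expired:
        del table[key]
    return len(expired)
```

With object A given a 4-minute life cycle and object B a 6-minute one, both cached at time 0, a sweep at the 5-minute mark removes only A, while B stays cached until a sweep after the 6-minute mark.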
S104, configuring the target data stream using the target configuration stream.
When it is determined that the target configuration stream of the target data stream has finished loading, the target data stream is configured directly using the target configuration stream. For the configuration process itself, reference may be made to conventional implementations in which a data stream is configured using the configuration stream associated with it, which are not described in detail here.
In this embodiment, the life cycle cache mapping table may be used to determine whether any data stream remains to be processed; if not, it is determined that all data streams have been configured with their associated configuration. A data stream that finishes loading before its configuration stream enters the life cycle cache mapping table, a data stream that finishes loading after its configuration stream is configured directly, and the data objects in the table are processed once the corresponding configuration stream has loaded. Whether every data stream has been configured can therefore be determined by checking whether the life cycle cache mapping table still holds data streams awaiting processing.
Preferably, because the life cycle cache mapping table itself occupies resources, the resources it occupies are released while the application exits. This prevents the table from continuing to occupy resources.
By applying the method provided by the embodiments of the application, the data stream and the configuration stream are loaded asynchronously; it is determined whether the target configuration stream associated with the loaded target data stream has finished loading; if not, the data objects of the target data stream are stored in a life cycle cache mapping table, and the target data stream is configured using the target configuration stream once that stream is determined to have finished loading; and if so, the target data stream is configured using the target configuration stream.
In the application, data streams and configuration streams are loaded asynchronously. Once a target data stream has been loaded, it is determined whether the target configuration stream associated with it has finished loading. If it has, the target data stream can be configured directly based on the target configuration stream. If it has not, configuration of the target data stream is suspended and the data objects of the target data stream are stored in the life cycle cache mapping table, that is, the target data stream is cached; after the target configuration stream is determined to have finished loading, the target data stream is configured using it. In other words, a caching technique is used to solve the out-of-order problem in which the data stream reaches the processing system before the configuration stream under Flink's dual-stream asynchronous loading, achieving the correct ordering in which the configuration stream has finished loading by the time processing of the data stream begins.
To help those skilled in the art better understand the data stream processing method provided in the embodiments of the present application, a specific implementation of the method is described in detail below, taking a specific application scenario as an example.
In a related scheme, a state with a TTL function may be used to temporarily store the loaded data stream; after the configuration stream is determined to have loaded, the data stream held in the TTL-enabled state is processed in association with the configuration stream data. However, Flink has provided state with a TTL function only since version 1.6; earlier versions do not support it, so this scheme covers a narrow range of versions. In addition, such schemes can apply the TTL function only to keyed state and not to non-keyed state, so their range of application is small.
The data stream processing method provided in the embodiments of the application is intended to solve the out-of-order problem in which the data stream reaches the processing system before the configuration stream when Flink asynchronously loads the data stream and the broadcast configuration stream. It ensures that the processing system can correctly associate the loaded configuration data when processing the data stream, thereby achieving the corrected ordering in which the configuration stream is loaded before the data stream is processed. Specifically, please refer to fig. 2, which is a schematic diagram illustrating an implementation of the data stream processing method according to an embodiment of the present application.
First, the Flink data stream is associated with the configuration stream: the data stream is connected to the configuration stream using the connect method, and a process operator is then applied to the connected stream; the process operator is passed an object that extends BroadcastProcessFunction.
When extending BroadcastProcessFunction, four methods need to be overridden: open, close, processElement, and processBroadcastElement. The open method initializes the objects used; the close method destroys them; the processElement method processes the data stream; and the processBroadcastElement method processes the broadcast configuration stream.
In particular, an object needs to be defined in the extended BroadcastProcessFunction, namely a TTLCacheMap (life cycle cache mapping table), which is later used to cache the data objects corresponding to the data stream.
The TTLCacheMap is wrapped in a ThreadLocal to ensure that the TTLCacheMap object can be accessed safely under multithreading, mainly to guarantee ordering and accuracy. The size threshold passed into the TTLCacheMap constructor is 1000000, so the cache currently holds at most 1 million data items.
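The effect of the thread-local wrapping can be shown with Python's `threading.local`, used here as a stand-in for Java's ThreadLocal. This is only a sketch: the accessor name `get_cache` and the use of a plain dictionary in place of the TTLCacheMap are assumptions. The point it demonstrates is that each thread sees its own cache instance, so no thread can observe or reorder another thread's cached objects.

```python
import threading

# One independent cache per thread, analogous to wrapping the map in a ThreadLocal.
_local = threading.local()


def get_cache() -> dict:
    """Return the calling thread's own cache, creating it on first access."""
    cache = getattr(_local, "ttl_cache", None)
    if cache is None:
        cache = {}  # a plain dict stands in for the TTLCacheMap here
        _local.ttl_cache = cache
    return cache
```

A value stored through `get_cache()` in one thread is not visible from another thread, which obtains a fresh, empty cache on its first call.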
After the configuration is completed, the data processing system can asynchronously load the configuration stream and the data stream. First, in the processElement callback method of the data stream, it is determined whether the configuration stream has finished loading. Specifically, it is checked whether the ReadOnlyBroadcastState object (read-only broadcast state object) obtained from the ReadOnlyContext is non-null and whether the configuration set of the ReadOnlyBroadcastState object has loaded the data of the required configuration dimensions; when both conditions hold, the configuration stream is determined to have finished loading. Meanwhile, in the processBroadcastElement callback method, which asynchronously loads the configuration stream, the loaded configuration data is assembled into the ReadOnlyBroadcastState (read-only broadcast state object) according to the defined MapStateDescriptor structure (mapping state descriptor).
When it is determined that the corresponding configuration stream has not finished loading, the data object (a TTLCacheObject) is placed directly into the TTLCacheMap, and the TTL configured by the user (for example, 5 minutes) is set on the TTLCacheObject as it is placed. After this logic has executed, the TTLCacheMap is traversed to check whether any element (that is, a data object corresponding to the cached data stream) has exceeded its TTL; if so, it is cleaned. Enforcement of the 1 million size threshold passed in at construction is handled by the map's own mechanism.
When it is determined that the corresponding configuration stream has finished loading, it is checked whether the TTLCacheMap holds data awaiting processing; if so, that data is processed by associating it with the configuration stream data in the ReadOnlyBroadcastState, and the processed data is then deleted.
That is, when the data stream is loaded and the configuration stream has also finished loading, the data stream is configured directly based on the configuration stream; when the data stream is loaded but the configuration stream has not finished loading, the data stream is cached and given a life cycle, so that it can be configured once the configuration stream has loaded, or the unconfigured data stream is cleaned up once its life cycle is reached.
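The overall flow of the two callbacks can be simulated without Flink. The class below is a plain-Python stand-in, not a Flink API and not the patented implementation: `EnrichingProcessor`, its fields, the record layout with a `"key"` field, and the injectable `clock` are all assumptions for illustration. It also assumes that everything still in the cache is drained, in arrival order, once the configuration is ready.

```python
import time
from typing import Any, Callable, Dict, Iterable, List, Tuple


class EnrichingProcessor:
    """Stand-in for the processElement / processBroadcastElement callbacks."""

    def __init__(self, required_dims: Iterable[str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.required_dims = set(required_dims)
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.broadcast_state: Dict[str, Dict[str, Any]] = {}
        self.pending: Dict[int, Tuple[float, dict]] = {}  # seq -> (cached_at, record)
        self._seq = 0

    def _config_loaded(self) -> bool:
        return all(self.broadcast_state.get(d) for d in self.required_dims)

    def _enrich(self, record: dict) -> dict:
        enriched = dict(record)
        for dim, table in self.broadcast_state.items():
            enriched[dim] = table.get(record.get("key"))
        return enriched

    def process_broadcast_element(self, dim: str, key: str, value: Any) -> None:
        """Configuration side: assemble incoming config into the broadcast state."""
        self.broadcast_state.setdefault(dim, {})[key] = value

    def process_element(self, record: dict) -> List[dict]:
        """Data side: cache while config is missing, otherwise drain and enrich."""
        now = self.clock()
        if not self._config_loaded():
            self.pending[self._seq] = (now, record)
            self._seq += 1
            # Sweep out cached records that have outlived their life cycle.
            stale = [k for k, (t, _) in self.pending.items()
                     if now - t > self.ttl_seconds]
            for k in stale:
                del self.pending[k]
            return []
        # Config is ready: process cached records first, then the current one.
        out = [self._enrich(r) for _, r in self.pending.values()]
        self.pending.clear()
        out.append(self._enrich(record))
        return out
```

Records that arrive before any configuration return nothing and wait in the cache; the first record that arrives after the configuration is loaded triggers enrichment of the surviving cached records followed by itself.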
When it is judged that no data to be processed remains in TTLCacheMap, the current data is associated with the configuration directly and proceeds to the subsequent data processing stage.
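The cache-then-drain flow described in the preceding paragraphs can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the Flink API: the class `DualStreamHandler`, its attributes, and the record layout (a dict with a `"key"` field) are hypothetical, the configuration counts as loaded as soon as it is non-empty, and TTL expiry is omitted to keep the flow visible.

```python
from typing import Any, Dict, List, Optional


class DualStreamHandler:
    """Caches data records until configuration arrives, then joins them with it."""

    def __init__(self) -> None:
        self.config: Optional[Dict[str, Any]] = None   # stands in for the broadcast state
        self.pending: List[Dict[str, Any]] = []        # stands in for the TTL cache
        self.output: List[Dict[str, Any]] = []         # records handed to later stages

    def process_broadcast_element(self, key: str, value: Any) -> None:
        """Configuration-stream callback: assemble the configuration state."""
        if self.config is None:
            self.config = {}
        self.config[key] = value

    def _associate(self, record: Dict[str, Any]) -> Dict[str, Any]:
        # Attach the matching configuration entry (None if the key is unknown).
        return {**record, "config": self.config.get(record["key"])}

    def process_element(self, record: Dict[str, Any]) -> None:
        """Data-stream callback: cache the record or process it with configuration."""
        if not self.config:
            # Configuration not loaded yet: cache the record for later.
            self.pending.append(record)
            return
        # Configuration loaded: drain cached records first, then delete them.
        for cached in self.pending:
            self.output.append(self._associate(cached))
        self.pending.clear()
        # With nothing left in the cache, process the current record directly.
        self.output.append(self._associate(record))


handler = DualStreamHandler()
handler.process_element({"key": "k1", "v": 1})       # arrives before configuration
handler.process_broadcast_element("k1", "cfgA")      # configuration arrives
handler.process_element({"key": "k1", "v": 2})       # drains the cache, then itself
```

Draining the cache before handling the current record preserves arrival order in the output, which is one reasonable reading of the flow; the application itself does not state an ordering requirement.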
When the app exits, the resources occupied by TTLCacheMap are released by the destruction logic for TTLCacheMap added in the close method. At this point, the entire stream processing flow ends.
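As a minimal sketch of this release step, assuming the cache is a plain dictionary held by an operator object, the destruction logic can be placed in a `close` method and triggered automatically on exit. The class name `CachingOperator` and its attribute `ttl_cache` are hypothetical.

```python
from contextlib import closing
from typing import Any, Dict, Optional


class CachingOperator:
    """Holds a cache and releases it when close() is called."""

    def __init__(self) -> None:
        self.ttl_cache: Optional[Dict[str, Any]] = {}

    def close(self) -> None:
        """Destroy the cache; safe to call more than once."""
        if self.ttl_cache is not None:
            self.ttl_cache.clear()
            self.ttl_cache = None


# contextlib.closing calls close() when the block exits, even on an exception.
with closing(CachingOperator()) as op:
    op.ttl_cache["k1"] = {"v": 1}

print(op.ttl_cache)   # None
```

Making `close` idempotent avoids errors if the shutdown path runs more than once.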
Therefore, the data stream processing method provided by the embodiments of the application addresses a general problem under Flink's dual-stream asynchronous loading of a data stream and a broadcast configuration stream: data reaching the processing system out of order, ahead of the configuration stream. Data that cannot be correctly associated with configuration during the short period just after the app starts is cached, and continues to participate in data processing once the configuration arrives, which ensures the accuracy of data cleaning. Moreover, this general caching scheme with a TTL function is independent of the Flink version and the state type.
Corresponding to the above method embodiments, the present application further provides a data stream processing apparatus, and the data stream processing apparatus described below and the data stream processing method described above may be referred to in correspondence with each other.
Referring to fig. 3, the apparatus includes the following modules:
the data loading module 101 is configured to asynchronously load a data stream and a configuration stream;
a judging module 102, configured to judge whether a target configuration stream having an association relationship with a loaded target data stream is loaded completely;
the cache configuration module 103 is configured to store the data object of the target data stream into the lifecycle cache mapping table if the target configuration stream is not loaded, and configure the target data stream by using the target configuration stream after it is determined that the target configuration stream is loaded;
and the data configuration module 104 is configured to configure the target data stream by using the target configuration stream if the target configuration stream is loaded.
The apparatus provided by the embodiments of the application asynchronously loads the data stream and the configuration stream; judges whether the target configuration stream associated with the loaded target data stream has finished loading; if not, stores the data object of the target data stream into a life cycle cache mapping table, and configures the target data stream by using the target configuration stream after the target configuration stream is determined to have finished loading; and if so, configures the target data stream by using the target configuration stream.
In the application, data streams and configuration streams are loaded asynchronously. Once a target data stream has been loaded, it is judged whether the target configuration stream associated with it has finished loading. If so, the target data stream can be configured directly based on the target configuration stream. If the target configuration stream has not finished loading, configuration of the target data stream is suspended and the data object of the target data stream is stored into the life cycle cache mapping table, that is, the target data stream is cached; after the target configuration stream is determined to have finished loading, the target data stream is configured by using the target configuration stream. In other words, caching solves the out-of-order problem in which, under Flink's dual-stream asynchronous loading, the data stream reaches the processing system before the configuration stream, and achieves the correct ordering in which the configuration stream is already loaded when processing of the data stream begins.
In a specific embodiment of the present application, the judging module 102 is specifically configured to judge whether the read-only broadcast state object obtained from the read-only context is non-empty; if so, judge whether the configuration set of the read-only broadcast state object has loaded data of the required configuration dimension; and if so, determine that the target configuration stream has finished loading.
In a specific embodiment of the present application, the cache configuration module 103 is specifically configured to determine whether a life cycle cache mapping table has reached a cache threshold; if not, the data object is placed into the life cycle cache mapping table, and the life cycle is set for the data object.
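The threshold check before insertion can be sketched as follows, assuming the mapping table is a plain dictionary. The function name `put_if_below_threshold` and its parameters are hypothetical. The text does not say what happens once the threshold is reached, so rejecting the new object here is an assumption made only for illustration.

```python
import time
from typing import Any, Dict, Optional, Tuple

# key -> (value, time the entry was cached, life cycle in seconds)
CacheTable = Dict[str, Tuple[Any, float, float]]


def put_if_below_threshold(cache: CacheTable, key: str, value: Any,
                           ttl_seconds: float, max_entries: int,
                           now: Optional[float] = None) -> bool:
    """Insert the object with its life cycle unless the cache threshold is reached."""
    if len(cache) >= max_entries:
        # Assumption: a full cache rejects the new object.
        return False
    cached_at = time.time() if now is None else now
    cache[key] = (value, cached_at, ttl_seconds)
    return True


table: CacheTable = {}
first = put_if_below_threshold(table, "a", 1, ttl_seconds=300, max_entries=2, now=0.0)
second = put_if_below_threshold(table, "b", 2, ttl_seconds=300, max_entries=2, now=1.0)
third = put_if_below_threshold(table, "c", 3, ttl_seconds=300, max_entries=2, now=2.0)
# first and second are True; third is False because the table already holds 2 entries.
```

Returning a boolean lets the caller decide how to handle a rejected object, for example by logging or dropping it.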
In one embodiment of the present application, the apparatus further includes:
the cleaning module is used for traversing the life cycle cache mapping table, finding out target data objects that have exceeded their life cycle, and cleaning up the target data objects.
In a specific embodiment of the present application, the cleaning module is specifically configured to traverse the life cycle cache mapping table to obtain the cache time of each data object, and to determine as a target data object each data object for which the difference between the current system time and its cache time is greater than its life cycle.
In one embodiment of the present application, the apparatus further includes:
the resource release module is used for releasing the resources occupied by the life cycle cache mapping table when the application exits.
In one embodiment of the present application, the apparatus further includes:
the configuration detection module is used for judging, by using the life cycle cache mapping table, whether any data stream to be processed exists; if not, determining that all data streams have been associated with their configuration.
Corresponding to the above method embodiment, the present application further provides an electronic device, and the electronic device described below and the data stream processing method described above may be referred to in correspondence.
Referring to fig. 4, the electronic device includes:
a memory 332 for storing a computer program;
a processor 322 for implementing the steps of the data stream processing method of the above-mentioned method embodiments when executing the computer program.
Specifically, referring to fig. 5, which is a schematic structural diagram of an electronic device provided in this embodiment: the electronic device may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPUs) 322 and a memory 332, where the memory 332 stores one or more computer applications 342 or data 344. The memory 332 may be transient or persistent storage. The program stored in the memory 332 may include one or more modules (not shown), each of which may include a sequence of instructions operating on a data processing device. Still further, the central processor 322 may be configured to communicate with the memory 332 and to execute, on the electronic device 301, the series of instruction operations stored in the memory 332.
The electronic device 301 may also include one or more power sources 326, one or more wired or wireless network interfaces 350, one or more input-output interfaces 358, and/or one or more operating systems 341.
The steps in the data stream processing method described above may be implemented by the structure of an electronic device.
Corresponding to the above method embodiment, the present application further provides a readable storage medium, and a readable storage medium described below and a data stream processing method described above may be referred to correspondingly.
A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the data stream processing method of the above-mentioned method embodiment.
The readable storage medium may be a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium capable of storing program code.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

Claims (10)

1. A method for processing a data stream, comprising:
asynchronously loading the data stream and the configuration stream;
judging whether the target configuration stream having an association relationship with the loaded target data stream is loaded completely;
if not, storing the data object of the target data stream into a life cycle cache mapping table, and configuring the target data stream by using the target configuration stream after the target configuration stream is determined to be loaded;
and if so, configuring the target data stream by using the target configuration stream.
2. The method according to claim 1, wherein the determining whether the target configuration stream associated with the loaded target data stream is loaded includes:
judging whether the read-only broadcast state object obtained from the read-only context is non-empty;
if so, judging whether the configuration set of the read-only broadcast state object has loaded data of the required configuration dimension;
and if so, determining that the target configuration stream is completely loaded.
3. The data stream processing method of claim 1, wherein storing the data object of the target data stream into a lifecycle cache mapping table comprises:
judging whether the life cycle cache mapping table reaches a cache threshold value;
if not, the data object is placed into the life cycle cache mapping table, and the life cycle is set for the data object.
4. The data stream processing method of claim 3, further comprising:
traversing the life cycle cache mapping table, and finding out a target data object exceeding the life cycle;
and cleaning the target data object.
5. The data stream processing method of claim 4, wherein traversing the lifecycle cache mapping table to find target data objects that exceed the lifecycle comprises:
traversing the life cycle cache mapping table to obtain the cache time of each data object;
and determining, as the target data object, each data object for which the difference between the current system time and the cache time is greater than the corresponding life cycle.
6. The data stream processing method of claim 1, further comprising:
and releasing the resources occupied by the life cycle cache mapping table in the application exit process.
7. The data stream processing method of claim 1, further comprising:
judging whether the data stream to be processed exists or not by utilizing the life cycle cache mapping table;
if not, determining that all data streams have been associated with their configuration.
8. A data stream processing apparatus, comprising:
the data loading module is used for asynchronously loading the data stream and the configuration stream;
the judging module is used for judging whether the target configuration stream having an association relationship with the loaded target data stream is loaded completely;
a cache configuration module, configured to store the data object of the target data stream into a lifecycle cache mapping table if the target configuration stream is not completely loaded, and configure the target data stream using the target configuration stream after it is determined that the target configuration stream is completely loaded;
and the data configuration module is used for configuring the target data stream by using the target configuration stream if the target configuration stream is loaded completely.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the data stream processing method according to any one of claims 1 to 7 when executing the computer program.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the data stream processing method according to any one of claims 1 to 7.
CN202111040434.0A 2021-09-06 2021-09-06 Data stream processing method, device, equipment and readable storage medium Active CN113703874B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111040434.0A CN113703874B (en) 2021-09-06 2021-09-06 Data stream processing method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111040434.0A CN113703874B (en) 2021-09-06 2021-09-06 Data stream processing method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN113703874A true CN113703874A (en) 2021-11-26
CN113703874B CN113703874B (en) 2023-09-05

Family

ID=78660686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111040434.0A Active CN113703874B (en) 2021-09-06 2021-09-06 Data stream processing method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113703874B (en)

Also Published As

Publication number Publication date
CN113703874B (en) 2023-09-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant