CN113760988A - Method, device, equipment and storage medium for associating and processing unbounded stream data - Google Patents


Info

Publication number
CN113760988A
Authority
CN
China
Prior art keywords
data
main
data stream
main data
stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110158388.8A
Other languages
Chinese (zh)
Inventor
安金龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202110158388.8A priority Critical patent/CN113760988A/en
Publication of CN113760988A publication Critical patent/CN113760988A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24568Data stream processing; Continuous queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a method, a device, equipment and a storage medium for associating and processing unbounded stream data. The method comprises the following steps: acquiring a main data stream and an associated data stream associated with the main data stream; determining a main data acquisition path of the main data stream and an associated data acquisition path of the associated data stream according to the connection type of the main data stream and the associated data stream; acquiring target main data based on the main data acquisition path, and acquiring target associated data based on the associated data acquisition path; and performing association processing on the target main data and the target associated data to obtain an association processing result. The method ensures that complete associated data can be acquired for each connection type, avoids data loss when a data stream is delayed or the network jitters, and improves the accuracy of data processing.

Description

Method, device, equipment and storage medium for associating and processing unbounded stream data
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a method, a device, equipment and a storage medium for associating and processing unbounded stream data.
Background
In stream processing applications, data streams are often related to one another. To perform association analysis on related data streams, a common approach is to use a stream processing window to collect the data of the related streams, and then to analyze the data across the streams according to the data received by that window.
In the process of implementing the invention, the inventor found at least the following technical problem in the prior art: when any data stream is delayed or the network jitters, complete associated data cannot be acquired, which causes data loss and a poor data processing result.
Disclosure of Invention
The embodiment of the invention provides a method, a device, equipment and a storage medium for associating and processing unbounded stream data, so as to ensure the integrity of the associated data and improve the data processing effect.
In a first aspect, an embodiment of the present invention provides a method for associating and processing unbounded stream data, including:
acquiring a main data stream and an associated data stream associated with the main data stream;
determining a main data acquisition path of the main data stream and an associated data acquisition path of the associated data stream according to the connection type of the main data stream and the associated data stream;
acquiring target main data based on the main data acquisition path, and acquiring target associated data based on the associated data acquisition path;
and performing association processing on the target main data and the target associated data to obtain an association processing result.
In a second aspect, an embodiment of the present invention further provides an apparatus for associating and processing unbounded stream data, including:
a data stream obtaining module, configured to obtain a main data stream and an associated data stream associated with the main data stream;
the acquisition path determining module is used for determining a main data acquisition path of the main data stream and an associated data acquisition path of the associated data stream according to the connection type of the main data stream and the associated data stream;
the data acquisition module is used for acquiring target main data based on the main data acquisition path and acquiring target associated data based on the associated data acquisition path;
and the data processing module is used for performing association processing on the target main data and the target associated data to obtain an association processing result.
In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for associating and processing unbounded stream data as provided by any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for associating and processing unbounded stream data according to any embodiment of the present invention.
The embodiment of the invention acquires the main data stream and the associated data stream associated with the main data stream; determines a main data acquisition path of the main data stream and an associated data acquisition path of the associated data stream according to the connection type of the main data stream and the associated data stream, thereby ensuring that complete associated data can be acquired for each connection type; acquires target main data based on the main data acquisition path, and acquires target associated data based on the associated data acquisition path; and performs association processing on the target main data and the target associated data to obtain an association processing result, so that data loss caused by data stream delay or network jitter is avoided and the accuracy of data processing is improved.
Drawings
Fig. 1 is a flowchart of an unbounded data association processing method according to a first embodiment of the present invention;
Fig. 2a is a schematic diagram of an unbounded data association processing method according to a second embodiment of the present invention;
Fig. 2b is a schematic flowchart of persistent storage of an associated stream according to a second embodiment of the present invention;
Fig. 2c is a schematic flowchart of a data stream association calculation according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an apparatus for associating and processing unbounded stream data according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of an unbounded data association processing method according to a first embodiment of the present invention. The present embodiment is applicable to the case where association processing is performed on unbounded stream data. The method may be performed by an unbounded data association processing apparatus, which may be implemented in software and/or hardware and may, for example, be configured in a computer device. As shown in Fig. 1, the method includes:
S110, acquiring a main data stream and an associated data stream associated with the main data stream.
In the present embodiment, the main data stream and the associated data stream are the data streams on which association analysis is performed. Both may be determined according to the data processing requirement. Taking an e-commerce platform as an example, if the data processing requirement is a feature analysis of successful order transactions, then, considering that the browsing, collection and delivery speed of the items in an order may each be related to a successful transaction, the order stream can be used as the main data stream, and the browsing stream, the collection stream, the logistics information stream and the like can be used as associated data streams. It will be appreciated that the association keys between the main data stream and different associated data streams may be the same or different. With the order stream as the main data stream and the browsing stream, the collection stream and the logistics information stream as associated data streams, the association key between the order stream and the browsing and collection streams can be the item identifier, and the association key between the order stream and the logistics information stream can be the logistics identifier.
In one embodiment, acquiring a main data stream and an associated data stream associated with the main data stream includes: in response to a detected data processing request, determining a main data stream and an association condition according to the data processing type of the data processing request; and determining, according to the association condition, an associated data stream associated with the main data stream and an association key between the main data stream and the associated data stream. Optionally, when a data processing request is detected, the main data stream and the associated data stream are determined according to that request. For example, a user may manually set the main data stream and the association condition and initiate a data processing request carrying this information. After receiving the data processing request, the unbounded data association processing device parses it to obtain the main data stream and the association condition. The main data stream and the association condition corresponding to each data processing type can also be preset; after a user initiates a data processing request, the unbounded data association processing device determines the main data stream and the association condition of the request from the preset correspondence according to the data processing type of the request. There may be one or more association conditions. For example, the association condition may be association by item identifier, or association by user identifier or item identifier.
After the association condition is obtained, at least one associated data stream can be determined according to the information carried in each data stream. Finally, the association key between the main data stream and each associated data stream is determined according to the association relationship between the main data stream and that associated data stream.
On the basis of the above scheme, the method further comprises the following steps: determining the connection type of the main data stream and the associated data stream; selecting one of the main data stream and the associated data stream as a storage data stream according to the connection type; and taking the association key as the storage primary key of the storage data stream, and storing the storage data stream based on the storage primary key. When association analysis is performed on the data in the main data stream and the associated data stream, processing is carried out on the combination formed by aggregating and connecting the data in the main data stream with the data in the associated data stream. Different connection types correspond to different connection modes. To ensure that complete associated data can be acquired in any connection mode, this embodiment decides, according to the connection type, whether the main data stream or the associated data stream is persistently stored. When the storage data stream is persistently stored, information such as the table name and the primary key may be written to a file in key-value format.
Generally, there are four ways of connecting the main data in the main data stream with the associated data in the associated data stream: inner connection, left outer connection, right outer connection, and full outer connection. When the connection type is inner connection, only the matched data in the main data stream and the associated data stream is aggregated into a combination; when the connection type is left outer connection, all data in the main data stream is aggregated into a combination; when the connection type is right outer connection, all data in the associated data stream is aggregated into a combination; when the connection type is full outer connection, all data in both the main data stream and the associated data stream is aggregated into a combination. To ensure that complete data can be acquired for each connection type, one of the main data stream and the associated data stream is selected as the storage data stream according to the connection type and is persistently stored. Specifically, when the connection type is inner connection or right outer connection, the associated data stream is used as the storage data stream, and the real-time data of the associated data stream is persistently stored. When the connection type is left outer connection or full outer connection, the main data stream is used as the storage data stream, and the real-time data of the main data stream is persistently stored.
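The selection rule described above can be illustrated with a minimal Python sketch. The names `JoinType` and `select_stored_stream` are hypothetical and do not appear in the original text; the sketch only encodes the mapping from connection type to storage data stream stated in the paragraph above.

```python
from enum import Enum


class JoinType(Enum):
    """The four connection types named in the description (names are illustrative)."""
    INNER = "inner"
    LEFT_OUTER = "left_outer"
    RIGHT_OUTER = "right_outer"
    FULL_OUTER = "full_outer"


def select_stored_stream(join_type: JoinType) -> str:
    """Return which stream ('main' or 'associated') is persistently stored.

    Inner and right outer connections persist the associated data stream;
    left outer and full outer connections persist the main data stream.
    """
    if join_type in (JoinType.INNER, JoinType.RIGHT_OUTER):
        return "associated"
    return "main"


if __name__ == "__main__":
    for jt in JoinType:
        print(jt.value, "->", select_stored_stream(jt))
```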
S120, determining a main data acquisition path of the main data stream and an associated data acquisition path of the associated data stream according to the connection type of the main data stream and the associated data stream.
In this embodiment, different data stream processing methods are set for different connection types. Therefore, when data acquisition is performed, different connection types and different data flows have different data acquisition paths.
As described above, when the connection type is left outer connection or full outer connection, the main data stream is used as the storage data stream, and the real-time data of the main data stream is persistently stored. Therefore, when the connection type is left outer connection or full outer connection, determining a main data acquisition path of the main data stream and an associated data acquisition path of the associated data stream according to the connection type of the main data stream and the associated data stream includes: determining the associated data acquisition path as the real-time data stream, and determining the main data acquisition path as the real-time data stream and the storage space. Specifically, the data of the associated data stream is obtained from the real-time data stream, and the data of the main data stream is obtained from the real-time data stream and the storage space.
Likewise, when the connection type is inner connection or right outer connection, the associated data stream is used as the storage data stream. Therefore, when the connection type is inner connection or right outer connection, determining a main data acquisition path of the main data stream and an associated data acquisition path of the associated data stream according to the connection type of the main data stream and the associated data stream includes: determining the main data acquisition path as the real-time data stream, and determining the associated data acquisition path as the real-time data stream and the storage space. Specifically, the data of the main data stream is obtained from the real-time data stream, and the data of the associated data stream is obtained from the real-time data stream and the storage space.
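As a rough illustration of step S120, the following Python sketch maps a connection type to the acquisition paths of the two streams. The function name `acquisition_paths`, the string constants and the connection type labels are assumptions made for the example, not identifiers from the original text.

```python
from typing import Dict, Tuple

# Hypothetical labels for the two possible data sources.
REAL_TIME = "real_time_stream"
STORAGE = "storage_space"


def acquisition_paths(connection_type: str) -> Dict[str, Tuple[str, ...]]:
    """Return the acquisition path of the main and the associated data stream.

    The stream that is persistently stored is read from both the real-time
    stream and the storage space; the other stream is read in real time only.
    """
    if connection_type in ("inner", "right_outer"):
        return {"main": (REAL_TIME,), "associated": (REAL_TIME, STORAGE)}
    if connection_type in ("left_outer", "full_outer"):
        return {"main": (REAL_TIME, STORAGE), "associated": (REAL_TIME,)}
    raise ValueError(f"unknown connection type: {connection_type}")


if __name__ == "__main__":
    print(acquisition_paths("inner"))
    print(acquisition_paths("full_outer"))
```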
S130, acquiring target main data based on the main data acquisition path, and acquiring target associated data based on the associated data acquisition path.
After the data acquisition path of each data stream is determined, the data of each data stream is acquired based on the determined data acquisition path.
When the acquisition path of the main data stream is the real-time data stream and the acquisition path of the associated data stream is the real-time data stream and the storage space, acquiring target associated data based on the associated data acquisition path includes: acquiring the association key between the main data stream and the associated data stream; acquiring real-time associated data from the real-time data stream by taking the association key as the primary key, and determining, according to the target main data and the real-time associated data, the unmatched main data in the target main data; taking the association key of the unmatched main data as the primary key to obtain, from the storage space, the stored associated data corresponding to the unmatched main data; and merging and de-duplicating the real-time associated data and the stored associated data to obtain the target associated data. Specifically, the target main data is obtained from the real-time data stream. Real-time associated data is obtained from the real-time data stream, and the main data in the target main data that is not matched by the real-time associated data is determined as unmatched main data. The association key of the unmatched main data is then used as the primary key to obtain the corresponding stored associated data from the storage space, and the real-time associated data and the stored associated data are merged and de-duplicated to obtain the target associated data. Optionally, the storage space may include a local cache and a persistent storage unit, where the local cache is small but fast, and the persistent storage unit is large but relatively slow.
Correspondingly, taking the association key of the unmatched main data as the primary key to obtain the stored associated data corresponding to the unmatched main data from the storage space may specifically be: checking whether associated data corresponding to the association key exists in the local cache and, if so, taking it as the stored associated data; if not, checking whether associated data corresponding to the association key exists in the persistent storage unit and, if so, taking it as the stored associated data and storing it together with the association key in the local cache.
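The lookup order described above (local cache first, then the persistent storage unit, with a write-back to the cache) and the final merge can be sketched in Python as follows. This is an illustrative sketch only: `TwoTierStore` and `target_associated_data` are hypothetical names, plain dictionaries stand in for the local cache and the persistent storage unit, and the rule that a real-time record wins over a stored record with the same key is an assumption about how de-duplication might be resolved.

```python
from typing import Dict, Iterable, Optional


class TwoTierStore:
    """A local cache in front of a persistent storage unit (both modeled as dicts)."""

    def __init__(self, persistent: Dict[str, dict]) -> None:
        self.persistent = persistent            # stand-in for the persistent storage unit
        self.local_cache: Dict[str, dict] = {}  # small, fast local cache
        self.persistent_reads = 0               # counts reads that reached persistent storage

    def get(self, key: str) -> Optional[dict]:
        # 1. Try the local cache first.
        if key in self.local_cache:
            return self.local_cache[key]
        # 2. Fall back to the persistent storage unit.
        self.persistent_reads += 1
        record = self.persistent.get(key)
        # 3. On a hit, write the record back into the local cache.
        if record is not None:
            self.local_cache[key] = record
        return record


def target_associated_data(
    main_keys: Iterable[str],
    realtime_associated: Dict[str, dict],
    store: TwoTierStore,
) -> Dict[str, dict]:
    """Merge real-time associated data with stored data for unmatched main keys."""
    unmatched = [k for k in main_keys if k not in realtime_associated]
    stored = {}
    for key in unmatched:
        record = store.get(key)
        if record is not None:
            stored[key] = record
    # Merge and de-duplicate by association key; real-time data takes precedence.
    merged = dict(stored)
    merged.update(realtime_associated)
    return merged


if __name__ == "__main__":
    store = TwoTierStore({"a": {"v": 1}, "b": {"v": 2}})
    print(target_associated_data(["a", "c", "d"], {"c": {"v": 3}}, store))
```

Keying the merged result by association key keeps at most one associated record per key, which is one simple way to realize the merge and de-duplication step.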
When the acquisition path of the associated data stream is the real-time data stream and the acquisition path of the main data stream is the real-time data stream and the storage space, acquiring target main data based on the main data acquisition path includes: acquiring the association key between the main data stream and the associated data stream; acquiring real-time main data from the real-time data stream by taking the association key as the primary key, and determining, according to the target associated data and the real-time main data, the unmatched associated data in the target associated data; taking the association key of the unmatched associated data as the primary key to obtain, from the storage space, the stored main data corresponding to the unmatched associated data; and merging and de-duplicating the real-time main data and the stored main data to obtain the target main data. Specifically, the target associated data is obtained from the real-time data stream. Real-time main data is obtained from the real-time data stream, and the associated data in the target associated data that is not matched by the real-time main data is determined as unmatched associated data. The association key of the unmatched associated data is then used as the primary key to obtain the corresponding stored main data from the storage space, and the real-time main data and the stored main data are merged and de-duplicated to obtain the target main data.
Optionally, taking the association key of the unmatched associated data as the primary key to obtain the stored main data corresponding to the unmatched associated data from the storage space may specifically be: checking whether data corresponding to the association key exists in the local cache and, if so, taking it as the stored main data; if not, checking whether data corresponding to the association key exists in the persistent storage unit and, if so, taking it as the stored main data and storing it together with the association key in the local cache.
S140, performing association processing on the target main data and the target associated data to obtain an association processing result.
After the target main data and the target associated data are obtained, they are loaded into an in-memory database for association calculation, and the calculation result is sent to a result stream. The specific association calculation method may be determined based on the data processing requirement and is not described here.
The embodiment of the invention acquires the main data stream and the associated data stream associated with the main data stream; determines a main data acquisition path of the main data stream and an associated data acquisition path of the associated data stream according to the connection type of the main data stream and the associated data stream, thereby ensuring that complete associated data can be acquired for each connection type; acquires target main data based on the main data acquisition path, and acquires target associated data based on the associated data acquisition path; and performs association processing on the target main data and the target associated data to obtain an association processing result, which effectively avoids data loss when a data stream is delayed or the network jitters and improves the accuracy of data processing.
Example two
The present embodiment provides a preferred embodiment based on the above-described scheme. In this embodiment, a method for associating and processing unbounded stream data is described by taking an associated data stream as a storage data stream as an example.
Fig. 2a is a schematic diagram of an unbounded data association processing method according to a second embodiment of the present invention. As shown in Fig. 2a, when the main data stream1 and the associated data stream2 are associated in real time, data preprocessing is performed first. A persistent storage unit and a passive cache layer are added to the processing flow of the associated data stream2. The persistent storage unit stores the data in persistent storage such as HBase. The passive cache layer may be a local in-memory cache (e.g., Guava Cache) that holds data insensitive to change, for example data that is only inserted and never updated, and dimension table data. When the two streams are associated for calculation, data that could not be associated between the two streams is retrieved from the passive cache and the persistent storage through the data loader. Therefore, even if the data of a stream is delayed, it can still be fetched from the persistent storage for calculation after it has been delivered, which ensures that all existing data in the two streams takes part in the association calculation and that no data is lost. Adding the passive cache also speeds up the association calculation and reduces reads of the persistent storage.
Fig. 2b is a schematic flowchart of persistent storage of an associated stream according to a second embodiment of the present invention. As shown in Fig. 2b, persistent storage of the associated stream includes determining the primary key, updating the configuration file, deduplicating data within the window, and updating the persistent storage. Determining the primary key: first, according to the association condition of the two-stream association calculation, the association key of each stream is determined and used as the primary key of the persistent storage. Updating the configuration file: to facilitate later task management, a configuration file is added that records information such as the table name and the primary key of the persistent storage corresponding to each real-time stream. After the primary key is determined, the real-time stream name, the name of the persistent storage table, the primary key, the fields used and similar information need to be updated in the configuration file. Deduplicating data within the window: to reduce interaction with the persistent storage, a stream processing window (for example, a window of 2 seconds) can be set and deduplication performed within it, so that for data with the same primary key only the single record in the latest state is retained, which reduces write operations to the persistent storage. Updating the persistent storage: the deduplicated data is written into the persistent storage system.
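The in-window deduplication step can be sketched in a few lines of Python. The function name `dedupe_window` and the record fields are hypothetical, and the sketch assumes that the records of a window are processed in arrival order, so that a later record for the same primary key represents the latest state.

```python
from typing import Dict, Hashable, Iterable, List


def dedupe_window(records: Iterable[dict], primary_key: str) -> List[dict]:
    """Keep only the latest record per primary key within one stream processing window.

    Records are assumed to arrive in order, so a later record overwrites an
    earlier one with the same primary key.
    """
    latest: Dict[Hashable, dict] = {}
    for record in records:
        latest[record[primary_key]] = record
    return list(latest.values())


if __name__ == "__main__":
    window = [
        {"item_id": 1, "status": "created"},
        {"item_id": 2, "status": "created"},
        {"item_id": 1, "status": "paid"},
    ]
    # Only the deduplicated records would be written to persistent storage.
    print(dedupe_window(window, "item_id"))
```

In this example, three incoming records result in two writes to the persistent storage instead of three.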
Fig. 2c is a schematic flow chart of data stream association calculation according to a second embodiment of the present invention. As shown in fig. 2c, the data flow association calculation process mainly includes:
1. Load the data of the main data stream1 and extract the association key. With the main data stream1 as the main stream, the data in the current window is organized into a data set such as Map1<key, data>. The key in Map1 is the association key, which is also the primary key in the persistent storage.
2. Load the required associated data of the associated data stream2 from storage.
(1) Query the local cache by primary key. If the cache is hit, return the hit stored associated data DataLoadStream2<key, data> and go to step 3.
If the cache is missed, mark the missed data as unassociated data UnJoinStream1 and go to step (2).
(2) Traverse the association key values in UnJoinStream1 and query the persistent storage to obtain the hit persistent associated data DataLoadStream2<key, data>.
(3) Write the persistent associated data DataLoadStream2<key, data> into the local cache so that it can subsequently be hit in the local cache, which speeds up processing.
3. Data merging
Map1<key, data> of the main data stream1 and the stored associated data DataLoadStream2<key, data> loaded from the cache or the persistent storage are merged to obtain the data set to be calculated, ToBeCalculatedMap1<key, data>.
4. Load the merged data set into the in-memory database.
5. Perform the association calculation in the in-memory database.
6. Send the calculation result to the result stream.
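The six steps above can be traced end to end with the following Python sketch. It is an illustration under several assumptions: dictionaries stand in for the local cache and the persistent storage, Python's built-in `sqlite3` module with a `:memory:` database stands in for the in-memory database (the original text does not name one), the association calculation is shown as a simple inner join, and the function name `associate_window` and the table names are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple


def associate_window(
    main_window: Dict[str, str],
    local_cache: Dict[str, str],
    persistent: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Associate one window of main data (step 1: key -> data) with stored associated data."""
    # Step 2(1): query the local cache by primary key.
    loaded: Dict[str, str] = {}
    un_joined: List[str] = []
    for key in main_window:
        if key in local_cache:
            loaded[key] = local_cache[key]
        else:
            un_joined.append(key)
    # Step 2(2) and 2(3): query persistent storage for the misses, then fill the cache.
    for key in un_joined:
        if key in persistent:
            loaded[key] = persistent[key]
            local_cache[key] = persistent[key]
    # Steps 3 and 4: merge both data sets by loading them into an in-memory database.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE main_data (k TEXT PRIMARY KEY, data TEXT)")
    db.execute("CREATE TABLE assoc_data (k TEXT PRIMARY KEY, data TEXT)")
    db.executemany("INSERT INTO main_data VALUES (?, ?)", list(main_window.items()))
    db.executemany("INSERT INTO assoc_data VALUES (?, ?)", list(loaded.items()))
    # Step 5: association calculation, shown here as an inner join on the key.
    rows = db.execute(
        "SELECT m.k, m.data, a.data FROM main_data m "
        "JOIN assoc_data a ON m.k = a.k ORDER BY m.k"
    ).fetchall()
    db.close()
    # Step 6: the caller would send these rows to the result stream.
    return rows


if __name__ == "__main__":
    cache = {"o1": "view-1"}
    store = {"o2": "view-2", "o9": "view-9"}
    window = {"o1": "order-1", "o2": "order-2", "o3": "order-3"}
    for row in associate_window(window, cache, store):
        print(row)
```

After the call, the key `o2` is also present in the local cache, so a later window containing the same key would not need to read the persistent storage again.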
In the embodiment of the invention, in the real-time computation of massive data, when two or more streams are associated through a stream processing window and one of the streams is delayed or jitters, persistent storage and automatic loading of unassociated data are introduced, and a passive cache (local cache) is added on top of the persistent storage. This improves computation performance, reduces the access pressure on the persistent storage, and solves the problem that data which cannot be associated is not emitted, so that the result stream is missing data. The robustness, stability and fault tolerance of multi-stream real-time association calculation over massive data are effectively enhanced, and the operation and maintenance of massive real-time computation becomes more automated and intelligent.
Example three
Fig. 3 is a schematic structural diagram of an apparatus for associating and processing unbounded stream data according to a third embodiment of the present invention. The unbounded data association processing apparatus may be implemented in software and/or hardware, for example, the unbounded data association processing apparatus may be configured in a computer device. As shown in fig. 3, the apparatus includes a data stream acquiring module 310, an acquiring path determining module 320, a data acquiring module 330, and a data processing module 340, wherein:
a data stream obtaining module 310, configured to obtain a main data stream and an associated data stream associated with the main data stream;
an obtaining path determining module 320, configured to determine a main data obtaining path of the main data stream and an associated data obtaining path of the associated data stream according to connection types of the main data stream and the associated data stream;
a data obtaining module 330, configured to obtain target main data based on the main data obtaining path, and obtain target associated data based on the associated data obtaining path;
a data processing module 340, configured to perform association processing on the target main data and the target associated data to obtain an association processing result.
In this embodiment of the invention, a main data stream and an associated data stream associated with the main data stream are obtained; a main data acquisition path of the main data stream and an associated data acquisition path of the associated data stream are determined according to the connection type of the main data stream and the associated data stream, which ensures that complete data can be acquired appropriately for each connection type; target main data is acquired based on the main data acquisition path, and target associated data is acquired based on the associated data acquisition path; and association processing is performed on the target main data and the target associated data to obtain an association processing result. Data loss caused by data stream delay or network jitter is thereby avoided, and the accuracy of data processing is improved.
Optionally, on the basis of the foregoing scheme, the connection type includes an inner connection and a right outer connection, and the acquisition path determining module 320 is specifically configured to:
and determining the main data acquisition path as a real-time data stream, and determining the associated data acquisition path as the real-time data stream and a storage space.
Optionally, on the basis of the foregoing scheme, the data obtaining module 330 is specifically configured to:
acquiring an association key between a main data stream and an associated data stream;
acquiring real-time associated data from the real-time data stream by taking the associated key as a main key, and determining unmatched main data which is not matched in the target main data according to the target main data and the real-time associated data;
taking the association key of the unmatched main data as a main key to obtain stored associated data corresponding to the unmatched main data from the storage space;
and merging and de-duplicating the real-time associated data and the stored associated data to obtain target associated data.
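The four steps above can be sketched as follows for the inner and right outer connection case. This is a hedged illustration only: the function name, the representation of records as dictionaries, the use of a dictionary as the storage space, and de-duplication by association key (one associated record per key) are all assumptions that the source does not specify.

```python
def acquire_target_associated_data(target_main, realtime_associated, storage, key="id"):
    """Collect associated records from the real-time stream, then fill gaps from storage.

    target_main / realtime_associated: lists of dict records (assumed shape).
    storage: dict mapping association-key value -> stored associated record (assumed).
    """
    # Real-time associated data whose association key matches some main record.
    main_keys = {row[key] for row in target_main}
    realtime = [row for row in realtime_associated if row[key] in main_keys]

    # Main data whose key found no real-time match (order preserved, keys unique).
    matched = {row[key] for row in realtime}
    unmatched_keys = [
        k for k in dict.fromkeys(row[key] for row in target_main) if k not in matched
    ]

    # Look the unmatched keys up in the storage space.
    stored = [storage[k] for k in unmatched_keys if k in storage]

    # Merge and de-duplicate by association key (assumption: first record wins).
    merged = {}
    for row in realtime + stored:
        merged.setdefault(row[key], row)
    return list(merged.values())
```

In this sketch a delayed associated record that arrived earlier and was persisted is still found through the storage lookup, even though it is absent from the current real-time data.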
Optionally, on the basis of the foregoing scheme, the connection type includes a left external connection and a full external connection, and the acquisition path determining module 320 is specifically configured to:
and determining the main data acquisition path as a real-time data stream and a storage space, and determining the associated data acquisition path as a real-time data stream.
Optionally, on the basis of the foregoing scheme, the data obtaining module 330 is specifically configured to:
acquiring an association key between a main data stream and an associated data stream;
acquiring real-time main data from the real-time data stream by taking the association key as a main key, and determining unmatched association data which are not matched in the target association data according to the target association data and the real-time main data;
taking the association key of the unmatched associated data as a main key to obtain stored main data corresponding to the unmatched associated data from the storage space;
and merging and de-duplicating the real-time main data and the stored main data to obtain target main data.
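The two cases handled by the acquisition path determining module (inner and right outer connections on the one hand, left outer and full outer connections on the other) can be summarized as a simple lookup. The sketch below is illustrative; the enum `JoinType`, the constants `REALTIME` and `STORAGE`, and the returned dictionary layout are hypothetical names, not part of the described apparatus.

```python
from enum import Enum


class JoinType(Enum):
    INNER = "inner"
    RIGHT_OUTER = "right_outer"
    LEFT_OUTER = "left_outer"
    FULL_OUTER = "full_outer"


REALTIME = "realtime_stream"   # hypothetical label for the real-time data stream
STORAGE = "storage_space"      # hypothetical label for the storage space


def determine_acquisition_paths(join_type):
    """Return the acquisition paths for the main and associated data."""
    if join_type in (JoinType.INNER, JoinType.RIGHT_OUTER):
        # Main data: real-time stream only; associated data: stream plus storage.
        return {"main": (REALTIME,), "associated": (REALTIME, STORAGE)}
    if join_type in (JoinType.LEFT_OUTER, JoinType.FULL_OUTER):
        # Main data: stream plus storage; associated data: real-time stream only.
        return {"main": (REALTIME, STORAGE), "associated": (REALTIME,)}
    raise ValueError(f"unsupported connection type: {join_type}")
```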
Optionally, on the basis of the foregoing scheme, the data stream obtaining module 310 is specifically configured to:
in response to a detected data processing request, determining a main data stream and an association condition according to a data processing type of the data processing request;
and determining an associated data stream associated with the main data stream according to the association condition, and determining an association key between the main data stream and the associated data stream based on the association condition.
Optionally, on the basis of the foregoing scheme, the apparatus further includes a stored data stream determining module, configured to:
determining the connection type of the main data flow and the associated data flow;
selecting a data stream from the main data stream and the associated data stream as a storage data stream according to the connection type;
and taking the association key as a storage main key for storing the data stream, and storing the storage data stream based on the storage main key.
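A minimal sketch of the stored data stream determination follows. The source states only that one stream is selected according to the connection type; the concrete rule used here (persist the stream that is later read from the storage space: the associated stream for inner and right outer connections, the main stream for left outer and full outer connections) is an assumption inferred from the acquisition paths described above. The function names and the dictionary used as storage are likewise hypothetical.

```python
def select_storage_stream(connection_type):
    """Pick which stream to persist (assumed rule, inferred from the acquisition paths)."""
    if connection_type in ("inner", "right_outer"):
        return "associated"
    if connection_type in ("left_outer", "full_outer"):
        return "main"
    raise ValueError(f"unsupported connection type: {connection_type}")


def store_stream(records, association_key, storage):
    """Write each record into storage, using its association-key value as the storage main key."""
    for record in records:
        storage[record[association_key]] = record
    return storage
```

Keying the stored records by the association key means that a later lookup for unmatched data is a direct key access rather than a scan.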
The device for associating and processing unbounded stream data provided by the embodiment of the invention can execute the method for associating and processing unbounded stream data provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example four
Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. FIG. 4 illustrates a block diagram of an exemplary computer device 412 suitable for use in implementing embodiments of the present invention. The computer device 412 shown in FIG. 4 is only one example and should not impose any limitations on the functionality or scope of use of embodiments of the present invention.
As shown in FIG. 4, computer device 412 is in the form of a general purpose computing device. Components of computer device 412 may include, but are not limited to: one or more processors 414, a system memory 428, and a bus 418 that couples the various system components (including the system memory 428 and the processors 414).
Bus 418 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Computer device 412 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 412 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 428 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)430 and/or cache memory 432. The computer device 412 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage 434 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 418 by one or more data media interfaces. Memory 428 can include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 440 having a set (at least one) of program modules 442 may be stored, for instance, in memory 428, such program modules 442 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. The program modules 442 generally perform the functions and/or methodologies of the described embodiments of the invention.
The computer device 412 may also communicate with one or more external devices 414 (e.g., keyboard, pointing device, display 424, etc.), with one or more devices that enable a user to interact with the computer device 412, and/or with any devices (e.g., network card, modem, etc.) that enable the computer device 412 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 422. Also, computer device 412 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) through network adapter 420. As shown, network adapter 420 communicates with the other modules of computer device 412 over bus 418. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the computer device 412, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processor 414 executes programs stored in the system memory 428 to execute various functional applications and data processing, for example, implement the method for associating and processing unbounded stream data provided by the embodiment of the present invention, the method includes:
acquiring a main data stream and an associated data stream associated with the main data stream;
determining a main data acquisition path of the main data flow and an associated data acquisition path of the associated data flow according to the connection type of the main data flow and the associated data flow;
acquiring target main data based on the main data acquisition path, and acquiring target associated data based on the associated data acquisition path;
and performing association processing on the target main data and the target associated data to obtain an association processing result.
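The final association step can be illustrated with a short sketch of an inner-style join on the association key. It assumes that records are dictionaries and that the association processing result is the field-wise merge of each matching pair; the function name `associate` and the `key` parameter are hypothetical, and other connection types would need additional handling of unmatched records.

```python
def associate(target_main, target_associated, key="id"):
    """Join target main data with target associated data on the association key."""
    # Index the associated records by key so each main record is matched in O(1).
    by_key = {}
    for row in target_associated:
        by_key.setdefault(row[key], []).append(row)

    results = []
    for main_row in target_main:
        for assoc_row in by_key.get(main_row[key], []):
            # Merge the two records; associated fields override on name clashes.
            results.append({**main_row, **assoc_row})
    return results
```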
Of course, those skilled in the art can understand that the processor may also implement the technical solution of the method for associating and processing unbounded data provided in any embodiment of the present invention.
Example five
The fifth embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for associating and processing unbounded data provided in the fifth embodiment of the present invention, where the method includes:
acquiring a main data stream and an associated data stream associated with the main data stream;
determining a main data acquisition path of the main data flow and an associated data acquisition path of the associated data flow according to the connection type of the main data flow and the associated data flow;
acquiring target main data based on the main data acquisition path, and acquiring target associated data based on the associated data acquisition path;
and performing association processing on the target main data and the target associated data to obtain an association processing result.
Of course, the computer program stored on the computer-readable storage medium provided by the embodiment of the present invention is not limited to the above method operations, and may also perform related operations of the method for associating and processing unbounded stream data provided by any embodiment of the present invention.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments illustrated herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for processing unbounded data association is characterized by comprising the following steps:
acquiring a main data stream and an associated data stream associated with the main data stream;
determining a main data acquisition path of the main data flow and an associated data acquisition path of the associated data flow according to the connection type of the main data flow and the associated data flow;
acquiring target main data based on the main data acquisition path, and acquiring target associated data based on the associated data acquisition path;
and performing association processing on the target main data and the target associated data to obtain an association processing result.
2. The method of claim 1, wherein the connection types comprise an inner connection and a right outer connection, and wherein determining the main data acquisition path of the main data flow and the associated data acquisition path of the associated data flow according to the connection types of the main data flow and the associated data flow comprises:
and determining that the main data acquisition path is a real-time data stream, and the associated data acquisition path is a real-time data stream and a storage space.
3. The method of claim 2, wherein the obtaining target associated data based on the associated data obtaining path comprises:
acquiring an association key between the main data stream and the associated data stream;
acquiring real-time associated data from a real-time data stream by taking the associated key as a main key, and determining unmatched main data which is not matched in the target main data according to the target main data and the real-time associated data;
taking the association key of the unmatched main data as a main key to obtain stored associated data corresponding to the unmatched main data from the storage space;
and merging and de-duplicating the real-time associated data and the stored associated data to obtain the target associated data.
4. The method of claim 1, wherein the connection types comprise a left outer connection and a full outer connection, and wherein determining the main data acquisition path of the main data flow and the associated data acquisition path of the associated data flow according to the connection types of the main data flow and the associated data flow comprises:
and determining the main data acquisition path as a real-time data stream and a storage space, and determining the associated data acquisition path as a real-time data stream.
5. The method of claim 4, wherein the obtaining target master data based on the master data obtaining path comprises:
acquiring an association key between the main data stream and the associated data stream;
acquiring real-time main data from a real-time data stream by taking the association key as a main key, and determining unmatched association data which are not matched in the target association data according to the target association data and the real-time main data;
taking the association key of the unmatched associated data as a main key to acquire stored main data corresponding to the unmatched associated data from the storage space;
and merging and de-duplicating the real-time main data and the stored main data to obtain the target main data.
6. The method of claim 1, wherein obtaining a main data stream and an associated data stream associated with the main data stream comprises:
in response to a detected data processing request, determining a main data stream and an association condition according to a data processing type of the data processing request;
and determining an associated data stream associated with the main data stream and an associated key between the main data stream and the associated data stream according to the association condition.
7. The method of claim 5, further comprising:
determining a connection type of the main data stream and the associated data stream;
selecting a data stream from the main data stream and the associated data stream as a storage data stream according to the connection type;
and taking the association key as a storage primary key of the storage data stream, and storing the storage data stream based on the storage primary key.
8. An apparatus for associating and processing unbounded data, comprising:
a data stream obtaining module, configured to obtain a main data stream and an associated data stream associated with the main data stream;
an acquisition path determining module, configured to determine a main data acquisition path of the main data stream and an associated data acquisition path of the associated data stream according to a connection type of the main data stream and the associated data stream;
the data acquisition module is used for acquiring target main data based on the main data acquisition path and acquiring target associated data based on the associated data acquisition path;
and the data processing module is used for performing association processing on the target main data and the target associated data to obtain an association processing result.
9. A computer device, the device comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of unbounded data association processing as recited in any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the unbounded data-related processing method according to any one of claims 1 to 7.
CN202110158388.8A 2021-02-04 2021-02-04 Method, device, equipment and storage medium for associating and processing unbounded stream data Pending CN113760988A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110158388.8A CN113760988A (en) 2021-02-04 2021-02-04 Method, device, equipment and storage medium for associating and processing unbounded stream data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110158388.8A CN113760988A (en) 2021-02-04 2021-02-04 Method, device, equipment and storage medium for associating and processing unbounded stream data

Publications (1)

Publication Number Publication Date
CN113760988A true CN113760988A (en) 2021-12-07

Family

ID=78786544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110158388.8A Pending CN113760988A (en) 2021-02-04 2021-02-04 Method, device, equipment and storage medium for associating and processing unbounded stream data

Country Status (1)

Country Link
CN (1) CN113760988A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116126872A (en) * 2023-04-18 2023-05-16 紫金诚征信有限公司 Correlation method, device and computer readable medium for real-time dimension table
CN116126872B (en) * 2023-04-18 2023-06-23 紫金诚征信有限公司 Correlation method, device and computer readable medium for real-time dimension table

Similar Documents

Publication Publication Date Title
CN109992454B (en) Method, device and storage medium for fault location
CN109471851B (en) Data processing method, device, server and storage medium
CN111352902A (en) Log processing method and device, terminal equipment and storage medium
US8977587B2 (en) Sampling transactions from multi-level log file records
CN110134869B (en) Information pushing method, device, equipment and storage medium
CN109561212B (en) Merging method, device, equipment and storage medium for published information
CN111061740A (en) Data synchronization method, equipment and storage medium
CN110990346A (en) File data processing method, device, equipment and storage medium based on block chain
CN109033456B (en) Condition query method and device, electronic equipment and storage medium
CN113238815B (en) Interface access control method, device, equipment and storage medium
CN113760988A (en) Method, device, equipment and storage medium for associating and processing unbounded stream data
CN111367813B (en) Automatic testing method and device for decision engine, server and storage medium
CN112948396A (en) Data storage method and device, electronic equipment and storage medium
CN112100092B (en) Information caching method, device, equipment and medium
CN112487025A (en) Data query method and device, electronic equipment and storage medium
CN112039975A (en) Method, device, equipment and storage medium for processing message field
CN111753141B (en) Data management method and related equipment
CN111930385A (en) Data acquisition method, device, equipment and storage medium
CN111913861A (en) Performance test method, device, equipment and medium of Internet of things system
CN107894942B (en) Method and device for monitoring data table access amount
CN113760903A (en) Method, device, equipment and storage medium for associating and processing unbounded stream data
CN112163127B (en) Relationship graph construction method and device, electronic equipment and storage medium
CN112948410A (en) Data processing method, device, equipment and medium
CN109635228B (en) Method, device, equipment and storage medium for determining difference degree between ordered arrays
CN112364268A (en) Resource acquisition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination