CN113515547B

CN113515547B - Out-of-order processing method, device, medium and equipment for multi-association real-time data stream

Info

Publication number: CN113515547B
Application number: CN202110839677.4A
Authority: CN
Inventors: 马明辉; 王彬; 宋建锋; 张翼
Original assignee: Beijing Euronet Alliance Technology Co ltd
Current assignee: Beijing Euronet Alliance Technology Co ltd
Priority date: 2021-07-23
Filing date: 2021-07-23
Publication date: 2023-11-24
Anticipated expiration: 2041-07-23
Also published as: CN113515547A

Abstract

The invention provides a method, an apparatus, a medium, and a device for out-of-order processing of multi-association real-time data streams. The method comprises: monitoring log changes of a main table and/or an association table in a database in real time; when a log changes, writing the main table change record and the association table change record in parallel, in a multi-partition manner, into the corresponding topics of a message queue; acquiring the multi-partition data streams of the main table and the association table; loading the business primary key of the main table into a cache region; acquiring a business foreign key and determining whether the corresponding business primary key can be obtained from the cache region; and, if the business primary key can be obtained, merging the multi-partition data streams of the main table and of the association table each into a single-partition data stream, grouping each single-partition data stream by a designated field, sorting the data stream within each group, consuming the data stream of each group, and executing business logic processing on the consumed data stream. The method addresses the disorder that arises in high-concurrency, multi-association, real-time data stream scenarios.

Description

Out-of-order processing method, device, medium and equipment for multi-association real-time data stream

Technical Field

The present invention relates to the field of data processing, and in particular to a method, an apparatus, a medium, and a device for out-of-order processing of multi-association real-time data streams.

Background

In the big data era, data volumes keep growing, data types keep multiplying, and real-time big data applications are increasingly common. Hadoop, an offline framework with high throughput but slow response, cannot meet these requirements, so real-time computing frameworks are used to perform micro-batch or stream processing, reducing data latency so that data can be presented to users efficiently.

When a real-time computing framework processes data, external data is stored in a message queue, and the real-time framework consumes the data under the corresponding topic in real time and performs business logic processing, thereby reducing data latency.

In the course of implementing the present invention, the inventors found that the prior art has at least the following disadvantage:

When a real-time computing framework processes a data stream, it faces data delay, loss, and duplication, so delayed data and consumed offsets must be stored in a cache to keep the data accurate. However, when multiple partitions are used to increase task parallelism, ordering is difficult to guarantee, which leads to incorrect results in business scenarios that require ordered processing.

Disclosure of Invention

In view of the above, the present invention provides a method, an apparatus, a medium, and a device for out-of-order processing of multi-association real-time data streams, in order to solve the disorder that arises among different data streams in high-concurrency, multi-association, real-time scenarios.

To achieve the above object, in a first aspect, there is provided a method for out-of-order processing of multi-association real-time data streams, including:

monitoring log changes of a main table and an association table in a database in real time;

when the log of the main table and/or the association table in the database changes, obtaining a main table change record and/or an association table change record, writing the main table change record in parallel, in a multi-partition manner, into a main table topic of a message queue, and writing the association table change record in parallel, in a multi-partition manner, into an association table topic of the message queue;

starting a main table real-time process to acquire the multi-partition data stream of the main table from the main table topic, and starting an association table real-time process to acquire the multi-partition data stream of the association table from the association table topic;

loading the business primary key of the main table into a cache region of the database;

after acquiring the data stream of the association table, acquiring a business foreign key and determining whether the corresponding business primary key can be obtained from the cache region;

if the business primary key can be obtained, merging the multi-partition data stream of the main table and the multi-partition data stream of the association table each into a single-partition data stream, grouping each single-partition data stream by a designated field to obtain a plurality of groups, sorting the data stream within each group, consuming the data stream of each group in the order determined by the sorting, and executing business logic processing on the consumed data stream.

In some possible embodiments, executing business logic processing on the consumed data stream includes: translating the consumed data stream of the main table and the consumed data stream of the association table and then synchronizing them to a target library.

In some possible embodiments, the method further comprises: if the corresponding business primary key cannot be obtained from the cache region, writing the data stream of the association table into the cache region to wait for reloading.

In some possible embodiments, the database is a relational database, and the table name of each of the main table and the association table is a topic of the message queue. Grouping each single-partition data stream by a designated field to obtain a plurality of groups specifically comprises:

grouping the single-partition data stream of the main table by the business primary key and its corresponding transaction identification field, and grouping the single-partition data stream of the association table by the business foreign key and its corresponding transaction identification field, to obtain a plurality of groups.

Sorting the data stream within each group, and consuming the data stream of each group in the order determined by the sorting, specifically comprises:

sorting the data stream within each group in ascending order of event time;

and consuming the data stream of each group sequentially in the order determined by the ascending event-time sort.

In a second aspect, an out-of-order processing apparatus for multiple associated real-time data streams is provided, comprising:

a real-time monitoring acquisition module, configured to monitor log changes of the main table and the association table in the database in real time;

a real-time synchronization module, configured to obtain a main table change record and/or an association table change record when the log of the main table and/or the association table in the database changes, write the main table change record in parallel, in a multi-partition manner, into a main table topic of the message queue, and write the association table change record in parallel, in a multi-partition manner, into an association table topic of the message queue;

a process starting module, configured to start a main table real-time process to acquire the multi-partition data stream of the main table from the main table topic, and to start an association table real-time process to acquire the multi-partition data stream of the association table from the association table topic;

a loading module, configured to load the business primary key of the main table into the cache region of the database;

a judging module, configured to acquire the business foreign key after the data stream of the association table is acquired, and to determine whether the corresponding business primary key can be obtained from the cache region;

and a processing module, configured to, if the business primary key can be obtained, merge the multi-partition data stream of the main table and the multi-partition data stream of the association table each into a single-partition data stream, group each single-partition data stream by a designated field to obtain a plurality of groups, sort the data stream within each group, consume the data stream of each group in the order determined by the sorting, and execute business logic processing on the consumed data stream.

In some possible embodiments, the processing module is specifically configured to translate the consumed data stream of the main table and the consumed data stream of the association table and then synchronize them to the target library.

In some possible embodiments, the processing module is further configured to write the data stream of the association table into the cache region to wait for reloading if the corresponding business primary key cannot be obtained from the cache region.

In some possible embodiments, the database is a relational database, and the table name of each of the main table and the association table is a topic of the message queue. The processing module is specifically configured to group the single-partition data stream of the main table by the business primary key and its corresponding transaction identification field, and to group the single-partition data stream of the association table by the business foreign key and its corresponding transaction identification field, to obtain a plurality of groups; to sort the data stream within each group in ascending order of event time; and to consume the data stream of each group sequentially in the order determined by the ascending event-time sort.

In a third aspect, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements any of the methods of out-of-order processing of multiple associated real-time data streams as described above.

In a fourth aspect, there is provided a computer device comprising:

one or more processors;

a storage means for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the methods of out-of-order processing of multiple associated real-time data streams described above.

The above technical solution has the following advantages:

The embodiments of the invention not only handle delay and duplication in the data streams of multiple association tables in high-concurrency real-time scenarios, but also effectively solve the disorder of those data streams. Experimental results show that, with the method of the embodiments, the application side obtains accurate data even when the input side performs frequent repeated operations, and the disorder problem is basically resolved across various business scenarios. The technical solution of the embodiments is simple to implement and meets application requirements.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a method for out-of-order processing of multiple associated real-time data streams in accordance with an embodiment of the present invention;

FIG. 2 is another specific flowchart of the out-of-order processing method for multi-association real-time data streams in accordance with an embodiment of the present invention;

FIG. 3 is a functional block diagram of an out-of-order processing apparatus for multiple associated real-time data streams according to an embodiment of the present invention;

FIG. 4 is a functional block diagram of a computer device according to an embodiment of the present invention.

Detailed Description

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

It should be noted that, without conflict, the following embodiments and features in the embodiments may be combined with each other; and, based on the embodiments in this disclosure, all other embodiments that may be made by one of ordinary skill in the art without inventive effort are within the scope of the present disclosure.

It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present disclosure, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.

Example 1

Fig. 1 is a flowchart of a method for out-of-order processing of multiple associated real-time data streams according to an embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:

S110, monitoring log changes of a main table and an association table in a database in real time;

S120, when the log of the main table and/or the association table in the database changes, obtaining a main table change record and/or an association table change record, writing the main table change record in parallel, in a multi-partition manner, into a main table topic of the message queue, and writing the association table change record in parallel, in a multi-partition manner, into an association table topic of the message queue;

S130, starting a main table real-time process to acquire the multi-partition data stream of the main table from the main table topic, and starting an association table real-time process to acquire the multi-partition data stream of the association table from the association table topic;

S140, loading the business primary key of the main table into a cache region of the database;

Specifically, the cache region may be that of an in-memory database; for example, the in-memory database Redis is used as the cache. The cache region stores the association between primary keys and business keys, and delayed data may also be written into it; when the next batch of data is acquired, the data in the cache region is loaded and processed together by the business logic.

S150, after acquiring the data stream of the association table, acquiring a business foreign key and determining whether the corresponding business primary key can be obtained from the cache region;

S160, if the business primary key can be obtained, merging the multi-partition data stream of the main table and the multi-partition data stream of the association table each into a single-partition data stream, grouping each single-partition data stream by a designated field to obtain a plurality of groups, sorting the data stream within each group, consuming the data stream of each group in the order determined by the sorting, and executing business logic processing on the consumed data stream.

Specifically, if the business primary key can be obtained in this step, the association table data stream held in the cache region is loaded, and the multi-partition data streams of the main table and of the association table are each merged into a single-partition data stream. The main table stream is grouped by business primary key and the corresponding transaction identification field, and the association table stream by business foreign key and the corresponding transaction identification field; each group is then sorted in ascending order of event time, so that the two data streams are each consumed in the order in which the events occurred before business logic processing is performed on them. Ordering the groups by transaction id ensures the order among the groups.
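By way of non-limiting illustration, the merging, grouping, and sorting of step S160 can be sketched in Python as follows. The function name `merge_group_sort` and the record layout (a dict with `xid` as the transaction identifier, `ts` as the event time, and the key field inside `data`) are assumptions made for this sketch, modeled on the example records given later in this description; they are not prescribed by the embodiment.

```python
from collections import defaultdict
from itertools import chain


def merge_group_sort(partitions, key_field):
    """Merge partitions into one stream, group by (key, xid), sort each group by ts.

    partitions: list of lists of change records (one inner list per partition).
    key_field:  name of the business primary or foreign key inside record["data"].
    """
    # Merge the multi-partition stream into a single-partition stream.
    merged = chain.from_iterable(partitions)

    # Group by business key and transaction id.
    groups = defaultdict(list)
    for record in merged:
        groups[(record["data"][key_field], record["xid"])].append(record)

    # Order the groups by transaction id, then each group by event time.
    ordered = []
    for group_key in sorted(groups, key=lambda k: k[1]):
        ordered.extend(sorted(groups[group_key], key=lambda r: r["ts"]))
    return ordered


partitions = [
    [{"xid": 2, "ts": 5, "data": {"id": 9}}, {"xid": 1, "ts": 3, "data": {"id": 9}}],
    [{"xid": 1, "ts": 1, "data": {"id": 9}}, {"xid": 2, "ts": 4, "data": {"id": 9}}],
]
consumption_order = [(r["xid"], r["ts"]) for r in merge_group_sort(partitions, "id")]
```

Records that arrived interleaved across two partitions are returned grouped by transaction and, within each transaction, in ascending event time.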

In some embodiments, executing business logic processing on the consumed data stream includes: translating the consumed data stream of the main table and the consumed data stream of the association table and then synchronizing them to a target library. The target library may run on any server, and the output may be written to different destinations depending on the business scenario.

In some embodiments, the method further comprises step S170: if the corresponding business primary key cannot be obtained from the cache region, the data stream of the association table is written into the cache region to wait for reloading.
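A minimal sketch of the cache behaviour in steps S140 to S170, assuming an in-process Python object stands in for the in-memory database. The class `KeyCache` and its method names are hypothetical and chosen only for this illustration; a deployment using Redis would replace the set and dict with calls to that store.

```python
class KeyCache:
    """In-memory stand-in for the cache region (hypothetical helper)."""

    def __init__(self):
        self._primary_keys = set()  # business primary keys of the main table
        self._pending = {}          # foreign key -> delayed association records

    def load_primary_key(self, key):
        # S140: load a business primary key of the main table into the cache.
        self._primary_keys.add(key)

    def has_primary_key(self, foreign_key):
        # S150: can the primary key matching this foreign key be obtained?
        return foreign_key in self._primary_keys

    def park(self, foreign_key, record):
        # S170: the main table is lagging, keep the record for later reloading.
        self._pending.setdefault(foreign_key, []).append(record)

    def take_pending(self, foreign_key):
        # Reload (and remove) the records parked for this foreign key.
        return self._pending.pop(foreign_key, [])


cache = KeyCache()
cache.park(9, {"table": "crm_news_flash_tag", "ts": 1})
cache.load_primary_key(9)
reloaded = cache.take_pending(9) if cache.has_primary_key(9) else []
```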

In some embodiments, the database is a relational database, and the table name of each of the main table and the association table is a topic of the message queue.

Grouping each single-partition data stream by a designated field to obtain a plurality of groups may specifically include: grouping the single-partition data stream of the main table by the business primary key and its corresponding transaction identification field, and grouping the single-partition data stream of the association table by the business foreign key and its corresponding transaction identification field, to obtain a plurality of groups.

Sorting the data stream within each group, and consuming the data stream of each group in the order determined by the sorting, may specifically include: sorting the data stream within each group in ascending order of event time; and consuming the data stream of each group sequentially in the order determined by the ascending event-time sort.

Multiple atomic operations may occur within the same transaction, and inconsistencies can arise if they are not consumed in event-time order. For example, suppose a transaction modifies variable a three times, setting its value to b, then c, then d. If these changes are consumed out of order, the final result may be any one of b, c, or d, which causes data inconsistency. The embodiment of the invention groups records by business primary key or business foreign key together with the corresponding transaction identification field, and sorts each group in ascending order of event time, so that the data stream of the main table and the data stream of the association table are each consumed in the order in which the events occurred.
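The effect described in this example can be reproduced with a short Python sketch. The update records, the field names `field`, `value`, and `ts`, and the helper `apply_updates` are hypothetical and serve only to illustrate why consumption must follow event time.

```python
def apply_updates(updates):
    """Apply updates in the given order; the last write to a field wins."""
    state = {}
    for update in updates:
        state[update["field"]] = update["value"]
    return state


# Three writes to variable "a" in one transaction, listed in arrival order.
# Their event times say the true order is b (ts=1), c (ts=2), d (ts=3).
arrival_order = [
    {"field": "a", "value": "d", "ts": 3},
    {"field": "a", "value": "b", "ts": 1},
    {"field": "a", "value": "c", "ts": 2},
]

unordered_result = apply_updates(arrival_order)
ordered_result = apply_updates(sorted(arrival_order, key=lambda u: u["ts"]))
```

Consuming in arrival order leaves `a` at the stale value `c`, while sorting by event time first leaves it at the correct final value `d`.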

Fig. 2 is another specific flowchart of the out-of-order processing method for multi-association real-time data streams according to an embodiment of the present invention. As shown in fig. 2, an embodiment of the present invention provides an out-of-order processing method for multi-association real-time data streams, which includes the following steps:

S1, preparing a relational database to be collected in real time. In the embodiment of the invention, the database comprises a main table and a plurality of association tables. The main table's fields comprise a business primary key (the unique identification field of the main table; the same applies below) and its description fields, and each association table comprises a business foreign key (the field that maps to the business primary key of the main table; the same applies below) and its description fields.

S2, performing real-time synchronization, which comprises: monitoring log changes of the main table and the association tables through a real-time monitoring acquisition module.

S3, when the logs of the main table and the association tables in the database change, writing the change records of the changed tables in parallel, across multiple partitions, into the corresponding topics of the message queue (the table name of each table is the topic of the message queue), so as to record the corresponding database changes. Generally, to improve task parallelism, the real-time monitoring acquisition module writes to the message queue in a multi-partition manner, and the number of partitions can be flexibly configured in the message queue.
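The multi-partition write of step S3 can be sketched with an in-memory stand-in for the message queue. The class `InMemoryQueue` is hypothetical, and the round-robin choice of partition is an assumption of this sketch: the embodiment does not state how records are assigned to partitions. The sketch also shows why ordering is lost, since consecutive changes to the same table land in different partitions.

```python
from collections import defaultdict
from itertools import count


class InMemoryQueue:
    """Toy message queue: one topic per table name, several partitions per topic."""

    def __init__(self, num_partitions=4):
        self.num_partitions = num_partitions
        # topic name -> list of partitions, each partition a list of records
        self.topics = defaultdict(lambda: [[] for _ in range(num_partitions)])
        # topic name -> running counter used for round-robin assignment
        self._counters = defaultdict(count)

    def publish(self, record):
        topic = record["table"]  # the table name is the topic
        partition = next(self._counters[topic]) % self.num_partitions
        self.topics[topic][partition].append(record)
        return topic, partition


queue = InMemoryQueue(num_partitions=2)
placements = [
    queue.publish({"table": "crm_news_flash", "ts": ts}) for ts in (1, 2, 3)
]
```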

S4, comprising step S41 of starting main table real-time computing process 1 to acquire the data stream of the main table, and step S42 of starting association table real-time computing process 2 to acquire the data stream of the association table. Steps S41 and S42 may be performed simultaneously or sequentially. Step S9 is performed after step S41, and step S5 is performed after step S42.

Specifically, the two processes (process 1 and process 2) are started in order to acquire the two data streams. Starting them separately lets each task process its business logic independently, which improves parallelism.

S8, loading (writing) the business primary key of the primary table into the cache region.

Specifically, in step S9, it may first be determined whether the business primary key already exists in the cache region; if not, the business primary key is written into the cache region, and if so, step S6 is executed.

S5, after the data stream of the association table is acquired, the business foreign key is obtained from it, and the cache region is checked for the corresponding business primary key.

In a specific implementation, the business foreign key corresponds to a field of the association table data stream acquired above, namely the foreign key of the association table. In one example, the business primary key and business foreign key may be, but are not limited to, a MySQL primary key and foreign key, where the primary key of the main table equals the foreign key of the association table.

S6, if the business primary key can be obtained in step S5, the data stream of the main table is not delayed. The association table data stream held in the cache region is loaded, the multi-partition data streams of the main table and of the association table are each merged into a single-partition data stream, the records are sorted in ascending order by business key (the business primary key for the main table, the business foreign key for the association table) and event time and consumed sequentially, and then the business logic processing of the main table data stream and of the association table data stream is performed.

In this step, the data stream of the main table and the data stream of the association table each undergo their own business logic processing.

The business logic processing of the main table data stream comprises: calling a translation module to translate the main table data stream and then synchronizing it to the target library.

The business logic processing of the association table data stream comprises: calling a translation module to translate the association table data stream and then synchronizing it to the target library.

In this step, sorting in ascending order by business foreign key and event time may specifically include: sorting the records of the association table data stream in ascending order by the business foreign key field news_flash_id and the event time field ts; an example is described in detail later.

In step S5, if the business primary key can be obtained, the data stream of the main table is not delayed. The association table data stream held in the cache region is loaded, and the multi-partition data streams under the association table topic are merged into a single-partition data stream. To handle duplicate data in the data stream, deduplication is performed, which avoids repeated translation. After deduplication, the records are sorted in ascending order by business foreign key and event time and consumed sequentially; the translation module is then called to translate the data streams of the main table and of the association table respectively, and the business logic processing of both data streams is executed. The main table data stream is grouped by primary key id and transaction id, and the association table data stream is grouped by foreign key id and transaction id.
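The deduplication and ascending sort described here can be sketched as follows. The embodiment does not specify how duplicates are identified, so this sketch assumes that two records are duplicates only when they are identical in every field; the function name `dedupe_and_sort` is likewise hypothetical.

```python
import json


def dedupe_and_sort(records, key_field="news_flash_id"):
    """Drop exact duplicate records, then sort by (foreign key, event time)."""
    seen = set()
    unique = []
    for record in records:
        # A canonical JSON dump serves as a hashable fingerprint of the record.
        fingerprint = json.dumps(record, sort_keys=True)
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(record)
    return sorted(unique, key=lambda r: (r["data"][key_field], r["ts"]))


stream = [
    {"data": {"news_flash_id": 9}, "ts": 3},
    {"data": {"news_flash_id": 9}, "ts": 1},
    {"data": {"news_flash_id": 9}, "ts": 3},  # exact duplicate of the first record
    {"data": {"news_flash_id": 8}, "ts": 5},
]
cleaned = [(r["data"]["news_flash_id"], r["ts"]) for r in dedupe_and_sort(stream)]
```

Removing duplicates before the translation call means each distinct change is translated once.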

S7, if the business primary key cannot be obtained in step S5, the data stream of the main table is delayed, and the data stream of the association table is written into the cache region to wait for reloading.

After step S7 ends, the process returns to step S42 and the above processing repeats. Because this is a real-time task, it does not terminate after one pass but keeps running and loading data, so execution continues from step S42 after step S7 ends.
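One pass of the S5 to S7 branch on the association table side can be sketched as below. The function `process_association_batch` and its arguments are hypothetical; a plain set and dict stand in for the cache region, and parked records are reloaded only when a later record with the same foreign key arrives, which is a simplification of the batch reloading described above.

```python
def process_association_batch(batch, primary_keys, pending):
    """Run steps S5 to S7 once over a batch of association table records.

    primary_keys: set of business primary keys already loaded from the main table.
    pending:      dict of foreign key -> records parked while the main table lagged.
    Returns the records ready for merging, grouping, sorting, and consumption.
    """
    ready = []
    for record in batch:
        foreign_key = record["data"]["news_flash_id"]
        if foreign_key in primary_keys:
            # S6: the main table is not delayed; reload parked records first.
            ready.extend(pending.pop(foreign_key, []))
            ready.append(record)
        else:
            # S7: the main table is delayed; park the record to wait for reloading.
            pending.setdefault(foreign_key, []).append(record)
    return ready


primary_keys, pending = set(), {}
early = {"data": {"news_flash_id": 9}, "ts": 1}
late = {"data": {"news_flash_id": 9}, "ts": 2}

first_pass = process_association_batch([early], primary_keys, pending)
primary_keys.add(9)  # the main table record arrives in the meantime
second_pass = process_association_batch([late], primary_keys, pending)
```

Calling the function once per acquired batch mirrors the loop back to step S42: the task never terminates, and each pass either releases or parks the records it sees.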

The following is illustrative:

1. Real-time synchronization by the real-time monitoring acquisition module

The real-time monitoring acquisition module synchronizes the changed main table data stream and the changed association table data stream to the corresponding topics of the message queue:

The main table data stream is written to the main table topic:

writing into partition 0: 32911 --> 32912;

writing into partition 1: 32881 --> 32882;

writing into partition 2: 32892 --> 32893;

writing into partition 3: 32914 --> 32915.

The association table data stream is written to the association table topic:

writing into partition 0: 2913 --> 2914;

writing into partition 1: 2825 --> 2826;

writing into partition 2: 2882 --> 2883;

writing into partition 3: 2924 --> 2925.

Main table data flow:

(1) News flash content

{"database":"gmall0105","xid":176,"data":{"create_time":1456911033,"link":"","mid":966,"title":"Company A completes 30 million RMB Series A financing to build a PaaS ecosystem","content":"On March 2, Company A announced that it had completed a 30 million RMB Series A financing round led by Company B, with Company C and Company D participating.\nCompany A belongs to a certain Beijing technology co., ltd. founded in September 2014; its founder and CEO is a former architect at Company E. Company A has long maintained a Mesos cluster of nearly a thousand servers, the largest distributed operating system cluster outside BAT.\nCompared with traditional PaaS, Company A's DCOS system has four advantages: enterprise-level hybrid container cluster management, support for tens of thousands of nodes, big data and machine learning support, and hybrid cloud deployment capability.\n","cover":"","update_time":1610420400,"grade":"C","last_mid":0,"id":9,"source_id":9,"content_html":"On March 2, Company A announced that it had completed a 30 million RMB Series A financing round led by Company B, with Company C and Company D participating.\nCompany A belongs to a certain Beijing technology co., ltd. founded in September 2014; its founder and CEO is a former architect at Company E. Company A has long maintained a Mesos cluster of nearly a thousand servers, the largest distributed operating system cluster outside BAT.\nCompared with traditional PaaS, Company A's DCOS system has four advantages: enterprise-level hybrid container cluster management, support for tens of thousands of nodes, big data and machine learning support, and hybrid cloud deployment capability.","pubdate":1456909500,"status":1},"old":{"content":"On March 2, Company A announced that it had completed a 30 million RMB Series A financing round led by Company B, with Company C and Company D participating.\nCompany A belongs to a certain Beijing technology co., ltd. founded in September 2014; its founder and CEO is a former architect at Company E. Company A has long maintained a Mesos cluster of nearly a thousand servers, the largest distributed operating system cluster outside BAT.\nCompared with traditional PaaS, Company A's DCOS system has four advantages: enterprise-level hybrid container cluster management, support for tens of thousands of nodes, big data and machine learning support, and hybrid cloud deployment capability.\n"},"commit":true,"type":"update","table":"crm_news_flash","ts":1621938250}

Note: id is the business primary key.

Association table data flow:

(1) Content of business associated with quick communication

{ "database": "gmall0105", "xid":656, "data": { "create_time":1574425429, "name": F company "," mid ":690," events_id ":4," id ":9," news_flash_id ":9}," old ": {" create_time ":1574425428}," com ": true": type ": update", "table": "crm_news_flash_events", "ts":1621938552}

Note that: wherein the news_flash_id is a business foreign key.

(2) Content of quick communication related industry

{"database":"gmall0105","xid":591,"data":{"update_time":"2020-09-02

12:34:56"," create_time ":" 1970-01-01:08:33:41 "," tag_name ":" healthcare industry "," tag_rank ": 0", "is_del":0 "," tag_id ":10 c93a3544f43153e30e4c3b53714ac2", "is_related":0 "," id ": 9", "news_flash_id":9 "," order ": 0", "old": { "update_time": 2020-09-02:12:34:55 "},

"commit":true,"type":"update","table":"crm_news_flash_industry_tag","ts":1621938514}

note that: wherein the news_flash_id is a business foreign key.

(3) Tags associated with the news flash

{"database":"gmall0105","xid":61299,"data":{"create_time":157620677,"name":"medical","tag_id":2015,"mid":690,"id":1066,"news_flash_id":9},"old":{"create_time":1576206735},"commit":true,"type":"update","table":"crm_news_flash_tag","ts":1621993388}

Note: news_flash_id is the business foreign key.

2. Start the main table real-time process to subscribe to and consume the main table topic data, call the processing module to merge the partitions and then group and sort the records, and then call the translation module to translate the data and write the translated data into the target library:

As shown in fig. 2, after steps S1 and S2 are completed, the changed data of the main table has been written into the main table topic. Subscribing means acquiring the data in that topic, that is, acquiring the changed data.

The translation module calls the translation interface to translate between two languages and writes the translated text into the target library. In this way a text is written to the target library through real-time translation.

Topic setting: the table name of each table is used as the name of its message queue topic.
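The topic rule above can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the in-memory `queue`, the partition count, and the round-robin partition choice are all assumptions made for illustration, since the text only states that records are written in parallel in a multi-partition manner.

```python
from collections import defaultdict
from itertools import count

NUM_PARTITIONS = 3  # assumed partition count; the text does not specify one

# topic name -> partition index -> list of records (stand-in for a message queue)
queue = defaultdict(lambda: defaultdict(list))
_counter = count()


def publish(record):
    """Write a change record to the topic named after its table."""
    topic = record["table"]  # topic name is the table name
    # Assumed round-robin placement: consecutive changes land in different partitions.
    partition = next(_counter) % NUM_PARTITIONS
    queue[topic][partition].append(record)
    return topic, partition


placements = [
    publish({"table": "crm_news_flash_events", "xid": 88190, "ts": ts})
    for ts in (1622012232, 1622012236, 1622012242)
]
print(placements)
```

Because consecutive changes of one transaction can land in different partitions, a consumer that reads the partitions in parallel has no guarantee of seeing them in event-time order, which is the disorder the later grouping and sorting step addresses.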

Pre-translation content field: On March 2, Company A announced the completion of an A round of financing of RMB 30 million, led by Company B and followed by Company C and Company D.

Company A belongs to a Beijing technology limited company and was established in September 2014. Its founder and CEO is a former architect of Company E. Company A has long maintained a Mesos cluster of nearly a thousand servers, the largest distributed operating system cluster outside BAT.

Compared with traditional PaaS, the DCOS system of Company A has four advantages: enterprise-level hybrid container cluster management, support for tens of thousands of nodes, support for big data and machine learning, and hybrid cloud deployment.

Post-translation content field: On March 2, Company A announced the completion of a round of financing of RMB 30 million, led by Company B venture capital, and followed by Company C and Company D.

Company A is affiliated to a beijing technology Co., Ltd., which was founded in September 2014. Its founder and CEO is a former architect of Company E. Company A has maintained a mesos cluster of nearly 1000 servers for a long time, and is the largest distributed operating system cluster outside bat.

Compared with the traditional PAAS, the DCOS system of Company A has four advantages: enterprise level hybrid container cluster management, supporting more than 10000 nodes, supporting big data and machine learning ability, and hybrid cloud deployment ability.

The SQL is as follows:

insert into crm_news_flash

(id, pubdate, title, content, content_html, source, link, cover, grade, mid, last_mid, status, create_time, update_time, data_source, source_id)

values (63, 1456909500, 'Company A completed 30 million round a financing to build PAAS ecosystem', 'On March 2, Company A announced the completion of a round of financing of RMB 30 million, led by Company B venture capital, and followed by Company C and Company D.

Company A is affiliated to a beijing technology Co., Ltd., which was founded in September 2014. Its founder and CEO is a former architect of Company E. Company A has maintained a mesos cluster of nearly 1000 servers for a long time, and is the largest distributed operating system cluster outside bat.

Compared with the traditional PAAS, the DCOS system of Company A has four advantages: enterprise level hybrid container cluster management, supporting more than 10000 nodes, supporting big data and machine learning ability, and hybrid cloud deployment ability.', 'On March 2, Company A announced the completion of an A round of financing of RMB 30 million, led by Company B and followed by Company C and Company D.

Compared with traditional PaaS, the DCOS system of Company A has four advantages: enterprise-level hybrid container cluster management, support for tens of thousands of nodes, support for big data and machine learning, and hybrid cloud deployment.', 'null', '', '', 'C', 966, 0, 1, 1456911033, UNIX_TIMESTAMP(), 'crm news flash', '9')

on duplicate key update

pubdate = 1456909500, source = 'null', link = '', cover = '', grade = 'C', mid = 966, last_mid = 0, status = 1, create_time = 1456911033, update_time = UNIX_TIMESTAMP();

The mapping between the business primary key of the main table and the business primary key of the target table is then written into the cache:

*****insert news_flash new_id:63*****source_id:9*****

source_id: the business primary key of the source table (main table);

new_id: the business primary key of the target table.

Finally, the translated data is written into the target table of the target library, which completes the real-time translation of the main table data stream.

3. Start the association table real-time process to subscribe to and consume the association table topic data, and call the processing module and the translation module to merge the data into a single partition, group and sort it, translate it, and write it into the target library.

As shown in fig. 1, after steps S1 and S2 are completed, the changed data of each association table has been written into the corresponding association table topic. Subscribing means acquiring the data in that topic, that is, acquiring the changed data.

The translation module calls the translation interface to translate between two languages and, after the business logic has been processed, writes the translated text into the target library.

For example, the real-time monitoring module detects the following three changes to the enterprise record associated with the news flash with nf_id=9:

{"database":"gmall0105","xid":88190,"data":{"create_time":1574425430,"name":"F company","mid":690,"events_id":4,"id":10,"nf_id":9},"old":{"create_time":1574425429},"xoffset":0,"type":"update","table":"crm_news_flash_events","ts":1622012232}

{"database":"gmall0105","xid":88190,"data":{"create_time":1574425431,"name":"F company","mid":690,"events_id":4,"id":10,"nf_id":9},"old":{"create_time":1574425430},"xoffset":1,"type":"update","table":"crm_news_flash_events","ts":1622012236}

{"database":"gmall0105","xid":88190,"data":{"create_time":1574425435,"name":"F company","mid":690,"events_id":4,"id":10,"nf_id":9},"old":{"create_time":1574425432},"commit":true,"type":"update","table":"crm_news_flash_events","ts":1622012242}

Here xid is the transaction ID, ts is the event time, and nf_id (i.e. news_flash_id) is the business foreign key; the other fields are not described in detail.

A transaction generally refers to something that is to be done or has been done. In computer terminology, a transaction usually means a database transaction: a program execution unit that accesses and possibly updates various data items in a database. To ensure atomicity, multiple statements may be placed in one transaction so that they either all succeed or all fail, and every operation within the same transaction carries the same transaction ID.

The xid of 88190 indicates that the above operations belong to one transaction. In this embodiment, records are grouped by the two fields xid (transaction identifier) and nf_id (i.e. news_flash_id), and each group is sorted in ascending order of the event time ts, so that records are consumed in order even when they arrive out of order (a developer can flexibly adjust the grouping and sorting rules as required). The inventors found that the disorder is caused by multi-partition writing, which makes the arrival order uncontrollable. Therefore, in the processing flow of this embodiment, the multi-partition data is merged into a single partition, the single partition is grouped and sorted according to preset or designated fields, and the records are then translated and written into the target table in order.

Grouping and sorting performs a reduceByKey operation on a designated field of each record. For example, nf_id and xid are used together as the key, and reduceByKey places records with the same key in one group; the values are first aggregated locally and then aggregated globally. Each group is then sorted in ascending order of the ts (event time) field, so that the records within a group are processed in event-time order. Order between groups is guaranteed by sorting the groups in ascending order of transaction ID, since the transaction ID is monotonically increasing.
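As a small illustration of the fields just named, the following Python sketch parses one change record and extracts the values later used for grouping and sorting. The helper name `ordering_fields` and the shortened sample record are hypothetical; only the field names xid, ts, table and nf_id come from the records above.

```python
import json


def ordering_fields(raw, fk_field="nf_id"):
    """Extract the fields used later for grouping and sorting."""
    record = json.loads(raw)
    return {
        "table": record["table"],
        "xid": record["xid"],            # transaction ID
        "ts": record["ts"],              # event time
        "fk": record["data"][fk_field],  # business foreign key
    }


# Shortened sample record (hypothetical subset of the fields shown above).
raw = (
    '{"database":"gmall0105","xid":88190,"data":{"id":10,"nf_id":9},'
    '"type":"update","table":"crm_news_flash_events","ts":1622012232}'
)
fields = ordering_fields(raw)
print(fields)
```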
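The merge, group and sort steps can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions: the embodiment describes a reduceByKey operation, whereas this sketch substitutes an in-memory sort plus `itertools.groupby`; the function name `merge_group_sort` and the second transaction ID 88191 are hypothetical.

```python
from itertools import chain, groupby


def group_key(record, fk_field="nf_id"):
    # Transaction ID first, so that sorting by this key also orders the groups by xid.
    return (record["xid"], record["data"][fk_field])


def merge_group_sort(partitions):
    """Merge partitions into one stream, group by (xid, nf_id), sort each group by ts."""
    merged = list(chain.from_iterable(partitions))  # single-partition stream
    # Sort by group key, then by event time inside each group.
    merged.sort(key=lambda r: (group_key(r), r["ts"]))
    return [(key, list(group)) for key, group in groupby(merged, key=group_key)]


# Three partitions whose records arrive out of event-time order.
p0 = [{"xid": 88190, "ts": 1622012242, "data": {"nf_id": 9}}]
p1 = [
    {"xid": 88191, "ts": 1622012250, "data": {"nf_id": 9}},
    {"xid": 88190, "ts": 1622012232, "data": {"nf_id": 9}},
]
p2 = [{"xid": 88190, "ts": 1622012236, "data": {"nf_id": 9}}]

groups = merge_group_sort([p0, p1, p2])
for key, records in groups:
    print(key, [r["ts"] for r in records])
```

Consuming `groups` from first to last then processes transaction 88190 before 88191, and within each transaction processes the records in ascending event time.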

Two-stage aggregation (local aggregation followed by global aggregation): the core idea of this scheme is to aggregate in two stages. In the first stage, local aggregation, a random number (for example, a random number below 10) is prefixed to each key so that records with the same original key receive different keys. For example, (hello, 1) (hello, 1) (hello, 1) (hello, 1) becomes (1_hello, 1) (1_hello, 1) (2_hello, 1) (2_hello, 1).

An aggregation operation such as reduceByKey is then applied to the prefixed data to aggregate it locally, and the local aggregation result becomes (1_hello, 2) (2_hello, 2).

The prefix is then removed from each key, giving (hello, 2) (hello, 2), and a global aggregation is performed to obtain the final result, such as (hello, 4). The principle of the scheme: adding random prefixes turns one key into several different keys, so that data originally processed by a single task is spread over several tasks for local aggregation, which avoids a single task processing an excessive amount of data. The random prefix is then removed and a global aggregation yields the final result. The advantage of the scheme: it works very well for data skew caused by the shuffle operations of aggregation-type operators. Data skew can usually be eliminated, or at least greatly alleviated, and performance can improve several-fold or more.
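The data flow of the two-stage aggregation can be traced with a short single-process Python sketch. It only illustrates how keys are salted, aggregated, unsalted and aggregated again; in a distributed engine each stage would be a reduceByKey spread over many tasks. The function name `two_stage_sum` and its parameters are hypothetical.

```python
import random
from collections import Counter


def two_stage_sum(pairs, salts=10, seed=None):
    """Sum values per key using a salted local stage and an unsalted global stage."""
    rng = random.Random(seed)

    # Stage 1 (local): prefix each key with a random number and aggregate.
    local = Counter()
    for key, value in pairs:
        local["%d_%s" % (rng.randrange(salts), key)] += value

    # Stage 2 (global): strip the prefix and aggregate the partial sums.
    total = Counter()
    for salted_key, partial in local.items():
        original_key = salted_key.split("_", 1)[1]
        total[original_key] += partial
    return dict(total)


result = two_stage_sum([("hello", 1)] * 4)
print(result)
```

Whatever prefixes are drawn, the partial sums always add up to the same global total, which is why the random prefix can be chosen freely to spread the load.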

(1) If the main table is delayed and its business primary key has not yet been written into the business primary key cache, the association data is written into the data cache to wait to be reloaded with the next batch:

****crm_news_flash_relation new_id:null****source_id:9****update_status:null****

(2) If the business primary key of the main table has already been written into the business primary key cache, the data is translated, the SQL is assembled, and the result is written into the target library:

****crm_news_flash_relation new_id:63****source_id:9****update_status:1****

SQL:

insert into crm_news_flash_enterprises

(nf_id, enterprises_id, name, mid, create_time)

values (63, 4, 'Huawei', 690, 1574425427)

on duplicate key update name = 'Huawei', mid = 690;
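The two cases above (cache miss and cache hit) can be sketched as follows. This is a minimal sketch, assuming the business primary key cache is a plain dictionary from source_id to new_id and the data cache is a list of deferred records; `process_batch`, `key_cache` and `pending` are hypothetical names, and translation and SQL assembly are left out.

```python
def process_batch(records, key_cache, pending, fk_field="nf_id"):
    """Return (record, new_id) pairs that are ready; defer the rest to the next batch."""
    # Reload the records deferred by earlier batches ahead of the new ones.
    candidates = list(pending) + list(records)
    pending.clear()

    ready = []
    for record in candidates:
        new_id = key_cache.get(record["data"][fk_field])
        if new_id is None:
            pending.append(record)          # main table row not cached yet: wait
        else:
            ready.append((record, new_id))  # can be translated and written
    return ready


key_cache, pending = {}, []
record = {"table": "crm_news_flash_events", "data": {"nf_id": 9, "name": "F company"}}

first = process_batch([record], key_cache, pending)   # main table delayed
key_cache[9] = 63                                     # main table row arrives
second = process_batch([], key_cache, pending)        # deferred record reloaded
print(first, second)
```

In the first call the record is held back because source_id 9 has no mapping; once the mapping 9 to 63 is cached, the next batch picks the record up with new_id 63, matching the two log lines above.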

With this method of handling the out-of-order problem of multi-association data streams, real-time translation and synchronization of news flashes (between any two languages) can be achieved with second-level latency while keeping the data accurate.

When the multi-partition data under the message queue topics of the multiple association tables is consumed, the multi-partition data streams are first merged into a single-partition data stream, which is then grouped by the business primary key, sorted by the event time field, and returned. If the main table data stream is delayed, the business foreign key and the association table data are stored in the buffer to wait for reloading, and the remaining data goes through the core business logic processing.

As mentioned above, a transaction is an atomic operation, and the purpose of grouping is to process, in order, the multiple operations that correspond to the same business primary key, which ensures accurate processing. Grouping may include: placing the records with the same business primary key and transaction identifier in one iterator, sorting them by event time, and returning them to the data stream.

Compared with the prior art, the embodiment of the invention not only handles delay and duplication in the data streams of multiple association tables in a high-concurrency real-time scenario, but also effectively solves the out-of-order problem of those data streams. The experimental results show that, with the method of the embodiment of the invention, the application side receives accurate data even when operations are frequently repeated at the input side, and the out-of-order problem is essentially resolved across various business scenarios. The technical scheme of the embodiment of the invention is simple to implement and meets application requirements.

Example II

Fig. 3 is a functional block diagram of an out-of-order processing apparatus for multiple associated real-time data streams according to an embodiment of the present invention. As shown in fig. 3, the apparatus 200 includes:

the real-time monitoring and collecting module 210 is configured to monitor, in real time, log changes of the main table and the association table in the database;

the real-time synchronization module 220 is configured to obtain a main table change record and/or an association table change record when the log of the main table and/or the association table in the database changes, write the main table change record into the main table topic of the message queue in parallel in a multi-partition manner, and write the association table change record into the association table topic of the message queue in parallel in a multi-partition manner;

the process starting module 230 is configured to start the main table real-time process to obtain the multi-partition data stream of the main table from the main table topic, and start the association table real-time process to obtain the multi-partition data stream of the association table from the association table topic;

the loading module 240 is configured to load the business primary key of the main table into a cache region of the database;

the judging module 250 is configured to obtain the business foreign key after the data stream of the association table is obtained, and judge whether the corresponding business primary key can be obtained from the cache region;

the processing module 260 is configured to, if the business primary key can be obtained, merge the multi-partition data stream of the main table and the multi-partition data stream of the association table each into a single-partition data stream, group each single-partition data stream according to a designated field to obtain a plurality of groups, sort the data stream within each group, consume the data stream in each group in the order determined by the sorting, and perform business logic processing on the consumed data stream.

In some embodiments, the processing module 260 is specifically configured to translate the consumed main table data stream and the consumed association table data stream and then synchronize them to the target library.

In some embodiments, the processing module 260 may be further configured to write the data stream of the association table into the cache region to wait for reloading if the corresponding business primary key cannot be obtained from the cache region.

In some embodiments, the database is a relational database; the table name of each of the main table and the association table is a topic of the message queue; the processing module 260 may specifically be configured to: group the single-partition data stream of the main table by the business primary key and its corresponding transaction identification field, and group the single-partition data stream of the association table by the business foreign key and its corresponding transaction identification field, to obtain a plurality of groups;

sort the data stream in each group in ascending order of event time; and consume the data stream in each group sequentially in the order determined by that ascending sort.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.

Example III

The embodiment of the invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the out-of-order processing method for multi-association real-time data streams described in the method embodiment above.

The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment by means of a computer program instructing related hardware; the computer program may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. Other kinds of readable storage media, such as quantum memory and graphene memory, are of course also possible. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.

Example IV

The embodiment of the present invention further provides a computer device 200, as shown in fig. 4, which includes one or more processors 301, a communication interface 302, a memory 303, and a communication bus 304, where the processors 301, the communication interface 302, and the memory 303 perform communication with each other through the communication bus 304.

A memory 303 for storing a computer program;

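a processor 301 for implementing, when executing the program stored in the memory 303, the out-of-order processing method for multi-association real-time data streams described in the method embodiment above.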

The communication bus mentioned above for the electronic devices may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, etc. The communication bus may be classified as an address bus, a data bus, a control bus, or the like. For ease of illustration, the figures are shown with only one bold line, but not with only one bus or one type of bus. The communication interface is used for communication between the electronic device and other devices.

Bus 304 includes hardware, software, or both for coupling the above components to one another. For example, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or other suitable bus, or a combination of two or more of the above. The bus may include one or more buses, where appropriate. Although embodiments of the invention have been described and illustrated with respect to a particular bus, the invention contemplates any suitable bus or interconnect.

The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.

Memory 303 may include mass storage for data or instructions. By way of example, and not limitation, memory 303 may include a Hard Disk Drive (HDD), floppy Disk Drive, flash memory, optical Disk, magneto-optical Disk, magnetic tape, or universal serial bus (Universal Serial Bus, USB) Drive, or a combination of two or more of the above. The memory 303 may include removable or non-removable (or fixed) media, where appropriate. In a particular embodiment, the memory 303 is a non-volatile solid state memory. In particular embodiments, memory 303 includes Read Only Memory (ROM). The ROM may be mask programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically Erasable PROM (EEPROM), electrically rewritable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate.

The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a car-mounted human-computer interaction device, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

Although the application provides method operational steps as described in the examples or flowcharts, more or fewer operational steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one way of performing the order of steps and does not represent a unique order of execution. When implemented in an actual device or end product, the instructions may be executed sequentially or in parallel (e.g., in a parallel processor or multi-threaded processing environment, or even in a distributed data processing environment) as illustrated by the embodiments or by the figures.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

In this specification, the embodiments are described in a related manner: identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the apparatus, electronic device, and readable storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant details, refer to the description of the method embodiments.

The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention. The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims

1. A method for out-of-order processing of multiple associated real-time data streams, comprising:

2. The method of claim 1, wherein performing business logic processing on the consumed data stream comprises: translating the consumed main table data stream and the consumed association table data stream and then synchronizing them to the target library.

3. The method according to claim 1 or 2, further comprising: if the corresponding business primary key cannot be obtained from the cache region, writing the data stream of the association table into the cache region to wait for reloading.

4. The method according to claim 1 or 2, wherein the database is a relational database; the table name of each of the main table and the association table is a topic of the message queue; and grouping each single-partition data stream according to a designated field to obtain a plurality of groups specifically comprises:

5. An out-of-order processing apparatus for multiple associated real-time data streams, comprising:

6. The apparatus of claim 5, wherein the processing module is configured to translate the consumed main table data stream and the consumed association table data stream and then synchronize them to the target library.

7. The apparatus according to claim 5 or 6, wherein the processing module is further configured to write the data stream of the association table into the cache region to wait for reloading if the corresponding business primary key cannot be obtained from the cache region.

8. The apparatus of claim 5 or 6, wherein the database is a relational database; the table name of each of the main table and the association table is a topic of the message queue; the processing module is specifically configured to group the single-partition data stream of the main table by the business primary key and its corresponding transaction identification field, and group the single-partition data stream of the association table by the business foreign key and its corresponding transaction identification field, to obtain a plurality of groups; sort the data stream in each group in ascending order of event time; and consume the data stream in each group sequentially in the order determined by that ascending sort.

9. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements a method of out-of-order processing of a multi-associated real-time data stream as claimed in any of claims 1-4.

10. A computer device, comprising:

one or more processors;

a storage means for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the out-of-order processing method of multi-associated real-time data streams as recited in any of claims 1-4.