WO2020029405A1 - Data transmission method and apparatus - Google Patents

Data transmission method and apparatus

Info

Publication number
WO2020029405A1
WO2020029405A1 (PCT/CN2018/108238)
Authority
WO
WIPO (PCT)
Prior art keywords
data
target
heap
read
grouping
Prior art date
Application number
PCT/CN2018/108238
Other languages
English (en)
French (fr)
Inventor
林斌树
刘华明
Original Assignee
网宿科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 网宿科技股份有限公司 (Wangsu Science & Technology Co., Ltd.)
Priority to US16/755,852 priority Critical patent/US20200293543A1/en
Priority to EP18929307.9A priority patent/EP3835975A4/en
Publication of WO2020029405A1 publication Critical patent/WO2020029405A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/258Data format conversion from or to a database
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/543User-generated data transfer, e.g. clipboards, dynamic data exchange [DDE], object linking and embedding [OLE]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24552Database cache management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24568Data stream processing; Continuous queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278Data partitioning, e.g. horizontal or vertical partitioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues

Definitions

  • The present invention relates to the field of Internet technology, and in particular to a data transmission method and apparatus.
  • Kafka is a distributed messaging system that can be used to persist logs to disk.
  • When Kafka stores data, the data can usually be classified by data topic, and each data topic can contain multiple grouped data (partitions), which partition the data belonging to that topic.
  • At present, when building a real-time big-data processing platform, a system architecture in which Storm is integrated with Kafka can be adopted.
  • Business data or business logs can be written to Kafka in real time, and Storm can read the data from Kafka for computation.
  • In practice, business scenarios are complex and changeable, and the analysis of one business may require reading multiple types of business logs, so a Storm system usually needs to read multiple data topics from Kafka.
  • A business usually has peak periods, during which the volume of generated logs is relatively large, sometimes several times the usual volume.
  • In that case, when the Storm system reads the grouped data of multiple data topics, a surge of data, or too large an amount of data read in each batch, may cause the processing program to run out of memory or slow down the processing of that batch, lowering the efficiency of data processing.
  • The purpose of this application is to provide a data transmission method and apparatus that can improve the efficiency of data processing.
  • One aspect of the present application provides a data transmission method, the method including: acquiring a data transmission instruction, the data transmission instruction pointing to a plurality of grouped data associated with at least one transmission data; determining, from the plurality of grouped data, target grouped data that has been stored in a local data pool, where each target grouped data includes at least one data heap; reading at least one target grouped data in the current batch, where, if at least two target grouped data are read, the sum of the heap counts of the data heaps in the at least two target grouped data is less than or equal to a specified heap-count threshold; and encapsulating the target grouped data read in the current batch into a data tuple and sending the data tuple to the initiator of the data transmission instruction.
  • Another aspect of the present application further provides a data transmission apparatus, the apparatus including: an instruction acquiring unit, configured to acquire a data transmission instruction, the data transmission instruction pointing to a plurality of grouped data associated with at least one transmission data; a target grouped data determining unit, configured to determine, from the plurality of grouped data, target grouped data that has been stored in a local data pool, where each target grouped data includes at least one data heap; a data reading unit, configured to read at least one target grouped data in the current batch, where, if at least two target grouped data are read, the sum of the heap counts of the data heaps in the at least two target grouped data is less than or equal to a specified heap-count threshold; and a data transmitting unit, configured to encapsulate the target grouped data read in the current batch into a data tuple and send the data tuple to the initiator of the data transmission instruction.
  • Another aspect of the present application further provides a data transmission apparatus, the apparatus including a memory and a processor, the memory being used to store a computer program which, when executed by the processor, implements the foregoing method.
  • The technical solution provided in this application can specify in advance the grouped data associated with each transmission data, and, after a data transmission instruction is received, the multiple grouped data currently to be transmitted can be determined. The target grouped data already stored in the local data pool can then be queried first and read and transmitted in batches. To avoid memory overflow or data-processing congestion caused by reading too much data, when grouped data is read in one batch, an upper limit can be placed on the number of data heaps in the grouped data, and this upper limit can serve as the specified heap-count threshold.
  • In this way, when the current batch reads at least two target grouped data, it can be ensured that the sum of the heap counts of the data heaps in the multiple target grouped data that are read is less than or equal to the specified heap-count threshold, so that the amount of data read is not excessive.
  • In addition, if the total heap count of the data heaps in one target grouped data is greater than or equal to the specified heap-count threshold, that target grouped data can be read and sent as a batch of its own. This avoids reading too much data on the one hand, and on the other hand ensures that the data heaps of the same grouped data are read and transmitted in the same batch, avoiding data errors caused by splitting the data.
  • As can be seen from the above, by limiting the amount of data read in each batch, the technical solution provided by this application can avoid memory overflow or data-processing congestion caused by reading too much data, thereby improving the efficiency of data processing.
  • FIG. 1 is a schematic diagram of a system architecture in an embodiment of the present invention;
  • FIG. 2 is a diagram of the steps of a data transmission method in an embodiment of the present invention;
  • FIG. 3 is a schematic diagram of traversing grouped data in an embodiment of the present invention;
  • FIG. 4 is a schematic diagram of the functional modules of a data transmission apparatus in an embodiment of the present invention;
  • FIG. 5 is a schematic structural diagram of a data transmission apparatus in an embodiment of the present invention.
  • The system architecture may include a preset grouping database, a data transmission apparatus, and multiple data processing nodes.
  • The data transmission apparatus can read grouped data from the preset grouping database and store it in a local data pool; then, according to a received data transmission instruction, it can read the corresponding data out of the local data pool, have it processed by the data processing nodes, and transmit it to the initiator of the data transmission instruction.
  • In practice, the preset grouping database can be a Kafka database, the data transmission apparatus can be a kafkaspout, and each data processing node can be a bolt.
  • Specifically, referring to FIG. 2, the data transmission method provided in this application may include the following steps.
  • S1: Acquire a data transmission instruction, where the data transmission instruction points to a plurality of grouped data associated with at least one transmission data.
  • In this embodiment, in the Storm system, each transmission data can be responsible for reading and transmitting data. When kafkaspout is initialized, it can assign the corresponding grouped data (partitions) to each transmission data in advance; afterwards, each transmission data can read and transmit data according to the grouped data assigned to it.
  • Specifically, grouped data is divided according to data topic. When kafkaspout is initialized, it can read the grouping information of each data topic, where a data topic may include multiple grouped data, and the grouping information may include the identifier of the data topic and the identifiers of the grouped data in that data topic.
  • For example, if there are currently two data topics with identifiers 0 and 1, each containing 10 grouped data, then the identifiers of the 10 grouped data in the first data topic can be the values 0 to 9, and the identifiers of the 10 grouped data in the second data topic can be the values 10 to 19.
  • In this way, the grouping information of the first data topic can be value combinations such as 0:0, 0:1, and 0:5, where the first value represents the identifier of the data topic and the second value represents the identifier of the grouped data.
  • After the grouping information of each data topic has been read, the grouped data in each data topic can be sorted by the identifier of the data topic and the identifier of the grouped data: the sorting result is ordered first by the identifier of the data topic, and, within the same data topic, by the size of the grouped data identifier.
  • Each current transmission data may also have its own index identifier, which may be an incremental value starting from 0. For example, the index identifier of the first transmission data is 0, and the index identifier of the second transmission data is 1. Then, according to the index identifier, designated grouped data may be selected from the sorted grouped data, and the designated grouped data may be assigned to the current transmission data.
  • For example, with the two data topics above and 10 transmission data in total, each transmission data should be assigned 2 grouped data: for the first transmission data with index identifier 0, the grouped data with identifiers 0 and 10 can be assigned to it; correspondingly, the grouped data with identifiers 1 and 11 can be assigned to the second transmission data with index identifier 1, and so on, completing the assignment of grouped data. In this way, different transmission data can be associated with different grouped data.
  • When a transmission data executes, kafkaspout receives a data transmission instruction, and the data transmission instruction may carry the identifier of each grouped data to be transmitted and the identifier of the data topic to which each grouped data belongs. In this way, by parsing the data transmission instruction, kafkaspout can determine which grouped data it points to.
  • S3: Determine, from the plurality of grouped data, target grouped data that has been stored in a local data pool, where each target grouped data includes at least one data heap.
  • In this embodiment, kafkaspout may have a local data pool, and the local data pool may store data read from the Kafka database.
  • For the plurality of grouped data to be transmitted, the local data pool can first be queried to see whether all or some of the grouped data is available, so that the target grouped data already stored in the local data pool can be determined from the plurality of grouped data.
  • In the target grouped data, the data can be divided in the form of data heaps (batches), and the same grouped data cannot be divided into different data heaps. Therefore, each target grouped data may include at least one data heap.
  • S5: Read at least one target grouped data in the current batch, where, if at least two target grouped data are read, the sum of the heap counts of the data heaps in the at least two target grouped data is less than or equal to the specified heap-count threshold.
  • In this embodiment, the target grouped data can be read from the local data pool in batches. To ensure that the amount of data read in one batch is not too large, the number of data heaps that one batch can read can be limited in advance.
  • Specifically, a specified heap-count threshold may be set in advance, and this threshold may serve as the upper limit on the number of data heaps read in the same batch. Referring to FIG. 3, taking the current batch as an example, the target grouped data can be tried one by one.
  • For the first target grouped data of the current batch, kafkaspout can identify the total heap count of the data heaps contained in the first target grouped data, and then update the cumulative value of a preset heap-count statistical parameter according to the identified total heap count.
  • The preset heap-count statistical parameter can be used to count the total number of data heaps already read in the current batch; its initial value is 0, and the parameter applies only to the current batch. When the current batch ends, the parameter is reset to 0.
  • For the first target grouped data, the corresponding total heap count can be written directly into this parameter. Then the relationship between the updated cumulative value and the specified heap-count threshold can be judged. If the updated cumulative value is less than the specified heap-count threshold, it means the number of data heaps read by the current batch has not yet reached the upper limit, so the current batch can continue to try to read more data heaps.
  • In that case, each data heap in the first target grouped data can be read, and the other target grouped data in the local data pool can be traversed.
  • If, however, the updated cumulative value is greater than or equal to the specified heap-count threshold, it means the current batch can no longer read more data heaps. In that case, the first target grouped data may be used as the only target grouped data read by the current batch, and the data reading process of the current batch ends.
  • As shown in FIG. 3, when the updated cumulative value is less than the specified heap-count threshold, the next target grouped data can be traversed and taken as the current target grouped data. For the current target grouped data, the sum of the total heap count of its data heaps and the cumulative value of the preset heap-count statistical parameter can be calculated. At this point, what is recorded in the preset heap-count statistical parameter is the total heap count of the first target grouped data. Note that only the sum of the two values is calculated here; the total heap count of the data heaps in the current target grouped data is not yet written into the preset heap-count statistical parameter.
  • If the calculated result is greater than the specified heap-count threshold, it means that reading all the data heaps in the current target grouped data would cause too much data to be read. To ensure that the data heaps of the same target grouped data can all be read in the same batch, the current target grouped data can be skipped and the next target grouped data traversed. Likewise, target grouped data will continue to be skipped until the calculated result is less than or equal to the specified heap-count threshold.
  • If the calculated result is less than or equal to the specified heap-count threshold, each data heap in the current target grouped data can be read, and the cumulative value of the preset heap-count statistical parameter is updated according to the total heap count of the data heaps contained in the current target grouped data. Then the next target grouped data can be traversed according to the above process until the traversal is complete.
  • S7: Encapsulate the target grouped data read in the current batch into a data tuple, and send the data tuple to the initiator of the data transmission instruction.
  • In this embodiment, after the traversal of the target grouped data in the local data pool is complete, each data heap read in the current batch can be encapsulated into a data tuple, and the data tuple is sent to the initiator of the data transmission instruction.
  • After the data reading and transmission process of the current batch is complete, the data reading and transmission process of the next batch can be performed according to the same scheme.
  • In one embodiment, after the data tuple is sent to the initiator of the data transmission instruction, the grouping information corresponding to each target grouped data sent in the current batch, and the data offset associated with that grouping information, may also be recorded in memory.
  • The grouping information corresponding to a target grouped data may include the identifier of the data topic to which the target grouped data belongs and the identifier of the target grouped data.
  • The above data offset can point to the end position of the data that has already been read. In this way, when the next batch reads grouped data from the local resource pool, it can start reading directly from the position characterized by the data offset, so that no data is missed and no data is read repeatedly.
  • In practice, the Storm system may include a central management system, ZooKeeper. The grouping information corresponding to each target grouped data and the data offset associated with that grouping information can be written into ZooKeeper, and the data in ZooKeeper can be accessed by the data reading process of the next batch, so that the corresponding data offset is known. It should be noted that, in practice, the corresponding information is not written into ZooKeeper after each target grouped data is sent; rather, the grouping information corresponding to each target grouped data and the data offset associated with that grouping information are written into ZooKeeper only after it has been determined that all the transmission data of the current batch has been processed successfully.
  • In one embodiment, some of the grouped data to be transmitted may not have been stored in kafkaspout's local data pool beforehand. In that case, kafkaspout needs to first read the corresponding grouped data from the preset grouping database (the Kafka database) into the local data pool, and then send it out from the local data pool.
  • Specifically, for grouped data not stored in the local resource pool, kafkaspout may first obtain the data offset and the data size of the unstored grouped data, where the data offset may indicate the starting position of the data and the data size may indicate the amount of data to be read from that starting position.
  • In practice, if kafkaspout is reading data from the preset grouping database for the first time, the data offset of the grouped data is maintained in the preset grouping database; in that case, the data offset corresponding to the unstored grouped data can be read from the preset grouping database according to a preset configuration parameter, where the preset configuration parameter may characterize the path in the Kafka database where data offsets are recorded.
  • If kafkaspout is not reading data from the preset grouping database for the first time, it means that some grouped data has previously been read from the Kafka database into the local data pool and that target grouped data in the local data pool has been provided to the initiator of a data acquisition request. In that case, the data offset after the last round of data processing has already been recorded in memory (ZooKeeper), so kafkaspout can read the data offset corresponding to the unstored grouped data directly from the specified path in memory.
  • After reading the data offset and obtaining the data size, kafkaspout can read data of that data size starting from the position characterized by the data offset in the preset grouping database. The data that has been read can be divided into multiple data heaps according to a certain data granularity, and the resulting data heaps can be stored in kafkaspout's local resource pool according to the grouped data to which they belong.
  • After storing the resulting data heaps in the local resource pool according to the grouped data to which they belong, kafkaspout can also record the data offsets of the resulting data heaps according to the grouped data to which they belong, so that the data heaps of the grouped data can subsequently be read from the local resource pool based on the recorded data offsets.
  • In this embodiment, to ensure that there is no redundancy in the grouped data in the local data pool, no further data is read from the preset grouping database until all the data read from it this time has been sent out.
  • Referring to FIG. 4, the present application further provides a data transmission apparatus, the apparatus including:
  • an instruction acquiring unit, configured to acquire a data transmission instruction, the data transmission instruction pointing to a plurality of grouped data associated with at least one transmission data;
  • a target grouped data determining unit, configured to determine, from the plurality of grouped data, target grouped data that has been stored in a local data pool, where each target grouped data includes at least one data heap;
  • a data reading unit, configured to read at least one target grouped data in the current batch, where, if at least two target grouped data are read, the sum of the heap counts of the data heaps in the at least two target grouped data is less than or equal to a specified heap-count threshold;
  • a data transmitting unit, configured to encapsulate the target grouped data read in the current batch into a data tuple, and send the data tuple to the initiator of the data transmission instruction.
  • In one embodiment, the target grouped data determining unit includes:
  • a first reading module, configured to, for the first target grouped data of the current batch, update the cumulative value of a preset heap-count statistical parameter according to the total heap count of the data heaps contained in the first target grouped data;
  • a judging and reading module, configured to, if the updated cumulative value is less than the specified heap-count threshold, read each data heap in the first target grouped data and traverse the other target grouped data in the local data pool;
  • a traversal module, configured to, for the current target grouped data in the traversal process, calculate the sum of the total heap count of its data heaps and the cumulative value of the preset heap-count statistical parameter; if the calculated result is greater than the specified heap-count threshold, continue traversing the next target grouped data; if the calculated result is less than or equal to the specified heap-count threshold, read each data heap in the current target grouped data, update the cumulative value of the preset heap-count statistical parameter according to the total heap count of the data heaps contained in the current target grouped data, and continue traversing the next target grouped data;
  • correspondingly, the data transmitting unit is configured to, after the traversal of the target grouped data in the local data pool is complete, encapsulate each target grouped data read in the current batch into a data tuple and send the data tuple to the initiator of the data transmission instruction.
  • In one embodiment, the target grouped data determining unit further includes:
  • a separate reading module, configured to, if the updated cumulative value is greater than or equal to the specified heap-count threshold, use the first target grouped data as the only target grouped data read by the current batch, and end the data reading process of the current batch.
  • In one embodiment, the apparatus further includes:
  • an unstored data information acquiring unit, configured to, for grouped data not stored in the local resource pool, obtain the data offset and the data size of the unstored grouped data;
  • a grouped data reading unit, configured to read data of the data size starting from the position characterized by the data offset in a preset grouping database, divide the data that has been read into multiple data heaps, and store the resulting data heaps in the local resource pool according to the grouped data to which they belong.
  • Referring to FIG. 5, the present application further provides a data transmission apparatus, the apparatus including a memory and a processor, the memory being used to store a computer program which, when executed by the processor, can implement the foregoing data transmission method.
  • As can be seen from the above, the technical solution provided in this application can specify in advance the grouped data associated with each transmission data, and, after a data transmission instruction is received, the multiple grouped data currently to be transmitted can be determined. The target grouped data already stored in the local data pool can then be queried first and read and transmitted in batches. To avoid memory overflow or data-processing congestion caused by reading too much data, when grouped data is read in one batch, an upper limit can be placed on the number of data heaps in the grouped data, and this upper limit can serve as the specified heap-count threshold.
  • In this way, when the current batch reads at least two target grouped data, it can be ensured that the sum of the heap counts of the data heaps in the multiple target grouped data that are read is less than or equal to the specified heap-count threshold, so that the amount of data read is not excessive.
  • In addition, if the total heap count of the data heaps in one target grouped data is greater than or equal to the specified heap-count threshold, that target grouped data can be read and sent as a batch of its own. This avoids reading too much data on the one hand, and on the other hand ensures that the data heaps of the same grouped data are read and transmitted in the same batch, avoiding data errors caused by splitting the data.
  • As can be seen from the above, by limiting the amount of data read in each batch, the technical solution provided by this application can avoid memory overflow or data-processing congestion caused by reading too much data, thereby improving the stability of the entire system in the course of processing data.
  • From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware.
  • Based on this understanding, the above technical solution, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the various embodiments or in certain parts of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a data transmission method and apparatus. The method includes: acquiring a data transmission instruction, the data transmission instruction pointing to a plurality of grouped data associated with at least one transmission data; determining, from the plurality of grouped data, target grouped data that has been stored in a local data pool, where each target grouped data includes at least one data heap; reading at least one target grouped data in the current batch, where, if at least two target grouped data are read, the sum of the heap counts of the data heaps in the at least two target grouped data is less than or equal to a specified heap-count threshold; and encapsulating the target grouped data read in the current batch into a data tuple, and sending the data tuple to the initiator of the data transmission instruction. The technical solution provided in this application can improve the efficiency of data processing.

Description

Data transmission method and apparatus
Technical Field
The present invention relates to the field of Internet technology, and in particular to a data transmission method and apparatus.
Background Art
With the continuous growth of the volume of data on the Internet, huge volumes of network data are currently usually processed on the basis of distributed systems. Storm, as a distributed real-time big-data processing system, has an enormous capability to process data and high scalability, and can maintain performance by adding resources linearly as the load increases.
In addition, Kafka is a distributed messaging system that can be used to persist logs to disk. When Kafka stores data, the data can usually be classified by data topic (topic), and each data topic can contain multiple grouped data (partitions), which store the data of one class of data topic in partitions.
At present, when building a real-time big-data processing platform, a system architecture in which Storm is integrated with Kafka can be adopted. Business data or business logs can be written into Kafka in real time, and Storm can read the data from Kafka for computation. In practical applications, business scenarios are complex and changeable, and the analysis of one business may require reading multiple types of business logs, so a Storm system usually needs to read multiple data topics in Kafka.
A business usually has peak periods, and the volume of logs generated during a peak period is relatively large, sometimes even several times the usual volume. In this case, when the Storm system reads the grouped data of multiple data topics, the surge of data, or too large an amount of data read in each batch, may cause the memory of the processing program to overflow or slow down the processing of that batch of data, which lowers the efficiency of data processing.
Summary of the Invention
The purpose of this application is to provide a data transmission method and apparatus that can improve the efficiency of data processing.
To achieve the above purpose, one aspect of this application provides a data transmission method, the method including: acquiring a data transmission instruction, the data transmission instruction pointing to a plurality of grouped data associated with at least one transmission data; determining, from the plurality of grouped data, target grouped data that has been stored in a local data pool, where each target grouped data includes at least one data heap; reading at least one target grouped data in the current batch, where, if at least two target grouped data are read, the sum of the heap counts of the data heaps in the at least two target grouped data is less than or equal to a specified heap-count threshold; and encapsulating the target grouped data read in the current batch into a data tuple, and sending the data tuple to the initiator of the data transmission instruction.
To achieve the above purpose, another aspect of this application further provides a data transmission apparatus, the apparatus including: an instruction acquiring unit, configured to acquire a data transmission instruction, the data transmission instruction pointing to a plurality of grouped data associated with at least one transmission data; a target grouped data determining unit, configured to determine, from the plurality of grouped data, target grouped data that has been stored in a local data pool, where each target grouped data includes at least one data heap; a data reading unit, configured to read at least one target grouped data in the current batch, where, if at least two target grouped data are read, the sum of the heap counts of the data heaps in the at least two target grouped data is less than or equal to a specified heap-count threshold; and a data transmitting unit, configured to encapsulate the target grouped data read in the current batch into a data tuple, and send the data tuple to the initiator of the data transmission instruction.
To achieve the above purpose, another aspect of this application further provides a data transmission apparatus, the apparatus including a memory and a processor, the memory being used to store a computer program which, when executed by the processor, implements the above method.
As can be seen from the above, the technical solution provided in this application can specify in advance the grouped data associated with each transmission data, and, after a data transmission instruction is received, the multiple grouped data currently to be transmitted can be determined. Then, the target grouped data already stored in the local data pool can first be queried, and the target grouped data already stored in the local data pool can be read and transmitted in batches. To avoid memory overflow or data-processing congestion caused by reading too much data, when grouped data is read in one batch, an upper limit can be placed on the number of data heaps in the grouped data, and this upper limit can serve as the specified heap-count threshold. In this way, when the current batch reads at least two target grouped data, it can be ensured that the sum of the heap counts of the data heaps in the multiple target grouped data that are read is less than or equal to the specified heap-count threshold, so that the amount of data read is not excessive. In addition, if the sum of the heap counts of the data heaps in one target grouped data is greater than or equal to the specified heap-count threshold, that target grouped data can be read and sent as a separate batch. In this way, on the one hand, reading too much data is avoided; on the other hand, it is also ensured that the data heaps in the same data group can all be read and transmitted in the same batch, avoiding data errors caused by splitting the data. As can be seen from the above, by limiting the amount of data read in each batch, the technical solution provided in this application can avoid memory overflow or data-processing congestion caused by reading too much data, thereby improving the efficiency of data processing.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of a system architecture in an embodiment of the present invention;
FIG. 2 is a diagram of the steps of a data transmission method in an embodiment of the present invention;
FIG. 3 is a schematic diagram of traversing grouped data in an embodiment of the present invention;
FIG. 4 is a schematic diagram of the functional modules of a data transmission apparatus in an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a data transmission apparatus in an embodiment of the present invention.
Detailed Description of the Embodiments
To make the purpose, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Embodiment 1
This application provides a data transmission method, which can be applied to the system architecture shown in FIG. 1. The system architecture may include a preset grouping database, a data transmission apparatus, and multiple data processing nodes. The data transmission apparatus can read grouped data from the preset grouping database and store the grouped data in a local data pool; then, according to a received data transmission instruction, the data transmission apparatus can read the corresponding data out of the local data pool, have it processed by the data processing nodes, and transmit it to the initiator of the data transmission instruction. In practical applications, the preset grouping database can be a Kafka database, the data transmission apparatus can be a kafkaspout, and each data processing node can be a bolt.
Specifically, referring to FIG. 2, the data transmission method provided in this application may include the following steps.
S1: Acquire a data transmission instruction, where the data transmission instruction points to a plurality of grouped data associated with at least one transmission data.
In this embodiment, in the Storm system, each transmission data can be responsible for reading and transmitting data. When kafkaspout is initialized, it can assign the corresponding grouped data (partitions) to each transmission data in advance; afterwards, each transmission data can perform the data reading and transmission process according to the grouped data assigned to it. Specifically, grouped data is divided according to data topic (topic). When kafkaspout is initialized, it can read the grouping information of each data topic, where the data topic may include multiple grouped data, and the grouping information may include the identifier of the data topic and the identifiers of the grouped data in the data topic. For example, if there are currently two data topics whose identifiers are 0 and 1, and each data topic contains 10 grouped data, then the identifiers of the 10 grouped data in the first data topic can be the values 0 to 9, and the identifiers of the 10 grouped data in the second data topic can be the values 10 to 19. In this way, the grouping information of the first data topic can be value combinations such as 0:0, 0:1, and 0:5, where the first value represents the identifier of the data topic and the second value represents the identifier of the grouped data.
In this embodiment, after the grouping information of each data topic has been read, the grouped data in each data topic can be sorted by the identifier of the data topic and the identifier of the grouped data. In the sorting result, the data is ordered first by the identifier of the data topic; within the same data topic, it can be ordered by the size of the grouped data identifier. Each current transmission data can also have its own index identifier, which can be an incremental value starting from 0. For example, the index identifier of the first transmission data is 0, and the index identifier of the second transmission data is 1. Then, according to the index identifier, designated grouped data can be selected from the sorted grouped data, and the designated grouped data can be assigned to the current transmission data. For example, there are currently two data topics with identifiers 0 and 1, each containing 10 grouped data; the identifiers of the grouped data of the first data topic are 0 to 9, and those of the second data topic are 10 to 19. Assuming there are currently 10 transmission data, each transmission data should be assigned 2 grouped data. Then, for the first transmission data with index identifier 0, the grouped data with identifiers 0 and 10 can be assigned to it; correspondingly, the grouped data with identifiers 1 and 11 can be assigned to the second transmission data with index identifier 1, and so on, completing the assignment of grouped data. In this way, different transmission data can be associated with different grouped data.
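The following is a minimal, hypothetical Java sketch of the assignment scheme described in the paragraph above. The names (PartitionAssigner, TopicPartitionId, assign) are illustrative assumptions only and are not taken from the patent or from the Storm or Kafka APIs; the sketch simply sorts grouped data by topic identifier and grouped-data identifier and gives the j-th sorted entry to the transmission data whose index identifier is j modulo the number of transmission data.

```java
import java.util.*;

// Hypothetical sketch of the grouped-data (partition) assignment described above.
public class PartitionAssigner {

    // Grouping information entry: identifier of the data topic plus identifier of the grouped data.
    record TopicPartitionId(int topicId, int partitionId) {}

    // Assigns sorted grouped data to transmission data by index: entry j goes to index j % numTasks.
    static Map<Integer, List<TopicPartitionId>> assign(List<TopicPartitionId> groupingInfo, int numTasks) {
        List<TopicPartitionId> sorted = new ArrayList<>(groupingInfo);
        // Order first by the topic identifier, then by the grouped-data identifier.
        sorted.sort(Comparator.comparingInt(TopicPartitionId::topicId)
                              .thenComparingInt(TopicPartitionId::partitionId));
        Map<Integer, List<TopicPartitionId>> assignment = new HashMap<>();
        for (int j = 0; j < sorted.size(); j++) {
            assignment.computeIfAbsent(j % numTasks, k -> new ArrayList<>()).add(sorted.get(j));
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two data topics (0 and 1), each with 10 grouped data, as in the example above.
        List<TopicPartitionId> info = new ArrayList<>();
        for (int p = 0; p < 10; p++) info.add(new TopicPartitionId(0, p));
        for (int p = 10; p < 20; p++) info.add(new TopicPartitionId(1, p));
        // 10 transmission data: index 0 receives grouped data 0 and 10, index 1 receives 1 and 11, and so on.
        System.out.println(assign(info, 10));
    }
}
```

With the example of the paragraph above, this sketch reproduces the stated result: the transmission data with index identifier 0 is assigned the grouped data with identifiers 0 and 10, and the transmission data with index identifier 1 is assigned those with identifiers 1 and 11.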
In this embodiment, when a transmission data executes, kafkaspout receives a data transmission instruction, and the data transmission instruction may carry the identifier of each grouped data to be transmitted and the identifier of the data topic to which each grouped data belongs. In this way, by analyzing the data transmission instruction, kafkaspout can determine each grouped data it points to.
S3: Determine, from the plurality of grouped data, target grouped data that has been stored in a local data pool, where each target grouped data includes at least one data heap.
In this embodiment, kafkaspout may have a local data pool, and the local data pool may store data read from the Kafka database. For the plurality of grouped data to be transmitted, the local data pool can first be queried as to whether all or some of the grouped data is available, so that the target grouped data already stored in the local data pool can be determined from the plurality of grouped data. In the target grouped data, the data can be divided in the form of data heaps (batches), and the same grouped data cannot be divided into different data heaps. Therefore, each target grouped data may include at least one data heap.
S5: Read at least one target grouped data in the current batch, where, if at least two target grouped data are read, the sum of the heap counts of the data heaps in the at least two target grouped data is less than or equal to a specified heap-count threshold.
In this embodiment, the target grouped data can be read from the local data pool in batches. To ensure that the amount of data read in one batch is not too large, the number of data heaps that one batch can read can be limited in advance. Specifically, a specified heap-count threshold can be set in advance, and the specified heap-count threshold can serve as the upper limit on the number of data heaps read in the same batch. Referring to FIG. 3, taking the current batch as an example, an attempt can be made to read the target grouped data one by one. For the first target grouped data of the current batch, kafkaspout can identify the total heap count of the data heaps contained in the first target grouped data, and then update the cumulative value of a preset heap-count statistical parameter according to the identified total heap count. The preset heap-count statistical parameter can be used to count the total number of data heaps that have been read in the current batch; the initial value of this parameter is 0, and the parameter applies only to the current batch; when the current batch ends, the parameter is reset to 0. For the first target grouped data, the corresponding total heap count can be written directly into this parameter. Then the magnitude relationship between the updated cumulative value and the specified heap-count threshold can be judged. If the updated cumulative value is less than the specified heap-count threshold, it means the number of data heaps read by the current batch has not reached the upper limit, so the current batch can continue to try to read more data heaps. At this point, each data heap in the first target grouped data can be read, and the traversal of the other target grouped data in the local data pool can begin. If, however, the updated cumulative value is greater than or equal to the specified heap-count threshold, it means the current batch can no longer read more data heaps. In that case, the first target grouped data can be used as the only target grouped data read by the current batch, and the data reading process of the current batch ends.
As shown in FIG. 3, when the updated cumulative value is less than the specified heap-count threshold, the next target grouped data can be traversed, and that next target grouped data can be taken as the current target grouped data. For the current target grouped data, the sum of the total heap count of the data heaps it contains and the cumulative value of the preset heap-count statistical parameter can be calculated. At this point, what is recorded in the preset heap-count statistical parameter is the total heap count of the data heaps in the first target grouped data, so the calculated result is the total heap count of the data heaps in the first target grouped data and the current grouped data. It should be noted that only the sum of the two values is calculated at this point; the total heap count of the data heaps contained in the current target grouped data is not updated into the preset heap-count statistical parameter. If the calculated result is greater than the specified heap-count threshold, it means that reading all the data heaps in the current target grouped data would cause too much data to be read, and, to ensure that the data heaps in the same target grouped data can all be read in the same batch, the current target grouped data can be skipped and the next target grouped data traversed. Likewise, if the sum of the total heap count of the data heaps contained in the next target grouped data and the total heap count of the first target grouped data is still greater than the specified heap-count threshold, that target grouped data continues to be skipped, until the calculated result is less than or equal to the specified heap-count threshold.
In this embodiment, for the current target grouped data, if the calculated result is less than the specified heap-count threshold, it means that even if all the data heaps in the current target grouped data are read, the limited amount of data will not be exceeded. In that case, after the first target grouped data has been read, each data heap in the current target grouped data can continue to be read, and the cumulative value of the preset heap-count statistical parameter is updated according to the total heap count of the data heaps contained in the current target grouped data. Then, the next target grouped data can continue to be traversed according to the above process until the traversal is complete.
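Below is a minimal, hypothetical Java sketch of the per-batch selection loop of step S5. The names (BatchSelector, TargetGroup, selectBatch, heapCountThreshold) are illustrative assumptions rather than APIs from the patent, Storm, or Kafka; the sketch only shows how the cumulative heap count and the specified heap-count threshold decide which target grouped data are read in the current batch, with a target grouped data either read in full or skipped.

```java
import java.util.*;

// Hypothetical sketch of the per-batch selection of target grouped data described above.
public class BatchSelector {

    // A target grouped data already present in the local data pool, with its total heap count.
    record TargetGroup(int topicId, int partitionId, int heapCount) {}

    // Returns the target grouped data whose data heaps are read in the current batch.
    static List<TargetGroup> selectBatch(List<TargetGroup> targets, int heapCountThreshold) {
        List<TargetGroup> selected = new ArrayList<>();
        if (targets.isEmpty()) return selected;

        // First target grouped data: write its heap count into the statistical parameter.
        int cumulative = targets.get(0).heapCount();
        selected.add(targets.get(0));
        // If the threshold is already reached, it is the only target grouped data of this batch.
        if (cumulative >= heapCountThreshold) return selected;

        // Traverse the remaining target grouped data.
        for (TargetGroup current : targets.subList(1, targets.size())) {
            int candidate = cumulative + current.heapCount();
            if (candidate > heapCountThreshold) {
                continue;            // skip: all of its heaps must stay in one batch
            }
            selected.add(current);   // read every data heap of this target grouped data
            cumulative = candidate;  // update the preset heap-count statistical parameter
        }
        return selected;
    }

    public static void main(String[] args) {
        List<TargetGroup> targets = List.of(
                new TargetGroup(0, 0, 3), new TargetGroup(0, 1, 4),
                new TargetGroup(1, 10, 2), new TargetGroup(1, 11, 1));
        // With a threshold of 6, the batch reads (0,0), (1,10) and (1,11); (0,1) waits for a later batch.
        System.out.println(selectBatch(targets, 6));
    }
}
```

The design point illustrated here is that the threshold bounds the number of heaps per batch while never splitting one target grouped data across batches: a candidate whose heaps would push the running total past the threshold is skipped whole, not truncated.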
S7: Encapsulate the target grouped data read in the current batch into a data tuple, and send the data tuple to the initiator of the data transmission instruction.
In this embodiment, after the traversal of the target grouped data in the local data pool is complete, each data heap read in the current batch can be encapsulated into a data tuple (tuple), and the data tuple is sent to the initiator of the data transmission instruction. After the data reading and transmission process of the current batch is complete, the data reading and transmission process of the next batch can be performed according to the same scheme.
In one embodiment, after the data tuple is sent to the initiator of the data transmission instruction, the grouping information corresponding to each target grouped data sent in the current batch, and the data offset (offset) associated with the grouping information, can also be recorded in memory. The grouping information corresponding to a target grouped data may include the identifier of the data topic to which the target grouped data belongs and the identifier of the target grouped data. In addition, the above data offset can point to the end position of the data that has already been read; in this way, when the next batch reads grouped data from the local resource pool, it can start reading data directly from the position characterized by the data offset, so that no data is missed and no data is read repeatedly. In practical applications, the Storm system can include a central management system, ZooKeeper, and the grouping information corresponding to each target grouped data and the data offset associated with the grouping information can be written into ZooKeeper; the data in ZooKeeper can be accessed by the data reading process of the next batch, so that the corresponding data offset is known. It should be noted that, in practical applications, the corresponding information is not written into ZooKeeper after each target grouped data is sent; rather, the grouping information corresponding to each target grouped data and the data offset associated with the grouping information are written into ZooKeeper only after it has been determined that all the transmission data of the current batch has been processed successfully.
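As an illustration of the bookkeeping described above, here is a small, hypothetical Java sketch in which a plain in-memory map stands in for the memory (ZooKeeper) record; the names (OffsetTracker, GroupingInfo, commitBatch, nextReadPosition) are assumptions for this sketch only, and no real ZooKeeper client API is shown. Offsets are committed only once the whole batch is known to have been processed successfully, and the next batch resumes from the committed position.

```java
import java.util.*;

// Hypothetical sketch: recording, per grouping information, the offset of the data already read.
public class OffsetTracker {

    // Grouping information: identifier of the data topic and identifier of the grouped data.
    record GroupingInfo(int topicId, int partitionId) {}

    // Stands in for the memory (ZooKeeper) record of data offsets, keyed by grouping information.
    private final Map<GroupingInfo, Long> committedOffsets = new HashMap<>();

    // Called only after the whole batch of transmission data has been processed successfully.
    void commitBatch(Map<GroupingInfo, Long> endPositionsOfBatch) {
        committedOffsets.putAll(endPositionsOfBatch);
    }

    // The next batch starts reading from the recorded position, so nothing is missed or re-read.
    long nextReadPosition(GroupingInfo group) {
        return committedOffsets.getOrDefault(group, 0L);
    }

    public static void main(String[] args) {
        OffsetTracker tracker = new OffsetTracker();
        GroupingInfo g = new GroupingInfo(0, 1);
        // The current batch read grouped data (0, 1) up to offset 128 and was fully processed.
        tracker.commitBatch(Map.of(g, 128L));
        System.out.println(tracker.nextReadPosition(g)); // 128
    }
}
```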
In one embodiment, some of the grouped data to be transmitted may not have been stored in kafkaspout's local data pool beforehand; in that case, kafkaspout needs to first read the corresponding grouped data from the preset grouping database, the Kafka database, into the local data pool, and then send it out from the local data pool. Specifically, for grouped data not stored in the local resource pool, kafkaspout can first obtain the data offset and the data size of the unstored grouped data, where the data offset can indicate the starting position of the data and the data size can indicate the amount of data to be read from that starting position. In practical applications, if kafkaspout is reading data from the preset grouping database for the first time, the data offset of the grouped data is maintained in the preset grouping database; in that case, the data offset corresponding to the unstored grouped data can be read from the preset grouping database according to a preset configuration parameter, where the preset configuration parameter can characterize the path in the Kafka database where data offsets are recorded.
In addition, if kafkaspout is not reading data from the preset grouping database for the first time, it means that some grouped data has previously been read from the Kafka database into the local data pool and that the target grouped data in the local data pool has been provided to the initiator of a data acquisition request. In that case, the data offset after the last round of data processing has already been recorded in memory (ZooKeeper), so kafkaspout can read the data offset corresponding to the unstored grouped data directly from the specified path in memory. After reading the data offset and obtaining the data size, kafkaspout can read data of the data size starting from the position characterized by the data offset in the preset grouping database. The data that has been read can be divided into multiple data heaps according to a certain data granularity, and finally the resulting data heaps can be stored in kafkaspout's local resource pool according to the grouped data to which they belong. After storing the resulting data heaps in the local resource pool according to the grouped data to which they belong, kafkaspout can also record the data offsets of the resulting data heaps according to the grouped data to which they belong, so that the data heaps in the grouped data can subsequently be read from the local resource pool based on the recorded data offsets.
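The following hypothetical Java sketch illustrates only the chunking step described above: data read from a given data offset for one grouped data is split into data heaps of a fixed granularity, and the starting offset of each resulting heap is recorded. The names (HeapSplitter, DataHeap, splitIntoHeaps, heapGranularity) are assumptions for illustration, and no actual Kafka consumer API is used.

```java
import java.util.*;

// Hypothetical sketch: dividing data read from the preset grouping database into data heaps.
public class HeapSplitter {

    // A data heap: the grouped data it belongs to, its starting data offset, and its payload.
    record DataHeap(int partitionId, long startOffset, byte[] payload) {}

    // Splits 'data', read starting at 'dataOffset' for one grouped data, into heaps of a fixed granularity.
    static List<DataHeap> splitIntoHeaps(int partitionId, long dataOffset, byte[] data, int heapGranularity) {
        List<DataHeap> heaps = new ArrayList<>();
        for (int pos = 0; pos < data.length; pos += heapGranularity) {
            int end = Math.min(pos + heapGranularity, data.length);
            // Record the data offset of each resulting heap so it can later be read from the local resource pool.
            heaps.add(new DataHeap(partitionId, dataOffset + pos, Arrays.copyOfRange(data, pos, end)));
        }
        return heaps;
    }

    public static void main(String[] args) {
        byte[] data = new byte[2500];                    // data of the obtained data size
        List<DataHeap> heaps = splitIntoHeaps(7, 1000L, data, 1024);
        // 3 heaps starting at offsets 1000, 2024 and 3048, all belonging to grouped data 7.
        heaps.forEach(h -> System.out.println(h.partitionId() + " @ " + h.startOffset()));
    }
}
```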
In this embodiment, to ensure that there is no redundancy in the grouped data in the local data pool, no more data may be read from the preset grouping database until all of the data read from the preset grouping database this time has been sent out.
Embodiment 2
Referring to FIG. 4, this application further provides a data transmission apparatus, the apparatus including:
an instruction acquiring unit, configured to acquire a data transmission instruction, the data transmission instruction pointing to a plurality of grouped data associated with at least one transmission data;
a target grouped data determining unit, configured to determine, from the plurality of grouped data, target grouped data that has been stored in a local data pool, where each target grouped data includes at least one data heap;
a data reading unit, configured to read at least one target grouped data in the current batch, where, if at least two target grouped data are read, the sum of the heap counts of the data heaps in the at least two target grouped data is less than or equal to a specified heap-count threshold;
a data transmitting unit, configured to encapsulate the target grouped data read in the current batch into a data tuple, and send the data tuple to the initiator of the data transmission instruction.
In one embodiment, the target grouped data determining unit includes:
a first reading module, configured to, for the first target grouped data of the current batch, update the cumulative value of a preset heap-count statistical parameter according to the total heap count of the data heaps contained in the first target grouped data;
a judging and reading module, configured to, if the updated cumulative value is less than the specified heap-count threshold, read each data heap in the first target grouped data and traverse the other target grouped data in the local data pool;
a traversal module, configured to, for the current target grouped data in the traversal process, calculate the sum of the total heap count of the data heaps contained in the current target grouped data and the cumulative value of the preset heap-count statistical parameter; if the calculated result is greater than the specified heap-count threshold, continue traversing the next target grouped data; if the calculated result is less than or equal to the specified heap-count threshold, read each data heap in the current target grouped data, update the cumulative value of the preset heap-count statistical parameter according to the total heap count of the data heaps contained in the current target grouped data, and continue traversing the next target grouped data;
correspondingly, the data transmitting unit is configured to, after the traversal of the target grouped data in the local data pool is complete, encapsulate each target grouped data read in the current batch into a data tuple, and send the data tuple to the initiator of the data transmission instruction.
In one embodiment, the target grouped data determining unit further includes:
a separate reading module, configured to, if the updated cumulative value is greater than or equal to the specified heap-count threshold, use the first target grouped data as the only target grouped data read by the current batch, and end the data reading process of the current batch.
In one embodiment, the apparatus further includes:
an unstored data information acquiring unit, configured to, for grouped data not stored in the local resource pool, obtain the data offset and the data size of the unstored grouped data;
a grouped data reading unit, configured to read data of the data size starting from the position characterized by the data offset in a preset grouping database, divide the data that has been read into multiple data heaps, and then store the resulting data heaps in the local resource pool according to the grouped data to which they belong.
Referring to FIG. 5, this application further provides a data transmission apparatus, the apparatus including a memory and a processor, the memory being used to store a computer program which, when executed by the processor, can implement the above data transmission method.
As can be seen from the above, the technical solution provided in this application can specify in advance the grouped data associated with each transmission data, and, after a data transmission instruction is received, the multiple grouped data currently to be transmitted can be determined. Then, the target grouped data already stored in the local data pool can first be queried, and the target grouped data already stored in the local data pool can be read and transmitted in batches. To avoid memory overflow or data-processing congestion caused by reading too much data, when grouped data is read in one batch, an upper limit can be placed on the number of data heaps in the grouped data, and this upper limit can serve as the specified heap-count threshold. In this way, when the current batch reads at least two target grouped data, it can be ensured that the sum of the heap counts of the data heaps in the multiple target grouped data that are read is less than or equal to the specified heap-count threshold, so that the amount of data read is not excessive. In addition, if the sum of the heap counts of the data heaps in one target grouped data is greater than or equal to the specified heap-count threshold, that target grouped data can be read and sent as a separate batch. In this way, on the one hand, reading too much data is avoided; on the other hand, it is also ensured that the data heaps in the same data group can all be read and transmitted in the same batch, avoiding data errors caused by splitting the data. As can be seen from the above, by limiting the amount of data read in each batch, the technical solution provided in this application can avoid memory overflow or data-processing congestion caused by reading too much data, thereby improving the stability of the entire system in the course of processing data.
From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware. Based on this understanding, the above technical solution, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the various embodiments or in certain parts of the embodiments.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (14)

  1. A data transmission method, characterized in that the method comprises:
    acquiring a data transmission instruction, the data transmission instruction pointing to a plurality of grouped data associated with at least one transmission data;
    determining, from the plurality of grouped data, target grouped data that has been stored in a local data pool, wherein each of the target grouped data comprises at least one data heap;
    reading at least one of the target grouped data in a current batch, wherein, if at least two target grouped data are read, a sum of heap counts of the data heaps in the at least two target grouped data is less than or equal to a specified heap-count threshold;
    encapsulating the target grouped data read in the current batch into a data tuple, and sending the data tuple to an initiator of the data transmission instruction.
  2. The method according to claim 1, characterized in that the grouped data and the transmission data are associated in the following manner:
    during initialization, reading grouping information of each data topic, wherein the data topic comprises a plurality of grouped data, and the grouping information comprises an identifier of the data topic and identifiers of the grouped data in the data topic;
    sorting the grouped data in each of the data topics according to the identifier of the data topic and the identifiers of the grouped data;
    acquiring an index identifier of current transmission data, selecting designated grouped data from the sorted grouped data according to the index identifier, and assigning the designated grouped data to the current transmission data.
  3. The method according to claim 1, characterized in that reading at least one of the target grouped data in the current batch comprises:
    for a first target grouped data of the current batch, updating a cumulative value of a preset heap-count statistical parameter according to a total heap count of the data heaps contained in the first target grouped data;
    if the updated cumulative value is less than the specified heap-count threshold, reading each data heap in the first target grouped data, and traversing the other target grouped data in the local data pool;
    for a current target grouped data in the traversal process, calculating a sum of a total heap count of the data heaps contained in the current target grouped data and the cumulative value of the preset heap-count statistical parameter; if the calculated result is greater than the specified heap-count threshold, continuing to traverse a next target grouped data; if the calculated result is less than or equal to the specified heap-count threshold, reading each data heap in the current target grouped data, updating the cumulative value of the preset heap-count statistical parameter according to the total heap count of the data heaps contained in the current target grouped data, and continuing to traverse the next target grouped data;
    correspondingly, after the traversal of the target grouped data in the local data pool is complete, encapsulating each target grouped data read in the current batch into a data tuple, and sending the data tuple to the initiator of the data transmission instruction.
  4. The method according to claim 3, characterized in that the method further comprises:
    if the updated cumulative value is greater than or equal to the specified heap-count threshold, using the first target grouped data as the only target grouped data read by the current batch, and ending the data reading process of the current batch.
  5. The method according to claim 1, characterized in that, after sending the data tuple to the initiator of the data transmission instruction, the method further comprises:
    recording, in memory, grouping information corresponding to each target grouped data sent in the current batch and a data offset associated with the grouping information; wherein the grouping information corresponding to a target grouped data comprises an identifier of a data topic to which the target grouped data belongs and an identifier of the target grouped data, and, when a next batch reads grouped data from the local resource pool, data is read starting from a position characterized by the data offset.
  6. The method according to claim 1, characterized in that the method further comprises:
    for grouped data not stored in the local resource pool, obtaining a data offset and a data size of the unstored grouped data;
    reading data of the data size starting from a position characterized by the data offset in a preset grouping database, dividing the data that has been read into a plurality of data heaps, and then storing the resulting data heaps in the local resource pool according to the grouped data to which they belong.
  7. The method according to claim 6, characterized in that obtaining the data offset of the unstored grouped data comprises:
    if data is being read from the preset grouping database for the first time, reading the data offset corresponding to the unstored grouped data from the preset grouping database according to a preset configuration parameter;
    if data is not being read from the preset grouping database for the first time, reading the data offset corresponding to the unstored grouped data from a specified path in memory.
  8. The method according to claim 6, characterized in that, after storing the resulting data heaps in the local resource pool according to the grouped data to which they belong, the method further comprises:
    recording the data offsets of the resulting data heaps according to the grouped data to which they belong, so that each data heap in the grouped data is read from the local resource pool based on the recorded data offsets.
  9. The method according to claim 6, characterized in that, after storing the resulting data heaps in the local resource pool according to the grouped data to which they belong, the method further comprises:
    refraining from reading data from the preset grouping database until all of the data read from the preset grouping database this time has been sent out.
  10. A data transmission apparatus, characterized in that the apparatus comprises:
    an instruction acquiring unit, configured to acquire a data transmission instruction, the data transmission instruction pointing to a plurality of grouped data associated with at least one transmission data;
    a target grouped data determining unit, configured to determine, from the plurality of grouped data, target grouped data that has been stored in a local data pool, wherein each of the target grouped data comprises at least one data heap;
    a data reading unit, configured to read at least one of the target grouped data in a current batch, wherein, if at least two target grouped data are read, a sum of heap counts of the data heaps in the at least two target grouped data is less than or equal to a specified heap-count threshold;
    a data transmitting unit, configured to encapsulate the target grouped data read in the current batch into a data tuple, and send the data tuple to an initiator of the data transmission instruction.
  11. The apparatus according to claim 10, characterized in that the target grouped data determining unit comprises:
    a first reading module, configured to, for a first target grouped data of the current batch, update a cumulative value of a preset heap-count statistical parameter according to a total heap count of the data heaps contained in the first target grouped data;
    a judging and reading module, configured to, if the updated cumulative value is less than the specified heap-count threshold, read each data heap in the first target grouped data, and traverse the other target grouped data in the local data pool;
    a traversal module, configured to, for a current target grouped data in the traversal process, calculate a sum of a total heap count of the data heaps contained in the current target grouped data and the cumulative value of the preset heap-count statistical parameter; if the calculated result is greater than the specified heap-count threshold, continue to traverse a next target grouped data; if the calculated result is less than or equal to the specified heap-count threshold, read each data heap in the current target grouped data, update the cumulative value of the preset heap-count statistical parameter according to the total heap count of the data heaps contained in the current target grouped data, and continue to traverse the next target grouped data;
    correspondingly, the data transmitting unit is configured to, after the traversal of the target grouped data in the local data pool is complete, encapsulate each target grouped data read in the current batch into a data tuple, and send the data tuple to the initiator of the data transmission instruction.
  12. The apparatus according to claim 11, characterized in that the target grouped data determining unit further comprises:
    a separate reading module, configured to, if the updated cumulative value is greater than or equal to the specified heap-count threshold, use the first target grouped data as the only target grouped data read by the current batch, and end the data reading process of the current batch.
  13. The apparatus according to claim 10, characterized in that the apparatus further comprises:
    an unstored data information acquiring unit, configured to, for grouped data not stored in the local resource pool, obtain a data offset and a data size of the unstored grouped data;
    a grouped data reading unit, configured to read data of the data size starting from a position characterized by the data offset in a preset grouping database, divide the data that has been read into a plurality of data heaps, and then store the resulting data heaps in the local resource pool according to the grouped data to which they belong.
  14. A data transmission apparatus, characterized in that the apparatus comprises a memory and a processor, the memory being used to store a computer program which, when executed by the processor, implements the method according to any one of claims 1 to 9.
PCT/CN2018/108238 2018-08-10 2018-09-28 一种数据发射方法及装置 WO2020029405A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/755,852 US20200293543A1 (en) 2018-08-10 2018-09-28 Method and apparatus for transmitting data
EP18929307.9A EP3835975A4 (en) 2018-08-10 2018-09-28 DATA TRANSMISSION PROCESS AND DEVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810913710.1A CN110825533B (zh) 2018-08-10 2018-08-10 一种数据发射方法及装置
CN201810913710.1 2018-08-10

Publications (1)

Publication Number Publication Date
WO2020029405A1 true WO2020029405A1 (zh) 2020-02-13

Family

ID=69415371

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/108238 WO2020029405A1 (zh) 2018-08-10 2018-09-28 一种数据发射方法及装置

Country Status (4)

Country Link
US (1) US20200293543A1 (zh)
EP (1) EP3835975A4 (zh)
CN (1) CN110825533B (zh)
WO (1) WO2020029405A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113314229A (zh) * 2021-05-26 2021-08-27 北京京东拓先科技有限公司 一种数据处理方法、装置、电子设备和存储介质

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220198028A1 (en) * 2020-12-17 2022-06-23 The Toronto-Dominion Bank Secure resolution of email-based queries involving confidential third-party data
CN113452667A (zh) * 2021-03-05 2021-09-28 浙江华云信息科技有限公司 一种适用于多种协议类型的边缘物联终端接入方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103927305A (zh) * 2013-01-11 2014-07-16 中国移动通信集团山东有限公司 一种对内存溢出进行控制的方法和设备
CN104932941A (zh) * 2012-11-05 2015-09-23 北京奇虎科技有限公司 一种分布式消息处理系统及其中的设备和方法
US20180039513A1 (en) * 2016-08-02 2018-02-08 Salesforce.Com, Inc. Techniques and architectures for non-blocking parallel batching
CN108365971A (zh) * 2018-01-10 2018-08-03 深圳市金立通信设备有限公司 日志解析方法、设备及计算机可读介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106210032A (zh) * 2016-07-06 2016-12-07 乐视控股(北京)有限公司 基于终端数据批量上报的方法及装置
CN106302385B (zh) * 2016-07-26 2019-11-15 努比亚技术有限公司 一种消息分发装置及方法
CN107943802A (zh) * 2016-10-12 2018-04-20 北京京东尚科信息技术有限公司 一种日志分析方法和系统
CN108255628A (zh) * 2016-12-29 2018-07-06 北京国双科技有限公司 一种数据处理方法及装置
CN106648904B (zh) * 2017-01-09 2020-06-12 大连理工大学 一种流式数据处理自适应速率控制方法
CN107509119B (zh) * 2017-07-11 2020-02-21 北京潘达互娱科技有限公司 一种监控报警方法与装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104932941A (zh) * 2012-11-05 2015-09-23 北京奇虎科技有限公司 一种分布式消息处理系统及其中的设备和方法
CN103927305A (zh) * 2013-01-11 2014-07-16 中国移动通信集团山东有限公司 一种对内存溢出进行控制的方法和设备
US20180039513A1 (en) * 2016-08-02 2018-02-08 Salesforce.Com, Inc. Techniques and architectures for non-blocking parallel batching
CN108365971A (zh) * 2018-01-10 2018-08-03 深圳市金立通信设备有限公司 日志解析方法、设备及计算机可读介质

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113314229A (zh) * 2021-05-26 2021-08-27 北京京东拓先科技有限公司 一种数据处理方法、装置、电子设备和存储介质

Also Published As

Publication number Publication date
CN110825533A (zh) 2020-02-21
EP3835975A4 (en) 2021-09-22
US20200293543A1 (en) 2020-09-17
EP3835975A1 (en) 2021-06-16
CN110825533B (zh) 2022-12-20

Similar Documents

Publication Publication Date Title
KR102337092B1 (ko) 트래픽 측정 방법, 장치, 및 시스템
WO2020029405A1 (zh) 一种数据发射方法及装置
CN109117275B (zh) 基于数据分片的对账方法、装置、计算机设备及存储介质
CN110413650B (zh) 一种业务数据的处理方法、装置、设备和存储介质
US20200351207A1 (en) Method and system of limiting traffic
CN110740054B (zh) 一种基于强化学习的数据中心虚拟化网络故障诊断方法
CN106027595A (zh) 用于cdn节点的访问日志处理方法及系统
CN106921665B (zh) 一种报文处理方法及网络设备
CN112016030B (zh) 消息推送的方法、装置、服务器和计算机存储介质
CN111181800B (zh) 测试数据处理方法、装置、电子设备及存储介质
CN111966289A (zh) 基于Kafka集群的分区优化方法和系统
US11677769B2 (en) Counting SYN packets
CN111159002A (zh) 一种基于分组的数据边缘采集方法、边缘采集设备及系统
CN114244752A (zh) 流量统计方法、装置和设备
CN114567674B (zh) 一种数据处理方法、装置、计算机设备以及可读存储介质
CN107819697B (zh) 数据传输方法、交换机及数据中心
CN114064312A (zh) 一种数据处理系统及模型训练方法
CN109802868A (zh) 一种基于云计算的移动应用实时识别方法
EP4174675A1 (en) On-board data storage method and system
CN112180757A (zh) 一种智能家居系统及其策略管理方法
CN108829735B (zh) 并行执行计划的同步方法、装置、服务器及存储介质
EP4354297A1 (en) Data integrity processing method and apparatus, and electronic device
CN114070755B (zh) 虚拟机网络流量确定方法、装置、电子设备和存储介质
CN116132698A (zh) 内容录制方法、播放方法、cdn系统、存储介质
CN114579303A (zh) 一种工业互联网的业务数据处理方法、设备及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18929307

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018929307

Country of ref document: EP

Effective date: 20210308