CN113438274A - Data transmission method and device, computer equipment and readable storage medium - Google Patents


Info

Publication number: CN113438274A
Authority: CN (China)
Prior art keywords: data, search engine, distributed search, engine server, target
Legal status: Pending
Application number: CN202110577396.6A
Other languages: Chinese (zh)
Inventors: 韩大炜, 刘立, 李开科, 孙浩
Current assignee: Dawning Network Technology Co ltd
Original assignee: Dawning Network Technology Co ltd
Application filed by Dawning Network Technology Co ltd

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/10 Protocols in which an application is distributed across nodes in the network
    • H04L 67/14 Session management
    • H04L 67/50 Network services
    • H04L 67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L 69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L 69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L 69/22 Parsing or analysis of headers

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a data transmission method and apparatus, a computer device, and a readable storage medium. The method includes: receiving N data packets through N data transmission connections, and performing convergence processing on the original data in the N data packets to obtain target data, the original data being data to be written into a distributed search engine server; performing data segmentation processing on the target data according to the maximum data processing capacity of the distributed search engine server to obtain M data units, where M is less than N; and sending the M data units through data transmission connections with the distributed search engine server. This can solve the problem of reduced data collection efficiency caused by TCP high-concurrency or TCP burst phenomena when the ES server collects data.

Description

Data transmission method and device, computer equipment and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data transmission method and apparatus, a computer device, and a readable storage medium.
Background
At present, common search engines such as Google generally adopt centralized servers; to realize decentralized engine services, distributed search engines have gradually been derived. Elasticsearch (ES) is a distributed search and analysis engine developed by Elastic, which can provide data search and analysis functions based on distributed data.
In the prior art, the ES server may collect data from different data sources through Transmission Control Protocol (TCP) connections and provide engine services based on the collected data. However, the data arriving at a data source is unpredictable, so two TCP phenomena may occur: high concurrency and burst. For example, after one data source receives data from another data source, it may initiate a TCP connection request to the ES server for that data, in order to forward the data to the ES server over a TCP connection. As data sources and data volume increase, TCP connection requests become highly concurrent and exceed the number of TCP connections supported by the ES server, so that some TCP connection requests receive no response and part of the data fails to be transmitted. Alternatively, in the case of a TCP burst, the amount of data transmitted on a given TCP connection between a data source and the ES server exceeds the maximum processing capacity of the ES server, so that some data transmitted on that connection cannot be processed and written by the ES server, thereby reducing the data collection efficiency of the ES server.
It can be seen that the data collection efficiency of the ES server may be reduced by the TCP high-concurrency and TCP burst phenomena.
Disclosure of Invention
Embodiments of the present application provide a data transmission method, an apparatus, a computer device, and a readable storage medium, which can solve the problem that data collection efficiency is reduced by TCP high-concurrency or TCP burst phenomena when an ES server collects data.
In a first aspect, a data transmission method is provided, including:
receiving N data packets through N data transmission connections, and carrying out convergence processing on original data in the N data packets to obtain target data; the original data is data to be written into the distributed search engine server;
performing data segmentation processing on target data according to the maximum data processing capacity of the distributed search engine server to obtain M data units; m is less than N;
sending the M data units through a data transmission connection with the distributed search engine server.
According to the data transmission method and apparatus, when data are received through data transmission connections, the received data packets are aggregated and divided, so that N data transmission connection requests are not initiated for the N data packets. The concurrency of data transmission connection requests is reduced, which reduces the connection-request processing load on the distributed search engine server and avoids the drop in data transmission rate caused by TCP high concurrency. In addition, the data size of a single data unit does not exceed the maximum data processing capacity of the distributed search engine server, which also avoids the problems of unwritable data and reduced data transmission rate caused by TCP burst.
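The three claimed steps can be sketched with in-memory payloads standing in for the N data transmission connections. This is a minimal illustration, not the patented implementation; the function and parameter names are invented for the example.

```python
def converge_and_segment(payloads: list[bytes], max_throughput: int) -> list[bytes]:
    """Aggregate raw data from N received packets into target data, then
    split it into units no larger than the server's maximum data
    processing capacity."""
    target = b"".join(payloads)  # convergence (aggregation) processing
    # data segmentation processing: fixed-size units, each <= max_throughput
    return [target[i:i + max_throughput]
            for i in range(0, len(target), max_throughput)]
```

With N = 10 packets of 100 bytes each and a 400-byte maximum, the call yields M = 3 units, so M < N and only 3 connections are needed instead of 10.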
With reference to the first aspect, in a first possible implementation manner of the first aspect, aggregating the original data in the N data packets to obtain the target data includes:
analyzing the N data packets to obtain original data in the N data packets;
and sequentially writing the original data in the N data packets into a preset cache space to obtain target data.
According to the method, the special cache space is set to store the target data obtained by converging the plurality of data packets, so that convergence processing and data segmentation processing are separated in time, the data source equipment is prevented from frequently carrying out convergence processing and data segmentation processing, and the processing load of the data source equipment is reduced.
With reference to the first aspect, in a second possible implementation manner of the first aspect,
and after all the original data in the N data packets are written into the preset cache space, adding a write-in completion identifier for the preset cache space.
According to the method, after data are written into the dedicated cache space, a write-completion identifier can be added to the cache space, which guarantees that the data to be sent are stored completely. Data aggregation and data division can thus be carried out at separate times without any data being missed.
With reference to the first aspect, in a third possible implementation manner of the first aspect, the method further includes:
and periodically polling the preset cache space, determining that the preset cache space is in a write-in completion state according to the write-in completion identifier, and reading the target data in the preset cache space.
According to the method, when the data source equipment performs data segmentation processing at intervals after caching data, the cache space can be determined to store all data to be sent according to the write-in completion identifier, and data omission is avoided during data segmentation processing.
With reference to the first aspect, in a fourth possible implementation manner of the first aspect, the performing data segmentation processing on the target data according to the maximum data processing amount of the distributed search engine server to obtain M data units includes:
judging whether the size of the target data exceeds the maximum data processing capacity or not;
if the size of the target data exceeds the maximum data processing capacity, performing data segmentation processing on the target data by using a preset division granularity to obtain M data units; the data size corresponding to the preset partition granularity does not exceed the data size corresponding to the maximum data processing amount.
According to the method, when the cached target data exceed the maximum data processing capacity of the distributed search engine server, the target data are divided according to that maximum, so that no data unit obtained by division exceeds the maximum data processing capacity. The data sent at one time over a single data transmission connection are therefore never too large, TCP burst can be avoided, the problem of data that the distributed search engine server cannot process and write is avoided, and the data writing efficiency of the distributed search engine server is optimized to a certain extent.
With reference to the first aspect, in a fifth possible implementation manner of the first aspect, the method further includes:
and if the size of the target data does not exceed the maximum data processing capacity, forbidding data segmentation processing on the target data, and sending the target data through a data transmission connection with the distributed search engine server.
According to the method, when the cached target data do not exceed the maximum data processing capacity of the distributed search engine server, the target data need not be divided; the transmission amount of a single data transmission connection still does not exceed the maximum data processing capacity of the distributed search engine server, and TCP burst can be avoided.
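The judging and division logic of the fourth and fifth implementations can be sketched as follows; `maybe_segment` and its parameters are illustrative names, assuming the preset division granularity is given in bytes.

```python
def maybe_segment(target: bytes, max_throughput: int, granularity: int) -> list[bytes]:
    """Segment the target data only when it exceeds the maximum data
    processing capacity; the preset granularity must not exceed that maximum."""
    if granularity > max_throughput:
        raise ValueError("division granularity must not exceed max throughput")
    if len(target) <= max_throughput:
        return [target]  # fifth implementation: send without segmentation
    # fourth implementation: divide by the preset granularity
    return [target[i:i + granularity]
            for i in range(0, len(target), granularity)]
```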
With reference to the first aspect, in a sixth possible implementation manner of the first aspect, the sending M data units through a data transmission connection with a distributed search engine server includes:
and sequentially sending the M data units through the data transmission connection with the distributed search engine server according to the segmentation order of the M data units.
In the method provided by the application, the plurality of data units can be sent sequentially in a certain order, ensuring that the distributed search engine server correctly reassembles the data according to the receiving order.
In a second aspect, a data transmission apparatus is provided, including:
the receiving unit is used for receiving the N data packets through the N data transmission connections and carrying out convergence processing on original data in the N data packets to obtain target data; the original data is data to be written into the distributed search engine server;
the data processing unit is used for performing data segmentation processing on the target data according to the maximum data processing capacity of the distributed search engine server to obtain M data units; the maximum data processing capacity is used for limiting the single-transmission data amount of a data transmission connection of the distributed search engine server;
and the sending unit is used for sending the M data units through data transmission connection with the distributed search engine server.
In a third aspect, a computer device is provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the method according to the first aspect and any implementation manner of the first aspect when executing the computer program.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to the first aspect and any one of the implementations of the first aspect.
The embodiments of the application provide a data transmission method and apparatus, a computer device, and a readable storage medium. Data in the data packets from a plurality of data transmission connections are aggregated, the aggregated data are divided according to the maximum data processing capacity of the distributed search engine server, and the number of divided data units is smaller than the number of originally received data packets. The target data can therefore be transmitted to the distributed search engine server through fewer data transmission connections, without initiating N data transmission connection requests for the N data packets; the concurrency of data transmission connection requests is reduced, the connection-request processing load on the distributed search engine server is reduced, and the drop in data transmission rate caused by TCP high concurrency is avoided. In addition, the data size of a single data unit does not exceed the maximum data processing capacity of the distributed search engine server, which also avoids the problems of unwritable data and reduced data transmission rate caused by TCP burst.
Drawings
FIG. 1 is a schematic diagram of a data collection system provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a data transmission method according to an embodiment of the present application;
FIG. 3 is another schematic flowchart of a data transmission method according to an embodiment of the present application;
FIG. 4 is another schematic flowchart of a data transmission method according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a data source device according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The method provided by the embodiment of the application is suitable for the data collection system shown in FIG. 1. Referring to fig. 1, the system includes a distributed search engine server 10 and a data source device 20. The data source device 20 includes a primary data source device which directly communicates with the distributed search engine server 10, and a secondary data source device, a tertiary data source device, etc. which cannot directly communicate with the distributed search engine server 10.
The distributed search engine server 10 provides a search engine to the client 30, and a user can perform data searches with the distributed search engine through the client 30. The distributed search engine server 10 may collect data from the various data source devices 20 to support data search and engine services for clients.
For example, any data source device 20 may collect data from a lower-level data source device through a TCP connection, initiate a TCP connection request to the upper-level device for the data received over that connection, and forward the received data to the upper-level device through a TCP connection with it. For the primary data source device in the system shown in fig. 1, after receiving the data sent by a secondary data source device through a TCP connection, the primary device initiates a TCP connection request to the distributed search engine server 10 and sends the data to the distributed search engine server 10 through the resulting TCP connection.
When TCP high concurrency occurs, for example, the primary data source devices initiate massively concurrent TCP connection requests that exceed the number of TCP connections supported by the distributed search engine server 10, so that some TCP connection requests receive no response and part of the data fails to be transmitted. When a TCP burst occurs, for example, the amount of data transmitted on a certain TCP connection between a primary data source device and the ES server exceeds the maximum processing capacity of the ES server, part of the data transmitted on that TCP connection cannot be processed and written by the ES server, which reduces the data collection efficiency of the ES server.
Based on this, the embodiment of the application provides a data transmission method, which can solve the problem that data collection efficiency is reduced by TCP high-concurrency or TCP burst phenomena when a distributed search engine server collects data. Referring to fig. 2, the method includes the following steps:
step 201, the data source device receives N data packets through N data transmission connections, and performs aggregation processing on original data in the N data packets to obtain target data.
The data source device is used for providing data for the distributed search engine server so as to support the engine and analysis functions of the distributed search engine server. The data transfer connection may be a TCP connection, and a TCP connection request is first initiated to establish a TCP connection between the devices before transferring data between the data source devices.
In a specific implementation, the data source device may encapsulate original data to be transmitted according to a protocol standard to obtain a data packet, and then transmit the data packet through a TCP connection between the data source devices. Accordingly, the data source device of the receiving side may receive the data packet through the TCP connection. It will be appreciated that the raw data is the data to be written to the distributed search engine server.
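The patent does not specify the inter-device protocol standard, so as an assumption for illustration, a length-prefixed frame can stand in for it: a 4-byte big-endian length header followed by the raw data.

```python
import struct

def encapsulate(raw: bytes) -> bytes:
    """Wrap raw data in a hypothetical frame: 4-byte length header + payload."""
    return struct.pack(">I", len(raw)) + raw

def parse(packet: bytes) -> bytes:
    """Recover the raw data from a frame built by encapsulate()."""
    (length,) = struct.unpack(">I", packet[:4])
    return packet[4:4 + length]
```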
In a possible implementation manner, after receiving a data packet, a data source device parses the data packet according to a protocol standard to obtain original data therein. And aggregating the original data in the data packets received through the plurality of data transmission connections. Illustratively, N data packets are received through N data transmission connections, and original data in the N data packets are aggregated to obtain target data. It should be noted that the protocol standard described in the embodiment of the present application is a protocol for communication between devices, for example, a protocol for communication between data source devices.
In this embodiment of the application, the data source device may be a primary data source device in the system shown in fig. 1. After receiving data packets through data transmission connections, it avoids directly initiating a data transmission connection request to the distributed search engine server for each data packet; instead, it aggregates the data in a plurality of data packets so that they can be divided and transmitted later, which reduces the concurrency of data transmission connection requests.
Step 202, the data source device performs data segmentation processing on the target data according to the maximum data processing capacity of the distributed search engine server to obtain M data units.
Wherein M is an integer less than N. Because M is less than N and the M data units can be sent to the distributed search engine server through M data transmission connections, the embodiment of the present application can transmit the target data to the distributed search engine server through fewer data transmission connections, without initiating N data transmission connection requests for the N data packets, thereby reducing the concurrency of data transmission connection requests and the connection-request processing load on the distributed search engine server.
In a possible implementation manner, the maximum data processing amount in the embodiment of the present application may be the maximum amount of data the distributed search engine server can process at a single time. For example, it may be the maximum data processing amount of the distributed search engine server for a single TCP connection, that is, the maximum amount of data the server can process after receiving it through a single data transmission connection.
For example, the target data is divided according to the maximum data processing amount, that is, the target data is divided into a plurality of data units, and the data amount of each data unit does not exceed the maximum data processing amount. Further, the data source device may further encapsulate the segmented data according to a communication protocol standard between the data source device and the distributed search engine server to obtain a plurality of data units.
Step 203, the data source device sends M data units through the data transmission connection with the distributed search engine server.
The data source device can simultaneously initiate M data transmission connection requests to the distributed search engine server, establish M data transmission connections with the distributed search engine server, and respectively send the M data units through the M data transmission connections.
Or, the data source device may initiate P (less than M) data transmission connection requests to the distributed search engine server at the same time, establish P data transmission connections with the distributed search engine server, and send the M data units through the P data transmission connections, respectively. Specifically, a plurality of data units are sent on one of the P data transmission connections.
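Sharing P connections among M units can be done, for example, round-robin. This sketch only models the assignment (the names are illustrative) and leaves the actual TCP sending out.

```python
def assign_round_robin(units: list[bytes], p: int) -> list[list[bytes]]:
    """Distribute M data units over P (< M) data transmission connections,
    so that several units are sent on the same connection."""
    connections: list[list[bytes]] = [[] for _ in range(p)]
    for i, unit in enumerate(units):
        connections[i % p].append(unit)  # unit i goes to connection i mod P
    return connections
```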
In the embodiment of the application, data in data packets from a plurality of data transmission connections are aggregated, the aggregated data are divided according to the maximum data processing capacity of the distributed search engine server, and the number of the divided data units is not more than the number of the originally received data packets, so that target data can be transmitted to the distributed search engine server through fewer data transmission connections, N data transmission connection requests cannot be initiated for N data packets, the concurrency of the data transmission connection requests is reduced, the processing capacity of the distributed search engine server for the data transmission connection requests is reduced, and the problem of data transmission rate reduction caused by high concurrency of TCP is avoided. In addition, the data size of a single data unit does not exceed the maximum data processing capacity of the distributed search engine server, and the problems that data cannot be written and the data transmission rate is reduced due to TCP burst can also be avoided.
In this embodiment, the data source device may aggregate data received through a plurality of data transmission connections in the cache. For example, the foregoing implementation of aggregating the original data in the N data packets to obtain the target data includes the steps shown in fig. 3:
step 301, analyzing the N data packets to obtain original data in the N data packets.
In a specific implementation, the data source device parses the received data packet according to a communication protocol with the data packet sender, and obtains original data, that is, data to be written into the distributed search engine server, from payload of the data packet.
And step 302, sequentially writing the original data in the N data packets into a preset cache space to obtain target data.
In a specific implementation, the original data in the N data packets may be sequentially written into a preset buffer space according to the receiving sequence of the N data packets. It should be noted that the data source device may also sequentially write the original data in the N data packets according to other orders, which is not limited in this embodiment of the present application.
In a possible implementation manner, the preset buffer space may be a buffer space dedicated to buffer aggregated data (i.e. the aforementioned target data) of a plurality of data transmission connections.
It should be noted that, in the method provided by the present application, a dedicated cache space is set to store target data obtained by aggregating a plurality of data packets, so that aggregation processing and data segmentation processing are separated in time, frequent aggregation processing and data segmentation processing performed by a data source device are avoided, and a processing load of the data source device is reduced.
In this embodiment of the present application, after the data received through the data transmission connections have been written into the dedicated cache space, a write-completion identifier may further be added to that cache space. Illustratively, the method shown in fig. 2 further includes: after all the original data in the N data packets have been written into the preset cache space, adding a write-completion identifier to the preset cache space.
It should be noted that after data have been written into the dedicated cache space, a write-completion identifier may be added to the cache space, which guarantees that the data to be sent are stored completely. Data aggregation and data division can thus be carried out at separate times without any data being missed.
In the embodiment of the application, the data source device can also complete the complete reading of the data according to the write-in completion identifier of the cache space, and data omission is avoided. For example, the foregoing specific implementation of reading the target data from the preset cache space includes:
and periodically polling the preset cache space, determining that the preset cache space is in a write-in completion state according to the write-in completion identifier, and reading the target data in the preset cache space.
It should be noted that, when the data source device performs data segmentation processing at intervals after caching data, it may be determined that the cache space stores all data to be sent according to the write completion identifier, so as to avoid data omission during data segmentation processing.
In the embodiment of the application, the data source device can divide the cached data according to a certain data size, so that the divided data unit is prevented from exceeding the single data processing capacity of the distributed search engine server. For example, the foregoing related specific implementation of performing data segmentation processing on target data according to the maximum data throughput of the distributed search engine server to obtain M data units includes:
judging whether the size of the target data exceeds the maximum data processing capacity; if the size of the target data exceeds the maximum data processing capacity, performing data segmentation processing on the target data by using a preset division granularity to obtain M data units; the data size corresponding to the preset partition granularity does not exceed the data size corresponding to the maximum data processing amount.
That is to say, when the cached target data exceed the maximum data processing amount of the distributed search engine server, the target data need to be divided according to that maximum, so that no data unit obtained by division exceeds the maximum data processing amount of the distributed search engine server. The data transmitted at one time over a single data transmission connection are therefore never too large, and TCP burst can be avoided, thereby avoiding the problem of data that the distributed search engine server cannot process and write, and optimizing the data writing efficiency of the distributed search engine server to a certain extent.
In one possible implementation, if the size of the target data does not exceed the maximum data processing amount, the data segmentation processing on the target data is prohibited, and the target data is sent through a data transmission connection with the distributed search engine server.
That is, when the cached target data do not exceed the maximum data processing amount of the distributed search engine server, the target data need not be divided; the transmission amount of a single data transmission connection still does not exceed the maximum data processing amount of the distributed search engine server, and TCP burst can be avoided.
In the embodiment of the application, the plurality of data units can be sent sequentially in a certain order, so that the distributed search engine server can correctly reassemble the data according to the receiving order. For example, the specific implementation, referred to above, of sending M data units via a data transmission connection with the distributed search engine server includes: sequentially sending the M data units through the data transmission connection with the distributed search engine server in the segmentation order of the M data units.
In a specific implementation, the data source device may send M data units to the distributed search engine server through M data transmission connections, and the sending times of the M data units are staggered in sequence. Because the transmission time delay of the data transmission connection between the data source equipment and the distributed search engine server is approximately the same, the distributed search engine server can be ensured to receive M data units in sequence, thereby realizing the correct recombination of the data and avoiding the phenomenon of out-of-order recombination.
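A minimal sketch of this ordered, staggered sending, assuming one TCP connection per data unit and a small 4-byte sequence header so the receiver can verify ordering. Both the wire format and the names here are assumptions for illustration; the patent does not prescribe them.

```python
import socket
import time

def send_units_in_order(units, host, port, stagger_s=0.01):
    """Send each data unit on its own TCP connection, in segmentation order,
    staggering the send times so the receiver sees them in sequence."""
    for seq, unit in enumerate(units):
        with socket.create_connection((host, port)) as conn:
            # Hypothetical 4-byte big-endian sequence number prefixes each unit.
            conn.sendall(seq.to_bytes(4, "big") + unit)
        time.sleep(stagger_s)  # stagger successive sends
```

Because the connections share roughly the same transmission delay, sending in segmentation order with staggered start times keeps arrival order aligned with segmentation order, as the paragraph above argues.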
The embodiment of the application also provides a data transmission method in which the data source device executes a relevant process to aggregate and divide the received data, so as to avoid the problems of TCP high concurrency and TCP bursts. Illustratively, with reference to fig. 4, the method includes the following steps:
step 401, after starting the data collection task, establishing two operation threads, namely a data receiving thread and a cache polling thread.
The data receiving thread and the cache polling thread are configured to perform the aggregation processing and the data segmentation processing described above.
Step 402, starting a data receiving thread to monitor a TCP port, and starting a data convergence processing flow when a TCP connection is established.
Step 403, executing the data receiving thread: receiving data through the TCP connection, completing data preprocessing according to a data parsing strategy, and writing the data into a preset cache space.
The preset cache space is a cache allocated in advance in the shared memory space, dedicated to caching the data to be aggregated.
Step 404, executing a data receiving thread, and adding a write completion flag to the cache space after the write is completed.
In this way, under a highly concurrent scenario, multiple pieces of data can be written simultaneously while guaranteeing that the data to be sent is stored completely.
Step 405, the cache polling thread polls the cache spaces and, when it finds a cache space bearing a write-complete flag, adds the cache index of that cache space to a task queue.
Step 406, the cache polling thread processes the task queue and aggregates the data in the cache spaces into target data.
In a specific implementation, the data aggregation logic determines the cache space corresponding to each cache index in the task queue and aggregates the data in those cache spaces into one data block, completing the aggregation of the concurrent TCP data.
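Steps 403-406 can be sketched as the following toy model, in which receiving threads write into per-connection buffer slots and set a completion flag, and one polling pass queues the completed slots and concatenates them into a single target-data block. All names here are illustrative assumptions; a real implementation would use the pre-allocated shared-memory cache the text describes.

```python
# Hedged sketch of the receive / flag / poll / aggregate flow (steps 403-406).
import queue
import threading

buffers: dict[int, bytes] = {}           # slot index -> received data
complete: dict[int, bool] = {}           # slot index -> write-complete flag
task_queue: "queue.Queue[int]" = queue.Queue()
lock = threading.Lock()

def receive_into_slot(slot: int, data: bytes) -> None:
    """Simulates one receiving thread: write the data, then set the flag (step 404)."""
    with lock:
        buffers[slot] = data
        complete[slot] = True

def poll_once() -> bytes:
    """One polling pass: enqueue finished slots (step 405), then aggregate
    their contents into one target-data block (step 406)."""
    with lock:
        for slot, done in sorted(complete.items()):
            if done:
                task_queue.put(slot)
                complete[slot] = False   # consume the write-complete flag
    parts = []
    while not task_queue.empty():
        parts.append(buffers[task_queue.get()])
    return b"".join(parts)               # aggregated target data
```

The lock stands in for whatever synchronization the shared-memory cache provides; the essential point is that aggregation only touches slots whose write-complete flag is set, so partially written data is never picked up.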
Step 407, the cache polling thread calculates the size of the target data (DataBufferSize) and compares it with the single-transmission maximum (MaxSendSize).
Here MaxSendSize may be the maximum data processing capacity described above: the maximum amount of data that the distributed search engine server can process in response to a single TCP connection, or the maximum amount of data that the distributed search engine server can receive over a single TCP connection.
If DataBufferSize <= MaxSendSize, go to step 408; if DataBufferSize > MaxSendSize, go to step 409.
Step 408, directly submitting the target data to the output logic, and sending the target data through the TCP connection with the distributed search engine server.
Step 409, performing data segmentation processing on the target data to obtain N data units, and sending the N data units one by one through TCP connections with the distributed search engine server.
The segmentation algorithm cuts the target data using MaxSendSize as the cutting granularity. The resulting data sequence contains N = int(DataBufferSize / MaxSendSize) data units, plus one more unit when DataBufferSize % MaxSendSize is nonzero.
In that case, the first N-1 data units each have length MaxSendSize, and the last data unit has length DataBufferSize % MaxSendSize.
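The cutting rule can be checked with a small sketch. Assuming the description above (first N-1 units of length MaxSendSize, a shorter trailing unit carrying the remainder when the division is not exact); the function name is an illustration, not from the patent:

```python
def unit_lengths(data_buffer_size: int, max_send_size: int) -> list[int]:
    """Lengths of the units produced by cutting at MaxSendSize granularity:
    full-size units first, then one shorter unit of length
    DataBufferSize % MaxSendSize when the division is not exact."""
    full, rem = divmod(data_buffer_size, max_send_size)
    lengths = [max_send_size] * full
    if rem:
        lengths.append(rem)  # trailing unit carries the remainder
    return lengths
```

Note the unit lengths always sum back to DataBufferSize, so reassembly on the server side loses nothing.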
According to the method provided by the embodiment of the application, the data aggregation and cutting technique can solve the problem, in streaming data transmission scenarios, that ElasticSearch data writes fail because the data cannot be received under high concurrency and burst fluctuation.
An embodiment of the present application further provides a data source device, as shown in fig. 5, where the data source device includes:
a receiving unit 501, configured to receive N data packets through N data transmission connections, and perform aggregation processing on original data in the N data packets to obtain target data; the original data is data to be written into the distributed search engine server;
a data processing unit 502, configured to perform data segmentation processing on target data according to a maximum data processing capacity of the distributed search engine server, to obtain M data units; the maximum data processing capacity is used for limiting the single data transmission capacity of the data transmission connection of the distributed search engine server;
a sending unit 503, configured to send M data units through a data transmission connection with the distributed search engine server.
With the above apparatus, when data is received through data transmission connections, the received data packets are aggregated and then divided, so N separate data transmission connection requests are not initiated for the N data packets. This reduces the concurrency of data transmission connection requests, and thus the volume of such requests the distributed search engine server must handle, avoiding the drop in data transmission rate caused by high TCP concurrency. In addition, the data size of a single data unit does not exceed the maximum data processing capacity of the distributed search engine server, which also avoids the problems of unwritable data and reduced data transmission rate caused by TCP bursts.
In an embodiment, the data processing unit 502 is configured to parse the N data packets to obtain original data in the N data packets;
and sequentially writing the original data in the N data packets into a preset cache space to obtain target data.
The device provided by the application can be provided with a special cache space to store the target data obtained by converging the plurality of data packets, so that convergence processing and data segmentation processing are separated in time, the frequent convergence processing and data segmentation processing of the data source equipment are avoided, and the processing load of the data source equipment is reduced.
In an embodiment, the data processing unit 502 is configured to add a write completion identifier to a preset cache space after all original data in the N data packets are written into the preset cache space.
According to the device, after the data are written in the special cache space, the write-in completion identification can be added to the cache space, and the complete storage of the data to be sent can be guaranteed. Data aggregation and data division are carried out at intervals, and meanwhile data omission is avoided.
In an embodiment, the data processing unit 502 is configured to periodically poll a preset cache space, determine that the preset cache space is in a write-complete state according to the write-complete identifier, and read the target data in the preset cache space.
When the data source equipment provided by the application carries out data segmentation processing at intervals after caching data, the cache space can be determined to store all data to be sent according to the write-in completion identifier, and data omission is avoided when carrying out data segmentation processing.
In one embodiment, the data processing unit 502 is configured to determine whether the size of the target data exceeds a maximum data throughput;
if the size of the target data exceeds the maximum data processing capacity, performing data segmentation processing on the target data by using a preset division granularity to obtain M data units; the data size corresponding to the preset partition granularity does not exceed the data size corresponding to the maximum data processing amount.
According to the apparatus, when the cached target data exceeds the maximum data processing capacity of the distributed search engine server, the target data is divided according to that maximum data processing capacity, so that no divided data unit exceeds it. The data sent at one time through a single data transmission connection is then never too large, and a TCP burst can be avoided; this prevents the distributed search engine server from being unable to process and write the data, and optimizes the data writing efficiency of the distributed search engine server to a certain extent.
In one embodiment, the data processing unit 502 is configured to prohibit the data splitting process on the target data if the size of the target data does not exceed the maximum data processing amount, and send the target data through the data transmission connection with the distributed search engine server.
In one embodiment, the data processing unit 502 is configured to sequentially send the M data units through a data transmission connection with the distributed search engine server according to a division order of the M data units.
The apparatus provided by the application can also send the plurality of data units sequentially in a certain order, ensuring that the distributed search engine server correctly reassembles the data according to the receiving order.
An embodiment of the present application further provides a computer device, where the computer device may be the data source device described above. The internal structure thereof may be as shown in fig. 6. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database of the computer device may store configuration information, rights information, and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by the processor to implement the steps performed by the data source device in the method shown in fig. 4 in the embodiment of the present application.
By way of example, the computer program when executed by a processor implements: receiving N data packets through N data transmission connections, and carrying out convergence processing on original data in the N data packets to obtain target data; the original data is data to be written into the distributed search engine server;
performing data segmentation processing on target data according to the maximum data processing capacity of the distributed search engine server to obtain M data units; m is less than N;
the M data units are sent over a data transfer connection with the distributed search engine server.
In one embodiment, the computer program when executed by a processor implements: analyzing the N data packets to obtain original data in the N data packets;
and sequentially writing the original data in the N data packets into a preset cache space to obtain target data.
In one embodiment, the computer program when executed by a processor implements: and after all the original data in the N data packets are written into the preset cache space, adding a write-in completion identifier for the preset cache space.
In one embodiment, the computer program when executed by a processor implements: and periodically polling the preset cache space, determining that the preset cache space is in a write-in completion state according to the write-in completion identifier, and reading the target data in the preset cache space.
In one embodiment, the computer program when executed by a processor implements: judging whether the size of the target data exceeds the maximum data processing capacity or not;
if the size of the target data exceeds the maximum data processing capacity, performing data segmentation processing on the target data by using a preset division granularity to obtain M data units; the data size corresponding to the preset partition granularity does not exceed the data size corresponding to the maximum data processing amount.
In one embodiment, the computer program when executed by a processor implements: and if the size of the target data does not exceed the maximum data processing capacity, forbidding data segmentation processing on the target data, and sending the target data through a data transmission connection with the distributed search engine server.
In one embodiment, the computer program when executed by a processor implements: and sequentially sending the M data units through the data transmission connection with the distributed search engine server in the segmentation order of the M data units.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical storage, or the like. Volatile Memory can include Random Access Memory (RAM) or external cache Memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method of data transmission, comprising:
receiving N data packets through N data transmission connections, and carrying out aggregation processing on original data in the N data packets to obtain target data; the original data is data to be written into a distributed search engine server;
performing data segmentation processing on the target data according to the maximum data processing capacity of the distributed search engine server to obtain M data units; said M is less than said N;
and sending the M data units through a data transmission connection with the distributed search engine server.
2. The method according to claim 1, wherein the aggregating the original data in the N data packets to obtain the target data comprises:
analyzing the N data packets to obtain original data in the N data packets;
and sequentially writing the original data in the N data packets into a preset cache space to obtain the target data.
3. The method of claim 2, further comprising:
and after all the original data in the N data packets are written into the preset cache space, adding a write-in completion identifier for the preset cache space.
4. The method of claim 3, further comprising:
and periodically polling the preset cache space, determining that the preset cache space is in a write-in completion state according to the write-in completion identifier, and reading the target data in the preset cache space.
5. The method of claim 1, wherein the performing data segmentation processing on the target data according to the maximum data throughput of the distributed search engine server to obtain M data units comprises:
judging whether the size of the target data exceeds the maximum data processing capacity;
if the size of the target data exceeds the maximum data processing capacity, performing data segmentation processing on the target data by using a preset partition granularity to obtain the M data units; and the data size corresponding to the preset partition granularity does not exceed the data size corresponding to the maximum data processing amount.
6. The method of claim 5, further comprising:
and if the size of the target data does not exceed the maximum data processing capacity, forbidding data segmentation processing on the target data, and sending the target data through a data transmission connection with the distributed search engine server.
7. The method of claim 1, wherein said sending said M data units over a data transfer connection with said distributed search engine server comprises:
and sequentially sending the M data units through the data transmission connection with the distributed search engine server in the segmentation order of the M data units.
8. A data transmission apparatus, comprising:
the receiving unit is used for receiving N data packets through N data transmission connections, and carrying out convergence processing on original data in the N data packets to obtain target data; the original data is data to be written into a distributed search engine server;
the data processing unit is used for carrying out data segmentation processing on the target data according to the maximum data processing capacity of the distributed search engine server to obtain M data units; the maximum data processing capacity is used for limiting the single data transmission capacity of the data transmission connection of the distributed search engine server;
and the sending unit is used for sending the M data units through data transmission connection with the distributed search engine server.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202110577396.6A 2021-05-26 2021-05-26 Data transmission method and device, computer equipment and readable storage medium Pending CN113438274A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110577396.6A CN113438274A (en) 2021-05-26 2021-05-26 Data transmission method and device, computer equipment and readable storage medium


Publications (1)

Publication Number Publication Date
CN113438274A true CN113438274A (en) 2021-09-24

Family

ID=77802913


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114040136A (en) * 2021-11-05 2022-02-11 北京京东乾石科技有限公司 Track inspection device, image processing method, device, equipment and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102377650A (en) * 2010-08-12 2012-03-14 华为技术有限公司 Data transmission processing method, device and system
CN103701714A (en) * 2013-12-25 2014-04-02 北京奇虎科技有限公司 Page extraction method, server and network system
CN109120687A (en) * 2018-08-09 2019-01-01 深圳市腾讯网络信息技术有限公司 Data packet sending method, device, system, equipment and storage medium
CN109800260A (en) * 2018-12-14 2019-05-24 深圳壹账通智能科技有限公司 High concurrent date storage method, device, computer equipment and storage medium
CN109951255A (en) * 2019-03-27 2019-06-28 深圳市网心科技有限公司 A kind of data transmission method based on TCP, system, source device and target device
US10375192B1 (en) * 2013-03-15 2019-08-06 Viasat, Inc. Faster web browsing using HTTP over an aggregated TCP transport
CN111147573A (en) * 2019-12-24 2020-05-12 网宿科技股份有限公司 Data transmission method and device




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210924)