CN110677402B - Data integration method and device based on intelligent network card

Data integration method and device based on intelligent network card

Info

Publication number
CN110677402B
CN110677402B (granted from application CN201910904415.4A)
Authority
CN
China
Prior art keywords
data
network card
intelligent network
node
target
Prior art date
Legal status
Active
Application number
CN201910904415.4A
Other languages
Chinese (zh)
Other versions
CN110677402A (en)
Inventor
郑琳琳
刘畅
郑文琛
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN201910904415.4A
Publication of CN110677402A
Application granted
Publication of CN110677402B

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/50 - Network services
    • H04L67/56 - Provisioning of proxy services
    • H04L67/565 - Conversion or adaptation of application format or content
    • H04L67/5651 - Reducing the amount or size of exchanged application data
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 - Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04 - Protocols for data compression, e.g. ROHC

Abstract

The application provides a data integration method and device based on an intelligent network card, wherein the method comprises the following steps: a first intelligent network card acquires data to be transmitted and a target compression ratio from a first node; the first intelligent network card compresses the data to be transmitted according to the target compression ratio to obtain compressed data; the first intelligent network card sends the compressed data to a second intelligent network card, so that the second intelligent network card decompresses the compressed data to obtain target data; the second intelligent network card sends the target data to a second node, so that the second node integrates the target data; wherein the first node and the second node are two nodes in a big data compute engine platform. The method reduces the burden on the CPU, saves network resources and reduces communication overhead, while allowing a user to obtain an optimal data compression ratio according to need.

Description

Data integration method and device based on intelligent network card
Technical Field
The application relates to the field of big data, and in particular to a data integration method and device based on an intelligent network card.
Background
Spark (a compute engine) is a distributed big data parallel processing platform based on in-memory computing. It integrates batch processing, real-time stream processing, interactive query and graph computation, avoiding the resource waste caused by deploying a different cluster for each kind of workload.
MapReduce (a programming and computing model) is a computing model, framework and platform oriented to parallel processing of large data sets, serving as a cluster-based high-performance parallel computing platform. Integrating Spark with MapReduce yields a more capable big data processing platform: the integrated platform's cluster management and underlying storage are reused, MapReduce handles workloads it suits well, such as log files and static batch jobs, and the remaining processing tasks are assigned to Spark. The MapReduce process on the integrated Spark platform involves transmitting large amounts of data; the Spark platform comprises multiple worker nodes that process the tasks assigned by the platform in parallel, and the data transmission in the MapReduce process occupies network resources and makes data synchronization among Spark worker nodes time-consuming. In some technical solutions, the data transmitted between nodes is compressed before transmission and decompressed on reaching the destination node. This saves network resources and reduces communication overhead. However, the compression and decompression operations themselves must be completed by the CPU, occupying a large amount of CPU time at high cost. Freeing CPU resources by finding another way to process the data transmitted in the Spark platform is therefore an urgent problem.
Disclosure of Invention
The invention provides a data integration method and device based on an intelligent network card, applied to a big data compute engine platform, for compressing the data transmitted during the MapReduce process, thereby reducing the burden on the CPU, saving network resources and reducing communication overhead.
In a first aspect, an embodiment of the present application provides a data integration method based on an intelligent network card, where the method includes:
the first intelligent network card acquires data to be transmitted and a target compression ratio from a first node;
the first intelligent network card compresses the data to be transmitted according to the target compression ratio to obtain compressed data; the first intelligent network card sends the compressed data to a second intelligent network card, so that the second intelligent network card decompresses the compressed data to obtain target data; and the second intelligent network card sends the target data to a second node, so that the second node integrates and processes the target data, wherein the first node and the second node are two nodes in a big data compute engine platform.
Optionally, the compressing, by the first intelligent network card, the data to be transmitted according to the target compression ratio to obtain compressed data, including:
determining the preamble data and the subsequent data in the data to be transmitted according to the target compression ratio;
determining a first function rule according to the preamble data, and taking the coefficient values of the first function rule as the compressed data, wherein the first function rule is used for reflecting the association relationship between the preamble data and the subsequent data;
the sending, by the first intelligent network card, of the compressed data to the second intelligent network card includes:
sending the preamble data, the first function rule and the compressed data to the second intelligent network card, so that the second intelligent network card determines predicted values of the subsequent data according to the preamble data, the first function rule and the compressed data, wherein the predicted values of the subsequent data and the preamble data form the target data.
Optionally, the first function rule satisfies the following formula:

P(x) = p_0 + p_1*x + p_2*x^2 + ... + p_(n-1)*x^(n-1)

wherein x is the serial number of a piece of data, n is the number of the preamble data, p_i is the i-th polynomial coefficient, and P(x) is the value of the data with serial number x; the coefficient set P = {p_i} is the compressed data.
Optionally, determining preamble data in the data to be transmitted according to the target compression ratio includes:
determining the number n of the preamble data according to the following formula:
n=(1-m)*K;
wherein m is the target compression ratio, and K is the number of data in the data to be transmitted.
Optionally, before sending the compressed data to the second intelligent network card, the method further includes:
calculating the compression loss rate of the data to be transmitted;
outputting prompt information based on the compression loss rate, wherein the prompt information is used for asking a user whether the compression loss rate is acceptable;
sending the compressed data to the second intelligent network card, including:
if a determination instruction is received, sending the compressed data to the second intelligent network card.
Optionally, after outputting the prompt information based on the compression loss rate, the method further includes:
if a negative instruction is received, prompting the user to reset the target compression ratio, and compressing, by the first intelligent network card, the data to be transmitted based on the reset target compression ratio.
Optionally, calculating the compression loss rate of the data to be transmitted includes:
when x is the serial number of the j-th subsequent data, determining a predicted value of the j-th subsequent data according to the formula

P(x) = p_0 + p_1*x + p_2*x^2 + ... + p_(n-1)*x^(n-1);

calculating the difference between the predicted value of the j-th subsequent data and the true value of the j-th subsequent data, and determining the sum of the differences;

calculating the compression loss rate according to the formula

E = E(j) / Σ_j D_j;

wherein E is the compression loss rate, E(j) is the sum of the differences, and D_j is the true value of the j-th subsequent data.
In a second aspect, an embodiment of the present application provides a data integration apparatus based on an intelligent network card, where the apparatus includes:
the acquisition module is used for acquiring data to be transmitted and a target compression ratio from a first node;
the processing module is used for compressing the data to be transmitted according to the target compression ratio to obtain compressed data;
the communication module is used for sending the compressed data to a second intelligent network card;
the processing module is further used for decompressing the compressed data to obtain target data;
the communication module is further configured to send the target data to the second node;
the processing module is also used for integrating the target data;
wherein the first node and the second node are two nodes in a big data compute engine platform.
In a third aspect, an embodiment of the present application provides an intelligent network card, including:
a memory for storing program instructions;
a processor for calling the program instructions stored in the memory and executing one or more steps of any of the above methods according to the obtained program instructions.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, which stores a computer program, where the computer program includes program instructions, and the program instructions, when executed by a computer, cause the computer to perform one or more steps of the intelligent network card-based data integration method provided in the first aspect.
In a fifth aspect, an embodiment of the present application provides a program product, where the program product includes program instructions, and when the program instructions are executed by a computer, the computer executes one or more steps of the data integration method based on the intelligent network card as provided in the first aspect.
The beneficial effect of this application is as follows:
in the technical solution of the embodiment of the present application, a data integration method based on an intelligent network card is provided, the method includes: the first intelligent network card acquires data to be transmitted and a target compression ratio from a first node; the first intelligent network card compresses the data to be transmitted according to the target compression ratio to obtain compressed data; the first intelligent network card sends the compressed data to a second intelligent network card so that the second intelligent network card decompresses the compressed data to obtain target data; the second intelligent network card sends the target data to a second node so that the second node integrates the target data; wherein the first node and the second node are two nodes in a big data compute engine platform. The first node and the second node can be represented by a CPU, and the intelligent network card replaces the CPU to compress data, so that the burden of the CPU is reduced, network resources are saved, and communication overhead is reduced.
Drawings
Fig. 1 is a schematic diagram of a data integration system architecture based on an intelligent network card according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the MapReduce process;
fig. 3 is a schematic flowchart of a data integration method based on an intelligent network card according to an embodiment of the present application;
fig. 4 is a graph of polynomial fitting performed on data to be transmitted numbered "D0 to D9" according to an embodiment of the present application;
FIG. 5 is a schematic flowchart illustrating a process of obtaining an optimal compression ratio by a user according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a data integration device based on an intelligent network card according to an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them; all other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present application. The shapes and sizes of the elements in the drawings do not reflect true proportions and are merely intended to illustrate the content of the application.
The data integration system based on the intelligent network card provided by the embodiments of the application is introduced below. The system is applicable to a big data compute engine platform and is used for compressing and decompressing the data transmitted within that platform. Referring to fig. 1, fig. 1 is a schematic diagram of a data integration system architecture based on an intelligent network card according to an embodiment of the present disclosure; in fig. 1, a first node and a second node of a big data compute engine platform are taken as an example, and one node may be represented by one CPU. The first intelligent network card and the second intelligent network card each include an FPGA chip, a network port module and a PCIe interface. The FPGA chip internally instantiates customized compression and decompression modules implementing the algorithm; the network port module is used for communication between the first intelligent network card and the second intelligent network card over a network link; and the PCIe interface is used for communication between an intelligent network card and its node's CPU. It should be understood that the first intelligent network card and the second intelligent network card provided by the present application have the same architecture and the same functions, and both can perform the same operations on data in the big data compute engine platform; as shown in fig. 1, the embodiments of the present application only elaborate the flow of data to be transmitted from the first intelligent network card to the second intelligent network card.
When the big data compute engine platform is a Spark platform, take its MapReduce process as an example. The MapReduce process is divided into three stages: map, shuffle and reduce. Before the shuffle, that is, in the map stage, MapReduce performs a split operation on the data to be processed and allocates a MapTask to each split. The map() function then processes each row of data in each split to obtain a key-value pair (key, value), where key is an offset and value is the content of a row; the resulting key-value pairs are also called the "intermediate result". The shuffle stage then processes this "intermediate result": the irregularly output pairs from the map side are arranged, according to a specified rule, into data with a certain regularity so that the reduce side can receive and process them; after the reduce side receives the regularized data, the reduce() function partitions and merges it.
For example, as shown in fig. 2, fig. 2 is a schematic diagram of the MapReduce process: the Spark platform reads an HDFS file; the map side splits the data in the HDFS file into three partitions, each holding three types of data; the shuffle process disperses the data of each partition and classifies it by data type; after classification, the data is transmitted to the reduce side, which merges the classified data into three partitions, each holding one type of data.
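To make the three stages concrete, the following Python sketch mimics the map/shuffle/reduce flow described above on a toy input. It is an illustration only: the function names and the word-count-style map and reduce functions are assumptions for demonstration, not the Spark or Hadoop API.

```python
from collections import defaultdict

def map_reduce(lines, map_fn, reduce_fn):
    """Map each input line to (key, value) pairs, shuffle the irregular
    intermediate result into per-key groups, then reduce each group."""
    intermediate = [kv for line in lines for kv in map_fn(line)]  # map stage
    groups = defaultdict(list)                                    # shuffle stage
    for key, value in intermediate:
        groups[key].append(value)
    return {key: reduce_fn(values) for key, values in groups.items()}  # reduce

# Counting three record types, loosely mirroring the Fig. 2 schematic:
result = map_reduce(["a b a", "c a b", "c c b"],
                    map_fn=lambda line: [(w, 1) for w in line.split()],
                    reduce_fn=sum)
# result == {'a': 3, 'b': 3, 'c': 3}
```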
The Spark platform comprises a plurality of worker nodes, hereinafter referred to simply as nodes. Taking the first node and the second node as an example, one possible case is that the first node is connected with a first intelligent network card and the second node is connected with a second intelligent network card. After the Spark platform acquires the HDFS file, the first node sends the data to be transmitted in the HDFS file to the first intelligent network card; the first intelligent network card acquires a target compression ratio set by a user and compresses the data to be transmitted according to that ratio to obtain compressed data. The first intelligent network card sends the compressed data to the second intelligent network card, which decompresses it to obtain target data; the second intelligent network card then sends the target data to the second node so that the second node integrates and processes the target data, where the second node's processing of the target data may be the shuffle processing and reduce processing that follow map.
Another possible case, again taking a first node and a second node in the Spark platform with the same network card connections, is the following. After the Spark platform acquires the HDFS file, the first node processes the data to be transmitted in the HDFS file, performing the map processing and shuffle processing of the MapReduce process; the first node then sends the data to be transmitted to the first intelligent network card, which compresses it according to the target compression ratio to obtain compressed data. The first intelligent network card sends the compressed data to the second intelligent network card, which decompresses it to obtain target data; the second intelligent network card sends the target data to the second node so that the second node integrates the target data. Here the integration processing performed by the second node may be the reduce processing of the MapReduce process, that is, classifying and merging the target data.
Referring to fig. 3, fig. 3 is a schematic flowchart of a data integration method based on an intelligent network card according to an embodiment of the present disclosure. The method may be applied to a Spark platform: it may process the data transmitted in the MapReduce process of the Spark platform, and may also process data transmitted between nodes of the Spark platform more generally; the embodiments of the present application place no particular limitation on this. The Spark platform comprises a first node and a second node; the first node is connected with a first intelligent network card, and the second node is connected with a second intelligent network card. The method comprises the following steps:
s301: the first intelligent network card acquires data to be transmitted and a target compression ratio;
As a relatively new technology, the intelligent network card was originally designed to support various virtualization functions at a much lower cost than a general-purpose CPU and to assist the CPU in processing network load. It has a programmable network interface, usually comprising a plurality of ports and an internal switch; it forwards data at high speed, intelligently maps the data to the relevant applications based on network packets, application sockets and the like, and detects and manages network traffic. In addition, as the first gateway through which data flows enter and leave, the network card can also implement monitoring and sniffing, helping to defend against network attacks and achieving security isolation.
At present, the mainstream intelligent network card architectures differ and can be roughly divided into three types: Application-Specific Integrated Circuit (ASIC) network cards, Field-Programmable Gate Array (FPGA) network cards, and System-on-Chip (SOC) network cards. An ASIC-based intelligent network card (such as the Mellanox ConnectX-5 series) is low in cost and excellent in performance; this type generally has a programmable interface, but because the processing logic is fixed in the ASIC, the room for flexible control is small. An FPGA-based intelligent network card (such as the Napatech NT100E3-1-PTP series) is more flexible by comparison, but its cost is somewhat higher and it is harder to program. The SOC architecture contains a dedicated CPU (such as the Mellanox BlueField family), providing a balance of performance and controllability; self-developed network cards from various vendors typically use this architecture.
Therefore, the embodiments of the present application adopt an FPGA-based intelligent network card technology: the compression and decompression modules instantiated in the FPGA chip replace the CPU in processing the data of the MapReduce transmission process in the Spark platform.
Example 1: after the Spark platform acquires an HDFS (distributed file system) file, the first intelligent network card directly acquires, through the PCIe interface, the data to be transmitted in the first node and the target compression ratio set by the user.
Example 2: after the Spark platform acquires an HDFS file, the first node performs map processing and shuffle processing on the data to be transmitted in the HDFS file, and the first intelligent network card acquires, through the PCIe interface, the data to be transmitted as processed by the first node and the target compression ratio set by the user.
S302: the first intelligent network card compresses the data to be transmitted according to the target compression ratio to obtain compressed data. Optionally, this includes: the first intelligent network card determines the preamble data and the subsequent data in the data to be transmitted according to the target compression ratio; the first intelligent network card determines a first function rule according to the preamble data and takes the coefficient values of the first function rule as the compressed data, wherein the first function rule is used for reflecting the association relationship between the preamble data and the subsequent data;
the first intelligent network card sends the preamble data, the first function rule and the compressed data to the second intelligent network card, so that the second intelligent network card determines predicted values of the subsequent data according to the preamble data, the first function rule and the compressed data, wherein the predicted values of the subsequent data and the preamble data form the target data.
For example, assume there are 10 pieces of data to be transmitted and the target compression ratio input by the user is 40%. The first intelligent network card divides the data to be transmitted into 6 preamble data and 4 subsequent data, and performs curve fitting on the 6 preamble data to obtain the function rule y = a + b*x, setting the polynomial coefficients {a, b} as the compressed data. It sends the 6 preamble data, the compressed data and the function rule y = a + b*x to the second intelligent network card, which predicts the 4 subsequent data with the formula y = a + b*x to obtain a predicted value for each; the predicted values of the 4 subsequent data and the 6 preamble data form the target data, i.e. the decompressed data.
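The compression step can be sketched as follows in Python. This is a minimal illustration of the scheme described above, with several assumptions: numpy's least-squares polyfit stands in for the fitting logic instantiated in the FPGA, the fit degree is capped for numerical stability (the 10-point example above uses degree 1, i.e. y = a + b*x), and names such as compress are illustrative rather than part of the disclosed embodiment.

```python
import numpy as np

def compress(data, m, max_degree=3):
    """Keep the first n = (1 - m) * K values as preamble data, fit a
    polynomial to them, and treat the coefficient set as the compressed
    payload standing in for the subsequent data."""
    K = len(data)
    n = round((1 - m) * K)                    # number of preamble values
    preamble = np.asarray(data[:n], dtype=float)
    x = np.arange(n)                          # serial number as the fit variable
    degree = min(n - 1, max_degree)           # capped: an exact (n-1)-degree fit
                                              # is numerically unstable for large n
    coeffs = np.polyfit(x, preamble, degree)  # least-squares curve fitting
    return preamble, coeffs

# The example above: 10 values, m = 40%, so 6 preamble values and a
# degree-1 rule y = a + b*x whose coefficients {a, b} are the compressed data.
data = [1.0, 2.1, 2.9, 4.2, 5.0, 6.1, 6.8, 8.2, 9.1, 9.9]
preamble, coeffs = compress(data, m=0.40, max_degree=1)
```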
Optionally, the first function rule satisfies the following formula:

P(x) = p_0 + p_1*x + p_2*x^2 + ... + p_(n-1)*x^(n-1)

wherein x is the serial number of a piece of data, n is the number of the preamble data, p_i is the i-th polynomial coefficient, and P(x) is the value of the data with serial number x; the coefficient set P = {p_i} is the compressed data.
Optionally, the determining, by the first intelligent network card, the preamble data in the data to be transmitted according to the target compression ratio includes: determining the number n of preamble data according to the following formula:
n=(1-m)*K;
wherein m is the target compression ratio, and K is the number of data in the data to be transmitted.
For example, assuming that there are 10 pieces of data to be transmitted, the target compression ratio m is 60%, that is, 60% of the data to be transmitted is ignored during the compression process, and the ignored data is subsequent data, then the number n of the preamble data is 4.
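As a quick numeric check of the formula n = (1-m)*K, a minimal sketch (the variable names are illustrative):

```python
K, m = 10, 0.60
n = round((1 - m) * K)   # n = 4: D0-D3 are preamble data, D4-D9 subsequent data
assert n == 4
```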
It should be understood that the lower the target compression ratio set by the user, the larger the number n of preamble data and the smaller the compression error; the higher the target compression ratio, the smaller n and the larger the compression error.
Optionally, before the compressed data is sent to the second intelligent network card, the method further includes: the first intelligent network card calculates the compression loss rate of the data to be transmitted; the first node outputs prompt information based on the compression loss rate, the prompt information being used to ask the user whether the compression loss rate is acceptable;
the sending, by the first intelligent network card, of the compressed data to the second intelligent network card includes: if the first node receives a determination instruction, the compressed data is sent to the second intelligent network card.
Optionally, after the first node outputs the prompt information based on the compression loss rate, the method further includes:
if the first node receives a negative instruction, prompting the user to reset the target compression ratio, and compressing, by the first intelligent network card, the data to be transmitted based on the reset target compression ratio.
Optionally, the calculating, by the first intelligent network card, of the compression loss rate of the data to be transmitted includes:

when x is the serial number of the j-th subsequent data, the first intelligent network card determines a predicted value of the j-th subsequent data according to the formula

P(x) = p_0 + p_1*x + p_2*x^2 + ... + p_(n-1)*x^(n-1);

the first intelligent network card calculates the difference between the predicted value of the j-th subsequent data and the true value of the j-th subsequent data, and determines the sum of the differences;

the first intelligent network card calculates the compression loss rate according to the formula

E = E(j) / Σ_j D_j;

wherein E is the compression loss rate, E(j) is the sum of the differences, and D_j is the true value of the j-th subsequent data.
For example, taking the first node and the second node in the Spark platform, assume there are 10 data in the data to be transmitted and the target compression ratio first input by the user is 60%; the 10 data are numbered "D0-D9" in sequence. According to this target compression ratio, the 10 data are divided into 4 preamble data numbered "D0-D3" and 6 subsequent data numbered "D4-D9", and polynomial fitting is performed on the preamble data numbered "D0-D3";
referring to fig. 4, fig. 4 is a graph of n-order polynomial fitting according to data numbered as "D0-D9" provided in this embodiment of the present application, and a first function rule is obtained by fitting a first half of a curve according to data numbered as "D0-D3", where the first function rule satisfies the following formula
Figure BDA0002212853330000103
Wherein x is the number of the ith data in the preamble data, n is the number of the preamble data, and p i The coefficient of the ith data in the preamble data is also the polynomial coefficient of the first function rule; polynomial coefficient set P = { P) for saving first function rule i As compressed data.
Before the first intelligent network card sends the compressed data to the second intelligent network card, it predicts the subsequent data according to the first function rule to obtain their predicted values. That is, when x is the serial number of the j-th subsequent data, the subsequent data numbered "D4-D9" are predicted with the formula

P(x) = p_0 + p_1*x + p_2*x^2 + ... + p_(n-1)*x^(n-1)

to obtain the predicted values of the subsequent data numbered "D4-D9", which fit the second half of the curve. As shown in fig. 4, the round points on the second half of the curve are the predicted values of the subsequent data and the small squares are their true values; from these, the difference between the true value and the predicted value of each subsequent data can be calculated, and then the sum of the differences. The first intelligent network card calculates the compression loss rate of the 10 data, which is determined mainly by the loss on the subsequent data. Specifically: the first intelligent network card calculates the differences between the predicted values of the subsequent data numbered "D4-D9" and their true values, and adds the differences to obtain the error sum; it then calculates the compression loss rate of the 10 data according to the formula

E = E(j) / Σ_j D_j;

wherein E is the compression loss rate, E(j) is the sum of the differences, and D_j is the true value of the j-th subsequent data.
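The loss computation can be sketched as follows, reusing the compress() sketch above. The normalisation E = E(j) / ΣD_j is our reading of the formula reconstructed here (the original equation is rendered as an image in the source), so treat it as an assumption; the absolute values guard against sign cancellation.

```python
import numpy as np

def compression_loss_rate(data, preamble, coeffs):
    """E = E(j) / sum(|D_j|): summed |predicted - true| over the subsequent
    data, normalised by the summed magnitude of their true values."""
    K, n = len(data), len(preamble)
    x = np.arange(n, K)                       # serial numbers of subsequent data
    predicted = np.polyval(coeffs, x)         # first function rule P(x)
    true_vals = np.asarray(data[n:], dtype=float)
    error_sum = np.sum(np.abs(predicted - true_vals))   # E(j)
    return error_sum / np.sum(np.abs(true_vals))        # E
```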
After obtaining the compression loss rate of the 10 data, the first node may further generate a prompt message according to the compression loss rate; the prompt message may be a dialog box such as "Accept this compression loss rate? If yes, proceed to the next step; if no, reset the target compression ratio." The user gives a determination instruction or a negative instruction according to the prompt message. If the first intelligent network card receives a determination instruction transmitted from the first node, it sends the compressed data to the second intelligent network card. If the first intelligent network card receives a negative instruction transmitted from the first node, it outputs information through the first node prompting the user to reset the target compression ratio, and compresses the data to be transmitted based on the reset ratio; these steps repeat until a loss rate acceptable to the user is obtained, and the target compression ratio finally set by the user is taken as the optimal compression ratio. The detailed flow is shown in fig. 5, which is a flowchart of a user obtaining the optimal compression ratio according to an embodiment of the present application; the user may set the target compression ratio multiple times according to how acceptable the compression loss rate is, until it can be accepted.
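The accept-or-reset loop of fig. 5 might look like the following sketch; the accept callback models the user's determination or negative instruction and is purely illustrative.

```python
def choose_ratio(data, m, accept, max_rounds=8):
    """Compress at ratio m, report the loss rate, and either stop on a
    determination instruction or retry with the ratio the user resets."""
    for _ in range(max_rounds):
        preamble, coeffs = compress(data, m)
        loss = compression_loss_rate(data, preamble, coeffs)
        verdict = accept(loss)       # True to accept, or a new ratio in (0, 1)
        if verdict is True:
            return m, preamble, coeffs   # m is the optimal compression ratio
        m = verdict
    return m, preamble, coeffs
```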
It should be understood that different users need to process different types of data and tolerate different data compression loss rates. In the data integration method based on the intelligent network card provided in the embodiments of the present application, a user may therefore select a target compression ratio according to actual needs and thereby arrive at a compression loss rate the user can accept; once the user's final target compression ratio is determined, that is, after a determination instruction is received, the first intelligent network card compresses the data to be transmitted according to that target compression ratio in the actual operating environment.
S303: the first intelligent network card sends the compressed data to the second intelligent network card;
As shown in fig. 1, the network port module of the first intelligent network card communicates with the network port module of the second intelligent network card over a network link; the first intelligent network card sends the compressed data to the second intelligent network card through the network port module, and at the same time also sends the preamble data of the data to be transmitted to the second intelligent network card over the network link.
S304: the second intelligent network card decompresses the compressed data to obtain target data;
Illustratively, after the second intelligent network card receives the compressed data and the preamble data of the data to be transmitted, the decompression module in the FPGA chip of the second intelligent network card decompresses the compressed data according to the built-in algorithm and the preamble data to obtain the target data.
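Decompression on the receiving side then amounts to evaluating the received function rule, as in this sketch (a software stand-in for the FPGA decompression module, with illustrative names):

```python
import numpy as np

def decompress(preamble, coeffs, K):
    """Predict the K - n subsequent values from the received rule and append
    them to the preamble data to reconstruct the target data."""
    n = len(preamble)
    predicted = np.polyval(coeffs, np.arange(n, K))  # predicted subsequent data
    return np.concatenate([preamble, predicted])     # target data
```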
S305: the second intelligent network card sends the target data to the second node;
For example, as shown in fig. 1, the second intelligent network card sends the target data to the second node through the PCIe interface.
S306: and the second node integrates the target data.
Illustratively, the second node may be a CPU, and the CPU performs a series of processing on the target data obtained after decompression according to the built-in logic of the Spark platform; the processing may be the map processing of the MapReduce process, the shuffle processing of the intermediate stage, or the reduce processing, which is not specifically limited in the embodiments of the present application.
A complete embodiment is described below.
Taking two nodes in Spark as an example: after the Spark platform acquires an HDFS file, the first intelligent network card directly acquires, through the PCIe interface, the data to be transmitted in the first node. Assume there are 2000 data in the data to be transmitted, numbered "D0-D1999" in sequence, and that the target compression ratio first input by the user is 40%. The first intelligent network card divides the data to be transmitted into 1200 preamble data numbered "D0-D1199" and 800 subsequent data numbered "D1200-D1999" (n = (1-0.4)*2000 = 1200), and performs curve fitting on the 1200 preamble data to obtain the function rule

P(x) = p_0 + p_1*x + p_2*x^2 + ... + p_(n-1)*x^(n-1)

setting the polynomial coefficients p_0, …, p_1199 as the compressed data;
Before the first intelligent network card sends the compressed data to the second intelligent network card, it predicts the subsequent data according to the first function rule to obtain their predicted values. That is, when x is the serial number of the j-th subsequent data, the subsequent data numbered "D1200-D1999" are predicted with the formula

P(x) = p_0 + p_1*x + p_2*x^2 + ... + p_(n-1)*x^(n-1)

to obtain the predicted values of the subsequent data numbered "D1200-D1999"; the difference between the true value and the predicted value of each subsequent data is calculated, and then the sum of the differences. The first intelligent network card calculates the compression loss rate of the 2000 data, which is determined mainly by the loss on the subsequent data. Specifically: the first intelligent network card calculates the differences between the predicted values of the subsequent data numbered "D1200-D1999" and their true values, and adds the 800 differences to obtain the error sum; it then calculates the compression loss rate of the 2000 data according to the formula

E = E(j) / Σ_j D_j;

wherein E is the compression loss rate, E(j) is the sum of the differences, and D_j is the true value of the j-th subsequent data.
After the compression loss rate of the 2000 data is obtained, the first node may further generate a prompt message according to the compression loss rate; the prompt message may be a dialog box such as "Accept this compression loss rate? If yes, proceed to the next step; if no, reset the target compression ratio." The user gives a determination instruction or a negative instruction according to the prompt information.
If the first intelligent network card receives a determination instruction transmitted from the first node, it sends the compressed data to the second intelligent network card. If it receives a negative instruction transmitted from the first node, it outputs information through the first node prompting the user to reset the target compression ratio and compresses the data to be transmitted based on the reset ratio; these steps repeat until a loss rate acceptable to the user is obtained, and the target compression ratio finally set by the user is taken as the optimal compression ratio.
The first intelligent network card sends the compressed data, together with the preamble data numbered "D0-D1199" and the formula

P(x) = p_0 + p_1*x + p_2*x^2 + ... + p_(n-1)*x^(n-1)

to the second intelligent network card, and the second intelligent network card decompresses the compressed data. Specifically: the 1200 preamble data, the polynomial coefficient set p_0, …, p_1199 and the formula are sent to the second intelligent network card; the second intelligent network card uses the formula to predict the 800 subsequent data, obtaining a predicted value for each; the predicted values of the 800 subsequent data and the 1200 preamble data form the target data, i.e. the decompressed data.
The second intelligent network card sends the target data to the second node through the PCIe interface, and the second node performs reduce processing or map processing on the target data.
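Putting the sketches together, a hypothetical end-to-end run of this embodiment (with synthetic data in place of the HDFS file, and the capped-degree fit noted earlier rather than the patent's 1200-coefficient fit):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.cumsum(rng.normal(size=2000))        # synthetic smooth-ish series, K = 2000
preamble, coeffs = compress(data, m=0.40)      # 1200 preamble values kept
target = decompress(preamble, coeffs, K=2000)  # 1200 kept + 800 predicted
loss = compression_loss_rate(data, preamble, coeffs)  # shown to the user
```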
It should be noted that the time the intelligent network cards spend compressing and decompressing the transmitted data in the MapReduce process of the Spark platform is less than the transmission time saved by the compression; compressing the data therefore not only frees CPU resources but also does not eat into the CPU's data-processing time.
Based on the same inventive concept, an embodiment of the invention provides a data integration apparatus based on an intelligent network card, applied to a Spark platform. Taking a first node and a second node of the Spark platform as an example, the first node is connected with a first intelligent network card and the second node is connected with a second intelligent network card. Referring to fig. 6, fig. 6 is a schematic structural diagram of a data integration apparatus based on an intelligent network card according to an embodiment of the present invention. As shown in fig. 6, the apparatus includes an obtaining module 601, a processing module 602, and a communication module 603.
An obtaining module 601, configured to obtain data to be transmitted and a target compression ratio;
the processing module 602 is configured to compress the data to be transmitted according to the target compression ratio to obtain compressed data;
the communication module 603 is configured to send the compressed data to a second intelligent network card;
the processing module 602 is further configured to decompress the compressed data to obtain target data;
the communication module 603 is further configured to send the target data to the second node;
the processing module 602 is further configured to perform integration processing on the target data;
wherein the first node and the second node are two nodes in a Spark platform.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (9)

1. A data integration method based on an intelligent network card is characterized by comprising the following steps:
the first intelligent network card acquires data to be transmitted and a target compression ratio from a first node;
the first intelligent network card determines preamble data and subsequent data in the data to be transmitted according to the target compression ratio; determines a first function rule according to the preamble data; and takes the coefficient values of the first function rule as compressed data, wherein the first function rule is used for reflecting the association relationship between the preamble data and the subsequent data;
the first intelligent network card sends the preamble data, the first function rule and the compressed data to a second intelligent network card, so that the second intelligent network card determines predicted values of the subsequent data according to the preamble data, the first function rule and the compressed data, and the predicted values of the subsequent data and the preamble data form target data; and the second intelligent network card sends the target data to a second node so that the second node integrates the target data, wherein the first node and the second node are two nodes in a big data compute engine platform.
2. The method of claim 1, wherein the first function rule satisfies the following formula:

P(x) = p_0 + p_1*x + p_2*x^2 + ... + p_(n-1)*x^(n-1)

wherein x is the serial number of a piece of data, n is the number of the preamble data, p_i is the i-th polynomial coefficient, and P(x) is the value of the data with serial number x; the coefficient set P = {p_i} is the compressed data.
3. The method of claim 1, wherein determining preamble data in the data to be transmitted according to a target compression ratio comprises:
determining the number n of preamble data according to the following formula:
n=(1-m)*K;
wherein m is the target compression ratio, and K is the number of data in the data to be transmitted.
4. The method of claim 2, wherein prior to sending the compressed data to the second smart network card, the method further comprises:
calculating the compression loss rate of the data to be transmitted;
outputting prompt information based on the compression loss rate, wherein the prompt information is used for asking a user whether the compression loss rate is acceptable;
sending the compressed data to the second intelligent network card, including:
if a determination instruction is received, sending the compressed data to the second intelligent network card.
5. The method of claim 4, wherein outputting a hint information based on the compression loss rate further comprises:
if a negative instruction is received, prompting the user to reset the target compression ratio, and compressing, by the first intelligent network card, the data to be transmitted based on the reset target compression ratio.
6. The method of claim 4, wherein calculating the compression loss rate of the data to be transmitted comprises:

when x is the serial number of the j-th subsequent data, determining a predicted value of the j-th subsequent data according to the formula

P(x) = p_0 + p_1*x + p_2*x^2 + ... + p_(n-1)*x^(n-1);

calculating the difference between the predicted value of the j-th subsequent data and the true value of the j-th subsequent data, and determining the sum of the differences;

calculating the compression loss rate according to the formula

E = E(j) / Σ_j D_j;

wherein E is the compression loss rate, E(j) is the sum of the differences, and D_j is the true value of the j-th subsequent data.
7. A data integration apparatus based on an intelligent network card, applied to a first intelligent network card, wherein the apparatus comprises:
an obtaining module, configured to obtain data to be transmitted and a target compression ratio from a first node;
the processing module is used for determining preamble data and subsequent data in the data to be transmitted according to the target compression ratio, determining a first function rule according to the preamble data, and taking the coefficient values of the first function rule as compressed data, wherein the first function rule is used for reflecting the association relationship between the preamble data and the subsequent data;
the communication module is used for sending the preamble data, the first function rule and the compressed data to a second intelligent network card, so that the second intelligent network card determines predicted values of the subsequent data according to the preamble data, the first function rule and the compressed data, the predicted values of the subsequent data and the preamble data forming target data, and the target data is sent to a second node so that the second node integrates the target data;
wherein the first node and the second node are two nodes in a big data compute engine platform.
8. An intelligent network card, comprising: a memory, a processor, and a program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the data integration method based on an intelligent network card according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions that, when executed by a computer, cause the computer to perform the method according to any one of claims 1-6.
CN201910904415.4A 2019-09-24 2019-09-24 Data integration method and device based on intelligent network card Active CN110677402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910904415.4A CN110677402B (en) 2019-09-24 2019-09-24 Data integration method and device based on intelligent network card


Publications (2)

Publication Number Publication Date
CN110677402A CN110677402A (en) 2020-01-10
CN110677402B true CN110677402B (en) 2022-12-20

Family

ID=69078617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910904415.4A Active CN110677402B (en) 2019-09-24 2019-09-24 Data integration method and device based on intelligent network card

Country Status (1)

Country Link
CN (1) CN110677402B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113778320A (en) * 2020-06-09 2021-12-10 华为技术有限公司 Network card and method for processing data by network card
CN113438219B (en) 2020-07-08 2023-06-02 支付宝(杭州)信息技术有限公司 Playback transaction identification method and device based on blockchain all-in-one machine
CN113726875A (en) 2020-07-08 2021-11-30 支付宝(杭州)信息技术有限公司 Transaction processing method and device based on block chain all-in-one machine
CN111541783B (en) 2020-07-08 2020-10-20 支付宝(杭州)信息技术有限公司 Transaction forwarding method and device based on block chain all-in-one machine
CN111541789A (en) * 2020-07-08 2020-08-14 支付宝(杭州)信息技术有限公司 Data synchronization method and device based on block chain all-in-one machine
CN111539829B (en) 2020-07-08 2020-12-29 支付宝(杭州)信息技术有限公司 To-be-filtered transaction identification method and device based on block chain all-in-one machine
CN112596669A (en) * 2020-11-25 2021-04-02 新华三云计算技术有限公司 Data processing method and device based on distributed storage

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020205A (en) * 2012-12-05 2013-04-03 北京普泽天玑数据技术有限公司 Compression and decompression method based on hardware accelerator card on distributive-type file system
CN106713394A (en) * 2015-11-16 2017-05-24 华为技术有限公司 Data transmission method and device
CN110177083A (en) * 2019-04-26 2019-08-27 阿里巴巴集团控股有限公司 A kind of network interface card, data transmission/method of reseptance and equipment


Also Published As

Publication number Publication date
CN110677402A (en) 2020-01-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant