CN113626405A - HDFS network data transmission optimization method, system, terminal and storage medium - Google Patents

Publication number
CN113626405A
Authority
CN
China
Prior art keywords
data
sent
checksum
file
length
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110779799.9A
Other languages
Chinese (zh)
Inventor
贾涛
王帅阳
李文鹏
李朝阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinan Inspur Data Technology Co Ltd
Original Assignee
Jinan Inspur Data Technology Co Ltd
Application filed by Jinan Inspur Data Technology Co Ltd
Priority to CN202110779799.9A
Publication of CN113626405A
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/18: File system types
    • G06F16/182: Distributed file systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00: Arrangements for detecting or preventing errors in the information received
    • H04L1/004: Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0041: Arrangements at the transmitter end
    • H04L1/0045: Arrangements at the receiver end
    • H04L1/0056: Systems characterized by the type of code used

Abstract

The invention provides a method, a system, a terminal and a storage medium for optimizing HDFS network data transmission. The method comprises the following steps: a sending end reads data to be sent and calculates the checksum of the data to be sent; the data to be sent and the checksum are encapsulated into a file, and the position information of the data to be sent and of the checksum is recorded in the file header; and the file is sent to a receiving end, which parses the file to obtain the data to be sent, verifies the data, and persists it to disk. According to the invention, the checksum is not persisted to disk but is generated dynamically during transmission, which saves a large amount of storage resources, and calculating the checksum in memory is more efficient than reading a checksum file directly from disk; not persisting the checksum also simplifies the processing logic of the receiving end when receiving data and improves the efficiency of receiving and processing data.

Description

HDFS network data transmission optimization method, system, terminal and storage medium
Technical Field
The invention relates to the technical field of servers, in particular to a method, a system, a terminal and a storage medium for optimizing HDFS (Hadoop distributed File System) network data transmission.
Background
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has much in common with existing distributed file systems, but it also differs from them in clear ways. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is well suited to applications with very large data sets. HDFS relaxes some POSIX requirements so that data in the file system can be accessed as a stream. HDFS was originally developed as infrastructure for the Apache Nutch search engine project and is part of the Apache Hadoop Core project.
Data transmission in HDFS falls into three main categories: from a client to a server, from a DataNode to a client, and between DataNodes. To ensure the integrity and accuracy of data during transmission, HDFS mostly uses checksums to verify the transmitted data. In current HDFS, when a client uploads data to a server-side DataNode, a checksum is calculated and uploaded together with the data; after receiving them, the DataNode verifies the data and persists both the data and the checksum to disk. When a client reads data from a DataNode, the DataNode reads the data and the checksum from the local disk, verifies the data, and then sends it to the client. This presents several problems. First, the checksum data occupies disk space: with commonly used checksum algorithms the checksum data amounts to about 1% of the total data length, which is a considerable amount of storage when large volumes of data are stored. Second, the checksum is stored on disk, so if the checksum data is damaged, the real data cannot pass verification and the client cannot read the actual data normally. Third, the transmission and processing schemes differ among the three data transmission scenarios.
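The "about 1%" figure can be checked with a short calculation. The sketch below is illustrative only: it assumes one 4-byte CRC for every 512 bytes of data and a 128 MiB block, values that are common in HDFS deployments but are not stated in this document; `checksum_overhead` is a hypothetical helper name.

```python
def checksum_overhead(data_len, bytes_per_checksum=512, checksum_size=4):
    """Return (checksum_bytes, fraction_of_data) for a per-chunk checksum.

    bytes_per_checksum and checksum_size are assumed values, not taken
    from the patent text.
    """
    chunks = -(-data_len // bytes_per_checksum)  # ceiling division
    checksum_bytes = chunks * checksum_size
    fraction = checksum_bytes / data_len if data_len else 0.0
    return checksum_bytes, fraction


block_len = 128 * 1024 * 1024  # assumed block size: 128 MiB
size, fraction = checksum_overhead(block_len)
print(f"checksum bytes per block: {size}, overhead: {fraction:.2%}")
```

Under these assumptions the checksum data for one block is 1 MiB, slightly under 1% of the block, which is consistent with the order of magnitude given above.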
Disclosure of Invention
In view of the above-mentioned deficiencies of the prior art, the present invention provides a method, a system, a terminal and a storage medium for optimizing data transmission in an HDFS network, so as to solve the above-mentioned technical problems.
In a first aspect, the present invention provides a method for optimizing data transmission in an HDFS network, including:
a sending end reads data to be sent and calculates the checksum of the data to be sent;
encapsulating the data to be sent and the checksum into a file, and recording the position information of the data to be sent and of the checksum in the file header;
and sending the file to a receiving end, wherein the receiving end parses the file to obtain the data to be sent, verifies the data to be sent, and persists it to disk.
Further, the step of reading the data to be sent and calculating the checksum of the data to be sent by the sending end includes:
and the sending end reads the data to be sent from the disk, and calculates the checksum of the data to be sent in the memory according to the checksum type and the length of the data to be sent.
Presetting the checksum type makes the storage system more customizable and the verification result more accurate. Calculating the checksum of the data to be sent in memory greatly improves the efficiency of checksum calculation. In-memory computing embeds the computing unit in the memory, so that the memory serves for both storage and computation; data does not have to be fetched from a separate storage device and passes directly to and from the CPU. The calculation is therefore not limited by storage performance, and the energy efficiency ratio is improved.
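As an illustration of calculating the checksum in memory according to a checksum type and a data length, the following minimal sketch computes one CRC32 value per fixed-size chunk of a buffer. The choice of CRC32, the 512-byte chunk length and the name `compute_checksums` are assumptions made for the example; the document does not fix any of them.

```python
import zlib


def compute_checksums(data: bytes, bytes_per_checksum: int = 512) -> bytes:
    """Compute one 4-byte big-endian CRC32 per chunk, entirely in memory."""
    out = bytearray()
    for start in range(0, len(data), bytes_per_checksum):
        chunk = data[start:start + bytes_per_checksum]
        out += zlib.crc32(chunk).to_bytes(4, "big")
    return bytes(out)


payload = b"x" * 1300  # stands in for data already read from disk
sums = compute_checksums(payload)
print(len(sums), "checksum bytes for", len(payload), "data bytes")
```

No checksum file is read or written here: the checksum exists only as a byte string alongside the data buffer.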
Further, encapsulating the data to be sent and the checksum into a file, and recording the position information of the data to be sent and of the checksum in the file header, includes:
calculating the header length, the checksum starting position, the checksum data length, the starting position of the data to be sent and the length of the data to be sent according to the length of the data to be sent and the checksum type;
storing the data to be sent and the checksum at the corresponding positions of the file according to the starting position of the data to be sent and the checksum starting position, respectively;
and saving the checksum starting position, the checksum data length, the starting position of the data to be sent and the length of the data to be sent in the file header.
Because the checksum starting position, the checksum data length, the starting position of the data to be sent and the length of the data to be sent are stored in the file header, the receiving end can quickly extract the data and the checksum from this information once the file arrives, which simplifies the data processing logic of the receiving end and improves data processing efficiency.
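One possible header layout can be sketched as follows. The document does not specify field widths or byte order, so the four unsigned 64-bit big-endian fields (`HEADER_FMT`), the per-chunk CRC32 and the name `build_file` are all hypothetical choices for illustration.

```python
import struct
import zlib

# Hypothetical header: checksum_off, checksum_len, data_off, data_len
HEADER_FMT = ">QQQQ"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # header length follows from the format
CHECKSUM_SIZE = 4                         # assumed: CRC32
BYTES_PER_CHECKSUM = 512                  # assumed chunk length


def build_file(data: bytes) -> bytes:
    """Lay out header | checksum | data and record the positions in the header."""
    checksum = b"".join(
        zlib.crc32(data[i:i + BYTES_PER_CHECKSUM]).to_bytes(CHECKSUM_SIZE, "big")
        for i in range(0, len(data), BYTES_PER_CHECKSUM)
    )
    checksum_off = HEADER_LEN                  # checksum starts right after the header
    data_off = checksum_off + len(checksum)    # data starts right after the checksum
    header = struct.pack(HEADER_FMT, checksum_off, len(checksum), data_off, len(data))
    return header + checksum + data


packed = build_file(b"hello hdfs" * 100)
fields = struct.unpack_from(HEADER_FMT, packed)
print(fields)
```

All four recorded values are derived from the data length and the checksum type alone, so the sender can compute the complete layout before writing a single payload byte.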
Further, the receiving end parsing the file to obtain the data to be sent, verifying the data to be sent and persisting it to disk includes:
the receiving end extracts the data to be sent and the checksum from the file according to the checksum starting position, the checksum data length, the starting position of the data to be sent and the length of the data to be sent recorded in the file header;
verifying the data to be sent according to the checksum;
and persisting the data to be sent that passes verification to disk, and discarding the checksum.
The receiving end parses the file quickly, persists only the received data, and discards the checksum data directly, which saves a large amount of storage resources in a big data storage system.
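The receiving side can be sketched in the same spirit. The example below assumes the same hypothetical header layout (four unsigned 64-bit fields) and per-chunk CRC32 as a stand-in for whatever checksum type is configured; `pack` and `receive` are illustrative names, and a temporary directory stands in for the DataNode's storage.

```python
import os
import struct
import tempfile
import zlib

HEADER_FMT = ">QQQQ"  # hypothetical: checksum_off, checksum_len, data_off, data_len
HEADER_LEN = struct.calcsize(HEADER_FMT)
CHUNK = 512           # assumed bytes covered by each 4-byte CRC32


def crc_chunks(data: bytes) -> bytes:
    return b"".join(
        zlib.crc32(data[i:i + CHUNK]).to_bytes(4, "big")
        for i in range(0, len(data), CHUNK)
    )


def pack(data: bytes) -> bytes:
    """Sender side, included only so the example is self-contained."""
    checksum = crc_chunks(data)
    header = struct.pack(HEADER_FMT, HEADER_LEN, len(checksum),
                         HEADER_LEN + len(checksum), len(data))
    return header + checksum + data


def receive(file_bytes: bytes, out_path: str) -> bool:
    """Extract by header positions, verify, and persist only the data."""
    c_off, c_len, d_off, d_len = struct.unpack_from(HEADER_FMT, file_bytes)
    checksum = file_bytes[c_off:c_off + c_len]
    data = file_bytes[d_off:d_off + d_len]
    if crc_chunks(data) != checksum:
        return False                 # verification failed: nothing is written
    with open(out_path, "wb") as f:
        f.write(data)                # the checksum is discarded, not stored
    return True


payload = b"block-data" * 200
good = pack(payload)
bad = bytearray(good)
bad[-1] ^= 0xFF                      # simulate corruption in transit
with tempfile.TemporaryDirectory() as tmp:
    ok = receive(good, os.path.join(tmp, "blk_ok"))
    stored_size = os.path.getsize(os.path.join(tmp, "blk_ok"))
    rejected = not receive(bytes(bad), os.path.join(tmp, "blk_bad"))
    bad_exists = os.path.exists(os.path.join(tmp, "blk_bad"))
```

The stored file has exactly the length of the payload, which reflects the storage saving described above: no checksum bytes reach the disk.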
In a second aspect, the present invention provides an HDFS network data transmission optimization system, including:
the data calculation unit is used for the sending end to read the data to be sent and calculate the checksum of the data to be sent;
the information storage unit is used for encapsulating the data to be sent and the checksum into a file and recording the position information of the data to be sent and of the checksum in the file header;
and the data sending unit is used for sending the file to a receiving end, wherein the receiving end parses the file to obtain the data to be sent, verifies the data to be sent, and persists it to disk.
Further, the data calculation unit includes:
and the check calculation module is used for the sending end to read the data to be sent from the disk and calculate the checksum of the data to be sent in memory according to the checksum type and the length of the data to be sent.
Further, the information storage unit includes:
the information calculation module is used for calculating the header length, the checksum starting position, the checksum data length, the starting position of the data to be sent and the length of the data to be sent according to the length of the data to be sent and the checksum type;
the data storage module is used for storing the data to be sent and the checksum at the corresponding positions of the file according to the starting position of the data to be sent and the checksum starting position, respectively;
and the information storage module is used for saving the checksum starting position, the checksum data length, the starting position of the data to be sent and the length of the data to be sent in the file header.
Further, the receiving end parsing the file to obtain the data to be sent, verifying the data to be sent and persisting it to disk includes:
the receiving end extracts the data to be sent and the checksum from the file according to the checksum starting position, the checksum data length, the starting position of the data to be sent and the length of the data to be sent recorded in the file header;
verifying the data to be sent according to the checksum;
and persisting the data to be sent that passes verification to disk, and discarding the checksum.
In a third aspect, a terminal is provided, including:
a processor and a memory, wherein
the memory is used for storing a computer program, and
the processor is used for calling and running the computer program from the memory, so that the terminal performs the method described above.
In a fourth aspect, a computer storage medium is provided having stored therein instructions that, when executed on a computer, cause the computer to perform the method of the above aspects.
The beneficial effects of the invention are as follows.
In the HDFS network data transmission optimization method, the checksum of the data to be sent is calculated before the data is sent, the data to be sent and the checksum are sent to the receiving end, and the receiving end parses the received file, verifies the data, and then persists the data to disk, while the checksum is no longer persisted. According to the invention, the checksum is not persisted to disk but is generated dynamically during transmission, which saves a large amount of storage resources, and calculating the checksum in memory is more efficient than reading a checksum file directly from disk; not persisting the checksum also simplifies the processing logic of the receiving end when receiving data and improves the efficiency of receiving and processing data.
In the HDFS network data transmission optimization system provided by the invention, the data calculation unit calculates the checksum of the data to be sent before the data is sent, the information storage unit encapsulates the information, and the data sending unit sends the data to be sent and the checksum to the receiving end; the receiving end parses the received file, verifies the data, and then persists the data to disk, while the checksum is no longer persisted. According to the invention, the checksum is not persisted to disk but is generated dynamically during transmission, which saves a large amount of storage resources, and calculating the checksum in memory is more efficient than reading a checksum file directly from disk; not persisting the checksum also simplifies the processing logic of the receiving end when receiving data and improves the efficiency of receiving and processing data.
The terminal provided by the invention comprises a processor whose operation implements the HDFS network data transmission optimization method. The checksum is not persisted to disk but is generated dynamically during transmission, which saves a large amount of storage resources, and calculating the checksum in memory is more efficient than reading a checksum file directly from disk; not persisting the checksum also simplifies the processing logic of the receiving end when receiving data and improves the efficiency of receiving and processing data.
The storage medium provided by the invention stores a program for executing the HDFS network data transmission optimization method. The checksum is not persisted to disk but is generated dynamically during transmission, which saves a large amount of storage resources, and calculating the checksum in memory is more efficient than reading a checksum file directly from disk; not persisting the checksum also simplifies the processing logic of the receiving end when receiving data and improves the efficiency of receiving and processing data.
In addition, the invention has reliable design principle, simple structure and very wide application prospect.
Drawings
In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present invention, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.
FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention.
FIG. 2 is a schematic block diagram of a system of one embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The following explains key terms appearing in the present invention.
A checksum, in the fields of data processing and data communication, is a sum computed over a set of data items and used to verify that set at the destination. It is usually expressed in hexadecimal. If the checksum value exceeds hexadecimal FF, i.e., 255, its complement is taken as the checksum. Checksums are commonly used to ensure data integrity and accuracy in communications, particularly over long distances.
FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention. The execution subject in fig. 1 may be an HDFS network data transmission optimization system.
As shown in fig. 1, the method includes:
step 110, a sending end reads data to be sent and calculates a checksum of the data to be sent;
step 120, encapsulating the data to be sent and the checksum into a file, and recording the position information of the data to be sent and of the checksum in the file header;
step 130, sending the file to a receiving end, wherein the receiving end parses the file to obtain the data to be sent, verifies the data to be sent, and persists it to disk.
In order to facilitate understanding of the present invention, the HDFS network data transmission optimization method provided by the present invention is further described below with reference to the principle of the HDFS network data transmission optimization method of the present invention and the process of optimizing HDFS network data transmission in the embodiments.
Specifically, the method for optimizing data transmission of the HDFS network includes:
S1, the sending end reads the data to be sent and calculates the checksum of the data to be sent.
The sending end reads the data to be sent from the disk, and calculates the checksum of the data to be sent in memory according to the checksum type and the length of the data to be sent.
Presetting the checksum type makes the storage system more customizable and the verification result more accurate. Calculating the checksum of the data to be sent in memory greatly improves the efficiency of checksum calculation. In-memory computing embeds the computing unit in the memory, so that the memory serves for both storage and computation; data does not have to be fetched from a separate storage device and passes directly to and from the CPU. The calculation is therefore not limited by storage performance, and the energy efficiency ratio is improved.
S2, the data to be sent and the checksum are encapsulated into a file, and the position information of the data to be sent and of the checksum is recorded in the file header.
The header length, the checksum starting position, the checksum data length, the starting position of the data to be sent and the length of the data to be sent are calculated according to the length of the data to be sent and the checksum type; the data to be sent and the checksum are stored at the corresponding positions of the file according to the starting position of the data to be sent and the checksum starting position, respectively; and the checksum starting position, the checksum data length, the starting position of the data to be sent and the length of the data to be sent are saved in the file header.
The data sending end reads the data to be sent from the disk, calculates the Header length, the checksum starting position, the checksum data length, the data starting position and the data length according to the data length and the checksum type, and records this information in the Header. The checksum is stored at the corresponding position of the file according to the checksum starting position, and the data is stored at the corresponding position of the file according to the data starting position.
Because the checksum starting position, the checksum data length, the data starting position and the data length are stored in the Header of the file, the receiving end can quickly extract the data and the checksum from this information once the file arrives, which simplifies the data processing logic of the receiving end and improves data processing efficiency.
S3, the file is sent to a receiving end, and the receiving end parses the file to obtain the data to be sent, verifies the data to be sent, and persists it to disk.
The receiving end extracts the data to be sent and the checksum from the file according to the checksum starting position, the checksum data length, the starting position of the data to be sent and the length of the data to be sent recorded in the file header; verifies the data to be sent according to the checksum; and persists the data to be sent that passes verification to disk and discards the checksum.
The file from step S2 is sent to the receiving end. After receiving the file, the receiving end reads the file header to obtain the checksum starting position, the checksum data length, the starting position of the data to be sent and the length of the data to be sent. It then parses the checksum from the corresponding position of the file according to the checksum starting position and the checksum data length, and parses the data to be sent from the corresponding position of the file according to the starting position and the length of the data to be sent.
The receiving end performs an integrity check on the data to be sent using the parsed checksum; after the check passes, it persists the data to be sent to disk and discards the checksum. If the data fails verification, the receiving end returns a data error prompt to the sending end, and the sending end sends the data again.
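The resend-on-failure behaviour can be sketched as a small retry loop. Everything here is a simplified assumption: a single CRC32 over the whole payload stands in for the configured checksum type, `FlakyChannel` is a hypothetical stand-in for the network that corrupts the first transmission, and the cap of three attempts is not taken from the document.

```python
import zlib


def make_packet(data: bytes) -> bytes:
    """Prefix the payload with its CRC32 (simplified packet, no header fields)."""
    return zlib.crc32(data).to_bytes(4, "big") + data


def verify(packet: bytes):
    """Return the payload if the checksum matches, otherwise None."""
    checksum, data = packet[:4], packet[4:]
    return data if zlib.crc32(data).to_bytes(4, "big") == checksum else None


class FlakyChannel:
    """Hypothetical network stand-in that corrupts the first `failures` packets."""

    def __init__(self, failures: int):
        self.failures = failures

    def __call__(self, packet: bytes) -> bytes:
        if self.failures > 0:
            self.failures -= 1
            corrupted = bytearray(packet)
            corrupted[-1] ^= 0x01    # flip one bit of the payload
            return bytes(corrupted)
        return packet


def send_with_retry(data: bytes, channel, max_attempts: int = 3):
    """Resend until the receiver's verification passes or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        received = verify(channel(make_packet(data)))
        if received is not None:
            return received, attempt
        # a failed check plays the role of the "data error prompt"
    raise IOError("verification failed after %d attempts" % max_attempts)


data, attempts = send_with_retry(b"payload", FlakyChannel(failures=1))
```

Because the checksum is recomputed from the source data on every attempt, a resend never depends on a stored checksum file being intact.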
The receiving end parses the file quickly, persists only the received data, and discards the checksum data directly, which saves a large amount of storage resources in a big data storage system.
In the HDFS network data transmission optimization method of this embodiment, the checksum of the data to be sent is calculated before the data is sent, the data to be sent and the checksum are sent to the receiving end, and the receiving end parses the received file, verifies the data, and then persists the data to disk, while the checksum is no longer persisted. In this embodiment, the checksum is not persisted to disk but is generated dynamically during transmission, which saves a large amount of storage resources, and calculating the checksum in memory is more efficient than reading a checksum file directly from disk; not persisting the checksum also simplifies the processing logic of the receiving end when receiving data and improves the efficiency of receiving and processing data.
The specific implementation steps of this embodiment are as follows:
(1) Reading data from disk: the data sending end reads the data to be sent from the disk, calculates the Header length, the checksum starting position, the checksum data length, the data starting position and the data length according to the data length and the checksum type, and records this information in the Header;
(2) Calculating the checksum from the read data: after reading the data to be sent, the sending end calculates the checksum in memory according to the checksum type and a given data length, and places the checksum at the position specified by the Header;
(3) Sending data: the sending end packs the Header, the checksum and the data, in that order, into a Packet and sends the Packet to the network IO stream;
(4) Receiving data: the receiving end first receives the Header, obtains the checksum data and the actual data according to the position and length information recorded in the Header, and verifies the actual data against the received checksum data; after verification passes, the actual data is persisted to disk, while the checksum data is no longer persisted.
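Steps (3) and (4) can be illustrated with an in-memory stream standing in for the network IO stream. This is a sketch under assumptions: the Header is taken to be four unsigned 64-bit big-endian fields, the checksum is a single CRC32 over the whole payload, and `send_packet`, `receive_packet` and `read_exact` are hypothetical names.

```python
import io
import struct
import zlib

HEADER_FMT = ">QQQQ"  # hypothetical: checksum_off, checksum_len, data_off, data_len
HEADER_LEN = struct.calcsize(HEADER_FMT)


def send_packet(stream, data: bytes) -> None:
    """Write Header, checksum and data to the stream, in that order."""
    checksum = zlib.crc32(data).to_bytes(4, "big")
    header = struct.pack(HEADER_FMT, HEADER_LEN, len(checksum),
                         HEADER_LEN + len(checksum), len(data))
    stream.write(header + checksum + data)


def read_exact(stream, n: int) -> bytes:
    buf = stream.read(n)
    if len(buf) != n:
        raise EOFError("stream ended before %d bytes were read" % n)
    return buf


def receive_packet(stream) -> bytes:
    """Read the Header first, then use its lengths to read checksum and data."""
    c_off, c_len, d_off, d_len = struct.unpack(HEADER_FMT, read_exact(stream, HEADER_LEN))
    # This sketch relies on the sequential layout header | checksum | data.
    if c_off != HEADER_LEN or d_off != c_off + c_len:
        raise ValueError("unexpected layout")
    checksum = read_exact(stream, c_len)
    data = read_exact(stream, d_len)
    if zlib.crc32(data).to_bytes(4, "big") != checksum:
        raise ValueError("checksum mismatch")
    return data  # only this would be persisted; the checksum is dropped


wire = io.BytesIO()
send_packet(wire, b"first")
send_packet(wire, b"second packet")
wire.seek(0)
received = [receive_packet(wire), receive_packet(wire)]
```

Reading the fixed-length Header first tells the receiver exactly how many bytes to consume for each packet, so consecutive Packets on one stream need no delimiter.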
As shown in fig. 2, the system 200 includes:
the data calculation unit is used for the sending end to read the data to be sent and calculate the checksum of the data to be sent;
the information storage unit is used for encapsulating the data to be sent and the checksum into a file and recording the position information of the data to be sent and of the checksum in the file header;
and the data sending unit is used for sending the file to a receiving end, wherein the receiving end parses the file to obtain the data to be sent, verifies the data to be sent, and persists it to disk.
Optionally, as an embodiment of the present invention, the data calculation unit includes:
and the check calculation module is used for the sending end to read the data to be sent from the disk and calculate the checksum of the data to be sent in memory according to the checksum type and the length of the data to be sent. Specifically, after the sending end reads the data to be sent, the module calculates the checksum in memory according to the checksum type and a given data length.
Presetting the checksum type makes the storage system more customizable and the verification result more accurate. Calculating the checksum of the data to be sent in memory greatly improves the efficiency of checksum calculation. In-memory computing embeds the computing unit in the memory, so that the memory serves for both storage and computation; data does not have to be fetched from a separate storage device and passes directly to and from the CPU. The calculation is therefore not limited by storage performance, and the energy efficiency ratio is improved.
Optionally, as an embodiment of the present invention, the information storage unit includes:
the information calculation module is used for calculating the header length, the checksum starting position, the checksum data length, the starting position of the data to be sent and the length of the data to be sent according to the length of the data to be sent and the checksum type;
the data storage module is used for storing the data to be sent and the checksum at the corresponding positions of the file according to the starting position of the data to be sent and the checksum starting position, respectively;
and the information storage module is used for saving the checksum starting position, the checksum data length, the starting position of the data to be sent and the length of the data to be sent in the file header.
The information storage unit is specifically configured as follows: the data sending end reads the data to be sent from the disk, calculates the Header length, the checksum starting position, the checksum data length, the data starting position and the data length according to the data length and the checksum type, and records this information in the Header.
Because the checksum starting position, the checksum data length, the starting position of the data to be sent and the length of the data to be sent are stored in the file header, the receiving end can quickly extract the data and the checksum from this information once the file arrives, which simplifies the data processing logic of the receiving end and improves data processing efficiency.
Optionally, as an embodiment of the present invention, the receiving end parsing the file to obtain the data to be sent, verifying the data to be sent and persisting it to disk includes:
the receiving end extracts the data to be sent and the checksum from the file according to the checksum starting position, the checksum data length, the starting position of the data to be sent and the length of the data to be sent recorded in the file header;
verifying the data to be sent according to the checksum;
and persisting the data to be sent that passes verification to disk, and discarding the checksum.
The receiving end first receives the Header, obtains the checksum data and the actual data according to the position and length information recorded in the Header, and verifies the actual data against the received checksum data; after verification passes, the actual data is persisted to disk, while the checksum data is no longer persisted. The receiving end parses the file quickly, persists only the received data, and discards the checksum data directly, which saves a large amount of storage resources in a big data storage system.
In the HDFS network data transmission optimization system provided in this embodiment, the data calculation unit calculates the checksum of the data to be sent before the data is sent, the information storage unit encapsulates the information, and the data sending unit sends the data to be sent and the checksum to the receiving end; the receiving end parses the received file, verifies the data, and then persists the data to disk, while the checksum is no longer persisted. In this embodiment, the checksum is not persisted to disk but is generated dynamically during transmission, which saves a large amount of storage resources, and calculating the checksum in memory is more efficient than reading a checksum file directly from disk; not persisting the checksum also simplifies the processing logic of the receiving end when receiving data and improves the efficiency of receiving and processing data.
Fig. 3 is a schematic structural diagram of a terminal 300 according to an embodiment of the present invention, where the terminal 300 may be used to execute the HDFS network data transmission optimization method according to the embodiment of the present invention.
The terminal 300 may include a processor 310, a memory 320, and a communication unit 330. These components communicate via one or more buses. Those skilled in the art will appreciate that the structure shown in the figure is not limiting: it may be a bus architecture or a star architecture, may include more or fewer components than shown, may combine certain components, or may arrange the components differently.
The memory 320 may be used to store instructions executed by the processor 310, and may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. When executed by the processor 310, the executable instructions in the memory 320 enable the terminal 300 to perform some or all of the steps in the method embodiments described above.
The processor 310 is the control center of the terminal. It connects the various parts of the terminal through various interfaces and lines, and performs the terminal's functions and/or processes data by running or executing software programs and/or modules stored in the memory 320 and invoking data stored in the memory. The processor may consist of integrated circuits (ICs), for example a single packaged IC, or multiple packaged ICs with the same or different functions connected together. For example, the processor 310 may include only a central processing unit (CPU). In the embodiment of the present invention, the CPU may have a single computing core or multiple computing cores.
The communication unit 330 is configured to establish a communication channel so that the terminal can communicate with other terminals, receiving user data sent by other terminals or sending user data to other terminals.
The present invention also provides a computer storage medium. The computer storage medium may store a program which, when executed, may perform some or all of the steps in the embodiments provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).
Therefore, the invention calculates a checksum for the data to be sent before sending it, and sends the data together with the checksum to the receiving end. The receiving end parses the received file, verifies the data, and then writes the data to disk; the checksum is not written to disk. Because the checksum is generated dynamically for transmission rather than persisted, a large amount of storage is saved, and calculating the checksum in memory is more efficient than reading a checksum file from disk. Not persisting the checksum also simplifies the receiving end's processing logic and improves the efficiency of receiving and processing data.
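To give a rough sense of the storage involved, the following sketch estimates how many checksum bytes accompany a given amount of data. The parameters (512 data bytes per checksum, a 4-byte CRC, and a 128 MiB block) are assumptions chosen to resemble common HDFS settings, not values stated in the source, and any fixed header in a checksum file is ignored. The function name `checksum_bytes` is hypothetical.

```python
def checksum_bytes(data_len: int, bytes_per_checksum: int = 512,
                   crc_size: int = 4) -> int:
    """Estimate the checksum bytes needed for data_len bytes of data."""
    # Ceiling division: a final partial chunk still needs its own CRC.
    chunks = -(-data_len // bytes_per_checksum)
    return chunks * crc_size


block = 128 * 1024 * 1024            # assumed 128 MiB block
saved = checksum_bytes(block)        # checksum bytes not written to disk
ratio = saved / block                # fraction of the data size
```

Under these assumptions, `saved` is 1,048,576 bytes (1 MiB) per 128 MiB block, so `ratio` is about 0.78% of the stored data per replica; the absolute saving grows with the total volume of data held by the cluster.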
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention may be embodied in the form of a software product. The computer software product is stored in a storage medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, and includes instructions for enabling a computer terminal (which may be a personal computer, a server, a second terminal, a network terminal, or the like) to perform all or part of the steps of the method in the embodiments of the present invention.
The same or similar parts of the various embodiments in this specification may be cross-referenced. In particular, because the terminal embodiment is basically similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
In the embodiments provided in the present invention, it should be understood that the disclosed system and method can be implemented in other ways. For example, the above-described system embodiments are merely illustrative: the division into units is only one logical functional division, and other divisions are possible in practice. For instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, systems, or units, and may be electrical, mechanical, or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
Although the present invention has been described in detail with reference to the drawings and in connection with the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments of the present invention without departing from its spirit and scope, and such modifications or substitutions, as well as any changes or substitutions that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention, fall within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An HDFS (Hadoop Distributed File System) network data transmission optimization method, characterized by comprising the following steps:
a sending end reads data to be sent and calculates the checksum of the data to be sent;
packaging the data to be sent and the checksum into a file, and recording the position information of the data to be sent and the checksum into a file header;
and sending the file to a receiving end, wherein after the receiving end parses the file to obtain the data to be sent, the data to be sent is verified and written to disk.
2. The method of claim 1, wherein the transmitting end reads data to be transmitted and calculates a checksum of the data to be transmitted, comprising:
and the sending end reads the data to be sent from the disk, and calculates the checksum of the data to be sent in the memory according to the checksum type and the length of the data to be sent.
3. The method of claim 2, wherein encapsulating the data and checksum to be transmitted into a file and recording location information of the data and checksum to be transmitted into a file header comprises:
calculating the length of the header, the checksum start position, the checksum data length, the start position of the data to be sent, and the length of the data to be sent according to the length of the data to be sent and the checksum type;
storing the data to be sent and the checksum at the corresponding positions of the file according to the start position of the data to be sent and the checksum start position, respectively;
and saving the checksum start position, the checksum data length, the start position of the data to be sent, and the length of the data to be sent to the file header.
4. The method of claim 3, wherein the receiving end parsing the file to obtain the data to be sent and then verifying the data to be sent and writing it to disk comprises:
the receiving end extracts the data to be sent and the checksum from the file according to the checksum start position, the checksum data length, the start position of the data to be sent, and the length of the data to be sent recorded in the file header;
verifying the data to be sent against the checksum;
and writing the data to be sent that passes verification to disk, and discarding the checksum.
5. An HDFS network data transmission optimization system, comprising:
the data calculation unit is used for the sending end to read the data to be sent and calculate the checksum of the data to be sent;
the information storage unit is used for packaging the data to be sent and the checksum into a file and recording the position information of the data to be sent and the checksum into a file header;
and the data sending unit is used for sending the file to a receiving end, wherein after the receiving end parses the file to obtain the data to be sent, the data to be sent is verified and written to disk.
6. The system of claim 5, wherein the data computation unit comprises:
the check calculation module is used for the sending end to read the data to be sent from the disk and to calculate the checksum of the data to be sent in the memory according to the checksum type and the length of the data to be sent.
7. The system of claim 6, wherein the information storage unit comprises:
the information calculation module is used for calculating the length of the header, the checksum start position, the checksum data length, the start position of the data to be sent, and the length of the data to be sent according to the length of the data to be sent and the checksum type;
the data storage module is used for storing the data to be sent and the checksum at the corresponding positions of the file according to the start position of the data to be sent and the checksum start position, respectively;
and the information storage module is used for saving the checksum start position, the checksum data length, the start position of the data to be sent, and the length of the data to be sent to the file header.
8. The system of claim 7, wherein the receiving end parsing the file to obtain the data to be sent and then verifying the data to be sent and writing it to disk comprises:
the receiving end extracts the data to be sent and the checksum from the file according to the checksum start position, the checksum data length, the start position of the data to be sent, and the length of the data to be sent recorded in the file header;
verifying the data to be sent against the checksum;
and writing the data to be sent that passes verification to disk, and discarding the checksum.
9. A terminal, comprising:
a processor;
a memory for storing instructions for execution by the processor;
wherein the processor is configured to perform the method of any one of claims 1-4.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-4.
CN202110779799.9A 2021-07-09 2021-07-09 HDFS network data transmission optimization method, system, terminal and storage medium Withdrawn CN113626405A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110779799.9A CN113626405A (en) 2021-07-09 2021-07-09 HDFS network data transmission optimization method, system, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN113626405A (en) 2021-11-09

Family

ID=78379372

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110779799.9A Withdrawn CN113626405A (en) 2021-07-09 2021-07-09 HDFS network data transmission optimization method, system, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN113626405A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114301575A (en) * 2021-12-21 2022-04-08 阿里巴巴(中国)有限公司 Data processing method, system, device and medium
CN114301575B (en) * 2021-12-21 2024-03-29 阿里巴巴(中国)有限公司 Data processing method, system, equipment and medium

Similar Documents

Publication Publication Date Title
US10769228B2 (en) Systems and methods for web analytics testing and web development
US9832280B2 (en) User profile configuring method and device
US20170083579A1 (en) Distributed data processing method and system
EP3809269B1 (en) Monitoring a distributed application server environment
US20230370285A1 (en) Block-chain-based data processing method, computer device, computer-readable storage medium
CN109739527A (en) A kind of method, apparatus, server and the storage medium of the publication of client gray scale
CN109447384A (en) Verification method, device, equipment and the storage medium of air control system
CN112381649A (en) Transaction consensus method, device and equipment based on block chain
CN111209339B (en) Block synchronization method, device, computer and storage medium
CN113626405A (en) HDFS network data transmission optimization method, system, terminal and storage medium
CN111355696A (en) Message identification method and device, DPI (deep packet inspection) equipment and storage medium
CN116743619B (en) Network service testing method, device, equipment and storage medium
CN109918221B (en) Hard disk error reporting analysis method, system, terminal and storage medium
CN113872826B (en) Network card port stability testing method, system, terminal and storage medium
CN110989333B (en) Redundancy control method based on multiple computing cores, computing cores and redundancy control system
CN111507840A (en) Block chain consensus method, device, computer and readable storage medium
CN111752911A (en) Data transmission method, system, terminal and storage medium based on Flume
CN109920466B (en) Hard disk test data analysis method, device, terminal and storage medium
CN112491589B (en) Object storage network early warning method, system, terminal and storage medium
CN113938450B (en) Avionics system communication fault processing method, avionics system communication fault processing device, computer equipment and medium
CN113760372B (en) Binary data packet analysis method and system
CN115185465A (en) shuffle sorting method, system and storage medium
CN117290436A (en) Chain up-link and down-link collaboration method, system, terminal and storage medium for block chain data
CN115665118A (en) Application-level call chain generation method based on HTTP (hyper text transport protocol) header extension
CN114676007A (en) Method, system, terminal and storage medium for detecting external plug-in card

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20211109
