CN114371810A - Data storage method and device of HDFS - Google Patents

Info

Publication number
CN114371810A
Authority
CN
China
Prior art keywords
data
stored
hdfs
record number
upper limit
Legal status
Granted
Application number
CN202011101718.1A
Other languages
Chinese (zh)
Other versions
CN114371810B (en)
Inventor
高宗宝
陈燕雷
李晓
周波
李光锴
吴兴耀
耿禄博
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Design Institute Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Design Institute Co Ltd
Application filed by China Mobile Communications Group Co Ltd and China Mobile Group Design Institute Co Ltd
Priority to CN202011101718.1A
Publication of CN114371810A
Application granted
Publication of CN114371810B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G06F3/0608 Saving storage space on storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 Data buffering arrangements
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to the technical field of data storage, and in particular to a data storage method and device for HDFS. The method comprises: acquiring the data record number of a current data buffer after data to be stored is stored in it; if the data record number is not less than a preset upper limit value and not greater than the data record number upper limit of a data block, storing the data to be stored into the current data buffer; and writing the data cached in the current data buffer to HDFS, wherein the preset upper limit value is the product of the data record number upper limit of the data block and a preset coefficient. The HDFS data storage method and device provided by the embodiments of the invention can combine small-scale data to the greatest extent while preserving the original characteristics of the data to be stored, so that data stored in HDFS approaches the block size and the number of small data blocks in HDFS is reduced.

Description

Data storage method and device of HDFS
Technical Field
The invention relates to the technical field of data storage, and in particular to a data storage method and device for HDFS.
Background
For storing data in HDFS (Hadoop Distributed File System), the prior art mainly adopts the following schemes:
Scheme 1: data is written directly into HDFS from the client, for example by uploading it with the HDFS shell or with the copyFromLocalFile method of FileSystem, and the data is organized according to the actual scenario without considering the size of the uploaded data blocks.
This scheme is the basic way of uploading files to HDFS. It does not consider the size of the uploaded data, and the data must be organized in each operational scenario, which is time-consuming and labor-intensive.
Scheme 2: file content is appended through the API provided by Hadoop: the client obtains an append stream through the append method of the FileSystem class and writes further data into that stream to complete the append.
This scheme mainly appends data to existing HDFS files, and the client cannot control the size of file blocks well, so block sizes are unevenly distributed.
Therefore, a data storage method for HDFS that fully takes the scale of the data into account, so that data stored in HDFS approaches the block size, is of great significance.
Disclosure of Invention
To address the shortcomings of the prior art, embodiments of the invention provide a data storage method for HDFS, comprising the following steps:
acquiring the data record number of a current data buffer after data to be stored is stored in the current data buffer;
if the data record number is not less than a preset upper limit value and not greater than the data record number upper limit of the data block, storing the data to be stored into the current data buffer;
writing the data cached in the current data buffer to HDFS;
wherein the preset upper limit value is the product of the upper limit of the data record number of the data block and a preset coefficient.
In one embodiment, the method further comprises:
if the data record number is greater than the data record number upper limit of the data block, incrementing a count, and acquiring the data record number of a next data buffer after the data to be stored is stored in the next data buffer.
In one embodiment, the method further comprises:
if the data record number is smaller than the preset upper limit value, continuing the storage operation on the current data buffer;
the continued storage operation comprises:
storing the data to be stored into the current data buffer, and acquiring the data record number of the current data buffer after the next data to be stored is stored into it.
In one embodiment, if the count value is greater than a preset threshold, the data to be stored is written directly to HDFS.
In one embodiment, before acquiring the data record number that the next data buffer would have after the data to be stored is stored in it, the method further includes:
storing the data to be stored into a waiting queue buffer.
In one embodiment, if the time spent on the continued storage operation reaches a preset duration, the data cached in the current data buffer is written to HDFS.
In one embodiment, the preset coefficient takes a value in the range 0.8 to 1.
In another aspect, an embodiment of the present invention further provides a data storage device for HDFS, comprising:
the acquisition module is used for acquiring the data record number of the current data buffer after the data to be stored is stored in the current data buffer;
the judging module is used for storing the data to be stored into the current data buffer when the data record number is not less than a preset upper limit value and not more than the data record number upper limit of a data block;
the writing module is used for writing the data cached in the current data buffer to HDFS;
wherein the preset upper limit value is the product of the upper limit of the data record number of the data block and a preset coefficient.
In another aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of any one of the above data storage methods for HDFS.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the data storage method of the HDFS described above.
With the data storage method and device for HDFS provided by the embodiments of the invention, the data in a data buffer is written to HDFS only when the buffer's data record number is close to the data record number upper limit of a data block. Small-scale data can therefore be combined to the greatest extent while preserving the original characteristics of the data to be stored, so that data stored in HDFS approaches the block size and the number of small data blocks in HDFS is reduced.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart illustrating a data storage method of an HDFS according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a data storage device of an HDFS according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in further detail with reference to the drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
In the description herein, descriptions referring to the terms "one embodiment", "some embodiments", "an example", "a specific example", "some examples" and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and their features, provided they do not contradict one another.
Fig. 1 is a schematic flow chart of a data storage method of an HDFS according to an embodiment of the present invention, and referring to fig. 1, an embodiment of the present invention provides a data storage method of an HDFS, including:
S110: acquiring the data record number of the current data buffer after the data to be stored is stored in the current data buffer;
S120: if the data record number is not less than the preset upper limit value and not greater than the data record number upper limit of the data block, storing the data to be stored into the current data buffer;
S130: writing the data cached in the current data buffer to HDFS;
wherein the preset upper limit value is the product of the data record number upper limit of the data block and a preset coefficient.
The data storage method for HDFS provided by the embodiment of the present invention may be executed by a computer, such as a smartphone, a portable computer, a tablet computer, a personal computer, or a wearable device.
Note that the data record number upper limit is determined by the default size of the data block. For example, a data block with the default size of 128 MB may have an upper limit of about 300,000 data records.
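The 128 MB / 300,000 figure implies an average record size of roughly 447 bytes (134,217,728 / 300,000 ≈ 447.4). The sketch below assumes, as an illustration only, that $l_0$ is estimated by dividing the block size by an average record size; the function name `record_upper_limit` is hypothetical and the source does not say how the limit is computed.

```python
def record_upper_limit(block_size_bytes: int, avg_record_bytes: float) -> int:
    """Hypothetical estimate of l0: how many records of the given
    average size fit into one HDFS data block."""
    if block_size_bytes <= 0 or avg_record_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Floor division: a partially filled record does not count.
    return int(block_size_bytes // avg_record_bytes)


# 128 MB default block, ~447-byte records -> roughly 300,000 records
l0 = record_upper_limit(128 * 1024 * 1024, 447)
```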
The data buffer is a buffer for structured data records, and its name may be chosen according to the specific service scenario: for ordinary data it may be the file name, while for mobile MRO data it may be, for example, "provide-city-enhanced". A data line recorder is provided alongside each data buffer to record the number of data records currently in that buffer.
Specifically, when there is data to be stored, the data record number cu that the current data buffer would have after the data to be stored is stored in it is obtained first, and cu then determines how the data to be stored is handled.
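A data buffer and its data line recorder can be sketched as a small class. This is a minimal illustration assuming an in-memory list of text records; the class `DataBuffer` and its methods are hypothetical, and `count_after` is the prospective count cu, computed without modifying the buffer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataBuffer:
    """Hypothetical in-memory data buffer paired with its data line recorder."""
    name: str                                   # e.g. a file name, chosen per service scenario
    records: List[str] = field(default_factory=list)
    line_count: int = 0                         # the data line recorder

    def count_after(self, incoming: int) -> int:
        """cu: the record count this buffer would have if `incoming` records were added."""
        return self.line_count + incoming

    def append(self, batch: List[str]) -> None:
        """Store a batch and keep the line recorder in step with the contents."""
        self.records.extend(batch)
        self.line_count += len(batch)


buf = DataBuffer("provide-city-enhanced")
buf.append(["rec-1", "rec-2"])
cu = buf.count_after(3)   # prospective count, buffer unchanged
```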
Once the data record number cu is obtained, it is compared with the preset upper limit value $l_1$ and the data record number upper limit $l_0$ of the data block. If cu is not less than $l_1$ and not greater than $l_0$, that is,

$$l_1 \le cu \le l_0, \qquad l_1 = p \times l_0,$$

where p is the preset coefficient, the data to be stored is stored in the current data buffer.
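The comparison of cu against $l_1$ and $l_0$ yields one of three outcomes, which together with the later embodiments (counting when $cu > l_0$, continuing to fill when $cu < l_1$) can be sketched as a small decision function. The names `Action` and `decide` are hypothetical; only the inequalities come from the text.

```python
from enum import Enum


class Action(Enum):
    STORE_AND_WRITE = "store, then write the buffer to HDFS"   # l1 <= cu <= l0
    STORE_AND_CONTINUE = "store, keep filling the buffer"      # cu < l1
    TRY_NEXT_BUFFER = "count, then try the next buffer"        # cu > l0


def decide(cu: int, l0: int, p: float) -> Action:
    """Classify the prospective record count cu against l1 = p * l0 and l0."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]; the text suggests 0.8 to 1")
    l1 = p * l0
    if cu > l0:
        return Action.TRY_NEXT_BUFFER
    if cu >= l1:
        return Action.STORE_AND_WRITE
    return Action.STORE_AND_CONTINUE
```

With $l_0 = 300{,}000$ and $p = 0.8$, the buffer is written once it would hold between 240,000 and 300,000 records.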
The preset coefficient p may range from 0.8 to 1, so the preset upper limit value $l_1$ ranges from $0.8\,l_0$ to $l_0$. The specific range of p can be adjusted according to actual needs and is not limited in the embodiments of the present invention.
When the data to be stored is stored in the current data buffer, its metadata can be recorded, including the client number, the file name (depending on the data input source), the start line number, the end line number, and the file location (i.e., the file name in HDFS, which must be unique within the cluster so that it can be addressed in subsequent processing). Recording the metadata makes it possible to verify the stored structured data, extract specific lines, and so on.
After the data to be stored is stored in the current data buffer, the data cached in the current data buffer can be written to HDFS and its metadata written as well.
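The metadata fields listed above can be captured per stored batch. This sketch assumes 1-based, inclusive line numbers within the buffer; `RecordMetadata`, `metadata_for_batch`, `extract_lines`, and the HDFS path in the usage example are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RecordMetadata:
    client_id: str     # client number
    source_file: str   # file name, depending on the data input source
    start_line: int    # first line of the batch in the buffer (1-based, assumption)
    end_line: int      # last line of the batch, inclusive
    hdfs_file: str     # cluster-unique HDFS file name used for later addressing


def metadata_for_batch(client_id: str, source_file: str, lines_before: int,
                       batch_size: int, hdfs_file: str) -> RecordMetadata:
    """Describe a batch appended after `lines_before` existing buffer lines."""
    if batch_size <= 0:
        raise ValueError("batch must contain at least one record")
    return RecordMetadata(client_id, source_file, lines_before + 1,
                          lines_before + batch_size, hdfs_file)


def extract_lines(buffer_lines: List[str], meta: RecordMetadata) -> List[str]:
    """Use the recorded line range to pull one batch back out of the merged data."""
    return buffer_lines[meta.start_line - 1:meta.end_line]


lines = [f"r{i}" for i in range(1, 201)]
meta = metadata_for_batch("c01", "mro.csv", 100, 50, "/user/demo/part-0001")
```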
It can be understood that if the data in a data buffer has not reached the data record number upper limit of a data block before the data to be stored is imported, but exceeds it afterwards, writing the buffer to HDFS would cause HDFS to split it into blocks, producing a small data block and also splitting the data to be stored.
In the data storage method for HDFS provided by the embodiment of the present invention, the data in a data buffer is written to HDFS only when the buffer's data record number is close to the data record number upper limit of a data block. Small-scale data can therefore be combined to the greatest extent while preserving the original characteristics of the data to be stored, so that data stored in HDFS approaches the block size and the number of small data blocks in HDFS is reduced.
Reducing the number of small data blocks in HDFS can significantly improve the processing efficiency of tasks executed on the data blocks in HDFS.
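The splitting effect can be shown with simple arithmetic. HDFS splits files by bytes, so this sketch assumes records of uniform size, which lets the byte split be expressed as record counts; `block_record_counts` is a hypothetical helper.

```python
from typing import List


def block_record_counts(total_records: int, l0: int) -> List[int]:
    """Records per block when `total_records` uniform-size records are written
    as one file and split into blocks of at most l0 records each."""
    full, rest = divmod(total_records, l0)
    return [l0] * full + ([rest] if rest else [])


# A buffer that overshoots l0 = 300,000 by 50,000 records leaves a small tail block.
overshoot = block_record_counts(350_000, 300_000)
```

The trailing 50,000-record block is exactly the kind of small block the method tries to avoid by refusing batches that would push cu past $l_0$.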
Further, in an embodiment, the data storage method of the HDFS provided by the embodiment of the present invention may further include:
if the data record number cu is greater than the data record number upper limit $l_0$ of the data block, incrementing a count, and acquiring the data record number of the next data buffer after the data to be stored is stored in the next data buffer.
When $cu > l_0$, storing the data to be stored in the current data buffer would make the data in that buffer exceed the storage upper limit of a data block. The count is therefore incremented, the count value is updated, and the data record number cu' that the next data buffer would have after the data to be stored is stored in it is obtained.
It can be understood that, once cu' is obtained, it is compared with the preset upper limit value $l_1$ and the data record number upper limit $l_0$ of the data block; if $cu' \le l_0$, the data to be stored is stored in the next data buffer.
If $cu' > l_0$, counting continues: the count value is updated and the data record number cu'' that the following data buffer would have after the data to be stored is stored in it is obtained, and so on, until a subsequent data buffer can hold the data to be stored or the count value reaches the preset threshold.
Attempting to store the data in several data buffers raises the probability that it lands in a suitable buffer, lowers the probability that it is split into small data blocks after being written to HDFS, and further helps data stored in HDFS approach the block size.
When the count value reaches a preset threshold, for example 9, storage of the data to be stored has been attempted 9 times without finding a suitable data buffer. On the 10th attempt (i.e., when the count value is greater than the preset threshold), the data to be stored may be written directly to HDFS. The specific value of the preset threshold may be adjusted according to actual needs and is not limited in the embodiments of the present invention.
Writing the data to be stored directly to HDFS when the count value exceeds the preset threshold avoids excessive waste of resources and improves the efficiency of HDFS data storage.
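The counting loop with its threshold can be sketched as follows. This is a minimal reading of the text under stated assumptions: buffers are represented only by their current record counts, a buffer accepts the batch when $cu' \le l_0$, and after `threshold` failed attempts the next attempt becomes a direct HDFS write. Falling back to a direct write when the buffers run out is also an assumption; `place_batch` is hypothetical.

```python
from typing import List, Optional


def place_batch(batch_size: int, buffer_counts: List[int], l0: int,
                threshold: int = 9) -> Optional[int]:
    """Return the index of the first buffer that can take the batch without
    exceeding l0, or None when the batch should be written to HDFS directly."""
    count = 0
    for index, current in enumerate(buffer_counts):
        if current + batch_size <= l0:     # cu' <= l0: store it here
            return index
        count += 1                         # one more failed attempt
        if count >= threshold:             # threshold reached: next attempt is a direct write
            return None
    return None                            # assumption: no buffer left -> direct write too


third = place_batch(100, [950, 980, 500], l0=1000)
give_up = place_batch(100, [950] * 20, l0=1000, threshold=9)
```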
In an embodiment, before the data record number that the next data buffer would have after the data to be stored is stored in it is acquired, the data storage method for HDFS provided in the embodiment of the present invention further includes:
storing the data to be stored into the waiting queue buffer.
The waiting queue buffer has the same structure as the data buffer.
While this further storage decision is being made, the data to be stored is held in the waiting queue buffer, so that storing the next data into the current data buffer is not delayed; this improves the operating efficiency of the data storage method for HDFS provided by the embodiment of the present invention.
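The benefit of the waiting queue buffer can be illustrated with a single-threaded sketch: a batch that would overflow the current buffer is parked in a queue, and the next batch is still considered for the current buffer immediately. Batches are represented by their sizes only; `intake` is hypothetical, and how parked batches are later placed (e.g. by trying further buffers) is left out.

```python
from collections import deque
from typing import Deque, List, Tuple


def intake(batch_sizes: List[int], current_count: int,
           l0: int) -> Tuple[List[int], List[int], int]:
    """Fill the current buffer up to l0; park batches that would overflow it
    in a waiting queue instead of blocking the batches behind them."""
    waiting: Deque[int] = deque()
    accepted: List[int] = []
    for size in batch_sizes:
        if current_count + size <= l0:
            current_count += size
            accepted.append(size)
        else:
            waiting.append(size)   # placement in another buffer is decided later
    return accepted, list(waiting), current_count


accepted, parked, filled = intake([100, 500, 200], current_count=0, l0=400)
```

Here the 500-record batch is parked, yet the 200-record batch behind it still reaches the current buffer.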
In an embodiment, the data storage method of the HDFS provided in the embodiment of the present invention may further include:
if the data record number cu is less than the preset upper limit value $l_1$, continuing the storage operation on the current data buffer;
the continued storage operation includes:
storing the data to be stored into the current data buffer, and acquiring the data record number $cu_1$ of the current data buffer after the next data to be stored is stored into it.
It will be understood that when $cu < l_1$, the current data buffer, even with the data to be stored added, is still well below the data block size, so it stores the data to be stored and can continue to accept further data.
Further, in an embodiment, the data storage method of the HDFS provided by the embodiment of the present invention may further include:
if the time spent on the continued storage operation reaches a preset duration, writing the data cached in the current data buffer to HDFS.
The preset duration may be, for example, 10 ms; its specific value can be adjusted according to actual needs and is not limited in the embodiments of the present invention.
It can be understood that writing the data cached in the current data buffer to HDFS once the continued storage operation has lasted the preset duration prevents the current data buffer from waiting too long for subsequent data, which keeps the data storage method for HDFS provided by the embodiment of the present invention running efficiently.
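The size trigger ($cu \ge l_1$) and the time trigger can be combined in one buffer object. In this sketch, `TimedBuffer` and its methods are hypothetical, `write_to_hdfs` is a stand-in callback rather than a real Hadoop API, and the clock is injectable so the timeout can be tested; measuring the duration from the first record stored in an empty buffer is an assumption about when the continued storage operation begins.

```python
import time
from typing import Callable, List, Optional


class TimedBuffer:
    """Keep filling while cu < l1; write to HDFS when l1 <= cu <= l0,
    or when the continued storage operation has lasted max_wait_s."""

    def __init__(self, l0: int, p: float, max_wait_s: float,
                 write_to_hdfs: Callable[[List[str]], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.l0, self.l1 = l0, p * l0
        self.max_wait_s = max_wait_s
        self.write_to_hdfs = write_to_hdfs   # stand-in for a real HDFS writer
        self.clock = clock
        self.records: List[str] = []
        self.started: Optional[float] = None

    def offer(self, batch: List[str]) -> str:
        cu = len(self.records) + len(batch)
        if cu > self.l0:
            return "overflow"                # caller counts and tries another buffer
        if not self.records:
            self.started = self.clock()      # continued storage operation begins
        self.records.extend(batch)
        if cu >= self.l1:
            self.flush()
            return "written"
        return "stored"

    def poll(self) -> bool:
        """Flush if the buffer has been filling for at least max_wait_s."""
        if self.records and self.clock() - self.started >= self.max_wait_s:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        self.write_to_hdfs(self.records)
        self.records, self.started = [], None
```

A caller would call `poll()` periodically (for example from its intake loop) so that a slowly filling buffer is still written after about 10 ms.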
In summary, the data storage method of the HDFS provided in the embodiments of the present invention can combine small-scale data to the greatest extent while preserving the original features of the data, so as to reduce the number of small data blocks in the HDFS, thereby making the distribution of the data blocks in the HDFS more balanced.
Fig. 2 is a schematic structural diagram of a data storage device of an HDFS according to an embodiment of the present invention, and referring to fig. 2, an embodiment of the present invention further provides a data storage device of an HDFS, including:
an obtaining module 210, configured to obtain a data record number of a current data buffer after data to be stored is stored in the current data buffer;
the judging module 220 is configured to store the data to be stored in the current data buffer when the data record number is not less than the preset upper limit and is not greater than the data record number upper limit of the data block;
a writing module 230, configured to write the data cached in the current data buffer to HDFS;
wherein the preset upper limit value is the product of the data record number upper limit of the data block and a preset coefficient.
With the data storage device for HDFS provided by the embodiment of the invention, the data in a data buffer is written to HDFS only when the buffer's data record number is close to the data record number upper limit of a data block. Small-scale data can therefore be combined to the greatest extent while preserving the original characteristics of the data to be stored, so that data stored in HDFS approaches the block size and the number of small data blocks in HDFS is reduced.
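One way to map the three modules onto code is a class with one method per module. This sketch covers only the claimed core path ($l_1 \le cu \le l_0$); `HdfsStorageDevice`, its method names, and the injected `hdfs_writer` callback are hypothetical.

```python
from typing import Callable, List


class HdfsStorageDevice:
    """Hypothetical mapping of obtaining module 210, judging module 220
    and writing module 230 onto methods of one object."""

    def __init__(self, l0: int, p: float,
                 hdfs_writer: Callable[[List[str]], None]) -> None:
        self.l0, self.l1 = l0, p * l0        # l1: preset upper limit value
        self.buffer: List[str] = []
        self.hdfs_writer = hdfs_writer       # stand-in for a real HDFS client

    def acquire(self, batch: List[str]) -> int:
        """Obtaining module 210: record count after the batch would be stored."""
        return len(self.buffer) + len(batch)

    def judge(self, batch: List[str]) -> bool:
        """Judging module 220: store the batch only when l1 <= cu <= l0."""
        cu = self.acquire(batch)
        if self.l1 <= cu <= self.l0:
            self.buffer.extend(batch)
            return True
        return False

    def write(self) -> None:
        """Writing module 230: hand the buffered data to the HDFS writer."""
        self.hdfs_writer(self.buffer)
        self.buffer = []
```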
FIG. 3 illustrates the physical structure of an electronic device. As shown in FIG. 3, the electronic device may include a processor 310, a communication interface 320, a memory 330, and a bus 340, where the processor 310, the communication interface 320, and the memory 330 communicate with one another via the bus 340. The processor 310 may call logic instructions in the memory 330 to perform the following method:
acquiring the data record number of a current data buffer after data to be stored is stored in the current data buffer;
if the data record number is not less than the preset upper limit value and not greater than the data record number upper limit of the data block, storing the data to be stored into the current data buffer;
writing the data cached in the current data buffer to HDFS.
In addition, the logic instructions in the memory may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
Further, an embodiment of the present invention discloses a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program comprises program instructions which, when executed by a computer, enable the computer to perform the methods provided by the above method embodiments, for example including:
acquiring the data record number of a current data buffer after data to be stored is stored in the current data buffer;
if the data record number is not less than the preset upper limit value and not greater than the data record number upper limit of the data block, storing the data to be stored into the current data buffer;
writing the data cached in the current data buffer to HDFS.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the methods provided by the foregoing embodiments, for example including:
acquiring the data record number of a current data buffer after data to be stored is stored in the current data buffer;
if the data record number is not less than the preset upper limit value and not greater than the data record number upper limit of the data block, storing the data to be stored into the current data buffer;
writing the data cached in the current data buffer to HDFS.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A data storage method for HDFS (Hadoop Distributed File System), characterized by comprising the following steps:
acquiring the data record number of a current data buffer after data to be stored is stored in the current data buffer;
if the data record number is not less than a preset upper limit value and not greater than the data record number upper limit of the data block, storing the data to be stored into the current data buffer;
writing the data cached in the current data buffer to HDFS;
wherein the preset upper limit value is the product of the upper limit of the data record number of the data block and a preset coefficient.
2. The HDFS data storage method according to claim 1, further comprising:
if the data record number is greater than the data record number upper limit of the data block, incrementing a count, and acquiring the data record number of a next data buffer after the data to be stored is stored in the next data buffer.
3. The HDFS data storage method according to claim 1, further comprising:
if the data record number is smaller than the preset upper limit value, continuing the storing operation on the current data buffer;
wherein continuing the storing operation comprises:
storing the data to be stored into the current data buffer, and acquiring the data record number of the current data buffer after the next data to be stored is stored into the current data buffer.
4. The HDFS data storage method according to claim 2, wherein, if the count value is greater than a preset threshold, the data to be stored is written into the HDFS.
5. The HDFS data storage method according to claim 2, wherein, before the step of acquiring the data record number of the next data buffer after the data to be stored is stored in the next data buffer, the method further comprises:
storing the data to be stored into a waiting queue buffer.
6. The HDFS data storage method according to claim 3, wherein, if the time consumed by the continued storing operation reaches a preset time, the data cached in the current data buffer is written into the HDFS.
7. The HDFS data storage method according to any one of claims 1 to 6, wherein the preset coefficient has a value in the range of 0.8 to 1.
8. A data storage device of an HDFS, comprising:
the acquisition module is used for acquiring the data record number of the current data buffer after the data to be stored is stored in the current data buffer;
the judging module is used for storing the data to be stored into the current data buffer when the data record number is not less than a preset upper limit value and not more than the data record number upper limit of a data block;
the writing module is used for writing the data cached in the current data buffer into the HDFS;
wherein the preset upper limit value is the product of the upper limit of the data record number of the data block and a preset coefficient.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the data storage method of the HDFS according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the data storage method of the HDFS according to any one of claims 1 to 7.
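One possible reading of the buffering policy in claims 1 to 7 can be sketched in Python as follows. This is a hedged illustration, not the applicants' implementation. The class and parameter names (`BufferedHdfsWriter`, `sink`, `num_buffers`, `overflow_threshold`, `max_fill_seconds`) are hypothetical. Treating each piece of data to be stored as a batch of records, using a round-robin pool to stand in for the "next data buffer", and using the `sink` callback in place of an actual HDFS write are all assumptions. Comments mark the claim each branch is meant to illustrate.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence

Record = str  # hypothetical: one record is one line of text


class BufferedHdfsWriter:
    """Illustrative sketch of the buffering policy of claims 1-7.

    All names are hypothetical. `sink` stands in for a real HDFS write
    (one call = one file or block); no actual HDFS client API is used.
    """

    def __init__(
        self,
        block_record_limit: int,
        sink: Callable[[List[Record]], None],
        coefficient: float = 0.9,
        num_buffers: int = 2,
        overflow_threshold: int = 1,
        max_fill_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if not 0.8 <= coefficient <= 1.0:  # claim 7
            raise ValueError("coefficient must lie in [0.8, 1]")
        self.block_record_limit = block_record_limit
        # Claim 1: preset upper limit = block record limit x preset coefficient.
        self.preset_upper_limit = int(block_record_limit * coefficient)
        self.sink = sink
        self.overflow_threshold = overflow_threshold
        self.max_fill_seconds = max_fill_seconds
        self.clock = clock
        self.buffers: List[List[Record]] = [[] for _ in range(num_buffers)]
        self.fill_started: List[Optional[float]] = [None] * num_buffers
        self.current = 0
        self.waiting_queue: Deque[List[Record]] = deque()  # claim 5

    def put(self, batch: Sequence[Record]) -> None:
        """Buffer one batch of records, writing to the sink when a rule fires."""
        if not batch:
            return
        overflow_count = 0  # claim 2: counts overflow attempts for this batch
        idx = self.current
        while True:
            # Record count of buffer `idx` once the batch is stored in it.
            count = len(self.buffers[idx]) + len(batch)
            if count > self.block_record_limit:
                overflow_count += 1
                if overflow_count > self.overflow_threshold:
                    self.sink(list(batch))  # claim 4: write the batch itself
                    return
                # Claims 2 and 5: park the batch in the waiting queue, then
                # retry with the next buffer in the (hypothetical) pool.
                self.waiting_queue.append(list(batch))
                idx = (idx + 1) % len(self.buffers)
                batch = self.waiting_queue.popleft()
                continue
            if not self.buffers[idx]:
                self.fill_started[idx] = self.clock()
            self.buffers[idx].extend(batch)
            self.current = idx
            started = self.fill_started[idx]
            if count >= self.preset_upper_limit:
                self._flush(idx)  # claim 1: close to a full block, write it
            elif started is not None and self.clock() - started >= self.max_fill_seconds:
                self._flush(idx)  # claim 6: filling has taken too long
            # Otherwise (claim 3) keep filling the current buffer.
            return

    def close(self) -> None:
        """Write out whatever remains in every buffer."""
        for idx in range(len(self.buffers)):
            self._flush(idx)

    def _flush(self, idx: int) -> None:
        if self.buffers[idx]:
            self.sink(self.buffers[idx])
        self.buffers[idx] = []
        self.fill_started[idx] = None
```

For example, `BufferedHdfsWriter(10, sink, coefficient=0.9)` gives a preset upper limit of 9 records, so a buffer is handed to `sink` as soon as it holds 9 or 10 records. Under this reading, the acquisition, judging and writing modules of claim 8 map loosely onto the count computation, the branches in `put`, and `_flush`. A production version would replace `sink` with a real HDFS client call and would likely need locking if `put` were called from several threads.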
CN202011101718.1A 2020-10-15 2020-10-15 Data storage method and device of HDFS Active CN114371810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011101718.1A CN114371810B (en) 2020-10-15 2020-10-15 Data storage method and device of HDFS

Publications (2)

Publication Number Publication Date
CN114371810A 2022-04-19
CN114371810B 2023-10-27

Family

ID=81138069

Country Status (1)

Country Link
CN (1) CN114371810B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080071842A1 (en) * 2006-09-20 2008-03-20 Hitachi, Ltd. Database management system to reduce capacity of storage volume
CN104503703A (en) * 2014-12-16 2015-04-08 华为技术有限公司 Cache processing method and device
CN105446893A (en) * 2014-07-14 2016-03-30 阿里巴巴集团控股有限公司 Data storage method and device
CN105511802A (en) * 2015-11-24 2016-04-20 北京达沃时代科技有限公司 Buffer memory writing method and apparatus and synchronizing method and apparatus for disk cache region
CN108572930A (en) * 2017-03-14 2018-09-25 航天信息股份有限公司 Buffer control method and device
US10114754B1 (en) * 2015-09-30 2018-10-30 Veritas Technologies Llc Techniques for space reservation in a storage environment
CN109426438A (en) * 2017-08-31 2019-03-05 中国移动通信集团广东有限公司 Real-time big data mirrored storage method and device
WO2019154221A1 (en) * 2018-02-07 2019-08-15 华为技术有限公司 Method for sending streaming data and data sending device
WO2019218468A1 (en) * 2018-05-14 2019-11-21 平安科技(深圳)有限公司 Data storage method and device
WO2020041928A1 (en) * 2018-08-27 2020-03-05 深圳市锐明技术股份有限公司 Data storage method and system and terminal device

Similar Documents

Publication Publication Date Title
CN105117351B (en) To the method and device of buffering write data
CN110764708A (en) Data reading method, device, equipment and storage medium
CN107197359B (en) Video file caching method and device
CN109981702B (en) File storage method and system
CN109471843B (en) Metadata caching method, system and related device
CN110737388A (en) Data pre-reading method, client, server and file system
CN113419824A (en) Data processing method, device, system and computer storage medium
CN103607312A (en) Data request processing method and system for server system
CN112954244A (en) Method, device and equipment for realizing storage of monitoring video and storage medium
CN112148736B (en) Method, device and storage medium for caching data
CN110543495A (en) cursor traversal storage method and device
CN107133183B (en) Cache data access method and system based on TCMU virtual block device
CN111930305A (en) Data storage method and device, storage medium and electronic device
CN111803917A (en) Resource processing method and device
CN106899558A (en) The treating method and apparatus of access request
CN109977074B (en) HDFS-based LOB data processing method and device
CN114371810A (en) Data storage method and device of HDFS
CN110658999B (en) Information updating method, device, equipment and computer readable storage medium
CN114089912A (en) Data processing method and device based on message middleware and storage medium
CN110825652B (en) Method, device and equipment for eliminating cache data on disk block
CN111125715A (en) TCG data processing acceleration method and device based on solid state disk, computer equipment and storage medium
CN112667847A (en) Data caching method, data caching device and electronic equipment
CN111090633A (en) Small file aggregation method, device and equipment of distributed file system
CN113806249B (en) Object storage sequence lifting method, device, terminal and storage medium
CN115509763B (en) Fingerprint calculation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant