CN103257970B - Method of testing and device for HDFS host node - Google Patents


Info

Publication number
CN103257970B
Authority
CN
China
Prior art keywords
node
mapping table
file
host node
meta data
Legal status
Active
Application number
CN201210037776.1A
Other languages
Chinese (zh)
Other versions
CN103257970A
Inventor
潘瑾瑜 (Pan Jinyu)
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201210037776.1A
Publication of CN103257970A
Application granted
Publication of CN103257970B
Active (current legal status)


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention proposes a testing method and a testing device for an HDFS master node. The method comprises the steps of: constructing a plurality of metadata files according to a predetermined configuration format; generating a metadata image file in the master node from the plurality of metadata files, the metadata image file including the IDs, number and sizes of the data blocks of each metadata file; according to the IDs, number and sizes of the data blocks, using a simulator to simulate a plurality of HDFS slave nodes and establishing a mapping table of the relations between the slave nodes and the data blocks of each metadata file; establishing a thread between the mapping table and the master node; sending, through the thread, the relations between the slave nodes and the data blocks of each metadata file to the metadata image file in the master node; and using the thread to implement a write operation on a first file by modifying the mapping table. The method according to embodiments of the present invention has the advantages of saving machine resources and achieving a high testing rate.

Description

Testing method and device for HDFS master node
Technical field
The present invention relates to the field of Internet technology, and in particular to a testing method and a testing device for an HDFS master node.
Background art
HDFS (Hadoop Distributed File System) is a distributed file system used mainly in the Hadoop project. It offers high fault tolerance and high data transfer rates, and is suitable for applications with very large data sets.
The HDFS architecture mainly contains two kinds of nodes: the master node (NameNode) and the slave nodes (DataNodes). The master node manages the file system metadata, including the file system namespace, the mapping from files to data blocks, and the mapping from data blocks to slave nodes. The slave nodes store the actual file content: a file is split into multiple data blocks, which are distributed across different slave nodes. In this architecture there is one master node and multiple slave nodes. As the custodian of all file system metadata, the master node is key to the correct operation of the whole file system. As cluster scale keeps growing, it is necessary not only to verify the master node's normal and abnormal functional behavior before it goes into production, but also to perform stress testing and performance testing on it.
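As a rough illustration of the two mappings the master node keeps, the following is a minimal in-memory sketch in Python. It is not Hadoop's NameNode implementation; the class `NameNodeMetadata` and its methods are hypothetical names chosen for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NameNodeMetadata:
    """Toy model (hypothetical) of the metadata a master node keeps in memory."""

    # Namespace: file path -> ordered list of block IDs making up the file.
    file_to_blocks: dict[str, list[int]] = field(default_factory=dict)
    # Block map: block ID -> set of slave nodes (DataNodes) holding a replica.
    block_to_nodes: dict[int, set[str]] = field(default_factory=dict)

    def add_file(self, path: str, block_ids: list[int]) -> None:
        """Record a file and the blocks it is split into."""
        self.file_to_blocks[path] = list(block_ids)
        for block_id in block_ids:
            self.block_to_nodes.setdefault(block_id, set())

    def report_block(self, node: str, block_id: int) -> None:
        """Record that a slave node stores a replica of a block."""
        self.block_to_nodes.setdefault(block_id, set()).add(node)

    def locate(self, path: str) -> list[set[str]]:
        """Return, for each block of the file in order, the nodes holding it."""
        return [self.block_to_nodes[b] for b in self.file_to_blocks[path]]
```

Stress testing the master node essentially means filling both dictionaries with millions of entries and then exercising lookups and updates against them.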
Two problems arise in stress and performance testing: 1. High machine resource cost: as clusters grow, a production HDFS deployment comprises a very large number of machines, while the number of machines available for testing generally does not reach that scale. 2. Writing metadata content: stress and performance tests generally need to create a large amount of metadata on the master node, so that its runtime memory usage reaches a certain level of pressure. Continuously creating files does produce a large amount of metadata, but it is inefficient. Assuming 100,000 files can be written per hour, constructing a cluster holding files on the order of millions would take at least 10 hours.
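The estimate in problem 2 is a plain rate calculation. The hypothetical helper below restates it, taking the 100,000 files per hour figure from the text as an assumed creation rate.

```python
def hours_to_create(target_files: int, files_per_hour: int) -> float:
    """Hours needed to create target_files by writing files one at a time."""
    if files_per_hour <= 0:
        raise ValueError("files_per_hour must be positive")
    return target_files / files_per_hour


# One million files at the assumed rate of 100,000 files per hour.
print(hours_to_create(1_000_000, 100_000))  # prints 10.0
```

Since the time grows linearly with the file count, a cluster with several million files would take proportionally longer, which is why the invention generates the metadata image directly instead.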
The prior art generally adopts the scheme provided by the community (DataNodeCluster), which involves two measures. First, multiple slave nodes are deployed on each computer participating in the test. Compared with actual production, where only one slave node is deployed per computer, this markedly reduces the number of computers required for testing. Second, it provides a configuration option that allows "not reading or writing real data", thereby reducing the system resources consumed by disk I/O.
However, a single slave node usually runs multiple threads and needs multiple network ports, which is especially evident during concurrent reads and writes. Because this is so common, keeping every slave node running stably during testing means that the total number of instances that can actually be deployed concurrently on each test computer with this scheme is very limited. Practical experience shows that, for the test to proceed stably, the maximum number of slave nodes deployed on each test computer is about 500 to 800. In addition, this scheme cannot avoid constructing a large amount of metadata on the master node, so it does not solve the problem that this process is very time-consuming and reduces testing efficiency.
Summary of the invention
The present invention aims to solve at least one of the above technical problems.
To this end, one object of the present invention is to propose a testing method for an HDFS master node.
Another object of the present invention is to propose a testing device for an HDFS master node.
To achieve the above objects, a testing method for an HDFS master node according to embodiments of the first aspect of the present invention comprises the following steps: A. constructing a plurality of metadata files according to a predetermined configuration format; B. generating a metadata image file in the master node from the plurality of metadata files, the metadata image file including the IDs, number and sizes of the data blocks of each metadata file; C. according to the IDs, number and sizes of the data blocks of each metadata file, using a simulator to simulate a plurality of HDFS slave nodes and establishing a mapping table of the relations between the slave nodes and the data blocks of each metadata file; D. establishing a thread between the mapping table and the master node; E. sending, through the thread, the relations between the slave nodes and the data blocks of each metadata file to the metadata image file in the master node; and F. using the thread to implement a write operation on a first file by modifying the mapping table.
The testing method for an HDFS master node according to embodiments of the present invention can use a small amount of machine resources to rapidly construct the desired cluster scale and number of files for testing the stress and performance of HDFS, and achieves high testing efficiency.
To achieve the above objects, a testing device for an HDFS master node according to embodiments of the second aspect of the present invention comprises: a configuration module, configured to construct a plurality of metadata files according to a predetermined configuration format; a generation module, configured to generate a metadata image file in the master node from the plurality of metadata files, the metadata image file including the IDs, number and sizes of the data blocks of each metadata file; a simulation module, configured to simulate a plurality of HDFS slave nodes according to the IDs, number and sizes of the data blocks of each metadata file and to establish a mapping table of the relations between the slave nodes and the data blocks of each metadata file; an establishing module, configured to establish a thread between the mapping table and the master node; a sending module, configured to send, through the thread, the relations between the slave nodes and the data blocks of each metadata file to the metadata image file in the master node; and a modification module, configured to use the thread to implement a write operation on a first file by modifying the mapping table.
The testing device for an HDFS master node according to embodiments of the present invention uses fewer machine resources, can rapidly construct the desired cluster scale and number of files, and tests the stress and performance of HDFS with high efficiency.
Additional aspects and advantages of the present invention will be given in part in the following description, will in part become apparent from it, or will be learned through practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flow chart of a testing method for an HDFS master node according to an embodiment of the present invention;
Fig. 2 is a flow chart of a testing method for an HDFS master node according to an embodiment of the present invention;
Fig. 3 is a flow chart of a testing method for an HDFS master node according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of a testing device for an HDFS master node according to an embodiment of the present invention;
Fig. 5 is a structural block diagram of a testing device for an HDFS master node according to an embodiment of the present invention; and
Fig. 6 is a structural block diagram of a testing device for an HDFS master node according to an embodiment of the present invention.
Detailed description of the invention
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements having the same or similar functions. The embodiments described below with reference to the drawings are illustrative, serve only to explain the present invention, and shall not be construed as limiting it. On the contrary, the embodiments of the present invention include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.
In the description of the present invention, it should be understood that the terms "first", "second" and the like are used for descriptive purposes only and shall not be understood as indicating or implying relative importance. It should also be noted that, unless otherwise expressly specified and limited, the terms "connected" and "connection" should be interpreted broadly: a connection may, for example, be fixed, detachable or integral; mechanical or electrical; direct, or indirect through an intermediary. Those of ordinary skill in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances. In addition, in the description of the present invention, unless otherwise specified, "a plurality of" means two or more.
Any process or method described in a flow chart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The testing method for an HDFS master node according to embodiments of the present invention is described below with reference to the accompanying drawings.
A testing method for an HDFS master node comprises the following steps: A. constructing a plurality of metadata files according to a predetermined configuration format; B. generating a metadata image file in the master node from the plurality of metadata files, the metadata image file including the IDs, number and sizes of the data blocks of each metadata file; C. according to the IDs, number and sizes of the data blocks of each metadata file, using a simulator to simulate a plurality of HDFS slave nodes and establishing a mapping table of the relations between the slave nodes and the data blocks of each metadata file; D. establishing a thread between the mapping table and the master node; E. sending, through the thread, the relations between the slave nodes and the data blocks of each metadata file to the metadata image file in the master node; and F. using the thread to implement a write operation on a first file by modifying the mapping table.
Fig. 1 is a flow chart of a testing method for an HDFS master node according to an embodiment of the present invention.
As shown in Fig. 1, the testing method for an HDFS master node according to an embodiment of the present invention comprises the following steps.
Step S101: construct a plurality of metadata files according to a predetermined configuration format.
In one embodiment of the present invention, the predetermined configuration format includes a directory configuration format and a file configuration format. The directory configuration format is dir[n] and the file configuration format is file[m:replication:blocknumber:blocksize], where n means that n directories are generated under the current directory, m means that m files are generated under the current directory, replication is the number of replicas of each file, blocknumber is the number of data blocks contained in each file, and blocksize is the size of each data block. The number of slave nodes on which each file resides equals the number of replicas of that file.
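The two configuration items can be parsed with a pair of regular expressions. This is a minimal sketch that assumes the items are written literally as `dir[n]` and `file[m:replication:blocknumber:blocksize]` (tolerating a space before the bracket) and that `blocksize` is in bytes; the patent fixes neither detail, and the names `DirSpec`, `FileSpec` and `parse_item` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed concrete syntax; whitespace before "[" is tolerated.
_DIR_RE = re.compile(r"^dir\s*\[(\d+)\]$")
_FILE_RE = re.compile(r"^file\s*\[(\d+):(\d+):(\d+):(\d+)\]$")


@dataclass(frozen=True)
class DirSpec:
    n: int  # number of directories to generate under the current directory


@dataclass(frozen=True)
class FileSpec:
    m: int            # number of files to generate under the current directory
    replication: int  # replicas per file (= number of slave nodes per file)
    blocknumber: int  # data blocks per file
    blocksize: int    # size of each data block (assumed to be bytes)


def parse_item(text: str) -> DirSpec | FileSpec:
    """Parse one configuration item into a DirSpec or FileSpec."""
    text = text.strip()
    if (match := _DIR_RE.match(text)):
        return DirSpec(int(match.group(1)))
    if (match := _FILE_RE.match(text)):
        m, replication, blocknumber, blocksize = map(int, match.groups())
        return FileSpec(m, replication, blocknumber, blocksize)
    raise ValueError(f"unrecognised configuration item: {text!r}")
```

For example, `parse_item("file[10:3:2:67108864]")` describes ten files with three replicas each, every file made of two 64 MiB blocks.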
Step S102: generate the metadata image file in the master node from the plurality of metadata files. The metadata image file includes the IDs, number and sizes of the data blocks of each metadata file, as well as the namespace of the files. In this way, a metadata image file at the scale of millions of files can be constructed in a very short time.
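Because the image is generated directly from the configuration, no file content is written and block IDs can simply be handed out sequentially. The sketch below assumes a single level of `dir[n]` directories each holding `file[m:...]` files, and uses a Python dictionary as a stand-in for the metadata image; it does not produce the real HDFS fsimage format, and `build_image` and `FileEntry` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class FileSpec:
    m: int            # files per directory
    replication: int  # replicas per file
    blocknumber: int  # blocks per file
    blocksize: int    # bytes per block (assumed unit)


@dataclass
class FileEntry:
    replication: int
    block_ids: list[int]
    block_size: int


def build_image(n_dirs: int, spec: FileSpec, root: str = "/test") -> dict[str, FileEntry]:
    """Build an in-memory stand-in for the metadata image: path -> FileEntry.

    Each of the n_dirs directories under root receives spec.m files, and each
    file receives spec.blocknumber freshly numbered blocks; no data is written.
    """
    next_block_id = count(1)  # globally unique, sequential block IDs
    image: dict[str, FileEntry] = {}
    for d in range(n_dirs):
        for f in range(spec.m):
            path = f"{root}/dir{d}/file{f}"
            blocks = [next(next_block_id) for _ in range(spec.blocknumber)]
            image[path] = FileEntry(spec.replication, blocks, spec.blocksize)
    return image
```

With `n_dirs=1000` and `m=1000`, the same loop yields one million file entries without any disk or network I/O, which is the property this step relies on.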
Step S103: according to the IDs, number and sizes of the data blocks of each metadata file, use a simulator to simulate a plurality of HDFS slave nodes and establish a mapping table of the relations between the slave nodes and the data blocks of each metadata file. These mapping relations are dynamically added to the metadata and its image file.
In one embodiment of the present invention, according to the number of data blocks of each metadata file, a first simulator simulates a first plurality of HDFS slave nodes and establishes a first mapping table of the relations between the first plurality of slave nodes and the data blocks of each of a first plurality of metadata files, and a second simulator simulates a second plurality of HDFS slave nodes and establishes a second mapping table of the relations between the second plurality of slave nodes and the data blocks of each of a second plurality of metadata files. Each simulator manages its slave nodes in a unified way and is responsible for communicating with the master node. The sizes of the data blocks are predetermined. The relation between the slave nodes and the data blocks of each metadata file includes the ID, size and modification timestamp of each data block associated with a slave node.
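One way to picture a simulator is a single object that owns the mapping tables of many simulated slave nodes, so that no DataNode process has to run per node. The sketch assumes a round-robin placement policy, which the patent does not specify, and stores for each block the ID, size and modification timestamp listed above; `Simulator`, `BlockRecord` and `place_file` are hypothetical names. All blocks of a file go to the same `replication` nodes, matching the statement that the number of slave nodes per file equals its replica count.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class BlockRecord:
    block_id: int
    size: int
    mtime: float  # modification timestamp


@dataclass
class Simulator:
    """Hypothetical simulator managing several simulated slave nodes.

    It keeps one mapping table: node name -> {block ID -> BlockRecord}.
    """

    name: str
    node_count: int
    mapping: dict[str, dict[int, BlockRecord]] = field(default_factory=dict, init=False)
    _cursor: int = field(default=0, init=False)  # round-robin cursor (assumed policy)

    def __post_init__(self) -> None:
        for i in range(self.node_count):
            self.mapping[f"{self.name}-dn{i}"] = {}

    def place_file(self, block_ids: list[int], block_size: int, replication: int) -> None:
        """Place every block of one file on `replication` distinct simulated nodes."""
        if replication > self.node_count:
            raise ValueError("replication exceeds the number of simulated nodes")
        nodes = list(self.mapping)
        chosen = [nodes[(self._cursor + k) % self.node_count] for k in range(replication)]
        self._cursor = (self._cursor + replication) % self.node_count
        now = time.time()
        for node in chosen:
            for block_id in block_ids:
                self.mapping[node][block_id] = BlockRecord(block_id, block_size, now)
```

Two instances, for example `Simulator("s1", 500)` and `Simulator("s2", 500)`, would play the roles of the first and second simulators with their first and second mapping tables.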
Step S104: establish a thread between the mapping table and the master node.
In one embodiment of the present invention, there are one or two threads between the mapping table and the master node: a first thread is established between the first mapping table and the master node, and a second thread is established between the second mapping table and the master node.
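In real HDFS, DataNodes report the blocks they hold to the NameNode through block reports sent over RPC. The sketch below replaces that RPC with a method call on a hypothetical `MasterStub` and starts one thread per mapping table, as in the two-thread embodiment; `report_mapping_table` is likewise a name chosen for illustration.

```python
from __future__ import annotations

import threading


class MasterStub:
    """Hypothetical stand-in for the master node: folds reports into a block map."""

    def __init__(self) -> None:
        self.block_to_nodes: dict[int, set[str]] = {}
        self._lock = threading.Lock()

    def receive(self, node: str, block_ids: list[int]) -> None:
        """Record that `node` holds each block in `block_ids`."""
        with self._lock:  # several reporting threads may call this at once
            for block_id in block_ids:
                self.block_to_nodes.setdefault(block_id, set()).add(node)


def report_mapping_table(mapping: dict[str, list[int]],
                         master: MasterStub) -> threading.Thread:
    """Start one thread that sends every node's block list from one mapping table."""

    def run() -> None:
        for node, block_ids in mapping.items():
            master.receive(node, block_ids)

    thread = threading.Thread(target=run)
    thread.start()
    return thread


# Usage: one thread per mapping table, as in the two-thread embodiment.
master = MasterStub()
threads = [
    report_mapping_table({"s1-dn0": [1, 2], "s1-dn1": [1]}, master),
    report_mapping_table({"s2-dn0": [2, 3]}, master),
]
for t in threads:
    t.join()
```

A lock guards the shared block map because both threads may write to it concurrently.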
Step S105: send, through the thread, the relations between the slave nodes and the data blocks of each metadata file to the metadata image file in the master node.
In one embodiment of the present invention, the relations between the first plurality of slave nodes and the data blocks of each of the first plurality of metadata files are sent to the metadata image file in the master node through the first thread, and the relations between the second plurality of slave nodes and the data blocks of each of the second plurality of metadata files are sent to the metadata image file in the master node through the second thread.
Step S106: use the thread to implement the write operation on the first file by modifying the mapping table.
In one embodiment of the present invention, a pipelined write mode is adopted, in which only the first slave node is visible to the client and the remaining slave nodes are transparent to it, which simplifies the threads and reduces control overhead.
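In an HDFS write pipeline, the client sends data to the first DataNode, which forwards it to the next one, while acknowledgements flow back in the opposite direction. The sketch below models only the metadata side of that idea for simulated nodes: the client-facing function touches the first node only, and each node forwards the block to its successor. `SimNode` and `pipeline_write` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SimNode:
    """Simulated slave node that records block IDs only (no real data)."""

    name: str
    blocks: set[int] = field(default_factory=set)


def pipeline_write(pipeline: list[SimNode], block_id: int) -> list[str]:
    """Write one block through the pipeline, starting at the first node.

    Each node records the block and forwards it to the next node, so the
    client never addresses downstream nodes directly. Returns the order in
    which acknowledgements arrive, from the last node back to the first.
    """

    def forward(index: int) -> list[str]:
        node = pipeline[index]
        node.blocks.add(block_id)
        acks = forward(index + 1) if index + 1 < len(pipeline) else []
        return acks + [node.name]  # this node acks after its successor

    return forward(0) if pipeline else []
```

Keeping the downstream nodes hidden means the test harness needs only one client-facing connection per write, which is where the reduced control overhead comes from.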
In one embodiment of the present invention, the method may further comprise the step of feeding back, by the mapping table, the modification result to the metadata image file in the master node.
The testing method for an HDFS master node according to embodiments of the present invention can use a small amount of machine resources to rapidly construct the desired cluster scale and number of files for testing the stress and performance of HDFS, and achieves high testing efficiency.
Fig. 2 is a flow chart of a testing method for an HDFS master node according to an embodiment of the present invention.
As shown in Fig. 2, the testing method for an HDFS master node according to an embodiment of the present invention comprises the following steps.
Step S201: construct a plurality of metadata files according to a predetermined configuration format.
Specifically, the predetermined configuration format includes a directory configuration format and a file configuration format. The directory configuration format is dir[n] and the file configuration format is file[m:replication:blocknumber:blocksize], where n means that n directories are generated under the current directory, m means that m files are generated under the current directory, replication is the number of replicas of each file, blocknumber is the number of data blocks contained in each file, and blocksize is the size of each data block. The number of slave nodes on which each file resides equals the number of replicas of that file.
Step S202: generate the metadata image file in the master node from the plurality of metadata files. The metadata image file includes the IDs, number and sizes of the data blocks of each metadata file, as well as the namespace of the files. In this way, a metadata image file at the scale of millions of files can be constructed in a very short time.
Step S203: according to the number of data blocks of each metadata file, a first simulator simulates a first plurality of HDFS slave nodes and establishes a first mapping table of the relations between the first plurality of slave nodes and the data blocks of each of a first plurality of metadata files, and a second simulator simulates a second plurality of HDFS slave nodes and establishes a second mapping table of the relations between the second plurality of slave nodes and the data blocks of each of a second plurality of metadata files. Each simulator manages its slave nodes in a unified way and is responsible for communicating with the master node. The sizes of the data blocks are predetermined. The relation between the slave nodes and the data blocks of each metadata file includes the ID, size and modification timestamp of each data block associated with a slave node.
Step S204: establish a first thread between the first mapping table and the master node and a second thread between the second mapping table and the master node.
Step S205: send the relations between the first plurality of slave nodes and the data blocks of each of the first plurality of metadata files to the metadata image file in the master node through the first thread, and send the relations between the second plurality of slave nodes and the data blocks of each of the second plurality of metadata files to the metadata image file in the master node through the second thread.
Step S206: use the thread to implement the write operation on the first file by modifying the mapping table.
In one embodiment of the present invention, a pipelined write mode is adopted, in which only the first slave node is visible to the client and the remaining slave nodes are transparent to it, which simplifies the threads and reduces control overhead.
Step S207: determine whether the write operation on the first file involves both the first mapping table and the second mapping table.
Step S208: if so, further determine the first corresponding slave nodes in the first mapping table and the second corresponding slave nodes in the second mapping table that are involved in the write operation on the first file. The first simulator removes from the pipeline the slave nodes that it simulates and records these nodes.
Step S209: modify, in order, the relations between the first corresponding slave nodes in the first mapping table and their data blocks; send the write operation on the first file from the first mapping table to the second mapping table, so as to modify the relations between the second corresponding slave nodes in the second mapping table and their data blocks; and check whether the pipeline still contains other slave nodes simulated by the second simulator, updating those slave nodes and the relations between them and their data blocks in sequence.
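Steps S207 to S209 can be sketched as a chain of mapping tables: each table updates, in order, the pipeline nodes it owns, removes them from the pipeline, and passes what remains to the next table. This is a minimal sketch under assumptions the patent does not state: each node belongs to exactly one table, a write is modeled as adding one block ID with a timestamp, and `MappingTable` and `apply_write` are hypothetical names. Nodes are grouped by owning table, so the first table's nodes are always updated before the second table's, following the step order above.

```python
from __future__ import annotations

import time


class MappingTable:
    """Hypothetical mapping table owned by one simulator: node -> {block ID: mtime}."""

    def __init__(self, nodes: list[str], downstream: MappingTable | None = None) -> None:
        self.table: dict[str, dict[int, float]] = {n: {} for n in nodes}
        self.downstream = downstream  # next mapping table along the pipeline

    def apply_write(self, pipeline: list[str], block_id: int) -> list[str]:
        """Update this table's nodes on the pipeline, then pass the rest on.

        Returns the names of all updated nodes, in update order.
        """
        mine = [n for n in pipeline if n in self.table]      # removed and recorded
        rest = [n for n in pipeline if n not in self.table]  # left on the pipeline
        now = time.time()
        for node in mine:  # sequential update of this table's nodes
            self.table[node][block_id] = now
        if rest and self.downstream is None:
            raise KeyError(f"no mapping table owns {rest}")
        return mine + (self.downstream.apply_write(rest, block_id) if rest else [])


second = MappingTable(["s2-dn0", "s2-dn1"])
first = MappingTable(["s1-dn0", "s1-dn1"], downstream=second)
updated = first.apply_write(["s1-dn0", "s2-dn1", "s1-dn1"], 42)
```

Here the write involves both tables (step S207), so the first table records `s1-dn0` and `s1-dn1` and hands `s2-dn1` on to the second table.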
The testing method for an HDFS master node according to embodiments of the present invention can use a small amount of machine resources to rapidly construct the desired cluster scale and number of files for testing the stress and performance of HDFS, and achieves high testing efficiency.
Fig. 3 is a flow chart of a testing method for an HDFS master node according to an embodiment of the present invention.
As shown in Fig. 3, the testing method for an HDFS master node according to an embodiment of the present invention comprises the following steps.
Step S301: construct a plurality of metadata files according to a predetermined configuration format.
Specifically, the predetermined configuration format includes a directory configuration format and a file configuration format. The directory configuration format is dir[n] and the file configuration format is file[m:replication:blocknumber:blocksize], where n means that n directories are generated under the current directory, m means that m files are generated under the current directory, replication is the number of replicas of each file, blocknumber is the number of data blocks contained in each file, and blocksize is the size of each data block. The number of slave nodes on which each file resides equals the number of replicas of that file.
Step S302: generate the metadata image file in the master node from the plurality of metadata files. The metadata image file includes the IDs, number and sizes of the data blocks of each metadata file, as well as the namespace of the files. In this way, a metadata image file at the scale of millions of files can be constructed in a very short time.
Step S303: according to the number of data blocks of each metadata file, a first simulator simulates a first plurality of HDFS slave nodes and establishes a first mapping table of the relations between the first plurality of slave nodes and the data blocks of each of a first plurality of metadata files, and a second simulator simulates a second plurality of HDFS slave nodes and establishes a second mapping table of the relations between the second plurality of slave nodes and the data blocks of each of a second plurality of metadata files. Each simulator manages its slave nodes in a unified way and is responsible for communicating with the master node. The sizes of the data blocks are predetermined. The relation between the slave nodes and the data blocks of each metadata file includes the ID, size and modification timestamp of each data block associated with a slave node.
Step S304: establish a first thread between the first mapping table and the master node and a second thread between the second mapping table and the master node.
Step S305: send the relations between the first plurality of slave nodes and the data blocks of each of the first plurality of metadata files to the metadata image file in the master node through the first thread, and send the relations between the second plurality of slave nodes and the data blocks of each of the second plurality of metadata files to the metadata image file in the master node through the second thread.
Step S306: use the thread to implement the write operation on the first file by modifying the mapping table.
Specifically, a pipelined write mode is adopted, in which only the first slave node is visible to the client and the remaining slave nodes are transparent to it, which simplifies the threads and reduces control overhead.
Step S307: determine whether the write operation on the first file involves both the first mapping table and the second mapping table.
Step S308: if so, further determine the first corresponding slave nodes in the first mapping table and the second corresponding slave nodes in the second mapping table that are involved in the write operation on the first file. The first simulator removes from the pipeline the slave nodes that it simulates and records these nodes.
Step S309: modify, in order, the relations between the first corresponding slave nodes in the first mapping table and their data blocks; send the write operation on the first file from the first mapping table to the second mapping table, so as to modify the relations between the second corresponding slave nodes in the second mapping table and their data blocks; and check whether the pipeline still contains other slave nodes simulated by the second simulator, updating those slave nodes and the relations between them and their data blocks in sequence.
Step S310: the second mapping table feeds back to the first mapping table the modification result of the write operation on the first file in the second mapping table.
Step S311: the first mapping table feeds back the modification results of the write operation on the first file in the second mapping table and in the first mapping table to the metadata image file in the master node.
Through steps S310 and S311, the mapping tables feed the modification result back to the metadata image file of the master node.
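The feedback path of steps S310 and S311 runs opposite to the write: the second mapping table reports its result to the first, and the first reports the combined result to the master node. In the minimal sketch below, a hypothetical `ImageStub` stands in for the metadata image file, `feed_back` is likewise an illustrative name, and each result is assumed to be simply the list of slave nodes that now hold the written block.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ImageStub:
    """Hypothetical stand-in for the metadata image in the master node."""

    block_to_nodes: dict[int, set[str]] = field(default_factory=dict)

    def apply(self, block_id: int, nodes: list[str]) -> None:
        """Merge the reported holders of a block into the image."""
        self.block_to_nodes.setdefault(block_id, set()).update(nodes)


def feed_back(block_id: int, first_result: list[str],
              second_result: list[str], image: ImageStub) -> list[str]:
    """Propagate modification results from the mapping tables to the image."""
    # S310: the second mapping table hands its result to the first table,
    # which now holds both results.
    held_by_first = first_result + second_result
    # S311: the first mapping table reports the combined result to the image.
    image.apply(block_id, held_by_first)
    return held_by_first
```

Routing everything through the first table keeps a single reporting path to the master node per write, mirroring the single client-visible node on the write side.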
The testing method for an HDFS master node according to embodiments of the present invention can use a small amount of machine resources to rapidly construct the desired cluster scale and number of files for testing the stress and performance of HDFS, and achieves high testing efficiency.
The testing device for an HDFS master node according to embodiments of the present invention is described below with reference to the accompanying drawings.
A testing device for an HDFS master node comprises: a configuration module, configured to construct a plurality of metadata files according to a predetermined configuration format; a generation module, configured to generate a metadata image file in the master node from the plurality of metadata files, the metadata image file including the IDs, number and sizes of the data blocks of each metadata file; a simulation module, configured to simulate a plurality of HDFS slave nodes according to the IDs, number and sizes of the data blocks of each metadata file and to establish a mapping table of the relations between the slave nodes and the data blocks of each metadata file; an establishing module, configured to establish a thread between the mapping table and the master node; a sending module, configured to send, through the thread, the relations between the slave nodes and the data blocks of each metadata file to the metadata image file in the master node; and a modification module, configured to use the thread to implement a write operation on a first file by modifying the mapping table.
Fig. 4 is a structural block diagram of a testing device for an HDFS master node according to an embodiment of the present invention.
As shown in Fig. 4, the testing device for an HDFS master node according to an embodiment of the present invention includes a configuration module 100, a generation module 200, a simulation module 300, an establishing module 400, a sending module 500 and a modification module 600.
The configuration module 100 is configured to construct a plurality of metadata files according to a predetermined configuration format.
In one embodiment of the present invention, the predetermined configuration format includes a directory configuration format and a file configuration format. The directory configuration format is dir[n] and the file configuration format is file[m:replication:blocknumber:blocksize], where n means that n directories are generated under the current directory, m means that m files are generated under the current directory, replication is the number of replicas of each file, blocknumber is the number of data blocks contained in each file, and blocksize is the size of each data block. The number of slave nodes on which each file resides equals the number of replicas of that file.
The generation module 200 is configured to generate the metadata image file in the master node from the plurality of metadata files, the metadata image file including the IDs, number and sizes of the data blocks of each metadata file.
The simulation module 300 is configured to simulate a plurality of HDFS slave nodes according to the IDs, number and sizes of the data blocks of each metadata file and to establish the mapping table of the relations between the slave nodes and the data blocks of each metadata file.
In one embodiment of the present invention, the simulation module 300 includes a first simulation module 301 and a second simulation module 302. The first simulation module 301 is configured to simulate, according to the number of data blocks of each metadata file, a first plurality of HDFS slave nodes and to establish a first mapping table of the relations between the first plurality of slave nodes and the data blocks of each of a first plurality of metadata files. The second simulation module 302 is configured to simulate, according to the number of data blocks of each metadata file, a second plurality of HDFS slave nodes and to establish a second mapping table of the relations between the second plurality of slave nodes and the data blocks of each of a second plurality of metadata files. Each simulator manages its slave nodes in a unified way and is responsible for communicating with the master node. The sizes of the data blocks are predetermined. The relation between the slave nodes and the data blocks of each metadata file includes the ID, size and modification timestamp of each data block associated with a slave node.
The setup module 400 establishes a thread between the mapping table and the host node.
In one embodiment of the invention, there are one or two threads between the mapping table and the host node. The setup module 400 establishes a first thread between the first mapping table and the host node, and a second thread between the second mapping table and the host node.
The delivery module 500 sends, through the thread, the relationships between the plurality of slave nodes and the data blocks of each metadata file to the metadata image file in the host node.
In one embodiment of the invention, the delivery module 500 includes a first delivery module 501 and a second delivery module 502. The first delivery module 501 sends, through the first thread, the relationships between the first plurality of slave nodes and the data blocks of each file in the first plurality of metadata files to the metadata image file in the host node; the second delivery module 502 sends, through the second thread, the relationships between the second plurality of slave nodes and the data blocks of each file in the second plurality of metadata files to the metadata image file in the host node.
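The thread setup and delivery steps can be sketched with Python's standard `threading` module: one thread per mapping table reports the slave-to-block relationships to a host-node stand-in, which guards its shared state with a lock. `HostNodeStub`, `receive_report` and `deliver` are hypothetical; a real test would send these reports to an actual HDFS host node over its RPC interface, which is not modeled here.

```python
import threading


class HostNodeStub:
    """Hypothetical stand-in for the host node: records which slave holds which block."""

    def __init__(self):
        self._lock = threading.Lock()
        self.block_locations = {}  # block_id -> set of slave ids

    def receive_report(self, slave_id, block_ids):
        with self._lock:  # reports may arrive concurrently from several threads
            for bid in block_ids:
                self.block_locations.setdefault(bid, set()).add(slave_id)


def report_table(host, table):
    # table: slave_id -> iterable of block ids held by that slave
    for slave_id, block_ids in table.items():
        host.receive_report(slave_id, list(block_ids))


def deliver(host, tables):
    """Start one thread per mapping table and wait until all reports arrive."""
    threads = [threading.Thread(target=report_table, args=(host, t)) for t in tables]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
```

The single-thread embodiment is simply `deliver(host, [table])`; the two-thread embodiment passes the first and second mapping tables.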
The modification module 600 uses the thread to carry out a write operation on a first file by modifying the mapping table.
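The text does not say how the thread applies a write. The sketch below assumes one worker thread per mapping table that consumes write operations from a `queue.Queue` and records each written block on its target slave nodes with a fresh timestamp, so no real data is ever stored. The operation tuple layout and the `None` stop sentinel are illustrative choices, and `table_worker` is a hypothetical name.

```python
import queue
import threading
import time


def table_worker(table, ops, results):
    """Apply write operations to a mapping table until None is dequeued.

    table: slave_id -> {block_id: (size, mtime)}
    Each op is (block_id, size, target_slave_ids); only metadata changes.
    """
    while True:
        op = ops.get()
        if op is None:  # stop sentinel
            break
        block_id, size, targets = op
        now = time.time()
        for sid in targets:
            table.setdefault(sid, {})[block_id] = (size, now)
        results.append((block_id, tuple(targets)))  # modification result
```

Usage: start `threading.Thread(target=table_worker, args=(table, ops, results))`, put write operations on `ops`, then put `None` and join the thread.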
The test device for the HDFS host node according to embodiments of the present invention uses fewer machine resources, can quickly construct the desired cluster scale and number of files, and achieves higher efficiency in HDFS stress and performance testing.
Fig. 5 is a structural block diagram of a test device for an HDFS host node according to an embodiment of the present invention.
As shown in Fig. 5, the test device for the HDFS host node according to this embodiment includes a configuration module 100, a generation module 200, a simulation module 300, a setup module 400, a delivery module 500, a judgment module 700 and a modification module 600.
The configuration module 100 constructs a plurality of metadata files according to the predetermined configuration format.
In one embodiment of the invention, the predetermined configuration format includes a directory configuration format and a file configuration format. The directory configuration format is dir [n], and the file configuration format is file [m:replication:blocknumber:blocksize]. Here n is the number of directories generated under the current directory, m is the number of files generated under the current directory, replication is the number of replicas of each file, blocknumber is the number of data blocks in each file, and blocksize is the size of each data block. The number of slave nodes on which each file resides equals the number of replicas of that file.
The generation module 200 generates the metadata image file in the host node from the plurality of metadata files; the metadata image file includes the IDs, number and size of the data blocks of each metadata file.
The simulation module 300 simulates, according to the IDs, number and size of the data blocks of each metadata file, a plurality of slave nodes of HDFS, and builds a mapping table of the relationships between the plurality of slave nodes and the data blocks of each metadata file.
In one embodiment of the invention, the simulation module 300 includes a first simulation module 301 and a second simulation module 302. According to the number of data blocks of each metadata file, the first simulation module 301 simulates a first plurality of slave nodes of HDFS and builds a first mapping table of the relationships between the first plurality of slave nodes and the data blocks of each file in a first plurality of metadata files; likewise, the second simulation module 302 simulates a second plurality of slave nodes of HDFS and builds a second mapping table of the relationships between the second plurality of slave nodes and the data blocks of each file in a second plurality of metadata files. Each simulator manages its plurality of slave nodes as a unit and communicates with the host node on their behalf. The size of the data blocks is predetermined. The relationship between a slave node and the data blocks of each metadata file includes the ID and size of each data block associated with that slave node, together with a modification timestamp.
The setup module 400 establishes a thread between the mapping table and the host node.
In one embodiment of the invention, there are one or two threads between the mapping table and the host node. The setup module 400 establishes a first thread between the first mapping table and the host node, and a second thread between the second mapping table and the host node.
The delivery module 500 sends, through the thread, the relationships between the plurality of slave nodes and the data blocks of each metadata file to the metadata image file in the host node.
In one embodiment of the invention, the delivery module 500 includes a first delivery module 501 and a second delivery module 502. The first delivery module 501 sends, through the first thread, the relationships between the first plurality of slave nodes and the data blocks of each file in the first plurality of metadata files to the metadata image file in the host node; the second delivery module 502 sends, through the second thread, the relationships between the second plurality of slave nodes and the data blocks of each file in the second plurality of metadata files to the metadata image file in the host node.
The judgment module 700 judges whether the write operation on the first file involves both the first mapping table and the second mapping table. If so, the judgment module 700 further determines the first corresponding slave node in the first mapping table and the second corresponding slave node in the second mapping table that are involved in the write operation on the first file.
The modification module 600 uses the thread to carry out the write operation on the first file by modifying the mapping table. Specifically, the modification module 600 first modifies, in the first mapping table, the relationship between the first corresponding slave node and its data blocks; the first mapping table then forwards the write operation on the first file to the second mapping table, where the relationship between the second corresponding slave node and its data blocks is modified.
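The judge-then-modify sequence above can be sketched as follows: the write's target slave nodes are split into those simulated by the first mapping table and those simulated by the second, the first table is updated first, and the write is then handed on to the second table. Representing tables as plain dictionaries and calling `forward_to_second` directly (instead of passing a message between threads) are simplifying assumptions; both function names are hypothetical.

```python
def forward_to_second(second, block_id, size, slaves, mtime):
    # Stand-in for the first mapping table passing the write operation on.
    for sid in slaves:
        second[sid][block_id] = (size, mtime)


def apply_write(first, second, block_id, size, targets, mtime):
    """Apply a write on the first file across two simulated mapping tables.

    first, second: slave_id -> {block_id: (size, mtime)}
    targets: slave ids that should hold the written block.
    Returns True when the write involved both mapping tables.
    """
    unknown = set(targets) - set(first) - set(second)
    if unknown:
        raise KeyError(f"slave nodes not simulated: {sorted(unknown)}")
    # Judgment step: which corresponding slave nodes sit in each table?
    in_first = [s for s in targets if s in first]
    in_second = [s for s in targets if s in second]
    # Modify the first mapping table first...
    for sid in in_first:
        first[sid][block_id] = (size, mtime)
    # ...then forward the same write to the second mapping table.
    if in_second:
        forward_to_second(second, block_id, size, in_second, mtime)
    return bool(in_first) and bool(in_second)
```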
The test device for the HDFS host node according to embodiments of the present invention uses fewer machine resources, can quickly construct the desired cluster scale and number of files, and achieves higher efficiency in HDFS stress and performance testing.
Fig. 6 is a structural block diagram of a test device for an HDFS host node according to an embodiment of the present invention.
As shown in Fig. 6, the test device for the HDFS host node according to this embodiment includes a configuration module 100, a generation module 200, a simulation module 300, a setup module 400, a delivery module 500, a judgment module 700, a modification module 600 and a feedback module 800.
The configuration module 100 constructs a plurality of metadata files according to the predetermined configuration format.
In one embodiment of the invention, the predetermined configuration format includes a directory configuration format and a file configuration format. The directory configuration format is dir [n], and the file configuration format is file [m:replication:blocknumber:blocksize]. Here n is the number of directories generated under the current directory, m is the number of files generated under the current directory, replication is the number of replicas of each file, blocknumber is the number of data blocks in each file, and blocksize is the size of each data block. The number of slave nodes on which each file resides equals the number of replicas of that file.
The generation module 200 generates the metadata image file in the host node from the plurality of metadata files; the metadata image file includes the IDs, number and size of the data blocks of each metadata file.
The simulation module 300 simulates, according to the IDs, number and size of the data blocks of each metadata file, a plurality of slave nodes of HDFS, and builds a mapping table of the relationships between the plurality of slave nodes and the data blocks of each metadata file.
In one embodiment of the invention, the simulation module 300 includes a first simulation module 301 and a second simulation module 302. According to the number of data blocks of each metadata file, the first simulation module 301 simulates a first plurality of slave nodes of HDFS and builds a first mapping table of the relationships between the first plurality of slave nodes and the data blocks of each file in a first plurality of metadata files; likewise, the second simulation module 302 simulates a second plurality of slave nodes of HDFS and builds a second mapping table of the relationships between the second plurality of slave nodes and the data blocks of each file in a second plurality of metadata files. Each simulator manages its plurality of slave nodes as a unit and communicates with the host node on their behalf. The size of the data blocks is predetermined. The relationship between a slave node and the data blocks of each metadata file includes the ID and size of each data block associated with that slave node, together with a modification timestamp.
The setup module 400 establishes a thread between the mapping table and the host node.
In one embodiment of the invention, there are one or two threads between the mapping table and the host node. The setup module 400 establishes a first thread between the first mapping table and the host node, and a second thread between the second mapping table and the host node.
The delivery module 500 sends, through the thread, the relationships between the plurality of slave nodes and the data blocks of each metadata file to the metadata image file in the host node.
In one embodiment of the invention, the delivery module 500 includes a first delivery module 501 and a second delivery module 502. The first delivery module 501 sends, through the first thread, the relationships between the first plurality of slave nodes and the data blocks of each file in the first plurality of metadata files to the metadata image file in the host node; the second delivery module 502 sends, through the second thread, the relationships between the second plurality of slave nodes and the data blocks of each file in the second plurality of metadata files to the metadata image file in the host node.
The judgment module 700 judges whether the write operation on the first file involves both the first mapping table and the second mapping table. If so, the judgment module 700 further determines the first corresponding slave node in the first mapping table and the second corresponding slave node in the second mapping table that are involved in the write operation on the first file.
The modification module 600 uses the thread to carry out the write operation on the first file by modifying the mapping table. Specifically, the modification module 600 first modifies, in the first mapping table, the relationship between the first corresponding slave node and its data blocks; the first mapping table then forwards the write operation on the first file to the second mapping table, where the relationship between the second corresponding slave node and its data blocks is modified.
The feedback module 800 feeds the modification result back to the metadata image file in the host node.
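A minimal sketch of the feedback step, assuming the modification results arrive as (path, block_id, size) tuples and the image is a path-to-blocks dictionary: new blocks are appended to their file's entry, and a result that is fed back twice is applied only once. The tuple layout and the name `feed_back` are hypothetical.

```python
def feed_back(image, results):
    """Merge mapping-table modification results into the host node's image.

    image: dict path -> list of (block_id, size)
    results: iterable of (path, block_id, size)
    Duplicate results are ignored, so repeated feedback is harmless.
    """
    for path, block_id, size in results:
        blocks = image.setdefault(path, [])
        if all(bid != block_id for bid, _ in blocks):
            blocks.append((block_id, size))
    return image
```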
The test device for the HDFS host node according to embodiments of the present invention uses fewer machine resources, can quickly construct the desired cluster scale and number of files, and achieves higher efficiency in HDFS stress and performance testing.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.
In this specification, references to the terms "an embodiment", "some embodiments", "an example", "a specific example" or "some examples" mean that a particular feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Such schematic expressions do not necessarily refer to the same embodiment or example. Moreover, the particular features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will appreciate that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the present invention, the scope of which is defined by the claims and their equivalents.

Claims (20)

1. A test method for an HDFS host node, characterized by comprising the following steps:
A. constructing a plurality of metadata files according to a predetermined configuration format;
B. generating a metadata image file in the host node according to the plurality of metadata files, the metadata image file including the IDs, number and size of the data blocks of each metadata file;
C. according to the IDs, number and size of the data blocks of each metadata file, using a simulator to simulate a plurality of slave nodes of HDFS and building a mapping table of the relationships between the plurality of slave nodes and the data blocks of each metadata file;
D. establishing a thread between the mapping table and the host node;
E. sending, through the thread, the relationships between the plurality of slave nodes and the data blocks of each metadata file to the metadata image file in the host node; and
F. using the thread to carry out a write operation on a first data file by modifying the mapping table.
2. The test method for an HDFS host node according to claim 1, characterized in that the predetermined configuration format includes a directory configuration format and a file configuration format, the directory configuration format being dir [n] and the file configuration format being file [m:replication:blocknumber:blocksize], where n is the number of directories generated under the current directory, m is the number of files generated under the current directory, replication is the number of replicas of each file, blocknumber is the number of data blocks in each file, and blocksize is the size of each data block.
3. The test method for an HDFS host node according to claim 1, characterized in that the number of slave nodes on which each file resides equals the number of replicas of that file.
4. The test method for an HDFS host node according to claim 1, characterized in that there are one or two threads between the mapping table and the host node.
5. The test method for an HDFS host node according to claim 1, characterized in that
step C includes: according to the number of data blocks of each metadata file, using a first simulator to simulate a first plurality of slave nodes of HDFS and build a first mapping table of the relationships between the first plurality of slave nodes and the data blocks of each file in a first plurality of metadata files, and using a second simulator to simulate a second plurality of slave nodes of HDFS and build a second mapping table of the relationships between the second plurality of slave nodes and the data blocks of each file in a second plurality of metadata files;
step D includes: establishing a first thread between the first mapping table and the host node, and establishing a second thread between the second mapping table and the host node; and
step E includes: sending, through the first thread, the relationships between the first plurality of slave nodes and the data blocks of each file in the first plurality of metadata files to the metadata image file in the host node, and sending, through the second thread, the relationships between the second plurality of slave nodes and the data blocks of each file in the second plurality of metadata files to the metadata image file in the host node.
6. The test method for an HDFS host node according to claim 5, characterized by further comprising the steps of:
judging whether the write operation on the first data file involves both the first mapping table and the second mapping table;
if so, further determining the first corresponding slave node in the first mapping table and the second corresponding slave node in the second mapping table that are involved in the write operation on the first data file; and
modifying, in sequence, the relationship between the first corresponding slave node and its data blocks in the first mapping table, and forwarding, by the first mapping table, the write operation on the first data file to the second mapping table so as to modify the relationship between the second corresponding slave node and its data blocks in the second mapping table.
7. The test method for an HDFS host node according to claim 6, characterized by further comprising the steps of:
feeding back, by the second mapping table, the modification result of the write operation on the first data file in the second mapping table to the first mapping table; and
feeding back, by the first mapping table, the modification results of the write operation on the first data file in the second mapping table and in the first mapping table to the metadata image file in the host node.
8. The test method for an HDFS host node according to claim 1, characterized by further comprising the step of:
G. feeding back, by the mapping table, the modification result to the metadata image file in the host node.
9. The test method for an HDFS host node according to any one of claims 1 to 8, characterized in that the size of the plurality of data blocks is predetermined.
10. The test method for an HDFS host node according to any one of claims 1 to 8, characterized in that the relationships between the plurality of slave nodes and the data blocks of each metadata file include the ID and size of each data block associated with a slave node, together with a modification timestamp.
11. A test device for an HDFS host node, characterized by comprising:
a configuration module, configured to construct a plurality of metadata files according to a predetermined configuration format;
a generation module, configured to generate a metadata image file in the host node according to the plurality of metadata files, the metadata image file including the IDs, number and size of the data blocks of each metadata file;
a simulation module, configured to simulate, according to the IDs, number and size of the data blocks of each metadata file, a plurality of slave nodes of HDFS and to build a mapping table of the relationships between the plurality of slave nodes and the data blocks of each metadata file;
a setup module, configured to establish a thread between the mapping table and the host node;
a delivery module, configured to send, through the thread, the relationships between the plurality of slave nodes and the data blocks of each metadata file to the metadata image file in the host node; and
a modification module, configured to use the thread to carry out a write operation on a first data file by modifying the mapping table.
12. The test device for an HDFS host node according to claim 11, characterized in that the predetermined configuration format includes a directory configuration format and a file configuration format, the directory configuration format being dir [n] and the file configuration format being file [m:replication:blocknumber:blocksize], where n is the number of directories generated under the current directory, m is the number of files generated under the current directory, replication is the number of replicas of each file, blocknumber is the number of data blocks in each file, and blocksize is the size of each data block.
13. The test device for an HDFS host node according to claim 11, characterized in that the number of slave nodes on which each file resides equals the number of replicas of that file.
14. The test device for an HDFS host node according to claim 11, characterized in that there are one or two threads between the mapping table and the host node.
15. The test device for an HDFS host node according to claim 11, characterized in that
the simulation module includes a first simulation module and a second simulation module, wherein the first simulation module is configured to simulate, according to the number of data blocks of each metadata file, a first plurality of slave nodes of HDFS and to build a first mapping table of the relationships between the first plurality of slave nodes and the data blocks of each file in a first plurality of metadata files, and the second simulation module is configured to simulate, according to the number of data blocks of each metadata file, a second plurality of slave nodes of HDFS and to build a second mapping table of the relationships between the second plurality of slave nodes and the data blocks of each file in a second plurality of metadata files;
the setup module is configured to establish a first thread between the first mapping table and the host node and a second thread between the second mapping table and the host node; and
the delivery module includes a first delivery module and a second delivery module, wherein the first delivery module is configured to send, through the first thread, the relationships between the first plurality of slave nodes and the data blocks of each file in the first plurality of metadata files to the metadata image file in the host node, and the second delivery module is configured to send, through the second thread, the relationships between the second plurality of slave nodes and the data blocks of each file in the second plurality of metadata files to the metadata image file in the host node.
16. The test device for an HDFS host node according to claim 15, characterized by further comprising:
a judgment module, configured to judge whether the write operation on the first data file involves both the first mapping table and the second mapping table;
and, if so, to further determine the first corresponding slave node in the first mapping table and the second corresponding slave node in the second mapping table that are involved in the write operation on the first data file,
wherein the modification module is configured to modify, in sequence, the relationship between the first corresponding slave node and its data blocks in the first mapping table, and the first mapping table forwards the write operation on the first data file to the second mapping table so as to modify the relationship between the second corresponding slave node and its data blocks in the second mapping table.
17. The test device for an HDFS host node according to claim 16, characterized in that:
the second mapping table feeds back the modification result of the write operation on the first data file in the second mapping table to the first mapping table; and
the first mapping table feeds back the modification results of the write operation on the first data file in the second mapping table and in the first mapping table to the metadata image file in the host node.
18. The test device for an HDFS host node according to claim 11, characterized by further comprising:
a feedback module, configured to feed the modification result back to the metadata image file in the host node.
19. The test device for an HDFS host node according to any one of claims 11 to 18, characterized in that the size of the plurality of data blocks is predetermined.
20. The test device for an HDFS host node according to any one of claims 11 to 18, characterized in that the relationships between the plurality of slave nodes and the data blocks of each metadata file include the ID and size of each data block associated with a slave node, together with a modification timestamp.
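Claims 7 and 17 describe a two-hop feedback path: the second mapping table reports its modification results to the first mapping table, which then reports its own results together with the forwarded ones to the metadata image file in the host node. The sketch below models that chain with an `upstream` reference per table; `MappingTable`, `HostImage`, `record`, `receive` and `flush` are hypothetical names, and results are treated as opaque values.

```python
class MappingTable:
    """Hypothetical simulator-side mapping table that reports changes upstream."""

    def __init__(self, name, upstream=None):
        self.name = name
        self.upstream = upstream  # next hop for feedback: another table or the host
        self.pending = []         # modification results not yet fed back

    def record(self, change):
        self.pending.append((self.name, change))

    def receive(self, changes):
        # Results fed back from a downstream table are merged with our own.
        self.pending.extend(changes)

    def flush(self):
        changes, self.pending = self.pending, []
        self.upstream.receive(changes)
        return changes


class HostImage:
    """Stand-in for the metadata image file in the host node."""

    def __init__(self):
        self.applied = []

    def receive(self, changes):
        self.applied.extend(changes)
```

Wiring `second = MappingTable("second", upstream=first)` and `first = MappingTable("first", upstream=host)` means the host node sees the second table's results only after the first table flushes.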
CN201210037776.1A 2012-02-17 2012-02-17 Method of testing and device for HDFS host node Active CN103257970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210037776.1A CN103257970B (en) 2012-02-17 2012-02-17 Method of testing and device for HDFS host node

Publications (2)

Publication Number Publication Date
CN103257970A CN103257970A (en) 2013-08-21
CN103257970B true CN103257970B (en) 2016-06-15

Family

ID=48961899

Country Status (1)

Country Link
CN (1) CN103257970B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant