CN110865989A - Business processing method for large-scale computing cluster - Google Patents

Business processing method for large-scale computing cluster

Info

Publication number
CN110865989A
CN110865989A (application number CN201911159312.6A)
Authority
CN
China
Prior art keywords: data, read, NFS server, NFS, processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911159312.6A
Other languages
Chinese (zh)
Inventor
臧林劼
何营
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN201911159312.6A
Publication of CN110865989A
Legal status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/182Distributed file systems
    • G06F16/1824Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/183Provision of network file services by network file servers, e.g. by using NFS, CIFS
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/172Caching, prefetching or hoarding of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/60Software deployment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44521Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/544Buffers; Shared memory; Pipes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a large-scale computing cluster, together with a service processing method, an apparatus, a slave node, and a readable storage medium therefor. The scheme comprises the following steps: acquiring data to be processed from an NFS server according to a computing service request; invoking job software and a dynamic link library in a local directory pre-mounted to a shared directory, and processing the data to be processed accordingly; and writing the processed data into a read-write buffer of the NFS server, so that the NFS server flushes the data in the read-write buffer to a physical disk when a preset trigger condition is reached. By sharing the job software and the dynamic link library through a user-mode network file system, the scheme improves the efficiency, maintainability, and manageability of deploying them. In addition, during service processing the NFS server flushes the data in the read-write buffer to the physical disk only when the preset condition is reached, which reduces the number of disk IO operations and improves the overall performance of the large-scale computing cluster.

Description

Business processing method for large-scale computing cluster
Technical Field
The present application relates to the field of computer technologies, and in particular, to a large-scale computing cluster, a service processing method and apparatus thereof, a slave node, and a readable storage medium.
Background
Mass-storage and scalable file storage systems are widely deployed and deeply developed in enterprise information systems; the number of users' core file storage applications keeps increasing, and exchanging file data over a network is a mode commonly adopted by enterprise users. As enterprise data grows ever larger, users place increasingly high requirements on data transmission performance and stability, and network file systems are becoming widely applied.
NFS (Network File System) is one of the file systems supported by FreeBSD; it allows computers in a network to share resources over a TCP/IP network. In an NFS deployment, a local NFS client application can transparently read and write files located on a remote NFS server, just as if it were accessing local files.
In kernel mode, the CPU can access all data in memory, including peripheral devices such as hard disks and network cards, and can switch itself from one program to another. In user mode, a program has only limited access to memory, is not allowed to access peripheral devices, and can be preempted so that CPU resources are handed to other programs. NFS-Ganesha is a user-mode network file system and an open-source project; compared with kernel-mode NFS it offers better manageability and maintainability in system-service fault scenarios, and it is easy to deploy and maintain, so NFS-Ganesha currently has very broad application prospects in massive big-data distributed object storage.
With the continuous development of the big-data era, the IO performance of a single computer is too limited to complete complex computing tasks efficiently, and services in many computing scenarios require large-scale computing nodes to meet their working requirements. When a computing service job is run, the job software and the dynamic library files required for the computation must be loaded, and among large-scale computing nodes these must be installed and deployed on every computing node.
Therefore, how to avoid the complexity of deploying job software and dynamic link libraries in a large-scale computing cluster and how to improve service processing efficiency are problems to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide a large-scale computing cluster, a service processing method and apparatus therefor, a slave node, and a readable storage medium, so as to solve the problems that, in a traditional large-scale computing cluster, job software and a dynamic link library must be deployed on every node, making deployment efficiency and maintainability low and the cluster's service processing efficiency low. The specific scheme is as follows:
in a first aspect, the present application provides a service processing method for a large-scale computing cluster, which is applied to a slave node, and includes:
acquiring data to be processed from an NFS server according to a computing service request;
according to the computing service request, invoking job software and a dynamic link library in a local directory pre-mounted to a shared directory, and processing the data to be processed accordingly; wherein a control node of the large-scale computing cluster exports and shares in advance, based on a user-mode network file system, the directory in which the job software and the dynamic link library are located, to generate the shared directory;
writing the processed data into a read-write buffer of the NFS server, so that the NFS server can conveniently flush the data in the read-write buffer to a physical disk when a preset trigger condition is reached.
Preferably, the acquiring data to be processed from the NFS server according to the computing service request comprises:
sending a data acquisition request to the NFS server according to the computing service request, so that the NFS server reads target data from a physical disk into the read-write buffer according to the data acquisition request, wherein the target data comprises the data to be processed; and acquiring the data to be processed from the read-write buffer.
Preferably, the writing the processed data into a read-write buffer of the NFS server includes:
generating a write request message of an RPC layer according to the processed data, wherein the write request message comprises an Ethernet header, an IP header, a TCP header, an RPC header and an NFS data segment;
and sending the write request message to the NFS server so as to write the processed data into a read-write buffer of the NFS server.
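As an illustration of the layered write-request message just described, the following sketch packs placeholder Ethernet, IP, TCP, and RPC headers in front of an NFS data segment. The field sizes and the RPC header layout are assumptions for illustration only, not the actual Ethernet/IP/TCP or ONC RPC wire formats.

```python
import struct

def build_write_request(payload: bytes, offset: int) -> bytes:
    """Sketch of the layered write-request message.

    Header contents are illustrative placeholders, not real protocol
    encodings; only the layering order matches the description above.
    """
    eth_header = b"\x00" * 14   # placeholder Ethernet header (14 bytes)
    ip_header = b"\x00" * 20    # placeholder IPv4 header (20 bytes)
    tcp_header = b"\x00" * 20   # placeholder TCP header (20 bytes)
    # Hypothetical RPC header: procedure id, target logical offset, length.
    rpc_header = struct.pack(">III", 7, offset, len(payload))
    return eth_header + ip_header + tcp_header + rpc_header + payload

# NFS data segment carrying 4 bytes destined for logical offset 4096.
msg = build_write_request(b"data", offset=4096)
```

The server side would decode the message in the reverse order, peeling each header before reaching the NFS data segment.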
Preferably, after the sending the write request packet to the NFS server, the method further includes:
and in the read-write buffer area, the NFS server allocates continuous memory for NFS data segments with continuous target logical addresses.
Preferably, before the invoking of the job software and the dynamic link library in the local directory pre-mounted to the shared directory, the method further comprises:
exporting and sharing, based on a user-mode network file system, the directory in which the control node's job software and dynamic link library are located, to generate the shared directory;
and mounting the local directory of each slave node to the shared directory through a batch command script.
Preferably, the preset trigger condition comprises any one or any combination of the following: the buffered amount in the read-write buffer exceeds a preset size; no write request is received within a preset time; or the target logical address of the current write request is not contiguous with the target logical address of the data already written in the read-write buffer.
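The three trigger conditions just listed can be sketched as a small policy object. The thresholds below (buffer size limit, idle timeout) are illustrative assumptions only; the patent leaves the preset values unspecified.

```python
import time

class FlushPolicy:
    """Sketch of the three flush triggers; thresholds are assumed values."""

    def __init__(self, max_bytes=4 * 1024 * 1024, idle_secs=5.0):
        self.max_bytes = max_bytes   # trigger 1: buffered amount exceeds a preset size
        self.idle_secs = idle_secs   # trigger 2: no write request within a preset time
        self.buffered = 0
        self.next_offset = None      # logical address that would extend the buffer
        self.last_write = time.monotonic()

    def should_flush(self, offset, length, now=None):
        """True if buffered data must be flushed before accepting this write."""
        now = time.monotonic() if now is None else now
        return self.buffered > 0 and (
            self.buffered + length > self.max_bytes       # size exceeded
            or now - self.last_write > self.idle_secs     # idle too long
            or offset != self.next_offset                 # trigger 3: non-contiguous address
        )

    def accept(self, offset, length, now=None):
        """Record a write that was placed in the buffer."""
        self.buffered += length
        self.next_offset = offset + length
        self.last_write = time.monotonic() if now is None else now
```

Any one condition suffices to trigger a flush, matching the "any one or any combination" wording above.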
In a second aspect, the present application provides a service processing apparatus for a large-scale computing cluster, applied to a slave node, including:
a data acquisition module, configured to acquire data to be processed from an NFS server according to a computing service request;
a data processing module, configured to invoke, according to the computing service request, job software and a dynamic link library in a local directory pre-mounted to a shared directory, and to process the data to be processed accordingly; wherein a control node of the large-scale computing cluster exports and shares in advance, based on a user-mode network file system, the directory in which the job software and the dynamic link library are located, to generate the shared directory;
a data writing module, configured to write the processed data into a read-write buffer of the NFS server, so that the NFS server flushes the data in the read-write buffer to a physical disk when a preset trigger condition is reached.
In a third aspect, the present application provides a slave node of a large-scale computing cluster, comprising:
a memory: for storing a computer program;
a processor: for executing the computer program to implement the steps of a business process method of a large-scale computing cluster as described above.
In a fourth aspect, the present application provides a large-scale computing cluster, comprising an NFS server and an NFS client, wherein the NFS client comprises a control node and slave nodes;
the control node is used for exporting and sharing the directories where the operating software and the dynamic link library are located in advance based on a user mode network file system, generating a shared directory, and mounting the local directory of the slave node to the shared directory;
the slave node is used for acquiring data to be processed from the NFS server according to the calculation service request; according to the computing service request, calling operation software and a dynamic link library in a local directory to correspondingly process the data to be processed; writing the processed data into a read-write buffer of the NFS server;
and the NFS server is used for flushing the data of the read-write buffer area to a physical disk when a preset trigger condition is reached.
In a fifth aspect, the present application provides a readable storage medium having stored thereon a computer program for implementing the steps of a method of business processing for a large-scale computing cluster as described above when executed by a processor.
The application provides a service processing method for a large-scale computing cluster, applied to a slave node and comprising: acquiring data to be processed from an NFS server according to a computing service request; invoking, according to the computing service request, job software and a dynamic link library in a local directory pre-mounted to a shared directory, and processing the data to be processed accordingly; and writing the processed data into a read-write buffer of the NFS server, so that the NFS server flushes the data in the read-write buffer to a physical disk when a preset trigger condition is reached. The control node of the large-scale computing cluster exports and shares in advance, based on a user-mode network file system, the directory in which the job software and the dynamic link library are located, to generate the shared directory.
It can be seen that, by sharing the job software and the dynamic link library through the user-mode network file system, the method improves the efficiency, maintainability, and manageability of deploying them across the large-scale computing cluster. Moreover, during service processing the slave node sends the processed data to the read-write buffer of the NFS server, and the NFS server flushes the buffered data to the physical disk only when a preset condition is reached; this reduces the number of direct disk IO operations, shortens the response time of service requests, raises throughput, strengthens resistance to high concurrency, and improves the overall performance of the large-scale computing cluster.
In addition, the application also provides a service processing apparatus for a large-scale computing cluster, a slave node, a large-scale computing cluster, and a readable storage medium, whose technical effects correspond to those of the method and are not repeated here.
Drawings
To explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart illustrating a first implementation of a service processing method for a large-scale computing cluster according to an embodiment of the present disclosure;
fig. 2 is a flowchart illustrating an implementation of a second embodiment of a service processing method for a large-scale computing cluster according to the present application;
fig. 3 is a schematic view of a service processing process in a second embodiment of a service processing method for a large-scale computing cluster provided in the present application;
fig. 4 is a schematic diagram of a data caching process in a second embodiment of a service processing method for a large-scale computing cluster according to the present application;
fig. 5 is a functional block diagram of an embodiment of a service processing apparatus of a large-scale computing cluster provided in the present application.
Detailed Description
The core of the application is to provide a large-scale computing cluster together with a service processing method, an apparatus, a slave node, and a readable storage medium therefor. Sharing the job software and the dynamic link library through a user-mode network file system improves the efficiency, maintainability, and manageability of deploying them. In addition, the NFS server flushes the data in the read-write buffer to the physical disk only when the preset conditions are met, which reduces the number of disk IO operations and improves the overall performance of the large-scale computing cluster.
In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present application and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, a first embodiment of a service processing method for a large-scale computing cluster is described below, where the first embodiment is applied to a slave node, and includes:
s101, acquiring data to be processed from an NFS server according to a computing service request;
s102, according to the computing service request, calling operation software and a dynamic link library which are pre-mounted in a local directory of a shared directory, and carrying out corresponding processing on the data to be processed;
the control node of the large-scale computing cluster exports and shares the operation software and the directory where the dynamic link library is located in advance based on a user mode network file system to generate the shared directory.
S103, writing the processed data into a read-write buffer of the NFS server, so that the NFS server flushes the data in the read-write buffer to a physical disk when a preset trigger condition is reached.
The large-scale computing cluster of this embodiment comprises an NFS server and an NFS client, the NFS client specifically comprising a control node and slave nodes; this embodiment is implemented on the slave nodes. As described above, the control node exports and shares in advance, based on the user-mode network file system (NFS-Ganesha), the directory in which the job software and the dynamic link library are located, and the local directory of each slave node is mounted to the shared directory, so that each slave node can obtain the job software and the dynamic link library from the shared directory as if it were accessing a local directory. On this basis, the embodiment avoids the cumbersome work of deploying job software and dynamic link libraries on every node.
The service processing process based on this large-scale computing cluster is as follows: starting from the NFS client initiating a computing job request, the NFS client first issues storage IO to the NFS server and reads the data to be processed from the NFS server into the NFS client; a slave node of the NFS client then initiates software computation toward the control node. At this point the control node serves as both an NFS server and an NFS client, which is equivalent to running two sets of user-mode network file system services, and it acts as the software node. After computation, the NFS client stores the processed data back to the NFS server.
To improve file system performance, the kernel allocates buffers from physical memory to cache system operations and data files. When the kernel receives a read or write request, it first checks the buffer for the requested data; if the data is present it is returned directly, and if not the disk is accessed through the driver. Accordingly, in this embodiment the NFS server sets up a read-write buffer in advance: on one hand it caches processed data written by the NFS client, and on the other hand it caches the data to be processed that the NFS client requests to read.
Specifically, the NFS server of this embodiment implements read-ahead through the read-write buffer: when the NFS client reads data, the file system reads more file content in one pass than the application requested and caches it in the read-write buffer, so that further read requests from the NFS client need not be repeated against the disk, reducing the number of IO operations. When the NFS client writes data, it writes into the read-write buffer, and only when a trigger condition is met does the NFS server flush the data to the physical disk, reducing the number of system calls and the disk access frequency.
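The read-ahead behaviour described above — reading more than the application requested in one IO and serving later requests from the buffer — can be sketched as follows. The prefetch window size and the byte-granularity bookkeeping are simplifying assumptions for illustration, not the actual NFS-Ganesha implementation.

```python
class ReadAheadCache:
    """Sketch of server-side read-ahead; window size is an assumption."""

    def __init__(self, disk: bytes, window: int = 4):
        self.disk = disk        # stand-in for the physical disk contents
        self.window = window    # extra bytes prefetched beyond each request
        self.cache = {}         # offset -> cached byte
        self.disk_reads = 0     # number of actual disk IO operations

    def read(self, offset: int, length: int) -> bytes:
        # Serve from the buffer when possible; on a miss, read
        # length + window bytes from disk in a single IO and cache
        # the surplus for subsequent requests.
        if not all(offset + i in self.cache for i in range(length)):
            self.disk_reads += 1
            end = min(len(self.disk), offset + length + self.window)
            for i in range(offset, end):
                self.cache[i] = self.disk[i]
        return bytes(self.cache[offset + i] for i in range(length))
```

A second, adjacent read is then satisfied from the prefetched data without touching the disk, which is the IO reduction the embodiment claims.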
The service processing method of this embodiment is applied to the slave node and, by sharing the job software and the dynamic link library through the user-mode network file system, improves the efficiency, maintainability, and manageability with which the large-scale computing cluster deploys them. Moreover, during service processing the slave node sends the processed data to the read-write buffer of the NFS server, and the NFS server flushes the buffered data to the physical disk only when a preset condition is reached, which reduces the number of direct disk IO operations, shortens service request response time, raises throughput, strengthens resistance to high concurrency, and improves the overall performance of the large-scale computing cluster.
The second embodiment of the service processing method for a large-scale computing cluster provided by the present application is described in detail below, and is implemented based on the first embodiment, and is expanded to a certain extent on the basis of the first embodiment.
Fig. 2 is a flowchart of implementation of the second embodiment, and fig. 3 is a process diagram of the second embodiment. Referring to fig. 2 and fig. 3, the second embodiment is applied to a slave node, and specifically includes:
s201, exporting and sharing the operation software of the control node and the directory where the dynamic link library is located based on a user mode network file system to generate a shared directory;
s202, mounting the local directories of the slave nodes to the shared directory through a batch processing command script;
any one of the large-scale computing nodes is used as a control node, for example, the first computing node in 256 computing nodes is used as a control node, services are deployed, and NFS shares export job software and a dynamic link library. The deployment service is to set up a user mode network file system service and export and share the operation software and the directory where the dynamic link library is located. And in the large-scale calculation, the shared export directory is mounted to the local directory by the rest calculation nodes through the batch processing command script. Therefore, when the large-scale computing node starts to run the service, the slave node only needs to call the operation software and the dynamic link library of the local directory.
S203, sending a data acquisition request to the NFS server according to a computing service request, so that the NFS server reads target data from a physical disk into the read-write buffer according to the data acquisition request, the target data including but not limited to the data to be processed; and acquiring the data to be processed from the read-write buffer;
s204, according to the computing service request, invoking the operation software and the dynamic link library which are pre-mounted in the local directory of the shared directory, and carrying out corresponding processing on the data to be processed;
s205, generating a write request message of an RPC layer according to the processed data;
as shown in fig. 4, the write request message includes an ethernet header, an IP header, a TCP header, an RPC header, and an NFS data segment.
S206, sending the write request message to the NFS server to write the processed data into a read-write buffer of the NFS server;
specifically, the NFS server sets a read/write BUFFER of the user-mode network file system, and allocates a segment of continuous memory space NFS cache [ BUFFER _ SIZE ].
S207, in the read-write buffer, the NFS server allocating contiguous memory for NFS data segments whose target logical addresses are contiguous, and flushing the data in the read-write buffer to a physical disk when a preset trigger condition is reached.
Each data packet is decoded to parse the NFS write request; starting from the data segment received when the NFS packet offset is 0, the required memory is allocated from the read-write buffer and the NFS data segment is received into it. When memory is allocated from the NFS IO buffer, NFS data with contiguous target addresses are placed in contiguous memory, so that several contiguous NFS data segments are combined into one contiguous physical memory region; when a trigger condition is met, the contiguous physical memory data segments are flushed to the physical disk.
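The merging of NFS data segments with contiguous target logical addresses into one contiguous region can be sketched as follows; representing segments as (offset, bytes) pairs is a simplification of the real buffer bookkeeping.

```python
def coalesce_segments(segments):
    """Merge NFS data segments with contiguous target logical addresses.

    `segments` is a list of (offset, data) pairs as decoded from write
    requests; contiguous runs are combined into single (offset, data)
    extents, mirroring how the read-write buffer places them in
    contiguous memory before one flush to the physical disk.
    """
    extents = []
    for offset, data in sorted(segments):
        if extents and extents[-1][0] + len(extents[-1][1]) == offset:
            prev_off, prev_data = extents[-1]
            extents[-1] = (prev_off, prev_data + data)  # extend current run
        else:
            extents.append((offset, data))              # start a new run
    return extents
```

Each resulting extent can then be written to disk in a single IO, which is how the merging reduces the number of direct disk operations.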
As a specific implementation manner, the preset trigger condition includes any one or any combination of the following: the buffer amount of the read-write buffer area exceeds a preset size, no write request is received after the buffer amount exceeds a preset time, and the target logic address of the current write request is not continuous with the target logic address of the written data in the read-write buffer area.
In summary, as shown in fig. 3, in this embodiment the upper-layer application issues a system call to the NFS client through the virtual file system layer, and the NFS client passes the data acquisition request to the NFS server through the RPC layer. After receiving the data acquisition request from the RPC layer, the NFS server reads the data from the disk array through the underlying file system storage and caches it in the preset read-write buffer. After receiving a data write request, the NFS server caches the written data in the read-write buffer and, when a preset trigger condition is reached, flushes the data to the disk array.
As shown in fig. 4, after receiving a write request message from the RPC layer, the NFS server obtains the NFS operation word and the NFS data segment from the message, caches the NFS data segment in the read-write buffer according to its logical address, and decides whether to flush data to the disk array according to whether a trigger condition is met. It can be understood that after the data is flushed, the NFS server releases the read-write buffer.
It can be seen that the service processing method provided in this embodiment relates to the fields of data storage and sequential IO caching for network file systems, and in particular to a cache acceleration system for a user-mode network file system on large-scale computing nodes. Specifically, the embodiment implements a cache acceleration system for large-scale computing nodes based on the user-mode network file system: sharing the job software and the dynamic link library through the user-mode network file system improves the efficiency, maintainability, and manageability of deploying them on large-scale computing nodes; in addition, data in the computing-scenario service is read ahead and cached in the read-write buffer, and this read-ahead cache reduces the number of direct disk IO operations, so that computation response time is short, throughput is high, resistance to concurrency is strong, and the overall performance of large-scale computing is improved.
In the following, a business processing apparatus of a large-scale computing cluster provided in an embodiment of the present application is introduced, and a business processing apparatus of a large-scale computing cluster described below and a business processing method of a large-scale computing cluster described above may be referred to correspondingly.
The service processing apparatus of the present embodiment is applied to a slave node, as shown in fig. 5, and includes:
the data acquisition module 501, configured to acquire data to be processed from the NFS server according to a computing service request;
the data processing module 502, configured to invoke, according to the computing service request, job software and a dynamic link library in a local directory pre-mounted to the shared directory, and to process the data to be processed accordingly; wherein the control node of the large-scale computing cluster exports and shares in advance, based on a user-mode network file system, the directory in which the job software and the dynamic link library are located, to generate the shared directory;
the data writing module 503, configured to write the processed data into a read-write buffer of the NFS server, so that the NFS server flushes the data in the read-write buffer to a physical disk when a preset trigger condition is reached.
The service processing apparatus of this embodiment is used to implement the service processing method of the large-scale computing cluster described above, so specific implementations of the apparatus can be found in the foregoing method embodiments; for example, the data acquisition module 501, the data processing module 502, and the data writing module 503 are respectively used to implement steps S101, S102, and S103 of the method. Specific details can therefore be found in the descriptions of the corresponding embodiments and are not repeated here.
In addition, since the service processing apparatus of the large-scale computing cluster of this embodiment is used to implement the service processing method of the large-scale computing cluster, the role of the service processing apparatus corresponds to that of the method described above, and details are not described here.
The present application further provides a slave node of a large-scale computing cluster, comprising:
a memory: for storing a computer program;
a processor: for executing the computer program for implementing the steps of a business process method of a large-scale computing cluster as described above.
The present application further provides a large-scale computing cluster, comprising an NFS server and an NFS client, wherein the NFS client comprises a control node and slave nodes;
the control node is configured to export and share, in advance and based on a user-mode network file system, the directory where the operation software and the dynamic link library reside, to generate a shared directory, and to mount the local directory of each slave node to the shared directory;
the slave node is configured to acquire data to be processed from the NFS server according to a computing service request; to call, according to the computing service request, the operation software and dynamic link library in its local directory and process the data to be processed accordingly; and to write the processed data into a read-write buffer of the NFS server;
the NFS server is configured to flush the data in the read-write buffer to a physical disk when a preset trigger condition is met.
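The server-side buffering described above can be sketched as follows. This is a minimal illustrative model, not the patent's implementation: the byte threshold, idle window, and `flush_to_disk` callback are assumed parameters. It shows the three flush triggers named later in claim 6 (buffered amount exceeds a preset size; no write request within a preset time; incoming write's target logical address not contiguous with already-buffered data).

```python
# Sketch of an NFS-server read-write buffer with three flush triggers.
# Thresholds and the flush callback are illustrative assumptions.
import time

class ReadWriteBuffer:
    def __init__(self, flush_to_disk, max_bytes=4 << 20, idle_seconds=5.0):
        self.flush_to_disk = flush_to_disk  # callback: persist to physical disk
        self.max_bytes = max_bytes
        self.idle_seconds = idle_seconds
        self.segments = []                  # (logical_addr, bytes), in order
        self.size = 0
        self.last_write = time.monotonic()

    def _next_addr(self):
        # Logical address immediately after the last buffered segment.
        if not self.segments:
            return None
        addr, data = self.segments[-1]
        return addr + len(data)

    def write(self, logical_addr, data):
        # Trigger 3: non-contiguous logical address -> flush the buffered
        # run first, so each flushed run stays sequential on disk.
        nxt = self._next_addr()
        if nxt is not None and logical_addr != nxt:
            self.flush()
        self.segments.append((logical_addr, data))
        self.size += len(data)
        self.last_write = time.monotonic()
        # Trigger 1: buffered amount reaches the preset size.
        if self.size >= self.max_bytes:
            self.flush()

    def tick(self):
        # Trigger 2: called periodically; flush after an idle window
        # with no incoming write requests.
        if self.segments and time.monotonic() - self.last_write >= self.idle_seconds:
            self.flush()

    def flush(self):
        if self.segments:
            self.flush_to_disk(self.segments)
            self.segments, self.size = [], 0
```

Batching contiguous segments before flushing is what lets the server turn many small client writes into fewer, larger sequential disk writes.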
Finally, the present application also provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the business processing method of a large-scale computing cluster described above.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts among the embodiments may be cross-referenced. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is brief; for relevant details, refer to the method description.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), Electrically Programmable ROM, Electrically Erasable Programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The solutions provided in the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the above description of the embodiments is only intended to help understand the method and its core ideas. Meanwhile, a person skilled in the art may, following the ideas of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A business processing method of a large-scale computing cluster, applied to a slave node, comprising:
acquiring data to be processed from an NFS server according to a computing service request;
calling, according to the computing service request, operation software and a dynamic link library pre-mounted from a shared directory to a local directory, and processing the data to be processed accordingly; wherein the shared directory is generated in advance by a control node of the large-scale computing cluster exporting and sharing, based on a user-mode network file system, the directory where the operation software and the dynamic link library reside; and
writing the processed data into a read-write buffer of the NFS server, so that the NFS server flushes the data in the read-write buffer to a physical disk when a preset trigger condition is met.
2. The method of claim 1, wherein the acquiring data to be processed from the NFS server according to the computing service request comprises:
sending a data acquisition request to the NFS server according to the computing service request, so that the NFS server reads target data, which comprises the data to be processed, from a physical disk into the read-write buffer according to the data acquisition request; and acquiring the data to be processed from the read-write buffer.
3. The method of claim 1, wherein the writing the processed data into the read-write buffer of the NFS server comprises:
generating a write request message at the RPC layer according to the processed data, the write request message comprising an Ethernet header, an IP header, a TCP header, an RPC header, and an NFS data segment; and
sending the write request message to the NFS server, so as to write the processed data into the read-write buffer of the NFS server.
4. The method of claim 3, further comprising, after the sending the write request message to the NFS server:
allocating, by the NFS server in the read-write buffer, contiguous memory for NFS data segments whose target logical addresses are contiguous.
5. The method of claim 1, further comprising, before the calling operation software and a dynamic link library pre-mounted from a shared directory to a local directory:
exporting and sharing, based on a user-mode network file system, the directory where the operation software and the dynamic link library of the control node reside, to generate the shared directory; and
mounting the local directory of each slave node to the shared directory through a batch-processing command script.
6. The method according to any one of claims 1 to 5, wherein the preset trigger condition comprises any one or any combination of the following: the buffered amount of the read-write buffer exceeds a preset size; no write request is received within a preset time; and the target logical address of a current write request is not contiguous with the target logical addresses of data already written in the read-write buffer.
7. A business processing apparatus of a large-scale computing cluster, applied to a slave node, comprising:
a data acquisition module, configured to acquire data to be processed from an NFS server according to a computing service request;
a data processing module, configured to call, according to the computing service request, operation software and a dynamic link library pre-mounted from a shared directory to a local directory, and to process the data to be processed accordingly; wherein the shared directory is generated in advance by a control node of the large-scale computing cluster exporting and sharing, based on a user-mode network file system, the directory where the operation software and the dynamic link library reside; and
a data writing module, configured to write the processed data into a read-write buffer of the NFS server, so that the NFS server flushes the data in the read-write buffer to a physical disk when a preset trigger condition is met.
8. A slave node of a large-scale computing cluster, comprising:
a memory, configured to store a computer program; and
a processor, configured to execute the computer program to implement the steps of the business processing method of a large-scale computing cluster according to any one of claims 1 to 6.
9. A large-scale computing cluster, comprising an NFS server and an NFS client, wherein the NFS client comprises a control node and slave nodes;
the control node is configured to export and share, in advance and based on a user-mode network file system, the directory where the operation software and the dynamic link library reside, to generate a shared directory, and to mount the local directory of each slave node to the shared directory;
the slave node is configured to acquire data to be processed from the NFS server according to a computing service request; to call, according to the computing service request, the operation software and dynamic link library in its local directory and process the data to be processed accordingly; and to write the processed data into a read-write buffer of the NFS server; and
the NFS server is configured to flush the data in the read-write buffer to a physical disk when a preset trigger condition is met.
10. A readable storage medium, storing a computer program which, when executed by a processor, implements the steps of the business processing method of a large-scale computing cluster according to any one of claims 1 to 6.
CN201911159312.6A 2019-11-22 2019-11-22 Business processing method for large-scale computing cluster Withdrawn CN110865989A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911159312.6A CN110865989A (en) 2019-11-22 2019-11-22 Business processing method for large-scale computing cluster


Publications (1)

Publication Number Publication Date
CN110865989A true CN110865989A (en) 2020-03-06

Family

ID=69655736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911159312.6A Withdrawn CN110865989A (en) 2019-11-22 2019-11-22 Business processing method for large-scale computing cluster

Country Status (1)

Country Link
CN (1) CN110865989A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111698239A (en) * 2020-06-08 2020-09-22 星辰天合(北京)数据科技有限公司 Application control method, device and system based on network file system
CN111880737A (en) * 2020-07-28 2020-11-03 苏州浪潮智能科技有限公司 Data reading and writing method, device and equipment and computer readable storage medium
CN112000287A (en) * 2020-08-14 2020-11-27 北京浪潮数据技术有限公司 IO request processing device, method, equipment and readable storage medium
CN112733189A (en) * 2021-01-14 2021-04-30 浪潮云信息技术股份公司 System and method for realizing file storage server side encryption
WO2021185109A1 (en) * 2020-03-19 2021-09-23 中山大学 Virtual file system-based small file storage optimization system in kubernetes user mode application
CN113965587A (en) * 2021-09-18 2022-01-21 苏州浪潮智能科技有限公司 Data acquisition method, device, equipment and medium of artificial intelligence platform
CN114489513A (en) * 2022-02-11 2022-05-13 上海驻云信息科技有限公司 Data return method and system based on local disk transfer

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609485A (en) * 2012-01-20 2012-07-25 无锡众志和达存储技术有限公司 NFS (Network File System) data I/O (Input/Output) acceleration method based on file system
CN103631623A (en) * 2013-11-29 2014-03-12 浪潮(北京)电子信息产业有限公司 Method and device for allocating application software in trunking system
CN105224256A (en) * 2015-10-13 2016-01-06 浪潮(北京)电子信息产业有限公司 A kind of storage system
CN106550037A (en) * 2016-11-07 2017-03-29 天脉聚源(北京)传媒科技有限公司 A kind of method and device of server data sharing
CN107070972A (en) * 2016-12-30 2017-08-18 中国银联股份有限公司 A kind of distributed document processing method and processing device
CN108900606A (en) * 2018-06-28 2018-11-27 郑州云海信息技术有限公司 A kind of cross-system realizes the method, device and equipment of data sharing


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021185109A1 (en) * 2020-03-19 2021-09-23 中山大学 Virtual file system-based small file storage optimization system in kubernetes user mode application
CN111698239A (en) * 2020-06-08 2020-09-22 星辰天合(北京)数据科技有限公司 Application control method, device and system based on network file system
CN111880737A (en) * 2020-07-28 2020-11-03 苏州浪潮智能科技有限公司 Data reading and writing method, device and equipment and computer readable storage medium
CN112000287A (en) * 2020-08-14 2020-11-27 北京浪潮数据技术有限公司 IO request processing device, method, equipment and readable storage medium
CN112000287B (en) * 2020-08-14 2022-06-17 北京浪潮数据技术有限公司 IO request processing device, method, equipment and readable storage medium
CN112733189A (en) * 2021-01-14 2021-04-30 浪潮云信息技术股份公司 System and method for realizing file storage server side encryption
CN113965587A (en) * 2021-09-18 2022-01-21 苏州浪潮智能科技有限公司 Data acquisition method, device, equipment and medium of artificial intelligence platform
CN114489513A (en) * 2022-02-11 2022-05-13 上海驻云信息科技有限公司 Data return method and system based on local disk transfer

Similar Documents

Publication Publication Date Title
CN110865989A (en) Business processing method for large-scale computing cluster
CN112000287B (en) IO request processing device, method, equipment and readable storage medium
CN113674133B (en) GPU cluster shared video memory system, method, device and equipment
US20080228923A1 (en) Server-Side Connection Resource Pooling
US6516342B1 (en) Method and apparatus for extending memory using a memory server
CN110196681B (en) Disk data write-in control method and device for business write operation and electronic equipment
CN102693230B (en) For the file system of storage area network
KR20090079012A (en) Method and apparatus for save/restore state of virtual machine
CN112000277B (en) Method, device and equipment for copying simplified backup file and readable storage medium
US20210117333A1 (en) Providing direct data access between accelerators and storage in a computing environment, wherein the direct data access is independent of host cpu and the host cpu transfers object map identifying object of the data
CN109582649B (en) Metadata storage method, device and equipment and readable storage medium
US20170153909A1 (en) Methods and Devices for Acquiring Data Using Virtual Machine and Host Machine
CN112698789B (en) Data caching method, device, equipment and storage medium
US20140082275A1 (en) Server, host and method for reading base image through storage area network
JP2004127295A (en) Virtual storage system and its operation method
CN111881476A (en) Object storage control method and device, computer equipment and storage medium
KR20210040864A (en) File directory traversal method, apparatus, device, and medium
CN114564339A (en) Disk image file cross-platform migration method and system
CN113031857B (en) Data writing method, device, server and storage medium
CN115576654B (en) Request processing method, device, equipment and storage medium
US20230281141A1 (en) Method for order-preserving execution of write request and network device
TWI824392B (en) On-demand shared data caching method, computer program, and computer readable medium applicable for distributed deep learning computing
CN116107509A (en) Data processing method and device and electronic equipment
CN115562871A (en) Memory allocation management method and device
CN111435299B (en) Application processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200306