CN114281636A - Method and device for processing user space file system fault - Google Patents

Publication number
CN114281636A
Authority
CN
China
Prior art keywords
file system
user space
daemon
space file
response
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111339749.5A
Other languages
Chinese (zh)
Other versions
CN114281636B (en)
Inventor
吴广远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202111339749.5A (patent CN114281636B)
Publication of CN114281636A
Application granted
Publication of CN114281636B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method, a system, a device, and a storage medium for handling user space file system failures. The method comprises the following steps: dynamically acquiring a list of all computing nodes in a cluster and distributing a daemon to every computing node according to the list; detecting, through the daemon, whether the management process of a computing node is running normally and, in response to the management process running normally, detecting through the daemon whether the user space file system mount point of the computing node has become invalid; in response to the user space file system mount point of the computing node being normal, detecting through the daemon whether files of the distributed file system can be accessed through the mount point; and, in response to the distributed file system files not being accessible through the mount point, unmounting the user space file system mount point and mounting it again. The invention can greatly improve the operation and maintenance efficiency of a Hadoop cluster, reduce the waste of computing resources, and improve user satisfaction with the Hadoop cluster.

Description

Method and device for processing user space file system fault
Technical Field
The present invention relates to the field of big data, and more particularly, to a method, system, device, and storage medium for handling a failure of a user space file system.
Background
A single machine lacks the computing power to process massive volumes of unstructured data. If an application vendor turns to multi-machine parallel processing, it must develop its own distributed file system and scheduling framework, which is difficult, consumes substantial manpower and material resources, and keeps the vendor from concentrating on data processing algorithms. Most application vendors therefore choose the open-source Hadoop architecture as the underlying platform, and their applications process massive unstructured data on top of the Hadoop Distributed File System (HDFS) and the distributed scheduling framework (YARN).
The primary development language of Hadoop is Java, but most traditional unstructured data processing algorithms are written in C in pursuit of maximum performance, and HDFS offers very limited support for C. FUSE (Filesystem in Userspace) is therefore used to mount HDFS on the Hadoop computing nodes, so that the distributed file system can be operated through FUSE like a local file system.
In this usage scenario, YARN manages the computing resources (CPU and memory) of all Hadoop computing nodes, but it cannot manage the computing resources occupied by FUSE. As a result, data processing subtasks and FUSE frequently contend for resources, the FUSE process hangs or the mount point becomes invalid, and ultimately every computing task allocated to the node fails.
Because of the limitations of YARN's own scheduling algorithm, the nodes on which resource contention will occur cannot be predicted. Usually the FUSE mount point exception is handled manually, and the data processing tasks are resubmitted, only after a large number of computing tasks have failed. This makes maintenance of the Hadoop platform burdensome and seriously wastes the computing resources of the Hadoop cluster.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, a system, a computer device, and a computer-readable storage medium for handling a user space file system failure. By deploying a user space file system daemon on all computing nodes of a Hadoop cluster, the invention can identify abnormal scenarios in which a user space file system mount point becomes invalid or hangs and can repair the mount point automatically, thereby greatly improving the operation and maintenance efficiency of the Hadoop cluster, reducing the waste of computing resources, and improving user satisfaction with the Hadoop cluster.
To this end, one aspect of the embodiments of the present invention provides a method for handling a user space file system failure, comprising the following steps: dynamically acquiring a list of all computing nodes in a cluster and distributing a daemon to every computing node according to the list; detecting, through the daemon, whether the management process of a computing node is running normally and, in response to the management process running normally, detecting through the daemon whether the user space file system mount point of the computing node has become invalid; in response to the user space file system mount point of the computing node being normal, detecting through the daemon whether files of the distributed file system can be accessed through the mount point; and, in response to the distributed file system files not being accessible through the mount point, unmounting the user space file system mount point and mounting it again.
In some embodiments, the method further comprises: monitoring the running state of the daemons on all computing nodes and restarting a daemon in response to the daemon running abnormally; and, in response to the daemon running abnormally and its number of restarts reaching a threshold, replacing the daemon with a new daemon.
In some embodiments, the method further comprises: dynamically acquiring the health status of the distributed file system and, in response to an abnormality occurring in the distributed file system, terminating the daemons on all computing nodes and unmounting the user space file system on all computing nodes.
In some embodiments, the method further comprises: in response to the user space file system mount point of the computing node being invalid, mounting the user space file system again.
Another aspect of the embodiments of the present invention provides a system for handling a user space file system failure, comprising: a distribution module configured to dynamically acquire a list of all computing nodes in a cluster and distribute a daemon to every computing node according to the list; a first detection module configured to detect, through the daemon, whether the management process of a computing node is running normally and, in response to the management process running normally, to detect through the daemon whether the user space file system mount point of the computing node has become invalid; a second detection module configured to detect through the daemon, in response to the user space file system mount point of the computing node being normal, whether files of the distributed file system can be accessed through the mount point; and an execution module configured to unmount the user space file system mount point and mount it again in response to the distributed file system files not being accessible through the mount point.
In some embodiments, the system further comprises a monitoring module configured to: monitor the running state of the daemons on all computing nodes and restart a daemon in response to the daemon running abnormally; and, in response to the daemon running abnormally and its number of restarts reaching a threshold, replace the daemon with a new daemon.
In some embodiments, the system further comprises a second monitoring module configured to: dynamically acquire the health status of the distributed file system and, in response to an abnormality occurring in the distributed file system, terminate the daemons on all computing nodes and unmount the user space file system on all computing nodes.
In some embodiments, the system further comprises a second execution module configured to: in response to the user space file system mount point of the computing node being invalid, mount the user space file system again.
Another aspect of the embodiments of the present invention provides a computer device, comprising: at least one processor; and a memory storing computer instructions executable on the processor, wherein the instructions, when executed by the processor, implement the steps of the method described above.
A further aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method described above.
The invention has the following beneficial technical effects: by deploying a user space file system daemon on all computing nodes of the Hadoop cluster, abnormal scenarios in which a user space file system mount point becomes invalid or hangs can be identified and the mount point can be repaired automatically, which greatly improves the operation and maintenance efficiency of the Hadoop cluster, reduces the waste of computing resources, and improves user satisfaction with the Hadoop cluster.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other embodiments from these drawings without creative effort.
FIG. 1 is a diagram illustrating an embodiment of a method for handling a failure of a user space file system according to the present invention;
FIG. 2 is a diagram illustrating an embodiment of a system for handling a failure of a user space file system according to the present invention;
FIG. 3 is a diagram illustrating a hardware structure of an embodiment of a computer device for handling a failure of a user space file system according to the present invention;
FIG. 4 is a diagram of an embodiment of a computer storage medium for handling a user space file system failure provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that the terms "first" and "second" in the embodiments of the present invention are used only to distinguish two different entities or parameters that share the same name. They are used merely for convenience of description, should not be construed as limiting the embodiments of the present invention, and are not explained again in the following embodiments.
In a first aspect of the embodiments of the present invention, an embodiment of a method for handling a user space file system failure is provided. FIG. 1 is a schematic diagram of this embodiment. As shown in FIG. 1, the embodiment includes the following steps:
S1, dynamically acquiring a list of all computing nodes in the cluster, and distributing a daemon to every computing node according to the list;
S2, detecting, through the daemon, whether the management process of the computing node is running normally and, in response to the management process running normally, detecting through the daemon whether the user space file system mount point of the computing node has become invalid;
S3, in response to the user space file system mount point of the computing node being normal, detecting through the daemon whether files of the distributed file system can be accessed through the mount point; and
S4, in response to the distributed file system files not being accessible through the mount point, unmounting the user space file system mount point and mounting it again.
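The order of the checks in steps S2 to S4 can be sketched as a small decision function. This is only an illustrative sketch, not the patented implementation: the names `Action` and `decide` are hypothetical, and the behaviour when the management process is abnormal (`SKIP`) is an assumption, since the text does not say what happens in that case.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip"        # management process abnormal (assumed: no mount check is made)
    MOUNT = "mount"      # mount point invalid: mount again
    REMOUNT = "remount"  # mount point present but files unreachable: unmount, then mount
    NONE = "none"        # everything looks healthy


def decide(node_manager_ok: bool, mount_valid: bool, file_accessible: bool) -> Action:
    """Return the repair action implied by the three checks, in the order S2 -> S3 -> S4."""
    if not node_manager_ok:
        return Action.SKIP
    if not mount_valid:
        return Action.MOUNT
    if not file_accessible:
        return Action.REMOUNT
    return Action.NONE
```

Keeping the decision separate from the checks themselves makes the ordering easy to test without a real cluster.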
An application submits a job to the ResourceManager node of the distributed scheduling framework. The ResourceManager divides the job into multiple Map and Reduce tasks and distributes the Map tasks to different computing nodes according to its scheduling algorithm. Each computing node accesses the data on HDFS through its local FUSE mount, performs the computation, and writes the results back to HDFS; the Reduce tasks then gather the results through FUSE to complete the job. In this process, once the FUSE mount on a node fails, all computing tasks running on that node fail, which lowers the overall computing efficiency of the Hadoop cluster and, in severe cases, causes the whole job to fail.
In the embodiment of the invention, a monitoring node is added, the FUSE daemon is deployed in batches to all nodes of the cluster, and the FUSE mount status of every node in the cluster is monitored.
A list of all computing nodes in the cluster is dynamically acquired, and the daemon is distributed to every computing node according to the list. Specifically, the monitoring node dynamically acquires the list of all computing nodes in the cluster and distributes the daemon files to all of them.
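A minimal sketch of the distribution step, assuming the node list and the file transfer are supplied by the caller. The text does not say how the list is obtained or how files are copied, so `list_nodes` and `copy_to_node` are hypothetical callables, and treating `OSError` as the failure signal is also an assumption.

```python
from typing import Callable, Dict, Iterable


def distribute_daemon(
    list_nodes: Callable[[], Iterable[str]],
    copy_to_node: Callable[[str], None],
) -> Dict[str, bool]:
    """Copy the daemon files to every node currently in the cluster.

    The node list is fetched on every call, so nodes that joined the
    cluster since the last call are picked up automatically.
    """
    results: Dict[str, bool] = {}
    for node in list_nodes():
        try:
            copy_to_node(node)
            results[node] = True
        except OSError:
            # Record the failure and keep going so one bad node does not block the rest.
            results[node] = False
    return results
```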
In some embodiments, the method further comprises: monitoring the running state of the daemons on all computing nodes and restarting a daemon in response to the daemon running abnormally; and, in response to the daemon running abnormally and its number of restarts reaching a threshold, replacing the daemon with a new daemon. The monitoring node monitors the running state of the daemons on all computing nodes and, upon finding a node whose daemon has failed, promptly restarts the daemon on that node.
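The restart-then-replace policy can be sketched as a per-node counter. The class name `DaemonSupervisor`, the default threshold of 3, and resetting the counter after a replacement are all assumptions made for illustration; the text only states that a daemon is replaced once its restart count reaches a threshold.

```python
from typing import Callable, Dict


class DaemonSupervisor:
    """Tracks restarts per node and escalates to replacement at a threshold."""

    def __init__(self, restart: Callable[[str], None],
                 replace: Callable[[str], None], threshold: int = 3) -> None:
        self.restart = restart      # hypothetical hook: restart the daemon on a node
        self.replace = replace      # hypothetical hook: redeploy a fresh daemon
        self.threshold = threshold
        self.restarts: Dict[str, int] = {}

    def handle_abnormal(self, node: str) -> str:
        """Call when the daemon on `node` is found to be running abnormally."""
        count = self.restarts.get(node, 0)
        if count >= self.threshold:
            self.replace(node)
            self.restarts[node] = 0  # assumed: a new daemon starts with a clean count
            return "replaced"
        self.restart(node)
        self.restarts[node] = count + 1
        return "restarted"
```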
In some embodiments, the method further comprises: dynamically acquiring the health status of the distributed file system and, in response to an abnormality occurring in the distributed file system, terminating the daemons on all computing nodes and unmounting the user space file system on all computing nodes. The monitoring node dynamically senses the health of HDFS; for example, when the HDFS service is stopped, it promptly stops the daemons on all computing nodes and unmounts FUSE on all computing nodes.
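A sketch of the cluster-wide reaction to an unhealthy HDFS, under the assumption that the health result and the per-node operations are provided by the caller (`stop_daemon` and `unmount` are hypothetical hooks). Stopping the daemon before unmounting is a design choice of this sketch, made so that a still-running daemon does not immediately mount the file system again.

```python
from typing import Callable, Iterable, List


def handle_hdfs_health(
    hdfs_healthy: bool,
    nodes: Iterable[str],
    stop_daemon: Callable[[str], None],
    unmount: Callable[[str], None],
) -> List[str]:
    """Stop daemons and unmount FUSE on every node when HDFS is unhealthy.

    Returns the list of nodes that were shut down (empty if HDFS is healthy).
    """
    if hdfs_healthy:
        return []
    handled: List[str] = []
    for node in nodes:
        stop_daemon(node)  # first, so the daemon cannot re-mount behind our back
        unmount(node)
        handled.append(node)
    return handled
```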
The daemon detects whether the management process of the computing node is running normally and, in response to the management process running normally, detects whether the user space file system mount point of the computing node has become invalid. Specifically, the daemon on each computing node checks whether the node's NodeManager process is running normally.
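The text does not say how the NodeManager check is performed. One possible sketch, assuming the daemon has a process listing in the style of `jps` or `jps -l` output (a PID followed by a class name), is to look for the NodeManager class name. The function name is hypothetical, and this only confirms that the process exists, which is a weaker condition than "running normally".

```python
def node_manager_running(process_listing: str) -> bool:
    """Return True if any line of the listing names a NodeManager process.

    Each line is expected to start with a PID; the remaining fields are
    searched for "NodeManager" or a fully qualified "...NodeManager" class.
    """
    for line in process_listing.splitlines():
        fields = line.split()
        # fields[0] is the PID; an empty line yields no fields to inspect.
        for field in fields[1:]:
            if field == "NodeManager" or field.endswith(".NodeManager"):
                return True
    return False
```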
In some embodiments, the method further comprises: in response to the user space file system mount point of the computing node being invalid, mounting the user space file system again.
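How an invalid mount point is recognised is not specified. A minimal sketch, assuming a Linux node, is to look for a FUSE entry at the expected path in text formatted like `/proc/mounts` (device, mount point, file system type, options, and two numeric fields). The function name and the sample paths are hypothetical, and mount points containing escaped characters such as spaces are not handled.

```python
def fuse_mount_present(mounts_text: str, mount_point: str) -> bool:
    """Return True if `mount_point` appears as a FUSE mount in /proc/mounts-style text."""
    target = mount_point.rstrip("/") or "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        # fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[1] == target and fields[2].startswith("fuse"):
            return True
    return False
```

On a real node the text would come from reading `/proc/mounts`. A missing entry is only one way a mount can be invalid, so this check complements the file access check rather than replacing it.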
In response to the user space file system mount point of the computing node being normal, the daemon detects whether files of the distributed file system can be accessed through the mount point.
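Because a stuck FUSE process can block any system call that touches the mount, an access probe is safer with a timeout. The sketch below is one assumed approach, not the method described in the text: it runs `os.stat` in a child Python process and treats a timeout as "not accessible". The function name and the 5 second default are hypothetical.

```python
import subprocess
import sys


def can_access(path: str, timeout_s: float = 5.0) -> bool:
    """Return True if `path` can be stat'ed within `timeout_s` seconds."""
    proc = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.stat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit code 0 means the stat call succeeded in the child.
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a child blocked inside the kernel may not die
        # immediately, so we do not wait for it after sending the kill.
        proc.kill()
        return False
```

Probing from a child process keeps the daemon itself responsive even when the probe hangs.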
In response to the distributed file system files not being accessible through the user space file system mount point, the mount point is unmounted and mounted again. Specifically, the daemon checks whether an HDFS file can be accessed normally through the FUSE mount point (if the FUSE process is stuck, the file cannot be accessed normally); if it cannot, the FUSE mount point is unmounted and mounted again.
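The unmount and re-mount step might look like the sketch below. It assumes the FUSE 2 `fusermount` utility is installed (`-u` unmounts, `-z` makes the unmount lazy so it does not block on a hung file system; FUSE 3 systems ship `fusermount3` instead). The command that mounts HDFS is deliberately left to the caller as `mount_cmd`, because the text does not name it; the function name and the injectable `run` parameter are hypothetical.

```python
import subprocess
from typing import Callable, Sequence


def remount(mount_point: str, mount_cmd: Sequence[str],
            run: Callable = subprocess.run) -> bool:
    """Lazily unmount `mount_point`, then run the caller-supplied mount command.

    Returns True if the mount command exits with status 0.
    """
    # The unmount result is ignored: the mount point may already be gone.
    run(["fusermount", "-u", "-z", mount_point], check=False)
    result = run(list(mount_cmd), check=False)
    return result.returncode == 0
```

Passing `run` as a parameter lets the sequence be exercised with a stub instead of touching real mounts.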
In the embodiment of the invention, a user space file system daemon is deployed on all computing nodes of the Hadoop cluster, so that abnormal scenarios in which a user space file system mount point becomes invalid or hangs can be identified and the mount point can be repaired automatically. This greatly improves the operation and maintenance efficiency of the Hadoop cluster, reduces the waste of computing resources, and improves user satisfaction with the Hadoop cluster.
It should be particularly noted that the steps in the above embodiments of the method for handling a user space file system failure can be interleaved, replaced, added, and deleted. Methods for handling a user space file system failure obtained through such reasonable permutations and combinations also fall within the protection scope of the present invention, and the protection scope of the present invention should not be limited to the embodiments.
To this end, a second aspect of the embodiments of the present invention provides a system for handling a user space file system failure. As shown in FIG. 2, the system 200 includes the following modules: a distribution module configured to dynamically acquire a list of all computing nodes in a cluster and distribute a daemon to every computing node according to the list; a first detection module configured to detect, through the daemon, whether the management process of a computing node is running normally and, in response to the management process running normally, to detect through the daemon whether the user space file system mount point of the computing node has become invalid; a second detection module configured to detect through the daemon, in response to the user space file system mount point of the computing node being normal, whether files of the distributed file system can be accessed through the mount point; and an execution module configured to unmount the user space file system mount point and mount it again in response to the distributed file system files not being accessible through the mount point.
In some embodiments, the system further comprises a monitoring module configured to: monitor the running state of the daemons on all computing nodes and restart a daemon in response to the daemon running abnormally; and, in response to the daemon running abnormally and its number of restarts reaching a threshold, replace the daemon with a new daemon.
In some embodiments, the system further comprises a second monitoring module configured to: dynamically acquire the health status of the distributed file system and, in response to an abnormality occurring in the distributed file system, terminate the daemons on all computing nodes and unmount the user space file system on all computing nodes.
In some embodiments, the system further comprises a second execution module configured to: in response to the user space file system mount point of the computing node being invalid, mount the user space file system again.
To this end, a third aspect of the embodiments of the present invention provides a computer device, comprising: at least one processor; and a memory storing computer instructions executable on the processor, the instructions being executed by the processor to perform the following steps: S1, dynamically acquiring a list of all computing nodes in the cluster, and distributing a daemon to every computing node according to the list; S2, detecting, through the daemon, whether the management process of the computing node is running normally and, in response to the management process running normally, detecting through the daemon whether the user space file system mount point of the computing node has become invalid; S3, in response to the user space file system mount point of the computing node being normal, detecting through the daemon whether files of the distributed file system can be accessed through the mount point; and S4, in response to the distributed file system files not being accessible through the mount point, unmounting the user space file system mount point and mounting it again.
In some embodiments, the steps further comprise: monitoring the running state of the daemons on all computing nodes and restarting a daemon in response to the daemon running abnormally; and, in response to the daemon running abnormally and its number of restarts reaching a threshold, replacing the daemon with a new daemon.
In some embodiments, the steps further comprise: dynamically acquiring the health status of the distributed file system and, in response to an abnormality occurring in the distributed file system, terminating the daemons on all computing nodes and unmounting the user space file system on all computing nodes.
In some embodiments, the steps further comprise: in response to the user space file system mount point of the computing node being invalid, mounting the user space file system again.
Fig. 3 is a schematic hardware structural diagram of an embodiment of the computer device for processing a user space file system failure according to the present invention.
Taking the device shown in fig. 3 as an example, the device includes a processor 301 and a memory 302.
The processor 301 and the memory 302 may be connected by a bus or by other means; FIG. 3 takes a bus connection as an example.
The memory 302 is a non-volatile computer-readable storage medium and can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for handling a user space file system failure in the embodiments of the present application. By running the non-volatile software programs, instructions, and modules stored in the memory 302, the processor 301 executes the various functional applications and data processing of the server, that is, implements the method for handling a user space file system failure.
The memory 302 may include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required for at least one function; the data storage area may store data created through use of the method for handling a user space file system failure, and the like. Further, the memory 302 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 302 optionally includes memory located remotely from the processor 301, which may be connected to the local module via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Computer instructions 303 corresponding to one or more methods for handling a user space file system failure are stored in the memory 302 and, when executed by the processor 301, perform the method for handling a user space file system failure in any of the above method embodiments.
Any embodiment of a computer device implementing the method for handling a user space file system failure as described above may achieve the same or similar effects as any of the preceding method embodiments corresponding thereto.
The present invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, performs a method of handling a user space file system failure.
FIG. 4 is a schematic diagram of an embodiment of a computer storage medium for handling a user space file system failure according to the present invention. Taking the computer storage medium as shown in fig. 4 as an example, the computer readable storage medium 401 stores a computer program 402 which, when executed by a processor, performs the method as described above.
Finally, it should be noted that, as those of ordinary skill in the art will appreciate, all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The program of the method for handling a user space file system failure can be stored in a computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. The storage medium of the program may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. The embodiments of the computer program can achieve the same or similar effects as any of the corresponding method embodiments described above.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The serial numbers of the embodiments disclosed herein are for description only and do not indicate the relative merits of the embodiments.
Those skilled in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of the embodiments of the invention, technical features of the above embodiment or of different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments described above, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, substitution, improvement, and the like made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the protection scope of the embodiments of the present invention.

Claims (10)

1. A method for handling a user space file system failure, comprising the steps of:
dynamically acquiring a list of all computing nodes in a cluster, and distributing a daemon to every computing node according to the list;
detecting, through the daemon, whether the management process of the computing node is running normally and, in response to the management process running normally, detecting through the daemon whether the user space file system mount point of the computing node has become invalid;
in response to the user space file system mount point of the computing node being normal, detecting through the daemon whether files of the distributed file system can be accessed through the mount point; and
in response to the distributed file system files not being accessible through the mount point, unmounting the user space file system mount point and mounting it again.
2. The method of claim 1, further comprising:
monitoring the operating states of daemons of all the computing nodes, and restarting the daemons in response to abnormal operation of the daemons; and
and in response to the daemon running abnormally and the number of reboots reaching a threshold, replacing the daemon with a new daemon.
3. The method of claim 1, further comprising:
and dynamically acquiring the health condition of the distributed file system, and in response to the occurrence of an abnormality in the distributed file system, terminating the daemon of all the computing nodes and canceling the user space file system mount of all the computing nodes.
4. The method of claim 1, further comprising:
in response to the user space file system mount point of the computing node being invalid, re-mounting the user space file system.
5. A system for handling a user space file system failure, comprising:
a distribution module configured to dynamically acquire a list of all computing nodes in a cluster and distribute a daemon to each of the computing nodes according to the list;
a first detection module configured to detect, through the daemon, whether the management process of the computing node is in a normal state, and, in response to the management process of the computing node being in a normal state, detect, through the daemon, whether a user space file system mount point of the computing node is invalid;
a second detection module configured to, in response to the user space file system mount point of the computing node being normal, detect, through the daemon, whether a distributed file system file can be accessed through the user space file system mount point; and
an execution module configured to, in response to the distributed file system file being inaccessible through the user space file system mount point, unmount the user space file system mount point and mount it again.
6. The system of claim 5, further comprising a monitoring module configured to:
monitor the operating states of the daemons of all the computing nodes, and restart a daemon in response to abnormal operation of that daemon; and
in response to the daemon running abnormally and the number of restarts reaching a threshold, replace the daemon with a new daemon.
7. The system of claim 5, further comprising a second monitoring module configured to:
dynamically acquire the health status of the distributed file system, and, in response to an abnormality occurring in the distributed file system, terminate the daemons of all the computing nodes and unmount the user space file system on all the computing nodes.
8. The system of claim 5, further comprising a second execution module configured to:
in response to the user space file system mount point of the computing node being invalid, re-mount the user space file system.
9. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, the instructions, when executed by the processor, implementing the steps of the method of any one of claims 1 to 4.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
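
For illustration only, the detection sequence of claims 1 and 4 and the restart threshold of claim 2 can be sketched in Python as follows. This is a minimal sketch and not the patented implementation: the names `NodeProbes`, `Action`, `decide_action`, `supervisor_action`, and `file_accessible` are hypothetical, and skipping a node whose management process is abnormal is an assumption, since the claims do not state what happens in that case.

```python
import os
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(Enum):
    SKIP = "skip"                                  # management process abnormal (assumed: do nothing)
    REMOUNT = "remount"                            # mount point invalid (claim 4)
    UNMOUNT_AND_REMOUNT = "unmount_and_remount"    # mounted, but files unreachable (claim 1)
    HEALTHY = "healthy"                            # all three checks passed


@dataclass
class NodeProbes:
    """Hypothetical probe callbacks the daemon would run on one computing node."""
    management_process_ok: Callable[[], bool]
    mount_point_valid: Callable[[], bool]
    dfs_file_accessible: Callable[[], bool]


def decide_action(probes: NodeProbes) -> Action:
    """Run the checks in the order given in claim 1 and pick a repair action."""
    # Check 1: only continue when the node's management process is normal.
    if not probes.management_process_ok():
        return Action.SKIP
    # Check 2: an invalid mount point is simply mounted again.
    if not probes.mount_point_valid():
        return Action.REMOUNT
    # Check 3: the mount point looks normal, so try to reach a file behind it.
    if not probes.dfs_file_accessible():
        return Action.UNMOUNT_AND_REMOUNT
    return Action.HEALTHY


def supervisor_action(restart_count: int, threshold: int) -> str:
    """Claim 2: restart an abnormal daemon until the restart count reaches the threshold."""
    return "replace" if restart_count >= threshold else "restart"


def file_accessible(path: str) -> bool:
    """One possible probe for check 3: stat a known file under the mount point.

    Assumption: an unreachable file surfaces as an OSError. A hung mount may
    block instead, so a real daemon would likely run this with a timeout.
    """
    try:
        os.stat(path)
        return True
    except OSError:
        return False
```

A daemon loop would build `NodeProbes` from real checks (for example `os.path.ismount` for the mount point and `file_accessible` for a known file) and map each returned `Action` to the corresponding unmount or mount command of the user space file system in use. Keeping the decision logic separate from the probes lets the ordering of the checks be tested without a live mount.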
CN202111339749.5A 2021-11-12 2021-11-12 Method and device for processing user space file system fault Active CN114281636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111339749.5A CN114281636B (en) 2021-11-12 2021-11-12 Method and device for processing user space file system fault

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111339749.5A CN114281636B (en) 2021-11-12 2021-11-12 Method and device for processing user space file system fault

Publications (2)

Publication Number Publication Date
CN114281636A true CN114281636A (en) 2022-04-05
CN114281636B CN114281636B (en) 2023-08-25

Family

ID=80869037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111339749.5A Active CN114281636B (en) 2021-11-12 2021-11-12 Method and device for processing user space file system fault

Country Status (1)

Country Link
CN (1) CN114281636B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104301442A (en) * 2014-11-17 2015-01-21 浪潮电子信息产业股份有限公司 Method for realizing client of access object storage cluster based on fuse
CN108920628A (en) * 2018-06-29 2018-11-30 郑州云海信息技术有限公司 A kind of distributed file system access method and device being adapted to big data platform
CN110365839A (en) * 2019-07-04 2019-10-22 Oppo广东移动通信有限公司 Closedown method, device, medium and electronic equipment
JP2021022357A (en) * 2019-07-26 2021-02-18 広東叡江云計算股▲分▼有限公司Guangdong Eflycloud Computing Co., Ltd Hybrid file construction method and system therefor based on fuse technology
CN113127437A (en) * 2019-12-31 2021-07-16 阿里巴巴集团控股有限公司 File system management method, cloud system, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114281636B (en) 2023-08-25

Similar Documents

Publication Publication Date Title
Guo et al. Failure recovery: When the cure is worse than the disease
US10489232B1 (en) Data center diagnostic information
CN110677305B (en) Automatic scaling method and system in cloud computing environment
CN107016480B (en) Task scheduling method, device and system
US8386855B2 (en) Distributed healthchecking mechanism
US9535754B1 (en) Dynamic provisioning of computing resources
US10924538B2 (en) Systems and methods of monitoring software application processes
US20200319935A1 (en) System and method for automatically scaling a cluster based on metrics being monitored
US8984108B2 (en) Dynamic CLI mapping for clustered software entities
KR20120079847A (en) Method and system for minimizing loss in a computer application
CN111324423B (en) Method and device for monitoring processes in container, storage medium and computer equipment
Pourmajidi et al. On challenges of cloud monitoring
CN109697078B (en) Repairing method of non-high-availability component, big data cluster and container service platform
Alfatafta et al. Toward a generic fault tolerance technique for partial network partitioning
CN111209110A (en) Task scheduling management method, system and storage medium for realizing load balance
US9183092B1 (en) Avoidance of dependency issues in network-based service startup workflows
CN111538585A (en) Js-based server process scheduling method, system and device
Huda et al. An agent oriented proactive fault-tolerant framework for grid computing
CN113377535A (en) Distributed timing task allocation method, device, equipment and readable storage medium
US20080216057A1 (en) Recording medium storing monitoring program, monitoring method, and monitoring system
US8203937B2 (en) Global detection of resource leaks in a multi-node computer system
CN114281636A (en) Method and device for processing user space file system fault
Stack et al. Self-healing in a decentralised cloud management system
CN115686831A (en) Task processing method and device based on distributed system, equipment and medium
Cardoso et al. Validation of a dynamic checkpoint mechanism for apache hadoop with failure scenarios

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant