CN114281636B - Method and device for processing user space file system fault - Google Patents


Info

Publication number
CN114281636B
Authority
CN
China
Prior art keywords
file system
user space
daemon
space file
response
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111339749.5A
Other languages
Chinese (zh)
Other versions
CN114281636A (en)
Inventor
吴广远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202111339749.5A
Publication of CN114281636A
Application granted
Publication of CN114281636B

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method, a system, a device, and a storage medium for handling user space file system failures, wherein the method comprises the following steps: dynamically acquiring a list of all computing nodes in the cluster, and distributing daemons to all computing nodes according to the list; detecting, by the daemon, whether the management process of the computing node is normal, and, in response to the management process of the computing node being normal, detecting, by the daemon, whether the user space file system mount point of the computing node has failed; in response to the user space file system mount point of the computing node being normal, detecting, by the daemon, whether the distributed file system file can be accessed through the user space file system mount point; and, in response to the failure to access the distributed file system file through the user space file system mount point, canceling the user space file system mount point and re-mounting. The method and device can greatly improve the operation and maintenance efficiency of the Hadoop cluster, reduce the waste of computing resources, and improve user satisfaction with the Hadoop cluster.

Description

Method and device for processing user space file system fault
Technical Field
The present application relates to the field of big data, and more particularly, to a method, system, device, and storage medium for handling user space file system failures.
Background
Faced with massive unstructured data processing tasks, the computing power of a single machine is insufficient. If multi-machine parallel processing is adopted instead, the application vendor has to develop a distributed file system and a scheduling framework by itself, which on the one hand is relatively difficult and consumes a great deal of manpower and material resources, and on the other hand prevents the vendor from concentrating on the development of its data processing algorithms. In this scenario, most application vendors therefore choose the open-source Hadoop architecture as the underlying platform, and application programs process massive unstructured data based on the Hadoop distributed file system (Hdfs) and the distributed scheduling framework (Yarn).
The development language of Hadoop is Java, but in pursuit of maximum performance, traditional unstructured data processing algorithms are mostly developed in the C language, and the support of Hdfs for C is very limited. Therefore, Fuse (Filesystem in Userspace, user space file system) is adopted to mount Hdfs onto the Hadoop computing nodes, so that the distributed file system can be operated like a local file system through Fuse.
In such a usage scenario, Yarn is responsible for managing the computing resources (CPU, memory) of all Hadoop computing nodes, but Yarn cannot manage the computing resources occupied by Fuse. As a result, the data processing subtasks and Fuse often contend for resources, causing the Fuse process to die or the mount point to fail, and finally causing all computing tasks allocated to that node to fail.
Because of limitations of Yarn's own scheduling algorithm, the nodes on which resource contention will occur cannot be predicted. Normally, the abnormal Fuse mount points can only be handled manually after a large number of computing tasks have already failed, and the data processing tasks must then be submitted again. This makes the maintenance of the Hadoop platform burdensome and seriously wastes the computing resources of the Hadoop cluster.
Disclosure of Invention
In view of the above, an object of the embodiments of the present application is to provide a method, a system, a computer device, and a computer-readable storage medium for handling user space file system failures.
Based on the above objects, an aspect of the embodiments of the present application provides a method for handling user space file system failures, including the following steps: dynamically acquiring a list of all computing nodes in the cluster, and distributing daemons to all computing nodes according to the list; detecting, by the daemon, whether the management process of the computing node is normal, and, in response to the management process of the computing node being normal, detecting, by the daemon, whether the user space file system mount point of the computing node has failed; in response to the user space file system mount point of the computing node being normal, detecting, by the daemon, whether the distributed file system file can be accessed through the user space file system mount point; and canceling the user space file system mount point and re-mounting in response to the inability to access the distributed file system file through the user space file system mount point.
In some embodiments, the method further comprises: monitoring the daemon running state of all computing nodes, and restarting the daemon in response to the daemon running abnormally; and replacing the daemon with a new daemon in response to the daemon running abnormally and the number of restarts reaching a threshold.
In some embodiments, the method further comprises: dynamically acquiring the health status of the distributed file system, and, in response to the distributed file system being abnormal, terminating the daemons of all computing nodes and canceling the user space file system mounts of all computing nodes.
In some embodiments, the method further comprises: re-mounting the user space file system in response to the failure of the user space file system mount point of the computing node.
In another aspect of the embodiments of the present application, there is provided a system for handling user space file system failures, including: a distribution module configured to dynamically acquire a list of all computing nodes in the cluster and distribute daemons to all computing nodes according to the list; a first detection module configured to detect, by the daemon, whether the management process of the computing node is normal, and, in response to the management process of the computing node being normal, detect, by the daemon, whether the user space file system mount point of the computing node has failed; a second detection module configured to detect, by the daemon, whether the distributed file system file can be accessed through the user space file system mount point in response to the user space file system mount point of the computing node being normal; and an execution module configured to cancel the user space file system mount point and re-mount in response to the inability to access the distributed file system file through the user space file system mount point.
In some embodiments, the system further comprises a monitoring module configured to: monitor the daemon running state of all computing nodes, and restart the daemon in response to the daemon running abnormally; and replace the daemon with a new daemon in response to the daemon running abnormally and the number of restarts reaching a threshold.
In some embodiments, the system further comprises a second monitoring module configured to: dynamically acquire the health status of the distributed file system, and, in response to the distributed file system being abnormal, terminate the daemons of all computing nodes and cancel the user space file system mounts of all computing nodes.
In some embodiments, the system further comprises a second execution module configured to: re-mount the user space file system in response to the failure of the user space file system mount point of the computing node.
In yet another aspect of the embodiment of the present application, there is also provided a computer apparatus, including: at least one processor; and a memory storing computer instructions executable on the processor, which when executed by the processor, perform the steps of the method as above.
In yet another aspect of the embodiments of the present application, there is also provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method steps as described above.
The application has the following beneficial technical effects: by deploying the user space file system daemon on all computing nodes of the Hadoop cluster, abnormal scenarios in which the user space file system mount point fails or is blocked can be identified, and the user space file system mount point can be repaired automatically, which greatly improves the operation and maintenance efficiency of the Hadoop cluster, reduces the waste of computing resources, and improves user satisfaction with the Hadoop cluster.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings required for the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained from these drawings by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of an embodiment of a method for handling user space file system failures provided by the present application;
FIG. 2 is a schematic diagram of an embodiment of a system for handling user space file system failures provided by the present application;
FIG. 3 is a schematic hardware architecture diagram of an embodiment of a computer device for handling user space file system failures provided by the present application;
FIG. 4 is a schematic diagram of an embodiment of a computer storage medium for handling user space file system failures provided by the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the following embodiments of the present application will be described in further detail with reference to the accompanying drawings.
It should be noted that, in the embodiments of the present application, the expressions "first" and "second" are used to distinguish two entities or parameters that have the same name but are not identical. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present application, and this is not repeated in the following embodiments.
In a first aspect of the embodiment of the present application, an embodiment of a method for handling a user space file system failure is provided. FIG. 1 is a schematic diagram illustrating an embodiment of a method for handling a user space file system failure provided by the present application. As shown in fig. 1, the embodiment of the present application includes the following steps:
S1, dynamically acquiring a list of all computing nodes in the cluster, and distributing daemons to all computing nodes according to the list;
S2, detecting, by the daemon, whether the management process of the computing node is normal, and, in response to the management process of the computing node being normal, detecting, by the daemon, whether the user space file system mount point of the computing node has failed;
S3, in response to the user space file system mount point of the computing node being normal, detecting, by the daemon, whether the distributed file system file can be accessed through the user space file system mount point; and
S4, canceling the user space file system mount point and re-mounting in response to the inability to access the distributed file system file through the user space file system mount point.
The application program submits a task to the Resource Manager node of the distributed scheduling framework. The Resource Manager node decomposes the task into a number of Map tasks and Reduce tasks, and the Map tasks are distributed to different computing nodes according to a certain algorithm. Each computing node accesses the data on Hdfs through its local Fuse mount to perform the computation and writes the computation result back to Hdfs; the Reduce tasks then gather the previously written results through Fuse to complete the computation. In this process, once the Fuse mount of a node fails, all the computing tasks running on that node fail, which slows down the overall computing efficiency of the Hadoop cluster and, in serious cases, causes the computing job to fail.
According to the embodiment of the present application, a monitoring node is added, the Fuse daemon is deployed in batches to all nodes of the cluster, and the Fuse mount status of all nodes of the cluster is monitored.
A list of all computing nodes in the cluster is dynamically acquired, and daemons are distributed to all computing nodes according to the list. Specifically, the monitoring node dynamically acquires the list of all computing nodes of the cluster and distributes the daemon file to every computing node, as sketched below.
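For illustration only (not part of the patent text), the following minimal Python sketch shows one way such a monitoring node could obtain the node list and push the daemon. It assumes the Hadoop client command `yarn node -list` is available on the monitoring node and that passwordless ssh/scp to every computing node is configured; the file name fuse_daemon.py and the installation path are hypothetical.
```python
import subprocess

DAEMON_FILE = "/opt/monitor/fuse_daemon.py"   # hypothetical daemon script
REMOTE_PATH = "/opt/monitor/fuse_daemon.py"   # hypothetical install path on each node

def list_compute_nodes():
    """Parse host names out of `yarn node -list` (simplified parsing)."""
    out = subprocess.run(["yarn", "node", "-list"],
                         capture_output=True, text=True, check=True).stdout
    nodes = []
    for line in out.splitlines():
        fields = line.split()
        # data rows look like: "host1:45454  RUNNING  host1:8042  3"
        if len(fields) >= 4 and ":" in fields[0] and fields[1].isupper():
            nodes.append(fields[0].split(":")[0])
    return nodes

def distribute_daemon(nodes):
    """Copy the daemon file to every computing node and start it."""
    for host in nodes:
        subprocess.run(["scp", DAEMON_FILE, f"{host}:{REMOTE_PATH}"], check=True)
        subprocess.run(["ssh", host,
                        f"nohup python3 {REMOTE_PATH} >/dev/null 2>&1 &"], check=True)

if __name__ == "__main__":
    distribute_daemon(list_compute_nodes())
```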
In some embodiments, the method further comprises: monitoring the daemon running state of all computing nodes, and restarting the daemon in response to the daemon running abnormally; and replacing the daemon with a new daemon in response to the daemon running abnormally and the number of restarts reaching a threshold. The monitoring node monitors the daemon running state of all computing nodes and promptly restarts the daemon of any node on which the daemon is found to have failed.
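A possible shape of this supervision loop is sketched below; it is only an illustration under the same assumptions as above (hypothetical daemon file name and paths, passwordless ssh), and the restart threshold of 3 is an arbitrary example.
```python
import subprocess
import time

RESTART_THRESHOLD = 3          # example value; the method only requires "a threshold"
restart_counts = {}

def daemon_alive(host):
    """True if the fuse daemon process is found on the remote host."""
    r = subprocess.run(["ssh", host, "pgrep -f fuse_daemon.py"], capture_output=True)
    return r.returncode == 0

def restart_daemon(host):
    subprocess.run(["ssh", host,
                    "nohup python3 /opt/monitor/fuse_daemon.py >/dev/null 2>&1 &"])

def redeploy_daemon(host):
    """Replace the daemon with a fresh copy before starting it again."""
    subprocess.run(["scp", "/opt/monitor/fuse_daemon.py",
                    f"{host}:/opt/monitor/fuse_daemon.py"], check=True)
    restart_daemon(host)
    restart_counts[host] = 0

def supervise(nodes, interval=30):
    while True:
        for host in nodes:
            if not daemon_alive(host):
                restart_counts[host] = restart_counts.get(host, 0) + 1
                if restart_counts[host] >= RESTART_THRESHOLD:
                    redeploy_daemon(host)   # restarts exhausted: replace the daemon
                else:
                    restart_daemon(host)    # plain restart
        time.sleep(interval)
```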
In some embodiments, the method further comprises: dynamically acquiring the health status of the distributed file system, and, in response to the distributed file system being abnormal, terminating the daemons of all computing nodes and canceling the user space file system mounts of all computing nodes. The monitoring node dynamically senses the health status of Hdfs; for example, when the Hdfs service is terminated, the daemons of all computing nodes are terminated in time and the Fuse mounts of all computing nodes are canceled.
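A minimal sketch of such a health probe and shutdown path follows. It assumes the Hadoop client is installed on the monitoring node and treats a failed `hdfs dfs -ls /` as "Hdfs abnormal"; the remote stop/unmount commands and the mount point path are illustrative assumptions.
```python
import subprocess

def hdfs_healthy(timeout=30):
    """Probe Hdfs by listing its root directory; failure or timeout means abnormal."""
    try:
        r = subprocess.run(["hdfs", "dfs", "-ls", "/"],
                           capture_output=True, timeout=timeout)
        return r.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def shut_down_node(host, mount_point="/mnt/hdfs"):
    """Terminate the node's fuse daemon, then cancel its Fuse mount."""
    subprocess.run(["ssh", host, "pkill -f fuse_daemon.py"])
    subprocess.run(["ssh", host, f"fusermount -u {mount_point}"])

def handle_hdfs_failure(nodes):
    if not hdfs_healthy():
        for host in nodes:
            shut_down_node(host)
```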
Whether the management process of the computing node is normal is detected by the daemon, and, in response to the management process of the computing node being normal, the daemon detects whether the user space file system mount point of the computing node has failed. In practice, the daemon checks whether the NodeManager process of each computing node is running normally.
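As a non-authoritative illustration, the daemon-side checks for this step might look like the following, where matching the NodeManager by process name and the mount point path /mnt/hdfs are assumptions rather than details given in the patent.
```python
import os
import subprocess

MOUNT_POINT = "/mnt/hdfs"   # assumed Fuse mount point of Hdfs

def nodemanager_running():
    """Check the node-management (NodeManager) process by name."""
    return subprocess.run(["pgrep", "-f", "NodeManager"],
                          capture_output=True).returncode == 0

def mount_point_valid():
    """A failed Fuse mount point typically disappears from the mount table
    (os.path.ismount also returns False if stat-ing the path fails)."""
    return os.path.ismount(MOUNT_POINT)
```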
In some embodiments, the method further comprises: re-mounting the user space file system in response to the failure of the user space file system mount point of the computing node.
In response to the user space file system mount point of the computing node being normal, the daemon detects whether the distributed file system file can be accessed through the user space file system mount point.
In response to the failure to access the distributed file system file through the user space file system mount point, the user space file system mount point is canceled and re-mounted. Specifically, the daemon detects whether Hdfs files can be accessed normally through the Fuse mount point (Hdfs cannot be accessed normally if the Fuse process is blocked); if they cannot, the Fuse mount point is canceled and re-mounted, as in the sketch below.
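The sketch below illustrates one way to implement this probe-and-repair step: the access test runs in a subprocess with a timeout so that a blocked Fuse process cannot hang the daemon itself. The remount command is an assumption; it depends on which Hdfs Fuse client is deployed (hadoop-fuse-dfs is used here only as an example) and on the actual NameNode address.
```python
import subprocess

MOUNT_POINT = "/mnt/hdfs"            # assumed Fuse mount point
NAMENODE = "dfs://namenode:8020"     # hypothetical NameNode URI

def hdfs_accessible(timeout=10):
    """List the mount point in a child process; a hung Fuse process times out."""
    try:
        r = subprocess.run(["ls", MOUNT_POINT], capture_output=True, timeout=timeout)
        return r.returncode == 0
    except subprocess.TimeoutExpired:
        return False

def remount():
    """Cancel the Fuse mount point (lazily, in case it is stuck) and mount it again."""
    subprocess.run(["fusermount", "-u", "-z", MOUNT_POINT])
    subprocess.run(["hadoop-fuse-dfs", NAMENODE, MOUNT_POINT])

if not hdfs_accessible():
    remount()
```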
According to the embodiment of the present application, by deploying the user space file system daemon on all computing nodes of the Hadoop cluster, abnormal scenarios in which the user space file system mount point fails or is blocked can be identified and the mount point can be repaired automatically, which greatly improves the operation and maintenance efficiency of the Hadoop cluster, reduces the waste of computing resources, and improves user satisfaction with the Hadoop cluster.
It should be noted that, in the embodiments of the method for handling a user space file system failure, the steps may be interchanged, replaced, added, or deleted; methods obtained through such reasonable permutations and combinations shall therefore also fall within the protection scope of the present application, and the protection scope shall not be limited to the embodiments.
Based on the above object, a second aspect of the embodiments of the present application proposes a system for handling user space file system failures. As shown in fig. 2, the system 200 includes the following modules: a distribution module configured to dynamically acquire a list of all computing nodes in the cluster and distribute daemons to all computing nodes according to the list; a first detection module configured to detect, by the daemon, whether the management process of the computing node is normal, and, in response to the management process of the computing node being normal, detect, by the daemon, whether the user space file system mount point of the computing node has failed; a second detection module configured to detect, by the daemon, whether the distributed file system file can be accessed through the user space file system mount point in response to the user space file system mount point of the computing node being normal; and an execution module configured to cancel the user space file system mount point and re-mount in response to the inability to access the distributed file system file through the user space file system mount point.
In some embodiments, the system further comprises a monitoring module configured to: monitor the daemon running state of all computing nodes, and restart the daemon in response to the daemon running abnormally; and replace the daemon with a new daemon in response to the daemon running abnormally and the number of restarts reaching a threshold.
In some embodiments, the system further comprises a second monitoring module configured to: dynamically acquire the health status of the distributed file system, and, in response to the distributed file system being abnormal, terminate the daemons of all computing nodes and cancel the user space file system mounts of all computing nodes.
In some embodiments, the system further comprises a second execution module configured to: re-mount the user space file system in response to the failure of the user space file system mount point of the computing node.
In view of the above object, a third aspect of the embodiments of the present application provides a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions, when executed by the processor, performing the following steps: S1, dynamically acquiring a list of all computing nodes in the cluster, and distributing daemons to all computing nodes according to the list; S2, detecting, by the daemon, whether the management process of the computing node is normal, and, in response to the management process of the computing node being normal, detecting, by the daemon, whether the user space file system mount point of the computing node has failed; S3, in response to the user space file system mount point of the computing node being normal, detecting, by the daemon, whether the distributed file system file can be accessed through the user space file system mount point; and S4, canceling the user space file system mount point and re-mounting in response to the inability to access the distributed file system file through the user space file system mount point.
In some embodiments, the steps further comprise: monitoring the daemon running state of all computing nodes, and restarting the daemon in response to the daemon running abnormally; and replacing the daemon with a new daemon in response to the daemon running abnormally and the number of restarts reaching a threshold.
In some embodiments, the steps further comprise: dynamically acquiring the health status of the distributed file system, and, in response to the distributed file system being abnormal, terminating the daemons of all computing nodes and canceling the user space file system mounts of all computing nodes.
In some embodiments, the steps further comprise: re-mounting the user space file system in response to the failure of the user space file system mount point of the computing node.
As shown in fig. 3, a hardware structure diagram of an embodiment of the computer device for handling a user space file system failure according to the present application is shown.
Taking the example of the device shown in fig. 3, a processor 301 and a memory 302 are included in the device.
The processor 301 and the memory 302 may be connected by a bus or otherwise, for example in fig. 3.
The memory 302 serves as a non-volatile computer readable storage medium, and may be used to store non-volatile software programs, non-volatile computer executable programs, and modules, such as program instructions/modules corresponding to the method of handling user space file system failures in embodiments of the present application. The processor 301 executes various functional applications of the server and data processing, i.e., implements a method of handling user space file system failures, by running non-volatile software programs, instructions, and modules stored in the memory 302.
The memory 302 may include a program storage area and a data storage area, where the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created according to the use of the method of handling user space file system failures, and the like. In addition, the memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 302 may optionally include memory located remotely from the processor 301, and such remote memory may be connected to the local module via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more computer instructions 303 corresponding to a method of handling a user space file system failure are stored in the memory 302, which when executed by the processor 301, perform the method of handling a user space file system failure in any of the method embodiments described above.
Any one of the embodiments of the computer apparatus that performs the above-described method of handling a user space file system failure may achieve the same or similar effects as any of the previously-described method embodiments that correspond thereto.
The present application also provides a computer readable storage medium storing a computer program which when executed by a processor performs a method of handling a user space file system failure.
FIG. 4 is a schematic diagram of an embodiment of a computer storage medium for handling a user space file system failure according to the present application. Taking a computer storage medium as shown in fig. 4 as an example, the computer readable storage medium 401 stores a computer program 402 that when executed by a processor performs the above method.
Finally, it should be noted that, as will be appreciated by those skilled in the art, all or part of the processes of the above method embodiments may be implemented by a computer program instructing relevant hardware. The program of the method for handling user space file system failures may be stored in a computer-readable storage medium, and, when executed, may include the flows of the embodiments of the above methods. The storage medium of the program may be a magnetic disk, an optical disk, a read-only memory (ROM), a random-access memory (RAM), or the like. The computer program embodiments described above may achieve the same or similar effects as any of the method embodiments described above.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The serial numbers of the foregoing embodiments of the present application are for description only and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, and the program may be stored in a computer readable storage medium, where the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will appreciate that the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of the embodiments of the application, including the claims, is limited to these examples. Combinations of features of the above embodiments or of different embodiments are also possible within the idea of the embodiments of the application, and many other variations of the different aspects of the embodiments described above exist which, for brevity, are not provided in detail. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present application.

Claims (10)

1. A method of handling user space file system failures, comprising the steps of:
dynamically acquiring lists of all computing nodes in the cluster, and distributing daemons to all computing nodes according to the lists;
detecting whether the management process condition of the computing node is normal or not through the daemon, and detecting whether a user space file system mounting point of the computing node is invalid or not through the daemon in response to the normal management process condition of the computing node;
detecting, by the daemon, whether a distributed file system file can be accessed through the user space file system mount point in response to the user space file system mount point of the computing node being normal; and
and canceling the user space file system mounting point and re-mounting in response to the failure to access the distributed file system file through the user space file system mounting point.
2. The method according to claim 1, wherein the method further comprises:
monitoring the daemon running state of all computing nodes, and restarting the daemon in response to the daemon running abnormality; and
in response to the daemon running abnormally and the number of restarts reaching a threshold, the daemon is replaced with a new daemon.
3. The method according to claim 1, wherein the method further comprises:
and dynamically acquiring the health condition of the distributed file system, and responding to the abnormality of the distributed file system, terminating daemons of all computing nodes and canceling the user space file system mounting of all computing nodes.
4. The method according to claim 1, wherein the method further comprises:
and re-mounting the user space file system in response to the failure of the mounting point of the user space file system of the computing node.
5. A system for handling user space file system failures, comprising:
the distribution module is configured to dynamically acquire lists of all computing nodes in the cluster and distribute daemons to all the computing nodes according to the lists;
the first detection module is configured to detect whether the management process condition of the computing node is normal through the daemon, and detect whether the user space file system mounting point of the computing node is invalid through the daemon in response to the normal management process condition of the computing node;
the second detection module is configured to respond to the fact that the user space file system mounting point of the computing node is normal, and detect whether the distributed file system file can be accessed through the user space file system mounting point through the daemon; and
and the execution module is configured to cancel the user space file system mounting point and re-mount the user space file system in response to the fact that the distributed file system file cannot be accessed through the user space file system mounting point.
6. The system of claim 5, further comprising a monitoring module configured to:
monitoring the daemon running state of all computing nodes, and restarting the daemon in response to the daemon running abnormality; and
in response to the daemon running abnormally and the number of restarts reaching a threshold, the daemon is replaced with a new daemon.
7. The system of claim 5, further comprising a second monitoring module configured to:
and dynamically acquiring the health condition of the distributed file system, and responding to the abnormality of the distributed file system, terminating daemons of all computing nodes and canceling the user space file system mounting of all computing nodes.
8. The system of claim 5, further comprising a second execution module configured to:
and re-mounting the user space file system in response to the failure of the mounting point of the user space file system of the computing node.
9. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, which when executed by the processor, perform the steps of the method of any one of claims 1-4.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method of any of claims 1-4.
CN202111339749.5A 2021-11-12 2021-11-12 Method and device for processing user space file system fault Active CN114281636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111339749.5A CN114281636B (en) 2021-11-12 2021-11-12 Method and device for processing user space file system fault

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111339749.5A CN114281636B (en) 2021-11-12 2021-11-12 Method and device for processing user space file system fault

Publications (2)

Publication Number Publication Date
CN114281636A CN114281636A (en) 2022-04-05
CN114281636B true CN114281636B (en) 2023-08-25

Family

ID=80869037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111339749.5A Active CN114281636B (en) 2021-11-12 2021-11-12 Method and device for processing user space file system fault

Country Status (1)

Country Link
CN (1) CN114281636B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104301442A (en) * 2014-11-17 2015-01-21 浪潮电子信息产业股份有限公司 Method for realizing client of access object storage cluster based on fuse
CN108920628A (en) * 2018-06-29 2018-11-30 郑州云海信息技术有限公司 A kind of distributed file system access method and device being adapted to big data platform
CN110365839A (en) * 2019-07-04 2019-10-22 Oppo广东移动通信有限公司 Closedown method, device, medium and electronic equipment
JP2021022357A (en) * 2019-07-26 2021-02-18 広東叡江云計算股▲分▼有限公司Guangdong Eflycloud Computing Co., Ltd Hybrid file construction method and system therefor based on fuse technology
CN113127437A (en) * 2019-12-31 2021-07-16 阿里巴巴集团控股有限公司 File system management method, cloud system, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114281636A (en) 2022-04-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant