CN114281636B - Method and device for processing user space file system fault - Google Patents
- Publication number
- CN114281636B (application number CN202111339749.5A)
- Authority
- CN
- China
- Prior art keywords
- file system
- user space
- daemon
- space file
- response
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application provides a method, a system, a device, and a storage medium for handling user space file system failures. The method comprises the following steps: dynamically acquiring a list of all computing nodes in the cluster, and distributing a daemon to every computing node according to the list; detecting, through the daemon, whether the management process of the computing node is running normally and, in response to the management process running normally, detecting through the daemon whether the user space file system mount point of the computing node has failed; in response to the user space file system mount point of the computing node being normal, detecting through the daemon whether a distributed file system file can be accessed through the mount point; and, in response to the distributed file system file being inaccessible through the user space file system mount point, unmounting the user space file system mount point and mounting it again. The application can greatly improve the operation and maintenance efficiency of a Hadoop cluster, reduce the waste of computing resources, and improve user satisfaction with the Hadoop cluster.
Description
Technical Field
The present application relates to the field of big data, and more particularly, to a method, system, device, and storage medium for handling user space file system failures.
Background
The computing power of a single machine is hard-pressed to cope with massive unstructured data processing tasks. If multi-machine parallel computing is adopted, the application vendor has to develop a distributed file system and a scheduling framework itself. On the one hand, this is difficult and consumes substantial manpower and material resources; on the other hand, the vendor cannot concentrate on developing data processing algorithms. For this scenario, most application vendors therefore choose the open source Hadoop architecture as the underlying platform, and applications process massive unstructured data on top of the Hadoop distributed file system (Hdfs) and the distributed scheduling framework (Yarn).
Hadoop is developed in Java, but to pursue maximum performance, traditional unstructured data processing algorithms are mostly developed in C, and Hdfs support for C is very limited. Fuse (Filesystem in Userspace, user space file system) is therefore used to mount Hdfs on the Hadoop computing nodes, so that the distributed file system can be operated through Fuse like a local file system.
In this usage scenario, Yarn is responsible for managing the computing resources (CPU, memory) of all Hadoop computing nodes, but Yarn cannot manage the computing resources occupied by Fuse. As a result, data processing subtasks and Fuse often contend for resources, which causes Fuse to hang or the mount point to fail, and ultimately causes all computing tasks allocated to the node to fail.
Because of the scheduling algorithm of Yarn itself, the nodes on which resource contention will occur cannot be predicted. Usually, the abnormal Fuse mount points can only be handled manually after a large number of computing tasks have failed, and the data processing tasks are then resubmitted. This makes maintenance of the Hadoop platform burdensome and seriously wastes the computing resources of the Hadoop cluster.
Disclosure of Invention
In view of the above, an object of the embodiments of the present application is to provide a method, a system, a computer device, and a computer readable storage medium for handling a failure of a user space file system.
Based on the above object, an aspect of the embodiments of the present application provides a method for handling a user space file system failure, including the steps of: dynamically acquiring a list of all computing nodes in the cluster, and distributing a daemon to every computing node according to the list; detecting, through the daemon, whether the management process of the computing node is running normally and, in response to the management process running normally, detecting through the daemon whether the user space file system mount point of the computing node has failed; in response to the user space file system mount point of the computing node being normal, detecting through the daemon whether a distributed file system file can be accessed through the user space file system mount point; and, in response to the distributed file system file being inaccessible through the user space file system mount point, unmounting the user space file system mount point and mounting it again.
In some embodiments, the method further comprises: monitoring the running state of the daemon on all computing nodes, and restarting the daemon in response to the daemon running abnormally; and replacing the daemon with a new daemon in response to the daemon running abnormally and the number of restarts reaching a threshold.
In some embodiments, the method further comprises: dynamically acquiring the health status of the distributed file system and, in response to the distributed file system being abnormal, terminating the daemons of all computing nodes and unmounting the user space file system on all computing nodes.
In some embodiments, the method further comprises: re-mounting the user space file system in response to the user space file system mount point of the computing node having failed.
In another aspect of the embodiments of the present application, a system for handling a user space file system failure is provided, including: a distribution module configured to dynamically acquire a list of all computing nodes in the cluster and distribute a daemon to every computing node according to the list; a first detection module configured to detect, through the daemon, whether the management process of the computing node is running normally and, in response to the management process running normally, detect through the daemon whether the user space file system mount point of the computing node has failed; a second detection module configured to, in response to the user space file system mount point of the computing node being normal, detect through the daemon whether a distributed file system file can be accessed through the user space file system mount point; and an execution module configured to unmount the user space file system mount point and mount it again in response to the distributed file system file being inaccessible through the user space file system mount point.
In some embodiments, the system further comprises a monitoring module configured to: monitor the running state of the daemon on all computing nodes, and restart the daemon in response to the daemon running abnormally; and replace the daemon with a new daemon in response to the daemon running abnormally and the number of restarts reaching a threshold.
In some embodiments, the system further comprises a second monitoring module configured to: dynamically acquire the health status of the distributed file system and, in response to the distributed file system being abnormal, terminate the daemons of all computing nodes and unmount the user space file system on all computing nodes.
In some embodiments, the system further comprises a second execution module configured to: re-mount the user space file system in response to the user space file system mount point of the computing node having failed.
In yet another aspect of the embodiments of the present application, a computer device is also provided, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions, when executed by the processor, performing the steps of the method above.
In yet another aspect of the embodiments of the present application, there is also provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method steps as described above.
The application has the following beneficial technical effects: by deploying a user space file system daemon on all computing nodes of the Hadoop cluster, abnormal scenarios in which the user space file system mount point fails or is blocked can be identified and the mount point repaired automatically. This greatly improves the operation and maintenance efficiency of the Hadoop cluster, reduces the waste of computing resources, and improves user satisfaction with the Hadoop cluster.
Drawings
To illustrate the embodiments of the application or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the application, and a person skilled in the art may obtain other embodiments from these drawings without inventive effort.
FIG. 1 is a schematic diagram of an embodiment of a method for handling user space file system failures provided by the present application;
FIG. 2 is a schematic diagram of an embodiment of a system for handling user space file system failures provided by the present application;
FIG. 3 is a schematic hardware architecture diagram of an embodiment of a computer device for handling user space file system failures provided by the present application;
FIG. 4 is a schematic diagram of an embodiment of a computer storage medium for handling user space file system failures provided by the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the accompanying drawings.
It should be noted that, in the embodiments of the present application, the expressions "first" and "second" are used to distinguish two entities or parameters that have the same name but are not the same. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present application; this is not repeated in the following embodiments.
In a first aspect of the embodiment of the present application, an embodiment of a method for handling a user space file system failure is provided. FIG. 1 is a schematic diagram illustrating an embodiment of a method for handling a user space file system failure provided by the present application. As shown in fig. 1, the embodiment of the present application includes the following steps:
S1, dynamically acquiring a list of all computing nodes in the cluster, and distributing a daemon to every computing node according to the list;
S2, detecting, through the daemon, whether the management process of the computing node is running normally and, in response to the management process running normally, detecting through the daemon whether the user space file system mount point of the computing node has failed;
S3, in response to the user space file system mount point of the computing node being normal, detecting through the daemon whether a distributed file system file can be accessed through the user space file system mount point; and
S4, in response to the distributed file system file being inaccessible through the user space file system mount point, unmounting the user space file system mount point and mounting it again.
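The ordering of the checks in steps S2 to S4 can be sketched as a small decision function. This is only an illustration under stated assumptions: the names `Action` and `decide` and the three probe callables are hypothetical, and the patent does not say what the daemon does when the management process is abnormal, so the sketch simply skips the mount checks in that case.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    SKIP = "skip"                                # management process abnormal (assumed: do nothing)
    REMOUNT = "remount"                          # mount point has failed: mount again
    UNMOUNT_AND_REMOUNT = "unmount_and_remount"  # mounted, but files are unreachable
    NONE = "none"                                # every check passed


def decide(manager_ok: Callable[[], bool],
           mount_ok: Callable[[], bool],
           file_accessible: Callable[[], bool]) -> Action:
    """Run the checks in the order S2 -> S3 -> S4 and return the repair action."""
    if not manager_ok():          # S2, first half: management process check
        return Action.SKIP
    if not mount_ok():            # S2, second half: mount point check
        return Action.REMOUNT
    if not file_accessible():     # S3: access a file through the mount point
        return Action.UNMOUNT_AND_REMOUNT  # S4
    return Action.NONE
```

Passing the probes in as callables keeps each later check from running until the earlier one has succeeded, which matches the "in response to" chaining of the steps.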
The application submits a task to the Resource Manager node of the distributed scheduling framework. The Resource Manager node decomposes the task into a number of Map tasks and Reduce tasks and distributes the Map tasks to different computing nodes according to a certain algorithm. The computing nodes access the data on Hdfs through the local Fuse, perform the computation, and write the results to Hdfs; the Reduce tasks then collect the earlier results through Fuse to complete the computing task. In this process, once the Fuse mount of a node fails, all computing tasks running on that node fail, which lowers the overall computing efficiency of the Hadoop cluster and, in severe cases, causes the computing task to fail.
In the embodiment of the present application, a monitoring node is added, the Fuse daemon is deployed in batches to all nodes of the cluster, and the Fuse mount status of all nodes of the cluster is monitored.
A list of all computing nodes in the cluster is dynamically acquired, and daemons are distributed to all computing nodes according to the list. Specifically, the monitoring node dynamically acquires the list of all computing nodes of the cluster and distributes the daemon file to all of them.
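Because the node list is acquired dynamically, the monitoring node has to reconcile it against the nodes that already hold the daemon file. A minimal sketch of that reconciliation follows; the function name `plan_distribution` is hypothetical, and retiring the daemon from nodes that have left the cluster is an assumption that the patent does not state.

```python
from typing import Iterable, List, Tuple


def plan_distribution(cluster_nodes: Iterable[str],
                      deployed_nodes: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Compare the current node list with the nodes that already have the daemon.

    Returns (to_deploy, to_retire):
      to_deploy: nodes in the cluster that do not have the daemon file yet
      to_retire: nodes that have the daemon but left the cluster (assumption)
    """
    cluster = set(cluster_nodes)
    deployed = set(deployed_nodes)
    return sorted(cluster - deployed), sorted(deployed - cluster)
```

How the list is obtained and how the file is copied are left out on purpose, since the patent names neither mechanism.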
In some embodiments, the method further comprises: monitoring the running state of the daemon on all computing nodes, and restarting the daemon in response to the daemon running abnormally; and replacing the daemon with a new daemon in response to the daemon running abnormally and the number of restarts reaching a threshold. The monitoring node monitors the running state of the daemon on all computing nodes and, when it finds a node whose daemon has failed, restarts the daemon on that node promptly.
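The restart-then-replace policy can be sketched as a per-node counter. The class name `DaemonSupervisor`, the default threshold of 3, and resetting the counter after a replacement are all assumptions made for illustration; the patent only states that the daemon is replaced once the number of restarts reaches a threshold.

```python
from dataclasses import dataclass


@dataclass
class DaemonSupervisor:
    """Tracks restarts of one node's daemon (hypothetical helper)."""
    restart_threshold: int = 3   # assumed value; the patent gives no number
    restarts: int = 0

    def on_abnormal(self) -> str:
        """Return the action to take when the daemon is found running abnormally."""
        if self.restarts >= self.restart_threshold:
            # Threshold reached: replace the daemon and start counting afresh.
            self.restarts = 0
            return "replace"
        self.restarts += 1
        return "restart"
```

With a threshold of 2, four consecutive abnormal reports yield restart, restart, replace, restart.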
In some embodiments, the method further comprises: dynamically acquiring the health status of the distributed file system and, in response to the distributed file system being abnormal, terminating the daemons of all computing nodes and unmounting the user space file system on all computing nodes. The monitoring node dynamically senses the health of Hdfs; for example, when the Hdfs service is terminated, it promptly terminates the daemons of all computing nodes and unmounts Fuse on all computing nodes.
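A sketch of the resulting cluster-wide shutdown is given below. The function name `shutdown_plan` and the action labels are hypothetical. Terminating the daemon before unmounting follows the order in which the text lists the two actions; a plausible reason, stated here as an assumption, is that a daemon left running would see the missing mount and mount it again.

```python
from typing import Iterable, List, Tuple


def shutdown_plan(hdfs_healthy: bool, nodes: Iterable[str]) -> List[Tuple[str, str]]:
    """Return the ordered (node, action) pairs to run when Hdfs is abnormal."""
    if hdfs_healthy:
        return []  # nothing to do while the distributed file system is healthy
    plan: List[Tuple[str, str]] = []
    for node in nodes:
        plan.append((node, "terminate_daemon"))  # stop the watchdog first
        plan.append((node, "unmount_fuse"))      # then remove the Fuse mount
    return plan
```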
The daemon detects whether the management process of the computing node is running normally and, in response to the management process running normally, detects whether the user space file system mount point of the computing node has failed. Here, the daemon detects whether the NodeManager process of each computing node is running normally.
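One simple way to approximate the NodeManager check is to look for the process in the output of the JDK `jps` tool, which prints one "<pid> <main class name>" pair per line. This is an assumption: the patent does not say how the daemon judges that the process is normal, and mere presence in the process list does not prove that the process is healthy. The function name `node_manager_running` is hypothetical.

```python
def node_manager_running(jps_output: str) -> bool:
    """Return True if some line of `jps` output names the NodeManager class."""
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line has the form "<pid> <main class name>".
        if len(parts) >= 2 and parts[0].isdigit() and parts[1] == "NodeManager":
            return True
    return False


# Illustrative output only; the pids are made up.
sample = "2145 DataNode\n2310 NodeManager\n4821 Jps\n"
```

On a real node the text would come from running `jps` as the user that owns the Hadoop processes; parsing is kept separate from process execution so that the check can be exercised without a cluster.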
In some embodiments, the method further comprises: re-mounting the user space file system in response to the user space file system mount point of the computing node having failed.
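On Linux, one possible test for a missing mount is to look for the mount point in `/proc/mounts`, where each line lists the device, mount point, file system type, options, and two numeric fields. The sketch below assumes this approach; the patent does not specify the test, and the function name `is_fuse_mounted` and the path `/mnt/hdfs` are hypothetical. A hung Fuse mount can still be listed in `/proc/mounts`, which is why the separate file access check is still needed.

```python
def is_fuse_mounted(mounts_text: str, mount_point: str) -> bool:
    """Return True if mounts_text lists mount_point with a Fuse file system type.

    Paths containing spaces are escaped in /proc/mounts and are not handled here.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        path, fs_type = fields[1], fields[2]
        if path == mount_point and (fs_type == "fuse" or fs_type.startswith("fuse.")):
            return True
    return False


# Illustrative content only; a real daemon would read the text from /proc/mounts.
sample_mounts = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "fuse_dfs /mnt/hdfs fuse rw,nosuid,nodev,relatime,user_id=0,group_id=0 0 0\n"
)
```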
In response to the user space file system mount point of the computing node being normal, the daemon detects whether a distributed file system file can be accessed through the user space file system mount point.
In response to the distributed file system file being inaccessible through the user space file system mount point, the user space file system mount point is unmounted and mounted again. The daemon checks whether an Hdfs file can be accessed normally through the Fuse mount point (Hdfs cannot be accessed normally if the Fuse process is blocked); if it cannot, the Fuse mount point is unmounted and mounted again.
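Because a blocked Fuse process can make a file access hang instead of failing, the access check needs a time limit. The sketch below runs the probe in a daemon thread and treats a timeout as a failure. The names `can_access` and `repair_commands`, the 5 second default, and the caller-supplied mount command are assumptions; `fusermount -u` (with `-z` for a lazy unmount) is the standard Fuse unmount utility, while the patent does not name the command used to mount Hdfs.

```python
import os
import threading
from typing import Callable, List, Sequence


def can_access(path: str, timeout_s: float = 5.0,
               probe: Callable[[str], object] = os.stat) -> bool:
    """Return True only if probe(path) succeeds within timeout_s seconds."""
    result = {"ok": False}

    def worker() -> None:
        try:
            probe(path)
            result["ok"] = True
        except OSError:
            result["ok"] = False

    # A daemon thread cannot keep the process alive if the probe hangs.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)
    return (not thread.is_alive()) and result["ok"]


def repair_commands(mount_point: str, mount_cmd: Sequence[str],
                    lazy: bool = False) -> List[List[str]]:
    """Build the unmount command followed by the caller's mount command."""
    unmount = ["fusermount", "-u"] + (["-z"] if lazy else []) + [mount_point]
    return [unmount, list(mount_cmd)]
```

A thread is used rather than a child process with a timeout because a process stuck in an uninterruptible file system call may not exit when killed, whereas the calling thread can simply stop waiting.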
In the embodiment of the present application, a user space file system daemon is deployed on all computing nodes of the Hadoop cluster, so that abnormal scenarios in which the user space file system mount point fails or is blocked can be identified and the mount point repaired automatically. This greatly improves the operation and maintenance efficiency of the Hadoop cluster, reduces the waste of computing resources, and improves user satisfaction with the Hadoop cluster.
It should be noted that the steps in the embodiments of the method for handling a user space file system failure may be interleaved, replaced, added, or deleted. Methods for handling a user space file system failure obtained through such reasonable permutations and combinations also fall within the protection scope of the present application, and the protection scope of the present application should not be limited to the embodiments.
Based on the above object, a second aspect of the embodiments of the present application proposes a system for handling a user space file system failure. As shown in FIG. 2, the system 200 includes the following modules: a distribution module configured to dynamically acquire a list of all computing nodes in the cluster and distribute a daemon to every computing node according to the list; a first detection module configured to detect, through the daemon, whether the management process of the computing node is running normally and, in response to the management process running normally, detect through the daemon whether the user space file system mount point of the computing node has failed; a second detection module configured to, in response to the user space file system mount point of the computing node being normal, detect through the daemon whether a distributed file system file can be accessed through the user space file system mount point; and an execution module configured to unmount the user space file system mount point and mount it again in response to the distributed file system file being inaccessible through the user space file system mount point.
In some embodiments, the system further comprises a monitoring module configured to: monitor the running state of the daemon on all computing nodes, and restart the daemon in response to the daemon running abnormally; and replace the daemon with a new daemon in response to the daemon running abnormally and the number of restarts reaching a threshold.
In some embodiments, the system further comprises a second monitoring module configured to: dynamically acquire the health status of the distributed file system and, in response to the distributed file system being abnormal, terminate the daemons of all computing nodes and unmount the user space file system on all computing nodes.
In some embodiments, the system further comprises a second execution module configured to: re-mount the user space file system in response to the user space file system mount point of the computing node having failed.
In view of the above object, a third aspect of the embodiments of the present application provides a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions being executed by the processor to perform the steps of: S1, dynamically acquiring a list of all computing nodes in the cluster, and distributing a daemon to every computing node according to the list; S2, detecting, through the daemon, whether the management process of the computing node is running normally and, in response to the management process running normally, detecting through the daemon whether the user space file system mount point of the computing node has failed; S3, in response to the user space file system mount point of the computing node being normal, detecting through the daemon whether a distributed file system file can be accessed through the user space file system mount point; and S4, in response to the distributed file system file being inaccessible through the user space file system mount point, unmounting the user space file system mount point and mounting it again.
In some embodiments, the steps further comprise: monitoring the running state of the daemon on all computing nodes, and restarting the daemon in response to the daemon running abnormally; and replacing the daemon with a new daemon in response to the daemon running abnormally and the number of restarts reaching a threshold.
In some embodiments, the steps further comprise: dynamically acquiring the health status of the distributed file system and, in response to the distributed file system being abnormal, terminating the daemons of all computing nodes and unmounting the user space file system on all computing nodes.
In some embodiments, the steps further comprise: re-mounting the user space file system in response to the user space file system mount point of the computing node having failed.
FIG. 3 is a schematic hardware structure diagram of an embodiment of the computer device for handling a user space file system failure provided by the present application.
Taking the device shown in FIG. 3 as an example, the device includes a processor 301 and a memory 302.
The processor 301 and the memory 302 may be connected by a bus or in another manner; a bus connection is taken as the example in FIG. 3.
The memory 302 serves as a non-volatile computer readable storage medium, and may be used to store non-volatile software programs, non-volatile computer executable programs, and modules, such as program instructions/modules corresponding to the method of handling user space file system failures in embodiments of the present application. The processor 301 executes various functional applications of the server and data processing, i.e., implements a method of handling user space file system failures, by running non-volatile software programs, instructions, and modules stored in the memory 302.
Memory 302 may include a program storage area and a data storage area. The program storage area may store an operating system and at least one application program required for a function; the data storage area may store data created through use of the method of handling user space file system failures, and the like. In addition, memory 302 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 302 may optionally include memory located remotely from processor 301, which may be connected to the local module via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more computer instructions 303 corresponding to a method of handling a user space file system failure are stored in the memory 302, which when executed by the processor 301, perform the method of handling a user space file system failure in any of the method embodiments described above.
Any one of the embodiments of the computer apparatus that performs the above-described method of handling a user space file system failure may achieve the same or similar effects as any of the previously-described method embodiments that correspond thereto.
The present application also provides a computer readable storage medium storing a computer program which when executed by a processor performs a method of handling a user space file system failure.
FIG. 4 is a schematic diagram of an embodiment of a computer storage medium for handling a user space file system failure according to the present application. Taking a computer storage medium as shown in fig. 4 as an example, the computer readable storage medium 401 stores a computer program 402 that when executed by a processor performs the above method.
Finally, it should be noted that, as those skilled in the art will appreciate, all or part of the procedures in the above method embodiments may be implemented by a computer program instructing the relevant hardware. The program of the method for handling a user space file system failure may be stored in a computer readable storage medium and, when executed, may include the procedures of the method embodiments above. The storage medium of the program may be a magnetic disk, an optical disk, a read-only memory (ROM), a random-access memory (RAM), or the like. The computer program embodiments described above may achieve the same or similar effects as any of the method embodiments described above.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The serial numbers of the embodiments of the present application disclosed above are for description only and do not represent the advantages or disadvantages of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, and the program may be stored in a computer readable storage medium, where the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will appreciate that: the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the application, including the claims, is limited to such examples; combinations of features of the above embodiments or in different embodiments are also possible within the idea of an embodiment of the application, and many other variations of the different aspects of the embodiments of the application as described above exist, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present application.
Claims (10)
1. A method of handling user space file system failures, comprising the steps of:
dynamically acquiring a list of all computing nodes in the cluster, and distributing a daemon to every computing node according to the list;
detecting, through the daemon, whether the management process of the computing node is running normally and, in response to the management process of the computing node running normally, detecting through the daemon whether the user space file system mount point of the computing node has failed;
in response to the user space file system mount point of the computing node being normal, detecting through the daemon whether a distributed file system file can be accessed through the user space file system mount point; and
in response to the distributed file system file being inaccessible through the user space file system mount point, unmounting the user space file system mount point and mounting it again.
2. The method according to claim 1, wherein the method further comprises:
monitoring the running state of the daemon on all computing nodes, and restarting the daemon in response to the daemon running abnormally; and
replacing the daemon with a new daemon in response to the daemon running abnormally and the number of restarts reaching a threshold.
3. The method according to claim 1, wherein the method further comprises:
dynamically acquiring the health status of the distributed file system and, in response to the distributed file system being abnormal, terminating the daemons of all computing nodes and unmounting the user space file system on all computing nodes.
4. The method according to claim 1, wherein the method further comprises:
re-mounting the user space file system in response to the user space file system mount point of the computing node having failed.
5. A system for handling user space file system failures, comprising:
a distribution module configured to dynamically acquire a list of all computing nodes in a cluster and distribute daemons to all the computing nodes according to the list;
a first detection module configured to detect, by the daemon, whether the management process of the computing node is in a normal condition, and, in response to the management process of the computing node being in a normal condition, detect, by the daemon, whether a user space file system mount point of the computing node is invalid;
a second detection module configured to detect, by the daemon, whether a distributed file system file can be accessed through the user space file system mount point, in response to the user space file system mount point of the computing node being normal; and
an execution module configured to cancel the user space file system mount point and re-mount the user space file system in response to the distributed file system file being inaccessible through the user space file system mount point.
6. The system of claim 5, further comprising a monitoring module configured to:
monitor the running state of the daemons on all computing nodes, and restart a daemon in response to the daemon running abnormally; and
replace the daemon with a new daemon in response to the daemon running abnormally and the number of restarts reaching a threshold.
7. The system of claim 5, further comprising a second monitoring module configured to:
dynamically acquire the health condition of the distributed file system, and, in response to the distributed file system being abnormal, terminate the daemons of all computing nodes and cancel the user space file system mounts of all computing nodes.
8. The system of claim 5, further comprising a second execution module configured to:
re-mount the user space file system in response to the user space file system mount point of the computing node being invalid.
9. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the method of any one of claims 1-4.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1-4.
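The checks recited in claims 1, 2, and 4 can be read as a short decision cascade. The following Python sketch is an illustration only and is not the patented implementation. The names `Action`, `NodeProbes`, `decide`, and `on_daemon_abnormal` are hypothetical, the probes are plain callbacks because the claims do not say how each check is performed, and doing nothing when the management process is abnormal is an assumption, since the claims do not cover that case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(Enum):
    NONE = "none"                                # all checks passed
    SKIP = "skip"                                # management process abnormal (assumed: do nothing)
    REMOUNT = "remount"                          # claim 4: mount point invalid
    UNMOUNT_AND_REMOUNT = "unmount_and_remount"  # claim 1: file not reachable


@dataclass
class NodeProbes:
    """Hypothetical per-node checks; each returns True when the check passes."""
    management_process_ok: Callable[[], bool]
    mount_point_valid: Callable[[], bool]
    file_accessible: Callable[[], bool]


def decide(probes: NodeProbes) -> Action:
    """Run the checks in the order given in claim 1 and pick a repair action."""
    if not probes.management_process_ok():
        return Action.SKIP
    if not probes.mount_point_valid():
        return Action.REMOUNT
    if not probes.file_accessible():
        return Action.UNMOUNT_AND_REMOUNT
    return Action.NONE


def on_daemon_abnormal(restart_count: int, threshold: int) -> str:
    """Claim 2: restart an abnormal daemon, or replace it once restarts reach the threshold."""
    return "replace" if restart_count >= threshold else "restart"
```

Passing the probes as callables means a later check is never run when an earlier one fails, which mirrors the "in response to" ordering of the claims.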
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111339749.5A CN114281636B (en) | 2021-11-12 | 2021-11-12 | Method and device for processing user space file system fault |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111339749.5A CN114281636B (en) | 2021-11-12 | 2021-11-12 | Method and device for processing user space file system fault |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114281636A (en) | 2022-04-05 |
CN114281636B (en) | 2023-08-25 |
Family
ID=80869037
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111339749.5A Active CN114281636B (en) | 2021-11-12 | 2021-11-12 | Method and device for processing user space file system fault |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114281636B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104301442A (en) * | 2014-11-17 | 2015-01-21 | 浪潮电子信息产业股份有限公司 | Method for realizing client of access object storage cluster based on fuse |
CN108920628A (en) * | 2018-06-29 | 2018-11-30 | 郑州云海信息技术有限公司 | A kind of distributed file system access method and device being adapted to big data platform |
CN110365839A (en) * | 2019-07-04 | 2019-10-22 | Oppo广东移动通信有限公司 | Closedown method, device, medium and electronic equipment |
JP2021022357A (en) * | 2019-07-26 | 2021-02-18 | 広東叡江云計算股▲分▼有限公司Guangdong Eflycloud Computing Co., Ltd | Hybrid file construction method and system therefor based on fuse technology |
CN113127437A (en) * | 2019-12-31 | 2021-07-16 | 阿里巴巴集团控股有限公司 | File system management method, cloud system, device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114281636A (en) | 2022-04-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109656742B (en) | Node exception handling method and device and storage medium | |
Machida et al. | Modeling and analysis of software rejuvenation in a server virtualized system with live VM migration | |
CN110677305B (en) | Automatic scaling method and system in cloud computing environment | |
Guo et al. | Failure recovery: When the cure is worse than the disease | |
US7984453B2 (en) | Event notifications relating to system failures in scalable systems | |
US10019822B2 (en) | Integrated infrastructure graphs | |
US9104480B2 (en) | Monitoring and managing memory thresholds for application request threads | |
CN106789141B (en) | Gateway equipment fault processing method and device | |
CN109286529A (en) | A kind of method and system for restoring RabbitMQ network partition | |
JP2020115330A (en) | System and method of monitoring software application process | |
CN109697078B (en) | Repairing method of non-high-availability component, big data cluster and container service platform | |
CN113760652B (en) | Method, system, device and storage medium for full link monitoring based on application | |
CN111209110A (en) | Task scheduling management method, system and storage medium for realizing load balance | |
US9183092B1 (en) | Avoidance of dependency issues in network-based service startup workflows | |
EP3591530B1 (en) | Intelligent backup and recovery of cloud computing environment | |
CN113965576A (en) | Container-based big data acquisition method and device, storage medium and equipment | |
CN114281636B (en) | Method and device for processing user space file system fault | |
Ali et al. | Probabilistic normed load monitoring in large scale distributed systems using mobile agents | |
Qiang et al. | CDMCR: multi‐level fault‐tolerant system for distributed applications in cloud | |
CN108154343B (en) | Emergency processing method and system for enterprise-level information system | |
CN115712521A (en) | Cluster node fault processing method, system and medium | |
US20170244781A1 (en) | Analysis for multi-node computing systems | |
CN116723077A (en) | Distributed IT automatic operation and maintenance system | |
Agarwal et al. | Correlating failures with asynchronous changes for root cause analysis in enterprise environments | |
Stack et al. | Self-healing in a decentralised cloud management system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||