CN111327685A - Data processing method, device and equipment of distributed storage system and storage medium

Data processing method, device and equipment of distributed storage system and storage medium

Info

Publication number: CN111327685A
Authority: CN (China)
Prior art keywords: abnormal, abnormal information, information, exception, data processing
Legal status: Withdrawn
Application number: CN202010071657.2A
Other languages: Chinese (zh)
Inventors: 李娟 (Li Juan), 张海军 (Zhang Haijun), 谢全泉 (Xie Quanquan)
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Application filed 2020-01-21 by Suzhou Inspur Intelligent Technology Co Ltd; priority to CN202010071657.2A
Published 2020-06-23 as CN111327685A; application withdrawn after publication

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04L 67/1001: Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L 67/1004: Server selection for load balancing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; Error correction; Monitoring
    • G06F 11/30: Monitoring
    • G06F 11/3003: Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F 11/3006: Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083: Techniques for rebalancing the load in a distributed system
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The application discloses a data processing method, apparatus, device and storage medium for a distributed storage system, wherein the method comprises the following steps: when abnormal information occurs on a slave node, judging whether the abnormal information reaches the reporting level; if not, placing the abnormal information into the abnormal queue corresponding to the slave node; if so, reporting the abnormal information to the master node; and traversing the abnormal information in the abnormal queue, screening out errors that can be directly recovered, and actively performing exception recovery. When an exception occurs on a slave node, it is reported only when it is judged to reach the reporting level, which improves error reporting accuracy and reduces the load on the master node. In addition, directly recoverable low-level errors in the abnormal queue can be actively screened out and recovered in advance, which improves the fault tolerance of the whole system, further reduces the load and logic complexity of the master node, reduces the disturbance caused to users by a large number of minor errors, and improves the user experience.

Description

Data processing method, device and equipment of distributed storage system and storage medium
Technical Field
The present application relates to the field of distributed storage technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for processing data in a distributed storage system.
Background
A distributed storage system stores data across multiple independent devices. A traditional network storage system uses a centralized storage server to hold all data, so the storage server becomes a bottleneck for system performance and a single point of concern for reliability and security, and cannot meet the needs of large-scale storage applications. A distributed network storage system adopts a scalable architecture that uses multiple storage servers to share the storage load and a location server to locate stored information, which not only improves the reliability, availability and access efficiency of the system, but is also easy to expand.
In a distributed storage system, a storage cluster may include tens or even hundreds of storage nodes. The storage nodes are divided into master nodes and slave nodes, and a user monitors the health status of all nodes in the cluster through the master node. However, each node in the cluster generates a large amount of information to be fed back to the master node, and all abnormal information is sent to the master node indiscriminately. This greatly increases the load on the master node, and a large number of minor errors are reported, which disturbs the user and degrades the user experience.
Disclosure of Invention
The application aims to provide a data processing method, apparatus, device and computer-readable storage medium for a distributed storage system, which can improve error reporting accuracy while reducing the load on the master node.
In order to achieve the above object, the present application provides a data processing method for a distributed storage system, including:
when abnormal information occurs on a slave node, judging whether the abnormal information reaches the reporting level;
if not, placing the abnormal information into the abnormal queue corresponding to the slave node; if so, reporting the abnormal information to the master node;
and traversing the abnormal information in the abnormal queue, screening out errors that can be directly recovered, and actively performing exception recovery.
Optionally, before the judging whether the abnormal information reaches the reporting level, the method further includes:
reading the historical abnormal records in the abnormal queue corresponding to the slave node;
comparing the abnormal information with the historical abnormal records to judge whether the abnormal information has occurred before;
if so, updating the occurrence time period and occurrence count of the abnormal information;
if not, adding an abnormal record for the abnormal information that has not occurred before, wherein the abnormal record at least comprises the occurrence time period and occurrence count of the abnormal information.
Optionally, the judging whether the abnormal information reaches the reporting level includes:
acquiring a pre-established abnormal rule file, wherein the abnormal rule file records the occurrence time period rule and the occurrence count threshold required for each type of abnormal information to reach the reporting level;
judging whether a rule record corresponding to the abnormal information exists in the abnormal rule file;
if such a rule record exists, judging whether the number of occurrences of the abnormal information within the corresponding occurrence time period exceeds the occurrence count threshold; if so, determining that the abnormal information reaches the reporting level; if not, determining that the abnormal information does not reach the reporting level.
Optionally, after the judging whether a rule record corresponding to the abnormal information exists in the abnormal rule file, the method further includes:
if no such rule record exists, recording the abnormal information in a log file and reporting it to the master node.
Optionally, the screening out of errors that can be directly recovered and the active exception recovery include:
acquiring an abnormal recovery file;
judging whether a recoverable operation corresponding to the error exists in the abnormal recovery file;
and if so, executing a command according to the recoverable operation to actively recover the exception.
Optionally, after the executing of the command according to the recoverable operation to actively perform exception recovery, the method further includes:
if the return of the recovery command meets the requirement, deleting the abnormal information from the abnormal queue;
and if the return of the recovery command does not meet the requirement, considering the recovery operation to have failed, and sequentially reading abnormal information from the abnormal queue to perform exception recovery.
Optionally, the traversing of the abnormal information in the abnormal queue, screening out errors that can be directly recovered, and actively performing exception recovery includes:
the master node traverses the abnormal information in the abnormal queues respectively maintained by the slave nodes according to a preset patrol period, screens out errors that can be directly recovered, and actively performs exception recovery.
To achieve the above object, the present application provides a data processing apparatus for a distributed storage system, including:
an exception reporting module, configured to judge, when abnormal information occurs on a slave node, whether the abnormal information reaches the reporting level; if not, place the abnormal information into the abnormal queue corresponding to the slave node; if so, report the abnormal information to the master node;
and an exception handling module, configured to traverse the abnormal information in the abnormal queue, screen out errors that can be directly recovered, and actively perform exception recovery.
To achieve the above object, the present application provides a data processing apparatus for a distributed storage system, including:
a memory for storing a computer program;
a processor for implementing the steps of any of the aforementioned disclosed distributed storage system data processing methods when executing the computer program.
To achieve the above object, the present application provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of any one of the distributed storage system data processing methods disclosed in the foregoing.
According to the above scheme, the data processing method for the distributed storage system provided by the application comprises the following steps: when abnormal information occurs on a slave node, judging whether the abnormal information reaches the reporting level; if not, placing the abnormal information into the abnormal queue corresponding to the slave node; if so, reporting the abnormal information to the master node; and traversing the abnormal information in the abnormal queue, screening out errors that can be directly recovered, and actively performing exception recovery. With this method, a slave node does not forward every alarm event to the master node when an exception occurs. Instead, a judgment process is added on each slave node: abnormal information judged not to reach the reporting level is temporarily placed in the abnormal queue rather than reported to the master node, and abnormal information is reported only when it is judged to reach the reporting level. This improves error reporting accuracy while reducing the load on the master node. In addition, directly recoverable low-level errors in the abnormal queue can be actively screened out and recovered in advance, which improves the fault tolerance of the whole system, further reduces the load and logic complexity of the master node, reduces the disturbance caused to users by a large number of minor errors, and improves the user experience.
The application also discloses a data processing device of a distributed storage system, an electronic device and a computer-readable storage medium, which can achieve the same technical effects.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
FIG. 1 is a flowchart of a data processing method of a distributed storage system according to an embodiment of the present application;
FIG. 2 is a flowchart of another data processing method of a distributed storage system according to an embodiment of the present application;
FIG. 3 is a flowchart of a further data processing method of a distributed storage system according to an embodiment of the present application;
FIG. 4 is a diagram illustrating an exemplary distributed storage system management architecture according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the workflow of an anomaly determination engine according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the workflow of an exception patrol recovery engine according to an embodiment of the present application;
FIGS. 7-a, 7-b and 7-c are specific examples of an exception recovery file, an exception queue and an exception rule file, respectively;
FIG. 8 is a structural diagram of a distributed storage system data processing apparatus according to an embodiment of the present application;
FIG. 9 is a block diagram of an electronic device according to an embodiment of the present application;
FIG. 10 is a block diagram of another electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
In the prior art, each node in the cluster generates a large amount of information to be fed back to the master node, and all abnormal information is sent to the master node indiscriminately, which greatly increases the load on the master node; a large number of minor errors are reported, disturbing the user and degrading the user experience.
Therefore, the embodiment of the application discloses a data processing method of a distributed storage system, which improves error reporting accuracy while reducing the load on the master node.
Referring to fig. 1, a data processing method of a distributed storage system disclosed in an embodiment of the present application includes:
S101: when abnormal information occurs on a slave node, judging whether the abnormal information reaches the reporting level; if so, proceeding to step S102; if not, proceeding to step S103;
in the embodiment of the application, each slave node can monitor itself to detect whether abnormal information occurs. In a specific implementation, an anomaly determination engine may be deployed on each slave node, so that after the slave node has abnormal information, the slave node may determine whether the currently-occurring abnormal information reaches a reporting level by using the pre-deployed anomaly determination engine.
It can be understood that, when determining whether the abnormal information currently appearing on the slave node reaches the reporting level, the determination may be made based on the occurrence count, frequency or risk level corresponding to the abnormal information; that is, the reporting level may be set according to the actual service requirements of the user during implementation, and is not limited here.
S102: reporting the abnormal information to the master node;
S103: placing the abnormal information into the abnormal queue corresponding to the slave node;
Specifically, after judging whether the abnormal information reaches the reporting level, if it is determined that the abnormal information reaches the reporting level, the abnormal information may be reported directly to the master node; specifically, it may be immediately forwarded to the master node in the form of an alarm event for processing.
If it is judged that the abnormal information does not reach the reporting level, it does not need to be reported to the master node at this time and may be temporarily stored in the abnormal queue corresponding to the current slave node. A corresponding abnormal queue is pre-established for each slave node to store all the abnormal information that does not reach the reporting level.
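Before step S104 is expanded below, the following minimal Python sketch illustrates the slave-side flow of steps S101 to S103. It is only an illustration: the class name, the report_to_master stand-in and the severity field are assumptions, not part of the application, and the reporting-level check is left as a pluggable predicate that the second embodiment below fills in with the abnormal rule file.

```python
import time
from collections import deque

def report_to_master(info):
    # Stand-in for forwarding an alarm event to the master node over the
    # cluster's messaging channel (the transport is not specified here).
    print("ALARM to master node:", info["type"])

class AnomalyDeterminationEngine:
    """Minimal sketch of the slave-side decision in steps S101-S103 (names are illustrative)."""

    def __init__(self, reaches_reporting_level):
        self.abnormal_queue = deque()                  # per-slave abnormal queue (S103)
        self.reaches_reporting_level = reaches_reporting_level

    def on_abnormal_info(self, info):
        # S101: triggered whenever abnormal information occurs on the slave node.
        if self.reaches_reporting_level(info):
            report_to_master(info)                     # S102: report immediately
        else:
            self.abnormal_queue.append(dict(info, queued_at=time.time()))  # S103: hold locally

# Usage with a placeholder predicate: anything not marked "minor" is reported.
engine = AnomalyDeterminationEngine(lambda info: info.get("severity") != "minor")
engine.on_abnormal_info({"type": "slow_disk_io", "severity": "minor"})     # queued
engine.on_abnormal_info({"type": "disk_failure", "severity": "critical"})  # reported
```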
S104: traversing the abnormal information in the abnormal queue, screening out errors that can be directly recovered, and actively performing exception recovery.
It should be noted that, in the embodiment of the present application, an exception patrol recovery engine may be deployed on the master node in advance to traverse the abnormal information in the abnormal queue of each slave node, so that some directly recoverable minor errors can be actively recovered and part of the exceptions resolved proactively, without reporting them to the user.
Specifically, the process of traversing the abnormal information in the abnormal queue, screening out errors that can be directly recovered, and actively performing exception recovery may include: the master node traverses the abnormal information in the abnormal queues respectively maintained by the slave nodes according to a preset patrol period, screens out errors that can be directly recovered, and actively performs exception recovery. The preset patrol period may be one day or 12 hours; that is, the patrol period can be customized according to the exception handling rate required in the actual scenario, with a longer period where exceptions need not be handled urgently and a correspondingly shorter period where the system requires timely exception handling.
According to the above scheme, the data processing method for the distributed storage system provided by the application comprises the following steps: when abnormal information occurs on a slave node, judging whether the abnormal information reaches the reporting level; if not, placing the abnormal information into the abnormal queue corresponding to the slave node; if so, reporting the abnormal information to the master node; and traversing the abnormal information in the abnormal queue, screening out errors that can be directly recovered, and actively performing exception recovery. With this method, a slave node does not forward every alarm event to the master node when an exception occurs. Instead, a judgment process is added on each slave node: abnormal information judged not to reach the reporting level is temporarily placed in the abnormal queue rather than reported to the master node, and abnormal information is reported only when it is judged to reach the reporting level. This improves error reporting accuracy while reducing the load on the master node. In addition, directly recoverable low-level errors in the abnormal queue can be actively screened out and recovered in advance, which improves the fault tolerance of the whole system, further reduces the load and logic complexity of the master node, reduces the disturbance caused to users by a large number of minor errors, and improves the user experience.
The embodiment of the application discloses another data processing method of a distributed storage system; compared with the previous embodiment, this embodiment further describes and optimizes the technical solution. Referring to fig. 2, specifically:
S201: when abnormal information occurs on a slave node, reading the historical abnormal records in the abnormal queue corresponding to the slave node;
S202: comparing the abnormal information with the historical abnormal records to judge whether the abnormal information has occurred before; if so, proceeding to step S203; if not, proceeding to step S204;
S203: updating the occurrence time period and occurrence count of the abnormal information;
S204: adding an abnormal record for the abnormal information that has not occurred before, wherein the abnormal record at least comprises the occurrence time period and occurrence count of the abnormal information;
In the embodiment of the application, after abnormal information occurs on a slave node, the historical abnormal records stored in the abnormal queue corresponding to the slave node are read first, and the currently occurring abnormal information is compared with the historical abnormal records. If the current abnormal information has occurred before, its occurrence time period and occurrence count are updated; if it has not occurred before, an abnormal record is added for it. Specifically, the newly added abnormal record may include at least the occurrence time period and occurrence count of the current abnormal information.
S205: acquiring a pre-established abnormal rule file, wherein the abnormal rule file is used for recording an occurrence time period rule and an occurrence frequency threshold value required when each abnormal information reaches a reporting level;
s206: judging whether a rule record corresponding to the abnormal information exists in the abnormal rule file or not; if yes, go to step S207;
as a preferred implementation manner, in the embodiment of the present application, after determining whether a rule record corresponding to the abnormal information exists in the abnormal rule file, the method may further include: and if no rule record corresponding to the abnormal information exists, recording the nonexistent abnormal information into a log file and reporting the nonexistent abnormal information to the main node. The log file can be used for searching for missing errors later, and is convenient to maintain and update.
S207: judging whether the occurrence frequency of the abnormal information in the corresponding occurrence time period exceeds the occurrence frequency threshold value or not; if yes, go to step S208; if not, go to step S209;
s208: determining that the abnormal information reaches a reporting level, and reporting the abnormal information to a main node;
s209: determining that the abnormal information does not reach a reporting level, and placing the abnormal information into an abnormal queue corresponding to the slave node;
it should be noted that, in the embodiment of the present application, an exception rule file may be created in advance, where the file is used to record a rule for determining whether the exception information reaches the reporting level, and specifically, may record an occurrence time period rule and an occurrence frequency threshold value that are required when various exception information reaches the reporting level. Then, whether the abnormal information meets the occurrence time period rule and the occurrence frequency threshold value specified by the abnormal rule file or not can be compared, if yes, the abnormal information is judged to reach the reporting level, and the abnormal information is reported to the main node; if not, judging that the abnormal information does not reach the reporting level, and storing the abnormal information into an abnormal queue corresponding to the slave node.
S210: traversing the abnormal information in the abnormal queue, screening out errors that can be directly recovered, and actively performing exception recovery.
The embodiment of the present application discloses a further data processing method of a distributed storage system; compared with the previous embodiment, this embodiment further describes and optimizes the technical solution. Referring to fig. 3, specifically:
S301: when abnormal information occurs on a slave node, judging whether the abnormal information reaches the reporting level; if so, proceeding to step S302; if not, proceeding to step S303;
S302: reporting the abnormal information to the master node;
S303: placing the abnormal information into the abnormal queue corresponding to the slave node;
S304: traversing the abnormal information in the abnormal queue and acquiring the abnormal recovery file;
S305: judging whether a recoverable operation corresponding to the error exists in the abnormal recovery file; if so, proceeding to step S306;
S306: executing the command according to the recoverable operation to actively perform exception recovery.
In the embodiment of the application, the master node can use an independent process to periodically scan the abnormal queue of each slave node, read the abnormal information in the queue, and match it against the abnormal recovery file to find out whether a recoverable operation corresponding to the error exists in that file; if so, the corresponding recovery command is executed according to the recoverable operation to perform exception recovery.
It can be understood that, after executing the command according to the recoverable operation to actively perform exception recovery, the embodiment of the application may further include: if the return of the recovery command meets the requirement, deleting the abnormal information from the abnormal queue; if the return of the recovery command does not meet the requirement, considering the recovery operation to have failed, and continuing to read abnormal information sequentially from the abnormal queue for exception recovery.
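A single patrol pass as described here could be sketched as follows. The queue entries, the recovery-file layout and the use of shell commands through subprocess are all assumptions for illustration; a real engine would run this logic in an independent process at the preset patrol period, once per slave node's abnormal queue.

```python
import subprocess

def patrol_once(abnormal_queue, recovery_file):
    """Sketch of one patrol pass (S304-S306 plus the success check): match each
    queued exception against the exception recovery file, run the recovery
    command, and delete the entry only when the command's return meets the
    requirement (modelled here as an expected return code)."""
    remaining = []
    for info in abnormal_queue:
        rule = recovery_file.get(info["type"])
        if rule is None:
            remaining.append(info)   # no recoverable operation: skip, read next entry
            continue
        result = subprocess.run(rule["command"], shell=True)
        if result.returncode == rule.get("expect_rc", 0):
            continue                 # recovery succeeded: drop the entry from the queue
        remaining.append(info)       # recovery failed: keep the entry, read the next one
    abnormal_queue[:] = remaining    # rewrite the queue with unrecovered entries

# Usage (illustrative command; the call is commented out to avoid side effects):
queue = [{"type": "stale_tmp_files", "node": "slave-3"}]
recovery_file = {"stale_tmp_files": {"command": "find /var/tmp/cluster -mtime +7 -delete",
                                     "expect_rc": 0}}
# patrol_once(queue, recovery_file)
```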
The following describes the data processing method of the distributed storage system provided by the present application in a specific implementation scenario. Specifically, a distributed storage system management architecture may be constructed accordingly; as shown in fig. 4, the architecture may include an anomaly determination engine and an exception patrol recovery engine. The anomaly determination engine is deployed on the slave nodes: when an exception occurs on a slave node, it first judges whether the exception should be sent directly to the master node; if the exception does not need to be reported, it is temporarily recorded in the abnormal queue, and if it does, it is handed to the master node for processing. The exception patrol recovery engine is deployed on the master node; from the perspective of whole-cluster management, it can read the errors in the slave nodes' abnormal queues and perform exception recovery for minor errors. In this way, some minor exceptions can be resolved proactively without reporting them to the user.
FIG. 5 is a schematic diagram of the workflow of the anomaly determination engine. As shown in fig. 5, when abnormal information occurs on a slave node, the anomaly determination engine is triggered first. The engine reads the historical abnormal records in the abnormal queue and checks whether the newly received exception has occurred before: if so, it updates the occurrence time period and occurrence count of the exception; if not, it adds a new abnormal record. It then reads the abnormal rule file for further processing. If the abnormal rule file contains no rule record for this exception, the exception is recorded in a log file and reported to the master node; the log file can later be used to find errors missed by the engine, which facilitates maintaining and updating the engine. If a rule for this exception type exists, the engine judges according to the rule whether the exception needs to be sent to the master node immediately; if not, it is not sent for the time being.
FIG. 6 is a schematic diagram of the workflow of the exception patrol recovery engine. As shown in fig. 6, the exception patrol recovery engine deployed on the master node runs as an independent process and may periodically scan the abnormal queues on the slave nodes. The scanning process may specifically be: read an abnormal message from the abnormal queue and compare it with the abnormal recovery file stored in the exception patrol recovery engine. If no recoverable operation corresponding to the error is found in the abnormal recovery file, the current abnormal information is ignored and the next message is read and compared. If a recoverable operation for the error is found in the abnormal recovery file, the command is executed according to that recovery method: if the result returned by the recovery command meets the requirement, the abnormal information is deleted from the abnormal queue, indicating that the recovery operation succeeded; if the returned result does not meet the requirement, the recovery operation is considered to have failed, and the next piece of abnormal information is read directly for comparison.
FIG. 7-a is a specific example of the exception recovery file of the exception patrol recovery engine. The exception recovery file records the classification of abnormal information and the corresponding recovery operations, so the engine can look up whether a recovery operation corresponding to the type of the current abnormal information exists in the file. FIGS. 7-b and 7-c are specific examples of the abnormal queue and the abnormal rule file of the anomaly determination engine, respectively. The abnormal queue records all abnormal information received by the anomaly determination engine on a node, which may include but is not limited to the exception type, the time point at which the exception was reported, and additional exception information; every exception received by the anomaly determination engine is recorded in the abnormal queue. The abnormal rule file is a rule table obtained by statistics over a large number of exception types, occurrence time points, occurrence counts and their impact on the system: when an exception satisfies the condition that its occurrence count within a certain time period reaches the threshold, it is reported to the master node; otherwise, if it does not yet satisfy the reporting rule, it is not reported for the time being. By comparing the contents of the abnormal queue and the abnormal rule file, the anomaly determination engine achieves accurate screening of error event reporting and improves the fault tolerance and accuracy of system alarms.
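Since the figures themselves are not reproduced in this text, the following Python literals sketch what the three files of FIGS. 7-a, 7-b and 7-c might contain; every exception type, command and threshold below is invented for illustration.

```python
# Fig. 7-a style: exception recovery file -- exception classification mapped
# to a directly recoverable operation (commands are invented examples).
exception_recovery_file = {
    "stale_tmp_files": {"command": "find /var/tmp/cluster -mtime +7 -delete", "expect_rc": 0},
    "service_hung":    {"command": "systemctl restart storage-agent", "expect_rc": 0},
}

# Fig. 7-b style: abnormal queue -- every exception the anomaly determination
# engine received on this node: type, reporting time point, additional info.
abnormal_queue = [
    {"type": "slow_disk_io", "reported_at": "2020-01-21T10:15:00", "extra": {"disk": "sdb"}},
    {"type": "stale_tmp_files", "reported_at": "2020-01-21T10:20:31", "extra": {"usage": "98%"}},
]

# Fig. 7-c style: abnormal rule file -- per exception type, the occurrence time
# period rule and the occurrence count threshold that gate reporting.
abnormal_rule_file = {
    "slow_disk_io":    {"window_seconds": 3600,  "count_threshold": 5},
    "stale_tmp_files": {"window_seconds": 86400, "count_threshold": 3},
}
```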
In the embodiment of the application, separating the two engines, which can run independently, and deploying them on the master node and the slave nodes respectively reduces the coupling of the whole system. The two engines work on the master node and the slave nodes at the same time: an exception is first handled by the anomaly determination engine and only then sent to the master node for secondary processing, which reduces the working pressure on the master node. The exception patrol recovery engine deployed on the master node can periodically inspect the abnormal information on each slave node and perform recovery operations for low-level exceptions, which reduces the disturbance caused to users by low-level exception reports and improves the user experience. Meanwhile, both the recovery and the judgment of exceptions are driven by rule files (the exception recovery file and the abnormal rule file), which improves the accuracy of exception reporting, improves the fault tolerance of the whole distributed storage system, and ensures the high availability of the distributed cluster.
In the following, a data processing apparatus of a distributed storage system provided by an embodiment of the present application is introduced; the data processing apparatus described below and the data processing method described above may refer to each other.
Referring to fig. 8, a distributed storage system data processing apparatus provided by an embodiment of the present application includes:
an exception reporting module 401, configured to judge, when abnormal information occurs on a slave node, whether the abnormal information reaches the reporting level; if not, place the abnormal information into the abnormal queue corresponding to the slave node; if so, report the abnormal information to the master node;
and an exception handling module 402, configured to traverse the abnormal information in the abnormal queue, screen out errors that can be directly recovered, and actively perform exception recovery.
For the specific implementation process of modules 401 and 402, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
The present application further provides an electronic device, and as shown in fig. 9, an electronic device provided in an embodiment of the present application includes:
a memory 100 for storing a computer program;
a processor 200 which, when executing the computer program, may implement the steps provided by the above embodiments.
Specifically, the memory 100 includes a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores an operating system and computer-readable instructions, and the internal memory provides an environment for running the operating system and the computer-readable instructions from the nonvolatile storage medium. The processor 200, which in some embodiments may be a central processing unit (CPU), controller, microcontroller, microprocessor or other data processing chip, provides computing and control capabilities for the electronic device; when executing the computer program stored in the memory 100, it can implement the steps of the data processing method of the distributed storage system disclosed in any of the foregoing embodiments.
On the basis of the above embodiment, as a preferred implementation, referring to fig. 10, the electronic device further includes:
and an input interface 300 connected to the processor 200, for acquiring computer programs, parameters and instructions imported from the outside, and storing the computer programs, parameters and instructions into the memory 100 under the control of the processor 200. The input interface 300 may be connected to an input device for receiving parameters or instructions manually input by a user. The input device may be a touch layer covered on a display screen, or a button, a track ball or a touch pad arranged on a terminal shell, or a keyboard, a touch pad or a mouse, etc.
And a display unit 400 connected to the processor 200 for displaying data processed by the processor 200 and for displaying a visualized user interface. The display unit 400 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like.
And a network port 500 connected to the processor 200 for performing communication connection with each external terminal device. The communication technology adopted by the communication connection can be a wired communication technology or a wireless communication technology, such as a mobile high definition link (MHL) technology, a Universal Serial Bus (USB), a High Definition Multimedia Interface (HDMI), a wireless fidelity (WiFi), a bluetooth communication technology, a low power consumption bluetooth communication technology, an ieee802.11 s-based communication technology, and the like.
While fig. 10 illustrates only an electronic device having components 100-500, those skilled in the art will appreciate that the structure illustrated in fig. 10 does not limit the electronic device; it may include fewer or more components than those illustrated, combine certain components, or arrange the components differently.
The present application also provides a computer-readable storage medium, which may include: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk. The storage medium has a computer program stored thereon, and the computer program realizes the steps of the data processing method of the distributed storage system disclosed in any one of the foregoing embodiments when executed by the processor.
When an exception occurs on a slave node, the slave node does not forward every alarm event to the master node. Instead, a judgment process is added on each slave node: abnormal information judged not to reach the reporting level is temporarily placed in the abnormal queue rather than reported to the master node, and abnormal information is reported only when it is judged to reach the reporting level, which improves error reporting accuracy while reducing the load on the master node. In addition, directly recoverable low-level errors in the abnormal queue can be actively screened out and recovered in advance, which improves the fault tolerance of the whole system, further reduces the load and logic complexity of the master node, reduces the disturbance caused to users by a large number of minor errors, and improves the user experience.
The embodiments are described in a progressive manner in the specification, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for processing data in a distributed storage system, comprising:
when abnormal information occurs on a slave node, judging whether the abnormal information reaches the reporting level;
if not, placing the abnormal information into the abnormal queue corresponding to the slave node; if so, reporting the abnormal information to the master node;
and traversing the abnormal information in the abnormal queue, screening out errors that can be directly recovered, and actively performing exception recovery.
2. The data processing method according to claim 1, wherein before the judging whether the abnormal information reaches the reporting level, the method further comprises:
reading the historical abnormal records in the abnormal queue corresponding to the slave node;
comparing the abnormal information with the historical abnormal records to judge whether the abnormal information has occurred before;
if so, updating the occurrence time period and occurrence count of the abnormal information;
if not, adding an abnormal record for the abnormal information that has not occurred before, wherein the abnormal record at least comprises the occurrence time period and occurrence count of the abnormal information.
3. The data processing method of claim 2, wherein the judging whether the abnormal information reaches the reporting level comprises:
acquiring a pre-established abnormal rule file, wherein the abnormal rule file records the occurrence time period rule and the occurrence count threshold required for each type of abnormal information to reach the reporting level;
judging whether a rule record corresponding to the abnormal information exists in the abnormal rule file;
if such a rule record exists, judging whether the number of occurrences of the abnormal information within the corresponding occurrence time period exceeds the occurrence count threshold; if so, determining that the abnormal information reaches the reporting level; if not, determining that the abnormal information does not reach the reporting level.
4. The data processing method according to claim 3, further comprising, after the judging whether a rule record corresponding to the abnormal information exists in the abnormal rule file:
if no such rule record exists, recording the abnormal information in a log file and reporting it to the master node.
5. The data processing method according to any one of claims 1 to 4, wherein the screening out of directly recoverable errors in the abnormal information and the active exception recovery comprise:
acquiring an abnormal recovery file;
judging whether a recoverable operation corresponding to the error exists in the abnormal recovery file;
and if so, executing a command according to the recoverable operation to actively recover the exception.
6. The data processing method according to claim 5, further comprising, after the executing of the command according to the recoverable operation to actively perform exception recovery:
if the return of the recovery command meets the requirement, deleting the abnormal information from the abnormal queue;
and if the return of the recovery command does not meet the requirement, considering the recovery operation to have failed, and sequentially reading abnormal information from the abnormal queue to perform exception recovery.
7. The data processing method according to claim 1, wherein the traversing of the abnormal information in the abnormal queue, screening out directly recoverable errors, and actively performing exception recovery comprises:
the master node traverses the abnormal information in the abnormal queues respectively maintained by the slave nodes according to a preset patrol period, screens out errors that can be directly recovered, and actively performs exception recovery.
8. A distributed storage system data processing apparatus, comprising:
an exception reporting module, configured to judge, when abnormal information occurs on a slave node, whether the abnormal information reaches the reporting level; if not, place the abnormal information into the abnormal queue corresponding to the slave node; if so, report the abnormal information to the master node;
and an exception handling module, configured to traverse the abnormal information in the abnormal queue, screen out errors that can be directly recovered, and actively perform exception recovery.
9. A distributed storage system data processing apparatus, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the distributed storage system data processing method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the distributed storage system data processing method according to any one of claims 1 to 7.
CN202010071657.2A (filed 2020-01-21, priority 2020-01-21): Data processing method, device and equipment of distributed storage system and storage medium. Withdrawn. Published as CN111327685A (en).

Priority Applications (1)

Application Number: CN202010071657.2A; Priority Date: 2020-01-21; Filing Date: 2020-01-21; Title: Data processing method, device and equipment of distributed storage system and storage medium

Applications Claiming Priority (1)

Application Number: CN202010071657.2A; Priority Date: 2020-01-21; Filing Date: 2020-01-21; Title: Data processing method, device and equipment of distributed storage system and storage medium

Publications (1)

Publication Number: CN111327685A (en); Publication Date: 2020-06-23

Family ID: 71166168

Family Applications (1)

Application Number: CN202010071657.2A; Title: Data processing method, device and equipment of distributed storage system and storage medium; Priority Date: 2020-01-21; Filing Date: 2020-01-21

Country Status (1)

Country Link
CN (1) CN111327685A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113064765A (en) * 2021-04-26 2021-07-02 杭州海康威视数字技术股份有限公司 Node exception handling method and device, electronic equipment and machine-readable storage medium
CN113064765B (en) * 2021-04-26 2023-09-05 杭州海康威视数字技术股份有限公司 Node exception handling method, device, electronic equipment and machine-readable storage medium
CN113791922A (en) * 2021-07-30 2021-12-14 济南浪潮数据技术有限公司 Exception handling method, system and device for distributed storage system
CN113791922B (en) * 2021-07-30 2024-02-20 济南浪潮数据技术有限公司 Exception handling method, system and device for distributed storage system
CN113923036A (en) * 2021-10-18 2022-01-11 北京八分量信息科技有限公司 Block chain information management method and device of continuous immune safety system
CN114866877A (en) * 2022-07-08 2022-08-05 山西交控生态环境股份有限公司 Sewage treatment remote data transmission method and system
CN114866877B (en) * 2022-07-08 2022-09-06 山西交控生态环境股份有限公司 Sewage treatment remote data transmission method and system


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
WW01: Invention patent application withdrawn after publication (application publication date: 2020-06-23)