CN115686922A - Anomaly detection method and device for distributed storage system - Google Patents


Publication number
CN115686922A
Authority
CN
China
Prior art keywords
segment
unit
segment unit
units
slave
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211611973.XA
Other languages
Chinese (zh)
Inventor
陈靓
王中原
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Peng Yun Network Technology Co ltd
Original Assignee
Nanjing Peng Yun Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Peng Yun Network Technology Co ltd filed Critical Nanjing Peng Yun Network Technology Co ltd
Priority to CN202211611973.XA priority Critical patent/CN115686922A/en
Publication of CN115686922A publication Critical patent/CN115686922A/en
Pending legal-status Critical Current

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The application discloses an anomaly detection method and device for a distributed storage system, wherein the distributed storage system comprises a plurality of segments and each segment comprises a plurality of segment units. The method comprises the following steps: sending requests to a plurality of segment units in each segment; receiving exception information issued by one or more segment units in response to the request; and, for each segment unit that issued exception information, determining the current state of that segment unit according to the return information of the other segment units. The method can quickly detect whether a segment unit is abnormal and, when a segment unit appears abnormal, further confirms its state by querying the other segment units in the same segment, so that the state of the segment unit is determined (diagnosed) quickly and accurately and misdiagnosis is avoided.

Description

Anomaly detection method and device for distributed storage system
Technical Field
The present disclosure relates to the field of data storage, and in particular, to an anomaly detection method and apparatus for a distributed storage system, a storage medium, and an electronic device.
Background
A large-scale distributed storage system is often composed of a large number of segments. Each segment is composed of segment units from different machines, and each segment unit may reside on a different device of a different machine, for example a section of storage space on a storage medium. The machines are connected through a network to form a distributed cluster. Multiple segment units may be located in the same node.
After a distributed cluster of a certain scale has operated for a period of time, network anomalies between nodes and anomalies of devices inside a single node (motherboard, network card, storage medium, etc.) may occur. These anomalies can prevent one or more segments from operating normally, so correct and rapid anomaly detection and diagnosis for each segment is the basis for keeping the distributed storage system running when an anomaly occurs.
However, under the combined influence of cluster network anomalies, device anomalies inside a single node, and other factors, it is relatively difficult to accurately diagnose an anomaly of a particular segment unit in a particular segment, especially on a distributed processing architecture.
In a conventional distributed storage system, node failures and network connection problems are usually handled by operation and maintenance personnel. When the performance of the whole system degrades or the system becomes unavailable, they observe the various host-level and network-level alarms, then summarize and analyze them to confirm the fault point and the scope of impact. This manual approach consumes a large amount of manpower and cannot achieve real-time, accurate diagnosis.
Disclosure of Invention
The embodiment of the application provides an abnormality detection method and device for a distributed storage system, a storage medium, electronic equipment and a computer program product.
In a first aspect, an embodiment of the present application provides an anomaly detection method for a distributed storage system, which is used for an electronic device, where the distributed storage system includes a plurality of segments, each segment includes a plurality of segment units, and the method includes:
sending a request to the plurality of segment units in each segment;
receiving exception information, the exception information issued by one or more segment units in response to the request;
and, for each segment unit that issued exception information, determining the current state of that segment unit according to the return information of the other segment units.
In a possible implementation of the first aspect, the request is a heartbeat signal, where the heartbeat signal is sent to the segment units at one or more predetermined time intervals, and the exception information is received from the one or more segment units when the number of heartbeat timeouts is greater than or equal to a predetermined threshold.
In a possible implementation of the first aspect, the request is a data operation request, where the data operation request is sent to the segment units and the exception information is received directly from the one or more segment units.
In one possible implementation of the first aspect, one or more of the predetermined time intervals are the same or different.
In one possible implementation of the first aspect described above, the heartbeat signal is sent each time at a different predetermined time interval or at the same predetermined time interval.
In one possible implementation of the first aspect, the plurality of segment units comprises a master segment unit and a plurality of slave segment units,
and when the received exception information comes from the master segment unit, determining the current state of the master segment unit according to the return information of the plurality of slave segment units.
In one possible implementation of the foregoing first aspect, the method further includes:
causing each of said slave segment units to send a detection request to said master segment unit;
counting the number of slave segment units that receive the return information from the master segment unit, wherein the return information is generated by the master segment unit in response to the detection request;
determining the current state of the master segment unit based on the number.
In a possible implementation of the first aspect, when the number is greater than or equal to a predetermined threshold, the current state of the master segment unit is determined to be a normal state; otherwise, the current state is determined to be an abnormal state.
In one possible implementation of the first aspect, the plurality of segment units comprises a master segment unit and a plurality of slave segment units,
and when the received exception information comes from the slave segment unit, determining the current state of the slave segment unit according to the return information of the master segment unit.
In a possible implementation of the first aspect, the method further includes:
causing the master segment unit to send a detection request to the slave segment unit;
determining the current state of the slave segment unit according to whether the master segment unit receives the return information from the slave segment unit, wherein the return information is generated by the slave segment unit in response to the detection request.
In a possible implementation of the first aspect, if the master segment unit receives the return information from the slave segment unit, it is determined that the current state of the slave segment unit is a normal state, and otherwise, it is determined that the current state is an abnormal state.
In a second aspect, an embodiment of the present application provides an anomaly detection apparatus for a distributed storage system, where the distributed storage system includes a plurality of segments, each segment includes a plurality of segment units, and the apparatus includes:
a sending unit configured to send a request to the plurality of segment units;
a receiving unit for receiving exception information issued by one or more segment units in response to the request;
and a determining unit configured to determine, for each segment unit that issued exception information, the current state of that segment unit according to the return information of the other segment units.
In a third aspect, the present application provides a computer program product including computer executable instructions, which are executed by a processor to implement the anomaly detection method for a distributed storage system according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, where instructions are stored, and when executed on a computer, the instructions cause the computer to execute the anomaly detection method for a distributed storage system according to the first aspect.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: one or more processors; one or more memories; wherein the one or more memories store one or more programs that, when executed by the one or more processors, cause the electronic device to perform the anomaly detection method for a distributed storage system of the first aspect.
According to the method and device of the present application, whether a segment unit is abnormal can be detected quickly. When a segment unit appears abnormal, its state is further confirmed by querying the other segment units in the same segment, so that the state of the segment unit is determined (diagnosed) quickly and accurately and misdiagnosis is avoided.
Drawings
FIG. 1 illustrates a block diagram of an electronic device, according to some embodiments of the present application;
FIG. 2 illustrates a schematic diagram of a distributed storage system, according to some embodiments of the present application;
FIG. 3 illustrates a flow diagram of an anomaly detection method for a distributed storage system, according to some embodiments of the present application;
FIG. 4 illustrates another flow chart of an anomaly detection method for a distributed storage system, according to some embodiments of the present application;
FIG. 5 illustrates yet another flow diagram of an anomaly detection method for a distributed storage system, according to some embodiments of the present application;
FIG. 6 illustrates a block diagram of an anomaly detection apparatus of a distributed storage system, according to some embodiments of the present application.
Detailed Description
The illustrative embodiments of the present application include, but are not limited to, a method, apparatus, medium, electronic device, and computer program product for anomaly detection for a distributed storage system.
Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 illustrates a block diagram of an electronic device, according to some embodiments of the present application.
As shown in fig. 1, electronic device 100 may include one or more processors 102, a system motherboard 108 coupled to at least one of processors 102, a system memory 104 coupled to system motherboard 108, a non-volatile memory (NVM) 106 coupled to system motherboard 108, and a network interface 110 coupled to system motherboard 108.
The processor 102 may include one or more single-core or multi-core processors. The processor 102 may include any combination of general-purpose processors (CPUs) and dedicated processors (e.g., graphics processors, application processors, baseband processors, etc.). A graphics processing unit (GPU) is a special-purpose processor that has a far higher core count and stronger parallel computing capability than a general-purpose processor, and it is widely used for computer graphics processing. In embodiments of the invention, the processor 102 may be configured to perform one or more of the various embodiments according to FIG. 3.
In some embodiments, system motherboard 108 may include any suitable interface controller (not shown in FIG. 1) to provide any suitable interface to at least one of processors 102 and/or any suitable device or component in communication with system motherboard 108.
In some embodiments, system motherboard 108 may include one or more memory controllers to provide an interface to system memory 104. System memory 104 may be used to load and store data and/or instructions 120. In some embodiments, system memory 104 of electronic device 100 may include any suitable volatile memory, such as suitable Dynamic Random Access Memory (DRAM).
Non-volatile memory 106 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions 120. In some embodiments, the non-volatile memory 106 may include any suitable non-volatile memory such as flash memory and/or any suitable non-volatile storage device, such as at least one of a HDD (Hard Disk Drive), a CD (Compact Disc) Drive, a DVD (Digital Versatile Disc) Drive.
The non-volatile memory 106 may comprise a portion of the memory resources installed on the device of the electronic device 100, or it may be accessible by, but not necessarily a part of, an external device. For example, the non-volatile memory 106 may be accessed over a network via the network interface 110.
In particular, the system memory 104 and the non-volatile memory 106 may respectively include a temporary copy and a permanent copy of the instructions 120. The instructions 120 may include instructions that, when executed by at least one of the processors 102, cause the electronic device 100 to implement the method shown in FIG. 3. In some embodiments, the instructions 120, or hardware, firmware, and/or software components thereof, may additionally or alternatively be disposed in the system motherboard 108, the network interface 110, and/or the processor 102.
Network interface 110 may include a transceiver to provide a radio interface for electronic device 100 to communicate with any other suitable device (e.g., front end module, antenna, etc.) over one or more networks. In some embodiments, the network interface 110 may be integrated with other components of the electronic device 100. For example, the network interface 110 may be integrated with at least one of the processors 102, the system memory 104, the non-volatile memory 106, and a firmware device (not shown) having instructions that, when executed by at least one of the processors 102, cause the electronic device 100 to implement one or more of the various embodiments illustrated in FIG. 3.
The network interface 110 may further include any suitable hardware and/or firmware to provide a multiple-input multiple-output radio interface. For example, network interface 110 may be a network adapter, a wireless network adapter, a telephone modem, and/or a wireless modem.
In one embodiment, at least one of the processors 102 may be packaged together with one or more controllers for a system motherboard 108 to form a System In Package (SiP). In one embodiment, at least one of the processors 102 may be integrated on the same die with one or more controllers for the system motherboard 108 to form a system on a chip (SoC).
The electronic device 100 may further include input/output (I/O) devices 112 connected to the system motherboard 108. The I/O devices 112 may include a user interface that enables a user to interact with the electronic device 100, and a peripheral component interface designed so that peripheral components can also interact with the electronic device 100. In some embodiments, the electronic device 100 further comprises a sensor for determining at least one of environmental conditions and location information associated with the electronic device 100.
In some embodiments, the I/O devices 112 may include, but are not limited to, a display (e.g., a liquid crystal display, a touch screen display, etc.), a speaker, a microphone, one or more cameras (e.g., still image cameras and/or video cameras), a camera flash (e.g., a light emitting diode flash), a keyboard, and a graphics card. The graphics card consists of a graphics processor integrated with an I/O interface (such as a PCIe interface) conforming to a data transmission protocol specification.
In some embodiments, the peripheral component interfaces may include, but are not limited to, a non-volatile memory port, an audio jack, and a power interface.
In some embodiments, the sensors may include, but are not limited to, a gyroscope sensor, an accelerometer, a proximity sensor, an ambient light sensor, and a positioning unit. The positioning unit may also be part of the network interface 110 or interact with the network interface 110 to communicate with components of a positioning network, such as Global Positioning System (GPS) satellites.
It is to be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Program code may be applied to input instructions to perform the functions described in this disclosure and to generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a system for processing instructions that includes processor 102 includes any system having a processor such as a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this disclosure are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
One or more aspects of at least one embodiment may be implemented by instructions stored on a computer-readable storage medium, which when read and executed by a processor, enable an electronic device to implement the methods of the embodiments described herein.
For ease of understanding, the following description will first describe related terms and concepts to which embodiments of the present application may relate.
Volume: a storage area in a distributed storage system.
Segment: the basic logical unit that makes up a volume. A volume has n segments (n ≥ 1), where the number of segments in a volume = volume size / segment size.
Segment unit: the basic (physical) unit that makes up a segment. Segment units are divided into different roles, including master segment units and slave segment units. A segment unit corresponds to a contiguous section of space on a physical disk and can store user data.
Storage medium: a storage medium, such as a physical disk, for storing data.
A storage node: a storage node may contain a plurality of storage media, and one storage medium may belong to only one storage node.
It is understood that both storage nodes and physical disks belong to the physical concept, and a storage node is used for managing a plurality of physical disks (i.e., data disks), and at the same time, the same physical disk is managed by only one storage node. The volume is a logical concept, and different storage block spaces can be divided on a certain physical disk to form different volumes, that is, segment units of the same volume can be distributed on different physical disks.
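The relationship between volumes, segments, and segment units defined above can be sketched as a small data model. This is only an illustrative sketch: the class names, field names, and the requirement that the volume size be an exact multiple of the segment size are assumptions, not part of the application.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Role(Enum):
    MASTER = "master"
    SLAVE = "slave"


@dataclass(frozen=True)
class SegmentUnit:
    unit_id: str
    role: Role
    node_id: str  # storage node that manages the physical disk (hypothetical field)
    disk_id: str  # physical disk holding the contiguous space (hypothetical field)


@dataclass
class Segment:
    index: int
    units: List[SegmentUnit] = field(default_factory=list)


def segment_count(volume_size: int, segment_size: int) -> int:
    """Number of segments in a volume = volume size / segment size.

    Assumption: the volume size is a positive, exact multiple of the segment size.
    """
    if segment_size <= 0 or volume_size <= 0 or volume_size % segment_size != 0:
        raise ValueError("volume size must be a positive multiple of segment size")
    return volume_size // segment_size
```

Segment units of one segment carry different `node_id` values when they are placed on different storage nodes, which is the situation the detection method has to cope with.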
The anomaly detection method for the distributed storage system provided by the present application can be applied to the electronic device 100 shown in fig. 1, where the electronic device 100 is, for example, a data driver, a computer, a mobile device, a platform device, and the like.
Fig. 2 is a schematic diagram of a distributed storage system 20 according to an embodiment of the present invention. As shown in fig. 2, the distributed storage system 20 includes a data driver 21 and a segment 22. The segment 22 comprises, for example, one master segment unit P1 and two slave segment units S1, S2.
It will be appreciated that the distributed storage system 20 may include a plurality of data drivers and a plurality of volumes (not shown), each volume including a plurality of segments, each segment including a plurality of segment units. The segment 22 in fig. 2 is located in one of the volumes.
FIG. 3 is a flow chart of an anomaly detection method for a distributed storage system in accordance with the present invention. In the present embodiment, the data driver 21 is taken as an example for explanation, and it is understood that the following method is executed by the data driver 21.
Referring to fig. 3, in step S31, a request is sent to a plurality of segment units in each segment. It will be appreciated that the multiple segment units in each segment may be located in different storage nodes, and that sending the request to the multiple segment units in each segment may be accomplished by sending the request to these storage nodes.
In step S32, exception information is received, the exception information being issued by one or more segment units in response to the request.
It will be appreciated that the segment units from which exception information originates may belong to different segments, located in different storage nodes.
In step S33, for each segment unit that has sent out the exception information, the current state of each segment unit is determined according to the return information of other segment units.
Illustratively, for a segment unit which sends out exception information, the current state of the segment unit is determined according to the return information of other segment units belonging to the same segment as the segment unit.
Illustratively, the request is, for example, a heartbeat signal. The heartbeat signal is sent to the plurality of segment units in each segment at one or more predetermined time intervals, and the exception information is received from the one or more segment units when the number of heartbeat timeouts is greater than or equal to a predetermined threshold.
The one or more predetermined time intervals may be different. For example, there are three predetermined time intervals T1, T2, T3, with T1 = 100 ms, T2 = 200 ms, T3 = 400 ms. Alternatively, the one or more predetermined time intervals may be the same, e.g., three predetermined time intervals T1 = T2 = T3 = 100 ms.
It is understood that the number of predetermined time intervals and the specific time intervals exemplified above are only for example and are not limited in any way.
The predetermined threshold may be adjusted according to a network environment, and may be any value without limitation.
Illustratively, the predetermined threshold is, for example, 3. With the first predetermined time interval T1 = 100 ms, heartbeat signal 1 is sent to the plurality of segment units at the first predetermined time interval T1. If heartbeat signal 1 does not return normally and times out, the interval before the next heartbeat signal is set to, for example, the second predetermined time interval T2 = 200 ms. Thus, the next heartbeat signal 2 is sent after 200 ms.
If heartbeat signal 2 returns normally, the interval for sending subsequent heartbeat signals is restored to T1 = 100 ms. If heartbeat signal 2 does not return normally and times out, the interval before the next heartbeat signal is set to, for example, the third predetermined time interval T3 = 400 ms. Thus, the next heartbeat signal 3 is sent after 400 ms.
If heartbeat signal 3 returns normally, the interval for sending subsequent heartbeat signals is restored to T1 = 100 ms. If heartbeat signal 3 also times out, the number of heartbeat timeouts is three, which is greater than or equal to the predetermined threshold 3, and it may be determined that the network environment is abnormal.
It can be understood that the network environment is determined to be abnormal after a total detection time of T1 + T2 + T3 = 700 ms. This millisecond-level network detection mechanism ensures that anomalies in the network environment are sensed quickly so that subsequent processing logic can be carried out.
It is understood that the anomaly in the network environment may be a storage medium failure, a storage node failure, a network failure, and the like.
In the above example, the heartbeat signal is transmitted at different predetermined time intervals each time. It will be appreciated that the heartbeat signals can be sent at the same predetermined time interval each time.
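The heartbeat example above can be sketched as a short loop. This is a minimal sketch under stated assumptions: the function name, the `send_heartbeat` and `wait_ms` callbacks, and the `max_beats` bound are hypothetical, and resetting the timeout count when a heartbeat returns normally is an assumption (the text only states that the interval is restored to T1).

```python
import time
from typing import Callable, Sequence


def detect_by_heartbeat(
    send_heartbeat: Callable[[], bool],
    intervals_ms: Sequence[int] = (100, 200, 400),  # T1, T2, T3 from the example
    timeout_threshold: int = 3,
    max_beats: int = 10,  # hypothetical bound so the sketch terminates
    wait_ms: Callable[[int], None] = lambda ms: time.sleep(ms / 1000.0),
) -> bool:
    """Return True when exception information should be reported.

    send_heartbeat() returns True if the heartbeat came back normally
    and False if it timed out.
    """
    timeouts = 0
    for _ in range(max_beats):
        # The interval grows with the number of timeouts seen so far.
        interval = intervals_ms[min(timeouts, len(intervals_ms) - 1)]
        wait_ms(interval)
        if send_heartbeat():
            timeouts = 0  # assumption: a normal return also resets the count
        else:
            timeouts += 1
            if timeouts >= timeout_threshold:
                return True
    return False
```

Passing `intervals_ms=(100, 100, 100)` gives the variant in which every heartbeat is sent at the same predetermined interval.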
Illustratively, the request is, for example, a data operation request, wherein the data operation request is sent to a plurality of segment units, and the exception information is received directly from one or more segment units.
It is understood that the data operation request is, for example, a request for reading or writing data in each segment unit, and may also be a request for performing other operations on data.
For example, when a storage node fails, all segment units located in the storage node directly return exception information in response to the data operation request. It will be appreciated that in this case, anomaly information can be returned more quickly than the detection mechanism described above using the heartbeat signal.
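The data operation path can be illustrated with a toy model in which a failed storage node answers every request for its segment units with exception information. The `StorageNode` class, the `"OK"`/`"EXCEPTION"` strings, and the function name are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StorageNode:
    node_id: str
    unit_ids: List[str]
    failed: bool = False

    def handle(self, unit_id: str, operation: str) -> str:
        """Answer a data operation (e.g. "read" or "write") for one segment unit."""
        if unit_id not in self.unit_ids:
            raise KeyError(unit_id)
        # A failed node returns exception information directly, without a timeout.
        return "EXCEPTION" if self.failed else "OK"


def units_reporting_exception(nodes: List[StorageNode], operation: str = "read") -> List[str]:
    """Send the operation to every segment unit and collect those that report an exception."""
    reported = []
    for node in nodes:
        for unit_id in node.unit_ids:
            if node.handle(unit_id, operation) == "EXCEPTION":
                reported.append(unit_id)
    return reported
```

In this model no waiting is involved, which reflects why this path can report an anomaly sooner than counting heartbeat timeouts.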
As described above, it is possible to quickly detect that an anomaly has occurred in the network environment and to receive exception information from some of the segment units. However, if a segment unit issues exception information because of, for example, brief network jitter, and the segment unit is immediately determined to be in an abnormal state, this may cause a misjudgment and thus unnecessary migration of a large amount of data.
The process of further state determination for these segment units that issue exception information is described in detail below.
Each segment includes a plurality of segment units including a master segment unit and a plurality of slave segment units. That is, each segment includes one master segment unit and a plurality of slave segment units. Illustratively, as shown in fig. 2, the segment 22 includes one master segment unit P1 and two slave segment units S1, S2.
When the received exception information comes from the master segment unit P1, the current state of the master segment unit P1 is determined according to the return information of the two slave segment units S1 and S2. It can be understood that the current state of the master segment unit P1 is determined based on the return information of the two slave segment units S1, S2 located in the same segment as the master segment unit P1.
Referring to FIG. 4, in step S3311, each slave segment unit S1, S2 is caused to send a detection request to the master segment unit P1. It will be appreciated that each slave segment unit S1, S2 sends a detection request to the master segment unit P1 by sending a detection instruction to each slave segment unit S1, S2.
In step S3312, the number of slave segment units that have received return information from the master segment unit P1, the return information being generated by the master segment unit P1 in response to the detection request, is counted.
It will be appreciated that when the master segment unit P1 is normal, return information is generated in response to the detection request and sent back to the slave segment units S1, S2. In contrast, when the master segment unit P1 is abnormal, no return information is sent.
If both slave segment units S1, S2 receive return information, the counted number is 2.
In step S3313, the current state of the master segment unit P1 is determined based on the number. Illustratively, when the number is greater than or equal to a predetermined threshold, the current state of the master segment unit P1 is determined to be a normal state; otherwise, the current state is determined to be an abnormal state.
The predetermined threshold is determined, for example, by the majority principle, i.e., predetermined threshold = [N/2] + 1, where N denotes the number of segment units in the segment and [N/2] denotes N/2 rounded down to an integer. For example, if the number N of segment units in the segment is 3, then the predetermined threshold = [3/2] + 1 = 2.
In this embodiment, if the counted number is 2, which is greater than or equal to the predetermined threshold 2, it may be determined that the current state of the master segment unit P1 is a normal state.
It is understood that if neither of the slave segment units S1, S2 receives return information, the counted number is 0, which is smaller than the predetermined threshold 2, and the current state of the master segment unit P1 is determined to be an abnormal state.
It will be appreciated that if an exception also occurs in either of the slave segment units S1, S2 itself, it will not receive any return information either. For example, if the slave segment unit S1 itself has an abnormality and does not receive any return information, and the slave segment unit S2 is normal and receives return information, the counted number is 1 and is smaller than the predetermined threshold 2, and it is determined that the current state of the master segment unit P1 is an abnormal state.
It can be understood that when the master segment unit P1 sends the abnormal information, the judgment is further performed according to the return information received by the slave segment units S1 and S2 from the master segment unit P1, so that whether the state of the master segment unit P1 is normal or abnormal can be accurately determined, and the erroneous judgment is avoided.
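The master-side determination (steps S3311 to S3313) can be sketched as follows. This is a sketch rather than the claimed implementation: the function names and the `probe_master` callback, which stands for "slave s sends a detection request to the master and reports whether return information arrived", are assumptions.

```python
from typing import Callable, Iterable


def majority_threshold(n_units: int) -> int:
    """Predetermined threshold = [N/2] + 1, with [N/2] rounded down."""
    return n_units // 2 + 1


def diagnose_master(
    slave_ids: Iterable[str],
    probe_master: Callable[[str], bool],
    n_units: int,
) -> str:
    """Count the slaves that received return information from the master.

    n_units is the total number of segment units in the segment,
    master included, as in the threshold example above.
    """
    received = sum(1 for slave_id in slave_ids if probe_master(slave_id))
    return "normal" if received >= majority_threshold(n_units) else "abnormal"
```

With N = 3 the threshold is 2, so both slave segment units must receive return information for the master segment unit to be judged normal; a slave that is itself abnormal lowers the count, as in the S1 example above.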
On the other hand, when the received exception information comes from the slave segment unit, the current state of the slave segment unit is determined according to the return information of the master segment unit.
For example, the slave segment unit S1 issues abnormality information, and the current state of the slave segment unit S1 is determined based on the return information of the master segment unit P1.
Referring to fig. 5, in step S3321, the master segment unit P1 is caused to transmit a detection request to the slave segment unit S1. It will be appreciated that the master segment unit P1 is caused to send a detection request to the slave segment unit S1 by sending a detection instruction to the master segment unit P1.
In step S3322, the current status of the slave segment unit S1 is determined according to whether the master segment unit P1 receives return information from the slave segment unit S1, wherein the return information is generated by the slave segment unit S1 in response to the detection request.
It will be appreciated that when the slave segment unit S1 is normal, return information is generated in response to the detection request and sent back to the master segment unit P1. In contrast, when an abnormality occurs in the slave segment unit S1, no return information is sent.
If the master segment unit P1 receives the return information from the slave segment unit S1, the current state of the slave segment unit S1 is determined to be a normal state, otherwise, the current state is determined to be an abnormal state.
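Steps S3321 and S3322 can be sketched as follows. This is only an illustration: `probe` is a hypothetical callable standing in for the master segment unit sending the detection request and waiting for return information, and treating a `TimeoutError` as "no return information" is an assumption, since the embodiment does not specify how a missing reply is detected.

```python
from typing import Callable, Optional


def slave_state(probe: Callable[[], Optional[bytes]]) -> str:
    """Judge a slave segment unit from the master's detection request.

    probe() is assumed to send the detection request from the master segment
    unit to the slave segment unit and to return the return information, or
    None (or raise TimeoutError) when nothing comes back.
    """
    try:
        reply = probe()
    except TimeoutError:
        # Assumption: a timeout is treated as no return information.
        reply = None
    # Return information received means normal, otherwise abnormal.
    return "normal" if reply is not None else "abnormal"
```

Unlike the master case, no counting is involved here: the single observation made by the master segment unit decides the state of the slave segment unit.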
According to the method and the device, whether the segment unit is abnormal or not can be detected quickly, and under the condition that the segment unit is abnormal, the state of the segment unit is further confirmed by requesting other segment units in the segment, so that whether the state of the segment unit is abnormal or not is determined (diagnosed) quickly and accurately, and misdiagnosis is avoided.
The present invention also provides an anomaly detection apparatus 60 for a distributed storage system comprising a plurality of segments, each segment comprising a plurality of segment units. As shown in fig. 6, the apparatus 60 includes: a sending unit 61 configured to send a request to the segment units; a receiving unit 62 for receiving exception information issued by one or more segment units in response to the request; and the determining unit 63 is configured to determine, for each segment unit that sends out the exception information, a current state of each segment unit according to the return information of other segment units.
It is understood that the sending unit 61, the receiving unit 62 and the determining unit 63 can be implemented by the processor 102 of the electronic device 100, which provides the functions of these modules or units. The embodiments disclosed above are method embodiments corresponding to the present embodiment, and the present embodiment can be implemented in cooperation with the above embodiments. The related technical details mentioned in the above embodiments remain valid in this embodiment and are not repeated here. Accordingly, the related technical details mentioned in the present embodiment can also be applied to the above-described embodiments.
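As a rough illustration of how the three units of the apparatus 60 could cooperate in software, the sketch below groups them into one class. The class name `AnomalyDetector`, the callables `request` and `confirm`, and the string unit names are hypothetical; the embodiment does not prescribe any particular interface.

```python
from typing import Callable, Dict, List


class AnomalyDetector:
    """Sketch of the sending, receiving and determining units."""

    def __init__(
        self,
        units: List[str],
        request: Callable[[str], bool],
        confirm: Callable[[str, List[str]], str],
    ) -> None:
        self.units = units
        # request(unit) is assumed to send a request to the segment unit and
        # return True if the unit issued exception information in response.
        self._request = request
        # confirm(unit, peers) is assumed to determine the unit's state from
        # the return information of the other segment units.
        self._confirm = confirm

    def send_and_receive(self) -> List[str]:
        """Sending and receiving units: collect units reporting exceptions."""
        return [u for u in self.units if self._request(u)]

    def determine(self, suspects: List[str]) -> Dict[str, str]:
        """Determining unit: confirm each suspect via its peer units."""
        return {
            u: self._confirm(u, [p for p in self.units if p != u])
            for u in suspects
        }

    def run_once(self) -> Dict[str, str]:
        """One detection round over all segment units of a segment."""
        return self.determine(self.send_and_receive())
```

Only segment units that issued exception information are passed to the determining step, so healthy units incur no extra confirmation traffic in this sketch.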
The present invention also provides a computer program product comprising computer executable instructions for execution by the processor 102 to implement the anomaly detection method for a distributed storage system of the present invention.
The present invention also provides a computer-readable storage medium having stored thereon instructions that, when executed on a computer, cause the computer to perform the inventive anomaly detection method for a distributed storage system.
It is noted that, in the examples and description of this patent, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.
It should be noted that the order of the above embodiments of the present invention is for description only and does not represent the relative merits of the embodiments. Specific embodiments have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
It should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Moreover, those skilled in the art will appreciate that although some embodiments described herein include some features that are included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.

Claims (15)

1. An anomaly detection method for a distributed storage system for an electronic device, the distributed storage system comprising a plurality of segments, each segment comprising a plurality of segment units, the method comprising:
sending a request to the plurality of segment units in each segment;
receiving exception information, the exception information issued by one or more segment units in response to the request;
and for each segment unit that issued the exception information, determining the current state of that segment unit according to the return information of other segment units.
2. The abnormality detection method according to claim 1, characterized in that said request is a heartbeat signal, wherein said heartbeat signal is transmitted to said plurality of segment units at one or more predetermined time intervals, and said abnormality information is received from said one or more segment units when the number of timeouts of said heartbeat signal is equal to or greater than a predetermined threshold value.
3. The anomaly detection method of claim 1, wherein said request is a data operation request, wherein said data operation request is sent to said plurality of segment units and said anomaly information is received directly from said one or more segment units.
4. The abnormality detection method according to claim 2, characterized in that one or more of the predetermined time intervals are the same or different.
5. The abnormality detection method according to claim 4, characterized in that the heartbeat signal is transmitted at different predetermined time intervals or at the same predetermined time interval each time.
6. The abnormality detection method according to any one of claims 1-5, characterized in that a plurality of segment units includes one master segment unit and a plurality of slave segment units,
wherein, when the received exception information comes from the master segment unit, the current state of the master segment unit is determined according to the return information of the plurality of slave segment units.
7. The abnormality detection method according to claim 6, characterized by further comprising:
causing each of said slave segment units to send a detection request to said master segment unit;
counting the number of the slave segment units that receive the return information from the master segment unit, wherein the return information is generated by the master segment unit in response to the detection request;
determining a current state of the master segment unit based on the number.
8. The abnormality detection method according to claim 7, characterized in that when the number is equal to or greater than a predetermined threshold value, the current state of the master segment unit is determined to be a normal state, and otherwise is determined to be an abnormal state.
9. The abnormality detection method according to any one of claims 1-5, characterized in that a plurality of segment units includes one master segment unit and a plurality of slave segment units,
wherein, when the received exception information comes from a slave segment unit, the current state of that slave segment unit is determined according to the return information of the master segment unit.
10. The abnormality detection method according to claim 9, characterized by further comprising:
causing the master segment unit to send a detection request to the slave segment unit;
determining the current state of the slave segment unit according to whether the master segment unit receives the return information from the slave segment unit, wherein the return information is generated by the slave segment unit in response to the detection request.
11. The abnormality detection method according to claim 10, characterized in that if said master segment unit receives said return information from said slave segment unit, it is determined that said current state of said slave segment unit is a normal state, and otherwise it is determined as an abnormal state.
12. An anomaly detection apparatus for a distributed storage system, the distributed storage system comprising a plurality of segments, each segment comprising a plurality of segment units, the apparatus comprising:
a sending unit configured to send a request to the plurality of segment units;
a receiving unit for receiving exception information issued by one or more segment units in response to the request;
and a determining unit configured to determine, for each segment unit that issued the exception information, the current state of that segment unit according to the return information of other segment units.
13. A computer program product comprising computer executable instructions, wherein the instructions are executed by a processor to implement the anomaly detection method of any one of claims 1-10.
14. A computer-readable storage medium having stored thereon instructions that, when executed on a computer, cause the computer to perform the anomaly detection method of any one of claims 1-11.
15. An electronic device, comprising:
one or more processors;
one or more memories;
wherein the one or more memories store one or more programs that, when executed by the one or more processors, cause the electronic device to perform the anomaly detection method of any of claims 1-11.
CN202211611973.XA 2022-12-15 2022-12-15 Anomaly detection method and device for distributed storage system Pending CN115686922A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211611973.XA CN115686922A (en) 2022-12-15 2022-12-15 Anomaly detection method and device for distributed storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211611973.XA CN115686922A (en) 2022-12-15 2022-12-15 Anomaly detection method and device for distributed storage system

Publications (1)

Publication Number Publication Date
CN115686922A true CN115686922A (en) 2023-02-03

Family

ID=85055474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211611973.XA Pending CN115686922A (en) 2022-12-15 2022-12-15 Anomaly detection method and device for distributed storage system

Country Status (1)

Country Link
CN (1) CN115686922A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116111731A (en) * 2023-04-13 2023-05-12 东莞先知大数据有限公司 Distributed energy storage equipment abnormality determination method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN111274059B (en) Software exception handling method and device of slave device
WO2023185767A1 (en) Slow disk drive detection method and apparatus, and electronic device and storage medium
CN115686922A (en) Anomaly detection method and device for distributed storage system
US10157005B2 (en) Utilization of non-volatile random access memory for information storage in response to error conditions
WO2019000206A1 (en) Methods and apparatus to perform error detection and/or correction in a memory device
US9720756B2 (en) Computing system with debug assert mechanism and method of operation thereof
CN118426689A (en) Data processing method, system and equipment for solid state disk
CN106030544B (en) Method for detecting memory of computer equipment and computer equipment
WO2024016864A1 (en) Processor, information acquisition method, single board and network device
US20230367515A1 (en) Storing and recovering critical data in a memory device
CN110347639B (en) System on chip and method of operation thereof
US10962593B2 (en) System on chip and operating method thereof
US20130145137A1 (en) Methods and Apparatus for Saving Conditions Prior to a Reset for Post Reset Evaluation
US7979644B2 (en) System controller and cache control method
CN116450473A (en) Method for positioning memory stepping problem and electronic equipment
CN115599316B (en) Distributed data processing method, apparatus, device, medium, and computer program product
CN114461479A (en) Method and device for debugging multimedia processing chip, storage medium and electronic equipment
CN111596199A (en) Test chip, integrated circuit test method and system and detection equipment
CN115794168A (en) Updating method and device for distributed storage system
US10877921B2 (en) Methods and apparatus to extend USB-C software support to non-USB-C devices
CN113535494B (en) Equipment debugging method and electronic equipment
EP4414853A1 (en) System and method for fault page handling
TWI795950B (en) Hard disk monitoring method, electronic device, and storage medium
US20240143112A1 (en) Touch sensing apparatus, electronic device and touch operation recording method thereof
CN115729476A (en) Data writing method and device for distributed storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination