CN117389792A - Fault checking method and device, storage medium and electronic equipment - Google Patents

Fault checking method and device, storage medium and electronic equipment

Info

Publication number
CN117389792A
Authority
CN
China
Prior art keywords
service
fault
target
link
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311712644.9A
Other languages
Chinese (zh)
Inventor
李绍华
刘仪阳
侯聪聪
李海燕
张婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202311712644.9A
Publication of CN117389792A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment, in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0775 Content or structure details of the error report, e.g. specific table structure, specific error fields
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0778 Dumping, i.e. gathering error/state information after a fault for later diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/144 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems

Abstract

The specification discloses a fault detection method, a fault detection device, a storage medium and electronic equipment. In the fault detection method provided by the specification, a target tracking identifier input by a user is acquired; a target service link corresponding to the target tracking identifier is determined according to a preset correspondence between tracking identifiers and service links; operation data of each service contained in the target service link is collected; faults generated in each service are determined according to the operation data; and a fault log is generated according to the faults and preset common fields and fed back to the user, wherein the common fields are used for representing fault information.

Description

Fault checking method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a fault detection method, a fault detection device, a storage medium, and an electronic device.
Background
With the spread of distributed systems and the growing complexity of service architectures, business flows involve more and more services. Long service links and complex calling relations between services make it very difficult to troubleshoot and diagnose system faults.
At present, troubleshooting a fault in a service link requires the person doing the troubleshooting to be familiar with the whole link flow and its internal call logic, and to check each service in the link one by one until the problem is located, which consumes time and effort. In many cases several people must even cooperate on the investigation, which further increases the communication cost.
Therefore, how to realize accurate and efficient troubleshooting is a problem to be solved.
Disclosure of Invention
The present disclosure provides a fault detection method, a fault detection device, a storage medium and an electronic device, so as to at least partially solve the above-mentioned problems in the prior art.
The technical scheme adopted in the specification is as follows:
the specification provides a fault detection method, which comprises the following steps:
acquiring a target tracking identifier input by a user;
determining a target service link corresponding to the target tracking identifier according to a preset corresponding relation between the tracking identifier and the service link;
collecting operation data of each service contained in the target service link;
determining faults generated in each service according to the operation data;
generating a fault log according to the fault and a preset common field, and feeding back the fault log to the user, wherein the common field is used for representing fault information.
Optionally, collecting operation data of each service included in the target service link specifically includes:
collecting operation data of each service contained in the target service link according to a first specified period;
determining faults generated in each service, which specifically comprises:
determining faults generated in each service according to a second specified period;
wherein the second specified period is not less than the first specified period.
Optionally, determining, according to the operation data, a fault generated in each service specifically includes:
sequentially detecting, according to the operation data and in the execution order of the services in the target service link, whether a fault occurs in each service contained in the target service link;
and terminating the detection once a fault is determined in any service in the target service link.
Optionally, sequentially detecting, according to the operation data, whether a fault occurs in each service contained in the target service link specifically includes:
for each service contained in the target service link in turn, determining that the service has failed when an abnormality of a first type is detected in the service;
and determining that the service preceding the service in the target service link has failed when an abnormality of a second type is detected in the service.
Optionally, the common field includes at least one of a time of failure, a trace identifier, a service description, an error code, an error description, and a calling direction.
Optionally, generating a fault log according to the fault and a preset common field, and feeding back the fault log to the user, specifically includes:
filling in the field value of each preset common field according to the fault information of the fault;
generating a fault log according to the common fields and their field values;
storing the fault log in a pre-established index so that the user can query the index.
Optionally, storing the fault log in a pre-established index so that the user can query the index specifically includes:
storing the fault log in a pre-established index;
and in response to a query operation of the user, displaying the fault log to the user in the form of a topological graph, wherein each node in the topological graph represents one service in the target service link, the nodes are connected according to the execution order of the services in the target service link, and the node information of each node comprises the fault log of the service corresponding to that node.
The present specification provides a fault detection device, which comprises:
the acquisition module is used for acquiring a target tracking identifier input by a user;
the matching module is used for determining a target service link corresponding to the target tracking identifier according to a preset correspondence between tracking identifiers and service links;
the collecting module is used for collecting the operation data of each service contained in the target service link;
the determining module is used for determining faults generated in each service according to the operation data;
the generating module is used for generating a fault log according to the faults and preset common fields and feeding the fault log back to the user, wherein the common fields are used for representing fault information.
The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described fault detection method.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above described fault detection method when executing the program.
At least one of the above technical schemes adopted in this specification can achieve the following beneficial effects:
in the fault detection method provided by the specification, a target tracking identifier input by a user is acquired; a target service link corresponding to the target tracking identifier is determined according to a preset correspondence between tracking identifiers and service links; operation data of each service contained in the target service link is collected; faults generated in each service are determined according to the operation data; and a fault log is generated according to the faults and preset common fields and fed back to the user, wherein the common fields are used for representing fault information.
With the fault detection method provided by the specification, once the target tracking identifier has been acquired, the operation data of each service in the corresponding target service link can be collected, the faults generated in each service can be determined from that operation data, and a fault log can be generated in combination with the preset common fields and fed back to the user. The method applies generally to fault detection in complex links and distributed systems, displays faults to users intuitively so that the cause of a fault can be judged from a page, and decouples technicians from operation and maintenance personnel.
Drawings
The accompanying drawings are included to provide a further understanding of the specification. The exemplary embodiments of the specification and their description serve to explain the specification and are not intended to limit it unduly. In the drawings:
FIG. 1 is a schematic flow chart of a fault detection method in the present specification;
FIG. 2 is a schematic diagram of a fault log presented in the form of a topological graph in the present specification;
FIG. 3 is a schematic diagram of another fault log presented in the form of a topological graph in the present specification;
FIG. 4 is a schematic diagram of a fault detection device provided in the present specification;
FIG. 5 is a schematic diagram of the system structure in which the fault detection device provided in the present specification is used;
FIG. 6 is a schematic diagram of the electronic device corresponding to FIG. 1 provided in the present specification.
Detailed Description
To make the objects, technical solutions and advantages of the present specification clearer, the technical solutions of the present specification are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present specification. All other embodiments that a person of ordinary skill in the art can obtain from the embodiments herein without creative effort fall within the scope of protection of the present application.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a fault detection method in the present specification, which specifically includes the following steps:
S100: Acquire a target tracking identifier input by a user.
All steps in the troubleshooting method provided in the present specification can be implemented by any electronic device having a computing function, such as a terminal, a server, and the like.
While a business is actually being executed, a fault may be reported and the business cannot continue. The fault must then be located and repaired. In many cases, however, the fault may have occurred in any service of the service link, and the symptom alone often does not reveal which step of the service link has failed. For example, when a user's computer has a network problem, all the user sees is that the computer cannot connect to the network. The actual cause may be one of many: a failure at the network operator, a fiber failure, a network cable failure, a failure of the computer hardware itself, and so on.
Solving a problem that arises while a business is executed therefore often requires examining many steps in the service link. Thus, in the present method, the target tracking identifier corresponding to the target service link is acquired first, for use in the subsequent steps.
S102: Determine the target service link corresponding to the target tracking identifier according to a preset correspondence between tracking identifiers and service links.
After the target tracking identifier is obtained in step S100, in this step, a target service link corresponding to the target tracking identifier may be determined.
A service link contains all the services that must be performed to execute the corresponding business, and typically mirrors the operations the user performs during that business. For example, for an online shopping business, the service link may include services such as entering a store, selecting goods, placing an order and paying.
In the method, the corresponding relation between the service link and the tracking identifier can be preset. In the preset corresponding relation, each different service link corresponds to a unique tracking identifier. Based on this, the corresponding target service link can be determined by the target tracking identifier.
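As an illustration only, the lookup in step S102 can be sketched in Python as follows, assuming the preset correspondence is held in an in-memory dictionary. The names `TRACE_TO_LINK` and `resolve_service_link`, the identifiers and the sample services are hypothetical and are not part of the method as specified.

```python
# Hypothetical preset correspondence: tracking identifier -> services of the
# service link, listed in execution order.
TRACE_TO_LINK = {
    "trace-shop-001": ["enter_store", "select_goods", "place_order", "pay"],
    "trace-refund-001": ["request_refund", "review", "refund"],
}


def resolve_service_link(trace_id: str) -> list[str]:
    """Return the services of the service link bound to trace_id, in execution order."""
    try:
        # Return a copy so callers cannot modify the preset correspondence.
        return list(TRACE_TO_LINK[trace_id])
    except KeyError:
        raise LookupError(f"no service link is registered for {trace_id!r}") from None
```

Because each service link corresponds to a unique tracking identifier, a plain key lookup is sufficient in this sketch; an unknown identifier is reported as an error rather than guessed.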
S104: Collect the operation data of each service contained in the target service link.
After determining the target service link to be subjected to fault detection, in this step, data acquisition may be performed on each service included in the target service link for use in a subsequent step.
The operation data of each service contained in the target service link can be collected in many ways. One specific embodiment is given here for reference: a log collector such as Filebeat may be used. Filebeat is deployed in the system of each service, each service writes log files with a fixed file name prefix to a fixed path in a fixed format, and Filebeat collects the log content of the service as operation data. The operation data collected may differ with the kind of service, which this specification does not particularly limit.
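The convention of a fixed path, a fixed file name prefix and a fixed format can be illustrated with a small Python stand-in for the collector. This is only a sketch and does not use or reproduce any Filebeat interface; the function name `collect_operation_data` and the assumption that each log line is one JSON object are hypothetical.

```python
import glob
import json
import os


def collect_operation_data(log_dir: str, prefix: str) -> list[dict]:
    """Read every file named <prefix>* under log_dir, parsing one JSON object per line."""
    records = []
    pattern = os.path.join(log_dir, prefix + "*")
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    records.append(json.loads(line))
                except json.JSONDecodeError:
                    # Lines that do not follow the assumed format are skipped.
                    continue
    return records
```

Fixing the path, prefix and format in advance is what lets one collector configuration serve every service in the link.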
S106: Determine the faults generated in each service according to the operation data.
Based on the operation data collected in step S104, it can be determined in this step whether and what kind of faults exist in each service.
Further, when determining a fault generated in the target service link, it is in many cases unnecessary to analyze and detect every service in the link. Specifically, the services contained in the target service link can be checked for faults one after another, according to the operation data and in the execution order of the services in the link; once a fault is determined in any service of the target service link, detection is terminated.
In any service link, the services are executed in a definite order. That is, a business must be carried out in a fixed order such as service A → service B → service C, and so on. Once a service in the link fails, execution typically stalls at that service and does not continue. Even if execution does continue, every service after the failed one reports errors caused by the earlier fault and cannot run normally. Therefore, once a failed service has been found in the target service link, continuing to check the later services is probably pointless; detection can stop and that fault can be resolved first.
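The ordered detection with early termination described above can be sketched as follows. The function names and the rule that a record with level "ERROR" indicates a fault are assumptions made for illustration; the specification does not prescribe how a fault is recognized in the operation data.

```python
from typing import Callable, Optional


def find_first_fault(
    ordered_services: list[str],
    operation_data: dict[str, list[dict]],
    has_fault: Callable[[list[dict]], bool],
) -> Optional[str]:
    """Check services in execution order and stop at the first faulty one."""
    for service in ordered_services:
        if has_fault(operation_data.get(service, [])):
            return service  # Services after this one are not inspected.
    return None


def contains_error(records: list[dict]) -> bool:
    """Hypothetical fault rule: any record whose level is ERROR."""
    return any(record.get("level") == "ERROR" for record in records)
```

With operation data in which both service B and service C contain error records, `find_first_fault(["A", "B", "C"], data, contains_error)` returns "B" and never examines the records of service C.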
In addition, some faults generated in a service cannot be detected directly during troubleshooting and have to be inferred from anomalies in the operation data of the service downstream of it. Therefore, during troubleshooting, for each service contained in the target service link in turn: when an abnormality of the first type is detected in the service, the service itself is determined to have failed; when an abnormality of the second type is detected in the service, the service preceding it in the target service link is determined to have failed.
The second type of abnormality may include, but is not limited to, abnormalities such as unresponsiveness or starvation caused by a timeout or excessive volume; the first type of abnormality may include any abnormality other than the second type. In some cases, when a timeout or excessive volume occurs at a service node, it cannot be detected at that node directly but causes an abnormality in the next service in the service link. The fault in the service can then be detected from the operation data of the next service.
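The attribution rule for the two types of abnormality can be sketched as follows. The labels in `SECOND_TYPE`, the function name `attribute_fault`, and the handling of a second-type abnormality at the first service of the link (attributed to that service itself, since it has no predecessor) are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical labels for second-type abnormalities (caused by the upstream service).
SECOND_TYPE = {"timeout", "excess_volume"}


def attribute_fault(
    ordered_services: list[str],
    anomalies: dict[str, Optional[str]],
) -> Optional[tuple[str, str]]:
    """Return (failed service, anomaly label) for the first anomaly found, else None."""
    for position, service in enumerate(ordered_services):
        anomaly = anomalies.get(service)
        if anomaly is None:
            continue
        if anomaly in SECOND_TYPE and position > 0:
            # Second type: the preceding service in the link is the one that failed.
            return ordered_services[position - 1], anomaly
        # First type (or no predecessor exists): the service itself failed.
        return service, anomaly
    return None
```

For a link A, B, C, D in which only service D shows a "timeout" abnormality, the sketch attributes the fault to service C.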
The embodiment described above mainly targets the situation in which an error occurs while a user executes a business and the fault is investigated at the user's request. The method can, however, also be used for long-term monitoring of a fixed target service link. Specifically, the operation data of each service contained in the target service link may be collected according to a first specified period, and the faults generated in each service may be determined according to a second specified period, wherein the second specified period is not less than the first specified period.
In practice, many businesses run automatically over long periods. For such businesses, the user may well not notice a failure as soon as it occurs. The method therefore also lets the user carry out long-term fault investigation on the target service link: operation data of each service is collected every first specified period, and the faults generated in each service are analyzed every second specified period. Both periods can be set as required, provided that the second specified period is not less than the first. For example, the first specified period may be one minute, two minutes or five minutes, and the second specified period may be one hour, eight hours or one day; this specification does not particularly limit them.
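The relation between the two periods can be illustrated with a simulated clock. The sketch below is hypothetical: the function name `run_schedule`, the use of whole seconds, and the choice to collect before analyzing when both fall on the same second are assumptions. It only shows why the analysis period should not be shorter than the collection period: each analysis run then always has at least one newly collected batch to examine.

```python
def run_schedule(collect_period: int, analyze_period: int, total_seconds: int) -> list[tuple[int, int]]:
    """Simulate periodic collection and analysis.

    Returns (time, batches analyzed) for every analysis run within total_seconds.
    """
    if collect_period <= 0:
        raise ValueError("the first (collection) period must be positive")
    if analyze_period < collect_period:
        raise ValueError("the second (analysis) period must not be less than the first")
    runs = []
    pending = 0  # Batches collected since the previous analysis run.
    for now in range(1, total_seconds + 1):
        if now % collect_period == 0:
            pending += 1
        if now % analyze_period == 0:
            runs.append((now, pending))
            pending = 0
    return runs
```

With a collection period of 60 seconds and an analysis period of 300 seconds, every analysis run in the simulation covers five collected batches.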
S108: generating a fault log according to the fault and a preset public field, and feeding back the fault log to the user, wherein the public field is used for representing fault information.
In this step, a fault log is generated from the specific fault determined in step S106 in combination with the preset common fields, and is fed back to the user. A separate fault log may be generated for each service in the target service link. When generating the fault log, the collected operation data may be received through, for example, Kafka, and the data in the operation data that matches the common fields may be extracted and consumed as the fault log. It should be noted that a fault log may optionally also be generated for a service in which no fault occurred, so that the user can query every step in the service link.
The common fields represent the specific information of a fault and may include at least one of fault time, tracking identifier, service description, error code, error description and calling direction. The tracking identifier corresponds to the service link and is unique across the whole link; it is acquired at the start of the business and passed downstream, and it is used in every subsequent service of the link and in the fault logs generated, until the link ends. The service identifier uniquely identifies a service within a deployment environment and handles the multi-replica case, that is, it distinguishes different replicas of the same service. The service description describes the service to which the fault log belongs, and its content differs from service to service. A second, business-level description describes the business corresponding to the target service link; a business covers a larger scope and may span several services. The error code describes the running condition recorded in the fault log: different error codes correspond to different faults, and the fault-free condition also has its own error code. The error description gives the specific description of that running condition and supports fuzzy queries from the page. The calling direction represents the running direction of the business, that is, the adjacency and execution order among the services. Of course, these are only some examples of what the common fields may contain; in practice the user may add or remove common fields in the fault log according to specific requirements, which this specification does not particularly limit.
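A fault log built from preset common fields can be sketched as a fixed record type. The field names below, including `business_description` for the description of the business behind the whole service link, are hypothetical renderings of the fields discussed above, and storing every value as a string is an assumption made to keep the sketch simple.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class FaultLog:
    """Hypothetical set of common fields; unfilled fields default to an empty string."""
    fault_time: str = ""
    trace_id: str = ""
    service_id: str = ""
    service_description: str = ""
    business_description: str = ""
    error_code: str = ""
    error_description: str = ""
    call_direction: str = ""


def build_fault_log(fault_info: dict) -> dict:
    """Fill the common fields from fault_info, ignoring keys that are not common fields."""
    known = {field.name for field in fields(FaultLog)}
    values = {key: str(value) for key, value in fault_info.items() if key in known}
    return asdict(FaultLog(**values))
```

Keeping the field set fixed means that every service produces logs with the same keys, which is what allows logs from different services to be stored and queried together.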
After fault detection is complete and the corresponding fault log has been generated with the common fields, the fault log can be displayed to the user directly. In many cases, however, the user cannot view the fault log right away, so the log can be saved first to await the user's query. Specifically, the field value of each preset common field is filled in according to the fault information of the fault; a fault log is generated from the common fields and their field values; and the fault log is stored in a pre-established index so that the user can query the index. The index may be, for example, an Elasticsearch (ES) index, which this specification does not limit.
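Storing fault logs for later query can be illustrated with an in-memory stand-in for the index. This sketch deliberately does not use any Elasticsearch client; the class name `FaultLogIndex`, the keys `trace_id` and `error_description`, and the use of case-insensitive substring matching as a simplified form of fuzzy query are all assumptions.

```python
from collections import defaultdict


class FaultLogIndex:
    """In-memory stand-in for a pre-established index keyed by tracking identifier."""

    def __init__(self) -> None:
        self._by_trace = defaultdict(list)

    def store(self, fault_log: dict) -> None:
        """Save a copy of the fault log under its tracking identifier."""
        self._by_trace[fault_log["trace_id"]].append(dict(fault_log))

    def query(self, trace_id: str, keyword: str = "") -> list[dict]:
        """Return logs for trace_id whose error description contains keyword."""
        logs = self._by_trace.get(trace_id, [])
        needle = keyword.lower()
        return [log for log in logs if needle in log.get("error_description", "").lower()]
```

A user who returns later can call `query` with the tracking identifier alone to retrieve every stored log for that link, or add a keyword to narrow the result by error description.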
Preferably, the fault log can be displayed to the user as a topological graph, so that the user sees the problem more clearly and intuitively, spends less time extracting key information from the fault log, and has a better experience. Specifically, the fault log may be stored in a pre-established index; in response to a query operation of the user, the fault log is displayed to the user in the form of a topological graph, wherein each node in the topological graph represents one service in the target service link, the nodes are connected according to the execution order of the services in the target service link, and the node information of each node comprises the fault log of the service corresponding to that node.
Fig. 2 is a schematic diagram of a fault log presented in the form of a topological graph provided in the present specification. As shown in fig. 2, nodes A, B and C represent three different services A, B and C in a target service link, and the arrows between the nodes represent the execution direction of the target service link. In fig. 2, service C is abnormal, so the services after it are not displayed. The fault log of service C is attached to node C as its node information. When browsing the node information of node C, the user can further see that, among the methods C1, C2 and C3 under service C, the fault occurred in method C2, and that among the steps C21, C22 and C23 under method C2, the fault occurred at step C22. The detailed information of step C22 can also be displayed so that the user can resolve the fault. The node information may be hidden in the initial interface and displayed when the user clicks a node, or the information of every node may be queried directly through a centralized query operation, which this specification does not specifically limit.
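The data behind such a topological graph can be sketched as a list of nodes and directed edges. The function name `build_topology` and the assumption that `fault_logs` contains entries only for services in which a fault was determined are hypothetical; rendering the graph on a page is outside the scope of the sketch.

```python
def build_topology(ordered_services: list[str], fault_logs: dict[str, dict]) -> dict:
    """Build nodes and directed edges for display, stopping at the first failed service."""
    nodes = []
    for service in ordered_services:
        log = fault_logs.get(service)
        # The fault log, if any, is attached to the node as its node information.
        nodes.append({"service": service, "fault_log": log})
        if log is not None:
            break  # Services after the failed one are not drawn.
    # Consecutive nodes are connected in execution order.
    edges = [(a["service"], b["service"]) for a, b in zip(nodes, nodes[1:])]
    return {"nodes": nodes, "edges": edges}
```

For a link A, B, C, D in which service C has a fault log, the sketch yields the nodes A, B and C connected by the edges A to B and B to C, which matches the situation drawn in fig. 2.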
Fig. 3 is another schematic diagram of a fault log presented in the form of a topological graph provided in the present specification. As shown in fig. 3, nodes A, B, C and D represent different services A, B, C and D, respectively. When service D shows an abnormality caused by a timeout or excessive volume, it can be inferred backward that service C, the service preceding service D, has failed.
With the fault detection method provided by the specification, once the target tracking identifier has been acquired, the operation data of each service in the corresponding target service link can be collected, the faults generated in each service can be determined from that operation data, and a fault log can be generated in combination with the preset common fields and fed back to the user. The method applies generally to fault detection in complex links and distributed systems, displays faults to users intuitively so that the cause of a fault can be judged from a page, and decouples technicians from operation and maintenance personnel.
The above is the fault detection method provided in this specification. Based on the same idea, this specification also provides a corresponding fault detection device, as shown in fig. 4.
Fig. 4 is a schematic diagram of a fault detection device provided in the present specification, specifically including:
the acquisition module 200 is configured to acquire a target tracking identifier input by a user;
the matching module 202 is configured to determine a target service link corresponding to the target tracking identifier according to a preset correspondence between the tracking identifier and the service link;
the collecting module 204 is configured to collect operation data of each service included in the target service link;
a determining module 206, configured to determine, according to the operation data, a fault generated in each service;
and the generating module 208 is configured to generate a fault log according to the fault and a preset common field, where the common field is used to characterize fault information, and feed back the fault log to the user.
Optionally, the collecting module 204 is specifically configured to collect, according to a first specified period, operation data of each service included in the target service link;
the determining module 206 is specifically configured to determine, according to a second specified period, a fault generated in each service;
wherein the second specified period is not less than the first specified period.
Optionally, the determining module 206 is specifically configured to detect, in the execution order of the services in the target service link and according to the operation data, whether a fault occurs in each service included in the target service link; and to terminate detection once a fault is determined in any service in the target service link.
Optionally, the determining module 206 is specifically configured to, for each service included in the target service link in turn, determine that the service itself has failed when an abnormality of the first type is detected in the service; and determine that the service preceding it in the target service link has failed when an abnormality of the second type is detected in the service.
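The sequential detection with early termination described above can be sketched as follows. The constants `FIRST_TYPE` and `SECOND_TYPE`, the `abnormalities` mapping, and the service names are hypothetical. The text does not say what happens when the first service in the link shows a second-type abnormality, so the sketch assumes the fault is attributed to that service itself.

```python
FIRST_TYPE = "first"    # abnormality attributed to the service itself
SECOND_TYPE = "second"  # abnormality attributed to the preceding service


def locate_fault(link, abnormalities):
    """Walk the link in execution order and stop at the first fault found.

    link:          service names in execution order
    abnormalities: service name -> FIRST_TYPE or SECOND_TYPE (absent if healthy)
    Returns the name of the faulty service, or None if no fault is found.
    """
    for position, service in enumerate(link):
        kind = abnormalities.get(service)
        if kind == FIRST_TYPE:
            return service  # detection terminates here
        if kind == SECOND_TYPE:
            # Assumption: with no preceding service, blame the service itself.
            return link[position - 1] if position > 0 else service
    return None
```

Because the function returns as soon as a fault is determined, later services in the link are never inspected, which matches the "terminate detection" behaviour.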
Optionally, the common field includes at least one of a time of failure, a trace identifier, a service description, an error code, an error description, and a calling direction.
Optionally, the generating module 208 is specifically configured to fill in the field value of each preset common field according to the fault information of the fault; generate a fault log according to the common fields and their field values; and store the fault log in a pre-established index so that the user can query the index.
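Filling the common fields and storing the result can be sketched as below. The field names in `COMMON_FIELDS` are hypothetical renderings of the six fields named in the text, and `InMemoryIndex` is only a stand-in for the pre-established index; a real deployment would use an actual search index rather than a Python list.

```python
import json

# Hypothetical names for the six common fields listed in the text.
COMMON_FIELDS = (
    "fault_time",
    "trace_id",
    "service_description",
    "error_code",
    "error_description",
    "call_direction",
)


def build_fault_log(fault_info):
    """Fill every common field from the fault information; unknown values stay None."""
    return {field: fault_info.get(field) for field in COMMON_FIELDS}


class InMemoryIndex:
    """Stand-in for the pre-established index that the user queries."""

    def __init__(self):
        self._documents = []

    def store(self, fault_log):
        # Serialise to JSON, as a document-oriented index would.
        self._documents.append(json.dumps(fault_log))

    def query(self, trace_id):
        documents = (json.loads(raw) for raw in self._documents)
        return [doc for doc in documents if doc["trace_id"] == trace_id]
```

Fixing the field set in advance means every service writes fault logs with the same shape, so a single query by tracking identifier works across the whole link.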
Optionally, the generating module 208 is specifically configured to store the fault log in a pre-established index; and responding to the query operation of the user, and displaying the fault log to the user in the form of a topological graph, wherein each node in the topological graph represents each service in the target service link, each node is connected according to the execution sequence of each service in the target service link, and the node information of each node comprises the fault log of the service corresponding to the node.
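The topological graph described above can be represented with plain data structures. This sketch assumes each fault log is a dictionary carrying a hypothetical `service` key; rendering the graph on a page is out of scope.

```python
def build_topology(link, fault_logs):
    """Build nodes and edges for displaying a service link.

    link:       service names in execution order
    fault_logs: dictionaries, each with a "service" key (hypothetical shape)
    """
    nodes = {
        service: {
            "service": service,
            # Node information carries the fault logs of the matching service.
            "fault_logs": [log for log in fault_logs if log["service"] == service],
        }
        for service in link
    }
    # Connect consecutive services in execution order.
    edges = list(zip(link, link[1:]))
    return {"nodes": nodes, "edges": edges}
```

A link of three services therefore produces three nodes and two directed edges, with fault logs attached only to the nodes whose services reported faults.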
When the fault checking method or device provided in this specification is adopted, the common fields may be preset as an agreed convention, and tools such as Filebeat and Kafka may be deployed in the system of each service. When the fault checking method is applied, an additional server can be set up as a central node that executes the method and communicates with each service; when the fault checking device is applied, the device can be deployed in every service, the devices can communicate with one another, and any device can act as the central node.
Fig. 5 is a schematic structural diagram of the fault checking device provided in this specification when in use. As shown in Fig. 5, both the edge nodes and the central node may be fault checking devices provided in this specification, deployed under different services. A fault checking device performs different functions as an edge node than as a central node, but every device has the same structure and the same achievable functions, and the two roles can be swapped at any time, so any fault checking device can be set as the central node as requirements dictate. An edge node collects the logs generated by its service at run time, that is, the operation data, and sends them to the central node; the central node gathers the operation data of the services of all nodes, including its own, analyzes it, and generates fault logs. The operation data can be collected with the Filebeat tool, edge nodes and the central node can interact through an HTTP interface, the central node can generate fault logs through Kafka, and the generated fault logs can be stored in an Elasticsearch (ES) index.
Because the operation data of every service in a service link must be consumed at the central node, the rate at which data is produced and the rate at which it is consumed may fall out of step. This specification offers several embodiments for reference. For example, the HTTP interface can be provided as a batch interface, so that, where conditions such as CPU, memory, core count, and service demand allow, a number of fault logs can be produced in one call; as another example, a separate thread pool, not shared with other thread pools, can be enabled for the HTTP interface; as another example, the processing pressure on a single service can be spread across different instances through internal k8s scheduling; as yet another example, a big-data component such as Flink can be used, raising parallelism through stream processing to speed up storing the data consumed from Kafka into the ES index.
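Two of the measures above, a batch interface and a dedicated thread pool, can be sketched together. The function `send_batch` is a hypothetical stand-in for the HTTP call to the central node and only reports the batch size; the batch size, worker count, and thread name prefix are illustrative values.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(records, batch_size):
    """Split records into consecutive batches of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def send_batch(batch):
    """Hypothetical stand-in for one call to the batch HTTP interface."""
    return len(batch)


def ship(records, batch_size, workers=2):
    """Send all batches through a pool reserved for this purpose only."""
    with ThreadPoolExecutor(max_workers=workers, thread_name_prefix="log-ship") as pool:
        # map() returns results in the order the batches were submitted.
        return list(pool.map(send_batch, batches(records, batch_size)))
```

Keeping the pool private to log shipping means a burst of operation data cannot starve threads that serve the business requests of the same service.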
Furthermore, when the fault checking device provided in this specification is applied, acquiring and passing on the tracking identifier is very important: the direction in which fault checking proceeds can be determined only when the tracking identifier is known. Several embodiments for acquiring and passing the tracking identifier are provided here for reference. On the one hand, as introduced above, the nodes, that is, the devices, of different services interact through the HTTP interface. Therefore, when one service sends an HTTP request to the next service in the service link, the tracking identifier can be placed in a header of the request, and the next service can obtain it in the doFilter method of an interceptor (filter). On the other hand, once a service has obtained the tracking identifier, it can store the identifier in a thread-local variable, so that different methods of the same service can read it directly from that variable.
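The two propagation mechanisms above can be sketched in Python as an analogue of a servlet-style filter plus a thread variable. The header name `X-Trace-Id`, the function names, and the use of `contextvars.ContextVar` in place of a thread variable are all assumptions made for illustration; requests are modelled as plain header dictionaries rather than real HTTP traffic.

```python
import contextvars

TRACE_HEADER = "X-Trace-Id"  # hypothetical header name

# Plays the role of the thread variable that holds the tracking identifier.
_current_trace_id = contextvars.ContextVar("trace_id", default=None)


def trace_filter(headers, handler):
    """Filter step: read the identifier from the request headers, bind it, run the handler."""
    token = _current_trace_id.set(headers.get(TRACE_HEADER))
    try:
        return handler()
    finally:
        # Restore the previous value so the identifier does not leak across requests.
        _current_trace_id.reset(token)


def outgoing_headers():
    """Headers to attach when calling the next service in the link."""
    trace_id = _current_trace_id.get()
    return {TRACE_HEADER: trace_id} if trace_id else {}


def handle_order():
    """Any method of the service can read the identifier without it being passed in."""
    return outgoing_headers()
```

Here `trace_filter({"X-Trace-Id": "trace-001"}, handle_order)` returns the headers that the service would forward to the next service, carrying the same identifier.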
This specification also provides a computer-readable storage medium storing a computer program operable to perform the fault checking method shown in Fig. 1 above.
This specification also provides a schematic structural diagram of an electronic device, shown in Fig. 6. At the hardware level, as illustrated in Fig. 6, the electronic device includes a processor, an internal bus, a network interface, memory, and non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into memory and then runs it to implement the fault checking method described in Fig. 1. Of course, besides a software implementation, this specification does not exclude other implementations, such as logic devices or a combination of hardware and software; that is, the execution subject of the processing flow is not limited to logic units and may also be hardware or a logic device.
An improvement to a technology could once be clearly distinguished as an improvement in hardware (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or an improvement in software (an improvement to a method flow). However, with the development of technology, many improvements to today's method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs a digital system to "integrate" it onto a single PLD, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by performing a little logic programming of the method flow in one of the hardware description languages described above and programming it into an integrated circuit.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, or the form of logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely as computer-readable program code, it is entirely possible to logically program the method steps so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing various functions may also be regarded as structures within the hardware component. Or the means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided by function into various units. Of course, when implementing this specification, the functions of the units may be implemented in one or more pieces of software and/or hardware.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to across embodiments, and each embodiment focuses on its differences from the others. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, refer to the description of the method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present application.

Claims (10)

1. A fault checking method, comprising:
acquiring a target tracking identifier input by a user;
determining a target service link corresponding to the target tracking identifier according to a preset corresponding relation between the tracking identifier and the service link;
collecting operation data of each service contained in the target service link;
determining faults generated in each service according to the operation data;
generating a fault log according to the fault and a preset common field, and feeding back the fault log to the user, wherein the common field is used for representing fault information.
2. The method of claim 1, wherein collecting operation data of each service included in the target service link specifically comprises:
collecting operation data of each service contained in the target service link according to a first appointed period;
determining faults generated in each service, which specifically comprises:
determining faults generated in each service according to a second designated period;
wherein the second specified period is not less than the first specified period.
3. The method of claim 1, wherein determining faults generated in the services based on the operational data, comprises:
according to the execution sequence of each service in the target service link, sequentially detecting whether faults occur in each service contained in the target service link according to the operation data;
and when determining the fault generated in any service in the target service link, terminating detection.
4. The method of claim 3, wherein sequentially detecting, according to the operation data, whether a fault occurs in each service included in the target service link specifically comprises:
for each service included in the target service link in turn, determining that the service has a fault when detecting that the service has an abnormality of a first type;
and when detecting that the service has an abnormality of a second type, determining that the service preceding the service in the target service link has a fault.
5. The method of claim 1, wherein the common field comprises at least one of a time of failure, a trace identification, a service description, an error code, an error description, and a call direction.
6. The method of claim 1, wherein generating a fault log and feeding back to the user according to the fault and a preset common field, specifically comprises:
filling in the field value of each preset common field according to the fault information of the fault;
generating a fault log according to the common fields and the field values of the common fields;
the fault log is stored in a pre-established index to enable the user to query the index.
7. The method of claim 6, wherein storing the fault log in a pre-established index for the user to query the index, comprises:
storing the fault log in a pre-established index;
and responding to the query operation of the user, and displaying the fault log to the user in the form of a topological graph, wherein each node in the topological graph represents each service in the target service link, each node is connected according to the execution sequence of each service in the target service link, and the node information of each node comprises the fault log of the service corresponding to the node.
8. A fault checking device, comprising:
the acquisition module is used for acquiring a target tracking identifier input by a user;
the matching module is used for determining a target service link corresponding to the target tracking identifier according to a preset corresponding relation between the tracking identifier and the service link;
the collecting module is used for collecting the operation data of each service contained in the target service link;
the determining module is used for determining faults generated in each service according to the operation data;
the generating module is used for generating a fault log according to the faults and preset common fields and feeding the fault log back to the user, and the common fields are used for representing fault information.
9. A computer readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1-7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of the preceding claims 1-7 when executing the program.
CN202311712644.9A 2023-12-13 2023-12-13 Fault checking method and device, storage medium and electronic equipment Pending CN117389792A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311712644.9A CN117389792A (en) 2023-12-13 2023-12-13 Fault checking method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311712644.9A CN117389792A (en) 2023-12-13 2023-12-13 Fault checking method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117389792A true CN117389792A (en) 2024-01-12

Family

ID=89463560

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311712644.9A Pending CN117389792A (en) 2023-12-13 2023-12-13 Fault checking method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117389792A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117608912A (en) * 2024-01-24 2024-02-27 之江实验室 Full-automatic log analysis and fault processing system and method based on NLP large model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108833184A (en) * 2018-06-29 2018-11-16 腾讯科技(深圳)有限公司 Service fault localization method, device, computer equipment and storage medium
CN112491611A (en) * 2020-11-25 2021-03-12 网银在线(北京)科技有限公司 Fault location system, method, apparatus, electronic device and computer readable medium
CN114546825A (en) * 2021-12-30 2022-05-27 中国电信股份有限公司 Fault tracking system, method, electronic device and readable medium
CN114745295A (en) * 2022-04-19 2022-07-12 京东科技控股股份有限公司 Data acquisition method, device, equipment and readable storage medium
CN115357761A (en) * 2022-08-29 2022-11-18 建信金融科技有限责任公司 Link tracking method and device, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
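Wang Weide: "Elasticsearch Data Analysis and Practical Applications", 30 September 2021, China Railway Publishing House Co., Ltd., pages: 7 *
Cai Hongming et al.: "Software Engineering in the Internet Era", 30 November 2021, Shanghai Jiao Tong University Press, pages: 234 *
Zhan Qingdong: "Research on New Models for Building University Town Library Consortia", 30 June 2016, China Ocean Press, pages: 59 - 60 *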

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination