CN113094200B - Application program fault prediction method and device - Google Patents

Application program fault prediction method and device

Info

Publication number
CN113094200B
CN113094200B (granted publication of application CN202110633968.8A)
Authority
CN
China
Prior art keywords
log text
log
word segmentation
sample
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110633968.8A
Other languages
Chinese (zh)
Other versions
CN113094200A (en)
Inventor
秦天柱
罗家润
刘楚蓉
谢宗兴
徐逸扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110633968.8A priority Critical patent/CN113094200B/en
Publication of CN113094200A publication Critical patent/CN113094200A/en
Application granted granted Critical
Publication of CN113094200B publication Critical patent/CN113094200B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0718Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in an object-oriented system

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a method and a device for predicting faults of an application program, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring a target running log text corresponding to a preset time window of an application program to be detected; performing word segmentation processing on the target running log text to obtain a log text word segmentation sequence corresponding to the target running log text; and calling a target fault prediction model to carry out fault prediction on the log text word segmentation sequence to obtain a fault prediction result of the application program to be detected. The target fault prediction model is obtained by performing constraint training of fault prediction on an initial prediction model based on a sample log text set corresponding to the preset time window and a matched fault label; the sample log text set and the fault label are generated based on online problem feedback data corresponding to the preset time window and sample log text corresponding to that feedback data. The method and the device can assist developers in solving online problems in advance without users perceiving them, improving user experience.

Description

Application program fault prediction method and device
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method and an apparatus for predicting a failure of an application.
Background
Under the influence of the network environment, configuration choices, user standards, test limitations and the like, functional faults inevitably arise while an application program runs, producing a number of online problems. In order to troubleshoot these functional failures in time and avoid the poor experience caused by the related online problems, the problems need to be located as early as possible, so that developers can learn of them promptly and trace the root causes of failures. However, in the prior art, online problem early warning and the corresponding fault prediction are usually performed based on the prior knowledge of developers, and suffer from subjective interference, the limits of experience, hysteresis and the like.
Therefore, it is desirable to provide an effective fault early warning scheme to solve the problems in the prior art.
Disclosure of Invention
The application provides a fault prediction method and device of an application program, which can perform fault prediction based on log text data of the application program so as to realize online problem early warning of the application program.
In one aspect, the present application provides a method for predicting a failure of an application, the method including:
acquiring a target running log text corresponding to a preset time window of an application program to be detected;
performing word segmentation processing on the target running log text to obtain a log text word segmentation sequence corresponding to the target running log text;
calling a target fault prediction model to carry out fault prediction on the log text word segmentation sequence to obtain a fault prediction result of the application program to be detected;
the target fault prediction model is obtained by performing constraint training of fault prediction on an initial prediction model based on a sample log text set corresponding to the preset time window and the matched fault label; the sample log text set and the fault label are generated based on the on-line problem feedback data corresponding to the preset time window and the sample log text corresponding to the on-line problem feedback data.
Another aspect provides a failure prediction apparatus for an application, the apparatus including:
a log text acquisition module: the method comprises the steps of obtaining a target running log text corresponding to a preset time window of an application program to be detected;
a word segmentation processing module: the word segmentation processing module is used for carrying out word segmentation processing on the target running log text to obtain a log text word segmentation sequence corresponding to the target running log text;
a failure prediction module: the log text word segmentation sequence is used for carrying out fault prediction on the log text word segmentation sequence by calling a target fault prediction model to obtain a fault prediction result of the application program to be detected;
the target fault prediction model is obtained by performing constraint training of fault prediction on an initial prediction model based on a sample log text set corresponding to the preset time window and the matched fault label; the sample log text set and the fault label are generated based on the on-line problem feedback data corresponding to the preset time window and the sample log text corresponding to the on-line problem feedback data.
Another aspect provides a failure prediction device for an application program, the device comprising a processor and a memory, the memory having stored therein at least one instruction or at least one program, the at least one instruction or the at least one program being loaded and executed by the processor to implement the failure prediction method for an application program as described above.
Another aspect provides a computer-readable storage medium, in which at least one instruction or at least one program is stored, and the at least one instruction or the at least one program is loaded and executed by a processor to implement the failure prediction method of an application program as described above.
The application program fault prediction method, device, equipment and storage medium have the following technical effects:
the method comprises the steps of obtaining a target running log text corresponding to a preset time window of an application program to be detected; performing word segmentation processing on the target operation log text to obtain a log text word segmentation sequence corresponding to the target operation log text; and calling a target fault prediction model to perform fault prediction on the log text word segmentation sequence to obtain a fault prediction result of the application program to be detected, so that corresponding fault prediction and early warning can be realized before large-scale exposure of online problems, developers can be assisted to position and troubleshoot online problems in advance under the condition that users are not sensitive, and user experience is improved. And model training data are generated based on-line problem feedback data of the application program and corresponding sample log texts, the acquisition mode of the marked data is simple and convenient, strong correlation exists between the marked data and the on-line problems, a large number of support rules and complex knowledge maps do not need to be constructed, and the accuracy of model prediction and the training efficiency can be effectively improved.
Drawings
In order to more clearly illustrate the technical solutions and advantages of the embodiments of the present application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic diagram of an application environment provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a blockchain system according to an embodiment of the present disclosure;
fig. 3 is a flowchart illustrating a failure prediction method for an application according to an embodiment of the present application;
FIG. 4 is a schematic diagram, provided in an embodiment of the present application, of the word word_i projected into a high-dimensional vector space;
FIG. 5 is a schematic diagram, provided in an embodiment of the present application, of performing dimension expansion on word_i;
FIG. 6 is a schematic diagram, provided in an embodiment of the present application, of splitting the feature vector of word_i;
FIG. 7 is a flowchart illustrating a method for training a target failure prediction model according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of an overall framework of a failure prediction method for an application according to an embodiment of the present disclosure;
fig. 9 is a block diagram of a failure prediction apparatus of an application according to an embodiment of the present application;
fig. 10 is a hardware block diagram of a server in the failure prediction method for an application according to the embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances, such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, system, article, or apparatus.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
Bert: Bidirectional Encoder Representations from Transformers, a transformer-based bidirectional encoder representation technique; a pre-training technique for natural language processing.
NLP: nature Language Process, natural Language processing.
On-line problem: a problem that may exist in the application and be perceived by the user.
Knowledge graph: a knowledge base organized as a semantic network.
Flink: apache Flink is an open source stream processing framework developed by the Apache software foundation, and the core of the framework is a distributed stream data stream engine written in Java and Scala. Flink executes arbitrary stream data programs in a data parallel and pipelined manner, and Flink's pipelined runtime system can execute batch and stream processing programs. In addition, the runtime of Flink itself supports the execution of iterative algorithms.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline that involves a wide range of fields, covering both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technology mainly comprises computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning and the like.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science integrating linguistics, computer science and mathematics. Research in this field therefore involves natural language, i.e. the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, robot question answering, knowledge graphs, and the like.
In recent years, with research and development of artificial intelligence technology, artificial intelligence technology is widely applied in a plurality of fields, and the scheme provided by the embodiment of the application relates to technologies such as machine learning/deep learning and natural language processing of artificial intelligence, and is specifically described by the following embodiments:
referring to fig. 1, fig. 1 is a schematic diagram of an application environment provided in an embodiment of the present application, as shown in fig. 1, the application environment may include at least a server 01 and a terminal 02, and the server 01 and the terminal 02 may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
In this embodiment, the server 01 may include an independently operating server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. In addition, a plurality of servers can be grouped into a block chain, and the servers are nodes on the block chain. Specifically, the server 01 may include a network communication unit, a processor, a memory, and the like.
Specifically, cloud technology (Cloud technology) refers to a hosting technology that unifies a series of resources such as hardware, software, and network in a wide area network or a local area network to realize the calculation, storage, processing, and sharing of data. It distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can obtain computing power, storage space and information services on demand. The network that provides the resources is referred to as the "cloud". Among them, artificial intelligence cloud services are also generally called AIaaS (AI as a Service). This is a service mode of an artificial intelligence platform; specifically, an AIaaS platform splits several types of common AI services and provides independent or packaged services in the cloud. This service model is similar to an AI theme mall: all developers can access one or more artificial intelligence services provided by the platform through an API (application programming interface), and some qualified developers can also use the AI framework and AI infrastructure provided by the platform to deploy, operate and maintain their own dedicated cloud artificial intelligence services.
Specifically, please refer to fig. 2; fig. 2 is a schematic structural diagram of a blockchain system according to an embodiment of the present disclosure. The server 01 may be a node 200 in the distributed system 100. The distributed system may be a blockchain system formed by connecting a plurality of nodes through network communication; the nodes may form a peer-to-peer (P2P) network, and any type of computer device, such as the server 01, the terminal 02 or other electronic devices, may become a node in the blockchain system by joining the peer-to-peer network. The blockchain comprises a series of blocks (Blocks) that are consecutive in chronological order of generation; once a new block is joined to the blockchain it is not removed, and the blocks record the data submitted by the nodes in the blockchain system.
The blockchain is an emerging application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. It is in essence a decentralized database: a chain of data blocks generated in association by cryptographic methods, where each data block contains the information of a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer. The blockchain underlying platform can comprise processing modules such as user management, basic services, smart contracts and operation monitoring. The user management module is responsible for the identity information management of all blockchain participants, including public and private key generation and maintenance (account management), key management, and maintenance of the correspondence between users' real identities and blockchain addresses (authority management); under authorization, it supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk-control audit). The basic service module is deployed on all blockchain node devices and is used to verify the validity of service requests and, after reaching consensus on valid requests, record them to storage; for a new service request, the basic service first performs interface adaptation analysis and authentication processing (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger (network communication) after encryption, and records and stores it. The smart contract module is responsible for registering and issuing contracts, triggering contracts and executing contracts; developers can define contract logic through a programming language and issue it to the blockchain (contract registration), and key calls or other events trigger execution according to the logic of the contract clauses to complete the contract logic; the module also provides functions for upgrading and canceling contracts. The operation monitoring module is mainly responsible for deployment, configuration modification, contract setting and cloud adaptation during product release, and for the visual output of real-time states during product operation, such as alarms, monitoring network conditions and monitoring the health status of node devices. The platform product services layer provides the basic capability and implementation framework of typical applications; developers can complete the blockchain implementation of their business logic based on these basic capabilities and the characteristics of the superposed business. The application services layer provides blockchain-based application services for business participants to use.
Specifically, the server 01 may be configured to provide a failure prediction service for the application to be detected, so as to generate a corresponding failure prediction result. Alternatively, training learning of the initial prediction model may be performed to generate the target failure prediction model.
In this embodiment, the terminal 02 may include a smart phone, a desktop computer, a tablet computer, a notebook computer, a digital assistant, an Augmented Reality (AR)/Virtual Reality (VR) device, an intelligent wearable device, a vehicle-mounted terminal, and other types of physical devices, but is not limited thereto, and may also include software running on the physical devices, such as an application program. The operating system running on terminal 02 in this embodiment of the present application may include, but is not limited to, Android, iOS, Linux, Windows, and the like.
In this embodiment, the terminal 02 may be configured to send the operation log data of the application to be detected to the server 01, so that the server 01 provides a fault prediction service based on the operation log data, and may also be configured to receive online problem feedback data submitted by a user based on an online problem of the application to be detected, and send the online problem feedback data and a corresponding sample log text to the server 01, so that the server 01 performs training of an initial prediction model based on the online problem feedback data and the corresponding sample log text. In the embodiment of the present application, the online problem may be a problem that may exist in the application and is perceived by the user.
In addition, it should be noted that fig. 1 illustrates only an application environment of the failure prediction method and apparatus for an application program, and in practical applications, the application environment may include more or less nodes, and the application is not limited in this application.
After an application program goes online on a PC or mobile platform, it is externally affected by the network environment, usage configuration, user standards and upstream and downstream industries, and internally limited by product design, operation planning, mechanism arrangement and program testing; operation faults or functional faults can occur, in turn producing online problems that affect user experience. Therefore, in order to assist developers in finding and preventing problems as early as possible, or in discovering and resolving online problems at an early stage, an effective fault early-warning scheme is required. The method for predicting the failure of an application provided by the present application is described below with reference to fig. 3. Fig. 3 is a flow chart illustrating the method; it presents the operation steps as an embodiment or a flow chart, but the method may include more or fewer operation steps based on conventional or non-inventive labor. The order of steps recited in the embodiments is merely one of many possible orders of execution and does not represent the only order. In practice, the system or server product may execute the steps sequentially or in parallel (e.g., in a parallel-processor or multi-threaded environment) according to the embodiments or the methods shown in the figures. Specifically, as shown in fig. 3, the method may include:
s301: and acquiring a target running log text corresponding to a preset time window of the application program to be detected.
In the embodiment of the application, the application program to be detected is a program which may cause an online problem, and may be run on a PC end or a mobile end, etc. The target operation log text can be a log text obtained by preprocessing historical operation log data or real-time operation log data of the application program to be detected. In one embodiment, real-time running log data of an application program to be detected is obtained, and the real-time running log data is preprocessed to obtain a target running log text. The preprocessing method includes but is not limited to stream processing and the like.
Accordingly, step S301 may include:
s3011: and acquiring running log data of the application program to be detected.
S3013: and performing stream processing on the running log data to obtain a target running log text corresponding to a preset time window.
In practical application, when the application program to be detected runs, running log data can be generated, the running log data can be reported to a log pool of a server by combining a pre-constructed data channel based on modes such as a card and stream processing, and the stream processing mode can include but is not limited to Flink stream processing and the like. And then, performing stream processing on the running log data in the log pool to clean and filter the running log data and the like, further removing the log texts without information to obtain initial log texts, and performing time-interval aggregation on the initial log texts by taking a preset time window as an aggregation window based on the time sequence to obtain target running log texts corresponding to the preset time window. Specifically, the duration of the preset time window may be set based on actual requirements, for example, the preset time window may be 10min, 30min or 1 h.
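The cleaning, filtering, and time-window aggregation described above can be sketched as follows. This is a minimal Python sketch under assumptions of my own: the `(timestamp, text)` record format and the 10-minute window are illustrative, and a production system would use a stream processor such as Flink rather than in-memory grouping.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def aggregate_by_window(log_records, window=timedelta(minutes=10),
                        origin=datetime(1970, 1, 1)):
    """Group (timestamp, text) log records into fixed time windows.

    Empty texts are dropped (cleaning/filtering), and the surviving
    texts are concatenated per window in time order, producing one
    aggregated "target running log text" per preset time window.
    """
    buckets = defaultdict(list)
    for ts, text in sorted(log_records):
        text = text.strip()
        if not text:                           # drop uninformative lines
            continue
        idx = (ts - origin) // window          # index of enclosing window
        buckets[idx].append(text)
    return {idx: "\n".join(lines) for idx, lines in buckets.items()}

records = [
    (datetime(2021, 6, 1, 10, 2), "Warning: Database unlinked error"),
    (datetime(2021, 6, 1, 10, 8), ""),                  # filtered out
    (datetime(2021, 6, 1, 10, 9), "Timeout in subsystem A"),
    (datetime(2021, 6, 1, 10, 25), "Retry succeeded"),
]
windows = aggregate_by_window(records)   # two 10-minute windows survive
```

In a deployed system this grouping would typically be a tumbling window over the log stream; the in-memory dictionary here only shows the aggregation semantics.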
In practical application, the application program to be detected may run on one or more terminals, the running log data may be derived from the one or more terminals, and correspondingly, the target running log text is derived from the running log data reported by the one or more terminals corresponding to the preset time window. The running log data may also be derived from the developer's reported data.
In practical application, the operation log data may be full operation log data of the application program to be detected, or the application program to be detected may include a plurality of subsystems, the operation log data may be log data corresponding to one or several of the subsystems to perform directional fault prediction of the relevant subsystems, or the operation log data may be log data for a specific service, activity or time period to perform corresponding fault prediction.
S303: and performing word segmentation processing on the target running log text to obtain a log text word segmentation sequence corresponding to the target running log text.
In this embodiment of the application, the word segmentation processing of the target running log text may be performed based on a word segmentation mode in existing natural language processing, splitting the target running log text into smaller granularities, for example word granularity or sub-word granularity, such as word segmentation based on WordPiece Tokenization. WordPiece Tokenization is a word segmentation method that segments text into sub-word granularity: it determines the likelihood that continuous byte pairs form a word piece based on their occurrence frequency, and splits the text into sub-words accordingly. Specifically, word segmentation can be performed with a word segmentation tool, or the target running log text can be input into a preset word segmentation model to obtain the log text word segmentation sequence. In some cases, a word segmentation network may also be set in the target fault prediction model, and the word segmentation network is used to perform word segmentation representation of the input target running log text to obtain the log text word segmentation sequence, thereby implementing end-to-end fault prediction.
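The segmentation step of a WordPiece-style tokenizer can be sketched with greedy longest-match-first lookup. This is a hedged sketch: the toy vocabulary below is an illustrative assumption, and the frequency-based construction of the vocabulary from byte-pair statistics, which the text mentions, is omitted here.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first subword segmentation: repeatedly take
    the longest vocabulary entry that prefixes the remaining text;
    non-initial pieces carry the conventional '##' continuation prefix."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]              # word cannot be decomposed
        pieces.append(piece)
        start = end
    return pieces

# Toy vocabulary (an assumption for illustration only).
vocab = {"data", "##base", "un", "##link", "##ed", "warn", "##ing"}
tokens = wordpiece_tokenize("database", vocab)   # → ['data', '##base']
```

A word with no covering decomposition maps to the out-of-dictionary token `[UNK]`, matching the dictionary-miss handling described below.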
In one embodiment, the word segmentation preprocessing may include performing punctuation removal, lowercasing, digit removal, stemming and similar steps on the target running log text. For example, given the target running log text "[business.cpp:34] Warning: Database unlinked error", punctuation removal yields "business cpp 34 Warning Database unlinked error", lowercasing yields "business cpp 34 warning database unlinked error", digit removal yields "business cpp warning database unlinked error", and stemming reduces each remaining word to its root form. In some cases, part-of-speech tagging, batch padding or the like is also performed on the target running log text, such as using [UNK] for new words outside the dictionary and [PAD] for batch padding, so as to generate a log text word segmentation sequence of a preset length.
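These normalization steps can be sketched in Python as follows. This is a minimal sketch; the suffix-stripping rule stands in for a real stemmer (such as Porter's) and is my own assumption, not the patent's method.

```python
import re

def preprocess_log(text):
    """Apply the normalization steps named in the text: remove
    punctuation, lowercase, remove digits, then reduce words to roots.
    The suffix-stripping rule below is a toy stand-in for a stemmer."""
    text = re.sub(r"[^\w\s]", " ", text)    # punctuation removal
    text = text.lower()                     # lowercasing
    text = re.sub(r"\d+", " ", text)        # digit removal
    words = text.split()
    stemmed = [re.sub(r"(ing|ed)$", "", w) if len(w) > 4 else w
               for w in words]              # naive suffix stripping
    return " ".join(stemmed)

out = preprocess_log("[business.cpp:34] Warning: Database unlinked error")
# → "business cpp warn database unlink error"
```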
It should be noted that a preset word segmentation method may be called to perform word segmentation on the target running log text, or a pre-trained word segmentation model may be called instead. Alternatively, the preset word segmentation method or word segmentation model may be integrated into the target fault prediction model described below as a network layer of the model, realizing end-to-end processing of fault prediction.
S305: and calling a target fault prediction model to perform fault prediction on the log text word segmentation sequence to obtain a fault prediction result of the application program to be detected.
In the embodiment of the application, the target fault prediction model is a model obtained by performing constraint training of fault prediction on the initial prediction model based on a sample log text set corresponding to a preset time window and a matched fault label; the sample log text set and the fault label are generated based on-line problem feedback data corresponding to a preset time window and sample log text corresponding to the on-line problem feedback data. The initial predictive model may be a pre-trained language model.
In practical application, after the application program to be detected goes online, some online problems may arise due to functional failures. The online problem feedback data is feedback submitted by users for online problems of the application to be detected, and may include, but is not limited to, a user's subjective evaluation (such as a rating or ranking of the severity of the online problem) and descriptive text information. Specifically, the application to be detected is preset with a user feedback channel; when an online problem occurs, a user feedback interface corresponding to the feedback channel is displayed, and the feedback interface may include evaluation options, scoring options, problem classification options, a problem description window and the like, so as to receive user feedback.
In practical applications, the sample log text corresponding to the online problem feedback data may be the log text corresponding to the submission time of that feedback data. During the running of the application to be detected, corresponding problem log data is generated when an online problem occurs; based on the occurrence time of the online problem or the submission time of the feedback data, the corresponding running log data can be retrieved from a background log system and preprocessed to obtain the sample log text. It should be noted that the log text obtained may correspond to the occurrence time point of the online problem or the submission time point of the feedback data, or to the occurrence time period of the online problem or the submission time period of the feedback data. The sample log text includes log text corresponding to the problem log data, and may also include log text corresponding to normal log data before and after the problem log data.
In the embodiment of the application, the failure prediction result is used for representing the possibility that the application to be detected has an operation failure, that is, the possibility that the application to be detected has an online problem, and includes whether the application to be detected has a failure problem, the probability of the failure problem, or a failure index. Specifically, a probability threshold or a fault index threshold for the occurrence of the fault problem may be set, and when the probability of the occurrence of the fault problem in the fault prediction result is greater than the probability threshold or when the fault index is greater than the fault index threshold, it is determined that the fault problem exists in the application to be detected, so as to perform online problem early warning. And outputting a prediction result aiming at each target operation log text so as to perform online problem early warning on each target operation log text.
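The threshold-based early warning described above can be sketched as follows; the threshold value, window names, and probabilities are illustrative assumptions, not values from the embodiment.

```python
# Sketch of threshold-based early warning from per-window prediction results.
PROB_THRESHOLD = 0.5  # illustrative probability threshold

def should_alert(failure_prob, threshold=PROB_THRESHOLD):
    # flag a window when its predicted failure probability exceeds the threshold
    return failure_prob > threshold

predictions = {"T1": 0.78, "T2": 0.0}
alerts = [window for window, prob in predictions.items() if should_alert(prob)]
print(alerts)  # ['T1']
```

The same shape applies when the model emits a fault index instead of a probability, with the index threshold substituted.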
In some cases, when the full running log data of an application to be detected is analyzed, the full running log data may be processed into full running log texts by stream processing, and the full running log texts may be aggregated into target running log texts in units of a preset time window based on their timing, such as a target running log text corresponding to a first time window (T1), a target running log text corresponding to a second time window (T2), and so on. The target running log texts corresponding to the preset time windows are respectively input into the target fault prediction model to obtain a prediction result for the target running log text of each preset time window; for example, the target running log text corresponding to T1 has a fault problem with a fault probability of 78%, while the target running log text corresponding to T2 has no fault problem, with a fault probability of 0%. A fault prediction result of the application to be detected is then generated according to the prediction result of each target running log text.
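The per-window aggregation step can be sketched as follows, using a fixed one-hour window; the timestamps and log lines are illustrative, and a real system would parameterize the window length.

```python
from datetime import datetime

# Sketch of aggregating individual log lines into one target running log text
# per preset time window (here a fixed one-hour window).
def aggregate_by_hour(entries):
    buckets = {}
    for ts, text in entries:
        key = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        buckets.setdefault(key, []).append(text)
    # one concatenated target log text per window, in time order
    return {k: " ".join(v) for k, v in sorted(buckets.items())}

entries = [
    (datetime(2021, 6, 1, 10, 5), "db connect ok"),
    (datetime(2021, 6, 1, 10, 40), "warning database unlinked"),
    (datetime(2021, 6, 1, 11, 10), "retry failed"),
]
windows = aggregate_by_hour(entries)
print(list(windows.values()))
# ['db connect ok warning database unlinked', 'retry failed']
```

Each dictionary value then plays the role of one target running log text (T1, T2, …) fed to the prediction model.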
In other cases, when log running data corresponding to one or more subsystems of the application to be detected is analyzed, the running log texts corresponding to the log running data of each subsystem can be aggregated in units of the preset time window to obtain target running log texts, and fault prediction is then performed for each subsystem. For example, some target log texts of a first subsystem may have a fault problem, or the fault prediction result of the first subsystem may indicate a fault problem with a fault index of 0.9 (the fault index ranging from 0 to 1). A fault prediction result of the application to be detected can then be generated according to the fault prediction result of each subsystem.
In summary, the target running log text corresponding to the preset time window of the application to be detected is obtained; word segmentation processing is performed on the target running log text to obtain the corresponding log text word segmentation sequence; and the target fault prediction model is called to perform fault prediction on the log text word segmentation sequence to obtain the fault prediction result of the application to be detected. Corresponding fault prediction and early warning can thus be realized before large-scale exposure of online problems, assisting developers in locating and troubleshooting online problems in advance without users being aware, and improving user experience. Moreover, the model training data is generated based on the online problem feedback data of the application and the corresponding sample log texts: the labeled data is simple and convenient to acquire, is strongly correlated with the online problems, and requires neither a large number of support rules nor a complex knowledge graph to be constructed, which can effectively improve the accuracy of model prediction and the efficiency of training.
In this embodiment of the application, the target failure prediction model includes an embedded network and a feature encoder, and accordingly, step S305 may include:
s3051: and coding the log text word segmentation sequence by utilizing an embedded network of the target fault prediction model to obtain a log word segmentation vector sequence corresponding to the log text word segmentation sequence.
In practical application, the log text word segmentation sequence is input into the embedded network, and the vectorization representation is realized by encoding the log text word segmentation sequence. In some embodiments, the embedded network includes a first embedded layer and a second embedded layer, and step S3051 may include:
s30511: and performing word embedding processing on each log word in the log text word segmentation sequence through the first embedding layer to obtain a word vector of each log word.
S30512: and performing position embedding processing on each log word segmentation based on the position of each log word segmentation in the log text word segmentation sequence through a second embedding layer to obtain a position vector of each log word segmentation.
S30513: and performing fusion processing on the word vector and the position vector of each log word segmentation to obtain a log word segmentation vector sequence.
Specifically, the first embedding layer may use a neural network to project each log participle into a high-dimensional vector space, thereby implementing word embedding processing of each log participle and obtaining the corresponding word vector. The second embedding layer records the position of each log participle in the log text word segmentation sequence and vectorizes that position to obtain the position vector.
In one embodiment, the fusion process performed on the word vector and the position vector of each log participle may be, for example, an addition process. For example, the vector [ X11, X12, X13, X14] and the vector [ X21, X22, X23, X24] may be added to obtain a vector [ X11+ X21, X12+ X22, X13+ X23, X14+ X24 ]. Specifically, the word vectors and the position vectors of the log participles are added to obtain the log participle vector of each log participle, and a log participle vector sequence is generated based on the sequencing of the log participles and the log participle vectors. By combining the word vectors and the position vectors of the word segmentation of each log, the original meaning of the word segmentation can be kept, the position information of the word segmentation in the sentence is also kept, and the whole semantic meaning of the text can be interpreted more completely.
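The additive fusion in the example above can be sketched directly; the numeric values are toy stand-ins for the [X11, X12, …] vectors, since real embeddings are learned parameters.

```python
# Minimal sketch of fusing a word vector and a position vector by
# element-wise addition, as in the example above.
def fuse(word_vec, pos_vec):
    assert len(word_vec) == len(pos_vec)
    return [w + p for w, p in zip(word_vec, pos_vec)]

word_vec = [11, 12, 13, 14]    # stands in for [X11, X12, X13, X14]
pos_vec = [21, 22, 23, 24]     # stands in for [X21, X22, X23, X24]
print(fuse(word_vec, pos_vec)) # [32, 34, 36, 38]
```

The fused vector preserves both the participle's meaning (word vector) and its sentence position (position vector) in a single representation.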
In one embodiment, the initial prediction model may be a BERT model, which is a Transformer-based pre-trained language model proposed by Google, whose set of pre-training parameters is generated by unsupervised training on a large basic corpus. Depending on the application scenario, such as a prediction scenario using log text, the BERT model may be one pre-trained on a corpus of log texts, or one pre-trained on other Chinese/English corpora. Because the BERT model uses multiple Transformer layers to learn the text bidirectionally, and the Transformer reads the text in a single pass, it can more accurately learn the contextual relationships between words and understand semantics more deeply; that is, a bidirectionally trained language model understands semantics more deeply than a unidirectional one, and can therefore accurately extract features from the log text.
Correspondingly, the embedded network may correspond to the Embedding network of the BERT model, and the first embedded layer and the second embedded layer may respectively correspond to the word embedding layer and the position embedding layer of the Embedding network. By combining the word embedding layer and the position embedding layer, the meaning of each participle is retained together with its position information in the sentence, which helps interpret the overall semantics of the input text. For example, the log text word segmentation sequence may be word_1/word_2/word_3/…/word_i. The log text word segmentation sequence is input into the word embedding layer for word embedding processing to obtain word embedding vectors; referring to fig. 4, fig. 4 shows a schematic diagram of word_i projected into a high-dimensional vector, where the word embedding vector corresponding to word_i is [X_{1,i}, X_{2,i}, X_{3,i}, X_{4,i}, X_{5,i}, …, X_{n,i}]. Further, the log text word segmentation sequence is input into the position embedding layer for position embedding processing: the position of word_1 can be characterized as X_1, the position of word_2 as X_2, and, by analogy, the position of word_i as X_i.
It should be noted that, since the target running log text need not undergo sentence-pair segmentation processing, the embedded network of the initial prediction model may omit the segment embedding layer of the conventional BERT model, and likewise need not set a type embedding layer; the training data processing and training operations related to those network layers are therefore unnecessary, which not only reduces the training workload but also effectively reduces the uncertainty and external dependence of the model.
S3052: and performing self-attention processing on the log word segmentation vector sequence by using a feature encoder of the target fault prediction model to obtain a self-attention vector corresponding to the log word segmentation vector sequence.
In practical application, the generated log word segmentation vector sequence is input into a feature encoder to be subjected to self-attention processing. In some embodiments, the feature encoder includes a vector transform layer and a self attention layer, and accordingly, step S3052 may include:
s30521: and performing weight matrix conversion on each participle vector in the log participle vector sequence by using a vector conversion layer to obtain a characteristic vector sequence.
In some cases, a vector transformation layer is used for introducing three weight matrixes of Queries, Keys and Values to perform weight matrix transformation on each participle vector, the participle vectors are respectively multiplied by the weight matrixes to obtain corresponding feature vectors, and a feature vector sequence is generated based on the feature vectors corresponding to the participle vectors.
S30522: and performing self-attention processing on the feature vector sequence by using a self-attention layer to obtain a self-attention vector.
In practical applications, the self-attention layer may be constructed based on a scaled dot product attention mechanism or a multi-head self-attention mechanism. In some embodiments, the self-attention layer is a network layer constructed based on a multi-head self-attention mechanism, that is, the self-attention layer may be a multi-head self-attention layer, and accordingly, step S30522 may include: and utilizing a self-attention layer to divide each feature vector in the feature vector sequence into a preset number of feature sub-vectors, performing multi-head self-attention calculation on each feature sub-vector corresponding to each feature vector based on a multi-head self-attention mechanism to obtain a self-attention value of each feature sub-vector, and generating the self-attention vector according to the self-attention value of each feature sub-vector.
Specifically, each feature vector may be segmented to obtain two or more corresponding feature sub-vectors, and the obtained feature sub-vectors are further subjected to concatenation and self-attention value calculation to obtain the self-attention vector.
Taking the initial prediction model as a BERT model as an example, the vector transformation layer may correspond to the Transformer Encoder layer of the BERT model. The log word segmentation vector sequence is input into this layer to perform dimension expansion on each log word segmentation vector: using the three weight matrices Queries, Keys and Values, each log word segmentation vector is projected into the three dimensions Query, Key and Value to generate the feature vector sequence. Referring to fig. 5, fig. 5 shows a schematic diagram of performing dimension expansion on word_i. Further, the multi-head self-attention layer may correspond to the Multi Head Attention layer of the BERT model, into which the feature vector sequence is input to segment each feature vector, and to concatenate and perform self-attention calculation on the generated feature sub-vectors. Referring to fig. 6, fig. 6 shows a schematic diagram of splitting the feature vector of word_i, where the horizontal line in the figure represents the vector splitting position. It is understood that the preset number and the length of each feature sub-vector may be set according to actual requirements; for example, the feature vectors may be divided into equal lengths.
Further, the multi-head self-attention layer may perform self-attention processing of the feature sub-vectors based on the following expressions (one) to (three), where head_h characterizes a feature sub-vector, MultiHead(Q, K, V) characterizes the self-attention vector, Attention(Q_i, K_i, V_i) characterizes the self-attention value of head_h calculated by formula (three), and d_k characterizes the dimension of K:

MultiHead(Q, K, V) = Concat(head_1, …, head_h)·W^O    (one)

head_h = Attention(Q_h, K_h, V_h)    (two)

Attention(Q_i, K_i, V_i) = softmax(Q_i·K_i^T / √d_k)·V_i    (three)
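Under the standard Transformer formulation (Attention(Q, K, V) = softmax(QK^T/√d_k)·V), the per-head computation can be sketched in plain Python as follows. The dimensions are toy values and the output projection matrix is omitted; this is a sketch of the mechanism, not the BERT implementation.

```python
import math

# Sketch of scaled dot-product self-attention plus a two-head split over
# the feature dimension (output projection W^O omitted for brevity).
def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def attention(Q, K, V):
    d_k = len(Q[0])
    scores = matmul(Q, transpose(K))                                  # QK^T
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)                                         # softmax(...)V

def split_heads(A, n):
    step = len(A[0]) // n
    return [[row[h * step:(h + 1) * step] for row in A] for h in range(n)]

def multi_head(Q, K, V, n_heads=2):
    outs = [attention(q, k, v) for q, k, v in
            zip(split_heads(Q, n_heads), split_heads(K, n_heads), split_heads(V, n_heads))]
    # concatenate per-head outputs position by position
    return [sum(rows, []) for rows in zip(*outs)]

X = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]  # 2 tokens, dimension 4
out = multi_head(X, X, X)
print(len(out), len(out[0]))  # 2 4
```

Each head attends over its own sub-vectors, and concatenating the head outputs restores the original feature dimension, matching expressions (one) to (three).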
S3053: and carrying out forward propagation processing on the self-attention vector to obtain a fault prediction result.
In practical application, the target fault prediction model can use a feedforward mode to perform forward propagation processing on the self-attention vector output by the feature encoder by using a subsequent network layer of the target fault prediction model, so as to derive a fault prediction result. Specifically, the self-attention vector may be input to the next-stage network of the feature encoder in the target failure prediction model for forward propagation processing. In some embodiments, the next level network may include a feed forward layer or a fully connected layer, etc.
It should be noted that the initial prediction model in the present application is not limited to the BERT model described above, and may be other similar models such as RNN (Recurrent Neural Network) and other transform-like models, which is not limited in the present application.
When it is determined, based on the fault prediction result, that the application to be detected has a fault problem, or that the probability of the fault problem or the fault index is higher than the corresponding threshold, early warning information may be sent, and the corresponding running log text may be marked and stored, so that developers can further locate, analyze and solve the corresponding fault problem. For example, the log text may be input into a fault problem classification model to identify the type of fault problem that may occur, and so on.
Based on some or all of the foregoing embodiments, an embodiment of the present application further provides a method for training a target fault prediction model, please refer to fig. 7, where fig. 7 shows a flowchart of the method for training a target fault prediction model provided in the embodiment of the present application, and as shown in fig. 7, the method for training may include:
s501: and acquiring a sample training set, wherein the sample training set comprises a plurality of sample log text sets corresponding to the preset time window and fault labels matched with the sample log text sets.
In practical application, the time duration corresponding to each sample log text set is a preset time window, for example, if the preset time window is 1h, the time interval between the sample log text with the earliest generation time and the sample log text with the latest generation time in each sample log text set is 1 h. And the sample log text sets correspond to the fault labels one by one. The fault label is used for representing the possibility that the application program to be detected has an operation fault in a preset time window corresponding to the sample log text set, and the possibility may be whether the application program to be detected has a fault problem or not, or the possibility that the application program to be detected has a fault problem or a fault index, and the fault index may be a numerical value between 0 and 1. It can be understood that the larger the probability value or the fault indicator is, the more likely the fault problem is to occur or the more serious the defect of the application to be detected is.
In practical applications, step S501 may include:
s5011: and acquiring online problem feedback data submitted by a user based on an online problem and a sample log text corresponding to the submission time of the online problem feedback data.
In practical applications, the sample log text may be obtained by preprocessing the running log data corresponding to the submission time of the online problem, and the preprocessing manner is similar to the foregoing implementation manner in step S403, and is not described herein again.
S5012: and based on the time sequence and by taking a preset time window as an aggregation window, carrying out aggregation processing on the sample log texts to obtain a sample log text set corresponding to the preset time window.
In practical application, the sample log texts can be sequenced based on time sequence, and the acquired sample log texts are subjected to time-interval aggregation in a preset time window, so that a plurality of sample log text sets are generated.
S5013: and aggregating the on-line problem feedback data corresponding to the sample log text set to obtain a feedback data aggregation result.
In practical application, the on-line problem feedback data corresponding to each sample log text in each sample log text set can be determined based on the corresponding relationship between the sample log text and the on-line problem feedback data, and the on-line problem feedback data and the sample log text set are subjected to aggregation processing. Or determining the on-line problem feedback data in the time period corresponding to the preset time window based on the preset time window corresponding to the sample log text set, and performing aggregation processing on the on-line problem feedback data. The aggregation process of the feedback data may be, for example, counting the total number or average of the on-line problem feedback data, or the like.
As previously mentioned, the on-line question feedback data includes, but is not limited to, a user's subjective assessment (e.g., a rating or ranking of the severity of the on-line question) and descriptive textual information, among others. In some embodiments, the descriptive text information may also be subjected to a numerical processing to obtain a severity score, a grade, a classification, and the like of the corresponding online problem, specifically, a numerical processing manner may include, but is not limited to, a scoring scheme of emotion analysis, topic analysis, deep learning, and the like, and the application is not limited herein.
In some cases, the feedback data aggregation result corresponding to a sample log text set is the total feedback number of the online problem feedback data. This number may be the total number of user submissions within the preset time window corresponding to the sample log text set, that is, the actual total feedback number of the online problem feedback data; or it may be a weighted sum of the submission counts, weighted by the severity scores or grades of the individual submitted online problems obtained above. It can be understood that the higher the severity score or grade, the higher the corresponding weighting factor and the greater the contribution to the total feedback number.
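The severity-weighted total can be sketched as follows; the weight table is an illustrative assumption, since the embodiment does not fix concrete weights.

```python
# Sketch of aggregating feedback within one time window, weighting each
# submission by its severity grade (weight table is illustrative).
SEVERITY_WEIGHT = {1: 0.5, 2: 1.0, 3: 2.0}

def weighted_feedback_total(severities):
    # higher grades contribute more to the total feedback number
    return sum(SEVERITY_WEIGHT[s] for s in severities)

window_feedback = [3, 3, 2, 1]  # severity grade of each submission in the window
print(weighted_feedback_total(window_feedback))  # 5.5
```

With all weights set to 1 this reduces to the plain submission count, i.e. the unweighted aggregation case.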
In other cases, the feedback data aggregation result is an average value of feedback quantities of the feedback data with the online problem, such as the feedback data quantity in unit time, which is similar to the above, and an actual quantity or a weighted average quantity may also be used, which is not described herein again. Or the mean may be a mean or weighted mean of the severity scores or ratings of the on-line problem, or a combination of the amount of feedback and the severity score/rating.
S5014: and generating a fault label matched with the sample log text set according to the feedback data aggregation result.
In practical application, when the feedback data aggregation result is a feedback number, the feedback data aggregation results corresponding to the multiple sample log text sets may be subjected to value assignment, normalization, or similar processing to obtain a correspondence between the aggregation result and whether an online problem exists, the probability of a fault problem, or a fault index, from which the fault label is obtained. When the feedback data aggregation result is a score or a grade, the aggregation result can be used directly as the fault label.
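One way to realize the normalization step is min-max scaling of the per-window aggregation results into a 0-1 fault index, optionally binarized into labels; the threshold here is an illustrative assumption.

```python
# Sketch of min-max normalizing per-window feedback totals into a 0-1 fault
# index, then binarizing against a threshold to form training labels.
def normalize(totals):
    lo, hi = min(totals), max(totals)
    if hi == lo:
        return [0.0 for _ in totals]   # no spread: treat all windows as index 0
    return [(t - lo) / (hi - lo) for t in totals]

def to_labels(totals, threshold=0.5):
    return [1 if v > threshold else 0 for v in normalize(totals)]

totals = [0, 2, 10, 4]
print(normalize(totals))   # [0.0, 0.2, 1.0, 0.4]
print(to_labels(totals))   # [0, 0, 1, 0]
```

The continuous indices can serve as fault-index labels, and the binarized values as presence/absence labels, matching the two label forms described above.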
S503: and performing word segmentation processing and labeling processing on each sample log text in the sample log text set to obtain each sample word segmentation sequence corresponding to each sample log text.
In practical applications, step S503 may include:
s5031: and marking each sample log text in the corresponding sample log text set according to the feedback data aggregation result to obtain each corresponding marked log text.
In practical application, each sample log text can be directly labeled by the feedback data aggregation result, for example, the feedback number or the average value thereof is labeled. And labeling each sample log text by adopting a fault label obtained according to the feedback data aggregation result. It can be appreciated that the labels of the sample log texts in the same sample log text set are the same. In some embodiments, the sample log text may be annotated with [ CLS ], such as by taking the annotation as the first word of the sample log text.
S5032: and performing word segmentation processing on the corresponding labeled log texts to obtain a word segmentation sequence of each sample corresponding to each sample log text.
In practical applications, the word segmentation process for labeling the journal text is similar to that in step S303, and is not described herein again. It should be noted that word segmentation processing may be performed on the sample log text first, and then labeling processing is performed on the obtained word segmentation sequence, so as to generate a sample word segmentation sequence. In some embodiments, the first participle of the sample participle sequence is [ CLS ] labeled with the corresponding participle.
In practical applications, step S503 may be implemented with a separate tool or word segmentation model, or by setting a word segmentation network in the initial prediction model. For example, if the initial prediction model is a BERT model, a word segmentation network can be set in the BERT model to break the sample log text into finer-grained sample log participles based on the WordPiece Tokenization mode, using [CLS] as the labeling result, [UNK] for new words outside the dictionary, [PAD] for batch padding, and so on. Since the prediction process requires neither next-sentence prediction nor missing-word prediction on the log text, the word segmentation network need not use [SEP] to delimit text strings, nor [MASK] for missing-word task processing. Therefore, even if a word not recorded in the dictionary appears in the sample log text, it can be handled effectively, improving the adaptability and robustness of the word segmentation processing.
S505: and taking each sample word segmentation sequence as the input of the initial prediction model, taking the corresponding fault label as the expected output of the initial prediction model, and performing constraint training of fault prediction on the initial prediction model to obtain the target fault prediction model.
In practical application, the fault label is used as the dependent variable of the model and the sample word segmentation sequence as the independent variable, and the initial prediction model is subjected to constraint training. The model parameters are evaluated using the relevant evaluation indexes of a classification model, such as the ROC curve (receiver operating characteristic curve) or the AUC value (area under the ROC curve); the model whose parameters meet the preset convergence conditions is taken as the target fault prediction model, and the finally generated model file is deployed.
In some embodiments, losses in the model training process may be calculated based on cross-entropy to update the model parameters. The corresponding loss function is shown in equation (four) below, where y characterizes the fault label and p characterizes the predicted value output by the model.
L = -[y·log(p) + (1 - y)·log(1 - p)]    (four)
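A minimal sketch of this binary cross-entropy loss, averaged over a batch (the clamping constant is an implementation detail added here for numerical stability):

```python
import math

# Sketch of binary cross-entropy, L = -[y*log(p) + (1-y)*log(1-p)],
# averaged over a batch of (label, predicted probability) pairs.
def bce_loss(labels, probs, eps=1e-12):
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)   # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

print(round(bce_loss([1, 0], [0.9, 0.1]), 6))
```

Confident correct predictions drive the loss toward zero, while confident wrong ones are penalized heavily, which is the gradient signal used to update the model parameters.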
In some embodiments, the method may further include an updating step of the target fault prediction model, and specifically may include:
s601: updated online problem feedback data and corresponding updated sample log text are periodically obtained.
S603: based on the updated online problem feedback data and the updated sample log text, a plurality of sets of updated sample log text and corresponding failure labels are generated.
S605: and performing word segmentation processing and labeling processing on each updated sample log text in the updated sample log text set to obtain a corresponding updated sample word segmentation sequence.
S607: and updating the target fault prediction model by taking the updated sample word segmentation sequence as the input of the target fault prediction model and taking the corresponding fault label as the expected output of the target fault prediction model.
In practical application, the on-line problem feedback data is continuously updated and changed, and the model is updated by using the updated on-line problem feedback data and the corresponding sample log text, so that the prediction accuracy of the target fault prediction model can be improved.
In the embodiment of the present application, please refer to fig. 8, which shows an overall framework schematic diagram of the application fault prediction method provided in the embodiment of the present application. As shown in fig. 8, the initial prediction model is trained using the online problem feedback data obtained from user feedback in the initial online stage of the application to be detected, together with the running log data obtained through log reporting, to obtain the target fault prediction model. The target fault prediction model is then deployed into the subsequent log analysis of the application to be detected, performing automatic fault prediction on the running log texts corresponding to subsequently reported running log data. This enables online problem early warning without users being aware, assists developers in timely detection and troubleshooting of online problems, and can still perform fault prediction and early warning after the application is upgraded or expanded, effectively shortening the period for discovering and handling online problems and improving user experience. Moreover, even when the application system generates a large number of complicated redundant logs, or the program is updated or expanded, developers need not spend a large amount of time concentrating on system monitoring. This greatly improves the efficiency of repairing, optimizing and iterating the application system, effectively assists system developers in finding and forestalling problems as early as possible, prevents online problems from spreading, and reduces a large amount of labor cost.
An embodiment of the present application further provides an apparatus 700 for predicting a failure of an application, as shown in fig. 9, the apparatus includes:
the log text acquisition module 710: the method is used for acquiring the target running log text corresponding to the preset time window of the application program to be detected.
The word segmentation processing module 720: and the method is used for performing word segmentation processing on the target operation log text to obtain a log text word segmentation sequence corresponding to the target operation log text.
The failure prediction module 730: and the method is used for calling the target fault prediction model to carry out fault prediction on the log text word segmentation sequence so as to obtain a fault prediction result of the application program to be detected.
The target fault prediction model is obtained by performing fault prediction constraint training on the initial prediction model based on a sample log text set corresponding to a preset time window and a matched fault label; the sample log text set and the fault label are generated based on-line problem feedback data corresponding to a preset time window and sample log text corresponding to the on-line problem feedback data.
In some embodiments, the failure prediction module 730 may include:
an encoding processing unit: used for encoding the log text word segmentation sequence by utilizing the embedded network of the target fault prediction model to obtain the log word segmentation vector sequence corresponding to the log text word segmentation sequence.
A self-attention processing unit: used for performing self-attention processing on the log word segmentation vector sequence by utilizing the feature encoder of the target fault prediction model to obtain a self-attention vector corresponding to the log word segmentation vector sequence.
A forward propagation unit: used for performing forward propagation processing on the self-attention vector to obtain the fault prediction result.
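The three stages above (embedding, self-attention, forward propagation) can be sketched end to end. The function names, the toy two-dimensional embedding table, and the thresholded read-out are all illustrative assumptions, not the patent's actual implementation:

```python
import math

def embed(tokens, table):
    """Map each log participle to its vector (unknown words -> zeros)."""
    return [table.get(t, [0.0, 0.0]) for t in tokens]

def self_attend(vectors):
    """Toy self-attention: each position is re-expressed as a
    similarity-weighted (softmax) mix of all positions."""
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) for k in vectors]
        exps = [math.exp(s) for s in scores]
        z = sum(exps)
        out.append([sum(e / z * v[d] for e, v in zip(exps, vectors))
                    for d in range(len(q))])
    return out

def forward_propagate(vectors):
    """Toy read-out head: mean-pool the sequence, then threshold the
    pooled score into a fault / no-fault prediction."""
    pooled = [sum(v[d] for v in vectors) / len(vectors)
              for d in range(len(vectors[0]))]
    return "fault" if sum(pooled) > 0 else "no-fault"

# hypothetical embedding table for two log participles
table = {"timeout": [1.0, 0.5], "ok": [-1.0, -0.5]}
result = forward_propagate(self_attend(embed(["timeout", "ok", "timeout"], table)))
```

Because two of the three participles carry the "timeout" vector, the pooled representation is positive and the sketch predicts a fault.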
In some embodiments, the embedded network includes a first embedded layer and a second embedded layer, and the encoding processing unit may include:
a word embedding processing subunit: used for performing word embedding processing on each log word segmentation in the log text word segmentation sequence through the first embedding layer to obtain a word vector of each log word segmentation.
A position embedding processing subunit: used for performing position embedding processing on each log word segmentation based on the position of each log word segmentation in the log text word segmentation sequence through the second embedding layer to obtain a position vector of each log word segmentation.
A vector addition processing subunit: used for performing fusion processing on the word vector and the position vector of each log word segmentation to obtain the log word segmentation vector sequence.
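The two embedding layers and the addition-based fusion described above can be sketched as follows. The embedding tables, their two-dimensional size, and the participle names are made-up illustrations; real embedding layers would be learned:

```python
# Hypothetical lookup tables standing in for the two learned embedding layers.
word_table = {"conn": [0.25, 0.5], "timeout": [1.0, -0.5]}  # first embedding layer
pos_table = [[0.0, 0.125], [0.0, 0.25], [0.0, 0.375]]       # second embedding layer, one row per position

def embed_sequence(participles):
    """Fuse the word vector and the position vector of each log
    participle by element-wise addition."""
    fused = []
    for pos, word in enumerate(participles):
        w = word_table.get(word, [0.0, 0.0])   # word embedding
        p = pos_table[pos]                     # position embedding
        fused.append([a + b for a, b in zip(w, p)])
    return fused

seq = embed_sequence(["conn", "timeout"])
```

The resulting sequence carries both what each participle is and where it sits in the log text, which is what the downstream self-attention layer consumes.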
In some embodiments, the feature encoder includes a vector transform layer and a self-attention layer, and the self-attention processing unit may include:
weight matrix conversion subunit: used for performing weight matrix conversion on each word segmentation vector in the log word segmentation vector sequence by utilizing the vector conversion layer to obtain a feature vector sequence.
Multi-head self-attention processing subunit: the method is used for performing self-attention processing on the feature vector sequence by using the self-attention layer to obtain a self-attention vector.
In some embodiments, the self-attention layer is a network layer constructed based on a multi-head self-attention mechanism, and accordingly, the multi-head self-attention processing subunit may be specifically configured to: and utilizing a self-attention layer to divide each feature vector in the feature vector sequence into a preset number of feature sub-vectors, performing multi-head self-attention calculation on each feature sub-vector corresponding to each feature vector based on a multi-head self-attention mechanism to obtain a self-attention value of each feature sub-vector, and generating the self-attention vector according to the self-attention value of each feature sub-vector.
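The splitting-and-recombining idea in the multi-head mechanism above can be sketched in plain Python. The head count, vector sizes, and scaled-dot-product scoring are illustrative assumptions; a production model would use learned projection matrices per head:

```python
import math

def multi_head_self_attention(X, heads):
    """X: list of feature vectors (seq_len x dim). Each feature vector is
    split into `heads` feature sub-vectors, scaled-dot-product self-attention
    runs independently per head, and the per-head self-attention values are
    concatenated back per position."""
    dim = len(X[0])
    assert dim % heads == 0
    sub = dim // heads
    rows = [[] for _ in X]
    for h in range(heads):
        head_vecs = [x[h * sub:(h + 1) * sub] for x in X]  # sub-vectors of head h
        for i, q in enumerate(head_vecs):
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(sub)
                      for k in head_vecs]
            exps = [math.exp(s) for s in scores]
            z = sum(exps)
            attended = [sum(e / z * k[d] for e, k in zip(exps, head_vecs))
                        for d in range(sub)]
            rows[i].extend(attended)  # concatenation across heads
    return rows

X = [[1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 1.0, 0.0]]
Y = multi_head_self_attention(X, heads=2)
```

Each output row has the same dimensionality as the input, but every head attends over a different slice of the feature vector, which is the point of the multi-head split.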
In some embodiments, the log text acquisition module 710 may include:
a log data acquisition unit: the method is used for acquiring the running log data of the application program to be detected.
A stream processing unit: and the method is used for performing stream processing on the running log data to obtain a target running log text corresponding to a preset time window.
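The stream-processing step above amounts to bucketing timestamped log records into fixed time windows, each full window yielding one target running log text. The 60-second window size and the `(timestamp, line)` record format are assumptions for illustration:

```python
WINDOW_SECONDS = 60  # hypothetical preset time window

def window_logs(records):
    """records: iterable of (unix_timestamp, line). Groups lines by the
    preset time window and joins each group into one running log text."""
    windows = {}
    for ts, line in records:
        start = ts - ts % WINDOW_SECONDS  # left edge of the window
        windows.setdefault(start, []).append(line)
    return {start: "\n".join(lines) for start, lines in windows.items()}

records = [(100, "boot ok"), (110, "conn timeout"), (170, "retry failed")]
texts = window_logs(records)
```

A real deployment would do this incrementally over a log stream (e.g. with a streaming framework), but the grouping logic is the same.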
In some embodiments, the apparatus further comprises:
a sample training set acquisition module: the method comprises the steps of obtaining a sample training set before obtaining a target running log text corresponding to a preset time window of an application program to be detected, wherein the sample training set comprises a plurality of sample log text sets corresponding to the preset time window and fault labels matched with the sample log text sets.
The sample word segmentation sequence generation module: and the method is used for performing word segmentation processing and labeling processing on each sample log text in the sample log text set to obtain each sample word segmentation sequence corresponding to each sample log text.
A model training module: used for taking the sample word segmentation sequences as the input of the initial prediction model and the corresponding fault labels as the expected output of the initial prediction model, and performing constraint training of fault prediction on the initial prediction model to obtain the target fault prediction model.
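The training setup above pairs each sample word segmentation sequence (input) with its matched fault label (expected output). As a minimal stand-in for the initial prediction model, the sketch below fits a trivial word-scoring classifier; the sample data and scoring rule are illustrative only:

```python
from collections import Counter

def train(samples):
    """samples: list of (participle_sequence, label), label 1 = fault.
    'Training' here just tallies how often each participle appears in
    faulty vs. normal windows -- a toy surrogate for the real model."""
    fault, normal = Counter(), Counter()
    for seq, label in samples:
        (fault if label == 1 else normal).update(seq)
    return fault, normal

def predict(model, seq):
    """Score a word segmentation sequence; positive score -> fault."""
    fault, normal = model
    score = sum(fault[w] - normal[w] for w in seq)
    return 1 if score > 0 else 0

model = train([(["timeout", "retry"], 1), (["boot", "ok"], 0)])
```

The patented model is of course a neural network trained by gradient descent against the fault labels; only the input/expected-output contract is the same.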
In some embodiments, the sample training set acquisition module may include:
a data acquisition unit: the method and the device are used for obtaining online question feedback data submitted by a user based on the online question and sample log texts corresponding to the submission time of the online question feedback data.
A log text aggregation processing unit: and the method is used for carrying out aggregation processing on the sample log texts by taking a preset time window as an aggregation window based on the time sequence to obtain a sample log text set corresponding to the preset time window.
A feedback data aggregation processing unit: and the method is used for aggregating the on-line problem feedback data corresponding to the sample log text set to obtain a feedback data aggregation result.
A failure tag generation unit: used for generating a fault label matched with the sample log text set according to the feedback data aggregation result.
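One plausible labeling rule for the aggregation steps above: count on-line problem feedback items per time window, and mark a window's sample log text set as faulty when the aggregated count reaches a threshold. The threshold and window size are hypothetical; the patent does not fix a specific rule here:

```python
THRESHOLD = 3  # hypothetical: feedback items needed to label a window faulty

def label_windows(feedback_times, window_seconds=60):
    """feedback_times: unix timestamps of on-line problem feedback
    submissions. Returns {window_start: fault_label} where fault_label
    is 1 if the aggregated feedback count reaches the threshold."""
    counts = {}
    for ts in feedback_times:
        start = ts - ts % window_seconds
        counts[start] = counts.get(start, 0) + 1
    return {start: int(n >= THRESHOLD) for start, n in counts.items()}

labels = label_windows([65, 70, 72, 130])
```

Because the labels come straight from user feedback, no manual annotation or hand-built rule base is needed, which is the labeling advantage the description emphasizes.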
In some embodiments, the sample participle sequence generation module comprises:
a log text labeling unit: and the method is used for labeling each sample log text in the corresponding sample log text set according to the feedback data aggregation result to obtain each corresponding labeled log text.
A word segmentation processing unit: and the method is used for performing word segmentation processing on the corresponding labeled log texts to obtain the sample word segmentation sequences corresponding to the sample log texts.
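The word segmentation step for the labeled log texts can be sketched with a simple regex tokenizer. Real log tokenization would be richer (handling paths, hex values, identifiers), so this is only an illustration of the step, not the patented segmenter:

```python
import re

def segment(log_text):
    """Split a labeled log text into participles on non-alphanumeric
    boundaries, lowercased; empty fragments are dropped."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", log_text.lower()) if tok]

seq = segment("ERROR: conn-timeout (retry=3)")
```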
The device embodiments described above and the method embodiments are based on the same application concept.
The embodiment of the application provides a failure prediction device of an application program, which includes a processor and a memory, where the memory stores at least one instruction or at least one program, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the failure prediction method of the application program provided by the above method embodiment.
The memory may be used to store software programs and modules, and the processor may execute various functional applications and data processing by operating the software programs and modules stored in the memory. The memory can mainly comprise a program storage area and a data storage area, wherein the program storage area can store an operating system, application programs needed by functions and the like; the storage data area may store data created according to use of the device, and the like. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, the memory may also include a memory controller to provide the processor access to the memory.
The method provided by the embodiment of the application can be executed in a mobile terminal, a computer terminal, a server or a similar computing device. Taking running on a server as an example, fig. 10 is a hardware structure block diagram of the server of the failure prediction method for an application program according to the embodiment of the present application. As shown in fig. 10, the server 800 may vary considerably depending on configuration or performance, and may include one or more Central Processing Units (CPUs) 810 (the processor 810 may include but is not limited to a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 830 for storing data, and one or more storage media 820 (e.g., one or more mass storage devices) for storing applications 823 or data 822. The memory 830 and the storage medium 820 may be transient or persistent storage. The program stored in the storage medium 820 may include one or more modules, each of which may include a series of instruction operations for the server. Still further, the central processor 810 may be configured to communicate with the storage medium 820 to execute a series of instruction operations in the storage medium 820 on the server 800. The server 800 may also include one or more power supplies 860, one or more wired or wireless network interfaces 850, one or more input-output interfaces 840, and/or one or more operating systems 821, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, and so forth.
The input-output interface 840 may be used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the server 800. In one example, the input-output interface 840 includes a network adapter (NIC) that may be coupled to other network devices via a base station to communicate with the internet. In another example, the input-output interface 840 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
It will be understood by those skilled in the art that the structure shown in fig. 10 is merely illustrative and is not intended to limit the structure of the electronic device. For example, server 800 may also include more or fewer components than shown in FIG. 10, or have a different configuration than shown in FIG. 10.
Embodiments of the present application further provide a storage medium, where the storage medium may be disposed in a server to store at least one instruction or at least one program for implementing the failure prediction method of an application program in the method embodiments, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the failure prediction method of an application program provided by the method embodiments.
Alternatively, in this embodiment, the storage medium may be located in at least one network server of a plurality of network servers of a computer network. Optionally, in this embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations described above.
As can be seen from the above embodiments of the method, apparatus, device, server or storage medium for predicting the failure of an application program provided by the present application, the target running log text corresponding to the preset time window of the application program to be detected is obtained; word segmentation processing is performed on the target running log text to obtain a log text word segmentation sequence corresponding to the target running log text; and the target fault prediction model is called to perform fault prediction on the log text word segmentation sequence to obtain a fault prediction result of the application program to be detected. In this way, corresponding fault prediction and early warning can be realized before large-scale exposure of on-line problems, developers can be assisted in locating and troubleshooting on-line problems in advance without users being aware of it, and the user experience is improved. Moreover, the model training data are generated based on the on-line problem feedback data of the application program and the corresponding sample log texts: the labeled data are simple and convenient to obtain, are strongly correlated with the on-line problems, and require neither a large number of support rules nor a complex knowledge graph to be constructed, so the accuracy of model prediction and the training efficiency can be effectively improved.
It should be noted that: the sequence of the embodiments of the present application is only for description, and does not represent the advantages and disadvantages of the embodiments. And specific embodiments thereof have been described above. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The embodiments in the present application are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, device and storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware to implement the above embodiments, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method for predicting failure of an application, the method comprising:
acquiring a sample training set, wherein the sample training set comprises a plurality of sample log text sets corresponding to preset time windows and fault labels matched with the sample log text sets;
performing word segmentation processing and labeling processing on each sample log text in the sample log text set to obtain each sample word segmentation sequence corresponding to each sample log text;
taking the sample word segmentation sequences as the input of an initial prediction model, taking the corresponding fault labels as the expected output of the initial prediction model, and performing constraint training of fault prediction on the initial prediction model to obtain a target fault prediction model;
acquiring a target running log text corresponding to the preset time window of the application program to be detected;
performing word segmentation processing on the target operation log text to obtain a log text word segmentation sequence corresponding to the target operation log text;
calling the target fault prediction model to carry out fault prediction on the log text word segmentation sequence to obtain a fault prediction result of the application program to be detected;
the target fault prediction model is obtained by performing constraint training of fault prediction on an initial prediction model based on a sample log text set corresponding to the preset time window and the matched fault label; the sample log text set and the fault label are generated based on-line problem feedback data corresponding to the preset time window and sample log text corresponding to the on-line problem feedback data;
the obtaining a sample training set comprises:
acquiring online problem feedback data submitted by a user based on an online problem and a sample log text corresponding to the submission time of the online problem feedback data;
based on a time sequence and with the preset time window as an aggregation window, carrying out aggregation processing on the sample log text to obtain a sample log text set corresponding to the preset time window;
aggregating the on-line problem feedback data corresponding to the sample log text set to obtain a feedback data aggregation result;
and generating a fault label matched with the sample log text set according to the feedback data aggregation result.
2. The method according to claim 1, wherein the target failure prediction model comprises an embedded network and a feature encoder, and accordingly, the calling the target failure prediction model to perform failure prediction on the log text word segmentation sequence to obtain the failure prediction result of the application program to be detected comprises:
encoding the log text word segmentation sequence by utilizing an embedded network of the target fault prediction model to obtain a log word segmentation vector sequence corresponding to the log text word segmentation sequence;
performing self-attention processing on the log word segmentation vector sequence by using a feature encoder of the target fault prediction model to obtain a self-attention vector corresponding to the log word segmentation vector sequence;
and carrying out forward propagation processing on the self-attention vector to obtain the fault prediction result.
3. The method of claim 2, wherein the embedded network includes a first embedded layer and a second embedded layer, and accordingly, the encoding the log text word segmentation sequence by the embedded network using the target failure prediction model to obtain a log word segmentation vector sequence corresponding to the log text word segmentation sequence includes:
performing word embedding processing on each log word in the log text word segmentation sequence through the first embedding layer to obtain a word vector of each log word;
performing position embedding processing on each log word segmentation based on the position of each log word segmentation in the log text word segmentation sequence through the second embedding layer to obtain a position vector of each log word segmentation;
and performing fusion processing on the word vector and the position vector of each log word segmentation to obtain the log word segmentation vector sequence.
4. The method of claim 2, wherein the feature encoder comprises a vector transformation layer and a self-attention layer, and wherein the self-attention processing of the log participle vector sequence by the feature encoder using the target failure prediction model to obtain the self-attention vector corresponding to the log participle vector sequence comprises:
performing weight matrix conversion on each participle vector in the log participle vector sequence by using the vector conversion layer to obtain a characteristic vector sequence;
and performing self-attention processing on the feature vector sequence by utilizing the self-attention layer to obtain the self-attention vector.
5. The method according to claim 4, wherein the self-attention layer is a network layer constructed based on a multi-head self-attention mechanism, and accordingly, the self-attention processing the feature vector sequence by using the self-attention layer to obtain the self-attention vector comprises:
and splitting each feature vector in the feature vector sequence into a preset number of feature sub-vectors by using the self-attention layer, performing multi-head self-attention calculation on each feature sub-vector corresponding to each feature vector based on the multi-head self-attention mechanism to obtain a self-attention value of each feature sub-vector, and generating the self-attention vector according to the self-attention value of each feature sub-vector.
6. The method according to any one of claims 1 to 5, wherein the obtaining of the target running log text corresponding to the preset time window of the application program to be detected comprises:
acquiring running log data of the application program to be detected;
and performing stream processing on the running log data to obtain a target running log text corresponding to the preset time window.
7. The method according to claim 1, wherein the performing word segmentation processing and labeling processing on each sample log text in the sample log text set to obtain each sample word segmentation sequence corresponding to each sample log text comprises:
marking each sample log text in the corresponding sample log text set according to the feedback data aggregation result to obtain each corresponding marked log text;
and performing word segmentation processing on the corresponding labeled log texts to obtain sample word segmentation sequences corresponding to the sample log texts.
8. An apparatus for predicting failure of an application, the apparatus comprising:
a sample training set acquisition module: the method comprises the steps of obtaining a sample training set, wherein the sample training set comprises a plurality of sample log text sets corresponding to preset time windows and fault labels matched with the sample log text sets;
the sample word segmentation sequence generation module: used for performing word segmentation processing and labeling processing on each sample log text in the sample log text set to obtain each sample word segmentation sequence corresponding to each sample log text;
a model training module: used for taking the sample word segmentation sequences as the input of the initial prediction model and the corresponding fault labels as the expected output of the initial prediction model, and performing constraint training of fault prediction on the initial prediction model to obtain the target fault prediction model;
a log text acquisition module: the target running log text corresponding to the preset time window of the application program to be detected is obtained;
a word segmentation processing module: the log text word segmentation processing module is used for carrying out word segmentation processing on the target operation log text to obtain a log text word segmentation sequence corresponding to the target operation log text;
a failure prediction module: used for calling the target failure prediction model to perform failure prediction on the log text word segmentation sequence to obtain a failure prediction result of the application program to be detected;
the target fault prediction model is obtained by performing constraint training of fault prediction on an initial prediction model based on a sample log text set corresponding to the preset time window and the matched fault label; the sample log text set and the fault label are generated based on-line problem feedback data corresponding to the preset time window and sample log text corresponding to the on-line problem feedback data;
the sample training set acquisition module comprises:
a data acquisition unit: used for acquiring on-line question feedback data submitted by a user based on an on-line question and a sample log text corresponding to the submission time of the on-line question feedback data;
a log text aggregation processing unit: used for performing aggregation processing on the sample log texts by taking the preset time window as an aggregation window based on a time sequence to obtain a sample log text set corresponding to the preset time window;
a feedback data aggregation processing unit: used for aggregating the on-line problem feedback data corresponding to the sample log text set to obtain a feedback data aggregation result;
a failure tag generation unit: used for generating a fault label matched with the sample log text set according to the feedback data aggregation result.
9. A failure prediction device for an application program, characterized in that the device comprises a processor and a memory, where at least one instruction or at least one program is stored in the memory, and the at least one instruction or the at least one program is loaded and executed by the processor to implement the failure prediction method for an application program as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, in which at least one instruction or at least one program is stored, which is loaded and executed by a processor to implement the failure prediction method for an application program as claimed in any one of claims 1 to 7.
CN202110633968.8A 2021-06-07 2021-06-07 Application program fault prediction method and device Active CN113094200B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110633968.8A CN113094200B (en) 2021-06-07 2021-06-07 Application program fault prediction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110633968.8A CN113094200B (en) 2021-06-07 2021-06-07 Application program fault prediction method and device

Publications (2)

Publication Number Publication Date
CN113094200A CN113094200A (en) 2021-07-09
CN113094200B true CN113094200B (en) 2021-08-24

Family

ID=76666077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110633968.8A Active CN113094200B (en) 2021-06-07 2021-06-07 Application program fault prediction method and device

Country Status (1)

Country Link
CN (1) CN113094200B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114036293B (en) * 2021-11-03 2023-06-06 腾讯科技(深圳)有限公司 Data processing method and device and electronic equipment
CN113743110B (en) * 2021-11-08 2022-02-11 京华信息科技股份有限公司 Word missing detection method and system based on fine-tuning generation type confrontation network model
CN114462018B (en) * 2022-01-10 2023-05-30 电子科技大学 Password guessing system and method based on transducer model and deep reinforcement learning
CN115270125A (en) * 2022-08-11 2022-11-01 江苏安超云软件有限公司 IDS log classification prediction method, device, equipment and storage medium
CN115509789B (en) * 2022-09-30 2023-08-11 中国科学院重庆绿色智能技术研究院 Method and system for predicting faults of computing system based on component call analysis
CN115480946A (en) * 2022-10-11 2022-12-16 中国电信股份有限公司 Fault detection model modeling method, protection implementation method and related equipment
CN115913989B (en) * 2022-11-08 2023-09-19 广州鲁邦通物联网科技股份有限公司 Resource protection method of cloud management platform
CN115687031A (en) * 2022-11-15 2023-02-03 北京优特捷信息技术有限公司 Method, device, equipment and medium for generating alarm description text
CN116016122A (en) * 2022-12-05 2023-04-25 中国联合网络通信集团有限公司 Network fault solution prediction method, device, equipment and storage medium
CN116010896A (en) * 2023-02-03 2023-04-25 南京南瑞继保电气有限公司 Wind driven generator fault diagnosis method based on countermeasure training and transducer
CN116402219A (en) * 2023-03-29 2023-07-07 中科航迈数控软件(深圳)有限公司 Full life cycle operation and maintenance strategy method and device based on prediction model
CN117349129B (en) * 2023-12-06 2024-03-29 广东无忧车享科技有限公司 Abnormal optimization method and system for vehicle sales process service system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105653444A (en) * 2015-12-23 2016-06-08 北京大学 Internet log data-based software defect failure recognition method and system
WO2018005489A1 (en) * 2016-06-27 2018-01-04 Purepredictive, Inc. Data quality detection and compensation for machine learning
CN110569909A (en) * 2019-09-10 2019-12-13 腾讯科技(深圳)有限公司 fault early warning method, device, equipment and storage medium based on block chain
WO2020075019A1 (en) * 2018-10-09 2020-04-16 International Business Machines Corporation Prediction model enhancement
CN111078479A (en) * 2019-09-26 2020-04-28 腾讯科技(深圳)有限公司 Memory detection model training method, memory detection method and device
CN111178378A (en) * 2019-11-07 2020-05-19 腾讯科技(深圳)有限公司 Equipment fault prediction method and device, electronic equipment and storage medium
CN111813587A (en) * 2020-05-28 2020-10-23 国网山东省电力公司 Software interface evaluation and fault early warning method and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9262255B2 (en) * 2013-03-14 2016-02-16 International Business Machines Corporation Multi-stage failure analysis and prediction
US20160210661A1 (en) * 2014-12-31 2016-07-21 Anto Chittilappilly Managing digital media spend allocation using calibrated user-level attribution data
US9959158B2 (en) * 2015-10-13 2018-05-01 Honeywell International Inc. Methods and apparatus for the creation and use of reusable fault model components in fault modeling and complex system prognostics
US11537390B2 (en) * 2018-01-05 2022-12-27 Syracuse University Smart products lifecycle management platform
WO2019142331A1 (en) * 2018-01-19 2019-07-25 株式会社日立製作所 Failure prediction system and failure prediction method
US10884842B1 (en) * 2018-11-14 2021-01-05 Intuit Inc. Automatic triaging
CN111045939B (en) * 2019-12-09 2021-03-30 山西大学 Weibull distributed fault detection open source software reliability modeling method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105653444A (en) * 2015-12-23 2016-06-08 北京大学 Internet log data-based software defect failure recognition method and system
WO2018005489A1 (en) * 2016-06-27 2018-01-04 Purepredictive, Inc. Data quality detection and compensation for machine learning
WO2020075019A1 (en) * 2018-10-09 2020-04-16 International Business Machines Corporation Prediction model enhancement
CN110569909A (en) * 2019-09-10 2019-12-13 腾讯科技(深圳)有限公司 fault early warning method, device, equipment and storage medium based on block chain
CN111078479A (en) * 2019-09-26 2020-04-28 腾讯科技(深圳)有限公司 Memory detection model training method, memory detection method and device
CN111178378A (en) * 2019-11-07 2020-05-19 腾讯科技(深圳)有限公司 Equipment fault prediction method and device, electronic equipment and storage medium
CN111813587A (en) * 2020-05-28 2020-10-23 国网山东省电力公司 Software interface evaluation and fault early warning method and system

Also Published As

Publication number Publication date
CN113094200A (en) 2021-07-09

Similar Documents

Publication Publication Date Title
CN113094200B (en) Application program fault prediction method and device
CN110597991B (en) Text classification method and device, computer equipment and storage medium
US10699195B2 (en) Training of artificial neural networks using safe mutations based on output gradients
US11907675B2 (en) Generating training datasets for training neural networks
CN113052149B (en) Video abstract generation method and device, computer equipment and medium
CN113919344A (en) Text processing method and device
CN110866119B (en) Article quality determination method and device, electronic equipment and storage medium
CN110633360B (en) Semantic matching method and related device
CN112417887B (en) Sensitive word and sentence recognition model processing method and related equipment thereof
CN113791757A (en) Software requirement and code mapping method and system
CN113704410A (en) Emotion fluctuation detection method and device, electronic equipment and storage medium
CN113128196A (en) Text information processing method and device, storage medium
Zhang et al. Putracead: Trace anomaly detection with partial labels based on gnn and pu learning
CN114610613A (en) Online real-time micro-service call chain abnormity detection method
Huang et al. Improving log-based anomaly detection by pre-training hierarchical transformers
US10614100B2 (en) Semantic merge of arguments
CN112749556B (en) Multi-language model training method and device, storage medium and electronic equipment
CN117312562A (en) Training method, device, equipment and storage medium of content auditing model
CN114969334B (en) Abnormal log detection method and device, electronic equipment and readable storage medium
CN116739408A (en) Power grid dispatching safety monitoring method and system based on data tag and electronic equipment
CN113821418B (en) Fault root cause analysis method and device, storage medium and electronic equipment
Tao et al. Biglog: Unsupervised large-scale pre-training for a unified log representation
CN112989024B (en) Method, device and equipment for extracting relation of text content and storage medium
Chandra et al. An Enhanced Deep Learning Model for Duplicate Question Detection on Quora Question pairs using Siamese LSTM
CN113627514A (en) Data processing method and device of knowledge graph, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40049213
Country of ref document: HK