CN116225770B - Patch matching method, device, equipment and storage medium - Google Patents

Info

Publication number
CN116225770B
Authority
CN
China
Prior art keywords
downtime
similarity
patch
candidate
call stack
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310484005.5A
Other languages
Chinese (zh)
Other versions
CN116225770A (en)
Inventor
翟明晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Application filed by Alibaba Cloud Computing Ltd
Priority to CN202310484005.5A
Publication of CN116225770A
Application granted
Publication of CN116225770B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0781 Error filtering or prioritizing based on a policy defined by the user or on a policy defined by a hardware/software module, e.g. according to a severity level
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Stored Programmes (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of the application provide a patch matching method, apparatus, device and storage medium. In the embodiments, a target downtime log generated after a target machine goes down is obtained; a downtime call stack and a downtime exception log from the time the target machine went down are extracted from the target downtime log; call stacks and patch information of a plurality of candidate patches are obtained; a first similarity is determined based on the Levenshtein distance between the downtime call stack and the call stack of each of the plurality of candidate patches, and a second similarity is determined based on the cosine similarity between the downtime exception log and the patch information of each of the plurality of candidate patches; and a target patch for repairing the target machine is determined from the plurality of candidate patches based on the first similarity and the second similarity.

Description

Patch matching method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a patch matching method, device, apparatus, and storage medium.
Background
As cloud computing grows in scale, Linux kernel downtime occurs more often. Fault analysis of downtime typically demands considerable effort and time from specialized operation and maintenance personnel who are familiar with kernel development. To make this analysis more efficient, existing kernel fault analysis often finds patches that may address the downtime problem by matching keywords such as the call stack. However, the accuracy of this patch matching approach still needs improvement. A further solution is therefore needed to improve both the efficiency of downtime fault analysis and the accuracy of patch matching.
Disclosure of Invention
Aspects of the application provide a patch matching method, device, equipment and storage medium, which are used for improving the failure analysis efficiency and patch matching accuracy of downtime.
The embodiment of the application provides a patch matching method, which comprises the following steps: obtaining a target downtime log generated after the target machine goes down; extracting, from the target downtime log, a downtime call stack and a downtime exception log from the time the target machine went down; obtaining call stacks and patch information of a plurality of candidate patches; determining a first similarity based on the Levenshtein distance between the downtime call stack and the call stack of each candidate patch of the plurality of candidate patches, and determining a second similarity based on the cosine similarity between the downtime exception log and the patch information of each candidate patch of the plurality of candidate patches; and determining a target patch for repairing the target machine from the plurality of candidate patches based on the first similarity and the second similarity.
The embodiment of the application also provides a patch matching device, which comprises: a first acquisition module, configured to obtain a target downtime log generated after the target machine goes down; an extraction module, configured to extract, from the target downtime log, a downtime call stack and a downtime exception log from the time the target machine went down; a second acquisition module, configured to obtain call stacks and patch information of a plurality of candidate patches; a similarity determination module, configured to determine a first similarity based on the Levenshtein distance between the downtime call stack and the call stack of each candidate patch of the plurality of candidate patches, and to determine a second similarity based on the cosine similarity between the downtime exception log and the patch information of each candidate patch of the plurality of candidate patches; and a patch determination module, configured to determine a target patch for repairing the target machine from the plurality of candidate patches based on the first similarity and the second similarity.
The embodiment of the application also provides an electronic device, which comprises a memory and a processor. The memory is used for storing a computer program. The processor, coupled to the memory, is configured to execute the computer program for: obtaining a target downtime log generated after the target machine goes down; extracting, from the target downtime log, a downtime call stack and a downtime exception log from the time the target machine went down; obtaining call stacks and patch information of a plurality of candidate patches; determining a first similarity based on the Levenshtein distance between the downtime call stack and the call stack of each candidate patch of the plurality of candidate patches, and determining a second similarity based on the cosine similarity between the downtime exception log and the patch information of each candidate patch of the plurality of candidate patches; and determining a target patch for repairing the target machine from the plurality of candidate patches based on the first similarity and the second similarity.
The present application also provides a computer readable storage medium storing a computer program, which when executed by a processor, causes the processor to implement the steps in the patch matching method provided by the embodiment of the present application.
In the embodiment of the application, a target downtime log generated after the target machine goes down can be obtained, and a downtime call stack and a downtime exception log from the time the target machine went down are extracted from it. Call stacks and patch information of a plurality of candidate patches are obtained as well. A first similarity is then determined based on the Levenshtein distance between the downtime call stack and the call stack of each candidate patch, and a second similarity is determined based on the cosine similarity between the downtime exception log and the patch information of each candidate patch. Finally, the first similarity and the second similarity are combined to determine, from the plurality of candidate patches, the target patch for repairing the target machine.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
Fig. 1 is a schematic process diagram of a patch matching method according to an exemplary embodiment of the present application;
Fig. 2 is a system flow diagram of a patch matching method according to an exemplary embodiment of the present application;
Fig. 3 is a schematic structural diagram of a patch matching device according to an exemplary embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
First, terms related to one or more embodiments of the present application will be explained.
Downtime: downtime refers to the phenomenon that the operating system cannot recover from a serious system error, or serious problems occur in the hardware level of the system, so that the system does not respond for a long time, and the computer has to be restarted. The downtime causes a service disruption.
Downtime call stack: the function call stack at the time of downtime, which can be printed with the dmesg command.
Downtime information: a kernel log generated after downtime occurs.
Deep learning: deep learning is an artificial intelligence method for teaching computers to process data in a manner inspired by the human brain. The deep learning model may identify complex patterns in pictures, text, sound, and other data, extract semantic features of the data, and thereby generate accurate insights and predictions.
TF-IDF: TF-IDF (term frequency-inverse document frequency) is a common weighting technique in information retrieval and data mining. TF is the term frequency and IDF is the inverse document frequency.
Edit distance: the edit distance quantifies the difference between two strings as the minimum number of operations required to change one string into the other. The edit distance may be used in natural language processing.
Levenshtein distance: the Levenshtein distance is one type of edit distance. It is the minimum number of editing operations required to convert one string into another, where the allowed editing operations are replacing one character with another, inserting one character and deleting one character.
Expression mode conversion: for a sentence in a language database, keywords are selected from the sentence and recombined in a designated expression mode, such that the semantics of the recombined sentence are highly similar to those of the original sentence.
Sentence Transformers: a sentence vector conversion model, that is, a method based on a pre-trained language model that can convert a sentence into a 1×N-dimensional embedded vector and can learn to adjust sentence embedded vectors so as to capture latent semantics.
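As a point of reference for the edit-distance terms defined above, the following is a minimal Python sketch of the standard (unit-cost) Levenshtein distance between two strings. The function name `levenshtein` is chosen here for illustration and does not come from the application.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and
    replacements needed to turn s into t (every operation costs 1)."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into "" takes i deletions
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # replace cs with ct (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))
```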
As described in the Background, conventional matching schemes usually rely only on keyword matching, such as matching the call stack, and ignore the semantics of the downtime log, so information in the downtime log is lost and matching accuracy is low. A general-purpose search engine covers a wider range of sources when searching for the cause of downtime, but its search box limits the input length, so the whole downtime log cannot be used as input and the search results are poor. Some upstream community websites that provide patches allow the causes of similar downtime to be searched among collected downtime exception reports in order to find a patch that repairs the downtime; however, because the input length is limited, the downtime cause has to be summarized manually before it is used as input, which is inefficient.
In view of this, in some embodiments of the present application, a target downtime log generated after a target machine goes down may be obtained, and a downtime call stack and a downtime exception log from the time the target machine went down may be extracted from it. Call stacks and patch information of a plurality of candidate patches may be obtained as well. A first similarity may then be determined based on the Levenshtein distance between the downtime call stack and the call stack of each candidate patch, and a second similarity may be determined based on the cosine similarity between the downtime exception log and the patch information of each candidate patch. Finally, the first similarity and the second similarity may be combined to determine, from the plurality of candidate patches, the target patch for repairing the target machine.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
Fig. 1 is a flowchart of a patch matching method according to an exemplary embodiment of the present application. As shown in fig. 1, the method includes:
Step 110, obtaining a target downtime log generated after the target machine goes down.
Step 120, extracting, from the target downtime log, a downtime call stack and a downtime exception log from the time the target machine went down.
In some exemplary embodiments, to improve patch matching efficiency and avoid unnecessary matching computation effort, the target downtime log may be processed to remove redundant information in the target downtime log that is not related to the target machine downtime, thereby extracting downtime-related information. Specifically, extracting a downtime call stack and a downtime exception log when a target machine is downtime from a target downtime log comprises:
extracting a downtime call stack and an initial downtime exception log from the target downtime log by using regular expressions;
performing data cleaning on the initial downtime exception log to filter out redundant information unrelated to the downtime of the target machine;
removing stop words and C language keywords from the filtered initial downtime exception log, and performing stem extraction on the initial downtime exception log after the removal;
and filling the initial downtime exception log after stem extraction into the preset output format of the downtime exception log to obtain the downtime exception log.
A downtime call stack may include a plurality of kernel functions, such as f1, f2, f3 and f4, whose call relationship may be f4(f3(f2(f1))). Each kernel function may be called a stack frame. In this call relationship, f1 may be called the stack top and f4 the stack bottom.
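A minimal sketch of the regular-expression extraction of a call stack, assuming the log uses the `symbol+0xoffset/0xsize` frame notation that Linux kernel call traces commonly print, with the most recently called function listed first. The sample log text, the pattern and the name `extract_call_stack` are illustrative assumptions, not taken from the application.

```python
import re
from typing import List

# Assumed frame notation: "<symbol>+0x<offset>/0x<size>".
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+")


def extract_call_stack(log: str) -> List[str]:
    """Return the function names found after 'Call Trace:', stack top first."""
    _, marker, tail = log.partition("Call Trace:")
    if not marker:  # no call trace in this log
        return []
    return FRAME_RE.findall(tail)


# Hypothetical log excerpt using the placeholder function names f1..f4.
SAMPLE = """\
[  101.000001] BUG: unable to handle kernel NULL pointer dereference
[  101.000002] Call Trace:
[  101.000003]  f1+0x1a/0x40
[  101.000004]  f2+0x2b/0x90
[  101.000005]  ? f3+0x10/0x60
[  101.000006]  f4+0x5/0x20
"""

print(extract_call_stack(SAMPLE))
```

Here f1 comes out first, matching the description of f1 as the stack top and f4 as the stack bottom.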
The redundant information unrelated to the downtime of the target machine may include items that do not bear on the cause of downtime, such as timestamps and punctuation marks. Stop words are words that are usually ignored in text processing because they generally contribute little to the meaning of the text; common stop words include pronouns, prepositions, conjunctions and articles. C language keywords, also commonly called reserved words, are strings to which the C language assigns a specific meaning, for example int, char, long, float and unsigned; they are likewise unrelated to the cause of downtime. The embodiment of the application can therefore remove the stop words and C language keywords from the initial downtime exception log. The stop words and C language keywords can be located by means of regular expressions. Stem extraction is the process of removing affixes to obtain the root of a word, and it effectively extracts the key downtime-related information in the initial downtime exception log. Finally, to improve data processing efficiency, the initial downtime exception log after stem extraction can be filled into the preset output format of the downtime exception log to obtain the downtime exception log; that is, the data representation of the downtime exception log is unified.
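The cleaning steps described above can be sketched roughly as follows. The word lists are deliberately tiny, the timestamp pattern assumes dmesg-style `[seconds.micros]` prefixes, and `naive_stem` is a crude placeholder for a real stemmer; all names here are illustrative assumptions rather than the application's implementation.

```python
import re

# Tiny illustrative lists; a real system would use fuller ones.
STOP_WORDS = {"the", "a", "an", "is", "at", "in", "of", "to", "and", "it"}
C_KEYWORDS = {"int", "char", "long", "float", "unsigned", "void", "struct"}


def naive_stem(word: str) -> str:
    """Placeholder stemmer: strip one common English suffix, if any."""
    for suffix in ("ing", "ed", "s"):
        if (word.endswith(suffix) and not word.endswith("ss")
                and len(word) - len(suffix) >= 3):
            return word[: -len(suffix)]
    return word


def clean_log(raw: str) -> str:
    text = re.sub(r"\[\s*\d+\.\d+\]", " ", raw)  # drop assumed timestamps
    text = re.sub(r"[^\w\s]", " ", text)         # drop punctuation
    tokens = [tok.lower() for tok in text.split()]
    kept = [t for t in tokens if t not in STOP_WORDS and t not in C_KEYWORDS]
    return " ".join(naive_stem(t) for t in kept)


line = "[  123.456789] BUG: unable to handle kernel paging request at unsigned long address"
print(clean_log(line))
```

The crude stemmer turns "paging" into "pag"; a proper stemming algorithm would be substituted in practice.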
Step 130, obtaining call stacks and patch information of a plurality of candidate patches.
The plurality of candidate patches are provided by an upstream community. An upstream community is a community maintained by the original sponsor or a non-profit organization for an open source project, to whose original (upstream) project others are free to contribute, for example by submitting code, fixing bugs or writing documentation. The upstream community typically receives patches that solve various downtime problems; these patches are called candidate patches. When the target machine goes down, the correct candidate patch for the repair can be looked up in the upstream community.
In some exemplary embodiments, to improve patch matching efficiency while avoiding unnecessary matching computation effort, obtaining call stacks and patch information for multiple candidate patches includes:
extracting call stacks and initial patch information of the plurality of candidate patches by using regular expressions;
performing data cleaning on the initial patch information to filter out redundant information unrelated to the cause of downtime;
removing stop words and C language keywords from the filtered initial patch information, and performing stem extraction on the patch information after the removal;
and filling the patch information after stem extraction into a preset patch information output format to obtain the patch information of the plurality of candidate patches.
Step 140, determining a first similarity based on the Levenshtein distance between the downtime call stack and the call stack of each candidate patch of the plurality of candidate patches, and determining a second similarity based on the cosine similarity between the downtime exception log and the patch information of each candidate patch of the plurality of candidate patches.
In some exemplary embodiments, to improve matching accuracy, that is, the accuracy of the first similarity, the embodiments of the present application improve the TF-IDF algorithm by adding position-based information, namely a weight for each kernel function, as described below. Specifically, determining the first similarity based on the Levenshtein distance between the downtime call stack and the call stack of each candidate patch of the plurality of candidate patches includes:
determining the kernel functions in the downtime call stack and in the call stacks of the plurality of candidate patches, and the call relations among those kernel functions;
determining the weights of the kernel functions in the downtime call stack and in the call stacks of the plurality of candidate patches based on the call relations among the kernel functions;
determining the Levenshtein distance between the downtime call stack and the call stacks of the plurality of candidate patches based on the weights, the word frequencies and the inverse document frequencies of the kernel functions in the downtime call stack and in the call stacks of the plurality of candidate patches;
and determining the first similarity based on the Levenshtein distance between the downtime call stack and the call stacks of the plurality of candidate patches and on the weights of the kernel functions in the downtime call stack and in the call stacks of the plurality of candidate patches.
The weight of each kernel function is calculated as follows. Assume each call stack contains N stack frames, where N is typically greater than 1 and one stack frame represents one level of function call. For example, if kernel function f calls kernel function g, kernel function g calls kernel function h, and the downtime occurs in kernel function h, the call stack is f(g(h)). Each of the kernel functions h, g and f represents a stack frame; the stack-top frame is kernel function h and the stack-bottom frame is kernel function f. The initial weight of the stack-top frame is 1, and the weight of each calling frame is 1 more than that of the frame it calls. That is, the weight of stack-top frame h is 1, the weight of frame g = initial weight + 1 = 2, and the weight of stack-bottom frame f = weight of frame g + 1 = 3. In this way, position information reflecting that functions closer to the stack top are more important is introduced into the Levenshtein distance.
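The position-weight rule in the paragraph above, taken exactly as worded there (stack-top frame gets 1, each frame further toward the stack bottom gets one more), can be sketched as below. The function name and the top-first list ordering are assumptions made for this illustration.

```python
from typing import List, Tuple


def position_weights(stack_top_first: List[str]) -> List[Tuple[str, int]]:
    """Pair each frame with its position weight: 1 for the stack top,
    increasing by 1 for every frame toward the stack bottom."""
    return [(fn, depth) for depth, fn in enumerate(stack_top_first, start=1)]


# Call stack f(g(h)) from the example, listed from stack top to stack bottom.
print(position_weights(["h", "g", "f"]))
```

Returning pairs rather than a dictionary keeps the weights correct even if the same function appears in more than one frame.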
In some exemplary embodiments, determining the first similarity based on the Levenshtein distance between the downtime call stack and the call stack of each of the plurality of candidate patches and on the weights of the kernel functions in the downtime call stack and in the call stack of each of the plurality of candidate patches comprises:
determining the sum of the weights of the downtime call stack and the call stack of each candidate patch of the plurality of candidate patches, based on the weights of the kernel functions in the downtime call stack and in the call stack of each candidate patch;
and determining the first similarity based on the ratio of the Levenshtein distance between the downtime call stack and the call stack of each candidate patch to the sum of the weights of the downtime call stack and the call stack of that candidate patch.
Having introduced the position information that functions closer to the stack top are more important, the embodiment of the application improves the Levenshtein distance with the following rules: adding or deleting a kernel function increases the distance by the weight of that kernel function; replacing a kernel function increases the distance by the sum of the weights of the two kernel functions before and after the replacement. The smaller the final Levenshtein distance, the higher the similarity between the two call stacks. Finally, the ratio of the calculated Levenshtein distance to the sum of the weights of all stack frames of the two stacks is taken as the final similarity between the call stacks, that is, the first similarity.
The calculation formula of TF-IDF is TF-IDF = TF × IDF, where TF is the number of times the word appears in the current document. If N documents in the existing document set contain the word, the IDF (inverse document frequency) of the word is 1/N. A bag of words over the kernel functions therefore needs to be constructed, and the number of documents containing each word counted. The word frequency is then used as the initial weight of the word.
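A minimal sketch of the TF-IDF weight as the paragraph above defines it, treating each call stack as a "document" of kernel function names. Note that it uses the 1/N form of IDF stated in the text rather than the logarithmic form common elsewhere; the function name and the sample stacks are hypothetical.

```python
from typing import List


def tf_idf(word: str, doc: List[str], docs: List[List[str]]) -> float:
    """TF x IDF with TF = occurrences of word in doc and
    IDF = 1 / (number of documents in docs that contain word)."""
    tf = doc.count(word)
    n_containing = sum(1 for d in docs if word in d)
    if n_containing == 0:  # word never seen: no weight
        return 0.0
    return tf * (1.0 / n_containing)


# Hypothetical bag of call stacks built from placeholder function names.
stacks = [["f1", "f2", "f1"], ["f1", "g2"], ["f3"]]
print(tf_idf("f1", stacks[0], stacks))
```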
Assume the downtime call stack A is f4(f3(f2(f1))), containing kernel functions f1, f2, f3 and f4, and the call stack B of a candidate patch is f3(g2(f1)), containing kernel functions f1, g2 and f3. According to the above calculation rule, for downtime call stack A the weights of kernel functions f4, f3, f2 and f1 are 1×TF-IDF(f4), 2×TF-IDF(f3), 3×TF-IDF(f2) and 4×TF-IDF(f1), respectively; for call stack B of the candidate patch, the weights of kernel functions f3, g2 and f1 are 1×TF-IDF(f3), 2×TF-IDF(g2) and 3×TF-IDF(f1), respectively. The Levenshtein distance is the minimum number of character deletion, insertion and replacement operations needed to make two strings identical. To make stack A and stack B identical, the Levenshtein distance (that is, the minimum number of operations) is 2.
Call stack B of the candidate patch and downtime call stack A can be made identical in two ways. Method one: replace g2 in stack B with f2 and delete f4 from stack A. Method two: replace g2 in stack B with f2 and add f4. Because every operation in the conventional Levenshtein distance has a weight of 1 (this operation weight may also be called the edit distance), both methods are valid. The present solution improves the Levenshtein distance: the weight of each operation is no longer fixed at 1; instead, the final weight is calculated from the position weight and the TF-IDF weight. The Levenshtein distance for both method one and method two is 2×TF-IDF(g2) + 3×TF-IDF(f2) + 1×TF-IDF(f4). Because searching all edit sequences for the minimum total weight has high time complexity, the present solution first uses the ordinary edit-distance computation to find the functions that need to be edited, and then replaces the unit weights with the weights designed in this application.
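The two-stage procedure described above (find the edited functions with an ordinary unit-cost alignment, then charge each edit its designed weight) can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the per-frame weights are passed in as plain lists, and every TF-IDF factor is set to 1 in the example so that only the position weights from the worked example remain.

```python
from typing import List, Tuple


def edit_ops(a: List[str], b: List[str]) -> List[Tuple[str, int, int]]:
    """Unit-cost Levenshtein alignment; returns the edits as
    (op, index_in_a, index_in_b), with -1 for the unused index."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j - 1] + (a[i - 1] != b[j - 1]),
                          d[i - 1][j] + 1,
                          d[i][j - 1] + 1)
    ops, i, j = [], m, n
    while i > 0 or j > 0:  # walk back along one optimal path
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                ops.append(("replace", i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            ops.append(("delete", i - 1, -1))
            i -= 1
        else:
            ops.append(("insert", -1, j - 1))
            j -= 1
    return ops


def weighted_distance(a, wa, b, wb) -> float:
    """Charge each edit the weight(s) of the kernel function(s) involved."""
    total = 0.0
    for op, i, j in edit_ops(a, b):
        if op == "replace":
            total += wa[i] + wb[j]  # sum of both functions' weights
        elif op == "delete":
            total += wa[i]
        else:
            total += wb[j]
    return total


def first_similarity(a, wa, b, wb) -> float:
    """Ratio of the weighted distance to the sum of all frame weights."""
    weight_sum = sum(wa) + sum(wb)
    return weighted_distance(a, wa, b, wb) / weight_sum if weight_sum else 0.0


A, WA = ["f4", "f3", "f2", "f1"], [1, 2, 3, 4]  # TF-IDF assumed to be 1
B, WB = ["f3", "g2", "f1"], [1, 2, 3]
print(weighted_distance(A, WA, B, WB), first_similarity(A, WA, B, WB))
```

With all TF-IDF factors equal to 1, the weighted distance reduces to 2 + 3 + 1 = 6, which corresponds to the expression given in the text, and the ratio is 6 / 16.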
In some exemplary embodiments, to improve the similarity calculation efficiency, the downtime anomaly logs may be further processed to convert them into an expression of downtime anomaly information in a preset format. Specifically, determining the second similarity based on cosine similarity between the downtime anomaly log and patch information of each candidate patch of the plurality of candidate patches includes:
determining the downtime type of the downtime exception log, and extracting keyword information from the downtime exception log;
processing the downtime type and the keyword information according to a preset information combination mode to obtain downtime exception information in a preset format;
converting the downtime exception information in the preset format and the patch information of each candidate patch of the plurality of candidate patches into embedded vectors of a preset dimension through a sentence vector conversion model;
and determining the second similarity based on the cosine similarity between the embedded vector of the preset dimension corresponding to the downtime exception information in the preset format and the embedded vector of the preset dimension corresponding to the patch information of each candidate patch of the plurality of candidate patches.
The downtime types may include an unhandled kernel page-fault request, a hardware error, a null pointer dereference, a divide-by-zero error, and the like. Each type has its own common keywords, by which downtime can be classified.
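Keyword-based classification of the downtime type might look like the sketch below. The keyword table is an assumption for illustration, loosely modeled on phrases that Linux kernel crash messages commonly contain; the application does not list its keywords.

```python
from typing import Dict, Tuple

# Assumed keyword table (lower-case); not taken from the application.
TYPE_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "null pointer dereference": ("null pointer dereference",),
    "unhandled kernel paging request": ("unable to handle kernel paging request",),
    "divide-by-zero error": ("divide error",),
    "hardware error": ("hardware error", "machine check"),
}


def classify_downtime(log: str) -> str:
    """Return the first downtime type whose keyword occurs in the log."""
    lowered = log.lower()
    for downtime_type, keywords in TYPE_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return downtime_type
    return "unknown"


print(classify_downtime("BUG: unable to handle kernel NULL pointer dereference at 0000"))
```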
Converting the downtime exception information in the preset format and the patch information of each candidate patch of the plurality of candidate patches into embedded vectors of a preset dimension through the sentence vector conversion model means, in particular, converting each of them into a computable embedded vector of 1×1024 dimensions; these vectors contain the extracted log semantic information.
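Once both texts have been converted to embedded vectors, the cosine similarity itself is a short computation. The sketch below uses small hand-written vectors as stand-ins for the 1×1024 embeddings; producing real embeddings with a sentence vector conversion model is outside its scope, and the function name is chosen for illustration.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Placeholder 3-dimensional vectors standing in for real embeddings.
log_vec = [1.0, 2.0, 3.0]
patch_vec = [2.0, 4.0, 6.0]
print(cosine_similarity(log_vec, patch_vec))
```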
In some exemplary embodiments, to fully utilize the semantic information of the patch information of the candidate patch, the similarity between the patch title and the downtime anomaly log, and the similarity between the patch content and the downtime anomaly log may be calculated, respectively, starting from the patch title and the patch content. Specifically, the patch information includes a patch title and patch content, the second similarity includes a first sub-similarity and a second sub-similarity, and determining the second similarity based on an embedded vector of a preset dimension corresponding to the downtime anomaly information of the preset format and a cosine similarity between embedded vectors of preset dimensions corresponding to patch information of each candidate patch of the plurality of candidate patches includes:
determining a first sub-similarity based on cosine similarity between embedded vectors of preset dimensions corresponding to downtime anomaly information of a preset format and embedded vectors of preset dimensions corresponding to patch titles of a plurality of candidate patches;
and determining a second sub-similarity based on the cosine similarity between the embedded vectors of the preset dimension corresponding to the downtime anomaly information of the preset format and the embedded vectors of the preset dimension corresponding to the patch content of each candidate patch of the plurality of candidate patches.
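As a minimal sketch of the two sub-similarities, the following computes the cosine similarity between an embedded vector of the downtime anomaly information and the embedded vectors of a patch title and patch content. The function names are hypothetical, and the vectors are assumed to have already been produced by some statement vector conversion model, which is not reproduced here.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def sub_similarities(
    log_vec: Sequence[float],
    title_vec: Sequence[float],
    content_vec: Sequence[float],
) -> Tuple[float, float]:
    """Return (first sub-similarity, second sub-similarity) for one candidate patch."""
    return cosine_similarity(log_vec, title_vec), cosine_similarity(log_vec, content_vec)
```

In practice the vectors would have the preset dimension (for example 1024) rather than the short vectors one would use to try the function out.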
Step 150, determining a target patch for repairing the target machine from the plurality of candidate patches based on the first similarity and the second similarity.
In some exemplary embodiments, to fuse the similarities across multiple dimensions of feature information, namely downtime call stack versus candidate patch call stack, downtime exception log versus candidate patch title, and downtime exception log versus candidate patch content, the embodiment of the application may determine the final similarity by normalizing the individual similarities and summing their products with the corresponding weights. Specifically, determining a target patch for repairing the target machine from the plurality of candidate patches based on the first similarity and the second similarity includes:
normalizing the first similarity, the first sub-similarity and the second sub-similarity to obtain the normalized first similarity, the normalized first sub-similarity and the normalized second sub-similarity;
determining the similarity between the downtime cause of the target machine and the plurality of candidate patches based on the normalized first similarity, the first sub-similarity and the second sub-similarity, and the weights of the first similarity, the first sub-similarity and the second sub-similarity;
A target patch for repairing the target machine is determined from the plurality of candidate patches based on a similarity between a downtime cause of the target machine and the plurality of candidate patches.
The first similarity, the first sub-similarity and the second sub-similarity are each normalized to the range 0-1, a weight is assigned to each of them, and the weighted sum is taken as the final similarity score: final similarity score = a × first similarity + b × first sub-similarity + c × second sub-similarity, where a + b + c = 1. The specific values of a, b and c are assigned empirically by the developer and adjusted according to the accuracy on an evaluation set. For a downtime log, the similarity score between the downtime log and every upstream patch can be calculated; all upstream patches are then ranked by similarity score, and the top x (for example, the top 10) upstream patches with the highest scores are returned as the patches most likely to repair the downtime.
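The weighted fusion and top-x ranking can be sketched as below. The normalization method is not specified in the text, so min-max scaling across the candidate patches is assumed here; the default weights (0.4, 0.3, 0.3) and the function names are likewise hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to 0-1 (assumed normalization; the text does not name one)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_patches(
    scores: Dict[str, Tuple[float, float, float]],
    weights: Tuple[float, float, float] = (0.4, 0.3, 0.3),  # hypothetical a, b, c
    top_x: int = 10,
) -> List[Tuple[str, float]]:
    """scores maps patch id -> (first similarity, first sub-, second sub-similarity)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights a, b, c must sum to 1")
    if not scores:
        return []
    ids = list(scores)
    # Normalize each of the three similarities across all candidate patches.
    columns = [min_max([scores[pid][k] for pid in ids]) for k in range(3)]
    fused = {
        pid: sum(w * col[idx] for w, col in zip(weights, columns))
        for idx, pid in enumerate(ids)
    }
    ranked = sorted(fused.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_x]
```

Calling `rank_patches(scores, top_x=10)` returns at most ten `(patch id, final score)` pairs in descending order of the final similarity score.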
Fig. 2 is a schematic flow chart of a patch matching method applied to an actual scene according to an embodiment of the present application, including:
S21, extracting downtime log features, which may include a downtime exception log and a downtime call stack;
S22, converting the expression mode of the downtime exception log, specifically by preprocessing the downtime exception log and converting it into the preset information combination mode;
S23, extracting candidate patch features, which include patch titles, patch contents and patch call stacks;
S24, performing embedded vector conversion through a statement vector conversion model, specifically converting the downtime exception log after expression mode conversion, together with the patch titles and patch contents of the candidate patches, into embedded vectors of preset dimensions through the statement vector conversion model, and calculating, based on the embedded vectors, the first sub-similarity between the downtime exception log and the patch titles and the second sub-similarity between the downtime exception log and the patch contents;
S25, calculating the similarity between the downtime call stack and the patch call stack, that is, the first similarity, based on the improved TF-IDF;
S26, calculating a final similarity score, specifically the final similarity score between the downtime log and each of the plurality of candidate patches, based on the first similarity, the first sub-similarity, the second sub-similarity, and the corresponding weights;
S27, outputting the ranking of the plurality of candidate patches based on the final similarity score.
In the patch matching method provided by some embodiments of the present application, a target downtime log generated after a target machine goes down can be obtained; a downtime call stack and a downtime exception log at the time the target machine went down are extracted from the target downtime log; call stacks and patch information of a plurality of candidate patches are obtained; a first similarity is determined based on the Levenshtein distance between the downtime call stack and the call stack of each candidate patch in the plurality of candidate patches; a second similarity is determined based on the cosine similarity between the downtime exception log and the patch information of each candidate patch in the plurality of candidate patches; and finally the first similarity and the second similarity are combined to determine, from the plurality of candidate patches, the target patch for repairing the target machine.
It should be noted that, the execution subjects of each step of the method provided in the above embodiment may be the same device, or the method may also be executed by different devices. For example, the execution subject of steps 110 to 130 may be device a; for another example, the execution subject of steps 110 to 120 may be device a, and the execution subject of step 130 may be device B; etc.
In addition, some of the flows described in the above embodiments and the drawings include a plurality of operations that appear in a specific order. It should be clearly understood, however, that these operations may be performed out of the order in which they appear herein or in parallel; sequence numbers such as 110 and 120 merely distinguish the operations from one another and do not themselves represent any order of execution. The flows may also include more or fewer operations, and those operations may be performed sequentially or in parallel. It should be noted that the terms "first" and "second" herein are used to distinguish different messages, devices, modules and the like; they do not represent a sequence, nor do they require that the "first" and "second" items be of different types.
Fig. 3 is a schematic structural diagram of a patch matching device according to an exemplary embodiment of the present application. As shown in fig. 3, the apparatus includes: a first acquisition module 310, an extraction module 320, a second acquisition module 330, a similarity determination module 340, and a patch determination module 350, wherein:
a first obtaining module 310, configured to obtain a target downtime log generated after the target machine goes down;
an extracting module 320, configured to extract, from the target downtime log, a downtime call stack and a downtime exception log at the time the target machine went down;
a second obtaining module 330, configured to obtain call stacks and patch information of a plurality of candidate patches;
a similarity determining module 340, configured to determine a first similarity based on a Levenshtein distance between the downtime call stack and a call stack of each candidate patch of the plurality of candidate patches, and determine a second similarity based on a cosine similarity between the downtime exception log and patch information of each candidate patch of the plurality of candidate patches;
a patch determination module 350, configured to determine, from the plurality of candidate patches, a target patch for repairing the target machine based on the first similarity and the second similarity.
According to the patch matching device provided by the embodiment of the application, the target downtime log generated after the target machine goes down can be obtained; the downtime call stack and the downtime exception log at the time the target machine went down are extracted from the target downtime log; the call stacks and the patch information of a plurality of candidate patches are obtained; the first similarity is determined based on the Levenshtein distance between the downtime call stack and the call stack of each candidate patch in the plurality of candidate patches; the second similarity is determined based on the cosine similarity between the downtime exception log and the patch information of each candidate patch in the plurality of candidate patches; and finally the first similarity and the second similarity are combined to determine, from the plurality of candidate patches, the target patch for repairing the target machine.
Further optionally, when determining the first similarity based on the Levenshtein distance between the downtime call stack and the call stack of each candidate patch of the plurality of candidate patches, the similarity determining module 340 is specifically configured to:
determine the call relations among the kernel functions in the downtime call stack and in the call stacks of the candidate patches;
determine the weight of each kernel function in the downtime call stack and the call stacks of the candidate patches based on the call relations among the kernel functions in the downtime call stack and the call stacks of the candidate patches;
determine the Levenshtein distance between the downtime call stack and the call stacks of the candidate patches based on the weight, the term frequency and the inverse document frequency of each kernel function in the downtime call stack and the call stacks of the candidate patches;
and determine the first similarity based on the Levenshtein distance between the downtime call stack and the call stacks of the candidate patches and the weight of each kernel function in the downtime call stack and the call stacks of the candidate patches.
Further optionally, when determining the first similarity based on the Levenshtein distance between the downtime call stack and the call stack of each candidate patch of the plurality of candidate patches and the weight of each kernel function in the downtime call stack and the call stack of each candidate patch, the similarity determining module 340 is specifically configured to:
determine the weight sum of the downtime call stack and the call stack of each candidate patch in the plurality of candidate patches based on the weight of each kernel function in the downtime call stack and the call stack of each candidate patch in the plurality of candidate patches;
and determine the first similarity based on the ratio between the Levenshtein distance between the downtime call stack and the call stack of each candidate patch in the plurality of candidate patches and the weight sum of the downtime call stack and the call stack of each candidate patch in the plurality of candidate patches.
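One possible reading of the weighted Levenshtein distance and the distance-to-weight-sum ratio is sketched below. This section does not give the exact cost formula, so several things are assumptions: the per-function weight is taken as a single precomputed number (standing in for the combination of call-relation weight, term frequency and inverse document frequency), insertion and deletion cost the weight of the affected function, substitution costs the larger of the two weights, and the similarity is taken as 1 minus the ratio.

```python
from typing import Callable, Dict, Sequence


def weighted_levenshtein(
    a: Sequence[str], b: Sequence[str], weight: Callable[[str], float]
) -> float:
    """Edit distance over function names where each edit costs the function's weight."""
    rows, cols = len(a) + 1, len(b) + 1
    # dp[i][j] is the cheapest cost of turning a[:i] into b[:j].
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = dp[i - 1][0] + weight(a[i - 1])
    for j in range(1, cols):
        dp[0][j] = dp[0][j - 1] + weight(b[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                substitution = 0.0
            else:
                # Assumed substitution cost: the larger of the two weights.
                substitution = max(weight(a[i - 1]), weight(b[j - 1]))
            dp[i][j] = min(
                dp[i - 1][j] + weight(a[i - 1]),   # delete a[i-1]
                dp[i][j - 1] + weight(b[j - 1]),   # insert b[j-1]
                dp[i - 1][j - 1] + substitution,   # match or substitute
            )
    return dp[-1][-1]


def first_similarity(
    downtime_stack: Sequence[str],
    patch_stack: Sequence[str],
    weights: Dict[str, float],
    default_weight: float = 1.0,  # hypothetical fallback for unseen functions
) -> float:
    """1 - distance / (weight sum of both stacks); lies in [0, 1] for positive weights."""
    def weight(fn: str) -> float:
        return weights.get(fn, default_weight)

    total = sum(weight(f) for f in downtime_stack) + sum(weight(f) for f in patch_stack)
    if total == 0.0:
        return 1.0  # both stacks empty: treated as identical
    return 1.0 - weighted_levenshtein(downtime_stack, patch_stack, weight) / total
```

Dividing by the weight sum keeps stacks of different lengths comparable, which appears to be the purpose of the ratio described above.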
Further optionally, when determining the second similarity based on the cosine similarity between the downtime anomaly log and patch information of each candidate patch of the plurality of candidate patches, the similarity determining module 340 is specifically configured to:
determine the downtime type of the downtime abnormal log, and extract keyword information from the downtime abnormal log;
process the downtime type and the keyword information according to a preset information combination mode to obtain downtime abnormal information in a preset format;
convert the downtime abnormal information in the preset format and the patch information of each candidate patch in the plurality of candidate patches into embedded vectors of a preset dimension through a statement vector conversion model;
and determine the second similarity based on the cosine similarity between the embedded vector of the preset dimension corresponding to the downtime abnormal information in the preset format and the embedded vector of the preset dimension corresponding to the patch information of each candidate patch in the plurality of candidate patches.
Further optionally, the patch information includes a patch title and patch content, the second similarity includes a first sub-similarity and a second sub-similarity, and when determining the second similarity based on the cosine similarity between the embedded vector of the preset dimension corresponding to the downtime anomaly information of the preset format and the embedded vector of the preset dimension corresponding to the patch information of each candidate patch of the plurality of candidate patches, the similarity determining module 340 is specifically configured to:
determine the first sub-similarity based on the cosine similarity between the embedded vector of the preset dimension corresponding to the downtime anomaly information of the preset format and the embedded vector of the preset dimension corresponding to the patch title of each candidate patch of the plurality of candidate patches;
and determine the second sub-similarity based on the cosine similarity between the embedded vector of the preset dimension corresponding to the downtime abnormal information of the preset format and the embedded vector of the preset dimension corresponding to the patch content of each candidate patch of the plurality of candidate patches.
Further optionally, the patch determining module 350 is specifically configured to, when determining, from a plurality of candidate patches, a target patch for repairing the target machine based on the first similarity and the second similarity:
normalizing the first similarity, the first sub-similarity and the second sub-similarity to obtain normalized first similarity, first sub-similarity and second sub-similarity;
determining the similarity between the downtime cause of the target machine and the plurality of candidate patches based on the normalized first similarity, the first sub-similarity and the second sub-similarity, and weights of the first similarity, the first sub-similarity and the second sub-similarity;
and determining a target patch for repairing the target machine from a plurality of candidate patches based on the similarity between the downtime cause of the target machine and the plurality of candidate patches.
Further optionally, the extracting module 320 is specifically configured to, when extracting, from the target downtime log, a downtime call stack and a downtime exception log when the target machine is downtime:
extracting the downtime call stack and the initial downtime abnormal log from the target downtime log by using a regular expression;
data cleaning is carried out on the initial downtime abnormal log so as to filter redundant information irrelevant to the downtime of the target machine from the initial downtime abnormal log;
removing stop words and C language keywords from the filtered initial downtime exception log, and performing stemming on the initial downtime exception log after the removal operation;
and filling the initial downtime abnormal log after the stem extraction operation according to an output format of a preset downtime abnormal log to obtain the downtime abnormal log.
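The extraction and cleaning steps above might look roughly like the following sketch. Everything concrete in it is an assumption: the regular expression targets frames of the form `func+0x12/0x34` after a "Call Trace:" marker, as commonly seen in Linux kernel oops output; the stop-word and C keyword sets are small illustrative samples; and `naive_stem` is a crude suffix stripper standing in for a real stemmer.

```python
import re
from typing import List

# Assumed frame format: optional timestamp, optional [<address>], optional "? ",
# then symbol+0xOFFSET/0xSIZE, as in typical Linux kernel call traces.
FRAME_RE = re.compile(
    r"^\s*(?:\[\s*\d+\.\d+\]\s*)?(?:\[<[0-9a-f]+>\]\s*)?(?:\?\s*)?"
    r"([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+",
    re.MULTILINE,
)

# Illustrative samples only; the actual lists are not given in the application.
STOP_WORDS = {"the", "to", "for", "a", "an", "of", "in", "at", "is", "and"}
C_KEYWORDS = {"static", "int", "void", "char", "struct", "return", "if", "else"}


def extract_call_stack(log_text: str) -> List[str]:
    """Return the function names that follow the 'Call Trace:' marker."""
    start = log_text.find("Call Trace:")
    if start == -1:
        return []
    return FRAME_RE.findall(log_text[start:])


def naive_stem(word: str) -> str:
    """Very rough stand-in for a stemmer: strip one common suffix."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def clean_tokens(exception_text: str) -> List[str]:
    """Lowercase, drop stop words and C keywords, then stem the remainder."""
    tokens = re.findall(r"[a-z_]+", exception_text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS and t not in C_KEYWORDS]
    return [naive_stem(t) for t in kept]
```

The cleaned tokens would then be filled into the preset output format of the downtime exception log, which is not reproduced here because its layout is not given in this section.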
The patch matching device can implement the methods of the method embodiments shown in figs. 1-2; for details, refer to the patch matching method of the embodiments shown in figs. 1-2, which is not described again here.
Fig. 4 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present application. As shown in fig. 4, the apparatus includes: a memory 41 and a processor 42.
The memory 41 is configured to store a computer program and may be configured to store various other data to support operations on the computing device. Examples of such data include instructions for any application or method operating on the computing device, contact data, phonebook data, messages, pictures, videos, and the like.
The processor 42 is coupled to the memory 41 and is configured to execute the computer program in the memory 41 to: obtain a target downtime log generated after the target machine goes down; extract, from the target downtime log, a downtime call stack and a downtime abnormal log at the time the target machine went down; acquire call stacks and patch information of a plurality of candidate patches; determine a first similarity based on the Levenshtein distance between the downtime call stack and the call stack of each candidate patch of the plurality of candidate patches, and determine a second similarity based on the cosine similarity between the downtime exception log and the patch information of each candidate patch of the plurality of candidate patches; and determine, from the plurality of candidate patches, a target patch for repairing the target machine based on the first similarity and the second similarity.
Further optionally, when determining the first similarity based on the Levenshtein distance between the downtime call stack and the call stack of each candidate patch of the plurality of candidate patches, the processor 42 is specifically configured to:
determine the call relations among the kernel functions in the downtime call stack and in the call stacks of the candidate patches;
determine the weight of each kernel function in the downtime call stack and the call stacks of the candidate patches based on the call relations among the kernel functions in the downtime call stack and the call stacks of the candidate patches;
determine the Levenshtein distance between the downtime call stack and the call stacks of the candidate patches based on the weight, the term frequency and the inverse document frequency of each kernel function in the downtime call stack and the call stacks of the candidate patches;
and determine the first similarity based on the Levenshtein distance between the downtime call stack and the call stacks of the candidate patches and the weight of each kernel function in the downtime call stack and the call stacks of the candidate patches.
Further optionally, when determining the first similarity based on the Levenshtein distance and the weight of each kernel function, the processor 42 is specifically configured to:
determine the weight sum of the downtime call stack and the call stack of each candidate patch in the plurality of candidate patches based on the weight of each kernel function in the downtime call stack and the call stack of each candidate patch in the plurality of candidate patches;
and determine the first similarity based on the ratio between the Levenshtein distance between the downtime call stack and the call stack of each candidate patch in the plurality of candidate patches and the weight sum of the downtime call stack and the call stack of each candidate patch in the plurality of candidate patches.
Further optionally, when determining the second similarity based on the cosine similarity between the downtime anomaly log and patch information of each candidate patch of the plurality of candidate patches, the processor 42 is specifically configured to:
determine the downtime type of the downtime abnormal log, and extract keyword information from the downtime abnormal log;
process the downtime type and the keyword information according to a preset information combination mode to obtain downtime abnormal information in a preset format;
convert the downtime abnormal information in the preset format and the patch information of each candidate patch in the plurality of candidate patches into embedded vectors of a preset dimension through a statement vector conversion model;
and determine the second similarity based on the cosine similarity between the embedded vector of the preset dimension corresponding to the downtime abnormal information in the preset format and the embedded vector of the preset dimension corresponding to the patch information of each candidate patch in the plurality of candidate patches.
Further optionally, the patch information includes a patch title and patch content, the second similarity includes a first sub-similarity and a second sub-similarity, and when determining the second similarity based on the cosine similarity between the embedded vector of the preset dimension corresponding to the downtime anomaly information of the preset format and the embedded vector of the preset dimension corresponding to the patch information of each candidate patch of the plurality of candidate patches, the processor 42 is specifically configured to:
determine the first sub-similarity based on the cosine similarity between the embedded vector of the preset dimension corresponding to the downtime anomaly information of the preset format and the embedded vector of the preset dimension corresponding to the patch title of each candidate patch of the plurality of candidate patches;
and determine the second sub-similarity based on the cosine similarity between the embedded vector of the preset dimension corresponding to the downtime abnormal information of the preset format and the embedded vector of the preset dimension corresponding to the patch content of each candidate patch of the plurality of candidate patches.
Further optionally, the processor 42 is configured to, when determining the target patch for repairing the target machine from a plurality of candidate patches based on the first similarity and the second similarity, specifically:
normalizing the first similarity, the first sub-similarity and the second sub-similarity to obtain normalized first similarity, first sub-similarity and second sub-similarity;
determining the similarity between the downtime cause of the target machine and the plurality of candidate patches based on the normalized first similarity, the first sub-similarity and the second sub-similarity, and weights of the first similarity, the first sub-similarity and the second sub-similarity;
and determining a target patch for repairing the target machine from a plurality of candidate patches based on the similarity between the downtime cause of the target machine and the plurality of candidate patches.
Further optionally, when the processor 42 extracts the downtime call stack and the downtime exception log when the target machine is down from the target downtime log, the method is specifically used for:
extracting the downtime call stack and the initial downtime abnormal log from the target downtime log by using a regular expression;
performing data cleaning on the initial downtime abnormal log so as to filter out redundant information irrelevant to the downtime of the target machine from the initial downtime abnormal log;
removing stop words and C language keywords from the filtered initial downtime exception log, and performing stemming on the initial downtime exception log after the removal operation;
and filling the initial downtime abnormal log after the stem extraction operation according to an output format of a preset downtime abnormal log to obtain the downtime abnormal log.
Further, as shown in fig. 4, the electronic device further includes: a communication component 43, a display 44, a power component 45, an audio component 46, and other components. Only some of the components are schematically shown in fig. 4, which does not mean that the electronic device comprises only the components shown in fig. 4. In addition, depending on the implementation form of the electronic device, the components within the dashed box in fig. 4 are optional components rather than mandatory components. For example, when the electronic device is implemented as a terminal device such as a smart phone, tablet computer, or desktop computer, the components within the dashed box in fig. 4 may be included; when the electronic device is implemented as a server-side device such as a conventional server, cloud server, data center, or server array, the components within the dashed box in fig. 4 may not be included.
According to the electronic equipment provided by the embodiment of the application, the target downtime log generated after the target machine goes down can be obtained; the downtime call stack and the downtime exception log at the time the target machine went down are extracted from the target downtime log; the call stacks and the patch information of a plurality of candidate patches are obtained; the first similarity is determined based on the Levenshtein distance between the downtime call stack and the call stack of each candidate patch in the plurality of candidate patches; the second similarity is determined based on the cosine similarity between the downtime exception log and the patch information of each candidate patch in the plurality of candidate patches; and finally the first similarity and the second similarity are combined to determine, from the plurality of candidate patches, the target patch for repairing the target machine.
Accordingly, embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to implement the steps of the patch matching method embodiments described above.
The communication component in fig. 4 described above is configured to facilitate wired or wireless communication between the device in which the communication component is located and other devices. The device in which the communication component is located may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component may further include a Near Field Communication (NFC) module, Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and the like.
The memory of fig. 4 described above may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The display in fig. 4 described above includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation.
The power supply assembly shown in fig. 4 provides power for various components of the device in which the power supply assembly is located. The power components may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the devices in which the power components are located.
The audio component of fig. 4 described above may be configured to output and/or input audio signals. For example, the audio component includes a Microphone (MIC) configured to receive external audio signals when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, and a speech recognition mode. The received audio signal may be further stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (8)

1. A patch matching method, comprising:
obtaining a target downtime log generated after a target machine goes down;
extracting, from the target downtime log, a downtime call stack and a downtime abnormal log recorded when the target machine went down;
acquiring call stacks and patch information of a plurality of candidate patches;
determining call relationships among the kernel functions in the downtime call stack and in the call stacks of the plurality of candidate patches; determining a weight of each kernel function in the downtime call stack and in the call stacks of the plurality of candidate patches based on the call relationships; determining a Levenshtein distance between the downtime call stack and the call stack of each of the plurality of candidate patches based on the weight, the term frequency, and the inverse document frequency of each kernel function in the downtime call stack and in the call stacks of the plurality of candidate patches; determining, for each of the plurality of candidate patches, a weight sum of the call stack of the candidate patch and the downtime call stack based on the weights of the kernel functions in the call stack of the candidate patch and in the downtime call stack; determining a first similarity based on the ratio of the Levenshtein distance between the downtime call stack and the call stack of each of the plurality of candidate patches to the corresponding weight sum; and determining a second similarity based on the cosine similarity between the downtime abnormal log and the patch information of each of the plurality of candidate patches;
determining, from the plurality of candidate patches, a target patch for repairing the target machine based on the first similarity and the second similarity.
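As an illustration only, the first similarity of claim 1 could be sketched as follows. The claim does not fix the exact formulas, so several choices here are assumptions: the weight of a kernel function is taken as term frequency times a smoothed inverse document frequency (the call-relationship weighting is omitted), insertion or deletion of a function costs its weight, substitution costs the larger of the two weights, and the similarity is one minus the ratio of the distance to the weight sum. All function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(functions, corpus):
    """Weight each function name by term frequency times a smoothed IDF (assumed form)."""
    tf = Counter(functions)
    n_docs = len(corpus)
    weights = {}
    for fn, count in tf.items():
        df = sum(1 for stack in corpus if fn in stack)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # always >= 1
        weights[fn] = (count / len(functions)) * idf
    return weights


def weighted_levenshtein(a, b, w):
    """Edit distance over function names.

    Inserting or deleting f costs w[f]; substituting costs max(w[x], w[y]).
    """
    prev = [0.0]
    for fb in b:
        prev.append(prev[-1] + w[fb])
    for fa in a:
        cur = [prev[0] + w[fa]]
        for j, fb in enumerate(b, start=1):
            sub = 0.0 if fa == fb else max(w[fa], w[fb])
            cur.append(min(prev[j] + w[fa],      # delete fa
                           cur[j - 1] + w[fb],   # insert fb
                           prev[j - 1] + sub))   # match or substitute
        prev = cur
    return prev[-1]


def first_similarity(crash_stack, patch_stack, corpus):
    """Assumed reading of the claim: 1 - distance / (sum of weights of both stacks)."""
    w = tfidf_weights(crash_stack + patch_stack, corpus)
    total = sum(w[f] for f in crash_stack) + sum(w[f] for f in patch_stack)
    if total == 0:
        return 1.0  # both stacks are empty
    return 1.0 - weighted_levenshtein(crash_stack, patch_stack, w) / total
```

Because a substitution never costs more than a deletion plus an insertion, the distance cannot exceed the weight sum, so the value stays within [0, 1] under these assumptions.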
2. The method of claim 1, wherein the determining a second similarity based on the cosine similarity between the downtime abnormal log and the patch information of each of the plurality of candidate patches comprises:
determining a downtime type of the downtime abnormal log, and extracting keyword information from the downtime abnormal log;
combining the downtime type and the keyword information in a preset manner to obtain downtime abnormal information in a preset format;
converting, through a sentence vector conversion model, the downtime abnormal information in the preset format and the patch information of each of the plurality of candidate patches into embedding vectors of a preset dimension;
and determining the second similarity based on the cosine similarity between the embedding vector of the preset dimension corresponding to the downtime abnormal information in the preset format and the embedding vector of the preset dimension corresponding to the patch information of each of the plurality of candidate patches.
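The steps of claim 2 can be sketched as below. The claim names neither the sentence vector conversion model nor the preset format, so `toy_embed` (a hashed bag of words) is only a stand-in for a real sentence-embedding model, and `format_crash_info` and the dimension `DIM` are hypothetical; only the cosine similarity itself follows directly from the text.

```python
import math
import zlib

DIM = 64  # hypothetical "preset dimension"


def format_crash_info(crash_type, keywords):
    """Combine downtime type and keywords into one string (assumed format)."""
    return f"{crash_type}: {' '.join(keywords)}"


def toy_embed(text, dim=DIM):
    """Stand-in for a sentence vector model: hash each token into a fixed-size vector."""
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Usage: compare the formatted crash information with one patch title.
crash_info = format_crash_info("null pointer dereference", ["ext4", "write"])
second_similarity = cosine_similarity(
    toy_embed(crash_info),
    toy_embed("ext4: fix null pointer dereference in write path"),
)
```

The same function can be applied separately to the patch title and the patch content to obtain the two sub-similarities described in claim 3.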
3. The method of claim 2, wherein the patch information includes a patch title and patch content, the second similarity includes a first sub-similarity and a second sub-similarity, and the determining the second similarity based on the cosine similarity between the embedding vector of the preset dimension corresponding to the downtime abnormal information in the preset format and the embedding vector of the preset dimension corresponding to the patch information of each of the plurality of candidate patches comprises:
determining the first sub-similarity based on the cosine similarity between the embedding vector of the preset dimension corresponding to the downtime abnormal information in the preset format and the embedding vector of the preset dimension corresponding to the patch title of each of the plurality of candidate patches;
and determining the second sub-similarity based on the cosine similarity between the embedding vector of the preset dimension corresponding to the downtime abnormal information in the preset format and the embedding vector of the preset dimension corresponding to the patch content of each of the plurality of candidate patches.
4. The method of claim 3, wherein the determining a target patch for repairing the target machine from a plurality of candidate patches based on the first similarity and the second similarity comprises:
normalizing the first similarity, the first sub-similarity and the second sub-similarity to obtain normalized first similarity, first sub-similarity and second sub-similarity;
determining the similarity between the downtime cause of the target machine and the plurality of candidate patches based on the normalized first similarity, the first sub-similarity and the second sub-similarity, and weights of the first similarity, the first sub-similarity and the second sub-similarity;
and determining, from the plurality of candidate patches, a target patch for repairing the target machine based on the similarity between the downtime cause of the target machine and the plurality of candidate patches.
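A minimal sketch of the fusion step in claim 4 follows. The claim specifies neither the normalization method nor the weight values, so min-max normalization across candidates and the weights `(0.5, 0.2, 0.3)` are assumptions chosen for illustration, as are the function names.

```python
def min_max(values):
    """Scale a list of scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank_patches(patch_ids, first, title_sim, content_sim, weights=(0.5, 0.2, 0.3)):
    """Return (patch_id, score) pairs sorted by weighted sum of normalized similarities."""
    columns = [min_max(first), min_max(title_sim), min_max(content_sim)]
    scores = [
        sum(w * col[i] for w, col in zip(weights, columns))
        for i in range(len(patch_ids))
    ]
    return sorted(zip(patch_ids, scores), key=lambda pair: pair[1], reverse=True)


# Usage: three candidate patches with their three similarity values each.
ranked = rank_patches(
    ["patch-a", "patch-b", "patch-c"],
    first=[0.9, 0.1, 0.5],
    title_sim=[0.8, 0.2, 0.3],
    content_sim=[0.7, 0.1, 0.9],
)
target_patch = ranked[0][0]  # highest combined score
```

Normalizing each similarity before weighting keeps one measure from dominating merely because its raw values span a wider range.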
5. The method of any one of claims 1-4, wherein the extracting, from the target downtime log, the downtime call stack and the downtime abnormal log recorded when the target machine went down comprises:
extracting the downtime call stack and an initial downtime abnormal log from the target downtime log by using a regular expression;
performing data cleaning on the initial downtime abnormal log to filter out redundant information unrelated to the downtime of the target machine;
removing stop words and C language keywords from the filtered initial downtime abnormal log, and performing stem extraction on the initial downtime abnormal log after the removal;
and filling the stemmed initial downtime abnormal log into a preset output format for downtime abnormal logs to obtain the downtime abnormal log.
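The extraction pipeline of claim 5 might look roughly like the sketch below. The claims do not give the log layout, the regular expressions, the word lists, or the output format, so everything here is assumed: the sample resembles a Linux kernel oops with `Call Trace:` frames of the form `function+0xOFFSET/0xSIZE`, the stop-word and C-keyword sets are tiny illustrative subsets, and `naive_stem` is a crude suffix stripper standing in for a real stemmer.

```python
import re

TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s?", re.M)           # "[  101.123456] "
FRAME_RE = re.compile(r"^\s*\??\s*([A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+", re.M)
ERROR_RE = re.compile(r"BUG:|Oops|Kernel panic")
HEX_RE = re.compile(r"\b(?:0x)?[0-9a-f]{8,}\b")                   # addresses, register dumps

STOP_WORDS = {"a", "an", "the", "to", "at", "of", "in", "is", "on"}
C_KEYWORDS = {"int", "char", "void", "struct", "static", "const",
              "return", "if", "else", "for", "while", "unsigned", "long"}


def naive_stem(token):
    """Strip a few common English suffixes (stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def parse_crash_log(raw):
    """Return (call_stack, formatted_keywords) extracted from a raw crash log."""
    text = TIMESTAMP_RE.sub("", raw)
    call_stack = FRAME_RE.findall(text)
    # Data cleaning: keep only lines that describe the failure, drop addresses.
    error_lines = [line for line in text.splitlines() if ERROR_RE.search(line)]
    cleaned = HEX_RE.sub(" ", " ".join(error_lines).lower())
    tokens = [t for t in re.findall(r"[a-z_]+", cleaned)
              if t not in STOP_WORDS and t not in C_KEYWORDS]
    keywords = [naive_stem(t) for t in tokens]
    return call_stack, "keywords=" + " ".join(keywords)  # assumed output format


SAMPLE_LOG = "\n".join([
    "[  101.123456] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010",
    "[  101.123460] Call Trace:",
    "[  101.123470]  ext4_file_write_iter+0x5c/0x3f0",
    "[  101.123480]  vfs_write+0x1a4/0x2b0",
])
stack, info = parse_crash_log(SAMPLE_LOG)
```

In practice the patterns would have to be adapted to the actual log format of the target machines.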
6. A patch matching device, comprising:
a first acquisition module, configured to obtain a target downtime log generated after a target machine goes down;
an extraction module, configured to extract, from the target downtime log, a downtime call stack and a downtime abnormal log recorded when the target machine went down;
a second acquisition module, configured to acquire call stacks and patch information of a plurality of candidate patches;
a similarity determination module, configured to: determine call relationships among the kernel functions in the downtime call stack and in the call stacks of the plurality of candidate patches; determine a weight of each kernel function in the downtime call stack and in the call stacks of the plurality of candidate patches based on the call relationships; determine a Levenshtein distance between the downtime call stack and the call stack of each of the plurality of candidate patches based on the weight, the term frequency, and the inverse document frequency of each kernel function in the downtime call stack and in the call stacks of the plurality of candidate patches; determine, for each of the plurality of candidate patches, a weight sum of the call stack of the candidate patch and the downtime call stack based on the weights of the kernel functions in the call stack of the candidate patch and in the downtime call stack; determine a first similarity based on the ratio of the Levenshtein distance between the downtime call stack and the call stack of each of the plurality of candidate patches to the corresponding weight sum; and determine a second similarity based on the cosine similarity between the downtime abnormal log and the patch information of each of the plurality of candidate patches;
and a patch determination module, configured to determine, from the plurality of candidate patches, a target patch for repairing the target machine based on the first similarity and the second similarity.
7. An electronic device, comprising: a memory and a processor;
the memory is used for storing a computer program;
the processor, coupled to the memory, is configured to execute the computer program for:
obtaining a target downtime log generated after a target machine goes down;
extracting, from the target downtime log, a downtime call stack and a downtime abnormal log recorded when the target machine went down;
acquiring call stacks and patch information of a plurality of candidate patches;
determining call relationships among the kernel functions in the downtime call stack and in the call stacks of the plurality of candidate patches; determining a weight of each kernel function in the downtime call stack and in the call stacks of the plurality of candidate patches based on the call relationships; determining a Levenshtein distance between the downtime call stack and the call stack of each of the plurality of candidate patches based on the weight, the term frequency, and the inverse document frequency of each kernel function in the downtime call stack and in the call stacks of the plurality of candidate patches; determining, for each of the plurality of candidate patches, a weight sum of the call stack of the candidate patch and the downtime call stack based on the weights of the kernel functions in the call stack of the candidate patch and in the downtime call stack; determining a first similarity based on the ratio of the Levenshtein distance between the downtime call stack and the call stack of each of the plurality of candidate patches to the corresponding weight sum; and determining a second similarity based on the cosine similarity between the downtime abnormal log and the patch information of each of the plurality of candidate patches;
determining, from the plurality of candidate patches, a target patch for repairing the target machine based on the first similarity and the second similarity.
8. A computer readable storage medium storing a computer program, which when executed by a processor causes the processor to implement the steps of the patch matching method of any one of claims 1 to 5.
CN202310484005.5A 2023-04-26 2023-04-26 Patch matching method, device, equipment and storage medium Active CN116225770B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310484005.5A CN116225770B (en) 2023-04-26 2023-04-26 Patch matching method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116225770A CN116225770A (en) 2023-06-06
CN116225770B true CN116225770B (en) 2023-10-20

Family

ID=86580848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310484005.5A Active CN116225770B (en) 2023-04-26 2023-04-26 Patch matching method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116225770B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647418A (en) * 2019-09-12 2020-01-03 努比亚技术有限公司 Exception handling method, server and mobile terminal
CN112395616A (en) * 2019-08-15 2021-02-23 奇安信安全技术(珠海)有限公司 Vulnerability processing method and device and computer equipment
CN113742119A (en) * 2021-07-26 2021-12-03 上海闻泰信息技术有限公司 Call stack backtracking method and device of embedded system and computer equipment
CN114064472A (en) * 2021-11-12 2022-02-18 天津大学 Automatic software defect repairing and accelerating method based on code representation
WO2022111262A1 (en) * 2020-11-25 2022-06-02 北京金山云网络技术有限公司 Hotfix generation method and apparatus, server, and machine readable storage medium
CN115186001A (en) * 2022-07-13 2022-10-14 阿里巴巴(中国)有限公司 Patch processing method and device
CN115269288A (en) * 2022-07-13 2022-11-01 阿里巴巴(中国)有限公司 Fault determination method, device, equipment and storage medium
CN115455961A (en) * 2022-09-21 2022-12-09 中国第一汽车股份有限公司 Text processing method, device, equipment and medium
CN115587029A (en) * 2022-09-28 2023-01-10 中国电信股份有限公司 Patch detection method and device, electronic equipment and computer readable medium
CN115640155A (en) * 2022-09-16 2023-01-24 南京航空航天大学 Program automatic repairing method and system based on statement dependence and patch similarity
CN115713772A (en) * 2022-09-08 2023-02-24 东南大学 Transformer substation panel character recognition method, system, equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5152206B2 (en) * 2008-02-21 2013-02-27 富士通株式会社 Patch candidate selection device, patch candidate selection program, and patch candidate selection method
WO2015163931A1 (en) * 2014-04-24 2015-10-29 Hewlett-Packard Development Company, L.P. Dynamically applying a patch to a computer application

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
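Mechanism and application of Linux object-code kernel patches; Hu Yongqi, Kuang Xianfeng, Hou Zifeng; Computer Engineering and Applications (No. 32); full text *
An ecosystem repair scheme for Android kernel vulnerabilities based on adaptive hot patching; Zhang Yulong; Chen Yue; Bao Chenfu; Xia Liangzhao; Zheng Longri; Lu Yongqiang; Wei Tao; China Education Network (No. 10); full text *
Hu Yongqi, Kuang Xianfeng, Hou Zifeng. Mechanism and application of Linux object-code kernel patches. Computer Engineering and Applications. 2006, (No. 32), full text. *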

Also Published As

Publication number Publication date
CN116225770A (en) 2023-06-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant