CN112039907A - Automatic testing method and system based on Internet of things terminal evaluation platform - Google Patents

Automatic testing method and system based on Internet of things terminal evaluation platform

Info

Publication number
CN112039907A
Authority
CN
China
Prior art keywords
log
template
analysis
words
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010916739.2A
Other languages
Chinese (zh)
Inventor
张治中
温鹏瑜
邓炳光
禹斯译
沈艳
程方
郑丹玲
张鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202010916739.2A
Publication of CN112039907A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Abstract

The invention relates to an automatic testing method and system based on an Internet of Things terminal evaluation platform, and belongs to the field of Internet of Things security testing. The method comprises the following steps. S1: template tree extraction based on association relations: the original log is first filtered and cleaned using preset rules; the cleaned log is split into words using spaces as delimiters, and parameter words are distinguished from template words; a common log template library is extracted and continuously updated. S2: anomaly detection and analysis: feature values of the logs are first extracted from the extracted templates using an association analysis method; the logs are then clustered with the K-Means algorithm; finally, anomaly detection is performed with an improved LSTM model, and statistical analysis is performed on the log events after anomaly analysis. The method is suited to detecting known attacks in real time and identifying unknown attacks, and improves security testing of the Internet of Things terminal evaluation platform.

Description

Automatic testing method and system based on Internet of things terminal evaluation platform
Technical Field
The invention belongs to the field of security testing of the Internet of things, and relates to an automatic testing method and system based on a terminal evaluation platform of the Internet of things.
Background
With the commercial launch of fifth-generation mobile communication technology (5G) in China, the Internet of Things industry is about to enter a period of innovation and growth, and the Internet of Things provides an open sharing platform for massive information, service, and application resources. Through an Internet of Things platform, users can interact with devices within the scope of their permissions, so Internet of Things resources are widely used. However, while the Internet of Things drives industry development and makes daily life more convenient, security evaluation of such platforms is still immature, and attacks on the Internet of Things have become a prominent security topic.
Internet of Things terminals are deployed at large scale, the number of vulnerabilities in the Internet of Things terminal evaluation platform is growing, and attackers can exploit these vulnerabilities. Because of characteristics such as heterogeneous multi-domain sharing and the coexistence of massive numbers of nodes, the Internet of Things is more vulnerable to DoS/DDoS attacks, brute-force password cracking, front-end page injection, and similar attacks.
The patent application with publication number CN111372247A (title: a terminal security access method and a terminal security access system based on the narrowband Internet of things) requires obtaining the platform public key and private key of a power Internet of Things management platform, a first digital signature, a second digital signature, and so on; its authentication mechanism involves a large amount of integration and coordination work, the process is complex, and it greatly hinders scaling up the number of Internet of Things terminals. The system disclosed in the patent with publication number CN106789153A (title: multi-channel self-adaptive log recording and outputting method and system of terminal equipment of the Internet of things system) effectively records and transmits device logs and forwards them to a log server or the cloud for recording, but it has no anomaly detection or analysis capability, so anomalies cannot be discovered and reported in time.
Log files are an important means of recording the health of the platform, so analyzing system logs is particularly important for understanding the state of the whole platform. However, extracting network events directly from massive logs is extremely challenging, for four reasons: (1) the logs are massive and unstructured; (2) the level attached to a log event does not accurately indicate the event type and cannot be used directly for anomaly detection; (3) log content is complex, mixing parameter words and template words, so prior knowledge is needed to find the exact meaning of a log; (4) log patterns change constantly.
In order to solve the above problems, an automatic testing method based on an internet of things terminal evaluation platform is urgently needed to detect the health condition of the platform in time.
Disclosure of Invention
In view of the above, the invention aims to provide an automatic testing method and system based on an Internet of Things terminal evaluation platform. It designs a tree-based template extraction method built on association relations that extracts templates from new logs quickly and continuously updates the template library, and it provides an anomaly detection and analysis method combining K-Means and LSTM to detect the health of the platform in a timely manner.
In order to achieve the purpose, the invention provides the following technical scheme:
1. An automatic testing method based on an Internet of Things terminal evaluation platform, specifically comprising the following steps:
S1: Template tree extraction based on association relations: the original log is first filtered and cleaned using preset rules; the cleaned log is split into words using spaces as delimiters, and parameter words are distinguished from template words; a common log template library is extracted and continuously updated;
S2: Anomaly detection and analysis: for the preprocessed data, an event library is first built from a large number of normal logs, and the degree of abnormality of the log under test relative to the normal log template library is computed to decide whether it is an abnormal log; if it is, it is checked whether it corresponds to a known abnormal event and type, and if not, the log under test is added to the abnormal template library as a new type of attack.
In detail, feature values of the logs are first extracted from the extracted templates using an association analysis method; the logs are then clustered with the K-Means algorithm; finally, anomaly detection is performed with the improved LSTM model, and the log events after anomaly analysis are analyzed statistically and displayed in chart form for easier reading.
Further, in step S1, filtering, splitting, and storing the log specifically comprises: although log formats vary, they contain fixed short texts, and the words in a log are separated by spaces, so the log can be split using spaces as delimiters; the resulting words comprise parameter words and template words. The split log is stored in an array, and the array index is used to distinguish parameter words from template words.
Further, in step S1, distinguishing parameter words from template words specifically comprises: the two are distinguished by the fact that template words occur with higher probability than parameter words. A template word usually appears at the same position in messages of equal length, so during word splitting the position of each word and the length of its message, (p, len), are recorded. A formula based on conditional probability gives the likelihood that each word is a template word, and this probability is used as the word's Score;
the formula of the judgment standard is as follows:
Score(word,p,len)=P(word|p,len)
where P(word|p,len) is the probability that the word appears at position p of a message of length len, and this probability is taken as the score; len denotes the message length.
Further, in step S1, constructing the template tree based on association relations specifically comprises: starting from the root node, the word frequency of the root is recorded, and then the word frequency of the child nodes under each node. Among the child nodes at the same level under the same parent, if the word frequency of some word is the same as that of its parent or grandparent node, the log template from the root node to that child node stops there, and that level and its child nodes are deleted.
Further, in step S1, autonomously updating the template library specifically comprises: after processing, a newly arriving log is compared with the template trees in the original log template library. Because a threshold was used as the criterion when selecting template words in the previous stage, the later comparison uses a maximum approximation value to express how closely the newly arriving template matches the template library, expressed as follows:
Logs(N, M) = N_x / M_x
where x denotes the x-th class of template, N_x denotes the newly added log template tree, and M_x denotes the original log template library; if the highest value of Logs is greater than or equal to the threshold, the incoming log is assigned to class x; otherwise, a new template library is created for the log.
Further, in step S2, selecting a time window and creating feature values specifically comprises: the Internet of Things terminal evaluation platform can generate a large amount of data in a short time, so a time window for extracting log blocks is determined; within the window, the different types of logs generated by events on the platform are sorted by their timestamps, and logs with the same timestamp are merged. Features that characterize events from several aspects, such as log entries, periodicity, average occurrence time, and frequency, are computed and used as a vector to represent each event.
Further, in step S2, clustering-based template library generation specifically comprises: the log matrix obtained after feature extraction and vectorization is used as experimental data, and principal component analysis (PCA) is used to reduce its dimensionality. Dimensionality reduction plays an important role in preprocessing: after the relevant feature attributes are extracted, PCA compresses the data set and thereby reduces computation time.
Further, in step S2, the LSTM-based anomaly detection model is as follows: after clustering yields a template library, anomaly detection is performed with an improved LSTM model; mean squared error (MSE) is used as the loss function to describe the difference between predicted and true values; the improved LSTM model adds an embedding layer between the input layer and the hidden layer.
2. An automatic test system based on an Internet of Things terminal evaluation platform, comprising a data acquisition module, a log analysis module, an anomaly detection module, and an alarm module;
the log analysis module preprocesses the collected log data generated by the Internet of Things terminal evaluation platform in four steps: template extraction, feature extraction, PCA dimensionality reduction, and clustering. Raw logs are numerous and unstructured, so they must be processed and converted into log templates, which sharply reduces the volume of logs while preserving their basic semantics. The log templates produced by the template extraction model are mapped to the original logs in a mapping table, and feature values that characterize events are extracted to form feature vectors; the logs under test are vectorized and cluster analysis is performed;
the anomaly detection module performs anomaly detection on the processed log data, which is suitable for security analysis: an event library is first built from a large number of normal logs, and the degree of abnormality of the log under test relative to the normal log template library is computed to decide whether it is an abnormal log; if it is, it is first checked whether it corresponds to a known abnormal event and type, and if not, the log under test is added to the abnormal template library as a new type of attack;
the alarm module performs statistical analysis on the log events after anomaly analysis and displays the results in chart form for easier reading.
The beneficial effects of the invention are as follows: it can quickly extract templates from new logs and continuously update the template library, cluster with K-Means, predict with LSTM, and detect the health of the platform in a timely manner, and therefore has industrial application value. It can detect known attacks in real time and identify unknown attacks, improving security testing of the Internet of Things terminal evaluation platform.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is an overall block diagram of an automatic test system of the present invention;
FIG. 2 is a schematic block diagram of constructing a log tree;
FIG. 3 is a functional block diagram of a converted log tree;
FIG. 4 is a diagram of a library of known event templates;
FIG. 5 is a network architecture diagram of the improved LSTM model.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
Referring to FIG. 1 to FIG. 5: FIG. 1 is the overall framework of the automatic testing system based on an Internet of Things terminal evaluation platform provided by the invention. As shown in FIG. 1, the system comprises, from bottom to top, a data acquisition module, a log analysis module, an anomaly detection module, and an alarm module. The log analysis module preprocesses the collected log data generated by the Internet of Things terminal evaluation platform in four steps: template extraction, feature extraction, PCA dimensionality reduction, and clustering. Raw logs are numerous and unstructured, so they must be processed and converted into log templates, which sharply reduces the volume of logs while preserving their basic semantics. The log templates produced by the template extraction model are mapped to the original logs in a mapping table, and feature values that characterize events are extracted to form feature vectors. The logs under test are vectorized and cluster analysis is performed.
Anomaly detection is then performed on the processed log data, which is suitable for security analysis. The anomaly detection module is the core of the framework: for the preprocessed data, it first builds an event library from a large number of normal logs and computes the degree of abnormality of the log under test relative to the normal log template library to decide whether it is an abnormal log. If it is, the module checks whether it corresponds to a known abnormal event and type, and if not, the log under test is added to the abnormal template library as a new type of attack. The alarm module performs statistical analysis on the log events after anomaly analysis and displays the results in chart form for easier reading.
The automatic test method of the system specifically comprises the following steps:
S1: Template tree extraction based on association relations: the original log is first filtered and cleaned using preset rules; the cleaned log is split into words using spaces as delimiters, and parameter words are distinguished from template words; a common log template library is extracted and continuously updated. Specifically:
1) Filtering, splitting, and storing the log: logs generated by the Internet of Things terminal evaluation platform represent events occurring in the system, such as user access, DoS/DDoS attacks, and login failures. The log structure comprises a timestamp, a host, a process ID, and an event description. The log format depends on the service type or the vendor, so formats vary, syntax and semantics differ, and the format may be updated at any time.
The invention provides an online template extraction method that dynamically updates the log template library even when log templates change. Although log formats vary, they contain fixed short texts, and the content of the Message field in particular is strongly correlated across entries. Log information about ssh is extracted as an example, as shown in Table 1 below.
Table 1: ssh log information
[Table 1 is provided as an image in the original publication.]
As Table 1 shows, every ssh log entry contains the word "ssh", and the Message content describes the network event, namely a successful login or the reason for a failed one. Several sub-types can be extracted from the Message; that is, the log as a whole is hierarchical, and combining the sub-types expresses the event.
The log text contains template words and parameter words, and the template words form a template. To extract a template, the varying parameter words can be replaced with a generic placeholder, as shown in Table 2 below. The log text is split into words, and the variable parameters are filtered out and deleted.
Table 2: Filtering raw logs
[Table 2 is provided as an image in the original publication.]
The words in a log are separated by spaces, so spaces can be used as delimiters to split the log into words, which comprise parameter words and template words. Logs that share a template differ only in their parameter words, and these occupy the same positions in log messages of the same length. The split log is therefore stored in an array, and the array index can be used to distinguish parameter words from template words accurately.
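The splitting step can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name split_log, the Token record, and the sample ssh message are all assumptions introduced here, and the message length is assumed to mean the number of words.

```python
from collections import namedtuple

# Hypothetical record: a word plus its array index and the message length in words.
Token = namedtuple("Token", ["word", "pos", "length"])

def split_log(message):
    """Split a log message on whitespace and tag each word with (p, len)."""
    words = message.split()
    return [Token(w, i, len(words)) for i, w in enumerate(words)]

# Illustrative message only; it is not taken from Table 1.
tokens = split_log("Failed password for root from 10.0.0.5 port 22 ssh2")
for t in tokens:
    print(t.pos, t.length, t.word)
```

Keeping the index and length with every word is what later allows words at the same position in equal-length messages to be compared.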
2) Identifying template words: template words are also text, and they are distinguished from parameter words by their high probability of occurrence. Each word is scored by the probability that it appears at a specific position in the log text, and that probability is used as its score. The criterion is:
Score(word,p,len)=P(word|p,len)
where P(word|p,len) is the probability that the word appears at position p of a message of length len, and this probability is taken as its score. By definition, a template word has a higher probability than a parameter word.
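One way to estimate this score is by counting, sketched below under stated assumptions: P(word|p,len) is estimated as the fraction of messages of length len that carry the word at position p, the wildcard "*" stands in for parameter words, and the 0.6 threshold is an arbitrary illustrative value (the text mentions a threshold but gives no number). The function names and sample logs are hypothetical.

```python
from collections import Counter

def template_scores(messages):
    """Estimate Score(word, p, len) = P(word | p, len) from raw counts."""
    slot_counts = Counter()  # (word, p, len) -> occurrences
    len_counts = Counter()   # len -> number of messages of that length
    for msg in messages:
        words = msg.split()
        n = len(words)
        len_counts[n] += 1
        for p, w in enumerate(words):
            slot_counts[(w, p, n)] += 1
    return {key: c / len_counts[key[2]] for key, c in slot_counts.items()}

def to_template(message, scores, threshold):
    """Keep words scoring at or above the threshold; mask the rest as parameters."""
    words = message.split()
    n = len(words)
    return [w if scores.get((w, p, n), 0.0) >= threshold else "*"
            for p, w in enumerate(words)]

logs = [
    "Failed password for root from 10.0.0.5 port 22 ssh2",
    "Failed password for admin from 10.0.0.9 port 22 ssh2",
    "Failed password for root from 10.0.0.7 port 22 ssh2",
    "Failed password for guest from 10.0.0.5 port 22 ssh2",
]
scores = template_scores(logs)
print(" ".join(to_template(logs[0], scores, threshold=0.6)))
```

In this toy data the user name and the IP address each score at most 0.5, so they are masked, while the fixed words score 1.0 and remain in the template.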
3) Constructing the template tree based on association relations: as shown in FIG. 2, a log template tree is built using one type of log as an example. The system host name and the log type are taken as parent nodes, frequently occurring words are then added in turn as child nodes, and the process is repeated until all content has been added to the log tree. This presents the logs in a clearer form that follows the relations within the log information.
During extraction a log template may branch at some node. If a node has too many branches, the branches are deleted as child nodes and the template terminates at the parent of those nodes. As shown in FIG. 2, for "web 1 sshd Received disconnected from", two templates are merged into one, and each remaining template word appears only once in the template, so matching time is reduced while the template is still extracted accurately.
Starting from the root node, the word frequency of the root is recorded, and then the word frequency of the child nodes under each node. Among the child nodes at the same level under the same parent, if the word frequency of some word is the same as that of its parent or grandparent node, the log template stops in that direction, and that level and its child nodes are deleted. With this method, the log in FIG. 2 can be converted into the log template shown in FIG. 3; the template structure for this example log is shown in Table 3.
Table 3: Log template
[Table 3 is provided as an image in the original publication.]
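The tree construction can be illustrated with a word-frequency prefix tree. The sketch below is an assumption-laden simplification: it implements only the branch-count rule (a node with too many children has them deleted, so the template ends at that node) and not the frequency comparison with parent and grandparent nodes. The class Node, the functions, the max_branches value, and the sample logs are all hypothetical.

```python
class Node:
    """One word in the template tree, with the number of logs passing through it."""
    def __init__(self, word):
        self.word = word
        self.count = 0
        self.children = {}

def build_tree(messages):
    root = Node("<root>")
    for msg in messages:
        node = root
        node.count += 1
        for w in msg.split():
            node = node.children.setdefault(w, Node(w))
            node.count += 1
    return root

def prune(node, max_branches):
    """Delete the children of any node that branches too widely."""
    if len(node.children) > max_branches:
        node.children = {}
        return
    for child in node.children.values():
        prune(child, max_branches)

def templates(node, prefix=()):
    """Read each remaining root-to-leaf path back as a template string."""
    path = prefix if node.word == "<root>" else prefix + (node.word,)
    if not node.children:
        return [" ".join(path)]
    out = []
    for child in node.children.values():
        out.extend(templates(child, path))
    return out

logs = [
    "web1 sshd Accepted password for alice",
    "web1 sshd Accepted password for bob",
    "web1 sshd Accepted password for carol",
    "web1 sshd Failed password for root",
    "web1 sshd Failed password for admin",
    "web1 sshd Failed password for guest",
]
tree = build_tree(logs)
prune(tree, max_branches=2)
print(templates(tree))
```

Here the user names fan out into three branches under "for", so they are pruned and two templates remain, one per login outcome.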
4) Autonomously updating the template library: after processing, a newly arriving log is compared with the template trees in the original template library. Because a threshold was used as the criterion when selecting template words in the previous stage, the later comparison also needs a maximum approximation value to express how closely the newly arriving template matches the template library, expressed as follows:
Logs(N, M) = N_x / M_x
where x denotes the x-th class of template, N_x denotes the newly added log template tree, and M_x denotes the original log template library. If the highest value of Logs is greater than or equal to the threshold, the incoming log is assigned to class x. Otherwise, a new template library is created for the log.
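The update rule can be sketched as a threshold test on a similarity ratio. The text does not spell out how N_x and M_x are measured, so the sketch below assumes one plausible reading: the number of words of the new template that match class x at the same position, divided by the number of words in the class x template. The function names, the 0.8 threshold, and the sample templates are hypothetical.

```python
def similarity(new_words, template_words):
    """Assumed reading of N_x / M_x: position-wise matches over template length."""
    if not template_words:
        return 0.0
    matched = sum(1 for a, b in zip(new_words, template_words) if a == b)
    return matched / len(template_words)

def update_library(library, new_template, threshold=0.8):
    """Return the class index of new_template, appending a new class if needed."""
    new_words = new_template.split()
    best_idx, best = -1, 0.0
    for idx, tpl in enumerate(library):
        s = similarity(new_words, tpl.split())
        if s > best:
            best_idx, best = idx, s
    if best_idx >= 0 and best >= threshold:
        return best_idx            # highest Logs value reaches the threshold
    library.append(new_template)   # otherwise start a new template class
    return len(library) - 1

library = [
    "web1 sshd Accepted password for *",
    "web1 sshd Failed password for *",
]
print(update_library(library, "web1 sshd Failed password for *"))
print(update_library(library, "web1 kernel Out of memory"))
print(len(library))
```

Taking the maximum over all classes before applying the threshold mirrors the "highest value of Logs" wording in the text.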
S2: Anomaly detection and analysis: for the preprocessed data, an event library is first built from a large number of normal logs, and the degree of abnormality of the log under test relative to the normal log template library is computed to decide whether it is an abnormal log; if it is, it is checked whether it corresponds to a known abnormal event and type, and if not, the log under test is added to the abnormal template library as a new type of attack.
In detail, feature values of the logs are first extracted from the extracted templates using an association analysis method; the logs are then clustered with the K-Means algorithm; finally, anomaly detection is performed with the improved LSTM model, and the log events after anomaly analysis are analyzed statistically and displayed in chart form for easier reading. Specifically:
1) Selecting a time window and creating feature values: the Internet of Things terminal evaluation platform can generate a large amount of data in a short time, so a time window for extracting log blocks is determined; within the window, the different types of logs generated by events on the platform are sorted by their timestamps, and logs with the same timestamp are merged. Features that characterize events from several aspects, such as log entries, periodicity, average occurrence time, and frequency, are computed and used as a vector to represent each event.
Log entries: an event on the platform may generate different numbers of log entries, in different locations and formats. Some events produce only a single log line, while others produce log blocks spanning multiple lines, locations, and formats.
Periodicity: some jobs that require polling generate periodic logs, for example operations that continually poll a database. Such logs are generally unrelated to system failures, but they can be examined at various granularities over a period of time.
Average occurrence time: each event has its own occurrence time and its own log entries generated by the platform, and the average time required for each log to occur is used as a marker for the event.
Frequency: each log message appears with a different periodicity, and frequency is judged together with a statistics-based association rule method. If an error event occurs once, it can be treated as the result of an error; but if the same error event occurs more than 10 times, it should be considered whether an abnormal event is occurring in the system, such as a DoS/DDoS attack or an ssh brute-force cracking event.
Message level: log messages are graded by level, representing the severity of the event; from low to high the levels are debug, info, warn, error, and fatal, and debug and info messages are usually the most common.
These features, which characterize events from several aspects, are computed and used as a vector to represent each event.
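A feature vector of this kind might be computed as sketched below. The concrete definitions are assumptions made for illustration, since the text names the features without formulas: periodicity is taken as the spread of the gaps between consecutive entries (zero means evenly spaced), average occurrence time as the mean gap, and frequency as entries per second. The level names warn and fatal are the conventional spellings assumed here, and the function name and sample events are hypothetical.

```python
from statistics import mean, pstdev

# Assumed severity order, low to high.
LEVELS = {"debug": 0, "info": 1, "warn": 2, "error": 3, "fatal": 4}

def window_features(events, start, end):
    """events: list of (timestamp_seconds, level) pairs for one log template.

    Returns [entry count, gap spread, mean gap, entries per second, max level]
    for the half-open time window [start, end).
    """
    in_window = sorted(e for e in events if start <= e[0] < end)
    times = [t for t, _ in in_window]
    gaps = [b - a for a, b in zip(times, times[1:])]
    return [
        len(in_window),                                   # number of log entries
        pstdev(gaps) if len(gaps) > 1 else 0.0,           # 0.0 means evenly spaced
        mean(gaps) if gaps else 0.0,                      # average time between entries
        len(in_window) / (end - start),                   # frequency
        max((LEVELS[lvl] for _, lvl in in_window), default=0),  # highest severity
    ]

events = [(0, "info"), (10, "info"), (20, "error"), (30, "info"), (100, "info")]
print(window_features(events, start=0, end=60))
```

Each template's vector for a window can then be stacked into the log matrix that the dimensionality reduction and clustering steps consume.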
2) Clustering-based template library generation: the anomaly detection method uses the log matrix obtained after feature extraction and vectorization as experimental data, and uses principal component analysis (PCA) to reduce its dimensionality. Dimensionality reduction plays an important role in preprocessing: after the relevant feature attributes are extracted, PCA compresses the data set and thereby reduces computation time.
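As a rough illustration of the dimensionality reduction, the sketch below projects feature vectors onto their first principal component. It is a simplification under stated assumptions: only one component is kept, and the leading eigenvector of the covariance matrix is found by power iteration, a technique chosen here to keep the example dependency-free rather than one named in the text. The function name and sample data are hypothetical.

```python
import math

def pca_first_component(rows, iterations=100):
    """Return (direction, projections) for the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly apply cov and renormalize.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    projected = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, projected

# Toy data lying on the line y = 2x, so one component captures all variance.
direction, reduced = pca_first_component([[1, 2], [2, 4], [3, 6], [4, 8]])
print(direction)
print(reduced)
```

In practice a library routine that returns several components would normally be used; the point here is only that each log's feature vector is replaced by a shorter one before clustering.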
The K-Means algorithm is an unsupervised clustering algorithm that iteratively assigns each sample point to the cluster whose center (mean) is closest. The automatic testing method requires active learning: re-clustering is performed when a new attack type is detected. Because K-Means needs the value of K to be specified in advance, K is incremented by 1 when re-clustering, which accommodates the growing number of event types. During learning and training, the K-Means procedure is adapted as needed: if an attack does not belong to any existing type in the template library, the newly appearing attack type is marked so that the attack template library can be extended. As shown in FIG. 4, because the K value must be specified explicitly at the input of the algorithm, if the input log data is found during training to represent a new attack event, the template library must be updated. The iteration is then run again: the K-Means algorithm is called again with K increased by 1, and the new event is put back into the learning and training stage. This is the generation process of the template library.
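The re-clustering loop with K increased by 1 can be sketched as follows. The novelty test is an assumption introduced for illustration: a point counts as a new event type when its distance to the nearest existing centroid exceeds max_dist, and the new cluster is seeded with that point. The function names, the max_dist value, and the sample points are hypothetical.

```python
import math

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, centroids, iterations=20):
    """Plain K-Means starting from the given centroids; K = len(centroids)."""
    centroids = [list(c) for c in centroids]
    k = len(centroids)
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: distance(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def add_event(points, centroids, new_point, max_dist):
    """Re-cluster with the new point; grow K by 1 if it fits no existing cluster."""
    nearest = min(distance(new_point, c) for c in centroids)
    seeds = list(centroids)
    if nearest > max_dist:
        seeds.append(new_point)  # K + 1: the new event seeds its own cluster
    return kmeans(points + [new_point], seeds)

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, labels = kmeans(points, [(0, 0), (10, 10)])
centroids, labels = add_event(points, centroids, (50, 50), max_dist=5.0)
print(len(centroids), labels)
```

Seeding the extra cluster with the new point keeps the existing clusters stable, so previously learned event types are not reshuffled when the library grows.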
3) LSTM-based anomaly detection model: as shown in FIG. 5, after the template library is obtained by clustering, an improved LSTM model performs anomaly detection; the improvement is an Embedding layer added to the ordinary LSTM network structure. The input layer has dimension n, which is the dimension of the parameter vector, and the output layer also has dimension n. The number of hidden layers is l; the hidden layers memorize and store past states. Each layer contains α LSTM units, and the time step of the LSTM model is h, which is the number of historical log messages used.
The difference between the predicted and true values is described using Mean Square Error (MSE) as a loss function. The MSE is generally used to measure the difference between the estimated value of a parameter and the actual value of the parameter, and for two vectors x and y of the same length, the MSE is calculated as follows:
$$\mathrm{MSE}(x, y) = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - y_i\right)^2$$
where N is the dimension of the vector.
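As a small worked example of the mean square error between two vectors of the same length, the following sketch may help; the function name `mse` and the sample vectors are hypothetical.

```python
def mse(x, y):
    """Mean square error between two equal-length, non-empty vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(x)  # N, the dimension of the vectors
    return sum((a - b) ** 2 for a, b in zip(x, y)) / n


predicted = [1.0, 2.0, 5.0]
actual = [1.0, 2.0, 3.0]
loss = mse(predicted, actual)  # (0 + 0 + 4) / 3
```

A prediction that matches the true vector exactly gives a loss of 0, and the loss grows with the square of each component's deviation.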
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (9)

1. An automatic testing method based on an Internet of things terminal evaluation platform, characterized by comprising the following steps:
S1: template tree extraction based on association relations: performing preliminary filtering and cleaning of the original logs using preset rules; splitting the cleaned logs using spaces as delimiters, and distinguishing parameter words from template words; extracting a common log template library and continuously updating the template library;
S2: anomaly detection analysis: first, extracting feature values of the logs from the extracted templates according to a correlation analysis method; then, clustering with the K-Means algorithm; finally, performing anomaly detection with the improved LSTM model, and performing statistical analysis on the log events after anomaly analysis.
2. The automatic testing method according to claim 1, wherein in step S1, filtering, splitting and saving the logs specifically comprises: splitting each log using spaces as delimiters, the resulting words comprising parameter words and template words; and storing the split log information in an array, and distinguishing parameter words from template words by array index.
3. The automatic testing method according to claim 1 or 2, wherein in step S1, distinguishing the parameter words from the template words specifically comprises: distinguishing on the basis that template words occur with higher probability than parameter words; using a formula based on conditional probability to obtain the likelihood that each word is a template word, this probability being used as the Score of the word;
the judgment criterion is given by the formula:
Score(word,p,len)=P(word|p,len)
where p is the position of the word within the log message, len is the message length, and P(word|p,len) is the probability that the word appears at position p in a message of length len.
4. The automatic testing method according to claim 1, wherein in step S1, constructing the template tree based on association relations specifically comprises: starting from the root node, recording the word frequency of the root node, and then recording the word frequency of the child nodes below each node; if, among the child nodes in the same layer under the same parent node, the word frequency of a word equals the word frequency of its parent node or grandparent node, terminating the log template at the path from the root node to that child node, and deleting that layer and its child nodes.
5. The automatic testing method according to claim 1 or 4, wherein in step S1, autonomously updating the template library specifically comprises: processing a newly entered log and comparing it with the template trees in the original log template library; in the early stage a threshold is used as the criterion for selecting template words, and in the later comparison a maximum approximation value is used to express how closely the newly entered template matches the template library, expressed as:
Logs(N,M)=N_x/M_x
where x denotes the x-th class of template, N_x denotes the newly added log template tree, and M_x denotes the original log template library; if the highest value of Logs is greater than or equal to the threshold, the input log is classified into the x-th class; otherwise, a new template library is created for x.
6. The automatic testing method according to claim 1 or 5, wherein in step S2, selecting a time window and creating feature values specifically comprises: determining the time window used to extract log blocks; within the time window, sorting the different types of logs produced by platform events by their timestamps, and merging logs with the same timestamp; and computing the features that characterize an event as a vector representing the event.
7. The automatic testing method according to claim 1 or 5, wherein in step S2, the clustering-based template library generation specifically comprises: using the log matrix obtained after feature extraction and vectorization as experimental data, and applying principal component analysis to reduce the dimensionality of the experimental data.
8. The automatic testing method according to claim 7, wherein in step S2, the LSTM-based anomaly detection model comprises: obtaining the template library by clustering, and performing anomaly detection with the improved LSTM model; using the mean square error as the loss function to describe the difference between the predicted value and the true value; wherein the improved LSTM model adds an embedding layer between the input layer and the hidden layer.
9. An automatic testing system based on an Internet of things terminal evaluation platform, characterized in that the system comprises: a data acquisition module, a log analysis module, an anomaly detection module and an alarm module;
the log analysis module is used for preprocessing the collected log data generated by the Internet of things terminal evaluation platform, in four steps: template extraction, feature extraction, PCA dimension reduction and clustering; the log templates obtained by the template extraction model and the original logs are used as a mapping table, and feature values capable of representing events are extracted to form feature vectors; the logs to be detected are vectorized and subjected to cluster analysis;
the anomaly detection module is used for performing anomaly detection on the processed log data suitable for security analysis: first, an event library is built from a large number of normal logs, and the degree of deviation of the log to be detected from the normal log template library is calculated to determine whether the log is abnormal; if the log is abnormal, it is first judged whether it belongs to a known abnormal event and type, and if not, the log to be detected is added to the abnormal template library as a new type of abnormal attack;
the alarm module is used for performing statistical analysis on the log events after anomaly analysis and displaying the results intuitively in chart form.
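As an illustration of the word-scoring rule in claim 3, Score(word,p,len)=P(word|p,len) can be estimated from counts over a set of space-separated log messages. The sketch below is one possible reading, not the claimed implementation; the names `score_table` and `to_template`, the 0.5 threshold, the `<*>` placeholder and the sample messages are all assumptions.

```python
from collections import Counter


def score_table(messages):
    """Estimate P(word | p, len) by counting over space-separated messages."""
    totals = Counter()  # number of messages of each length
    hits = Counter()    # occurrences of (word, position, length)
    for message in messages:
        words = message.split()
        totals[len(words)] += 1
        for p, word in enumerate(words):
            hits[(word, p, len(words))] += 1
    return {key: count / totals[key[2]] for key, count in hits.items()}


def to_template(message, scores, threshold=0.5):
    """Keep high-score words as template words; mask the rest as parameters."""
    words = message.split()
    n = len(words)
    return [w if scores.get((w, p, n), 0.0) >= threshold else "<*>"
            for p, w in enumerate(words)]


# Hypothetical sample logs: only the user name varies between messages.
sample_logs = [
    "login user alice ok",
    "login user bob ok",
    "login user carol ok",
]
scores = score_table(sample_logs)
template = to_template("login user alice ok", scores)
```

Words that occupy the same position in every message of a given length score 1.0 and are kept as template words, while the varying user names each score 1/3 and are treated as parameter words.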
CN202010916739.2A 2020-09-03 2020-09-03 Automatic testing method and system based on Internet of things terminal evaluation platform Pending CN112039907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010916739.2A CN112039907A (en) 2020-09-03 2020-09-03 Automatic testing method and system based on Internet of things terminal evaluation platform

Publications (1)

Publication Number Publication Date
CN112039907A true CN112039907A (en) 2020-12-04

Family

ID=73591950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010916739.2A Pending CN112039907A (en) 2020-09-03 2020-09-03 Automatic testing method and system based on Internet of things terminal evaluation platform

Country Status (1)

Country Link
CN (1) CN112039907A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112579414A (en) * 2020-12-08 2021-03-30 西安邮电大学 Log abnormity detection method and device
CN113760645A (en) * 2021-03-10 2021-12-07 京东科技控股股份有限公司 System operation log monitoring method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101807961B1 (en) * 2016-06-07 2017-12-11 한양대학교 산학협력단 Method and apparatus for processing speech signal based on lstm and dnn
CN109923557A (en) * 2016-11-03 2019-06-21 易享信息技术有限公司 Use continuous regularization training joint multitask neural network model
CN110096411A (en) * 2019-03-22 2019-08-06 西安电子科技大学 Log template rapid extracting method and system based on association analysis and time window
CN111600905A (en) * 2020-06-01 2020-08-28 广州鹄志信息咨询有限公司 Anomaly detection method based on Internet of things

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
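常二慧: "Anomaly Detection Method and System for Internet of Things Platforms Based on Log Analysis" (基于日志分析的物联网平台异常检测方法及系统), China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology Series *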

Similar Documents

Publication Publication Date Title
CN107294993B (en) WEB abnormal traffic monitoring method based on ensemble learning
CN111506478A (en) Method for realizing alarm management control based on artificial intelligence
CN111107072B (en) Authentication graph embedding-based abnormal login behavior detection method and system
CN112468347B (en) Security management method and device for cloud platform, electronic equipment and storage medium
CN114465874B (en) Fault prediction method, device, electronic equipment and storage medium
CN110297207A (en) Method for diagnosing faults, system and the electronic device of intelligent electric meter
CN112039907A (en) Automatic testing method and system based on Internet of things terminal evaluation platform
CN113918367A (en) Large-scale system log anomaly detection method based on attention mechanism
CN112769605A (en) Heterogeneous multi-cloud operation and maintenance management method and hybrid cloud platform
Pal et al. DLME: distributed log mining using ensemble learning for fault prediction
CN117220920A (en) Firewall policy management method based on artificial intelligence
CN112506750A (en) Big data processing system for mass log analysis and early warning
CN115277113A (en) Power grid network intrusion event detection and identification method based on ensemble learning
CN116167370A (en) Log space-time characteristic analysis-based distributed system anomaly detection method
CN111475380B (en) Log analysis method and device
KR101621959B1 (en) Apparatus for extracting and analyzing log pattern and method thereof
CN111209158B (en) Mining monitoring method and cluster monitoring system for server cluster
Lin et al. Dcsa: Using density-based clustering and sequential association analysis to predict alarms in telecommunication networks
CN115080286A (en) Method and device for discovering log exception of network equipment
KR102470364B1 (en) A method for generating security event traning data and an apparatus for generating security event traning data
Liu et al. The runtime system problem identification method based on log analysis
CN117792801B (en) Network security threat identification method and system based on multivariate event analysis
Zhao et al. Multi-stage Location for Root-Cause Metrics in Online Service Systems
CN117540372B (en) Database intrusion detection and response system for intelligent learning
CN117473571B (en) Data information security processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201204