CN116796311A

CN116796311A - Mall system intrusion data analysis method and system based on AI

Info

Publication number: CN116796311A
Application number: CN202310715896.0A
Authority: CN
Inventors: 方爱凤; 邱军; 张加强
Original assignee: Guangzhou Xingdejiu Network Technology Co ltd
Current assignee: Guangzhou Xingdejiu Network Technology Co ltd
Priority date: 2023-06-15
Filing date: 2023-06-15
Publication date: 2023-09-22

Abstract

The invention relates to the technical field of information security, and in particular to an AI-based method and system for analyzing intrusion data in a mall system. The method comprises the following steps: acquiring system log information of the mall and performing intrusion detection on it to obtain mall system intrusion data; performing dimension reduction on the mall system intrusion data and constructing an AI intrusion classification model; acquiring historical file data of the mall system and processing it with a system file damage coefficient formula to obtain historical system damage coefficient data; acquiring current file data of the mall system and processing it with the same formula to obtain current system damage coefficient data; and classifying the current system damage coefficient data against the historical system damage coefficient data to obtain suspicious detection data, i.e. data requiring further suspicious-behavior analysis. The invention thereby enables effective AI-based identification of mall system intrusion data.

Description

Mall system intrusion data analysis method and system based on AI

Technical Field

The invention relates to the technical field of information security, and in particular to an AI-based method and system for analyzing intrusion data in a mall system.

Background

Mall systems face increasing risks of network attack and intrusion in networked and intelligent environments, and traditional intrusion detection methods struggle to meet the requirements for preventing and managing these intrusion risks. Finding a faster and more accurate method for analyzing network intrusion data and protecting against intrusions has therefore become one of the key approaches to mall system security.

Disclosure of Invention

The application provides an AI-based mall system intrusion data analysis method to solve at least one of the technical problems described above.

The application provides an AI-based mall system intrusion data analysis method, which comprises the following steps:

step S1: acquiring system log information of the mall and performing intrusion detection on it to obtain mall system intrusion data;

step S2: performing dimension reduction on the mall system intrusion data and constructing an AI intrusion classification model;

step S3: acquiring historical file data of the mall system and calculating it with a system file damage coefficient formula to obtain historical system damage coefficient data, wherein the system file damage coefficient formula is specifically:

where C is the system file damage coefficient, n₁ is the number of samples of historical file data, h is the index of a historical file sample, w_h is the weight of the h-th historical file, x_h is the number of times the h-th file has failed, y_h is the total number of normal runs of the h-th file, and q is the damage-degree weight;

step S4: acquiring current file data of the mall system and calculating it with the same system file damage coefficient formula to obtain current system damage coefficient data;

step S5: classifying the current system damage coefficient data against the historical system damage coefficient data to obtain suspicious detection data, i.e. data that requires further suspicious-behavior analysis;

step S6: extracting transmission data volume features and HTTP protocol information features from the mall system log information, and using these features to correct the AI intrusion classification model, thereby obtaining a mall system AI intrusion detection model;

step S7: performing suspicious analysis on the suspicious detection data with the mall system AI intrusion detection model to obtain system intrusion data, and sending the system intrusion data to a mall protection system.
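Step S6 does not specify the log format or which transmission-volume and HTTP fields are used. As a minimal sketch, assuming the mall's web servers write Apache/Nginx "combined" access-log lines, the following Python extracts one transmission-volume feature (bytes sent) and a few HTTP protocol features per line. The chosen fields, the `LogFeatures` type and `extract_features` are hypothetical illustrations, not the patent's actual feature set.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: "combined" access-log format; the patent does not state the log format.
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<proto>HTTP/[\d.]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


@dataclass
class LogFeatures:
    bytes_sent: int    # transmission data volume feature
    method: str        # HTTP protocol information features (hypothetical selection)
    status: int
    path_length: int
    has_query: bool
    user_agent: str


def extract_features(line: str) -> Optional[LogFeatures]:
    """Return features for one access-log line, or None if the line does not parse."""
    m = COMBINED_LOG.match(line)
    if m is None:
        return None
    raw_bytes = m.group("bytes")
    path = m.group("path")
    return LogFeatures(
        bytes_sent=0 if raw_bytes == "-" else int(raw_bytes),  # "-" means no body sent
        method=m.group("method"),
        status=int(m.group("status")),
        path_length=len(path),
        has_query="?" in path,
        user_agent=m.group("agent") or "",  # agent is absent in the "common" format
    )
```

Lines that fail to parse return `None`; in the terms of step S12, such lines could be routed to the unstructured branch instead of being discarded.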

In the invention, system log information of the mall is acquired and intrusion detection is performed on it to obtain mall system intrusion data. Intrusion detection allows vulnerabilities and potential risks to be found in time, which improves system security. Dimension reduction is then performed on the mall system intrusion data and an AI intrusion classification model is constructed. Mall system logs and network traffic data are voluminous and contain a large amount of redundant information and minor detail, so processing and analyzing them directly would greatly reduce the efficiency and accuracy of intrusion detection. Dimension reduction therefore reduces the data volume and analysis complexity and improves detection accuracy, and the AI classification model allows mall system intrusion data to be processed and analyzed quickly, improving the response speed of intrusion detection.

Historical file data of the mall system is acquired and processed with the system file damage coefficient formula to obtain historical system damage coefficient data. A mall system may hold a great deal of sensitive information, such as transaction records, whose loss or corruption can cause serious losses to the system and its users. Calculating the historical damage coefficient makes it possible to detect in time files that cannot be read normally, have been generated automatically, or have been tampered with, so that security can be improved through measures such as a stronger data backup strategy and stronger data protection.

Current file data of the mall system is acquired and processed with the same formula to obtain current system damage coefficient data. Calculating the current damage coefficient allows the health of the current system to be assessed in time and the scope of affected files to be determined and the problems located, so that problems can be resolved quickly and stable operation of the mall system is ensured. The current damage coefficient data is then classified against the historical damage coefficient data to obtain the suspicious detection data. This classification gives a deeper understanding of the distribution and importance of different damage coefficient values, so that targeted optimization and handling can be applied to each class to improve the operating efficiency of the mall system.

Transmission data volume features and HTTP protocol information features are extracted from the mall system log information and used to correct the AI intrusion classification model, yielding the mall system AI intrusion detection model. These features allow the operating state of the mall system to be analyzed more comprehensively and accurately and potential security threats to be discovered effectively; correcting the classification model with them can improve detection accuracy, reduce the false alarm rate, and better safeguard the mall system. Finally, the mall system AI intrusion detection model performs suspicious analysis on the suspicious detection data to obtain system intrusion data, which is sent to the mall protection system. Through the AI intrusion detection model and suspicious analysis, the mall system can effectively screen out security threats, reduce the workload of manual intervention and analysis, improve security detection efficiency, reduce the losses that security threats cause, strengthen its ability to collect and analyze attack intelligence, and improve its overall security.
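Step S5 does not state how the current damage coefficients are classified against the historical ones, and the damage coefficient formula itself is not reproduced. A minimal sketch, assuming a simple z-score rule: each current file's coefficient is compared with the mean and population standard deviation of the historical coefficients, and anything at or above a warning threshold becomes suspicious detection data. The thresholds (2 and 3), the three labels and the file names are illustrative assumptions, not values from the patent.

```python
import statistics
from typing import Dict, List, Tuple


def classify_damage(
    historical: List[float],
    current: Dict[str, float],
    warn_z: float = 2.0,   # assumed threshold for "elevated"
    alert_z: float = 3.0,  # assumed threshold for "suspicious"
) -> Tuple[Dict[str, str], List[str]]:
    """Label current damage coefficients relative to the historical distribution.

    Returns (labels per file, files to pass on as suspicious detection data).
    """
    if not historical:
        raise ValueError("historical damage coefficients are required")
    mu = statistics.mean(historical)
    # Guard against a perfectly constant history (pstdev == 0).
    sigma = statistics.pstdev(historical) or 1e-12
    labels: Dict[str, str] = {}
    for name, coeff in current.items():
        z = (coeff - mu) / sigma
        if z >= alert_z:
            labels[name] = "suspicious"
        elif z >= warn_z:
            labels[name] = "elevated"
        else:
            labels[name] = "normal"
    to_check = [name for name, label in labels.items() if label != "normal"]
    return labels, to_check
```

A z-score rule is only one way to "classify against history". Percentile bands or a clustering step would fit the same description of step S5 equally well.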

Optionally, step S1 includes the steps of:

step S11: acquiring system log information of the mall;

step S12: extracting structured data and unstructured data from the mall system log information to obtain system structured data and system unstructured data;

step S13: performing structured intrusion detection on the system structured data to obtain system structured intrusion data;

step S14: performing unstructured intrusion detection on the system unstructured data to obtain system unstructured intrusion data;

step S15: combining the system structured intrusion data and the system unstructured intrusion data in time sequence to obtain the mall system intrusion data.

The invention acquires the mall system log information from the mall system and extracts structured and unstructured data from it, giving system structured data and system unstructured data. Separating the two gives a better picture of how the mall system is running. Structured intrusion detection is performed on the system structured data to obtain system structured intrusion data. Extracting and analyzing this data provides concrete information about attacks and their scale and allows more attack information to be collected, which raises the level of security protection and strengthens defense capability. Unstructured intrusion detection is performed on the system unstructured data to obtain system unstructured intrusion data. Unstructured data is complex, hard to parse, and often used to conceal attacks or abnormal operations, so detecting and identifying it allows adverse events and abnormal behaviors to be discovered and defended against in time, ensuring the security and stability of the system. Finally, the system structured intrusion data and system unstructured intrusion data are combined in time sequence to obtain the mall system intrusion data. Structured and unstructured data provide different security event details and background information; combining the two reveals more security threats and illegal behaviors and improves the comprehensiveness and accuracy of intrusion detection.
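Steps S12 and S15 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a log line counts as structured when it parses as a JSON object and as unstructured otherwise, and that every line carries a numeric timestamp. The detectors of steps S13 and S14 are omitted. `merge_by_time` shows only the time-sequence combination of already time-sorted streams, using the standard library's `heapq.merge`. Both function names are hypothetical.

```python
import heapq
import json
from typing import Any, Iterable, List, Tuple

Record = Tuple[float, str, Any]  # (timestamp, source tag, payload)


def split_log(lines: Iterable[Tuple[float, str]]) -> Tuple[List[Record], List[Record]]:
    """S12 sketch: JSON objects are treated as structured, everything else as unstructured."""
    structured: List[Record] = []
    unstructured: List[Record] = []
    for ts, text in lines:
        try:
            payload = json.loads(text)
        except json.JSONDecodeError:
            payload = None
        if isinstance(payload, dict):
            structured.append((ts, "structured", payload))
        else:
            unstructured.append((ts, "unstructured", text))
    return structured, unstructured


def merge_by_time(*streams: List[Record]) -> List[Record]:
    """S15 sketch: merge streams that are each sorted by timestamp into one timeline."""
    return list(heapq.merge(*streams, key=lambda r: r[0]))
```

`heapq.merge` is lazy and runs in O(total · log k) for k streams. This suits log streams that are already time-ordered per source, and it also fits the later time-sequence combinations in steps S133, S143, S1423 and S1426.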

Optionally, step S13 includes the steps of:

step S131: performing statistical analysis on the system structured data to obtain high-frequency structured intrusion data and low-frequency structured intrusion data;

step S132: performing structured disturbance intrusion calculation on the system structured data to obtain potential structured intrusion data;

step S133: combining the potential structured intrusion data, the high-frequency structured intrusion data and the low-frequency structured intrusion data in time sequence to obtain the system structured intrusion data.

The invention performs statistical analysis on the system structured data to obtain high-frequency and low-frequency structured intrusion data. High-frequency structured intrusion data can be regarded as the main threats to, or targets in, the mall system, so defense strategies can be formulated and defense facilities upgraded in a targeted way, protecting the system more efficiently. Low-frequency structured intrusion data may represent a new type of intrusion threat or incident; detecting and understanding such threats improves an enterprise's ability to protect against and respond to unknown threats and raises its security level. Structured disturbance intrusion calculation is performed on the system structured data to obtain potential structured intrusion data. Many anomalies that arise while the system is running are not attacks, so detection based only on the raw structured data, in the absence of actual attacks, easily produces false alarms. Structured disturbance intrusion calculation reduces the likelihood of such false alarms and improves detection accuracy: using a suitable disturbance algorithm, it simulates an intruder's attack patterns and infers potential intrusion behavior from them, strengthening the system's defenses against malicious attacks. Finally, the potential, high-frequency and low-frequency structured intrusion data are combined in time sequence to obtain the system structured intrusion data.
This time-sequence combination enables complete tracking and tracing of security events, strengthens event handling, and helps ensure the stability and security of the system. It also reveals more security threats and illegal behaviors, improving the comprehensiveness and accuracy of intrusion detection.
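The statistical analysis of step S131 is not specified further. A minimal sketch, assuming each structured intrusion record has already been reduced to a hashable signature (for example an event type), counts signatures with `collections.Counter` and splits them by two thresholds. The `high_min` and `low_max` values and the signature names are illustrative assumptions. The same split would apply to the unstructured branch in step S141.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def split_by_frequency(
    signatures: Iterable[Hashable],
    high_min: int = 10,  # assumed: at least this many occurrences -> high-frequency
    low_max: int = 2,    # assumed: at most this many occurrences -> low-frequency
) -> Tuple[List[Hashable], List[Hashable]]:
    """Return (high-frequency signatures, low-frequency signatures), most common first."""
    ranked = Counter(signatures).most_common()
    high = [sig for sig, count in ranked if count >= high_min]
    low = [sig for sig, count in ranked if count <= low_max]
    return high, low
```

Signatures between the two thresholds fall into neither class here. In practice the thresholds would be tuned to the mall's traffic volume, or replaced by percentiles of the count distribution.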

Optionally, step S132 includes the steps of:

performing disturbance processing on the system structured data with the Laplace mechanism to obtain structured disturbance data;

calculating the structured disturbance data with a structured potential intrusion classification algorithm to obtain potential system structured data;

the function formula of the structured potential intrusion classification algorithm is as follows:

wherein I is a structured potential intrusion coefficient, P is a physical security level, T is time, U is a process use weight, R is a resource utilization rate, and D is a data stream size of external communication.

The functional formula of the structured potential intrusion classification algorithm constructed in the invention applies harmonic-mean processing when calculating the resource utilization rate, in order to handle systems that use more processes, which may indicate a malicious purpose. The formula takes into account the physical security level P, time T, process usage weight U, resource utilization rate R and external communication data stream size D, all of which influence the structured potential intrusion coefficient I, and forms the function I = f(P, T, U, R, D). T is the time factor: the longer the time, the lower the intrusion risk index. P is the physical security level: the higher it is, the lower the intrusion risk index. Since the actual physical security level lies between 0 and 1, its square root is taken and acts as an inversely proportional accelerating regulator. U is the process usage weight; through harmonic-mean processing of the term involving U, only systems that use more processes, which may have a malicious purpose, are emphasized. R is the resource utilization rate. D is the data stream size of external communication; increasing it increases the intrusion risk index, so it appears under a square root in the formula. This formula improves the accuracy of the structured potential intrusion coefficient and facilitates the subsequent classification calculation.

In the invention, the system structured data is perturbed with the Laplace mechanism to obtain structured disturbance data. A mall system collects a great deal of sensitive information, such as users' personal privacy information and transaction records, and leakage of such data poses great risks to users. The Laplace mechanism, a standard differential-privacy technique that adds noise calibrated to the sensitivity of the data, can effectively prevent data leakage and reduce the risk of sensitive data being attacked or stolen. The structured disturbance data is then processed with the structured potential intrusion classification algorithm to obtain potential system structured data, which makes it possible to predict future intrusion behavior and improves the accuracy and timeliness of early warning and intrusion detection.
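The patent does not give the Laplace mechanism's parameters. In its standard form, the mechanism adds noise drawn from Laplace(0, Δf/ε) to each value, where Δf is the sensitivity and ε the privacy budget. A minimal Python sketch follows. The values of `sensitivity` and `epsilon` are assumptions the operator would have to choose. The sampler relies on the fact that the difference of two independent Exponential(rate 1/b) variables is Laplace(0, b), which avoids the log(0) edge case of naive inverse-CDF sampling.

```python
import random
from typing import List, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_perturb(
    values: List[float],
    sensitivity: float,
    epsilon: float,
    seed: Optional[int] = None,
) -> List[float]:
    """Laplace mechanism: add Laplace(0, sensitivity / epsilon) noise to each value."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = random.Random(seed)  # seeded RNG for reproducible tests only
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]
```

A smaller ε gives a larger noise scale and stronger privacy, at the cost of noisier inputs to the structured potential intrusion classification algorithm. That trade-off is a tuning decision the patent leaves open.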

Optionally, step S14 includes the steps of:

step S141: performing statistical analysis on the system unstructured data to obtain high-frequency unstructured intrusion data and low-frequency unstructured intrusion data;

step S142: performing unstructured potential intrusion calculation on the system unstructured data to obtain potential unstructured intrusion data;

step S143: combining the potential unstructured intrusion data, the high-frequency unstructured intrusion data and the low-frequency unstructured intrusion data in time sequence to obtain the system unstructured intrusion data.

The invention performs statistical analysis on the system unstructured data to obtain high-frequency and low-frequency unstructured intrusion data, which improves the comprehensiveness of intrusion detection. Unstructured potential intrusion calculation is performed on the system unstructured data to obtain potential unstructured intrusion data; this calculation makes it possible to predict future intrusion behavior, helps enterprises find and prevent intrusion events at an early stage, and improves the accuracy and timeliness of intrusion detection. Finally, the potential, high-frequency and low-frequency unstructured intrusion data are combined in time sequence to obtain the system unstructured intrusion data, which reflects the intrusion situation of the system more comprehensively and widens the coverage of intrusion detection.

Optionally, step S142 includes the steps of:

step S1421: extracting text data, image data and audio data from the system unstructured data to obtain unstructured text data, unstructured audio data and unstructured image data;

step S1422: extracting speech information from the unstructured audio data through a preset speech recognition technology to obtain unstructured speech information;

step S1423: combining the unstructured speech information and the unstructured text data in time sequence to obtain unstructured text comprehensive data;

step S1424: performing unstructured text potential intrusion calculation on the unstructured text comprehensive data to obtain potential unstructured text comprehensive intrusion data;

step S1425: performing unstructured image potential intrusion calculation on the unstructured image data to obtain potential unstructured image intrusion data;

step S1426: combining the potential unstructured text comprehensive intrusion data and the potential unstructured image intrusion data in time sequence to obtain the potential unstructured intrusion data.

The invention extracts text data, image data and audio data from the system unstructured data to obtain unstructured text data, unstructured audio data and unstructured image data, which makes the data easier to store and manage and improves analysis efficiency. Speech information is extracted from the unstructured audio data through the preset speech recognition technology, such as an end-to-end speech recognition technology, to obtain unstructured speech information; this improves the speed and accuracy of data analysis and allows the speech information to be analyzed and used further. The unstructured speech information and unstructured text data are combined in time sequence to obtain unstructured text comprehensive data, and integrating these different kinds of unstructured data improves data utilization. Unstructured text potential intrusion calculation is performed on the unstructured text comprehensive data to obtain potential unstructured text comprehensive intrusion data, which helps identify and locate potential threats in the data quickly and better protects data security. Unstructured image potential intrusion calculation is performed on the unstructured image data to obtain potential unstructured image intrusion data, which helps improve the accuracy of security monitoring, find abnormal conditions quickly, and take corresponding security measures in time.
Finally, the potential unstructured text comprehensive intrusion data and the potential unstructured image intrusion data are combined in time sequence to obtain the potential unstructured intrusion data. Combining different types of potential intrusion data in time sequence integrates the information related to each intrusion event, strengthens intrusion detection, increases the sensitivity of the security system to different types of intrusion events, allows the presence of an intrusion event to be judged more accurately, and reduces the false alarm rate.

Optionally, the unstructured text potential intrusion calculation in step S1424 is specifically:

calculating the unstructured text comprehensive data with an unstructured latent text classification algorithm to obtain the potential unstructured text comprehensive intrusion data;

the function formula of the unstructured latent text classification algorithm is as follows:

where WI is the unstructured text potential intrusion coefficient, n is the total amount of text data, i is the text data index, p_i is the number of keywords contained in the i-th unstructured text data, t_i is the radian measure of the time period in which the i-th data lies, q_i is the unstructured semantic text entropy obtained by analyzing the i-th data, and x is a normalization coefficient.

The function formula of the unstructured latent text classification algorithm uses the keyword count and time-point information of the text data, combined with the analyzed unstructured semantic text entropy, to compute a weighted score over the text data, finally yielding the WI value as the classification result. With this formula, text data generated in the system can be classified and predicted. A high WI value indicates some association between the text data and intrusion behavior, which calls for further investigation and analysis; a low WI value indicates that the text data can be regarded as normal and needs no additional processing. The formula can improve the security and stability of the system, effectively guard against various potential intrusion behaviors, and reduce the false alarm rate. It takes into account the factors that affect the unstructured text potential intrusion coefficient WI: the total text data amount n, the number of keywords p_i contained in the i-th unstructured text, the radian measure t_i of the time period in which the i-th data lies, and the unstructured semantic text entropy q_i of the i-th data, normalized by the coefficient x, forming the function WI = f(n, p_i, t_i, q_i, x). The idea is that WI is obtained by weighting each text's keyword count by its time point and accumulating the results. The cos(t_i) term controls the influence of time on the value, so that text in which keywords appear many times over a longer period receives a higher weight than text in which keywords appear only over a shorter period.
The term involving q_i reflects the overall complexity of the text data and the richness of its unstructured semantic information, i.e. how many distinct elements, such as topics, entities and emotions, the text contains. Note that because the denominator contains a sine term, the value may diverge during calculation when that term approaches zero; special handling is therefore required, and the limit is taken to be 1. With this formula, unstructured text in the system can be evaluated to determine whether potential intrusion behavior exists.
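The WI formula itself is not reproduced in this text, but its per-text inputs can be sketched. A minimal Python sketch, under three assumptions. First, p_i is a count of tokens found in a keyword set. Second, t_i maps the time of day onto [0, 2π), one day per revolution. Third, q_i is approximated by the Shannon entropy (in bits) of the text's token distribution. The keyword set, the tokenizer and all function names are hypothetical. The sketch does not combine the inputs into WI.

```python
import math
import re
from collections import Counter
from datetime import datetime
from typing import Set, Tuple

TOKEN = re.compile(r"[a-z0-9_]+")  # assumed tokenizer: lowercase alphanumeric runs


def keyword_count(text: str, keywords: Set[str]) -> int:
    """p_i (assumed definition): number of tokens that appear in the keyword set."""
    return sum(1 for tok in TOKEN.findall(text.lower()) if tok in keywords)


def time_radians(ts: datetime) -> float:
    """t_i (assumed definition): time of day as an angle, one day = 2*pi."""
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    return 2 * math.pi * seconds / 86400


def token_entropy(text: str) -> float:
    """q_i stand-in: Shannon entropy (bits) of the token frequency distribution."""
    tokens = TOKEN.findall(text.lower())
    if not tokens:
        return 0.0
    n = len(tokens)
    return -sum((c / n) * math.log2(c / n) for c in Counter(tokens).values())


def text_inputs(text: str, ts: datetime, keywords: Set[str]) -> Tuple[int, float, float]:
    """Return (p_i, t_i, q_i) for one unstructured text record."""
    return keyword_count(text, keywords), time_radians(ts), token_entropy(text)
```

Shannon entropy over tokens is only a rough proxy for "semantic" entropy. A topic- or entity-level distribution would track the description of q_i more closely, at the cost of an NLP dependency.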

Optionally, the unstructured image potential intrusion calculation in step S1425 is specifically:

calculating the unstructured image data with an unstructured latent image classification algorithm to obtain potential unstructured image intrusion data;

Wherein, the function formula of unstructured latent image classification algorithm is:

where PI is the unstructured image potential intrusion coefficient, v is the pixel index in the image, N is the number of pixels in the image, w_v is the weight of pixel v, γ_z is the environmental impedance, γ_d is the individual dissipated power constant, σ_{d,v} is the standard deviation of the dissipated power γ_d of the pixels spatially nearest to pixel v, α is the attenuation parameter, e is the base of the natural logarithm, β is the damping coefficient, and D_v is the distance from pixel v to the target area.

The function formula of the unstructured latent image classification algorithm computes the unstructured potential intrusion coefficient from the system's unstructured image data. It determines whether an image may contain intrusion behavior mainly by calculating the distance from each pixel to the target area together with other pixel characteristics, and thereby classifies and predicts the image. From these characteristics the potential intrusion value of each pixel is calculated, and the values are averaged to give the potential intrusion coefficient of the whole image. This image-based classification helps improve the security and stability of the system and can, to a certain extent, predict and prevent various potential intrusion behaviors. The formula takes into account the factors that affect the unstructured image potential intrusion coefficient PI: the number of pixels N, the weight w_v of pixel v, the environmental impedance γ_z, the individual dissipated power constant γ_d, the standard deviation σ_{d,v} of the dissipated power of the pixels spatially nearest to pixel v, the attenuation parameter α, the damping coefficient β, and the distance D_v from pixel v to the target area, forming the function PI = f(N, w_v, γ_z, γ_d, σ_{d,v}, α, β, D_v). First, γ_z measures background noise, i.e. the influence of the system environment on the image: as the environmental impedance increases, the image contains more background noise and the potential intrusion coefficient decreases.
Second, γ_d represents the amount of energy radiated from the target object and is used for equalization. σ_{d,v} determines whether the radiated energy of the pixels converges, i.e. it captures the difference in radiated energy among the pixels surrounding the target object. α controls the rate of energy decay, so that pixels farther from the target object have less effect, and β is a damping coefficient that further attenuates the effect of those pixels. Finally, by weighted averaging, the potential intrusion value of each pixel is multiplied by its weight w_v, the products are summed, and the sum is divided by the total number of pixels N to obtain the potential intrusion coefficient PI of the whole image. With this formula, images in the system can be classified and predicted quickly to determine whether potential intrusion behavior exists, improving the security and stability of the system.
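Only the outer structure of the PI formula is stated in the text: a per-pixel value, multiplied by w_v, summed, and divided by N. A minimal Python sketch of that aggregation follows, with D_v taken as the Euclidean distance to the nearest target-area pixel (an assumption). The per-pixel term is not given, so `hypothetical_pixel_score` is a placeholder that merely reproduces the described trends: it decays with distance through α and β and decreases as γ_z grows. It is not the patent's formula, and it omits γ_d and σ_{d,v}.

```python
import math
from typing import Callable, Sequence, Tuple

Pixel = Tuple[int, int]


def distance_to_target(p: Pixel, target: Sequence[Pixel]) -> float:
    """D_v (assumed): Euclidean distance from pixel p to the nearest target-area pixel."""
    return min(math.dist(p, t) for t in target)


def image_pi(
    pixels: Sequence[Pixel],
    weights: Sequence[float],
    target: Sequence[Pixel],
    pixel_score: Callable[[float], float],
) -> float:
    """PI = (1/N) * sum over v of w_v * pixel_score(D_v), as described in the text."""
    if not pixels or len(pixels) != len(weights):
        raise ValueError("need at least one pixel and exactly one weight per pixel")
    total = sum(w * pixel_score(distance_to_target(p, target))
                for p, w in zip(pixels, weights))
    return total / len(pixels)


def hypothetical_pixel_score(alpha: float, beta: float, gamma_z: float) -> Callable[[float], float]:
    """Placeholder per-pixel term (NOT the patent's formula).

    exp(-alpha * d) models energy decay with distance, 1 / (1 + beta * d) models
    damping, and 1 / (1 + gamma_z) lowers the score as environmental impedance grows.
    """
    def score(d: float) -> float:
        return math.exp(-alpha * d) / ((1.0 + beta * d) * (1.0 + gamma_z))
    return score
```

Passing the per-pixel term in as a callable keeps the documented aggregation separate from the unknown term, so the placeholder can be swapped out without touching `image_pi`.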

Optionally, the suspicious analysis in step S7 specifically includes:

acquiring information on new system attack methods;

extracting abnormal parameters from the information on new system attack methods to obtain new-method abnormal parameters;

classifying the suspicious detection data with the mall system AI intrusion detection model based on the new-method abnormal parameters, thereby obtaining the system intrusion data.

The invention obtains information on new system attack methods from network security threat reports and extracts abnormal parameters from it to obtain new-method abnormal parameters. Based on these parameters, the mall system AI intrusion detection model classifies the suspicious detection data to obtain the system intrusion data. By analyzing the attack patterns of new attack methods, the vulnerability characteristics of the target system and the like, relevant abnormal parameters such as attack traffic and code injection can be extracted, providing data support for subsequently identifying and predicting system intrusions. These parameters allow the risk of system intrusion to be identified and predicted more accurately and the characteristics of intrusion behavior to be captured better, improving both the accuracy and the sensitivity of intrusion detection. An AI-based intrusion detection model can automatically detect and classify large batches of data, shortening security detection time and improving its efficiency.
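The patent does not describe the internals of the AI intrusion detection model. The following Python sketch is a rule-based stand-in, not the model. It shows how abnormal parameters extracted from a threat report, here assumed to be a traffic threshold and a set of code-injection regexes, could screen the suspicious detection data and record why each record was flagged. `NovelAttackParams`, `screen` and the record fields `bytes` and `payload` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NovelAttackParams:
    # Hypothetical abnormal parameters distilled from a threat report (assumption).
    max_bytes: int                          # attack-traffic threshold per record
    payload_patterns: Tuple[str, ...] = ()  # regexes for code-injection signatures


def screen(records: List[Dict], params: NovelAttackParams) -> List[Dict]:
    """Rule-based stand-in for the AI model: flag records that match any parameter."""
    compiled = [re.compile(p) for p in params.payload_patterns]
    hits = []
    for rec in records:
        reasons = []
        if rec.get("bytes", 0) > params.max_bytes:
            reasons.append("traffic")
        if any(rx.search(rec.get("payload", "")) for rx in compiled):
            reasons.append("pattern")
        if reasons:
            hits.append({**rec, "reasons": reasons})  # copy the record, add reasons
    return hits
```

In the method as described, such parameters would instead be fed to the trained detection model as features or as correction signals. The explicit `reasons` list here is simply a convenient way to show which parameter triggered each hit.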

Optionally, the present specification further provides an AI-based mall system intrusion data analysis system, including:

at least one processor;

a memory communicatively coupled to the at least one processor;

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the AI-based mall system intrusion data analysis method of any one of the above.

The system can carry out any of the AI-based mall system intrusion data analysis methods described above. It coordinates the operations and signal transmission among its devices to complete the method, and its internal components cooperate to identify mall system intrusion data rapidly and accurately, so that intrusion data can be intercepted effectively and the security of the mall system is improved.

Drawings

Other features, objects and advantages of the application will become more apparent upon reading of the detailed description of a non-limiting implementation, made with reference to the accompanying drawings in which:

FIG. 1 is a flow chart of the steps of the AI-based mall system intrusion data analysis method of the present invention;

FIG. 2 is a detailed step flow chart of step S1 of the present invention;

FIG. 3 is a detailed step flow chart of step S13 of the present invention;

FIG. 4 is a detailed step flow chart of step S14 of the present invention;

FIG. 5 is a detailed step flow chart of step S142 of the present invention.

The objects, functional features and advantages of the present invention will be further described with reference to the embodiments and the accompanying drawings.

Detailed Description

The technical solution of the present invention is described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.

Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so their repeated description is omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

It will be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

To achieve the above objective, referring to FIG. 1 to FIG. 5, the present invention provides an AI-based mall system intrusion data analysis method, which includes the following steps:

In this embodiment, the mall system log information is acquired through the mall system, and intrusion detection is performed on it to obtain the mall system intrusion data.

In this embodiment, principal component analysis (PCA) is used to reduce the dimensionality of the mall system intrusion data, yielding the mall system intrusion dimension-reduced data, which are divided into a dimension-reduction training set and a dimension-reduction test set in a ratio of 8:2. An AI intrusion classification model is then constructed from the dimension-reduction training set and test set based on a decision tree algorithm.
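The PCA, 8:2 split and decision-tree steps above can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the patent's implementation: it keeps only the first principal component (computed by power iteration), uses a fixed shuffle seed for the 8:2 split, and substitutes a depth-1 decision tree (a decision stump) for the full decision tree algorithm. The function names and the toy data are hypothetical.

```python
import math
import random

def pca_first_component(rows, iters=200):
    """Project each row onto the first principal component (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d) of the centred data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0] * d
    for _ in range(iters):  # power iteration converges to the top eigenvector
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt)) or 1.0
        vec = [x / norm for x in nxt]
    return [sum(c[j] * vec[j] for j in range(d)) for c in centered]

def split_8_2(samples, seed=0):
    """Shuffle and split samples into an 80% training set and a 20% test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]

def fit_stump(xs, ys):
    """Depth-1 decision tree: pick the split threshold and side with fewest errors."""
    vals = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals
    best = (len(xs) + 1, candidates[0], 1)
    for t in candidates:
        for side in (1, -1):
            errors = sum((1 if side * (x - t) >= 0 else 0) != y for x, y in zip(xs, ys))
            if errors < best[0]:
                best = (errors, t, side)
    _, t, side = best
    return lambda x: 1 if side * (x - t) >= 0 else 0

# Hypothetical toy data: two separated clusters (0 = normal, 1 = intrusion).
rows = [[i * 0.1, i * 0.1 + 0.05] for i in range(10)] + \
       [[10 + i * 0.1, 10 + i * 0.1 - 0.05] for i in range(10)]
labels = [0] * 10 + [1] * 10
train, test = split_8_2(list(zip(pca_first_component(rows), labels)))
clf = fit_stump([x for x, _ in train], [y for _, y in train])
```

With the two toy clusters the stump classifies every held-out sample correctly. On real log features, library implementations such as scikit-learn's `PCA`, `train_test_split(test_size=0.2)` and `DecisionTreeClassifier` would normally replace these hand-written helpers.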

In this embodiment, the system history file data of the mall are obtained through the mall system and calculated with the system file damage coefficient formula to obtain the system history damage coefficient data. The system file damage coefficient formula takes into account the factors that affect the system file damage coefficient C₁: the number n of history file data samples, the weight w_h of the h-th history file, the number of failures x_h of the h-th file, the total number of normal operations y_h of the h-th file, and the weight q of the damage degree; C₁ is a function of these parameters. Computing the system file damage coefficient gives a specific value for the damage degree of the system files. The failure term formed from x_h and y_h can be used to judge the stability and reliability of a device; it is generally defined as the ratio of the number of failures during the device's run time to the total time the device is running. Weighting this term for each file gives an index reflecting both the importance of the history file and the stability of the current file, and adding the indexes of all files gives an overall value that reflects the stability of the system files as a whole.
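The damage coefficient formula itself is not legible in this text, so the following Python sketch assumes one plausible form consistent with the description: each file contributes its weight w_h times its failure ratio x_h / y_h, the per-file indexes are summed over the n files, and the sum is scaled by the damage-degree weight q. Treat the form C₁ = q · Σ w_h · x_h / y_h and the function name as assumptions.

```python
def damage_coefficient(weights, failures, normal_runs, q=1.0):
    """Assumed form C1 = q * sum_h w_h * x_h / y_h (not the patent's exact formula).

    weights:     w_h, importance weight of the h-th history file
    failures:    x_h, number of failures of the h-th file
    normal_runs: y_h, total number of normal operations of the h-th file
    q:           weight of the damage degree
    """
    if not (len(weights) == len(failures) == len(normal_runs)):
        raise ValueError("all per-file lists must have the same length n")
    total = 0.0
    for w_h, x_h, y_h in zip(weights, failures, normal_runs):
        if y_h <= 0:
            raise ValueError("y_h must be positive")
        total += w_h * x_h / y_h  # per-file index: importance times failure ratio
    return q * total
```

For example, two files of weight 0.5, one with 2 failures in 10 normal runs and one with none, give C₁ = 2 × (0.5 × 0.2) = 0.2 when q = 2. The same function would serve for both the history and the current file data.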

In this embodiment, the current file data of the mall system are obtained through the mall system and calculated with the system file damage coefficient formula to obtain the current damage coefficient data of the system.

In this embodiment, a threshold is calculated from the system history damage coefficient data to obtain the system history damage threshold, and a logistic regression algorithm classifies the current damage coefficient data of the system against this threshold to obtain the to-be-suspicious detection data.
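A minimal sketch of the threshold and logistic regression step, assuming choices the text leaves open: the history threshold is taken as mean + k·σ of the historical coefficients (k = 1 here, a hypothetical setting), historical values above it serve as pseudo-labels, and a one-feature logistic regression fitted by batch gradient descent classifies the current coefficients. All names and parameters are illustrative.

```python
import math
import statistics

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def history_threshold(history, k=1.0):
    """Hypothetical rule: T = mean + k * population std of historical coefficients."""
    return statistics.fmean(history) + k * statistics.pstdev(history)

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit 1-D logistic regression p = sigmoid(a*x + b) by batch gradient descent."""
    a = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(a * x + b) - y  # gradient of the log-loss w.r.t. the logit
            ga += err * x
            gb += err
        a -= lr * ga / n
        b -= lr * gb / n
    return a, b

def suspicious(history, current, k=1.0):
    """Return the current coefficients classified as to-be-suspicious."""
    t = history_threshold(history, k)
    s = statistics.pstdev(history) or 1.0

    def scale(c):
        return (c - t) / s  # centre on the threshold and standardise

    labels = [1 if c > t else 0 for c in history]  # pseudo-labels from the threshold
    a, b = fit_logistic([scale(c) for c in history], labels)
    return [c for c in current if sigmoid(a * scale(c) + b) >= 0.5]
```

Centring the feature on the threshold keeps the fitted decision boundary near T, while the logistic model still yields a probability that could be reported alongside each flagged value.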

In this embodiment, features are extracted from the mall system log information to obtain a transmission data volume feature and an HTTP protocol information feature. These two features are used as input variables to the AI intrusion classification model, yielding the mall system AI intrusion detection model.
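The two input features can be extracted from HTTP access logs roughly as follows. This sketch assumes the logs use the Common Log Format written by Apache or Nginx; the regular expression, the field names and the `extract_features` helper are hypothetical and would need adapting to the mall system's actual log layout.

```python
import re

# Hypothetical log layout: Common Log Format, e.g.
# 203.0.113.7 - - [15/Jun/2023:10:00:00 +0800] "POST /login HTTP/1.1" 401 1024
CLF = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) (?P<proto>HTTP/[\d.]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def extract_features(line):
    """Return the transmission-volume and HTTP-protocol features of one log line."""
    m = CLF.match(line)
    if m is None:
        return None  # not an HTTP access record
    size = m.group("size")
    return {
        "bytes": 0 if size == "-" else int(size),  # transmission data volume
        "method": m.group("method"),               # HTTP protocol information
        "path": m.group("path"),
        "protocol": m.group("proto"),
        "status": int(m.group("status")),
    }
```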

Optionally, step S1 includes the steps of:

step S11: acquiring system log information of a mall;

In this embodiment, the mall system log information is obtained through the mall system.

In this embodiment, features are extracted from the mall system log information to obtain system structured data and system unstructured data.

In this embodiment, the system structured intrusion data and the system unstructured intrusion data are combined in time sequence to obtain the mall system intrusion data.
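Combining the structured and unstructured intrusion data in time sequence amounts to a timestamp-ordered merge. A minimal sketch, assuming every record is a `(timestamp, record)` pair; the record contents are hypothetical.

```python
from datetime import datetime

def merge_by_time(*streams):
    """Merge (timestamp, record) pairs from several sources into one timeline.

    sorted() is stable, so records with equal timestamps keep their source order.
    """
    return sorted((item for stream in streams for item in stream),
                  key=lambda item: item[0])

# Hypothetical example records.
structured = [(datetime(2023, 6, 15, 10, 0), "login failure"),
              (datetime(2023, 6, 15, 10, 5), "port scan")]
unstructured = [(datetime(2023, 6, 15, 10, 2), "free-text error report")]
timeline = merge_by_time(structured, unstructured)
```

If each source is already sorted, `heapq.merge(..., key=...)` gives the same result lazily; `sorted` is used here so that unsorted inputs are also handled.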

Optionally, step S13 includes the steps of:

In this embodiment, the system structured data are statistically analyzed with a frequency analysis method to obtain high-frequency structured intrusion data and low-frequency structured intrusion data.

In this embodiment, the potential structured intrusion data, the high-frequency structured intrusion data, and the low-frequency structured intrusion data are combined according to a time sequence, so as to obtain system structured intrusion data.
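The frequency analysis is not specified further. One simple reading, sketched here as an assumption, counts how often each event type occurs and calls a type high-frequency when its share of all events reaches a chosen ratio. The `ratio` cut-off and the event representation are hypothetical.

```python
from collections import Counter

def frequency_split(events, key, ratio=0.1):
    """Split events into (high-frequency, low-frequency) lists by how often their key occurs.

    Hypothetical rule: a key is high-frequency if it accounts for at least
    `ratio` of all events; every other key is low-frequency.
    """
    if not events:
        return [], []
    counts = Counter(key(e) for e in events)
    total = len(events)
    high = [e for e in events if counts[key(e)] / total >= ratio]
    low = [e for e in events if counts[key(e)] / total < ratio]
    return high, low
```

Both output lists preserve the original event order, so they can be recombined with the potential structured intrusion data by timestamp afterwards.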

Optionally, step S132 includes the steps of:

In this embodiment, the system structured data are converted into a probability distribution that follows a normal distribution, yielding the system structured probability distribution data, which are then randomly perturbed with the Laplace mechanism to obtain the structured disturbance data.
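A sketch of the normalization and Laplace-mechanism step, under two assumptions: the conversion to a normal distribution is approximated by z-score standardization, and the noise scale follows the usual differential-privacy rule `sensitivity / epsilon`. The `epsilon` and `sensitivity` values are hypothetical parameters. Laplace noise is drawn as the difference of two exponential variates, since the standard `random` module has no Laplace sampler.

```python
import random
import statistics

def standardize(values):
    """Assumption: z-score standardization stands in for the normal-distribution conversion."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values) or 1.0
    return [(v - mu) / sd for v in values]

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def laplace_perturb(values, epsilon, sensitivity=1.0, seed=None):
    """Laplace mechanism: add Laplace(0, sensitivity / epsilon) noise to each value."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]
```

A smaller `epsilon` means larger noise and stronger privacy protection for the structured data, at the cost of less precise downstream classification.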

In this embodiment, a structured potential intrusion classification algorithm is constructed by combining logistic regression parameters with related parameters of the system structured data, such as its security level information, resource utilization, data flow information and process information. The structured disturbance data are calculated with this algorithm to obtain the potential system structured data.

The function of the structured potential intrusion classification algorithm constructed in the present invention computes the resource utilization with a harmonic-mean treatment, which mitigates the distortion caused by a system that uses many processes, possibly for malicious purposes. It takes into account the factors that affect the structured potential intrusion coefficient I: the physical security level P, the time T, the process usage weight U, the resource utilization R and the data flow size D of external communication; I is a function of these parameters. T is a time factor: the longer the time, the lower the intrusion risk index. P is the physical security level: the higher it is, the lower the intrusion risk index; since the actual physical security level lies between 0 and 1, its square root is taken as an inversely proportional accelerator. U is the process usage weight; through the harmonic-mean treatment of the corresponding term, only a system that uses many processes, which may indicate a malicious goal, is reflected in the result. R is the resource utilization. D denotes the data stream size of external communication; a larger value increases the intrusion risk index, and D appears under a square root in the formula. This function improves the accuracy of the structured potential intrusion coefficient, which benefits the subsequent classification calculation.
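Because the formula for I is not legible here, the following sketch only encodes the relationships the text describes, under an assumed form I = H · √D / (T · √P), where H is a weighted harmonic mean of per-process resource utilization R_j with the process usage weights U_j as weights. The exact form, the function names and the argument layout are assumptions, not the patented formula.

```python
import math

def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean: sum(w) / sum(w / x); requires positive values."""
    if any(x <= 0 for x in values):
        raise ValueError("harmonic mean needs positive values")
    return sum(weights) / sum(w / x for x, w in zip(values, weights))

def structured_intrusion_coefficient(P, T, usage_weights, utilization, D):
    """Hypothetical form I = H(R; U) * sqrt(D) / (T * sqrt(P)).

    P: physical security level in (0, 1]; T: elapsed time (> 0);
    usage_weights / utilization: per-process weights U_j and utilization R_j;
    D: size of the external-communication data stream (>= 0).
    """
    if not (0 < P <= 1) or T <= 0 or D < 0:
        raise ValueError("expected 0 < P <= 1, T > 0 and D >= 0")
    h = weighted_harmonic_mean(utilization, usage_weights)
    # Longer time and higher security lower I; more external traffic raises it.
    return h * math.sqrt(D) / (T * math.sqrt(P))
```

For instance, P = 0.25, T = 2, two equally weighted processes at 50% utilization and D = 16 give I = 0.5 × 4 / (2 × 0.5) = 2.0.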

Optionally, step S14 includes the steps of:

In this embodiment, the system unstructured data are statistically analyzed with a frequency analysis method to obtain high-frequency unstructured intrusion data and low-frequency unstructured intrusion data.

In this embodiment, the potential unstructured intrusion data, the high-frequency unstructured intrusion data and the low-frequency unstructured intrusion data are combined according to a time sequence, so as to obtain unstructured intrusion data of the system.

Optionally, step S142 includes the steps of:

In this embodiment, data are extracted from the system unstructured data to obtain unstructured text data, unstructured audio data and unstructured image data.

In this embodiment, speech information is extracted from the unstructured audio data using end-to-end speech recognition, yielding unstructured speech information.

In this embodiment, the unstructured speech information and the unstructured text data are combined in time sequence to obtain the unstructured text comprehensive data.

In this embodiment, the potential unstructured text comprehensive intrusion data and the potential unstructured image intrusion data are converted into structured data, yielding the potential unstructured text comprehensive conversion data and the potential unstructured image conversion data, which are then merged in time sequence to obtain the potential unstructured intrusion data.

In this embodiment, an unstructured potential text classification algorithm is constructed by combining random forest parameters with related parameters of the unstructured text comprehensive data, such as keyword information, time information and semantic text information. The unstructured text comprehensive data are calculated with this algorithm to obtain the potential unstructured text comprehensive intrusion data.

The function of the unstructured potential text classification algorithm in the invention uses the number of keywords and the time point information in the text data, combined with the analyzed unstructured semantic text entropy, to perform a weighted calculation on the text data; the resulting WI value is the classification result. With this formula, the text data generated while the system is running can be classified and predicted. A high WI value for a piece of text data indicates some association between that text and intrusion behavior, which calls for further investigation and analysis; conversely, a low WI value indicates that the text data can be considered normal and needs no additional processing. The formula can improve the security and stability of the system, effectively prevent various potential intrusion behaviors and reduce the false alarm rate.

The function takes into account the factors that affect the unstructured text potential intrusion coefficient WI: the total amount of text data n, the number of keywords p_i contained in the i-th unstructured text data, the radian measure t_i of the time period in which the i-th data is located, the unstructured semantic text entropy value q_i analyzed for the i-th data, and the normalization coefficient x; WI is a function of these parameters. The idea of the formula is that WI is obtained by counting the keywords in each piece of text data, weighting them by their time points and accumulating the results. The term cos(t_i) controls the influence of time on the value, so that text in which keywords appear repeatedly over a longer period receives a higher weight than text in which keywords appear only over a shorter period. The entropy term accounts for the overall complexity of the text data and the richness of its unstructured semantic information, that is, how many different elements, such as topics, entities and emotions, the text contains. Note that because the denominator contains the limit of a sine function, the value may diverge during calculation and requires special handling; here the limit is taken to be 1. Through this calculation, unstructured text in the system can be evaluated to judge whether potential intrusion behavior exists.
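The following sketch illustrates the WI calculation under explicit assumptions: q_i is taken as the Shannon entropy of the text's word tokens, t_i maps the time of day onto [0, 2π), and the terms are combined as WI = (1 / (n · x)) · Σ p_i · cos(t_i) · q_i. The sine-limit term mentioned above is omitted because its exact role is not legible here; the keyword set and the helper names are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Shannon entropy (bits) of the token distribution; stands in for q_i."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum(c / total * math.log2(c / total) for c in counts.values()) if total else 0.0

def time_to_radians(seconds_since_midnight):
    """Hypothetical mapping of a time of day onto [0, 2*pi) to obtain t_i."""
    return 2 * math.pi * (seconds_since_midnight % 86400) / 86400

def text_intrusion_coefficient(texts, keywords, x=1.0):
    """Assumed form WI = (1 / (n * x)) * sum_i p_i * cos(t_i) * q_i.

    texts: list of (seconds_since_midnight, text) pairs;
    keywords: set of lowercase intrusion-related keywords (hypothetical).
    """
    n = len(texts)
    if n == 0:
        return 0.0
    total = 0.0
    for seconds, text in texts:
        tokens = text.lower().split()
        p_i = sum(tok in keywords for tok in tokens)  # keyword count
        t_i = time_to_radians(seconds)                # time as an angle
        q_i = shannon_entropy(tokens)                 # semantic richness proxy
        total += p_i * math.cos(t_i) * q_i
    return total / (n * x)
```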

In this embodiment, an unstructured potential image classification algorithm is constructed by combining decision tree parameters with related parameters of the unstructured image data, such as pixel information, environmental impedance, dissipated power information, attenuation information and pixel position information. The unstructured image data are calculated with this algorithm to obtain the potential unstructured image intrusion data.

Wherein PI is the unstructured image potential intrusion coefficient, v is the pixel sequence number in the image, N is the number of pixels in the image, w_v is the weight of pixel v, γ_z is the environmental impedance, γ_d is the individual dissipated power constant, σ_{d,v} is the standard deviation of the dissipated power of the pixels spatially nearest to pixel v, α is the attenuation parameter, e is the base of the natural logarithm, β is the damping coefficient, and D_v is the distance from pixel v to the target area.

The function of the unstructured potential image classification algorithm computes the unstructured image potential intrusion coefficient from the unstructured image data of the system. It determines whether an image may contain intrusion behavior by calculating the distance from each pixel to the target area together with other pixel characteristics, and thereby classifies and predicts the image. In the formula, γ_z measures the level of background noise, γ_d is the individual dissipated power constant, σ_{d,v} determines whether the radiated energies of the pixels converge, α controls the rate at which the energy decays, and β is the damping coefficient. From these quantities the potential intrusion value of each pixel is calculated, and their average is used as the potential intrusion coefficient of the whole image. This image-based classification helps improve the security and stability of the system and can, to a certain extent, predict and prevent various potential intrusion behaviors. The formula takes into account the factors that affect the unstructured image potential intrusion coefficient PI: the number of pixels N in the image, the weight w_v of pixel v, the environmental impedance γ_z, the individual dissipated power constant γ_d, the standard deviation σ_{d,v} of the dissipated power of the pixels spatially nearest to pixel v, the attenuation parameter α, the damping coefficient β and the distance D_v from pixel v to the target area; PI is a function of these parameters. First, γ_z measures the background noise, that is, the influence of the system environment on the image: as the environmental impedance increases, the image contains more background noise and the potential intrusion coefficient decreases.
Second, γ_d indicates the amount of energy radiated by the target object and is used for the equalization process. σ_{d,v} determines whether the radiant energies of the pixels converge, that is, it measures the difference in radiant energy among the pixels surrounding the target object. α controls the rate at which the energy decays, so that pixels farther from the target object have a smaller effect. β is a damping coefficient that further attenuates the effect of pixels farther from the target object. Finally, in a weighted average, the potential intrusion value of each pixel is multiplied by its weight w_v, the products are summed, and the sum is divided by the total number of pixels N to obtain the potential intrusion coefficient PI of the whole image. With this function, images in the system can be classified and predicted rapidly to judge whether potential intrusion behavior exists, improving the security and stability of the system.
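The weighted average described above can be sketched as follows. Only the final aggregation PI = (1/N) · Σ w_v · value_v comes from the text; the per-pixel value is a hypothetical form chosen to match the described behavior (it falls as γ_z, σ_{d,v} or D_v grows, decays exponentially with α and is further damped by β), not the patented formula. The `Pixel` type and function names are illustrative.

```python
import math
from dataclasses import dataclass

@dataclass
class Pixel:
    weight: float    # w_v
    sigma: float     # sigma_{d,v}
    distance: float  # D_v, distance to the target area

def pixel_value(px, gamma_z, gamma_d, alpha, beta):
    """Hypothetical per-pixel potential intrusion value (not the patent's formula)."""
    decay = math.exp(-alpha * px.distance)      # energy decays with distance
    damping = 1.0 / (1.0 + beta * px.distance)  # extra damping for far pixels
    return gamma_d * decay * damping / (gamma_z * (1.0 + px.sigma))

def image_intrusion_coefficient(pixels, gamma_z, gamma_d, alpha, beta):
    """PI = (1/N) * sum_v w_v * value_v, the weighted average described in the text."""
    n = len(pixels)
    if n == 0 or gamma_z <= 0:
        raise ValueError("need at least one pixel and gamma_z > 0")
    return sum(p.weight * pixel_value(p, gamma_z, gamma_d, alpha, beta)
               for p in pixels) / n
```

With this form, doubling the environmental impedance γ_z halves PI, which reproduces the stated effect that more background noise lowers the potential intrusion coefficient.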

Optionally, the suspicious analysis in step S7 specifically includes:

acquiring novel system attack means information;

In this embodiment, the novel system attack means information is obtained from network security threat reports.

In this embodiment, abnormal parameters are extracted from the novel system attack means information to obtain the novel means abnormal parameters, which include abnormal IP addresses, abnormal access paths, specific request parameters and abnormal data packets.
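Matching suspicious records against the four kinds of novel means abnormal parameters can be sketched as a simple rule-based pre-filter whose hits could then be passed, as features, to the AI intrusion detection model; it is not the model itself. The record layout, the `NovelMeansIndicators` container and all indicator values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class NovelMeansIndicators:
    """Abnormal parameters extracted from threat reports (all values hypothetical)."""
    ips: set = field(default_factory=set)                # abnormal IP addresses
    paths: set = field(default_factory=set)              # abnormal access paths
    request_params: set = field(default_factory=set)     # specific request parameters
    packet_signatures: set = field(default_factory=set)  # byte patterns in packets

def match_indicators(record, ind):
    """Return the indicator types that a suspicious record matches."""
    hits = []
    if record.get("ip") in ind.ips:
        hits.append("ip")
    if record.get("path") in ind.paths:
        hits.append("path")
    if ind.request_params & set(record.get("params", {})):
        hits.append("request_param")
    payload = record.get("payload", b"")
    if any(sig in payload for sig in ind.packet_signatures):
        hits.append("packet")
    return hits
```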

In this embodiment, the AI-based mall system intrusion data analysis system includes:

at least one processor;

a memory communicatively coupled to the at least one processor;

The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An AI-based mall system intrusion data analysis method, characterized by comprising the following steps:

2. The method according to claim 1, wherein step S1 comprises the steps of:

step S11: acquiring system log information of a mall;

3. The method according to claim 2, characterized in that step S13 comprises the steps of:

4. A method according to claim 3, wherein step S132 comprises the steps of:

5. The method according to claim 2, wherein step S14 comprises the steps of:

6. The method according to claim 5, wherein step S142 includes the steps of:

7. The method of claim 6, wherein the unstructured text potential intrusion calculation in step S1424 is specifically:

wherein WI is the unstructured text potential intrusion coefficient, n is the total amount of text data, i is the text data sequence number, p_i is the number of keywords contained in the i-th unstructured text data, t_i is the radian measure of the time period in which the i-th data is located, q_i is the unstructured semantic text entropy value analyzed for the i-th data, and x is a normalization coefficient.

8. The method of claim 6, wherein the unstructured image potential intrusion calculation in step S1425 is specifically:

where PI is the unstructured image potential intrusion coefficient, v is the pixel sequence number in the image, N is the number of pixels in the image, w_v is the weight of pixel v, γ_z is the environmental impedance, γ_d is the individual dissipated power constant, σ_{d,v} is the standard deviation of the dissipated power of the pixels spatially nearest to pixel v, α is the attenuation parameter, e is the base of the natural logarithm, β is the damping coefficient, and D_v is the distance from pixel v to the target area.

9. The method according to claim 1, wherein the suspicious analysis in step S7 is specifically:

acquiring novel system attack means information;

10. An AI-based mall system intrusion data analysis system, comprising:

at least one processor;

a memory communicatively coupled to the at least one processor;

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the AI-based mall system intrusion data analysis method of any one of claims 1 to 9.