CN112199671A

CN112199671A - Artificial intelligence-based malicious data analysis method and device and electronic device

Info

Publication number: CN112199671A
Application number: CN202011078844.XA
Authority: CN
Inventors: 唐佳莉; 范渊; 吴卓群
Original assignee: Hangzhou Dbappsecurity Technology Co Ltd
Current assignee: Hangzhou Dbappsecurity Technology Co Ltd
Priority date: 2020-10-10
Filing date: 2020-10-10
Publication date: 2021-01-08

Abstract

The application relates to a malicious data analysis method, a malicious data analysis device and an electronic device based on artificial intelligence, wherein the method comprises the following steps: acquiring sample data input by a user and a data analysis requirement, and determining the data type of the sample data; determining a target dynamic and static analysis strategy according to the data type of the sample data, and performing dynamic and static analysis processing on the sample data according to the target dynamic and static analysis strategy to obtain data characteristic information corresponding to the sample data; determining a target machine learning rule according to the data type, the data characteristic information and the data analysis requirement of the sample data, and constructing a data classification model based on machine learning according to the data characteristic information and the target machine learning rule; and acquiring to-be-detected data input by a user, and performing malicious analysis processing on the to-be-detected data according to the data classification model to obtain a malicious analysis result corresponding to the to-be-detected data. By the method and the device, the problem that malicious data cannot be effectively analyzed and detected in the related technology is solved.

Description

Artificial intelligence-based malicious data analysis method and device and electronic device

Technical Field

The present application relates to the field of computer technologies, and in particular, to a malicious data analysis method and apparatus based on artificial intelligence, and an electronic apparatus.

Background

With the arrival of the 5G era and the development of Internet technology, big data processing technology is being applied ever more widely, and the volume of malicious data generated along with it keeps growing. If malicious data is not detected and analyzed in time, it can cause huge economic losses to enterprises.

In the related art, malicious data is detected using fixed feature patterns and detection algorithms. However, this approach cannot adapt to the current sharp increase in the quantity and variety of malicious data, and it falls short in analysis capability, in identification and classification efficiency, and in accuracy.

At present, no effective solution has been proposed for the problem that malicious data cannot be effectively analyzed and detected in the related art.

Disclosure of Invention

The embodiment of the application provides an artificial intelligence malicious data analysis method, an artificial intelligence malicious data analysis device, an artificial intelligence malicious data analysis system, an electronic device and a storage medium, and at least solves the problem that malicious data cannot be effectively analyzed and detected in the related technology.

In a first aspect, an embodiment of the present application provides a malicious data analysis method based on artificial intelligence, including:

acquiring sample data input by a user and a data analysis requirement, and determining the data type of the sample data;

determining a target dynamic and static analysis strategy according to the data type of the sample data, and performing dynamic and static analysis processing on the sample data according to the target dynamic and static analysis strategy to obtain data characteristic information corresponding to the sample data;

determining a target machine learning rule according to the data type of the sample data, the data characteristic information and the data analysis requirement, and constructing a data classification model based on machine learning according to the data characteristic information and the target machine learning rule;

and acquiring to-be-detected data input by the user, and performing malicious analysis processing on the to-be-detected data according to the data classification model to obtain a malicious analysis result corresponding to the to-be-detected data.

In some embodiments, the determining a target dynamic and static analysis policy according to the data type of the sample data, and performing dynamic and static analysis processing on the sample data according to the target dynamic and static analysis policy to obtain data feature information corresponding to the sample data includes:

according to the data type of the sample data, a target static analysis strategy and a target dynamic analysis strategy are determined from a preset dynamic and static analysis strategy set; the preset dynamic and static analysis strategy set comprises a plurality of static analysis strategies and a plurality of dynamic analysis strategies;

according to the target static analysis strategy and the target dynamic analysis strategy, performing dynamic and static analysis processing on the sample data to obtain data characteristic information corresponding to the sample data; the data characteristic information comprises a plurality of data characteristics and characteristic information corresponding to each data characteristic.

In some of these embodiments, the static analysis policies include antivirus software scanning policies, file format identification policies, string extraction analysis policies, binary structure analysis policies, disassembly policies, decompilation policies, code structure and logic analysis policies, packer (shell) identification policies, and code unpacking (shell removal) policies;

the dynamic analysis strategy comprises a snapshot comparison strategy, a system dynamic behavior monitoring strategy, a network protocol stack monitoring strategy, a sandbox strategy and a dynamic debugging strategy.

In some embodiments, the determining a target machine learning rule according to the data type of the sample data, the data feature information, and the data analysis requirement, and the constructing a machine learning-based data classification model according to the data feature information and the target machine learning rule includes:

obtaining target characteristic information according to the data type of the sample data and the data characteristic information; the target characteristic information comprises a plurality of target data characteristics and characteristic information corresponding to each target data characteristic;

according to the target characteristic information and the data analysis requirement, a target machine learning rule is determined from a preset target machine learning rule set; the preset target machine learning rule set comprises a plurality of machine learning rules;

and constructing a data classification model based on machine learning according to the target characteristic information and the target machine learning rule.

In some of these embodiments, the target feature information further includes a standard malicious type; the step of constructing a data classification model based on machine learning according to the target feature information and the target machine learning rule comprises the following steps:

constructing an initial data classification model based on machine learning, with the characteristic information of data as the input parameter and the malicious type of the data as the output parameter;

inputting the target characteristic information into the initial data classification model to obtain a predicted malicious type;

comparing the predicted malicious type with the standard malicious type to obtain a comparison result;

and adjusting model parameters of an initial data classification model according to the comparison result and the target machine learning rule so as to train the initial data classification model to obtain a trained data classification model.

In some embodiments, after the building of the machine learning based data classification model according to the target feature information and the target machine learning rule, the method further comprises:

determining a target model optimization strategy from preset model optimization strategies according to the characteristics of the data classification model; the preset model optimization strategy comprises a plurality of model optimization strategies;

and adjusting the model parameters of the data classification model according to the target model optimization strategy so as to optimize the data classification model to obtain the optimized data classification model.

In some embodiments, before the analyzing the sample data according to the target dynamic and static analysis policy, the method further includes:

according to the data type of the sample data, a target data cleaning strategy is determined from a preset data cleaning strategy set, and the sample data is cleaned according to the target data cleaning strategy; the preset data cleaning strategy set comprises a plurality of data cleaning strategies.

In a second aspect, an embodiment of the present application provides an apparatus for analyzing malicious data based on artificial intelligence, including:

the data acquisition module is used for acquiring sample data provided by a user and data analysis requirements and determining the data type of the sample data;

the dynamic and static analysis module is used for determining a target dynamic and static analysis strategy according to the data type of the sample data and performing dynamic and static analysis processing on the sample data according to the target dynamic and static analysis strategy to obtain data characteristic information corresponding to the sample data;

the model building module is used for determining a target machine learning rule according to the data type of the sample data, the data characteristic information and the data analysis requirement, and building a data classification model based on machine learning according to the data characteristic information and the target machine learning rule;

and the malicious analysis module is used for acquiring the data to be detected input by the user and carrying out malicious analysis processing on the data to be detected according to the data classification model to obtain a malicious analysis result corresponding to the data to be detected.

In a third aspect, an embodiment of the present application provides an electronic apparatus, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the processor implements the artificial intelligence based malicious data analysis method according to the first aspect.

In a fourth aspect, the present application provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the artificial intelligence based malicious data analysis method according to the first aspect.

Compared with the related art, the malicious data analysis method, the malicious data analysis device and the electronic device based on artificial intelligence provided by the embodiment of the application determine the data type of sample data by acquiring the sample data input by a user and the data analysis requirement; determining a target dynamic and static analysis strategy according to the data type of the sample data, and performing dynamic and static analysis processing on the sample data according to the target dynamic and static analysis strategy to obtain data characteristic information corresponding to the sample data; determining a target machine learning rule according to the data type, the data characteristic information and the data analysis requirement of the sample data, and constructing a data classification model based on machine learning according to the data characteristic information and the target machine learning rule; the data to be detected input by a user are obtained, malicious analysis processing is carried out on the data to be detected according to the data classification model, a malicious analysis result corresponding to the data to be detected is obtained, and the problem that the malicious data cannot be effectively analyzed and detected in the related technology is solved.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a flowchart of a malicious data analysis method based on artificial intelligence according to an embodiment of the present application;

FIG. 2 is a flowchart illustrating dynamic and static analysis processing of sample data according to an embodiment of the present disclosure;

FIG. 3 is a flow chart of determining target machine learning rules and constructing a machine learning based data classification model according to an embodiment of the present application;

FIG. 4 is a flowchart illustrating the construction of a data classification model based on machine learning according to target feature information and target machine learning rules in an embodiment of the present application;

FIG. 5 is a flowchart illustrating an optimization process performed on a data classification model according to an embodiment of the present disclosure;

FIG. 6 is a flowchart of a malicious data analysis method based on artificial intelligence according to an embodiment of the present application;

FIG. 7 is a block diagram of a hardware structure of a terminal of a malicious data analysis method based on artificial intelligence according to an embodiment of the present application;

FIG. 8 is a block diagram of a malicious data analysis system based on artificial intelligence according to an embodiment of the present disclosure;

FIG. 9 is a block diagram of a malicious data analysis apparatus based on artificial intelligence according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.

Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.

Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words used throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof, as used in this application, are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The words "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" herein means greater than or equal to two. "And/or" describes an association relationship of associated objects, meaning that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.

The various techniques described herein may be applied to, but are not limited to, a variety of data detection and analysis devices, platforms, and systems.

Fig. 1 is a flowchart of a malicious data analysis method based on artificial intelligence according to an embodiment of the present application, and as shown in fig. 1, the flowchart includes the following steps:

Step S110, acquiring sample data input by a user and a data analysis requirement, and determining the data type of the sample data.

It should be noted that there are multiple groups of sample data; each group of sample data carries a malicious type label, which identifies whether the sample data is malicious data and, for malicious data, the corresponding malicious type.

Step S120, determining a target dynamic and static analysis strategy according to the data type of the sample data, and performing dynamic and static analysis processing on the sample data according to the target dynamic and static analysis strategy to obtain data characteristic information corresponding to the sample data.

Step S130, determining a target machine learning rule according to the data type, the data characteristic information and the data analysis requirement of the sample data, and constructing a data classification model based on machine learning according to the data characteristic information and the target machine learning rule.

Step S140, acquiring data to be detected input by a user, and performing malicious analysis processing on the data to be detected according to the data classification model to obtain a malicious analysis result corresponding to the data to be detected.

It should be noted that the data input by the user includes sample data of known malicious types and a large amount of to-be-detected data. First, an appropriate dynamic and static analysis strategy is automatically selected according to the data type of the sample data, and multi-dimensional analysis is performed on the sample data according to the selected strategy, so as to extract data characteristic information related to malicious data behaviors from the sample data. Second, an appropriate machine learning rule is automatically selected according to the data type of the sample data, the extracted data characteristic information, and the data analysis requirement input by the user, and a data classification model with a detection function is constructed according to the data characteristic information and the selected machine learning rule. Finally, the constructed data classification model is used to perform malicious classification on the large amount of to-be-detected data, that is, to judge whether the to-be-detected data is malicious data and to determine the corresponding malicious type. The malicious analysis result comprises non-malicious data and data corresponding to multiple malicious types.
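The flow of steps S110 to S140 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `Sample` type, the magic-byte type detection, the two toy features, and the use of a 1-nearest-neighbour classifier as the learning rule are all hypothetical stand-ins chosen so that the example is self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types; the source does not define concrete data structures.
Features = Dict[str, float]


@dataclass
class Sample:
    payload: bytes
    label: str  # malicious type label, e.g. "benign" or "trojan" (illustrative)


def detect_data_type(payload: bytes) -> str:
    """S110: toy data-type detection from leading bytes (an assumption)."""
    if payload.startswith(b"MZ"):
        return "executable"
    if payload.startswith(b"#!"):
        return "script"
    return "unknown"


def analyze(payload: bytes, data_type: str) -> Features:
    """S120: stand-in for dynamic and static analysis.

    A real system would dispatch on data_type; this sketch computes the
    same two static features for every type.
    """
    size = float(len(payload))
    printable = sum(32 <= b < 127 for b in payload)
    return {"size": size, "printable_ratio": printable / size if size else 0.0}


def build_model(train: List[Tuple[Features, str]]) -> Callable[[Features], str]:
    """S130: stand-in learning rule, a 1-nearest-neighbour classifier."""
    def predict(x: Features) -> str:
        def dist(f: Features) -> float:
            return sum((f[k] - x[k]) ** 2 for k in x)
        return min(train, key=lambda pair: dist(pair[0]))[1]
    return predict


def run_pipeline(samples: List[Sample], to_detect: List[bytes]) -> List[str]:
    train = [(analyze(s.payload, detect_data_type(s.payload)), s.label)
             for s in samples]
    model = build_model(train)
    # S140: classify each piece of to-be-detected data.
    return [model(analyze(p, detect_data_type(p))) for p in to_detect]
```

Calling `run_pipeline` with labeled samples and a list of unlabeled payloads returns one predicted malicious type per payload.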

Through steps S110 to S140, an appropriate dynamic and static analysis strategy and an appropriate machine learning rule are automatically selected according to the data type of the sample data and the data analysis requirement. Multi-dimensional analysis processing is performed on the sample data according to the automatically selected dynamic and static analysis strategy, so that data characteristic information related to malicious data behaviors is extracted from the sample data. A data classification model is then constructed according to the data characteristic information and the automatically selected machine learning rule, and this model is used to perform malicious analysis on the to-be-detected data input by the user, yielding the corresponding malicious analysis result. Because a data classification model with a detection function is formed automatically for different types of malicious data, the method adapts to the current sharp increase in the quantity and variety of malicious data, addresses the low analysis capability, identification and classification efficiency, and accuracy of the prior art, improves the capability and precision of malicious data analysis and detection, and solves the problem that malicious data cannot be effectively analyzed and detected in the related art.

In some embodiments, before step S120, the artificial intelligence based malicious data analysis method further includes: according to the data type of the sample data, a target data cleaning strategy is determined from a preset data cleaning strategy set, and the sample data is cleaned according to the target data cleaning strategy; the preset data cleaning strategy set comprises a plurality of data cleaning strategies.

Data cleaning refers to the final procedure for finding and correcting identifiable errors in data files, including checking data consistency and handling invalid and missing values.

According to the embodiment, the appropriate data cleaning strategy is automatically selected according to the data type of the sample data, and the sample data is cleaned according to the automatically selected data cleaning strategy, so that recognizable errors in the sample data can be timely found and corrected, the consistency of the sample data is ensured, the accuracy of subsequent dynamic and static analysis can be improved, and the detection precision of malicious data is further improved.

In some of these embodiments, the data cleansing policies include, but are not limited to, data pre-processing policies, default value handling policies, exception data handling policies, and redundancy handling policies.
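Selecting a target data cleaning strategy from a preset set according to the data type can be sketched as a lookup followed by sequential application. The data type names, the record layout, and the two concrete strategies below (dropping records with missing values and dropping duplicate records) are assumptions made for illustration; the source names the strategy families but not their behavior.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]
CleaningStrategy = Callable[[List[Record]], List[Record]]


def drop_missing(records: List[Record]) -> List[Record]:
    """Illustrative missing-value handling: drop records that contain None."""
    return [r for r in records if all(v is not None for v in r.values())]


def drop_duplicates(records: List[Record]) -> List[Record]:
    """Illustrative redundancy handling: keep the first copy of each record."""
    seen = set()
    kept = []
    for r in records:
        key = tuple(sorted(r.items()))  # dict keys are unique, so this sorts by key
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


# Hypothetical preset data cleaning strategy set, keyed by data type.
PRESET_CLEANING: Dict[str, List[CleaningStrategy]] = {
    "traffic_log": [drop_missing, drop_duplicates],
    "binary_features": [drop_duplicates],
}


def clean(records: List[Record], data_type: str) -> List[Record]:
    """Apply every strategy selected for this data type, in order."""
    for strategy in PRESET_CLEANING.get(data_type, []):
        records = strategy(records)
    return records
```

An unknown data type selects no strategy in this sketch, so the records are returned unchanged.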

In some embodiments, fig. 2 is a flowchart of performing dynamic and static analysis processing on sample data in the embodiment of the present application, and as shown in fig. 2, the flowchart includes the following steps:

Step S210, determining a target static analysis strategy and a target dynamic analysis strategy from a preset dynamic and static analysis strategy set according to the data type of the sample data; the preset dynamic and static analysis strategy set comprises a plurality of static analysis strategies and a plurality of dynamic analysis strategies.

Step S220, performing dynamic and static analysis processing on the sample data according to the target static analysis strategy and the target dynamic analysis strategy to obtain data characteristic information corresponding to the sample data; the data characteristic information comprises a plurality of data characteristics and characteristic information corresponding to each data characteristic.

It should be noted that, a dynamic and static analysis strategy is adopted to perform multidimensional analysis on the malicious data behaviors, so as to extract data features related to the malicious data behaviors and malicious type tags corresponding to the data features from sample data.

Through the steps S210 to S220, a target static analysis strategy and a target dynamic analysis strategy are determined from a preset dynamic and static analysis strategy set according to the data type of the sample data; and performing dynamic and static analysis processing on the sample data according to the target static analysis strategy and the target dynamic analysis strategy to obtain data characteristic information corresponding to the sample data. According to the embodiment, different static analysis strategies and dynamic analysis strategies are adopted for sample data of different data types, and multi-dimensional malicious data behavior analysis is performed on the sample data according to the selected static analysis strategies and dynamic analysis strategies, so that more accurate data characteristic information can be obtained, and the detection precision of malicious data is further improved.

In some of these embodiments, the static analysis policies include antivirus software scanning policies, file format identification policies, string extraction analysis policies, binary structure analysis policies, disassembly policies, decompilation policies, code structure and logic analysis policies, packer (shell) identification policies, and code unpacking (shell removal) policies. The dynamic analysis strategy comprises a snapshot comparison strategy, a system dynamic behavior monitoring strategy, a network protocol stack monitoring strategy, a sandbox strategy and a dynamic debugging strategy.
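Steps S210 and S220 can be illustrated with a small strategy registry. The sketch below assumes hypothetical data type names and implements only two static strategies (file format identification by leading bytes and string extraction by regular expression); the dynamic strategy is a placeholder that returns a fixed value, since a real sandbox would have to execute the sample in an isolated environment.

```python
import re
from typing import Callable, Dict, List

Features = Dict[str, float]
Strategy = Callable[[bytes], Features]


def file_format_identification(payload: bytes) -> Features:
    """Static: identify the format from well-known leading bytes."""
    return {"is_pe": float(payload[:2] == b"MZ"),
            "is_elf": float(payload[:4] == b"\x7fELF")}


def string_extraction(payload: bytes) -> Features:
    """Static: collect printable ASCII runs of at least 4 bytes."""
    runs = re.findall(rb"[\x20-\x7e]{4,}", payload)
    has_url = any(b"http://" in r or b"https://" in r for r in runs)
    return {"string_count": float(len(runs)), "has_url": float(has_url)}


def sandbox_placeholder(payload: bytes) -> Features:
    """Dynamic: placeholder only; nothing is executed in this sketch."""
    return {"sandbox_network_events": 0.0}


# Hypothetical preset dynamic and static analysis strategy set.
PRESET_STRATEGIES: Dict[str, Dict[str, List[Strategy]]] = {
    "executable": {"static": [file_format_identification, string_extraction],
                   "dynamic": [sandbox_placeholder]},
    "document": {"static": [string_extraction], "dynamic": []},
}


def extract_features(payload: bytes, data_type: str) -> Features:
    """S210: pick strategies by data type; S220: run them and merge results."""
    chosen = PRESET_STRATEGIES[data_type]
    features: Features = {}
    for strategy in chosen["static"] + chosen["dynamic"]:
        features.update(strategy(payload))
    return features
```

Each strategy contributes its own named features, so the merged dictionary plays the role of the data characteristic information described above.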

In some embodiments, fig. 3 is a flowchart of determining a target machine learning rule and constructing a data classification model based on machine learning in the embodiments of the present application, and as shown in fig. 3, the flowchart includes the following steps:

Step S310, obtaining target characteristic information according to the data type and the data characteristic information of the sample data; the target characteristic information includes a plurality of target data characteristics and characteristic information corresponding to each target data characteristic.

The data characteristic information comprises a plurality of data characteristics and characteristic information corresponding to each data characteristic.

Specifically, target data features are screened from multiple data features according to the data type of the sample data to obtain multiple target data features and feature information corresponding to each target data feature, namely target feature information, wherein the target data features correspond to the data type of the sample data.

Step S320, determining a target machine learning rule from a preset target machine learning rule set according to the target characteristic information and the data analysis requirement; the preset target machine learning rule set includes a plurality of machine learning rules.

Step S330, constructing a data classification model based on machine learning according to the target characteristic information and the target machine learning rule.

Through the steps from S310 to S330, target characteristic information is obtained according to the data type and the data characteristic information of the sample data; determining a target machine learning rule from a preset target machine learning rule set according to the target characteristic information and the data analysis requirement; and constructing a data classification model based on machine learning according to the target characteristic information and the target machine learning rule. In the embodiment, different target characteristic information is automatically selected according to sample data of different data types; according to the target characteristic information and the data analysis requirement, a proper machine learning rule is automatically selected, so that a data classification model is constructed according to the automatically selected target characteristic information and the automatically selected machine learning rule, the data classification model is suitable for sample data of different data types and the data analysis requirements of different users, large-scale application and popularization are facilitated, and the detection precision of malicious data is further improved.

In some of these embodiments, the preset target machine learning rule set includes a plurality of machine learning rules set according to a machine learning algorithm; the machine learning algorithm includes, but is not limited to, a linear classification algorithm, a support vector machine algorithm, a naive Bayes algorithm, a K-nearest neighbor algorithm, a decision tree algorithm, an ensemble model algorithm, a linear regression algorithm, a data clustering algorithm, a data dimension reduction algorithm and a deep learning algorithm.
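The feature screening of step S310 and the rule selection of step S320 might look like the following sketch. Everything concrete here is an assumption: the `LearningRule` fields, the requirement keys `multiclass` and `interpretable`, the feature names, and the first-match selection policy are hypothetical, and the flags describe how each illustrative rule is configured rather than properties of the algorithm families themselves. For simplicity the target feature information is only checked for being non-empty.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class LearningRule:
    name: str
    algorithm: str
    multiclass: bool     # illustrative: rule is configured for several malicious types
    interpretable: bool  # illustrative: rule is expected to yield readable decisions


# Hypothetical preset rule set; the source lists only algorithm families.
PRESET_RULES: List[LearningRule] = [
    LearningRule("svm_binary", "support vector machine", False, False),
    LearningRule("tree_multiclass", "decision tree", True, True),
    LearningRule("knn_multiclass", "K-nearest neighbor", True, False),
]

# Hypothetical mapping from data type to the features kept in step S310.
RELEVANT_FEATURES: Dict[str, Set[str]] = {
    "executable": {"is_pe", "string_count"},
    "document": {"string_count", "has_url"},
}


def screen_features(features: Dict[str, float], data_type: str) -> Dict[str, float]:
    """S310: keep only the features that correspond to the data type."""
    wanted = RELEVANT_FEATURES.get(data_type, set(features))
    return {k: v for k, v in features.items() if k in wanted}


def select_rule(target_features: Dict[str, float],
                requirement: Dict[str, bool]) -> LearningRule:
    """S320: return the first preset rule that satisfies the requirement."""
    if not target_features:
        raise ValueError("no target features to learn from")
    for rule in PRESET_RULES:
        if requirement.get("multiclass", False) and not rule.multiclass:
            continue
        if requirement.get("interpretable", False) and not rule.interpretable:
            continue
        return rule
    raise ValueError("no preset rule satisfies the requirement")
```

A production system would likely score candidate rules against the target features rather than take the first match; the first-match policy is used here only to keep the selection logic visible.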

In some of these embodiments, the target feature information further includes a standard malicious type; fig. 4 is a flowchart of constructing a data classification model based on machine learning according to target feature information and target machine learning rules in the embodiment of the present application, and as shown in fig. 4, the flowchart includes the following steps:

Step S410, constructing an initial data classification model based on machine learning by taking the characteristic information of the data as an input parameter and taking the malicious type of the data as an output parameter.

Step S420, inputting the target characteristic information into the initial data classification model to obtain the predicted malicious type.

Step S430, comparing the predicted malicious type with the standard malicious type to obtain a comparison result.

Step S440, adjusting model parameters of the initial data classification model according to the comparison result and the target machine learning rule so as to train the initial data classification model and obtain the trained data classification model.

Through the steps S410 to S440, an initial data classification model based on machine learning is constructed with the feature information of the data as an input parameter and the malicious type of the data as an output parameter; the target characteristic information is input into the initial data classification model to obtain a predicted malicious type, the predicted malicious type is compared with a standard malicious type, model parameters of the initial data classification model are adjusted according to a comparison result, the initial data classification model is trained, and therefore the data to be detected input by a user is detected according to the trained data classification model, and the detection precision of malicious data can be further improved.
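The predict, compare, and adjust loop of steps S410 to S440 can be illustrated with a multi-class perceptron. The perceptron update is an assumption standing in for the target machine learning rule, which the source leaves open; the function name, the feature vectors, and the label values are likewise hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def train_perceptron(data: List[Tuple[List[float], str]],
                     epochs: int = 20,
                     lr: float = 1.0) -> Callable[[List[float]], str]:
    labels = sorted({y for _, y in data})
    n = len(data[0][0])
    # S410: initial model, one zero weight vector (plus bias) per malicious type.
    weights: Dict[str, List[float]] = {y: [0.0] * (n + 1) for y in labels}

    def score(w: List[float], x: List[float]) -> float:
        # The last entry of w is the bias; zip stops after the n feature weights.
        return w[-1] + sum(wi * xi for wi, xi in zip(w, x))

    def predict(x: List[float]) -> str:
        return max(labels, key=lambda y: score(weights[y], x))

    for _ in range(epochs):
        mistakes = 0
        for x, standard in data:
            predicted = predict(x)             # S420: predicted malicious type
            if predicted != standard:          # S430: compare with standard type
                mistakes += 1
                for i, xi in enumerate(x):     # S440: adjust model parameters
                    weights[standard][i] += lr * xi
                    weights[predicted][i] -= lr * xi
                weights[standard][-1] += lr
                weights[predicted][-1] -= lr
        if mistakes == 0:                      # every sample classified correctly
            break
    return predict
```

The returned `predict` function is the trained model in this sketch; training stops early once a full pass over the samples produces no mismatch between predicted and standard types.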

In some embodiments, fig. 5 is a flowchart of an optimization process performed on a data classification model in the embodiments of the present application, and as shown in fig. 5, the process includes the following steps:

Step S510, determining a target model optimization strategy from preset model optimization strategies according to the characteristics of the data classification model; the preset model optimization strategies comprise a plurality of model optimization strategies.

Step S520, adjusting the model parameters of the data classification model according to the target model optimization strategy so as to optimize the data classification model and obtain the optimized data classification model.

Through steps S510 to S520, a suitable target model optimization strategy is automatically selected according to the characteristics of the data classification model, and the model parameters are adjusted according to that strategy, so that the data classification model is optimized and its reliability can be improved. Detecting the data to be detected input by the user with the optimized data classification model can therefore further improve the reliability of the malicious analysis result.

In some of these embodiments, the model optimization strategies include, but are not limited to, gradient descent, stochastic gradient descent, mini-batch gradient descent, momentum, accelerated gradient, and adaptive moment estimation (Adam) optimization strategies.
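To make the difference between some of these strategies concrete, the following sketch applies plain gradient descent, momentum and adaptive moment estimation (commonly known as Adam) to a one-dimensional toy loss. The loss function, the hyperparameter values and the `PRESET_STRATEGIES` table are assumptions chosen only for illustration and are not taken from the present application.

```python
import math

def grad(x):
    """Gradient of the toy loss f(x) = (x - 3)^2, which is minimised at x = 3."""
    return 2.0 * (x - 3.0)

def gradient_descent(x, lr=0.1, steps=500):
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def momentum(x, lr=0.1, beta=0.9, steps=500):
    velocity = 0.0
    for _ in range(steps):
        velocity = beta * velocity + grad(x)  # accumulate past gradients
        x -= lr * velocity
    return x

def adam(x, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g       # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)          # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Hypothetical preset strategy table from which a target strategy could be selected.
PRESET_STRATEGIES = {
    "gradient_descent": gradient_descent,
    "momentum": momentum,
    "adam": adam,
}

results = {name: optimise(0.0) for name, optimise in PRESET_STRATEGIES.items()}
```

All three updates move the parameter toward the minimiser of the toy loss; which strategy suits a given data classification model depends on the characteristics of that model, as described for step S510.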

The embodiments of the present application are described and illustrated below by way of specific examples.

Fig. 6 is a flowchart of an artificial intelligence based malicious data analysis method according to an embodiment of the present application, and as shown in fig. 6, the artificial intelligence based malicious data analysis method includes the following steps:

Step S610, acquiring sample data and a data analysis requirement input by a user, and determining the data type of the sample data.

Step S620, determining a target data cleaning strategy according to the data type of the sample data, and cleaning the sample data according to the target data cleaning strategy to obtain the cleaned sample data.

Step S630, determining a target dynamic and static analysis strategy according to the data type of the sample data, and performing dynamic and static analysis processing on the cleaned sample data according to the target dynamic and static analysis strategy to obtain data feature information corresponding to the sample data.

Step S640, determining a target machine learning rule according to the data type of the sample data, the data feature information and the data analysis requirement, and constructing a machine learning-based data classification model according to the data feature information and the target machine learning rule.

Step S650, determining a target model optimization strategy according to the characteristics of the data classification model, and optimizing the data classification model according to the target model optimization strategy to obtain the optimized data classification model.

Step S660, acquiring the data to be detected input by the user, and performing malicious analysis processing on the data to be detected according to the optimized data classification model to obtain a malicious analysis result corresponding to the data to be detected.
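The flow of steps S610 to S660 can be sketched end to end as follows. This is a deliberately simplified illustration: the type check relies on the well-known `MZ` and `\x7fELF` file signatures, cleaning only drops empty and duplicate samples, the features are two static byte statistics, the classifier is a nearest-centroid model, and the optimization of step S650 is omitted. All function names and the toy samples are assumptions, not part of the present application.

```python
def determine_data_type(data: bytes) -> str:
    """S610: guess the data type from well-known leading magic bytes."""
    if data[:2] == b"MZ":
        return "pe"
    if data[:4] == b"\x7fELF":
        return "elf"
    return "other"

def clean(samples):
    """S620: drop empty samples and exact duplicates, keeping the original order."""
    seen, kept = set(), []
    for data, label in samples:
        if data and data not in seen:
            seen.add(data)
            kept.append((data, label))
    return kept

def extract_features(data: bytes):
    """S630 (static part only): share of non-printable bytes and share of zero bytes."""
    non_printable = sum(1 for b in data if b < 32 or b > 126)
    zeros = data.count(0)
    return [non_printable / len(data), zeros / len(data)]

def build_model(rows):
    """S640: nearest-centroid classifier with one centroid per malicious type."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def analyze(model, data: bytes) -> str:
    """S660: return the label whose centroid is closest to the sample's features."""
    features = extract_features(data)
    return min(model, key=lambda label: sum((a - b) ** 2
                                            for a, b in zip(features, model[label])))

# Assumed toy sample set, including one duplicate and one empty sample.
samples = [
    (b"MZ\x00\x01\x02\x90\xff\xfe", "malicious"),
    (b"MZ\x00\x01\x02\x90\xff\xfe", "malicious"),
    (b"", "benign"),
    (b"hello world, plain text", "benign"),
    (b"\x7fELF\x00\x00\xcc\xcc\x90", "malicious"),
    (b"another readable line", "benign"),
]
cleaned = clean(samples)
data_types = [determine_data_type(data) for data, _ in cleaned]
model = build_model([(extract_features(data), label) for data, label in cleaned])
result = analyze(model, b"\x00\x90\x90\xcc\xfe\xffAB")
```

In an actual embodiment the data type determined in step S610 would drive the choice of cleaning, analysis and learning strategies; here it is computed only to show where that decision would be made.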

It should be noted that the steps illustrated in the above-described flows or in the flowcharts of the figures may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from the one shown here.

The method provided by this embodiment can be executed in a terminal, a computer or a similar computing device. Taking execution on a terminal as an example, Fig. 7 is a block diagram of the hardware structure of a terminal for the artificial intelligence-based malicious data analysis method according to an embodiment of the present application. As shown in Fig. 7, the terminal 70 may include one or more processors 702 (only one is shown in Fig. 7; the processor 702 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 704 for storing data, and optionally a transmission device 706 for communication functions and an input-output device 708. It will be understood by those skilled in the art that the structure shown in Fig. 7 is only an illustration and is not intended to limit the structure of the terminal. For example, the terminal 70 may include more or fewer components than shown in Fig. 7, or have a configuration different from that shown in Fig. 7.

The memory 704 may be used to store computer programs, for example, software programs and modules of application software, such as a computer program corresponding to the artificial intelligence-based malicious data analysis method in the embodiment of the present application, and the processor 702 executes the computer programs stored in the memory 704 to perform various functional applications and data processing, so as to implement the above-mentioned method. The memory 704 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 704 may further include memory located remotely from the processor 702, which may be connected to the terminal 70 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 706 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the terminal 70. In one example, the transmission device 706 includes a Network Interface Controller (NIC) that can be connected to other network devices via a base station so as to communicate with the internet. In one example, the transmission device 706 can be a Radio Frequency (RF) module configured to communicate with the internet wirelessly.

The present embodiment further provides an artificial intelligence-based malicious data analysis apparatus, which is used to implement the foregoing embodiments and preferred embodiments; what has already been described is not repeated here. As used hereinafter, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

Fig. 8 is a block diagram of a malicious data analysis system based on artificial intelligence according to an embodiment of the present application, and as shown in fig. 8, the malicious data analysis system 100 based on artificial intelligence includes a data cleaning subsystem 10, a dynamic and static analysis subsystem 20, a data feature extraction subsystem 30, a model building and optimizing subsystem 40, and an automated decision making subsystem 50, where:

The automated decision making subsystem 50 is connected to each of the data cleaning subsystem 10, the dynamic and static analysis subsystem 20, the data feature extraction subsystem 30 and the model building and optimizing subsystem 40, and is configured to control these subsystems to perform the steps in any one of the above method embodiments.

The data cleaning subsystem 10 is used for cleaning the sample data input by the user to obtain the cleaned data.

The dynamic and static analysis subsystem 20 is connected to the data cleaning subsystem 10, and is configured to perform dynamic and static analysis processing on the cleaned data to obtain data characteristic information corresponding to the sample data.

The data feature extraction subsystem 30 is connected to the dynamic and static analysis subsystem 20, and is configured to extract target feature information from the data feature information.

The model building and optimizing subsystem 40 is connected to the data feature extraction subsystem 30, and is configured to build a machine learning-based data classification model and perform optimization processing on the data classification model.

It should be noted that the automated decision making subsystem 50 can schedule and make decisions for all subsystems that need automated decision making in the artificial intelligence based malicious data analysis system 100, so that data analysis can be smoothly performed without human intervention.

Fig. 9 is a block diagram illustrating a configuration of an artificial intelligence-based malicious data analysis apparatus according to an embodiment of the present application, and as shown in fig. 9, the artificial intelligence-based malicious data analysis apparatus 900 includes:

The data obtaining module 910 is configured to obtain sample data and a data analysis requirement provided by a user, and determine the data type of the sample data.

The dynamic and static analysis module 920 is configured to determine a target dynamic and static analysis policy according to the data type of the sample data, and perform dynamic and static analysis processing on the sample data according to the target dynamic and static analysis policy to obtain data feature information corresponding to the sample data.

The model building module 930 is configured to determine a target machine learning rule according to the data type of the sample data, the data feature information and the data analysis requirement, and build a machine learning-based data classification model according to the data feature information and the target machine learning rule.

The malicious analysis module 940 is configured to acquire data to be detected input by a user, and perform malicious analysis processing on the data to be detected according to the data classification model to obtain a malicious analysis result corresponding to the data to be detected.

In some embodiments, the dynamic and static analysis module 920 includes a policy determination unit and a dynamic and static analysis unit, wherein:

The policy determination unit is used for determining a target static analysis strategy and a target dynamic analysis strategy from a preset dynamic and static analysis strategy set according to the data type of the sample data; the preset dynamic and static analysis strategy set comprises a plurality of static analysis strategies and a plurality of dynamic analysis strategies.

The dynamic and static analysis unit is used for performing dynamic and static analysis processing on the sample data according to the target static analysis strategy and the target dynamic analysis strategy to obtain data characteristic information corresponding to the sample data; the data characteristic information comprises a plurality of data characteristics and characteristic information corresponding to each data characteristic.
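One possible way to picture the two units is a lookup table keyed by data type, combined with a registry of analysis functions. In the sketch below, only two static strategies are implemented (file format identification by magic bytes and printable string extraction); the dynamic strategy registry is left empty because dynamic analysis would require an execution environment. The table contents, data type names and function names are hypothetical.

```python
import re

def identify_format(data: bytes) -> str:
    """Static strategy: file format identification from leading magic bytes."""
    if data[:2] == b"MZ":
        return "pe"
    if data[:4] == b"\x7fELF":
        return "elf"
    return "unknown"

def extract_strings(data: bytes, min_len: int = 4):
    """Static strategy: runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode("ascii") + rb",}")
    return [match.decode("ascii") for match in pattern.findall(data)]

# Hypothetical preset strategy set; dynamic strategies are omitted in this sketch.
PRESET_STRATEGIES = {
    "static": {
        "file_format_identification": identify_format,
        "string_extraction": extract_strings,
    },
    "dynamic": {},
}

# Hypothetical mapping from data type to the strategies selected for it.
TARGETS_BY_TYPE = {
    "executable": {"static": ["file_format_identification", "string_extraction"],
                   "dynamic": []},
    "text": {"static": ["string_extraction"], "dynamic": []},
}

def determine_strategies(data_type: str):
    """Policy determination unit: pick target strategies for the data type."""
    return TARGETS_BY_TYPE.get(data_type, {"static": ["string_extraction"], "dynamic": []})

def analyze(data: bytes, data_type: str):
    """Dynamic and static analysis unit: run each target strategy, collect features."""
    targets = determine_strategies(data_type)
    feature_info = {}
    for kind in ("static", "dynamic"):
        for name in targets[kind]:
            feature_info[name] = PRESET_STRATEGIES[kind][name](data)
    return feature_info

sample = b"MZ\x00\x01http://example.test/payload\x00\xffcmd.exe\x00ab"
feature_info = analyze(sample, "executable")
```

The returned dictionary corresponds to the data feature information: each key is a data feature, and each value is the feature information obtained for it.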

In some embodiments, the static analysis policy includes an antivirus software scanning policy, a file format identification policy, a string extraction analysis policy, a binary structure analysis policy, a disassembly policy, a decompilation policy, a code structure policy and logic analysis policy, a packer identification policy, and a code unpacking policy;

In some of these embodiments, the model building module 930 comprises a feature determination unit, a rule determination unit, and a model building unit, wherein:

The feature determination unit is used for obtaining target feature information according to the data type of the sample data and the data feature information; the target feature information includes a plurality of target data features and feature information corresponding to each target data feature.

The rule determination unit is used for determining a target machine learning rule from a preset machine learning rule set according to the target feature information and the data analysis requirement; the preset machine learning rule set includes a plurality of machine learning rules.

The model building unit is used for building a machine learning-based data classification model according to the target feature information and the target machine learning rule.
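The selection performed by the rule determination unit can be sketched as a filter over a preset rule set. The rule names, the two selection criteria (whether labelled samples are available and whether the analysis requirement asks for an interpretable model) and the `determine_rule` function are assumptions made for this illustration; the present application does not specify them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    name: str
    needs_labels: bool    # requires a standard malicious type for each sample
    interpretable: bool   # produces a model a human analyst can inspect

# Hypothetical preset machine learning rule set.
PRESET_RULES = [
    Rule("decision_tree", needs_labels=True, interpretable=True),
    Rule("neural_network", needs_labels=True, interpretable=False),
    Rule("clustering", needs_labels=False, interpretable=False),
]

def determine_rule(has_labels: bool, requirement: dict) -> Rule:
    """Return the first preset rule compatible with the data and the requirement."""
    for rule in PRESET_RULES:
        if rule.needs_labels != has_labels:
            continue
        if requirement.get("interpretable") and not rule.interpretable:
            continue
        return rule
    raise ValueError("no preset rule satisfies the data analysis requirement")

supervised = determine_rule(has_labels=True, requirement={"interpretable": True})
unsupervised = determine_rule(has_labels=False, requirement={})
```

When no rule in the preset set satisfies both criteria, this sketch raises an error; an actual embodiment might instead fall back to a default rule.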

In some of these embodiments, the target feature information further includes a standard malicious type; the model building unit comprises a model construction subunit, a feature input subunit, a type comparison subunit and a parameter adjusting subunit, wherein:

The model construction subunit is used for constructing an initial machine learning-based data classification model that takes the feature information of data as its input parameter and the malicious type of the data as its output parameter.

The feature input subunit is used for inputting the target feature information into the initial data classification model to obtain a predicted malicious type.

The type comparison subunit is used for comparing the predicted malicious type with the standard malicious type to obtain a comparison result.

The parameter adjusting subunit is used for adjusting the model parameters of the initial data classification model according to the comparison result and the target machine learning rule, so as to train the initial data classification model and obtain the trained data classification model.

In some embodiments, the artificial intelligence based malicious data analysis apparatus 900 further comprises a model optimization module, the model optimization module comprising a policy determination unit and a model optimization unit, wherein:

The policy determination unit is used for determining a target model optimization strategy from preset model optimization strategies according to the characteristics of the data classification model; the preset model optimization strategies comprise a plurality of model optimization strategies.

The model optimization unit is used for adjusting the model parameters of the data classification model according to the target model optimization strategy so as to optimize the data classification model and obtain the optimized data classification model.

In some embodiments, the artificial intelligence based malicious data analysis apparatus 900 further includes a data cleaning module, where the data cleaning module is configured to determine a target data cleaning policy from a preset data cleaning policy set according to a data type of sample data, and perform cleaning processing on the sample data according to the target data cleaning policy; the preset data cleaning strategy set comprises a plurality of data cleaning strategies.

The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.

The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.

Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.

Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:

S1, acquiring sample data and a data analysis requirement input by a user, and determining the data type of the sample data.

S2, determining a target dynamic and static analysis strategy according to the data type of the sample data, and performing dynamic and static analysis processing on the sample data according to the target dynamic and static analysis strategy to obtain data feature information corresponding to the sample data.

S3, determining a target machine learning rule according to the data type of the sample data, the data feature information and the data analysis requirement, and constructing a machine learning-based data classification model according to the data feature information and the target machine learning rule.

S4, acquiring the data to be detected input by the user, and performing malicious analysis processing on the data to be detected according to the data classification model to obtain a malicious analysis result corresponding to the data to be detected.

It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.

In addition, in combination with the artificial intelligence-based malicious data analysis method in the foregoing embodiments, the embodiments of the present application may provide a storage medium for implementing the method. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements any of the artificial intelligence-based malicious data analysis methods of the above embodiments.

It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.

The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A malicious data analysis method based on artificial intelligence is characterized by comprising the following steps:

2. The method of claim 1, wherein the determining a target dynamic and static analysis policy according to the data type of the sample data, and performing dynamic and static analysis processing on the sample data according to the target dynamic and static analysis policy to obtain data characteristic information corresponding to the sample data comprises:

3. The method of claim 2, wherein the static analysis policies include antivirus software scanning policies, file format identification policies, string extraction analysis policies, binary structure analysis policies, disassembly policies, decompilation policies, code structure policies and logic analysis policies, packer identification policies, and code unpacking policies;

4. The method of claim 2, wherein the determining a target machine learning rule according to the data type of the sample data, the data feature information, and the data analysis requirement, and the constructing a machine learning based data classification model according to the data feature information and the target machine learning rule comprises:

5. The method of claim 4, wherein the target feature information further comprises a standard malicious type; the step of constructing a data classification model based on machine learning according to the target feature information and the target machine learning rule comprises the following steps:

6. The method of claim 5, wherein after the building of the machine learning based data classification model from the target feature information and the target machine learning rules, the method further comprises:

7. The method of claim 1, wherein prior to said analyzing said sample data according to said target dynamic-static analysis policy, said method further comprises:

8. An artificial intelligence-based malicious data analysis device, comprising:

9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and the processor is configured to execute the computer program to perform the artificial intelligence based malicious data analysis method of any of claims 1 to 7.

10. A storage medium having a computer program stored thereon, wherein the computer program is configured to execute the artificial intelligence based malicious data analysis method according to any one of claims 1 to 7 when running.