CN117499060A

CN117499060A - Webpage aggressiveness detection method, device, equipment and storage medium

Info

Publication number: CN117499060A
Application number: CN202210876398.XA
Authority: CN
Inventors: 蒋鹏; 李锭; 郭耀; 陈向群
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-07-25
Filing date: 2022-07-25
Publication date: 2024-02-02

Abstract

The application relates to the field of network security and provides a method, an apparatus, a device, and a storage medium for detecting web page aggressiveness. The method comprises: sequentially parsing a plurality of initial call events collected within a set period, wherein, whenever at least one operation association relation exists between the initial call event currently being parsed and the target call events, the corresponding web page aggressiveness feature is updated based on the at least one operation association relation and that initial call event; and then, based on the at least one web page aggressiveness feature obtained, performing web page aggressiveness detection on each corresponding website and generating a corresponding target detection result. The detection method provided by the embodiments of the application does not require modifying the browser kernel, which reduces the difficulty of deploying the network security module. By parsing the collected initial call events and updating the web page aggressiveness features relevant to detecting illegal websites, missed detections and false detections are avoided and the detection accuracy is effectively improved.

Description

Webpage aggressiveness detection method, device, equipment and storage medium

Technical Field

The application relates to the field of network security and provides a method, a device, equipment and a storage medium for detecting webpage aggressiveness.

Background

With the popularization and development of networks, network attacks are becoming increasingly varied, and attacks delivered through illegal web pages are among the most common at present. How to detect illegal web pages has therefore become one of the problems to be solved in the field of network security.

At present, the following two methods are often used to detect whether a web page is an illegal web page:

One is to use a Uniform Resource Locator (URL) blacklist to detect whether a user has accessed an illegal web page, wherein the URL blacklist is generated based on sandboxes. This approach has two weaknesses. First, an attacker can circumvent the URL blacklist by changing the URL, the Internet Protocol (IP) address, and the like, causing blacklist-based web page aggressiveness detection to fail.

Second, while the URL blacklist is being generated, some illegal web pages detect the sandbox environment and thereby avoid being judged illegal by the sandbox. The URL blacklist then lacks the URLs of those illegal web pages, so missed detections can occur when the blacklist is used for web page aggressiveness detection, and the detection accuracy is low.

The other is to detect illegal HyperText Markup Language (HTML) or illegal JavaScript code through static or dynamic analysis of the web page code. This detection method can be implemented only by modifying the browser kernel. However, browsers currently on the market come in many varieties and versions, so it is difficult to formulate a unified modification standard, and difficult for a cloud host provider to offer unified security support for so many browsers, which increases the difficulty of deploying the network security module. In addition, if a new network security module is not deployed in the browser in time, a potential security threat also arises.

Disclosure of Invention

The embodiment of the application provides a method, a device, equipment and a storage medium for detecting webpage aggressiveness, which are used for solving the problems of low detection accuracy and high deployment difficulty.

In a first aspect, an embodiment of the present application provides a method for detecting a web page aggressiveness, including:

acquiring a plurality of initial call events in a set period, wherein each initial call event characterizes: when an application program executes a class of tasks, invoking operations executed by a process on system resources of an operating system;

sequentially analyzing the plurality of initial call events, wherein when at least one operation association relation exists between each initial call event which is analyzed at present and each target call event, the corresponding webpage aggressiveness characteristic is updated based on the at least one operation association relation and the initial call event; wherein each target call event characterizes: when the application program executes a website access task, invoking an operation executed by a process on a system resource of the operating system;

and respectively carrying out webpage aggressiveness detection on corresponding websites based on the obtained at least one webpage aggressiveness feature, and generating corresponding target detection results.

In a second aspect, an embodiment of the present application further provides a method for detecting a web page aggressiveness, including:

collecting a plurality of initial call logs in a set period, wherein each initial call log characterizes: when an application program executes a class of tasks, the application program interacts with system resources of an operating system;

filtering the obtained initial call logs respectively to obtain corresponding initial call events; wherein each initial call event characterizes: when the application program executes a class of tasks, invoking operations executed by a process on system resources of the operating system;

and sending the obtained multiple initial call events to a server so that the server sequentially analyzes the multiple initial call events, wherein when at least one operation association relation exists between each initial call event which is analyzed currently and each target call event, the corresponding webpage aggressiveness characteristic is updated based on the at least one operation association relation and the one initial call event.

In a third aspect, an embodiment of the present application further provides a device for detecting web page aggressiveness, including:

the acquisition unit is used for acquiring a plurality of initial calling events in a set period, wherein each initial calling event represents: when an application program executes a class of tasks, invoking operations executed by a process on system resources of an operating system;

The feature extraction unit is used for sequentially analyzing the plurality of initial call events, wherein when at least one operation association relation exists between each initial call event which is analyzed currently and each target call event, the corresponding webpage aggressiveness feature is updated based on the at least one operation association relation and the initial call event; wherein each target call event characterizes: when the application program executes a website access task, invoking an operation executed by a process on a system resource of the operating system;

and the detection unit is used for respectively carrying out webpage aggressiveness detection on the corresponding websites based on the obtained at least one webpage aggressiveness feature and generating corresponding target detection results.

Optionally, the feature extraction unit is configured to:

for each target call event, the following operations are respectively executed:

acquiring a corresponding parent process name from a process field of one target call event, and acquiring a corresponding child process name from a system resource field of the one target call event;

and matching the parent process name of the one initial call event against the parent process name and the child process name of the one target call event to obtain a corresponding matching result.

Optionally, after parsing the one initial call event, the feature extraction unit is further configured to:

when determining that no operation association relation exists between the one initial call event currently parsed and any of the target call events, matching the parent process name of the one initial call event against a preset root process, wherein the root process represents: the process called by the operating system when an application program of a corresponding client accesses a website for the first time;

if the matching succeeds, updating the corresponding web page aggressiveness feature based on the one initial call event; if the matching fails, discarding the one initial call event and continuing to traverse the next initial call event.
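The matching and fallback logic described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the `CallEvent` type, the `ROOT_PROCESSES` set, the example process names, and the choice to treat every retained event as a target call event for later matching are all hypothetical, and the feature update is reduced to keeping the event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallEvent:
    process: str   # process field: name of the acting (parent) process
    resource: str  # system resource field: e.g. a child process, file, or socket
    op_type: str   # operation type field: e.g. "fork", "write", "connect"


# Hypothetical preset root process names (assumption for illustration only).
ROOT_PROCESSES = frozenset({"chrome"})


def select_relevant_events(events, root_processes=ROOT_PROCESSES):
    """Traverse events in order and keep those tied to website access."""
    targets = []  # events already judged relevant (treated as target call events)
    for ev in events:
        # Association test: the event's parent process name equals the parent
        # or child process name recorded in some target call event.
        related = any(ev.process in (t.process, t.resource) for t in targets)
        # Fallback test: the parent process name matches a preset root process.
        if related or ev.process in root_processes:
            targets.append(ev)  # stand-in for "update the feature"
        # Otherwise the event is discarded and traversal continues.
    return targets


events = [
    CallEvent("chrome", "chrome_renderer", "fork"),
    CallEvent("chrome_renderer", "/tmp/a.exe", "write"),
    CallEvent("sshd", "/var/log/auth", "write"),
    CallEvent("/tmp/a.exe", "10.0.0.1:443", "connect"),
]
relevant = select_relevant_events(events)
```

In this example the `sshd` event is dropped because it matches neither an existing target call event nor the root process, while the other three events form a chain starting from the root process.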

Optionally, one of the web page aggressiveness features includes a plurality of sub-features, and the detection unit performs the following operations for one of the web page aggressiveness features:

performing, for each sub-feature, primary aggressiveness detection on the sub-feature based on the first web page aggressiveness detection sub-model corresponding to that sub-feature to obtain a corresponding initial detection result, wherein each initial detection result represents: the contribution degree of the corresponding sub-feature to web page aggressiveness detection;

and performing secondary aggressiveness detection on the plurality of initial detection results obtained, based on a preset second web page aggressiveness detection sub-model, to obtain a corresponding target detection result.
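The two-stage detection described above resembles a stacked ensemble, which the following Python sketch illustrates. Everything concrete in it is an assumption made for illustration: the sub-feature names, the logistic form of the first-level sub-models, and the weights, bias, and threshold of the second-level sub-model are hypothetical and are not taken from the application.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical first-level sub-models, one per sub-feature. Each maps a raw
# sub-feature value (here: a count) to a score in (0, 1).
FIRST_LEVEL = {
    "file_writes": lambda n: sigmoid(n - 5),
    "child_processes": lambda n: sigmoid(n - 3),
    "network_connections": lambda n: sigmoid(n - 10),
}

# Hypothetical second-level sub-model: a logistic combination of the
# first-level scores.
SECOND_LEVEL_WEIGHTS = {
    "file_writes": 2.0,
    "child_processes": 2.0,
    "network_connections": 1.0,
}
SECOND_LEVEL_BIAS = -1.5


def detect(feature, threshold=0.5):
    """Return (initial results, final probability, is_aggressive)."""
    # Primary detection: one initial detection result per sub-feature.
    initial = {name: model(feature[name]) for name, model in FIRST_LEVEL.items()}
    # Secondary detection: combine the initial results into a target result.
    z = SECOND_LEVEL_BIAS + sum(
        SECOND_LEVEL_WEIGHTS[name] * score for name, score in initial.items()
    )
    probability = sigmoid(z)
    return initial, probability, probability >= threshold


_, p_quiet, quiet_flag = detect(
    {"file_writes": 0, "child_processes": 0, "network_connections": 0}
)
_, p_busy, busy_flag = detect(
    {"file_writes": 20, "child_processes": 10, "network_connections": 30}
)
```

A practical system would learn both levels from labelled data; the fixed numbers here only show how initial detection results feed the second sub-model.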

In a fourth aspect, an embodiment of the present application further provides a device for detecting web page aggressiveness, including:

the log acquisition unit is used for acquiring a plurality of initial call logs in a set period, wherein each initial call log represents: when an application program executes a class of tasks, the application program interacts with system resources of an operating system;

the log processing unit is used for respectively filtering the obtained initial call logs to obtain corresponding initial call events; wherein each initial call event characterizes: when the application program executes a class of tasks, invoking operations executed by a process on system resources of the operating system;

and the sending unit is used for sending the obtained plurality of initial call events to the server so that the server sequentially analyzes the plurality of initial call events, wherein when at least one operation association relation exists between each initial call event which is analyzed currently and each target call event, the corresponding webpage aggressiveness characteristic is updated based on the at least one operation association relation and the one initial call event.

Optionally, the log collection unit is configured to:

periodically detecting the system resources of the operating system through a probe deployed inside the operating system, and collecting a corresponding initial call log when interaction behavior between the application program and the system resources of the operating system is detected.

Optionally, the log processing unit is configured to:

the following operations are respectively executed on the initial call logs:

removing redundant attribute fields in an initial call log, and encoding other attribute fields of the initial call log to obtain a corresponding target call log;

and extracting a target attribute field set from a target call log, and packaging the obtained target attribute field set into an initial call event.

Optionally, a set of target attribute fields is extracted by the log processing unit in the following manner:

extracting a field from the target call log to obtain a process field, a system resource field and an operation type field;

and determining the process field, the system resource field and the operation type field as the target attribute field set.
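The log processing steps listed above (remove redundant attribute fields, encode the remaining fields, extract the target attribute field set, and package it as an initial call event) can be sketched in Python as follows. The field names, the `REDUNDANT_FIELDS` set, and the use of plain string normalization as a stand-in for the encoding step are assumptions for illustration only.

```python
from typing import Optional

# Hypothetical redundant attribute fields (assumption for illustration).
REDUNDANT_FIELDS = {"cpu", "thread_id", "latency_ns"}

# Target attribute field set: process, system resource, and operation type.
TARGET_FIELDS = ("process", "resource", "op_type")


def to_call_event(raw_log: dict) -> Optional[dict]:
    """Turn one initial call log into an initial call event, or None."""
    # Step 1: remove redundant attribute fields.
    kept = {k: v for k, v in raw_log.items() if k not in REDUNDANT_FIELDS}
    # Step 2: encode the remaining fields (stand-in: trimmed strings),
    # giving the target call log.
    target_log = {k: str(v).strip() for k, v in kept.items()}
    # Step 3: extract the target attribute field set; logs that lack any of
    # the three fields cannot be packaged and are dropped.
    if any(not target_log.get(field) for field in TARGET_FIELDS):
        return None
    # Step 4: package the field set as an initial call event.
    return {field: target_log[field] for field in TARGET_FIELDS}


raw = {
    "process": " chrome ",
    "resource": "/tmp/a.exe",
    "op_type": "write",
    "cpu": 3,
    "timestamp": 1700000000,
}
event = to_call_event(raw)
```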

In a fifth aspect, an embodiment of the present application further provides a computer device, including a processor and a memory, where the memory stores program code, and when the program code is executed by the processor, causes the processor to execute the steps of any one of the above-mentioned methods for detecting web page aggressiveness.

In a sixth aspect, embodiments of the present application further provide a computer readable storage medium including program code which, when run on a computer device, causes the computer device to perform the steps of any one of the above-described methods for detecting web page aggressiveness.

In a seventh aspect, embodiments of the present application further provide a computer program product including computer instructions for executing the steps of any one of the above methods for detecting web page aggressiveness.

The beneficial effects of the application are as follows:

the embodiment of the application provides a method, a device, equipment and a storage medium for detecting web page aggressiveness, wherein the method comprises the following steps: acquiring a plurality of initial call events in a set period, wherein each initial call event characterizes: when an application program executes a class of tasks, invoking operations executed by a process on system resources of an operating system; sequentially analyzing a plurality of initial call events, wherein when at least one operation association relation exists between each initial call event which is analyzed at present and each target call event, the corresponding webpage aggressiveness characteristic is updated based on the at least one operation association relation and the initial call event; wherein each target call event characterizes: when an application program executes a website access task, invoking an operation executed by a process on a system resource of an operating system; and then based on the obtained at least one webpage aggressiveness feature, respectively carrying out webpage aggressiveness detection on the corresponding websites, and generating corresponding target detection results.

Compared with the traditional webpage aggressiveness detection method, the embodiment of the application provides a real-time, lightweight and cross-browser webpage aggressiveness detection method, the kernel of the browser does not need to be modified, a provider of a cloud host is facilitated to provide a unified webpage security solution for browsers of different operating systems, and the deployment difficulty of a network security module is reduced.

Meanwhile, provenance (traceability) analysis is applied in a novel way: the collected initial call events are parsed to update the web page aggressiveness features relevant to detecting illegal websites, and, following the idea of ensemble learning, corresponding detection models are designed for the web page access tasks of different application programs under different operating systems. In this way, missed detections and false detections are avoided, the detection accuracy is effectively improved, the network access security of the cloud host is ensured, and the loss caused by accessing illegal websites is reduced.

Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

FIG. 1 is a schematic diagram of an optional application scenario in an embodiment of the present application;

FIG. 2 is an overall architecture diagram of a web page aggressiveness detection system provided by an embodiment of the present application;

FIG. 3A is a flowchart of a probe collecting initial call logs according to an embodiment of the present application;

FIG. 3B is a schematic diagram of the deployment of probes in an operating system according to an embodiment of the present application;

FIG. 3C is a logic diagram of a probe collecting initial call logs according to an embodiment of the present application;

FIG. 4A is a schematic flowchart of web page aggressiveness detection performed by the server according to an embodiment of the present application;

FIG. 4B is a logic diagram of web page aggressiveness detection performed by the server according to an embodiment of the present application;

FIG. 4C is a system provenance graph generated by accessing an illegal website according to an embodiment of the present application;

FIG. 4D is a system provenance graph generated by accessing a normal website according to an embodiment of the present application;

FIG. 4E is a model architecture diagram of a web page aggressiveness detection model provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of the complete flow of web page aggressiveness detection according to an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a device for detecting web page aggressiveness according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a device for detecting web page aggressiveness according to an embodiment of the present application;

FIG. 8 is a schematic diagram of the composition structure of a computer device according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a computing device in an embodiment of the present application.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the technical solutions of the present application, but not all embodiments. All other embodiments, which can be made by a person of ordinary skill in the art without any inventive effort, based on the embodiments described in the present application are intended to be within the scope of the technical solutions of the present application.

Some of the terms in the embodiments of the present application are explained below to facilitate understanding by those skilled in the art.

1. Cloud technology (Cloud technology):

The application relates to the field of cloud technology. Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software, and networks within a wide area network or a local area network to realize the computation, storage, processing, and sharing of data.

Specifically, cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and the like that are applied on the basis of the cloud computing business model; these can form a resource pool that is used on demand, which is flexible and convenient.

Cloud computing has become an important support for cloud technology. It mainly addresses the large amount of computing and storage resources required by the background services of networked systems, including but not limited to video websites, picture websites, and other portal websites. With the rapid development and application of the Internet industry, each item may in the future have its own identification mark, which needs to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data need strong backend system support, which can only be achieved through cloud computing.

2. Cloud computing:

Cloud computing in the narrow sense refers to the delivery and usage model of information technology (IT) infrastructure, in which required resources are obtained over a network in an on-demand, easily scalable manner; cloud computing in the broad sense refers to the delivery and usage model of services, in which required services are obtained over a network in an on-demand, easily scalable manner. Such services may be related to IT, software, or the Internet, or may be other services.

Cloud computing is a product of the development and fusion of traditional computer and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, and load balancing. With the development of the Internet, real-time data streams, and the diversification of connected devices, and driven by the demands of search services, social networks, mobile commerce, open collaboration, and the like, cloud computing has developed rapidly. Unlike earlier parallel distributed computing, the emergence of cloud computing will conceptually drive a revolutionary change in the whole Internet model and the enterprise management model.

3. Artificial intelligence cloud services:

Artificial intelligence cloud services, also commonly referred to as AI as a Service (AIaaS), are currently a mainstream service mode of artificial intelligence platforms.

Specifically, an AIaaS platform splits several common AI services and provides them in the cloud as independent or packaged services. This service mode is similar to an AI-themed marketplace: all developers can access and use one or more of the artificial intelligence services provided by the platform through an API, and some experienced developers can also use the AI framework and AI infrastructure provided by the platform to deploy, operate, and maintain their own dedicated cloud artificial intelligence services.

4. Cloud instance:

A cloud instance is equivalent to a virtual machine on the server and comprises basic computing components such as virtual CPUs (vCPUs), memory, an operating system, a network, and disks, so that a user can conveniently customize and change the configuration of the instance.

5. System provenance (System Provenance): system provenance records the interaction information between application programs and the operating system.

6. Provenance data (Provenance Data): provenance data contains the complete history of data and process operations since the operating system was booted, mainly including file operations, network operations, process operations, and the like.

7. Provenance graph (Provenance Graph): provenance data is often represented in the form of a graph model. If the entities in the system are regarded as nodes and the causal relationships between nodes are regarded as edges, the provenance data can be described as a directed acyclic graph (DAG), which is referred to as a provenance graph.
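As a brief illustration of the graph representation defined in item 7, the following Python sketch stores entities as nodes and causal relationships as directed edges, and then collects everything causally downstream of a given entity. The event triples and entity names are hypothetical examples, and the adjacency-list layout is only one possible representation.

```python
from collections import defaultdict, deque


def build_graph(events):
    """Build an adjacency list: source entity -> [(target entity, operation)]."""
    graph = defaultdict(list)
    for source, target, operation in events:
        graph[source].append((target, operation))
    return graph


def reachable(graph, root):
    """Return every entity causally downstream of root (root included)."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for target, _operation in graph.get(node, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical (source, target, operation) triples.
events = [
    ("browser", "renderer", "fork"),
    ("renderer", "/tmp/payload", "write"),
    ("editor", "/home/notes.txt", "write"),
]
downstream = reachable(build_graph(events), "browser")
```

Here the entities reached from `browser` exclude the unrelated `editor` activity, which is the kind of separation a provenance graph makes possible.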

The following briefly describes the design concept of the embodiment of the present application:

In view of this, the embodiments of the present application provide a method, apparatus, device, and storage medium for detecting web page aggressiveness. The method comprises the following steps: acquiring a plurality of initial call events in a set period, wherein each initial call event characterizes: when an application program executes a class of tasks, invoking operations executed by a process on system resources of an operating system; sequentially analyzing a plurality of initial call events, wherein when at least one operation association relation exists between each initial call event which is analyzed at present and each target call event, the corresponding webpage aggressiveness characteristic is updated based on the at least one operation association relation and the initial call event; wherein each target call event characterizes: when an application program executes a website access task, invoking an operation executed by a process on a system resource of an operating system; and then based on the obtained at least one webpage aggressiveness feature, respectively carrying out webpage aggressiveness detection on the corresponding websites, and generating corresponding target detection results.

The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are for illustration and explanation only, and are not intended to limit the present application, and embodiments and features of embodiments of the present application may be combined with each other without conflict.

The embodiments of the present application may be applied to various scenarios including, but not limited to, cloud technology, artificial intelligence, intelligent transportation, assisted driving, and the like. Fig. 1 shows one of the application scenarios, which includes two physical terminal devices 110 and one server 130. The physical terminal device 110 and the server 130 may communicate using a wired network or a wireless network, which is not limited herein.

A lightweight probe is deployed in the operating system of the physical terminal device 110. The user is informed, through a prompt interface, a prompt short message, an authorization code, or the like, that the background probe may collect log information while the application program is in use. With the user's consent, the probe collects initial call logs within a set period, each of which is generated based on the interaction behavior between the application program 120 and the operating system when a class of tasks is executed. The probe is also responsible for filtering the initial call logs obtained to produce corresponding initial call events and for sending the initial call events to the server for unified data analysis. The analysis determines which of the websites accessed by the application program are illegal websites with aggressive behavior, and alarm information is sent to the user to prompt the user to stop accessing those illegal websites, thereby protecting personal information security, data security, and network security.

In addition, the collected log information is used only for web page aggressiveness detection. After the detection work is finished, the collected data is deleted, and no related data is saved without the user's knowledge.

The server performs noise reduction processing on the received multiple initial call events, eliminates the initial call events irrelevant to executing the website access task, and updates the corresponding webpage aggressiveness feature based on the target call events relevant to executing the website access task and the operation association relation between the events. And finally, the server respectively detects the webpage aggression of the corresponding websites based on the obtained at least one webpage aggression feature and generates a corresponding target detection result.

In the embodiment of the present application, the physical terminal device 110 is a computer device used by a user. The computer device may be a personal computer, a mobile phone, a tablet computer, a notebook computer, an e-book reader, a smart home device, or another device with a certain computing capability.

The server 130 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (Content Delivery Network, CDN), and basic cloud computing services such as big data and an artificial intelligence platform.

Referring to the overall architecture diagram shown in FIG. 2, the web page aggressiveness detection system mainly includes a probe responsible for collecting call logs and a server responsible for data analysis.

A lightweight probe is deployed in the operating system of each cloud instance. The probe continuously monitors the kernel space of the operating system and collects each initial call log (also called a "kernel log") in the kernel space. For example, when a browser executes a website access task, it calls system resources of the operating system; a kernel log is generated based on the interaction between the browser and the system resources, and the probe collects that kernel log in the kernel space.

Each probe respectively filters each collected initial call log to obtain corresponding initial call events, and sends each initial call event to a server side for data analysis.

A graph model deployed on the server sequentially parses the plurality of initial call events. Whenever at least one operation association relation exists between the initial call event currently being parsed and the target call events, the corresponding web page aggressiveness feature is updated based on the at least one operation association relation and that initial call event.

Finally, the web page aggressiveness features obtained are input into a web page aggressiveness detection model constructed based on a real-time learning algorithm, and web page aggressiveness detection is performed on each corresponding website. This determines which of the websites accessed by the application program are illegal websites with aggressive behavior, and alarm information is sent to the user to prompt the user to stop accessing those illegal websites, so as to protect personal information security, data security, and network security.

Next, referring to the flowchart shown in fig. 3A, the operation of the probe of one of the cloud instances is further described.

S301: The probe collects a plurality of initial call logs within a set period, wherein each initial call log characterizes: the interaction behavior between an application program and system resources of an operating system when the application program executes a class of tasks.

At present, the mainstream operating system provides a preset kernel mechanism to generate kernel logs, but the mode can generate the problems of large log information quantity, large log quantity and the like, and cannot be applied to the application. To solve the above problem, as shown in fig. 3B, one lightweight probe is deployed by configuring a customized kernel log acquisition framework for each cloud instance. In this way, under the condition that the kernel of the browser is not modified, the system resources of the operating system can be periodically detected from the outside of the application program through the probe deployed inside the operating system, and corresponding initial call logs are collected when the interaction behavior between one application program and the system resources of the operating system is detected. When the physical terminal equipment is in a Linux running environment, a Sysdig log acquisition frame is adopted; and when the physical terminal equipment is in the Windows running environment, adopting an ETW acquisition framework.

When the interval set by the detection period is short (for example, checking once every second whether any interaction between an application program and system resources occurred during that second), the detection approaches real-time detection, the number of logs processed in each batch is reduced, and the processing pressure on the system is effectively reduced.

The purpose of the present application is to detect whether a website exhibits aggressive behavior, so the log acquisition stage should capture, as completely as possible, the initial call logs generated when an application program executes a website access task. In practical applications, a user may access a website not only through a browser but also through other application programs (for example, by clicking a link sent by a chat partner in social software to open the corresponding website). The probe therefore also detects interaction behavior between the other application programs on the physical terminal device and the operating system, and collects the corresponding initial call logs.

The probe is implemented mainly with reference to existing tracing acquisition mechanisms. The process of acquiring each initial call log is shown in fig. 3C: the probe collects (collector) the system call records of each application program in kernel space, temporarily caches each system call record in a ring buffer (log), and then transfers each system call record to user space as a corresponding initial call event through a shared memory mechanism (handler).

S302: filtering the obtained initial call logs respectively to obtain corresponding initial call events; wherein each initial call event characterizes: an application program, when executing a class of tasks, invokes operations that a process performs on system resources of an operating system.

The kernel log module of a current operating system has a considerable impact on the running efficiency of the system. The main reasons are that existing kernel logs contain a large amount of redundant information irrelevant to webpage aggressiveness detection, and that the encoding of the log attribute fields is not compact enough and therefore occupies a large amount of storage space. Accordingly, the embodiment of the application modifies the kernel log module so that the probe has a basic filtering capability and eliminates redundant logs. Meanwhile, the embodiment of the application also applies a compactly designed encoding algorithm for system kernel log information, so that the probe retains and encodes only specific attribute fields and obtains a target call log with a small overall data volume and a high proportion of effective information, thereby significantly reducing the running cost of the probe.

Wherein, for each initial call log, the following operations are executed respectively: removing redundant attribute fields in an initial call log, and encoding other attribute fields of the initial call log to obtain a corresponding target call log; and extracting a target attribute field set from the target call log, and packaging the obtained target attribute field set into an initial call event.

The target call log mainly records the operations performed by the application program on system resources such as files, networks and processes. Field extraction is performed on a target call log to obtain a process field, a system resource field and an operation type field, and these three fields are determined as a target attribute field set.

Thus, the initial call event generated based on the target attribute field set is saved in the form of a triple <src, dst, operation>, where src represents a process associated with an application program; dst represents a system resource operated on by the src process, such as a file, a network, a process or a registry; and operation represents the operation type, such as reading a file, writing a file, reading a network port, writing a network port, spawning a process (fork), or inter-process communication (IPC).
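The filtering and packaging described above can be sketched as follows. This is a minimal illustration only: the field names `proc`, `resource` and `op`, the redundant fields `ts`, `cpu` and `latency_ns`, and the helper `to_call_event` are hypothetical, and the compact encoding step is not modelled.

```python
from typing import NamedTuple


class CallEvent(NamedTuple):
    src: str        # process associated with the application
    dst: str        # system resource operated on by src
    operation: str  # operation type, e.g. "read", "write", "fork"


# Hypothetical names of the three attribute fields kept after filtering.
TARGET_FIELDS = ("proc", "resource", "op")


def to_call_event(initial_log: dict) -> CallEvent:
    """Drop redundant fields, then pack the target fields into a triple."""
    target_log = {name: initial_log[name] for name in TARGET_FIELDS}
    return CallEvent(target_log["proc"], target_log["resource"], target_log["op"])


# Made-up log record; "ts", "cpu" and "latency_ns" stand in for redundant fields.
initial_log = {"ts": "12:00:01", "cpu": 3, "proc": "sshd(108830)",
               "resource": "sshd(108830)", "op": "read", "latency_ns": 412}
event = to_call_event(initial_log)
print(event)
```

Using a named tuple keeps the event as small as a plain triple while still allowing access by field name on the server side.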

For example, the probe collects an initial call log as shown in the first row of table 1, filters the initial call log to obtain a target call log as shown in the second row of table 1, and finally forms an initial call event in the form of a triple as shown in the third row of table 1, where the meaning of the event is: process 108830, whose process name is sshd, performs a read operation on itself.

TABLE 1

S303: and sending the obtained plurality of initial call events to a server so that the server sequentially analyzes the plurality of initial call events, wherein when at least one operation association relation exists between each initial call event which is analyzed currently and each target call event, the corresponding webpage aggressiveness characteristic is updated based on the at least one operation association relation and the initial call event.

The obtained initial call events are sent to the server through a communication channel established between the physical terminal device and the server, so that the server performs data analysis on each initial call event, determines which of the websites accessed by the application program are illegal websites with aggressive behavior, and sends alarm information to the user, prompting the user to stop accessing the illegal websites.

Having described the log collection performed on the physical terminal device side, the process by which the server detects webpage aggressiveness is introduced next with reference to the flow diagram shown in fig. 4A and the logic diagram shown in fig. 4B.

S401: the server obtains a plurality of initial call events in a set period, wherein each initial call event characterizes: an application program, when executing a class of tasks, invokes operations that a process performs on system resources of an operating system.

As can be seen from the foregoing description, each probe collects the initial call logs generated between all application programs and system resources within a set period. Consequently, each initial call event generated from these initial call logs corresponds to an operation performed by an invoked process on a system resource of the operating system when an application program of one physical terminal device performs a class of tasks.

S402: the server sequentially analyzes a plurality of initial call events, wherein when at least one operation association relation exists between each initial call event which is analyzed at present and each target call event, the corresponding webpage aggressiveness characteristic is updated based on the at least one operation association relation and the initial call event; wherein each target call event characterizes: when an application program executes a website access task, the application program invokes an operation executed by a process on a system resource of an operating system.

As shown in fig. 4B, the implementation of server data analysis mainly consists of two parts, one part is to use a graph model to perform feature extraction (feature extraction, corresponding to step 402), and the other part is to use a web page aggressiveness detection model to perform feature aggregation, and respectively perform web page aggressiveness detection on corresponding websites (corresponding to step 403).

Fig. 4C and fig. 4D respectively show the system traceability graphs generated when an illegal website containing a worm virus and a normal website are accessed. The nodes of a system traceability graph are the invoked processes or the system resources used, and the edges between nodes represent the operation association relationships between them. For example, if the main Chrome process accesses a cabxxxx.tmp file, the main Chrome process is taken as node a and the cabxxxx.tmp file as node b, and an edge is formed between them.

It can be seen that there are many obvious differences between the two graphs. For example, the illegal website containing a worm virus may access a host file and use it to search for the next infection target in the same local area network, whereas the normal website uses sensorapi.dll to obtain the geographic information of the user in order to provide personalized services.

By analyzing a large number of system traceability graphs, the following four types of traceability features (also called as "sub-features") related to the detection of web page aggressiveness are extracted, and features containing the four types of sub-features are called as "web page aggressiveness features".

The first category is file features, which specifically include: the file read histogram feature, the file write histogram feature, and the dynamic library usage histogram feature.

Taking the file read histogram feature as an example, it is essentially a key-value map in which each key is a file extension associated with the application process during access to a website, and each value is the number of times files with that extension are read. Assuming that a website reads a.dat, b.dat and c.png and writes data to d.dat, e.tmp and f.tmp, the following two histogram features are obtained for this website access: the file read histogram {.dat: 2, .png: 1} and the file write histogram {.dat: 1, .tmp: 2}.
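The two histograms in this example can be reproduced with a few lines of Python. The helper name `extension_histogram` is hypothetical, and counting by the final file extension is an assumption about how extensions are derived.

```python
from collections import Counter
from pathlib import PurePosixPath


def extension_histogram(paths):
    """Map each file extension to the number of accessed files that carry it."""
    return Counter(PurePosixPath(p).suffix for p in paths)


read_hist = extension_histogram(["a.dat", "b.dat", "c.png"])
write_hist = extension_histogram(["d.dat", "e.tmp", "f.tmp"])
print(dict(read_hist))   # {'.dat': 2, '.png': 1}
print(dict(write_hist))  # {'.dat': 1, '.tmp': 2}
```

Because a `Counter` can be incremented one key at a time, the same structure can also be updated event by event rather than rebuilt from a full list.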

The second category is the operation type histogram feature, which is likewise essentially a key-value map in which each key is an operation type and each value is the number of times operations of that type are performed.

The third category is the network feature, which is a two-dimensional feature vector C = (S, I), where I represents the total number of IP addresses connected to in one website access task and S represents the number of distinct subnets among those IP addresses. It has been found that illegal websites tend to connect to IP addresses within the same subnet, so the network feature is extracted to help the webpage aggressiveness detection model detect illegal websites.
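A minimal sketch of how the vector (S, I) might be computed is given below. The text does not state how a subnet is delimited or whether repeated connections to the same address are counted once, so the /24 prefix length and the use of distinct addresses are both assumptions; the function name `network_feature` is hypothetical.

```python
import ipaddress


def network_feature(ip_addresses, prefix_len=24):
    """Return (S, I): distinct subnets and distinct IP addresses contacted."""
    unique_ips = {ipaddress.ip_address(ip) for ip in ip_addresses}
    # strict=False lets a host address stand in for its enclosing network.
    subnets = {ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
               for ip in unique_ips}
    return (len(subnets), len(unique_ips))


feature = network_feature(["10.0.0.5", "10.0.0.9", "10.0.1.7", "10.0.0.5"])
print(feature)  # (2, 3)
```

A small S relative to I corresponds to the pattern noted above, in which most connections stay within the same subnet.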

The fourth class is a feature directly related to the system traceability map, and specifically includes: the width of the system tracing graph, the number of nodes in the system tracing graph and the number of edges in the system tracing graph.

However, in order to process the initial call events sent by each probe in real time, the embodiment of the application does not explicitly construct a system traceability graph as shown in fig. 4C and fig. 4D. Instead, based on the currently parsed initial call event and the at least one operation association relationship existing between it and the target call events, the first three categories of sub-features and two data structures, the node depth map (NDM) and the width cache (W), are updated; the fourth category of sub-features is then updated based on the two updated data structures, thereby updating the webpage aggressiveness feature. Here, NDM = <N, depth>, where N is a node in the system traceability graph and depth is the depth of that node in the current traceability graph; W = <depth, num>, where W[k] = m means that there are m nodes of depth k in total.
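The incremental bookkeeping with NDM and W can be sketched as follows. The class `TraceState` and its method names are hypothetical. The sketch also assumes that the root node has depth 0, that a node keeps the depth assigned when it is first seen, that every event counts as one edge, and that the width of the graph is the largest number of nodes sharing one depth; none of these details is fixed by the text.

```python
from collections import defaultdict


class TraceState:
    """Incremental stand-in for an explicit traceability graph."""

    def __init__(self):
        self.ndm = {}              # NDM: node -> depth
        self.w = defaultdict(int)  # W: depth -> number of nodes at that depth
        self.num_edges = 0

    def _add_node(self, node, depth):
        self.ndm[node] = depth
        self.w[depth] += 1

    def add_root(self, node):
        if node not in self.ndm:
            self._add_node(node, 0)

    def add_edge(self, src, dst):
        """Record src -> dst; src must already be a known node."""
        if dst not in self.ndm:
            self._add_node(dst, self.ndm[src] + 1)
        self.num_edges += 1

    def graph_features(self):
        width = max(self.w.values(), default=0)
        return {"width": width, "nodes": len(self.ndm), "edges": self.num_edges}


state = TraceState()
state.add_root("Chrome1 (main)")
state.add_edge("Chrome1 (main)", "a.tmp")
state.add_edge("Chrome1 (main)", "Chrome1 (renderer)")
state.add_edge("Chrome1 (renderer)", "b.dat")
features = state.graph_features()
print(features)  # {'width': 2, 'nodes': 4, 'edges': 3}
```

Each event touches only a constant number of dictionary entries, which is what makes it possible to keep the graph-related sub-features current without storing the graph itself.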

For convenience of description, taking the currently parsed initial call event as an example, the following parsing operation is performed on it:

The corresponding parent process name is acquired from the process field of the initial call event; the parent process name of the initial call event is then matched against the comprehensive process names contained in each target call event, and it is determined that at least one operation association relationship exists between the initial call event and each successfully matched target call event.

Because the probe acquires the corresponding initial call log every time the probe detects the interaction between an application program and a system resource, all the initial call events obtained after the log is processed are still arranged according to the log generation time. If the currently resolved one initial call event is an event related to a website access task, then the parent process name of the initial call event is already present in the previously resolved at least one target call event, or the parent process of the initial call event is the root process invoked when accessing the new website.

Thus, an initial call event that is currently resolved will hit any one of the following results:

(1) Determining that an operation association relation exists with at least one target calling event, and updating corresponding webpage aggressiveness characteristics based on the obtained at least one operation association relation and event content of the operation association relation;

(2) Determining that the parent process of the initial calling event is a root process called when a new website is accessed, and updating corresponding webpage aggressiveness characteristics based on event content of the parent process;

(3) If neither of the above cases applies, the initial call event is discarded and traversal continues with the next initial call event.

Specifically, for each target call event, the following operations are performed separately:

acquiring a corresponding parent process name from a process field of a target call event, and acquiring a corresponding child process name from a system resource field of the target call event; and matching the parent process name of the current resolved initial calling event with the parent process name and the child process name of the target calling event to obtain a corresponding matching result.

After an initial call event is parsed, when it is determined that no operation association relationship exists between the currently parsed initial call event and any target call event, the parent process name of the initial call event is matched against a preset root process, wherein the root process characterizes: the process invoked by the operating system when an application program of the corresponding client accesses a website for the first time;

if the matching is successful, updating the corresponding webpage aggressiveness characteristic based on the initial calling event; if the matching fails, discarding the initial call event, and continuing to traverse the next initial call event.

For example, the parent process name of the initial call event that is currently resolved is Chrome1 (main), and the parent process names of the multiple target call events are Chrome1 (main), so that an operation association relationship exists between the initial call event and the multiple target call events, the initial call event should be saved, and the corresponding webpage aggressiveness feature is updated based on the obtained multiple operation association relationships and the initial call event.

As another example, the parent process name of the currently parsed initial call event is Software reporter. This parent process does not appear in the attribute fields of any of the already parsed target call events, and it is not the root process ChromeN (main), so this initial call event should be discarded and traversal should continue with the next initial call event.
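The three possible outcomes for each parsed event can be sketched as a single pass over the time-ordered events. The function `parse_events` and the outcome labels are hypothetical. The sketch assumes that events are (src, dst, operation) triples, that the parent process name is the src field, and that every retained event becomes a target call event for later matching.

```python
def parse_events(initial_events, root_processes):
    """Classify each time-ordered event as associated, root or discarded."""
    target_events = []  # events already attributed to a website access task
    outcomes = []
    for src, dst, operation in initial_events:
        # Compare the parent name of the current event with the parent (src)
        # and child (dst) names of every target call event seen so far.
        related = [t for t in target_events if src in (t[0], t[1])]
        if related:
            outcome = "associated"
        elif src in root_processes:
            outcome = "root"
        else:
            outcome = "discarded"
        if outcome != "discarded":
            target_events.append((src, dst, operation))
        outcomes.append(outcome)
    return outcomes, target_events


events = [
    ("Chrome1 (main)", "a.tmp", "write"),
    ("Chrome1 (main)", "Chrome1 (renderer)", "fork"),
    ("Chrome1 (renderer)", "b.dat", "read"),
    ("Software reporter", "x.log", "write"),
]
outcomes, kept = parse_events(events, root_processes={"Chrome1 (main)"})
print(outcomes)  # ['root', 'associated', 'associated', 'discarded']
```

In a full implementation the feature update would happen at the point where an event is retained; the linear scan over `target_events` is kept here only for readability.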

S403: the server respectively detects the webpage aggressiveness of the corresponding websites based on the obtained at least one webpage aggressiveness characteristic and generates a corresponding target detection result.

On the one hand, the at least one webpage aggressiveness feature updated in real time can be input into the webpage aggressiveness detection model deployed on the server to perform real-time webpage aggressiveness detection on the corresponding website. On the other hand, the finally updated at least one webpage aggressiveness feature can be input into the webpage aggressiveness detection model deployed on the server to perform webpage aggressiveness detection on the corresponding historically accessed websites.

As shown in fig. 4E, the web page aggressiveness detection model is divided into two layers. The first layer is a representation layer (representations), which deploys a plurality of first web page aggressiveness detection sub-models, one for each sub-feature of a web page aggressiveness feature. The second layer is a voting layer (voting), which deploys a second web page aggressiveness detection sub-model; its input is the plurality of initial classification results output by the representation layer, and its output is a target classification result, which represents the probability that the website is a normal website and the probability that the website is an illegal website.

In essence, the first webpage aggressiveness detection sub-model and the second webpage aggressiveness detection sub-model are both constructed based on a classification idea, but the server selects the matched first webpage aggressiveness detection sub-model and second webpage aggressiveness detection sub-model according to webpage access tasks of different application programs under different operating systems.

The specific detection process is as follows: based on a first webpage aggressiveness detection sub-model of the corresponding sub-feature, carrying out primary aggressiveness detection on the corresponding sub-feature to obtain a corresponding initial detection result, wherein each initial detection result represents: the contribution degree of the corresponding sub-feature in network aggressiveness detection;

and performing secondary aggression detection on the obtained multiple initial detection results based on a preset second webpage aggression detection sub-model to obtain corresponding target detection results.

For example, the sub-features of one web page aggressiveness feature of website A are respectively input into the corresponding first web page aggressiveness detection sub-models, and the initial detection results obtained are (0.1, 0.3, 0.2, 0.1, 0.3). These results are then input into the second web page aggressiveness detection sub-model, and the target detection result obtained is (0.2, 0.8), which indicates that the probability that website A is a normal website is 20% and the probability that it is an illegal website is 80%. Website A is therefore judged to be an illegal website, and warning information is sent to the user as soon as possible, prompting the user not to access the website any more.
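The two-layer flow can be illustrated with a toy sketch. The text does not specify which classifiers are used in either layer, so the hand-written scoring functions, the logistic voting function, and the weights below are all assumptions chosen only to make the data flow concrete; the numbers do not correspond to the example above.

```python
import math


def make_vote(weights, bias):
    """Build a toy voting layer: logistic function over a weighted sum."""
    def vote(initial_results):
        z = bias + sum(w * r for w, r in zip(weights, initial_results))
        p_illegal = 1.0 / (1.0 + math.exp(-z))
        return (1.0 - p_illegal, p_illegal)  # (P(normal), P(illegal))
    return vote


def detect(sub_features, first_layer, vote):
    """Run one sub-model per sub-feature, then combine the results."""
    initial_results = [model(x) for model, x in zip(first_layer, sub_features)]
    return initial_results, vote(initial_results)


# Toy first-layer sub-models (assumed, not taken from the text).
first_layer = [
    lambda hist: min(1.0, hist.get(".tmp", 0) / 4),  # file write histogram
    lambda net: 1.0 - net[0] / net[1],               # network feature (S, I)
]
sub_features = [{".dat": 1, ".tmp": 2}, (1, 4)]

initial, target = detect(sub_features, first_layer, make_vote([2.0, 2.0], -1.5))
print(initial)  # [0.5, 0.75]
print(target)   # (P(normal), P(illegal)); here P(illegal) exceeds 0.5
```

Keeping one sub-model per sub-feature means a sub-model can be swapped for a particular operating system or application program without retraining the others, which matches the per-task model selection described above.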

In addition, the embodiment of the application applies the traceability analysis technology and the machine learning idea to the field of network security, expands the boundary and application scene of the related artificial intelligence (Artificial Intelligence, AI) technology, and is beneficial to the landing and development of the AI technology.

For ease of understanding, taking a specific embodiment as an example, the complete process of detecting web page aggressiveness is as follows:

S501: collecting initial call logs within three hours through probes deployed in an operating system;

S502: filtering the obtained initial call logs through probes deployed in an operating system to obtain initial call events in a triplet form;

S503: the probe sends each initial calling event to a server;

S504: reading an initial calling event through a graph model deployed on a server;

S505: the graph model judges whether at least one operation association relation exists between the initial call event analyzed currently and each target call event, if yes, step S508 is executed; otherwise, go to step S506;

S506: the graph model judges whether the parent process name of the initial call event analyzed currently is matched with a preset root process, if yes, step S509 is executed; otherwise, go to step S507;

S507: discarding the initial call event of the current parsing and returning to step S504;

S508: the graph model updates corresponding webpage aggressiveness characteristics based on at least one operation association relation and the initial call event;

S509: updating the corresponding webpage aggressiveness characteristic by the graph model based on the initial calling event;

S510: the graph model determines whether all the initial call events are resolved, if so, step S511 is performed; otherwise, return to step S504;

S511: inputting the obtained at least one webpage aggressiveness characteristic into a webpage aggressiveness detection model deployed on a server, respectively carrying out webpage aggressiveness detection on corresponding websites, and generating corresponding target detection results.

Based on the same inventive concept as the method embodiment, the embodiment of the application also provides a device for detecting web page aggressiveness. As shown in fig. 6, the apparatus 600 for detecting web page aggressiveness may include:

an obtaining unit 601, configured to obtain a plurality of initial call events in a set period, where each initial call event characterizes: when an application program executes a class of tasks, invoking operations executed by a process on system resources of an operating system;

the feature extraction unit 602 is configured to parse the plurality of initial call events in sequence, where, when at least one operation association relationship exists between each of the current parsed initial call events and each of the target call events, update the corresponding webpage aggressiveness feature based on the at least one operation association relationship and the one initial call event; wherein each target call event characterizes: when an application program executes a website access task, invoking an operation executed by a process on a system resource of an operating system;

The detecting unit 603 is configured to perform web page aggression detection on the corresponding websites based on the obtained at least one web page aggression feature, and generate corresponding target detection results.

Optionally, an initial call event that is currently resolved is resolved by the feature extraction unit 602 in the following manner:

acquiring a corresponding parent process name from a process field of an initial call event;

and respectively matching the parent process name of the initial calling event with the comprehensive process name contained in each target calling event, and determining that at least one operation association relationship exists between the initial calling event and each target calling event successfully matched.

Optionally, the feature extraction unit 602 is configured to:

for each target call event, the following operations are respectively executed:

acquiring a corresponding parent process name from a process field of a target call event, and acquiring a corresponding child process name from a system resource field of the target call event;

Optionally, after parsing an initial call event, the feature extraction unit 602 is further configured to:

when determining that an initial calling event analyzed currently does not have an operation association relation with each target calling event, matching the parent process name of the initial calling event with a preset root process, and characterizing the root process: when an application program of a corresponding client accesses a website for the first time, a process called by an operating system;

if the matching is successful, updating the corresponding webpage aggressiveness characteristic based on an initial calling event; if the matching fails, discarding one initial calling event, and continuing to traverse the next initial calling event.

Optionally, one web page aggressiveness feature contains a plurality of sub-features, and the detection unit 603 performs the following operations for one web page aggressiveness feature:

Based on the same inventive concept as the method embodiment, the embodiment of the application also provides a device for detecting web page aggressiveness. As shown in fig. 7, the apparatus 700 for detecting web page aggressiveness may include:

the log collection unit 701 is configured to collect a plurality of initial call logs in a set period, where each initial call log represents: when an application program executes a class of tasks, the application program interacts with system resources of an operating system;

the log processing unit 702 is configured to perform filtering processing on each obtained initial call log, so as to obtain a corresponding initial call event; wherein each initial call event characterizes: when an application program executes a class of tasks, invoking operations executed by a process on system resources of an operating system;

and a sending unit 703, configured to send the obtained plurality of initial call events to the server, so that the server sequentially parses the plurality of initial call events, where, when at least one operation association relationship exists between each of the current parsed initial call events and each of the target call events, the corresponding web page aggressiveness feature is updated based on the at least one operation association relationship and the one initial call event.

Optionally, the log collection unit 701 is configured to:

and periodically detecting system resources of the operating system through probes deployed inside the operating system, and collecting corresponding initial call logs when interaction behaviors between an application program and the system resources of the operating system are detected.

Optionally, the log processing unit 702 is configured to:

the following operations are respectively executed for each initial call log:

Optionally, one target attribute field set is extracted by the log processing unit 702 in the following manner:

extracting a field from a target call log to obtain a process field, a system resource field and an operation type field;

and determining the process field, the system resource field and the operation type field as a target attribute field set.

For convenience of description, the above parts are described as being functionally divided into modules (or units) respectively. Of course, the functions of each module (or unit) may be implemented in the same piece or pieces of software or hardware when implementing the present application.

Having described the method and apparatus for detecting web page aggressiveness according to an exemplary embodiment of the present application, next, a computer device according to another exemplary embodiment of the present application is described.

Those skilled in the art will appreciate that the various aspects of the present application may be implemented as a system, method, or program product. Accordingly, aspects of the present application may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.) or an embodiment combining hardware and software aspects, which may be collectively referred to herein as a "circuit," "module," or "system."

Based on the same inventive concept as the above-described method embodiment, a computer device is further provided in the embodiment of the present application, and referring to fig. 8, the computer device 800 may include at least a processor 801 and a memory 802. The memory 802 stores program code that, when executed by the processor 801, causes the processor 801 to perform the steps of any one of the above-described methods for detecting web page aggressiveness.

In some possible implementations, a computing device according to the present application may include at least one processor, and at least one memory. The memory stores program code that, when executed by the processor, causes the processor to perform the steps in the method for detecting web page aggressiveness according to various exemplary embodiments of the present application described in the present specification. For example, the processor may perform the steps as shown in fig. 4A.

A computing device 900 according to such an embodiment of the present application is described below with reference to fig. 9. The computing device 900 of fig. 9 is only one example and should not be taken as limiting the functionality and scope of use of embodiments of the present application.

As shown in fig. 9, computing device 900 is in the form of a general purpose computing device. Components of computing device 900 may include, but are not limited to: the at least one processing unit 901, the at least one storage unit 902, and a bus 903 connecting the different system components (including the storage unit 902 and the processing unit 901).

Bus 903 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, and a local bus using any of a variety of bus architectures.

The storage unit 902 may include a readable medium in the form of volatile memory, such as Random Access Memory (RAM) 9021 and/or cache storage unit 9022, and may further include Read Only Memory (ROM) 9023.

The storage unit 902 may also include a program/utility 9025 having a set (at least one) of program modules 9024, such program modules 9024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

The computing device 900 may also communicate with one or more external devices 904 (e.g., keyboard, pointing device, etc.), one or more devices that enable a user to interact with the computing device 900, and/or any devices (e.g., routers, modems, etc.) that enable the computing device 900 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 905. Moreover, computing device 900 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, for example, the Internet, through network adapter 906. As shown, the network adapter 906 communicates with other modules for the computing device 900 over the bus 903. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with computing device 900, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

Based on the same inventive concept as the method embodiments described hereinabove, various aspects of the method for detecting web page aggressiveness provided herein may also be implemented in the form of a program product comprising program code. When the program product is run on a computer device, the program code causes the computer device to carry out the steps of the method for detecting web page aggressiveness according to the various exemplary embodiments of the present application described hereinabove; for example, the computer device may carry out the steps shown in fig. 4A.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.

It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims

1. A method for detecting web page aggressiveness, comprising:

2. The method of claim 1, wherein the one initial call event currently being parsed is parsed by:

acquiring a corresponding parent process name from a process field of the initial calling event;

3. The method as set forth in claim 2, wherein said matching the parent process name of said one initial call event with the integrated process name contained in each target call event, respectively, comprises:

for each target call event, the following operations are respectively executed:

4. The method of claim 2, further comprising, after parsing the one initial call event:

5. The method according to any one of claims 1 to 4, wherein one web page aggressiveness feature comprises a plurality of sub-features, and wherein performing web page aggressiveness detection on the corresponding websites based on the obtained at least one web page aggressiveness feature and generating the corresponding target detection results comprises performing the following operations for one web page aggressiveness feature:

6. A method for detecting web page aggressiveness, comprising:

7. The method of claim 6, wherein collecting a plurality of initial call logs over a set period comprises:

8. The method of claim 6, wherein the filtering the obtained initial call logs to obtain the corresponding initial call events includes:

the following operations are respectively executed on the initial call logs:

9. The method of claim 8, wherein a set of target attribute fields is extracted by:

10. A device for detecting web page aggressiveness, comprising:

11. The apparatus of claim 10, wherein the feature extraction unit parses the one initial call event currently being parsed by:

12. A method for detecting web page aggressiveness, comprising:

13. A computer device comprising a processor and a memory, wherein the memory stores program code that, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1-5 or 6-9.

14. A computer readable storage medium, characterized in that it comprises a program code for causing a computer device to perform the steps of the method according to any one of claims 1-5 or 6-9, when said program code is run on said computer device.

15. A computer program product comprising computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 5 or 6 to 9.