CN115603999A - Container safety protection method, device, equipment and storage medium - Google Patents

Container safety protection method, device, equipment and storage medium

Info

Publication number
CN115603999A
CN115603999A (application CN202211250285.5A)
Authority
CN
China
Prior art keywords
data
user
access request
reinforcement learning
learning model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211250285.5A
Other languages
Chinese (zh)
Inventor
徐文想
姚倩
胡建强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202211250285.5A priority Critical patent/CN115603999A/en
Publication of CN115603999A publication Critical patent/CN115603999A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/069 Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45587 Isolation or security of virtual machine instances
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present application disclose a container security protection method, apparatus, device, and storage medium. The method comprises the following steps: acquiring running data generated while a program in a container runs, the running data comprising user request data and user behavior trajectory data; inputting the running data into a reinforcement learning model, analyzing the user request data and the user behavior trajectory data through the reinforcement learning model to obtain a user access request, and outputting a response action corresponding to the user access request; and executing the response action, acquiring a running log of the container, calculating a reward value corresponding to the response action according to the running log, and returning the reward value to the reinforcement learning model. In the embodiments of the present application, on the one hand, the reinforcement learning model outputs the corresponding response action while determining the user access request, which expands the scope of container security protection; on the other hand, calculating the reward value corresponding to the response action and returning it to the reinforcement learning model improves the accuracy with which the reinforcement learning model identifies attack requests.

Description

Container safety protection method, device, equipment and storage medium
Technical Field
The present application relates to the field of network security, and in particular, to a container security protection method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
At present, cloud computing has become the preferred computing and storage mode for more and more enterprises and users. The container is one of the most important virtualization technologies in cloud computing, and its security directly determines how much users trust cloud computing, so container security has attracted considerable attention.
Therefore, how to improve the comprehensiveness and effectiveness of container security protection is a technical problem that those skilled in the art need to keep researching.
Disclosure of Invention
To solve the foregoing technical problem, embodiments of the present application provide a container security protection method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
According to an aspect of the embodiments of the present application, a container security protection method is provided, comprising the following steps: acquiring running data generated while a program in a container runs, wherein the running data comprises user request data and user behavior trajectory data; inputting the running data into a reinforcement learning model, analyzing the user request data and the user behavior trajectory data through the reinforcement learning model to obtain a user access request, and outputting a response action corresponding to the user access request; and executing the response action, acquiring a running log of the container, calculating a reward value corresponding to the response action according to the running log, and returning the reward value to the reinforcement learning model so as to update the reinforcement learning model based on the reward value.
According to an aspect of the embodiments of the present application, analyzing the user request data and the user behavior trajectory data through the reinforcement learning model to obtain the user access request comprises: analyzing the user request data through the reinforcement learning model to obtain first access request data; finding second access request data from the user behavior trajectory data; and determining the user access request based on the first access request data and the second access request data.
According to an aspect of the embodiments of the present application, before the second access request data is found from the user behavior trajectory data, the method further comprises: determining whether the user access request can be determined based on the first access request data. Finding the second access request data from the user behavior trajectory data comprises: if the user access request cannot be determined from the first access request data, finding adjacent behavior trajectory data corresponding to the user request data from the user behavior trajectory data, and taking the adjacent behavior trajectory data as the second access request data.
According to an aspect of the embodiments of the present application, analyzing the user request data and the user behavior trajectory data through the reinforcement learning model to obtain a user access request, and outputting a response action corresponding to the user access request, comprises: analyzing the user access request through the reinforcement learning model to obtain a plurality of user characteristic data; judging each of the plurality of user characteristic data to obtain a judgment result corresponding to each of the plurality of user characteristic data; and determining a response action corresponding to the user access request based on the judgment results corresponding to the plurality of user characteristic data, and outputting the response action.
According to an aspect of the embodiments of the present application, determining a response action corresponding to the user access request based on the judgment results corresponding to the plurality of user characteristic data comprises: calculating a correlation value between each of the judgment results corresponding to the plurality of user characteristic data and the user access request; performing a weighted summation over the correlation values corresponding to the plurality of judgment results to obtain a decision value for the user access request; and determining the response action corresponding to the user access request based on the decision value.
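The weighted-summation step in this aspect can be pictured with a short sketch. This is purely illustrative: the correlation values, weights, threshold, and the intercept/pass mapping below are assumptions for the example, not values specified by the patent.

```python
# Hypothetical sketch of the weighted-summation decision: each judgment
# result is scored by its correlation with the access request, and the
# weighted sum of those scores is the decision value that selects the
# response action. Threshold and action names are illustrative.

def decide_response(correlations, weights, threshold=0.5):
    """correlations: one correlation value per judgment result;
    weights: one weight per judgment result (same length)."""
    decision_value = sum(c * w for c, w in zip(correlations, weights))
    # Map the decision value to a response action (assumed mapping).
    return "intercept" if decision_value >= threshold else "pass"

print(decide_response([0.9, 0.2, 0.7], [0.5, 0.2, 0.3]))  # high score: intercept
print(decide_response([0.1, 0.1, 0.1], [0.3, 0.3, 0.4]))  # low score: pass
```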
According to an aspect of the embodiments of the present application, calculating a reward value corresponding to the response action according to the running log and returning the reward value to the reinforcement learning model comprises: obtaining access data contained in the running log; judging whether the access data carries an unauthorized-access risk to obtain a judgment result; if the judgment result indicates that the access data carries no unauthorized-access risk, returning a reward value greater than zero to the reinforcement learning model; and if the judgment result indicates that the access data carries an unauthorized-access risk, returning a reward value less than zero to the reinforcement learning model.
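The sign rule above (positive reward when no unauthorized risk, negative otherwise) can be sketched as follows. The record layout, the per-user allowed-resource sets, and the +1/-1 magnitudes are illustrative assumptions, not taken from the patent.

```python
# Illustrative reward rule: parse an access record from the running log,
# judge unauthorized-access risk from the resources the user touched,
# and emit a positive or negative reward accordingly.

def judge_unauthorized(record, allowed_resources):
    # Risk if the user used any resource outside its allowed set.
    return any(r not in allowed_resources[record["user"]]
               for r in record["resources_used"])

def reward_for_record(record, allowed_resources):
    # Negative reward on unauthorized risk, positive otherwise
    # (magnitudes are assumptions).
    return -1.0 if judge_unauthorized(record, allowed_resources) else 1.0

allowed = {"alice": {"/data/app"}, "bob": {"/data/app", "/data/admin"}}
risky = {"user": "alice", "resources_used": ["/data/admin"]}
safe = {"user": "bob", "resources_used": ["/data/admin"]}
print(reward_for_record(risky, allowed))  # negative: unauthorized risk
print(reward_for_record(safe, allowed))   # positive: within allowed set
```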
According to an aspect of an embodiment of the present application, the determining whether there is an unauthorized risk in the access data includes: analyzing the access data to obtain resource data used by the user and contained in the access data; and judging whether the access data has an unauthorized risk or not based on the resource data used by the user.
According to an aspect of the embodiments of the present application, a container security protection apparatus is provided, comprising: an acquisition module, configured to acquire running data generated while a program in a container runs, wherein the running data comprises user request data and user behavior trajectory data; a reinforcement learning module, configured to input the running data into a reinforcement learning model, analyze the user request data and the user behavior trajectory data through the reinforcement learning model to obtain a user access request, and output a response action corresponding to the user access request; and an execution module, configured to execute the response action, acquire the running log of the container, calculate a reward value corresponding to the response action according to the running log, and return the reward value to the reinforcement learning model so as to update the reinforcement learning model based on the reward value.
According to an aspect of the embodiments of the present application, there is provided an electronic device comprising: one or more processors; and storage means for storing one or more programs that, when executed by the one or more processors, cause the electronic device to implement the container security protection method as described above.
According to an aspect of the embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer-readable instructions, which, when executed by a processor of a computer, cause the computer to execute the container security protection method as described above.
According to an aspect of the embodiments of the present application, there is also provided a computer program product, including a computer program, which when executed by a processor, implements the steps in the container security protection method as described above.
In the technical solution provided by the embodiments of the present application, on the one hand, the reinforcement learning model determines a user access request from the user request data and user behavior trajectory data generated while a program in a container runs, and at the same time outputs a response action corresponding to the user access request; compared with the prior art, in which user access requests are judged against a decision rule base, this expands the scope of container security protection. On the other hand, acquiring the running log of the container, calculating the reward value corresponding to the response action according to the running log, and returning the reward value to the reinforcement learning model improves the accuracy with which the reinforcement learning model identifies attack requests.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a schematic illustration of an environment for performing container security in accordance with an exemplary embodiment of the present application;
FIG. 2 is a flow chart illustrating a method of securing a container according to an exemplary embodiment of the present application;
FIG. 3 is a flow chart illustrating a method of securing a container according to another exemplary embodiment of the present application;
FIG. 4 is a flow chart illustrating a method of securing a container according to another exemplary embodiment of the present application;
FIG. 5 is a flow chart illustrating a method of securing a container according to another exemplary embodiment of the present application;
FIG. 6 is a flow chart illustrating a method of securing a container according to another exemplary embodiment of the present application;
FIG. 7 is a schematic flow diagram of container security in an exemplary application scenario;
FIG. 8 is a block diagram of a container security protection apparatus shown in an exemplary embodiment of the present application;
FIG. 9 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the application, as detailed in the appended claims.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Reference to "a plurality" in this application means two or more. "And/or" describes the association relationship of associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A exists alone, both A and B exist, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
First, a reinforcement learning model is a machine learning model built by reinforcement learning. Reinforcement learning is learning in which an agent learns by "trial and error", guided by the reward obtained from interacting with the environment, with the goal of maximizing the agent's reward. It differs from supervised learning in connectionist learning mainly in the reinforcement signal: in reinforcement learning, the reinforcement signal provided by the environment is an evaluation (usually a scalar signal) of the quality of the generated action, rather than an instruction telling the reinforcement learning system (RLS) how to generate the correct action. Since the information provided by the external environment is very limited, the RLS must learn from its own experience. In this way, the RLS gains knowledge from the action-and-evaluation environment and improves its action policy to suit the environment.
Reinforcement learning mainly involves an Agent, an Environment, a State, an Action, and a Reward. After the agent performs an action, the environment transitions to a new state, and for this new state the environment gives a reward signal (positive or negative). Based on the new state and the reward fed back by the environment, the agent then performs a new action according to a certain policy. This process is the way the agent interacts with the environment through states, actions, and rewards. The purpose of reinforcement learning is to find an optimal policy that maximizes the cumulative reward.
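The state-action-reward loop described above can be sketched with a minimal tabular Q-learning example. Everything here is a toy illustration: the two-state environment, the "pass"/"intercept" actions, and all hyperparameters are assumptions chosen for the sketch, not part of the patent.

```python
import random

# Minimal tabular Q-learning sketch of the agent-environment loop:
# the agent acts, the environment returns a new state and a reward,
# and the agent updates its value estimates toward maximizing
# cumulative reward. The toy environment below treats state 1 as
# "attack traffic" and state 0 as "normal traffic".

random.seed(0)
states, actions = [0, 1], ["pass", "intercept"]
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    # Deterministic reward: intercepting an attack (or passing normal
    # traffic) earns +1; the wrong choice earns -1. Next state is random.
    reward = 1.0 if (action == "intercept") == (state == 1) else -1.0
    return random.choice(states), reward

state = 0
for _ in range(500):
    action = (random.choice(actions) if random.random() < epsilon
              else max(actions, key=lambda a: Q[(state, a)]))
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

# Inspect the learned greedy action for the "attack" state.
print(max(actions, key=lambda a: Q[(1, a)]))
```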
Container security protection is one of the important means of safeguarding microservices and applications. In the related art, container security protection is implemented by making judgments against a decision rule base. However, the security types covered by a decision rule base are limited rather than comprehensive, and the rule base cannot keep pace with the times, so the judgment of attack requests is incomplete, which threatens container security.
In order to solve the above problems, embodiments of the present application respectively provide a container security protection method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product, and the embodiments will be described in detail below.
Referring to fig. 1, fig. 1 is a schematic diagram of an implementation environment according to an embodiment of the present application. The implementation environment includes a terminal 110 and a server 120, wherein the terminal 110 and the server 120 communicate by wired or wireless means.
The terminal 110 includes a Docker Engine and a probe added to the host OS (host operating system). When the container engine starts a container, the probe monitors, in real time, the container-related running data in the host operating system, such as user URLs (Uniform Resource Locators), input parameters, API calls, behavior traces, and network traffic, and inputs the container running data into the reinforcement learning model through the server 120.
The server 120 receives container operation data acquired by a probe on the terminal 110, inputs the container operation data into the reinforcement learning model, identifies a user access request contained in the container operation data through the reinforcement learning model, outputs a corresponding response action aiming at the user access request, executes the response action and acts on the terminal 110, acquires an operation log of the container monitored by the probe, calculates a reward value corresponding to the response action according to the operation log, and returns the corresponding reward value to the reinforcement learning model so as to update the reinforcement learning model based on the reward value.
It should be noted that the terminal 110 may be any electronic device supporting a codeless visualization configuration function, such as a smart phone, a tablet computer, a notebook computer, or a wearable device, but is not limited thereto, and for example, the terminal 110 may also be a device applied to a special field, such as a vehicle-mounted terminal, an aircraft, or the like. The terminal 110 may communicate with the server 120 through a wireless network such as 3G (third generation mobile information technology), 4G (fourth generation mobile information technology), 5G (fifth generation mobile information technology), etc., or communicate with the server 120 through a wired network, which is not limited herein.
The server 120 may be, for example, an independent physical server, a server cluster or a distributed system configured by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, a cloud storage, a web service, cloud communication, a middleware service, a domain name service, a security service, a CDN (Content Delivery Network), and a big data and artificial intelligence platform, which is not limited herein.
It should be understood that Cloud Technology refers to a hosting technique for unifying a series of resources, such as hardware, software, network, etc., together in a wide area network or a local area network to achieve calculation, storage, processing, and sharing of data. The cloud technology is also a general name of a network technology, an information technology, an integration technology, a management platform technology, an application technology and the like based on cloud computing business model application, can form a resource pool, is used as required, and is flexible and convenient.
FIG. 2 is a flow chart illustrating a method of securing a container according to an exemplary embodiment of the present application. The method is suitable for use in the implementation environment shown in fig. 1 and is specifically performed by the server 120 in the implementation environment of fig. 1. The method may also be applied to other implementation environments and specifically executed by devices in other implementation environments, which is not limited in this embodiment.
The method proposed by the embodiments of the present application will be described in detail below with a server as the exemplary execution subject. As shown in fig. 2, in an exemplary embodiment, the method includes steps S210 to S230, which are described in detail as follows:
step S210, obtaining operation data of the program in the container during operation, where the operation data includes user request data and user behavior trace data.
It should be noted that a probe is a periodic diagnostic performed on a container by the kubelet (node agent) in Kubernetes (a portable container orchestration and management tool built for container services); when a container diagnosis needs to be performed, the kubelet calls a Handler implemented by the container.
When a program on the host operating system of the terminal is started, the container in which the program resides is also started, and the running data generated while the program in the container runs is monitored and collected in real time through the probe, where the running data includes user URLs (Uniform Resource Locators), input parameters, request parameters, API (Application Programming Interface) calls, user request data, and user behavior trajectory data.
For example, when a program in a container on the terminal host system runs, the container is started and a probe is added, and running data generated by the running of the program in the container is obtained through the probe, for example, an instruction of a user initiating an access request at an application client, the probe obtains a URL corresponding to the access instruction, a request parameter, a user IP, an interface call, and a behavior track of the user in the container.
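The runtime data fields listed above can be pictured as a simple record. This is a hypothetical sketch: the field names and sample values are illustrative, not defined by the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical record of the running data a probe might collect for one
# user access: URL, request parameters, interface calls, and the user's
# behavior trace inside the container. Field names are illustrative.

@dataclass
class RunningData:
    url: str
    request_params: Dict[str, str]
    user_ip: str
    api_calls: List[str]
    behavior_trace: List[str] = field(default_factory=list)

sample = RunningData(
    url="https://app.example.com/login",
    request_params={"user": "alice"},
    user_ip="10.0.0.7",
    api_calls=["open", "read"],
    behavior_trace=["login_page", "submit_form"],
)
print(sample.url)
```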
Step S220, inputting the operation data into the reinforcement learning model, analyzing the user request data and the user behavior trajectory data through the reinforcement learning model to obtain a user access request, and outputting a response action corresponding to the user access request.
All the container-related running data generated while the program in the container runs, as collected by the probe in the container, is input into a pre-trained reinforcement learning model. The reinforcement learning model analyzes the user request data and the user behavior trajectory in this running data to obtain the corresponding user access request, judges whether the user access request is an attack request, and outputs a corresponding response action according to the judgment result. That is, when the reinforcement learning model judges that the user access request is an attack request, it outputs a signal representing an attack request, so that the probe monitors and intercepts the access request; when the reinforcement learning model judges that the user access request is a normal request, it outputs a signal representing a normal request, so that the probe passes the access request through and responds to it.
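The signal-to-action mapping just described can be sketched in a few lines. All names here are illustrative assumptions; the patent does not specify the signal encoding.

```python
# Sketch of the response dispatch: the model emits a signal per user
# access request, and the probe either intercepts the request or passes
# it through and responds. Signal values and messages are illustrative.

def handle_request(signal, request):
    if signal == "attack":
        return ("intercept", f"blocked {request}")
    return ("pass", f"served {request}")

print(handle_request("attack", "GET /admin/config"))
print(handle_request("normal", "GET /index.html"))
```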
Illustratively, the container-related running data used while the in-container program runs and collected by the in-container probe, such as CPU, memory, network, process, API, and behavior-trace data, is input into a pre-trained reinforcement learning model, which then analyzes it. Here, analysis refers to analyzing the path information in the container-related running data, which may include analyzing the source of the valid data, the destination address of the valid data packet, the forwarding interface of the valid data packet, and other data related to the valid data in the path information. The valid data includes the user request data and the user behavior trajectory data; that is, the reinforcement learning model analyzes the user request data and the user trajectory data in the valid data packet to obtain the corresponding user access request, and inputs the traffic data corresponding to the user access request into the environment of the reinforcement learning model. The agent in the reinforcement learning model analyzes and judges the access-request traffic data in the environment and outputs the corresponding response action.
Step S230, executing the response action, obtaining the operation log of the container, calculating an incentive value corresponding to the response action according to the operation log, and returning the incentive value to the reinforcement learning model to update the reinforcement learning model based on the incentive value.
The server executes the response action output by the reinforcement learning model and applies the response action to the access request from the corresponding client. For example, if the reinforcement learning model judges that the access request submitted by the user on an application client in the current container is an attack request, the user's access request is intercepted and execution of the corresponding instruction is prohibited. While the container is running, the container's running log records each response action output by the reinforcement learning model.
The running log of the container is acquired, and the traffic events recorded in it are obtained. Each traffic event is analyzed to obtain the user information it contains and the resource information the user used to complete the traffic event. Whether the traffic event carries a risk is then judged from the user information and that resource information, yielding a judgment result; a reward value is determined from the judgment result and returned to the reinforcement learning model, so that the reinforcement learning model updates and adjusts itself according to the reward value.
In this embodiment, on the one hand, the container running data generated while the application program in the container runs is input into the reinforcement learning model to determine the user access request, and the response action corresponding to the user access request is output; compared with the prior art, in which user access requests are judged against a decision rule base, this expands the scope of container security protection. On the other hand, acquiring the container running log, calculating the reward value corresponding to the response action of the reinforcement learning model according to the running log, and returning the reward value to the reinforcement learning model improves the accuracy with which the reinforcement learning model identifies attack requests.
The operation data generated while a program runs in the container is obtained, the operation data including user request data and user behavior trace data. The operation data is input into the reinforcement learning model, which analyzes the user request data and user behavior data to obtain the corresponding user access request and outputs a response action for it; by combining the user request data with the user behavior data, the model comprehensively judges whether the user access request carries a security risk, expanding the scope of container security protection. The response action output by the model is then executed, the run log of the container is obtained, the reward value corresponding to the response action is computed from the log, and the reward value is returned to the reinforcement learning model, guiding the model to select the optimal policy for maximum return and improving its performance in identifying attack types.
Further, referring to fig. 3, based on the above embodiment, in an exemplary embodiment provided in the present application, the implementation process of analyzing the user request data and the user behavior trace data by using the reinforcement learning model to obtain the user access request may include steps S310 to S330, which are described in detail as follows:
step S310, analyzing the user request data through the reinforcement learning model to obtain first access request data.
When the running data generated by an application in the container is input into the reinforcement learning model, the model analyzes the user request data contained in it. The analysis includes extracting the corresponding request parameters from the URL (uniform resource locator) submitted by the user and obtaining the first access request data from the main domain name in the URL or the address corresponding to the user; the first access request data comprises parameters of several types that together form a user access request.
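The extraction described above can be sketched as follows. This is a minimal illustration, not the patented implementation; the field names (`main_domain`, `params`, `user_addr`) are assumptions chosen to mirror the text.

```python
from urllib.parse import urlparse, parse_qs

def parse_first_access_request(url: str, user_addr: str) -> dict:
    """Extract the request parameters and the main domain name from a
    user-submitted URL, forming the first access request data.
    Field names here are illustrative, not from the patent."""
    parsed = urlparse(url)
    return {
        "main_domain": parsed.hostname,    # main domain name in the URL
        "path": parsed.path,               # requested interface path
        "params": parse_qs(parsed.query),  # extracted request parameters
        "user_addr": user_addr,            # address corresponding to the user
    }

request = parse_first_access_request(
    "http://shop.example.com/api/order?item=42&qty=1", "10.0.0.8")
```

Here `request["params"]` holds the several types of parameters that together form the user access request.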
Step S320, finding out the second access request data from the user behavior trace data.
Specifically, the user behavior trace data refers to the container operation data occupied while an application runs in the container. This data can be grouped by user dimension to obtain the container operation data occupied by each individual user, and the data for a single user can then be arranged in time order; the time-ordered operation data of a single user constitutes that user's behavior trace in the container. The second access request data corresponding to the user is extracted from this behavior trace, the second access request data being the operation data adjacent to the user request data.
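The grouping-by-user and time-ordering step can be sketched as below, assuming each operation record carries hypothetical `user_id` and `ts` fields (names are this sketch's assumptions, not the patent's).

```python
from collections import defaultdict
from operator import itemgetter

def build_behavior_traces(container_events: list) -> dict:
    """Group container operation records by user dimension, then order
    each user's records by timestamp, yielding one behavior trace per
    user. Record fields ('user_id', 'ts') are assumed for illustration."""
    grouped = defaultdict(list)
    for event in container_events:
        grouped[event["user_id"]].append(event)
    # time-ordered operation data per user = that user's behavior trace
    return {uid: sorted(evts, key=itemgetter("ts"))
            for uid, evts in grouped.items()}
```

A trace built this way is what the later steps mine for operations adjacent to the user request data.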
Step S330, determining a user access request based on the first access request data and the second access request data.
The user request data is analyzed through the reinforcement learning model to obtain the first access request data corresponding to the user, which can include the request parameters, main domain name, and the like in the user's URL. The second access request data corresponding to the user is obtained from the context-adjacent resources in the operation data occupied by the user in the container, and includes the interface called before the user's URL or the interface called with data after it. The first access request data obtained from the user request data and the second access request data extracted from the user behavior data are then combined to obtain the user access request, which includes the internet protocol address, header, cookie, status code, traffic information, and parameters.
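The combining step admits a simple sketch: merge the parsed request fields with the adjacent interfaces mined from the trace. The key names (`preceding`, `following`, `preceding_interface`, `following_interface`) are assumptions of this sketch.

```python
def assemble_user_access_request(first: dict, second: dict) -> dict:
    """Combine first access request data (parsed from the user request)
    with second access request data (adjacent operations mined from the
    behavior trace) into one user access request."""
    request = dict(first)  # keep IP, header, cookie, status code, params, ...
    request["preceding_interface"] = second.get("preceding")
    request["following_interface"] = second.get("following")
    return request
```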
In this embodiment, the first access request data obtained from the user request data and the second access request data found in the user behavior trace data are combined to determine the user access request, so that more is known about the user's access. By mining the user behavior trace data, the range of attack requests the reinforcement learning model can identify is expanded, and the security protection range of the container is broadened.
Based on the foregoing embodiment, in one of the exemplary embodiments provided in this application, before the second access request data of the user is found from the user behavior trace data, the method further includes the following steps:
determining whether a user access request can be determined based on the first access request data;
specifically, the reinforcement learning model analyzes the user request data in the container operation data, and judges whether the user request parameters have fuzzy request parameters or incomplete request parameters. If the request parameter is incomplete, the final execution data cannot be accurately obtained, and the specific execution details of the access request cannot be determined.
Based on the above embodiment, the implementation process of finding out the second access request data from the user behavior trace data may include step S321 and step S322, which are described in detail as follows:
step S321, if it is determined that the user access request cannot be determined according to the first access request data, finding out adjacent behavior trajectory data corresponding to the user request data from the user behavior trajectory data;
in step S322, the adjacent behavior trace data is used as the second access request data.
The user request data in the container operation data is analyzed through the reinforcement learning model to obtain the first access request data corresponding to the user, which can include the request parameters, main domain name, header, and status code contained in the URL of the user request data. When the specific execution data cannot be determined from request data such as these, the adjacent behavior trace data corresponding to the user request data is found in the user behavior trace data. To do so, the URL in the http request of the user request data is processed into a parameter-free interface, the referrer of each log entry is directed-linked to its URL, and the behavior trace data that has a directed link with the URL is searched for in the user behavior trace data; that is, the preceding interface with a directed link to the first interface and the following interface with a directed link to the last interface are found. The preceding interface and the following interface are then used as the second access request data.
Illustratively, the user request data contained in the running data of the container is analyzed through the reinforcement learning model to obtain the first access request data generated from it, and whether the request parameters in the first access request data are clear, complete, and valid is judged. If they are not, the URL in the http request of the user request data is processed into a parameter-free interface, the URL in each http request of the user behavior trace data is processed into a directed sequence of parameter-free interfaces, and the interfaces adjacent to the interface corresponding to the user request data are extracted from that sequence, that is, the neighbors of the head and tail interfaces of the directed interface sequence of the user request data. The adjacent interfaces are used as the second access request data corresponding to the user, and the second access request data is combined with the first access request data to determine the user access request.
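The parameter-stripping and neighbor-lookup steps can be sketched as follows. This is a simplified illustration under the assumption that the behavior trace is a time-ordered list of URLs; the real system works on directed referrer links between log entries.

```python
from urllib.parse import urlparse

def to_interface(url: str) -> str:
    """Strip query parameters, keeping only the parameter-free interface."""
    p = urlparse(url)
    return "{}{}".format(p.hostname, p.path)

def adjacent_interfaces(trace_urls: list, request_url: str):
    """From a user's time-ordered behavior trace, find the interfaces
    immediately before and after the interface of the ambiguous request.
    Returns (preceding, following); None where the request sits at an end."""
    seq = [to_interface(u) for u in trace_urls]
    target = to_interface(request_url)
    if target not in seq:
        return None, None
    i = seq.index(target)
    preceding = seq[i - 1] if i > 0 else None
    following = seq[i + 1] if i < len(seq) - 1 else None
    return preceding, following
```

The pair returned here plays the role of the second access request data when the request parameters alone are ambiguous.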
In this embodiment, when the user access request cannot be determined from the first access request data alone, the adjacent behavior trace data corresponding to the user request data is found in the user behavior trace data and used as the second access request data. This not only pins down the specific execution details of the user access request but also improves the accuracy with which the reinforcement learning model identifies attack requests.
Based on the above embodiments, please refer to fig. 4, in one of the exemplary embodiments provided in the present application, the implementation process of analyzing the user request data and the user behavior trajectory data by using the reinforcement learning model to obtain the user access request and outputting the response action corresponding to the user access request may include steps S410 to S430, and specifically described as follows:
step S410, analyzing the user access request through a reinforcement learning model to obtain a plurality of user characteristic data;
step S420, judging the plurality of user characteristic data to obtain judgment results corresponding to the plurality of user characteristic data;
step S430, determining a response action corresponding to the user access request based on the determination result corresponding to each of the plurality of user characteristic data, and outputting the response action.
In step S410, a user access request is determined according to user request data and user behavior trace data in the operation data of the container, and the user access request is parsed to obtain a plurality of user feature data included in the user access data, where the user feature data includes data related to user features, such as domain name information, user ID, source IP, cookie, header, request parameter, status code, and requested event.
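The split into feature dimensions can be sketched as a simple projection; the dimension names follow the fields listed above, and the flat-dictionary representation is an assumption of this sketch.

```python
def extract_feature_data(access_request: dict) -> dict:
    """Split a user access request into the feature dimensions the model
    judges separately; missing dimensions come back as None."""
    dims = ("domain", "user_id", "source_ip", "cookie",
            "header", "params", "status_code", "request_time")
    return {d: access_request.get(d) for d in dims}
```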
In step S420, a plurality of user feature data included in the user access request obtained through the reinforcement learning model analysis are detected, and whether each of the plurality of user feature data is dangerous attack feature data is detected.
Specifically, in an actual scenario, the agent in the reinforcement learning model perceives the current environment state and acts according to the policy corresponding to that state; the action acts on the environment in turn, causing the environment state to change, and the environment simultaneously sends a feedback signal to the agent.
That is, to further improve container security, before the reinforcement learning model judges whether a user access request is an attack request, it needs to interact with a large number of user access requests to evaluate its policy and update the model. In container security protection, the agent is the server and the environment is the container. The environment state generated while the application runs in the container is obtained through the probe in the container; the agent can release or intercept a user access request generated on the application, the release or interception transforms the environment into a new state, and the agent receives rewards of different magnitudes according to how close the new state is to the target state.
Illustratively, since the reinforcement learning model combines the perception and decision capabilities of reinforcement learning, at each moment the agent obtains a high-dimensional observation (comprising user feature data of multiple dimensions) through interaction with the environment and perceives it with a machine learning method to obtain a concrete state feature representation. Based on expected return, the model can then evaluate the value function of each state (the state value function) and the value function of each state-action pair (the action value function), and improve the decision policy from these two value functions; the decision policy maps the current state to the corresponding decision action, and the environment reacts to that action and yields the next observation. The server (agent) can therefore be controlled through the reinforcement learning model so that it interacts with user request data to obtain interaction actions; accordingly, when the container operation data is input into the reinforcement learning model and the user access request is determined, the agent interacts with the user access request to obtain the corresponding response action.
Specifically, in this embodiment, after the server receives a user access request, it analyzes the user request data through the reinforcement learning model to obtain user feature data of multiple dimensions, inputs that feature data into the state value function to obtain the value corresponding to each dimension, obtains the action value corresponding to the user request from the action value function of the state-action pair, and outputs a response action for that action value based on the decision policy.
Therefore, in this embodiment, the reinforcement learning model is used to analyze the multidimensional feature data in the access request to determine the action value corresponding to the access request, so that the reinforcement learning model outputs the response action corresponding to the access request, thereby performing multidimensional detection on the container security problem, and avoiding the problem of missing of container security protection.
Based on the foregoing embodiment, please refer to fig. 5, in one exemplary embodiment provided by the present application, the implementation process of determining the response action corresponding to the user access request based on the determination result corresponding to each of the plurality of user characteristic data may further include steps S510 to S530, which are described in detail as follows:
step S510, calculating a correlation value between each judgment result and the user access request in the judgment results corresponding to the plurality of user characteristic data;
step S520, carrying out weighted summation calculation on the correlation numerical values corresponding to the plurality of judgment results to obtain a decision numerical value of the user access request;
step S530, determining a response action corresponding to the user access request based on the decision-making value.
Specifically, as described above, the value function of each state and the value function of each state-action pair are evaluated based on expected return, and the decision policy is improved from the two. That is, the agent in the reinforcement learning model computes the relevance value between each item of user feature data and the user access request according to the weight of that feature in the request's value state. In some realizable schemes, the agent may use a neural network to compute the predicted return corresponding to each item of user feature data in the current state, perform a weighted summation over those predicted returns to obtain the decision value corresponding to the state (the user access request), and output the corresponding response action for that decision value based on the decision policy.
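The weighted-summation step can be sketched as below. The per-feature weights and the fixed threshold stand in for the learned value functions and decision policy, and the numbers in the usage example are invented for illustration.

```python
def decision_value(relevance: dict, weights: dict) -> float:
    """Weighted sum of per-feature relevance values, giving the decision
    value for the user access request."""
    return sum(weights.get(f, 0.0) * v for f, v in relevance.items())

def respond(value: float, threshold: float = 0.5) -> str:
    """Map the decision value to a response action via a simple
    threshold (a stand-in for the learned decision policy)."""
    return "intercept" if value >= threshold else "release"

# e.g. a suspicious source IP dominates the decision
v = decision_value({"source_ip": 0.9, "params": 0.2},
                   {"source_ip": 0.6, "params": 0.4})
action = respond(v)
```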
In the embodiment, the current user access request is judged through the multi-dimensional user characteristics, so that omission of the safety problem of the container caused by limitation of a decision rule base in the prior art is avoided, and the safety protection capability of the container is effectively improved.
Based on the above embodiment, please refer to fig. 6, in an exemplary embodiment provided in the present application, the implementation process of calculating the reward value corresponding to the response action according to the running log and returning the reward value to the reinforcement learning model may further include steps S610 to S630', which are described in detail as follows:
step S610, obtaining access data contained in the running log;
step S620, judging whether the access data has an unauthorized risk or not to obtain a judgment result;
step S630, if the judgment result represents that the access data has no unauthorized risk, returning an award value larger than zero to the reinforcement learning model;
and step 630', if the judgment result represents that the access data has the unauthorized risk, returning an award value less than zero to the reinforcement learning model.
Illustratively, all operation data of the container and access data in the operation data are recorded in an operation log of the container, whether attack request data or unauthorized behavior data exist in the access data is further judged, and a corresponding judgment result is recorded.
If it is judged that the access data in the container run log contains no unauthorized or attack access behavior, a feedback signal indicating a correct decision is returned to the reinforcement learning model. Specifically, if it is judged that the access data contains no unauthorized or attack behavior and the reinforcement learning model did release the behavior so that the access completed, a reward value greater than zero is returned to the model for the response action on that access behavior; the more access behaviors that were handled correctly, the larger the reward value returned to the reinforcement learning model. Of course, in some realizable schemes, each time the container run log records an access behavior, whether the reinforcement learning model's decision on that behavior was correct may be judged, and a corresponding reward value returned to the model based on the judgment result.
If it is judged that the access data contains unauthorized or attack access behavior, a feedback signal indicating a decision error is returned to the reinforcement learning model. Specifically, if it is judged that the access data contains unauthorized or attack behavior and the reinforcement learning model released the behavior so that the access completed, a reward value less than zero is returned to the model for the response action on that access behavior; the more unauthorized or attack access behaviors the access data contains, the smaller the reward value returned to the reinforcement learning model. Of course, in some realizable schemes, each time an access behavior is recorded in the container run log, whether the reinforcement learning model's decision on that behavior was correct may be judged, and a corresponding reward value returned to the model based on the judgment result.
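The per-behavior reward scheme above can be sketched as follows, assuming each log record carries an `is_attack` ground-truth judgment and the `action` the model took (both field names are assumptions of this sketch; correct decisions earn +1, errors −1).

```python
def reward_from_log(access_records: list) -> float:
    """Compute the reward returned to the reinforcement learning model
    from run-log access records: the more correct decisions, the larger
    the reward; the more released attacks, the smaller (more negative)."""
    reward = 0.0
    for rec in access_records:
        correct = ((rec["is_attack"] and rec["action"] == "intercept")
                   or (not rec["is_attack"] and rec["action"] == "release"))
        reward += 1.0 if correct else -1.0
    return reward
```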
In this embodiment, whether the judgment on the normal request or the attack request of the user in the container running log is correct is judged, the reward value larger than zero is returned according to the correct judgment, and the reward value smaller than zero is returned according to the wrong judgment, and the reward values with different numerical values are used for guiding the reinforcement learning model to generate the optimal strategy, so that the judgment capability of the reinforcement learning model on the attack request is improved, and the safety protection capability of the container is further improved.
Further, based on the above embodiments, in one embodiment provided in the present application, the above determining whether there is an unauthorized risk in the access data further specifically includes step S621 and step S622, which are described in detail as follows:
step S621, analyzing the access data to obtain resource data used by the user and contained in the access data;
step S622, determining whether there is an unauthorized risk of accessing the data based on the resource data used by the user.
Specifically, one or more fields contained in the request header of a network request in the resource information used by the user in the access data are analyzed to obtain the request content contained therein, and whether that request content matches one or more detection rules in the attack detection rule set is judged; if it matches, it can be determined that the network requests in the access data include an attack request.
In addition, the relationship between the user and the interfaces called in the resource information used by the user in the access data can be obtained, yielding the interfaces called by access behaviors the user triggered. Whether the resource information used by the user contains one or more interfaces outside the permission interface set corresponding to that user is then judged according to that set; if so, it can be determined that the user resource information in the access data contains an unauthorized request.
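Both checks can be sketched compactly. The two toy regexes stand in for a real attack detection rule set, and the sets of interface paths are invented for illustration.

```python
import re

# toy stand-ins for the attack detection rule set
ATTACK_RULES = [re.compile(r"(?i)union\s+select"), re.compile(r"<script")]

def matches_attack_rule(header_fields: dict) -> bool:
    """Judge whether any request-header field matches one or more
    detection rules in the attack detection rule set."""
    return any(rule.search(v)
               for v in header_fields.values()
               for rule in ATTACK_RULES)

def has_unauthorized_call(called: set, permitted: set) -> bool:
    """Flag an unauthorized request when the user called interfaces
    outside the permission interface set configured for that user."""
    return not called <= permitted
```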
In this embodiment, whether a network attack behavior or a user unauthorized behavior exists is judged from the network access requests in the resources used by the user and from the interface information used by the user, as recorded in the container run log. The reinforcement learning model's response action is thereby evaluated, and a corresponding reward value is returned according to the judgment result, guiding the model toward the optimal policy and maximum return and improving the accuracy with which it identifies attack types.
Referring to fig. 7, fig. 7 is an overall service flow diagram of container security protection shown in an exemplary application scenario of the present application.
As shown in fig. 7, the container operation data generated while a program runs in the container is obtained, the data including user request data and user behavior trace data. The container operation data is input directly into the reinforcement learning model, which analyzes the user request data to obtain the first access request data corresponding to the user and judges whether the user access request can be determined from it. If not, the adjacent behavior trace data corresponding to the user request data is found in the user behavior trace data and used as the second access request data corresponding to the user, and the user access request is then determined from the first and second access request data. A corresponding response action is made for the user access request according to the decision policy in the reinforcement learning model; the server executes the response action, which acts in the container. The container run log is collected, the reward value corresponding to the response action is computed from the log, and the reward value is fed back to the reinforcement learning model so that the model generates the optimal policy under the guidance of the reward value.
In this embodiment, the safety protection range of the container is expanded by mining the adjacent behavior track data corresponding to the user request data in the user behavior track data, and the reward value calculated in the container operation log according to the response action is used as the feedback signal of the reinforcement learning model, so that the accuracy of the reinforcement learning model in identifying the attack request is improved, and the safety protection capability of the container is further improved.
Fig. 8 is a block diagram of a container safety shield apparatus 800, according to an exemplary embodiment of the present application. As shown in fig. 8, the apparatus includes:
the obtaining module 810 is configured to obtain running data of the program in the container during running, where the running data includes user request data and user behavior trajectory data; the reinforcement learning module 820 is configured to input the operation data into the reinforcement learning model, analyze the user request data and the user behavior trajectory data through the reinforcement learning model to obtain a user access request, and output a response action corresponding to the user access request; the executing module 830 is configured to execute the response action, obtain an operation log of the container, calculate an incentive value corresponding to the response action according to the operation log, and return the incentive value to the reinforcement learning model, so as to update the reinforcement learning model based on the incentive value.
According to an aspect of the embodiment of the present application, the reinforcement learning module 820 further includes: the first analysis unit is used for analyzing the user request data through the reinforcement learning model to obtain first access request data; the searching unit is used for searching the second access request data from the user behavior track data; a first determining unit for determining a user access request based on the first access request data and the second user access request data.
According to an aspect of the embodiment of the present application, the reinforcement learning module 820 further includes: and the first judging module is used for judging whether the user access request can be determined based on the first access request data.
According to an aspect of the embodiment of the application, the search unit is specifically configured to, if it is determined that the user access request cannot be determined according to the first access request data, search, from the user behavior trajectory data, adjacent behavior trajectory data corresponding to the user request data, and use the adjacent behavior trajectory data as the second access request data.
According to an aspect of the embodiment of the present application, the reinforcement learning module 820 further includes: the second analysis unit is used for analyzing the user access request through the reinforcement learning model to obtain a plurality of user characteristic data; a second judging unit, configured to judge the multiple user characteristic data to obtain a judgment result corresponding to each of the multiple user characteristic data; and the second determining unit is used for determining a response action corresponding to the user access request based on the judgment result corresponding to each of the plurality of user characteristic data and outputting the response action.
According to an aspect of an embodiment of the present application, the second determining unit specifically includes: the calculating subunit is used for calculating a correlation value between each judgment result and the user access request in the judgment results corresponding to the plurality of user characteristic data; the weighted summation subunit is used for carrying out weighted summation calculation on the correlation numerical values corresponding to the multiple judgment results to obtain a decision numerical value of the user access request; and the determining subunit is used for determining a response action corresponding to the user access request based on the decision-making numerical value.
According to an aspect of an embodiment of the present application, the executing module 830 includes: an acquisition unit configured to acquire access data included in the operation log; the third judging unit is used for judging whether the access data has the unauthorized risk or not so as to obtain a judging result; the first returning unit is used for returning an award value larger than zero to the reinforcement learning model if the judgment result represents that the access data has no unauthorized risk; and the second returning unit is used for returning an award value smaller than zero to the reinforcement learning model if the judgment result represents that the access data has the unauthorized risk.
According to an aspect of an embodiment of the present application, the third determining unit is specifically configured to analyze the access data to obtain resource data used by the user and included in the access data; and judging whether the access data has the unauthorized risk or not based on the resource data used by the user.
It should be noted that the container safety protection device provided in the foregoing embodiment and the container safety protection method provided in the foregoing embodiment belong to the same concept, and specific ways of performing operations by the respective modules and units have been described in detail in the method embodiment, and are not described herein again. In practical applications, the container safety protection device provided in the above embodiments may be implemented by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to implement all or part of the above described functions, which is not limited herein.
An embodiment of the present application further provides an electronic device, including: one or more processors; the storage device is configured to store one or more programs, and when the one or more programs are executed by the one or more processors, the electronic device is enabled to implement the container security protection method provided in the foregoing embodiments.
FIG. 9 illustrates a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application. It should be noted that the computer system 900 of the electronic device shown in fig. 9 is only an example and imposes no limitation on the functions or scope of application of the embodiments.
As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU) 901, which can execute various appropriate actions and processes, such as executing the method in the above-described embodiment, according to a program stored in a Read-Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for system operation are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An Input/Output (I/O) interface 905 is also connected to bus 904.
The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output portion 907 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD) display, a speaker, and the like; a storage portion 908 including a hard disk and the like; and a communication portion 909 including a network interface card such as a LAN (Local Area Network) card or a modem. The communication portion 909 performs communication processing via a network such as the Internet. A drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 910 as needed, so that a computer program read therefrom can be installed into the storage portion 908 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 909, and/or installed from the removable medium 911. When executed by the Central Processing Unit (CPU) 901, the computer program performs the various functions defined in the system of the present application.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer-readable signal medium may include a propagated data signal with a computer program embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware, and the described units may also be disposed in a processor. The names of the units do not, in any case, constitute a limitation on the units themselves.
Another aspect of the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the container security protection method described above. The computer-readable storage medium may be included in the electronic device described in the foregoing embodiments, or may exist separately without being assembled into the electronic device.
Another aspect of the application also provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the container security protection method provided in the above embodiments.
The above description is only a preferred exemplary embodiment of the present application and is not intended to limit the embodiments of the present application. Those skilled in the art can readily make various changes and modifications according to the main concept and spirit of the present application, so the protection scope of the present application shall be subject to the protection scope of the claims.
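Purely as an illustration of the closed loop described in the embodiments above (acquire running data, let a reinforcement learning model choose a response action, derive a reward from the running log, and fold the reward back into the model), a minimal Python sketch follows. All names, and the trivial one-parameter "model", are invented for illustration and are not part of the patent.

```python
# Minimal sketch of the protection loop: run data in, response action out,
# log-derived reward fed back. All names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class RunData:
    request: str                               # user request data
    trace: list = field(default_factory=list)  # user behavior track data

class ReinforcementLearningModel:
    """Toy stand-in for the reinforcement learning model of the method."""
    def __init__(self):
        self.value = 0.0  # single learned preference for "allow"

    def decide(self, data: RunData) -> str:
        # Parse request + behavior trace into an access request, pick an action.
        return "allow" if self.value >= 0 else "block"

    def update(self, reward: float) -> None:
        # Fold the returned reward back into the model (last step of the loop).
        self.value += reward

def reward_from_log(run_log: dict) -> float:
    # Reward > 0 if the log shows no unauthorized-access risk, else < 0.
    return 1.0 if not run_log.get("unauthorized_risk") else -1.0

model = ReinforcementLearningModel()
data = RunData(request="GET /api/files", trace=["login", "list"])
action = model.decide(data)             # response action
run_log = {"unauthorized_risk": False}  # gathered after executing the action
model.update(reward_from_log(run_log))
```

Under this toy reward scheme, repeated unauthorized-access findings drive the model toward blocking; a real implementation would of course use an actual reinforcement learning algorithm and feature extraction rather than a single scalar.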

Claims (10)

1. A container security protection method, comprising:
acquiring running data of a program in a container during running, wherein the running data comprises user request data and user behavior track data;
inputting the running data into a reinforcement learning model, parsing the user request data and the user behavior track data through the reinforcement learning model to obtain a user access request, and outputting a response action corresponding to the user access request; and
executing the response action, acquiring a running log of the container, calculating a reward value corresponding to the response action according to the running log, and returning the reward value to the reinforcement learning model so as to update the reinforcement learning model based on the reward value.
2. The method of claim 1, wherein parsing the user request data and the user behavior track data through the reinforcement learning model to obtain a user access request comprises:
parsing the user request data through the reinforcement learning model to obtain first access request data;
finding second access request data from the user behavior track data; and
determining the user access request based on the first access request data and the second access request data.
3. The method of claim 2, wherein before finding the second access request data from the user behavior track data, the method further comprises:
determining whether the user access request can be determined based on the first access request data;
and finding the second access request data from the user behavior track data comprises:
if the user access request cannot be determined from the first access request data, finding adjacent behavior track data corresponding to the user request data from the user behavior track data, and using the adjacent behavior track data as the second access request data.
4. The method of claim 1, wherein parsing the user request data and the user behavior track data through the reinforcement learning model to obtain a user access request and outputting a response action corresponding to the user access request comprises:
parsing the user access request through the reinforcement learning model to obtain a plurality of pieces of user characteristic data;
judging the plurality of pieces of user characteristic data to obtain judgment results respectively corresponding to the plurality of pieces of user characteristic data; and
determining a response action corresponding to the user access request based on the judgment results respectively corresponding to the plurality of pieces of user characteristic data, and outputting the response action.
5. The method of claim 4, wherein determining a response action corresponding to the user access request based on the judgment results respectively corresponding to the plurality of pieces of user characteristic data comprises:
calculating a correlation value between each of the judgment results corresponding to the plurality of pieces of user characteristic data and the user access request;
performing weighted summation on the correlation values corresponding to the plurality of judgment results to obtain a decision value of the user access request; and
determining the response action corresponding to the user access request based on the decision value.
6. The method of claim 1, wherein calculating a reward value corresponding to the response action according to the running log and returning the reward value to the reinforcement learning model comprises:
obtaining access data contained in the running log;
judging whether the access data has an unauthorized-access risk to obtain a judgment result;
if the judgment result indicates that the access data has no unauthorized-access risk, returning a reward value greater than zero to the reinforcement learning model; and
if the judgment result indicates that the access data has an unauthorized-access risk, returning a reward value less than zero to the reinforcement learning model.
7. The method of claim 6, wherein judging whether the access data has an unauthorized-access risk comprises:
parsing the access data to obtain resource data used by the user and contained in the access data; and
judging whether the access data has the unauthorized-access risk based on the resource data used by the user.
8. A container security protection device, comprising:
an acquisition module, configured to acquire running data of a program in a container during running, wherein the running data comprises user request data and user behavior track data;
a reinforcement learning module, configured to input the running data into a reinforcement learning model, parse the user request data and the user behavior track data through the reinforcement learning model to obtain a user access request, and output a response action corresponding to the user access request; and
an execution module, configured to execute the response action, acquire a running log of the container, calculate a reward value corresponding to the response action according to the running log, and return the reward value to the reinforcement learning model so as to update the reinforcement learning model based on the reward value.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs that, when executed by the one or more processors, cause the electronic device to implement the container security protection method of any one of claims 1 to 7.
10. A computer-readable storage medium having computer-readable instructions stored thereon,
the computer-readable instructions, when executed by a processor of a computer, cause the computer to perform the container security protection method according to any one of claims 1 to 7.
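Claims 4 and 5 describe scoring each user-characteristic judgment result against the access request and combining the scores by weighted summation into a single decision value. A minimal numeric sketch of that step follows; the correlation values, weights, and threshold are all invented for illustration and are not specified by the claims.

```python
# Hypothetical weighted-sum decision of claims 4-5: per-characteristic
# correlation values are combined into one decision value.
def decision_value(correlations, weights):
    """Weighted sum of per-characteristic correlation values."""
    return sum(c * w for c, w in zip(correlations, weights))

def response_action(value, threshold=0.5):
    # Map the decision value onto a response action (threshold is invented).
    return "allow" if value >= threshold else "block"

corr = [0.9, 0.4, 0.1]  # correlation of each judgment result with the request
w = [0.5, 0.3, 0.2]     # invented weights summing to 1
v = decision_value(corr, w)   # 0.45 + 0.12 + 0.02 = 0.59
action = response_action(v)
```

The thresholding shown here is only one way to map a decision value to an action; the claims leave the mapping open.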
CN202211250285.5A 2022-10-12 2022-10-12 Container safety protection method, device, equipment and storage medium Pending CN115603999A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211250285.5A CN115603999A (en) 2022-10-12 2022-10-12 Container safety protection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115603999A true CN115603999A (en) 2023-01-13

Family

ID=84847897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211250285.5A Pending CN115603999A (en) 2022-10-12 2022-10-12 Container safety protection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115603999A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108471420A (en) * 2018-03-29 2018-08-31 上交所技术有限责任公司 Based on network mode identification and matched vessel safety defence method and device
US20200252803A1 (en) * 2019-02-06 2020-08-06 Verizon Patent And Licensing Inc. Security monitoring for wireless communication devices
CN111527478A (en) * 2017-10-13 2020-08-11 华为技术有限公司 System and method for detecting abnormal user experience and performance in cooperation with cloud equipment
CN112084300A (en) * 2020-08-07 2020-12-15 北京三快在线科技有限公司 Response information output method and device, electronic equipment and readable storage medium
CN112860484A (en) * 2021-01-29 2021-05-28 深信服科技股份有限公司 Container runtime abnormal behavior detection and model training method and related device
CN113592522A (en) * 2021-02-23 2021-11-02 腾讯科技(深圳)有限公司 Method and apparatus for processing traffic data, and computer-readable storage medium
US20210377307A1 (en) * 2020-05-27 2021-12-02 Sap Se Reinforcement learning for application responses using deception technology
CN114116118A (en) * 2021-10-15 2022-03-01 阿里巴巴(中国)有限公司 Container application program safety monitoring method and device, electronic equipment and medium
CN114338743A (en) * 2021-12-30 2022-04-12 上海众人智能科技有限公司 Intelligent recognition defense system for cloud side end data interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination