CN112948132B - Vectorization method of cloud service event and service level contract data - Google Patents


Publication number
CN112948132B
Authority
CN
China
Prior art keywords
violation
state
cloud service
instance
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110372833.0A
Other languages
Chinese (zh)
Other versions
CN112948132A (en
Inventor
李肖坚
张翠萍
杨昊澎
黄程灵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Normal University
Original Assignee
Guangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Normal University
Priority to CN202110372833.0A
Publication of CN112948132A
Application granted
Publication of CN112948132B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/542 Event management; Broadcasting; Multicasting; Notifications

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a vectorization method for cloud service events and their service level contract data. Starting from the fragment data of discrete cloud service events and the contract data, the method formalizes the events and constructs state elements, formalizes the contracts and extracts violation elements and their indexes, maps contact tuples between event conditions and service level contracts, quantizes the state elements and their indexes, and generates a condition-index vector sample sequence for the cloud service events and service level contracts. This sequence serves as vectorized trace data of suspected violations in cloud server logs. The resulting condition-index vector samples can be used for deep neural network judgment of event violations such as entity states or entity contacts, and also for deep-learning-based intrusion detection, event investigation, and tracking and tracing.

Description

Vectorization method of cloud service event and service level contract data
Technical Field
The invention relates to the technical field of cloud server network security, and in particular to a vectorization method for cloud service events and their service level contract data.
Background
A cloud server is a physical or virtual infrastructure that executes application programs and processes and stores information. Virtualization software divides a physical server into multiple virtual servers, and an Infrastructure as a Service (IaaS) architecture is used to process workloads and store information, so that users can remotely access the functions of the virtual servers through an online interface. The Alibaba Cloud Elastic Compute Service (ECS) is a simple, efficient, and elastically scalable IaaS-level cloud computing service; Fig. 1 shows the cloud server structure. A cloud service is the behavior in which a process of a cloud server replies with information as requested under a specific protocol. A cloud service log is the trace record of service behavior.
Network security is the ability to protect the hardware, software, and data of a network system from attacks, intrusions, interference, damage, unauthorized access, and other unexpected incidents, to keep the system in a stable and reliable operating state, and to ensure the confidentiality, integrity, authenticity, availability, and non-repudiation of network data. In the ECS architecture, a security group is essentially a virtual firewall that controls access traffic (that is, which applications can be accessed) through rules; it provides stateful inspection and packet filtering and is used to partition security domains in the cloud. Security group rules can allow or deny access to and from the public network and the intranet for ECS instances; in other words, configuring security group rules controls the inbound and outbound traffic of the ECS instances in the group. An ECS instance must belong to at least one security group, so a security group must be selected for network access control when an instance is created.
The big data computing service (MaxCompute) is a cloud computing service developed independently by Alibaba for processing structured and semi-structured big data. It adopts an abstract job processing framework that unifies computing tasks from different scenarios on one platform, shares security, storage, data management, and resource scheduling, and provides a unified programming interface and user interface for data processing tasks arising from different user requirements. The system supports SQL processing compatible with standard syntax, an extended MapReduce programming framework, and more.
A Service Level Agreement (SLA) is a contract formally negotiated between a service provider and a customer to guarantee a desired level of service. It specifies the service availability level indexes and the compensation scheme for the cloud services that Alibaba Cloud provides to customers. A violation is behavior by which the cloud service provider or the customer fails to fulfil the contract. A cloud service provider may be held liable if the computing performance of its cloud service (for example availability or security) does not meet the service contract requirements.
Cloud service events are generally stored as cloud service logs. The log data are large in volume, discrete, and discontinuous, and they contain non-numerical character strings. Cloud service level indexes are mostly described in natural language or text, so there is a semantic gap between them and the cloud service logs. Neither the service log data nor the level index data can participate directly in computation, which makes them even less suitable for deep neural network judgment of event violations such as entity states or entity contacts.
Existing data vectorization methods have the following four defects when processing cloud service events and their level contract data:
First, vectorization methods that rely on context or a lexicon are constrained by structured data and the lexicon. They can vectorize only a limited set of character strings, cannot handle large numbers of arbitrary character strings, and are unsuitable for unstructured data. They also adapt poorly and are time-consuming, because the model must be retrained whenever different data are input.
Second, traditional vectorization methods designed for time series data are unsuitable for non-time-series data.
Third, existing index quantification methods consider indexes such as time or resource usage only in isolation, and have difficulty combining and quantifying multiple indexes such as time, quantity, operation, and resource usage.
Fourth, there is no vectorization method specific to service events and their contract data.
Therefore, existing data vectorization methods can process neither discrete non-time-series event fragment data nor multi-index contract data, and they lack a path for converting log data into violation semantics; an index for converting log data into violation semantics needs to be constructed.
Disclosure of Invention
The method aims to solve the technical problem that the existing method cannot process discrete non-time sequence event fragment data and multi-index contract data at the same time. The invention provides a vectorization method of cloud service events and level contract data thereof. Firstly, formalizing a cloud service event and constructing a state element of the event; secondly, formalizing a service level contract and extracting violation elements and indexes thereof; thirdly, mapping the relation between the event condition and the service level to obtain a contact tuple of 'state element-violation element'; fourthly, constructing a 'status-index' contact tuple according to the 'status element-violation element' contact tuple and the index; fifthly, quantizing the 'status-index' contact tuple according to a quantization rule; and finally, generating a 'condition-index' vector sample of the cloud service event.
The vectorization method for cloud service events and their level contract data has the following advantages:
First, by formalizing the cloud service event and its contract and constructing state elements and violation elements respectively, the method provides a way to convert the semantics of events and their contract data into violation semantics.
Second, the method does not depend on whether the log data are complete; discontinuous fragment data are sufficient to generate the vector samples.
Third, compared with vectorization methods that describe a datum with only one dimension of a high-dimensional vector, the method generates fewer redundant dimensions and places less pressure on computation and storage.
Fourth, the method can combine multiple indexes such as time, quantity, operation, and resource usage to measure events.
Fifth, the samples generated by the method can be used for violation judgment of entity state-change events or inter-entity contact events.
Sixth, the method can be extended to intrusion detection, event investigation, and tracking and tracing.
Drawings
Fig. 1 is a diagram of a cloud server ECS structure.
Fig. 2 is a mapping diagram of cloud service events and their level contracts according to the present invention.
Fig. 3 is a flowchart of a vectorization method of cloud service events and their level contract data according to the present invention.
Fig. 4 is a flow chart of quantization of state elements in the present invention.
Fig. 5 shows the accuracy, precision, and recall of applying KNN to determine long-tail violations.
Fig. 6 shows the false positive rate of applying KNN to determine long-tail violations.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
The objects processed by the invention are cloud service logs and their cloud service level contract (SLA) data. Each cloud service log originates from modules such as the MR job module and the SQL job module in the storage/computing layer of the big data computing service. The big data computing service is described on page 11 of the "Big Data Computing Service Product Introduction" for the proprietary cloud enterprise edition of Alibaba Cloud, published 11/18/2020, product version v3.12.0. The big data computing service MaxCompute service level contract used here has an effective date of February 1, 2018.
In the invention, the SLA constrains the responsibility of the cloud service, and the user submits requirements to the cloud service in units of jobs (JOB). The job is therefore the unit onto which the SLA can be mapped most completely. If the request of a job happens to be carried by a single task (TASK), the responsibility can be mapped to that task.
Formalizing cloud service events
The cloud service log records the execution status of the job. A log records the operating conditions of an instance. A job is composed of one or more tasks, and a task is composed of one or more instances.
In the present invention, a job is denoted JOB and a task is denoted TASK.
A JOB has multiple tasks. The task set is denoted MTA and expressed in set form as $MTA = \{TASK_1, TASK_2, \ldots, TASK_i, \ldots, TASK_m\}$, where:
$TASK_1$ denotes the first task belonging to the JOB.
$TASK_2$ denotes the second task belonging to the JOB.
$TASK_i$ denotes the i-th task belonging to the JOB.
$TASK_m$ denotes the last task belonging to the JOB.
For ease of explanation, the subscript i denotes the identification number of a task, so $TASK_i$ is also referred to as any one task. The subscript m denotes the total number of tasks.
In the invention, an instance is denoted INST. Any task $TASK_i$ has multiple instances, and its instance set is expressed in set form as $\{INST_1, INST_2, \ldots, INST_j, \ldots, INST_n\}$, where:
$INST_1$ denotes the first instance belonging to task $TASK_i$.
$INST_2$ denotes the second instance belonging to task $TASK_i$.
$INST_j$ denotes the j-th instance belonging to task $TASK_i$.
$INST_n$ denotes the last instance belonging to task $TASK_i$.
For ease of explanation, the subscript j denotes the identification number of an instance, so $INST_j$ is also referred to as any one instance under the task. The subscript n denotes the total number of instances.
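The job, task, and instance hierarchy described above can be sketched as nested containers. This is a minimal illustration only; the class names `Job`, `Task`, and `Instance` and the helper `iter_instances` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Instance:
    inst_name: str  # identifies INST_j inside its task


@dataclass
class Task:
    task_name: str
    instances: List[Instance] = field(default_factory=list)  # INST_1 .. INST_n


@dataclass
class Job:
    job_name: str
    tasks: List[Task] = field(default_factory=list)  # MTA = {TASK_1 .. TASK_m}


def iter_instances(job: Job) -> Iterator[Tuple[int, int, Instance]]:
    """Yield (i, j, instance) with 1-based task index i and instance index j."""
    for i, task in enumerate(job.tasks, start=1):
        for j, inst in enumerate(task.instances, start=1):
            yield i, j, inst


# Hypothetical job with m = 2 tasks; the first task has n = 2 instances.
job = Job("job_a", [
    Task("M1", [Instance("inst_1"), Instance("inst_2")]),
    Task("R2", [Instance("inst_3")]),
])
indexed = [(i, j, inst.inst_name) for i, j, inst in iter_instances(job)]
```

Each yielded pair (i, j) plays the role of the subscripts of $TASK_i$ and $INST_j$ in the text.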
Field content of a cloud service event
In the present invention, the record of any instance $INST_j$ is taken as one cloud service event. The cloud service event field content of any instance $INST_j$ consists of the fields {start_time, end_time, machine_id, task_name, job_name, inst_name, seq_no, total_seq_no, status, cpu_avg, cpu_max, mem_avg, mem_max}.
start_time is the start time of the instance. end_time is the end time of the instance. machine_id is the cloud server identification. task_name is the task name. job_name is the job name. inst_name is the instance name. seq_no is the number of instance retries. total_seq_no is the total number of instance retries. status is the state of the instance. cpu_avg is the average CPU utilization of the instance. cpu_max is the maximum CPU utilization of the instance. mem_avg is the average memory usage of the instance. mem_max is the maximum memory usage of the instance.
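As a rough illustration, one cloud service event with the thirteen fields listed above could be held in a typed record. The class name `CloudServiceEvent`, the helper `event_from_row`, the chosen Python types, and the sample values are all assumptions made for this sketch; the patent does not specify data types or a log file format.

```python
from dataclasses import dataclass, fields


@dataclass
class CloudServiceEvent:
    start_time: int      # start time of the instance
    end_time: int        # end time of the instance
    machine_id: str      # cloud server identification
    task_name: str
    job_name: str
    inst_name: str
    seq_no: int          # number of instance retries
    total_seq_no: int    # total number of instance retries
    status: str          # state of the instance
    cpu_avg: float       # average CPU utilization
    cpu_max: float       # maximum CPU utilization
    mem_avg: float       # average memory usage
    mem_max: float       # maximum memory usage


def event_from_row(row: dict) -> CloudServiceEvent:
    """Convert a mapping of field name -> raw string into a typed event."""
    return CloudServiceEvent(**{f.name: f.type(row[f.name])
                                for f in fields(CloudServiceEvent)})


# Hypothetical log fragment; the values are made up for illustration.
row = {"start_time": "100", "end_time": "140", "machine_id": "m_1",
       "task_name": "M1", "job_name": "j_1", "inst_name": "inst_1",
       "seq_no": "1", "total_seq_no": "1", "status": "Terminated",
       "cpu_avg": "0.5", "cpu_max": "0.9", "mem_avg": "0.2", "mem_max": "0.3"}
ev = event_from_row(row)
```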
In the invention, the cloud service event field content is used to construct the cloud service event state element set EVENT_STATUS and the cloud service event violation element set VIOLATION.
If a state element in the state element set EVENT_STATUS is a factor of a violation element in the violation element set VIOLATION, then that state element of the cloud service event is the violation element.
State elements of a cloud service event
In the present invention, each field of a cloud service event is a constituent of a sentence. Applying sentence-constituent analysis, the subject part and the predicate part of the sentence are separated by a double vertical line. The subject-predicate expression of one cloud service event is denoted SYS_EVENT, with SYS_EVENT = [time period] (specific) instance || [retry] is in state <load>.
Referring to the mapping diagram of the cloud service event and its level contract in Fig. 2, the cloud service event state element set consists of seven state elements and is denoted EVENT_STATUS, with EVENT_STATUS = {TIME, LOCATION, NUMBER, RETRY, OPERATION, CPU, MEM}.
In the present invention, the seven state elements are:
The duration state element TIME describes the duration state of the cloud service event, denoted TIME = {start_time, end_time}.
The location state element LOCATION describes the location state of the cloud service event, denoted LOCATION = {machine_id, job_name, task_name}.
The number state element NUMBER describes the number state of the cloud service event, denoted NUMBER = {inst_name}.
The retry state element RETRY describes the retry state of the cloud service event, denoted RETRY = {seq_no, total_seq_no}.
The operation state element OPERATION describes the operation state of the cloud service event, denoted OPERATION = {status}.
The CPU load state element CPU describes the CPU load state of the cloud service event, denoted CPU = {cpu_avg, cpu_max}.
The memory load state element MEM describes the memory load state of the cloud service event, denoted MEM = {mem_avg, mem_max}.
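The grouping of event fields into the seven state elements can be sketched as a lookup table. The names `STATE_ELEMENTS` and `build_state_elements` are hypothetical, and the event is represented as a plain dictionary for brevity.

```python
# Field grouping taken from the seven state element definitions above.
STATE_ELEMENTS = {
    "TIME": ("start_time", "end_time"),
    "LOCATION": ("machine_id", "job_name", "task_name"),
    "NUMBER": ("inst_name",),
    "RETRY": ("seq_no", "total_seq_no"),
    "OPERATION": ("status",),
    "CPU": ("cpu_avg", "cpu_max"),
    "MEM": ("mem_avg", "mem_max"),
}


def build_state_elements(event: dict) -> dict:
    """Return EVENT_STATUS for one event: state element name -> field values."""
    return {name: tuple(event[f] for f in field_names)
            for name, field_names in STATE_ELEMENTS.items()}


# Hypothetical event record; the values are made up for illustration.
event = {"start_time": 100, "end_time": 140, "machine_id": "m_1",
         "job_name": "j_1", "task_name": "M1", "inst_name": "inst_1",
         "seq_no": 1, "total_seq_no": 1, "status": "Terminated",
         "cpu_avg": 0.5, "cpu_max": 0.9, "mem_avg": 0.2, "mem_max": 0.3}
event_status = build_state_elements(event)
```

Every one of the thirteen fields lands in exactly one state element, so no field is dropped or duplicated by the grouping.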
Violation elements of a cloud service event
In the invention, the violation element set of the cloud service event is denoted VIOLATION, with VIOLATION = {vf_longTail, vf_location, vf_number, vf_retry, vf_operation, vf_CPU, vf_mem}.
vf_longTail denotes the instance-level duration violation element.
vf_location denotes the instance-level location violation element.
vf_number denotes the job-level number violation element.
vf_retry denotes the instance-level retry violation element.
vf_operation denotes the instance-level operation violation element.
vf_CPU denotes the instance-level CPU load violation element.
vf_mem denotes the instance-level memory load violation element.
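The pairing of each state element with its violation element, which the method calls a "state element-violation element" contact tuple, can be sketched as a small table. The names `CONTACT_TUPLES` and `violation_element_for` are hypothetical, and the `level` column simply restates the instance-level or job-level wording of the list above.

```python
# (state element, violation element, level) triples read off the definitions above.
CONTACT_TUPLES = [
    ("TIME", "vf_longTail", "instance"),
    ("LOCATION", "vf_location", "instance"),
    ("NUMBER", "vf_number", "job"),
    ("RETRY", "vf_retry", "instance"),
    ("OPERATION", "vf_operation", "instance"),
    ("CPU", "vf_CPU", "instance"),
    ("MEM", "vf_mem", "instance"),
]


def violation_element_for(state_element: str) -> str:
    """Look up the violation element paired with a state element."""
    for state, violation, _level in CONTACT_TUPLES:
        if state == state_element:
            return violation
    raise KeyError(state_element)


job_level = [v for _s, v, level in CONTACT_TUPLES if level == "job"]
```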
In the present invention, a violation element is an element that violates the cloud service event-situation conventions SLAS. The violation elements extracted from SLAS form the cloud service event violation element set VIOLATION.
Formalizing service level contracts
Referring to the mapping diagram of the cloud service event and its level contract in Fig. 2, a violation of a single cloud service event is reflected in the predicate and predicate-complement components of the sentence SYS_EVENT = [time period] (specific) instance || [retry] is in state <load>. A burst-number violation across multiple events is also reflected in this sentence, for example in the NUMBER state element.
In the invention, the cloud service event-situation conventions SLAS are set according to the big data computing service MaxCompute service level contract. SLAS includes 7 conventions, expressed as the set SLAS = {sla4inst_time, sla4inst_location, sla4job_number, sla4inst_retry, sla4inst_operation, sla4inst_CPU, sla4inst_mem}.
sla4inst_time denotes the instance-level duration element convention. sla4inst_location denotes the instance-level location element convention. sla4job_number denotes the job-level number element convention. sla4inst_retry denotes the instance-level retry element convention. sla4inst_operation denotes the instance-level operation element convention. sla4inst_CPU denotes the instance-level CPU load convention. sla4inst_mem denotes the instance-level memory load convention.
Instance-level duration element convention
The instance-level duration element convention is denoted sla4inst_time; it is the convention on the duration state element TIME. Its natural-language statement is: if the running period of an instance in a task is greater than or equal to 3 times the average running period of all instances, the instance has a long tail.
This instance-level duration element convention is formalized as formula (1):
Figure GDA0003741497200000062
Figure GDA0003741497200000063
v denotes the condition of the predicate decision.
When the predicate decision result of v is [Figure GDA0003741497200000064], formula (1) is violated; this is recorded as a violation of the instance-level duration element convention, that is, the state element is the violation element vf_longTail.
When the predicate decision result of v is not [Figure GDA0003741497200000065], the instance-level duration element convention is satisfied.
Predicate IsLongTail(·) indicates that the running period of instance $INST_j$ is greater than or equal to the long-tail index.
longtail_metric denotes the long-tail index of instance $INST_j$.
n denotes the total number of instances $INST_j$ of task $TASK_i$.
The remaining symbols denote, in order, the end time, the start time, and the running period of instance $INST_j$.
The instance-level duration violation element and its index are extracted from sla4inst_time.
In sla4inst_time, the element suspected of violating the convention is the running period of the instance, which is reflected in the duration state element. The duration violation element vf_longTail is therefore the instance running period, denoted vf_longTail = <end_time, start_time>.
The instance long-tail violation index longtail_metric is 3 times the average running period of all instances, expressed as
$$longtail\_metric = 3 \times \frac{1}{n} \sum_{j=1}^{n} \left( end\_time_j - start\_time_j \right)$$
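The long-tail rule stated in words above (running period greater than or equal to 3 times the average running period of all instances of the task) can be sketched as follows. The function names and the sample periods are hypothetical; the sketch assumes each instance is given as a (start_time, end_time) pair in the same time unit.

```python
from typing import List, Tuple


def longtail_metric(periods: List[Tuple[int, int]]) -> float:
    """3 times the average running period over all instances of one task."""
    durations = [end - start for start, end in periods]
    return 3 * sum(durations) / len(durations)


def is_long_tail(start_time: int, end_time: int, metric: float) -> bool:
    """True when the running period reaches or exceeds the long-tail index."""
    return (end_time - start_time) >= metric


# Hypothetical task with n = 6 instances: five short ones and one straggler.
periods = [(0, 10), (0, 10), (0, 10), (0, 10), (0, 10), (0, 150)]
metric = longtail_metric(periods)
flags = [is_long_tail(s, e, metric) for s, e in periods]
```

Because the straggler itself raises the average, a task with very few instances can rarely trigger the rule; the sample uses six instances for that reason.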
Instance-level location element convention
The instance-level location element convention is denoted sla4inst_location; it is the convention on the location state element LOCATION. Its natural-language statement is: if an instance on a cloud server is in the Failed state, the instance disables that machine. If an instance then uses the disabled machine, the instance is deemed to have a location violation.
This instance-level location element convention is formalized as formula (3):
Figure GDA0003741497200000072
Figure GDA0003741497200000073
v denotes the condition of the predicate decision.
When the predicate decision result of v is [Figure GDA0003741497200000074], formula (3) is violated; this is recorded as a violation of the instance-level location element convention, that is, the state element is the violation element vf_location.
When the predicate decision result of v is not [Figure GDA0003741497200000075], the instance-level location element convention is satisfied.
Predicate IsUsed(·) indicates that instance $INST_j$ is using a cloud server (that is, whether the cloud server is available).
Predicate IsFailed(·) indicates that the state of instance $INST_j$ is Failed.
forbidden_machine denotes the identification of the disabled cloud server.
The remaining symbols denote, in order, the identification of the cloud server running instance $INST_j$, the identification of another cloud server running the instance, and the state of instance $INST_j$.
The instance-level location violation element and its index are extracted from sla4inst_location.
In sla4inst_location, the element suspected of violation is the identification of the cloud server that runs the instance, which is reflected in the location state element. The job name and task name uniquely identify the instance. The location violation element vf_location is therefore expressed as vf_location = {machine_id, job_name, task_name}.
The location violation index location_metric is the device id of the disabled machine, denoted location_metric = machine_id.
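The location rule stated in words above can be sketched as a single pass over the events. This is only one possible reading: the sketch assumes events are processed in order of start_time, that a machine becomes disabled as soon as an instance on it is recorded as Failed, and that the failing instance itself is not flagged. The function name and sample events are hypothetical.

```python
from typing import Dict, List, Tuple


def find_location_violations(events: List[Dict]) -> List[Tuple[str, str, str, str]]:
    """Return (machine_id, job_name, task_name, inst_name) for instances
    that run on a machine already disabled by a Failed instance."""
    forbidden_machines = set()
    violations = []
    for ev in sorted(events, key=lambda e: e["start_time"]):
        if ev["machine_id"] in forbidden_machines:
            violations.append((ev["machine_id"], ev["job_name"],
                               ev["task_name"], ev["inst_name"]))
        if ev["status"] == "Failed":
            # The failed instance disables its machine for later instances.
            forbidden_machines.add(ev["machine_id"])
    return violations


# Hypothetical events: inst_a fails on m_1, then inst_b is placed on m_1.
events = [
    {"start_time": 0, "machine_id": "m_1", "job_name": "j_1",
     "task_name": "M1", "inst_name": "inst_a", "status": "Failed"},
    {"start_time": 5, "machine_id": "m_2", "job_name": "j_1",
     "task_name": "M1", "inst_name": "inst_c", "status": "Terminated"},
    {"start_time": 10, "machine_id": "m_1", "job_name": "j_1",
     "task_name": "M1", "inst_name": "inst_b", "status": "Terminated"},
]
location_violations = find_location_violations(events)
```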
Job-level number element convention
The job-level number element convention is denoted sla4job_number; it is the convention on the number state element NUMBER. Its natural-language statement is: if the number of Reduce instances of a job exceeds 2000, or the number of Map instances of a job exceeds 8000, the number of instances of the job is over the limit.
This job-level number element convention is formalized as formula (5):
Figure GDA0003741497200000081
Figure GDA0003741497200000082
Figure GDA0003741497200000083
rNumber_metric=2000 (8)
mNumber_metric=8000 (9)
v denotes the condition of the predicate decision.
When the predicate decision result of v is [Figure GDA0003741497200000084], formula (5) is violated; this is recorded as a violation of the job-level number element convention, that is, the state element is the violation element vf_number.
When the predicate decision result of v is not [Figure GDA0003741497200000085], the job-level number element convention is satisfied.
Predicate IsOverRedNumber(·) indicates that the number of Reduce instances of a job exceeds its number index.
Predicate IsOverMapNumber(·) indicates that the number of Map instances of a job exceeds its number index.
Predicate IsReduceTask(·) indicates that the argument is a Reduce task.
Predicate IsMapTask(·) indicates that the argument is a Map task.
rInstNumberOfJob denotes the number of Reduce instances in a job.
mInstNumberOfJob denotes the number of Map instances in a job.
rNumber_metric denotes the index for the number of Reduce instances in a job.
mNumber_metric denotes the index for the number of Map instances in a job.
The remaining symbol denotes the name of the task to which the instance belongs.
The job-level number violation element and its index are extracted from sla4job_number.
In sla4job_number, the element suspected of violation is the number of instances, which is reflected in the number state element. The number violation element vf_number is therefore expressed as vf_number = {inst_name}.
The number violation condition is that the number of Reduce instances of the job exceeds 2000 or the number of Map instances of the job exceeds 8000, so the job-level number violation index number_metric is expressed as number_metric = {2000, 8000}.
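The job-level number check can be sketched by counting Map and Reduce instances per job and comparing the counts with rNumber_metric and mNumber_metric. How a task name reveals its type is not stated in the text; the sketch assumes, purely for illustration, that Map task names start with "M" and Reduce task names start with "R". The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

R_NUMBER_METRIC = 2000  # rNumber_metric, formula (8)
M_NUMBER_METRIC = 8000  # mNumber_metric, formula (9)


def is_map_task(task_name: str) -> bool:
    return task_name.startswith("M")  # assumed naming scheme


def is_reduce_task(task_name: str) -> bool:
    return task_name.startswith("R")  # assumed naming scheme


def jobs_over_number(events: Iterable[Tuple[str, str]]) -> List[str]:
    """events holds one (job_name, task_name) pair per instance record."""
    m_inst_number_of_job, r_inst_number_of_job = Counter(), Counter()
    for job_name, task_name in events:
        if is_map_task(task_name):
            m_inst_number_of_job[job_name] += 1
        elif is_reduce_task(task_name):
            r_inst_number_of_job[job_name] += 1
    jobs = set(m_inst_number_of_job) | set(r_inst_number_of_job)
    return sorted(job for job in jobs
                  if r_inst_number_of_job[job] > R_NUMBER_METRIC
                  or m_inst_number_of_job[job] > M_NUMBER_METRIC)


# Hypothetical load: job_a has 2001 Reduce instances, job_b stays at the limits.
events = ([("job_a", "R1")] * 2001
          + [("job_b", "M1")] * 8000
          + [("job_b", "R1")] * 2000)
over_limit_jobs = jobs_over_number(events)
```

The comparison is strict because the convention says "exceeds", so a job sitting exactly at 2000 Reduce or 8000 Map instances is not flagged.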
Instance-level retry element convention
The instance-level retry element convention is denoted sla4inst_retry; it is the convention on the retry state element RETRY. Its natural-language statement is: if the number of retries after a Map instance or Reduce instance fails exceeds 3, the retries of the instance are over the limit.
This instance-level retry element convention is formalized as formula (10):
Figure GDA0003741497200000091
retry_metric=3 (11)
v denotes the condition of the predicate decision.
Figure GDA0003741497200000092
When the predicate decision result of v is [Figure GDA0003741497200000093], formula (10) is violated; this is recorded as a violation of the instance-level retry element convention, that is, the state element is the violation element vf_retry.
Figure GDA0003741497200000094
When the predicate decision result of v is not [Figure GDA0003741497200000095], the instance-level retry element convention is satisfied.
Predicate IsOverRetry(·) indicates that the number of retries or the total number of retries of an instance exceeds the retry violation index.
Predicate IsFailed(·) indicates that the state of instance $INST_j$ is Failed.
Predicate IsReduceTask(·) indicates that the argument is a Reduce task.
Predicate IsMapTask(·) indicates that the argument is a Map task.
The remaining symbol denotes the number of retries and the total number of retries of instance $INST_j$.
retry_metric denotes the retry violation index.
The instance-level retry violation element and its index are extracted from sla4inst_retry.
In sla4inst_retry, the element suspected of violation is the number of retries or the total number of retries of the instance, which is reflected in the retry state element. The retry violation element vf_retry is therefore expressed as vf_retry = {seq_no, total_seq_no}, and its retry violation index is retry_metric = 3.
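The retry rule can be sketched as a conjunction of three checks: the instance belongs to a Map or Reduce task, its state is Failed, and seq_no or total_seq_no exceeds retry_metric. As before, recognizing Map and Reduce tasks by an "M" or "R" name prefix is an assumption made for this sketch, and the function names are hypothetical.

```python
RETRY_METRIC = 3  # retry_metric, formula (11)


def is_over_retry(seq_no: int, total_seq_no: int) -> bool:
    """True when the retry count or the total retry count exceeds the index."""
    return seq_no > RETRY_METRIC or total_seq_no > RETRY_METRIC


def is_retry_violation(event: dict) -> bool:
    is_map_or_reduce = event["task_name"].startswith(("M", "R"))  # assumed naming
    is_failed = event["status"] == "Failed"
    return (is_map_or_reduce and is_failed
            and is_over_retry(event["seq_no"], event["total_seq_no"]))


# Hypothetical events for illustration.
failed_after_4 = {"task_name": "R2", "status": "Failed", "seq_no": 4, "total_seq_no": 4}
failed_after_3 = {"task_name": "M1", "status": "Failed", "seq_no": 3, "total_seq_no": 3}
done_after_5 = {"task_name": "M1", "status": "Terminated", "seq_no": 5, "total_seq_no": 5}
results = [is_retry_violation(e) for e in (failed_after_4, failed_after_3, done_after_5)]
```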
Instance-level operation element specification
The instance-level operation element specification is denoted sla4inst_operation. sla4inst_operation is the specification on the operation state element OPERATION. Stated in natural language: if an instance is in the Failed state because of some operation of the instance itself (e.g., the instance exceeds its number of restarts), the instance operation is not violated; if an instance is interrupted by the system because its CPU or memory load exceeds its limit, so that the instance is in the Interrupted state, then the instance operation is in violation.
The natural-language instance-level operation element specification is formalized as equation (12):
Figure GDA0003741497200000101
retry_metric=3 (13)
v denotes a condition of predicate decision.
When the predicate decision result of v is
Figure GDA0003741497200000102
then equation (12) is violated, and this is recorded as
Figure GDA0003741497200000103
a violation of the instance-level operation element specification; that is, the state element is the violation element vf_operation.
When the predicate decision result of v is not
Figure GDA0003741497200000111
then the instance-level operation element specification is satisfied.
Figure GDA0003741497200000112
The predicate IsFailed(·) indicates that the state of instance
Figure GDA0003741497200000113
is Failed.
The predicate IsOverRetry (·) indicates that the number of retries or the total number of retries for an instance exceeds the retry violation index.
The predicate IsReduceTask (·) indicates that the argument is a Reduce task.
The predicate IsMapTask (·) indicates that the argument is a Map task.
The predicate IsInterrupted(·) indicates that the state of instance
Figure GDA0003741497200000114
is Interrupted.
The predicate IsOverPlanCPU(·) indicates that the CPU load of instance
Figure GDA0003741497200000115
exceeds the planned CPU load limit.
The predicate IsOverPlanEM(·) indicates that the memory load of instance
Figure GDA0003741497200000116
exceeds the planned memory load limit.
Figure GDA0003741497200000117
denotes, for the instance
Figure GDA0003741497200000118
the CPU load.
Figure GDA0003741497200000119
denotes, for the instance
Figure GDA00037414972000001110
the memory load.
Figure GDA00037414972000001111
denotes, for the instance
Figure GDA00037414972000001112
the number of retries and the total number of retries.
The instance-level operation violation element and its index are extracted from sla4inst_operation.
In sla4inst_operation, the element suspected of violation is the state of the instance, which is reflected in the operation state element. Therefore, the operation element-violation element vf_operation is expressed as vf_operation = {status}.
Here, the Failed and Interrupted states of an instance are the manifestations of an operation violation. The instance-level operation violation index operation_metric is therefore expressed as operation_metric = {'Failed', 'Interrupted'}.
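As a small illustration of how the operation violation index might be used, the sketch below tests only whether the status of an instance belongs to operation_metric. It deliberately does not reproduce equation (12), which also involves the retry and load predicates and is available only as an image; the function name is hypothetical.

```python
# Operation violation index from the text.
OPERATION_METRIC = {"Failed", "Interrupted"}


def is_operation_suspect(status: str) -> bool:
    """True when the instance status is one of the operation violation values."""
    return status in OPERATION_METRIC


print(is_operation_suspect("Interrupted"))
print(is_operation_suspect("Running"))
```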
Instance-level CPU load element specification
The instance-level CPU load element specification is denoted sla4inst_cpu. sla4inst_cpu is the specification on the CPU load state element CPU. Stated in natural language: if an instance's CPU load exceeds its limit (e.g., the planned CPU load), then the instance's CPU load is over the limit.
The natural-language instance-level CPU load element specification is formalized as equation (14):
Figure GDA00037414972000001113
v denotes a condition of predicate determination.
When the predicate decision result of v is
Figure GDA00037414972000001114
then equation (14) is violated, which is recorded as a violation of the instance-level CPU load element specification; that is, the state element is the violation element vf_cpu.
When the predicate decision result of v is not
Figure GDA00037414972000001115
then the instance-level CPU load element specification is satisfied.
The predicate IsOverPlanCPU(·) indicates that the CPU load of instance
Figure GDA0003741497200000121
exceeds the planned CPU load limit.
The instance-level CPU load violation element and its index are extracted from sla4inst_cpu.
In sla4inst_cpu, the element suspected of violation is the CPU load of the instance, which is reflected in the CPU load state element. Therefore, the CPU load element-violation element vf_cpu is expressed as vf_cpu = {cpu_avg, cpu_max}.
The index of a CPU load violation is the CPU load limit plan_cpu that the system allocates to the instance, so the instance-level CPU load violation index is cpu_metric = {plan_cpu}.
Instance-level memory load element specification
The instance-level memory load element specification is denoted sla4inst_mem. sla4inst_mem is the specification on the memory load state element MEM. Stated in natural language: if an instance's memory load exceeds its limit (e.g., the planned memory load), then the instance's memory load is over the limit.
The natural-language instance-level memory load element specification is formalized as equation (15):
Figure GDA0003741497200000122
v denotes a condition of predicate decision.
When the predicate judgment result of the v is
Figure GDA0003741497200000123
then equation (15) is violated, which is recorded as a violation of the instance-level memory load element specification; that is, the state element is the violation element vf_mem.
When the predicate judgment result of v is not
Figure GDA0003741497200000124
then the instance-level memory load element specification is satisfied.
The predicate IsOverPlanEM(·) indicates that the memory load of instance
Figure GDA0003741497200000125
exceeds the planned memory load limit.
The instance-level memory load violation element and its index are extracted from sla4inst_mem.
In sla4inst_mem, the element suspected of violation is the memory load of the instance, which is reflected in the memory load state element. Therefore, the memory load element-violation element vf_mem is expressed as vf_mem = {mem_avg, mem_max}.
The index of a memory load violation is the memory load limit plan_mem that the system allocates to the instance, so the instance-level memory load violation index is mem_metric = {plan_mem}.
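The CPU and memory load checks share one shape, which can be sketched as below. The text does not say whether the average or the maximum load is compared with the planned limit, so flagging either one is an assumption, as is treating a missing measurement as not exceeding the limit. The function name and the sample numbers are hypothetical.

```python
from typing import Optional


def is_over_plan(avg: Optional[float], peak: Optional[float], plan: float) -> bool:
    """True when the average or the maximum load exceeds the planned limit.

    Missing measurements (None) are treated as not exceeding the limit.
    """
    observed = [value for value in (avg, peak) if value is not None]
    return any(value > plan for value in observed)


# CPU check: vf_cpu = {cpu_avg, cpu_max} against cpu_metric = {plan_cpu}
print(is_over_plan(avg=80.0, peak=130.0, plan=100.0))
# Memory check: vf_mem = {mem_avg, mem_max} against mem_metric = {plan_mem}
print(is_over_plan(avg=0.2, peak=0.4, plan=0.5))
```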
In the invention, the cloud service log is derived from the MR job module and the SQL job module in the storage/computation layer of the Alibaba Cloud big data computing service (MaxCompute). It records the execution of jobs on the cloud servers in the Alibaba Cloud storage/computing cluster. The service level contract is the version with an effective date of February 1, 2018.
Referring to fig. 3, the vectorization method of cloud service events and their service level contract data according to the present invention includes the following steps:
Step one, formalizing the cloud service event;
Step 101, collecting logs of the cloud server;
Collect the log records of the jobs executed by the cloud server. A job in the log records is denoted JOB; a job contains multiple tasks, and the task set is expressed in set form as
Figure GDA0003741497200000126
Figure GDA0003741497200000127
contains multiple instances, and the instance set is expressed in set form as
Figure GDA0003741497200000128
The cloud service log records processed by the invention are derived from the Alibaba cluster trace v2018 data set.
Step 102, setting the field content of the cloud service event;
Any instance under a task,
Figure GDA0003741497200000129
is recorded as one cloud service event, and the field content of that cloud service event is denoted
Figure GDA00037414972000001210
where
Figure GDA0003741497200000131
In
Figure GDA0003741497200000132
the subscript i denotes the identification number of the task and the subscript j denotes the identification number of the instance.
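One way to hold the field content of a single cloud service event is a small record type, sketched below. The field names are the ones this document uses for the state elements (start_time, end_time, machine_id, and so on); the Python types, the optional load fields and the sample values are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloudServiceEvent:
    """One instance record treated as one cloud service event."""
    start_time: int
    end_time: int
    machine_id: str
    task_name: str
    job_name: str
    inst_name: str
    seq_no: int
    total_seq_no: int
    status: str
    cpu_avg: Optional[float] = None  # load fields may be missing in a log
    cpu_max: Optional[float] = None
    mem_avg: Optional[float] = None
    mem_max: Optional[float] = None


# Hypothetical record, not taken from any real trace.
event = CloudServiceEvent(
    start_time=100, end_time=160, machine_id="m_1", task_name="M1",
    job_name="j_1", inst_name="ins_1", seq_no=1, total_seq_no=1,
    status="Failed", cpu_max=50.0,
)
print(event.status, event.cpu_avg)
```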
Step two, constructing the state elements of the cloud service event;
Step 201, expressing the cloud service event as a subject-predicate sentence;
In the present invention, each field of cloud service event
Figure GDA0003741497200000133
is a constituent of a sentence. Applying sentence-constituent analysis, the subject part and the predicate part of the sentence are separated by a double vertical line. The subject-predicate expression of one cloud service event
Figure GDA0003741497200000134
is denoted SYS_EVENT, where SYS_EVENT = [time period] (specific) instance || [retry] in state <load>.
Step 202, representing the state of the cloud service event by state elements;
In the invention, the subject-predicate pattern SYS_EVENT = [time period] (specific) instance || [retry] in state <load> is applied to the field semantics of each instance in the instance set
Figure GDA0003741497200000135
to divide them into sentence constituents, and the state element set of the cloud service event is constructed and denoted EVENT_STATUS, which comprises:
Figure GDA0003741497200000136
In the present invention, the duration state element TIME of a cloud service event describes the duration state of the event: TIME = {start_time, end_time}.
In the present invention, the location state element LOCATION of a cloud service event describes the location state of the event: LOCATION = {machine_id, job_name, task_name}.
In the present invention, the number state element NUMBER of a cloud service event describes the number state of the event: NUMBER = {inst_name}.
In the present invention, the retry state element RETRY of a cloud service event describes the retry state of the event: RETRY = {seq_no, total_seq_no}.
In the present invention, the operation state element OPERATION of a cloud service event describes the operation state of the event: OPERATION = {status}.
In the present invention, the CPU load state element CPU of a cloud service event describes the CPU load state of the event: CPU = {cpu_avg, cpu_max}.
In the present invention, the memory load state element MEM of a cloud service event describes the memory load state of the event: MEM = {mem_avg, mem_max}.
For the field content of any cloud service event
Figure GDA0003741497200000137
the constructed cloud service event state element set
Figure GDA0003741497200000141
is:
Figure GDA0003741497200000142
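The grouping of event fields into the seven state elements can be sketched as a simple lookup. The element names and field names follow the definitions above; representing an event as a plain dictionary, and the helper name build_event_status, are assumptions made for illustration.

```python
# Fields belonging to each state element, as listed in the text.
STATE_ELEMENTS = {
    "TIME": ("start_time", "end_time"),
    "LOCATION": ("machine_id", "job_name", "task_name"),
    "NUMBER": ("inst_name",),
    "RETRY": ("seq_no", "total_seq_no"),
    "OPERATION": ("status",),
    "CPU": ("cpu_avg", "cpu_max"),
    "MEM": ("mem_avg", "mem_max"),
}


def build_event_status(event: dict) -> dict:
    """Group the flat fields of one event under the seven state elements."""
    return {
        element: {field: event.get(field) for field in fields}
        for element, fields in STATE_ELEMENTS.items()
    }


# Hypothetical partial record; missing fields come back as None.
status = build_event_status({"status": "Failed", "seq_no": 2, "total_seq_no": 2})
print(status["OPERATION"])
print(status["RETRY"])
```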
Step three, formalizing the service level contract of the cloud service event;
In the invention, the big data computing service MaxCompute service level contract is combined with the cloud service event field content
Figure GDA0003741497200000143
to construct the cloud service event-condition specification set SLAS, which comprises 7 specifications.
Here:
Figure GDA0003741497200000144
Instance-level duration element specification
Figure GDA0003741497200000145
Instance-level location element specification
Figure GDA0003741497200000146
Job-level number element specification
Figure GDA0003741497200000147
Instance-level retry element specification
Figure GDA0003741497200000148
Instance-level operation element specification
Figure GDA0003741497200000151
Instance-level CPU load element specification
Figure GDA0003741497200000152
Instance-level memory load element specification
Figure GDA0003741497200000153
Step four, extracting the violation elements;
In the invention, the elements addressed by the specifications of step three are taken as violation elements.
In the invention, a violation element is an element of the cloud service event-condition specification
Figure GDA0003741497200000154
that can violate the specification. The violation elements extracted from SLAS are assembled into the cloud service event violation element set
Figure GDA0003741497200000155
In the present invention, the element that violates the instance-level duration element specification sla4inst_time is called the duration element-violation element vf_longTail: vf_longTail = {end_time, start_time}.
In the present invention, the element that violates the instance-level location element specification sla4inst_location is called the location element-violation element vf_location: vf_location = {machine_id, job_name, task_name}.
In the present invention, the element that violates the job-level number element specification sla4job_number is called the number element-violation element vf_number: vf_number = {inst_name}.
In the present invention, the element that violates the instance-level retry element specification sla4inst_retry is called the retry element-violation element vf_retry: vf_retry = {seq_no, total_seq_no}.
In the present invention, the element that violates the instance-level operation element specification sla4inst_operation is called the operation element-violation element vf_operation:
vf_operation = {status}.
In the present invention, the element that violates the instance-level CPU load element specification sla4inst_cpu is called the CPU load element-violation element vf_cpu:
vf_cpu = {cpu_avg, cpu_max}.
In the present invention, the element that violates the instance-level memory load element specification sla4inst_mem is called the memory load element-violation element vf_mem:
vf_mem = {mem_avg, mem_max}.
In the present invention, a violation is behavior that fails to meet the cloud service event-condition specification
Figure GDA0003741497200000161
An event violation means that if the cloud service event
Figure GDA0003741497200000162
violates the specification
Figure GDA0003741497200000163
then
Figure GDA0003741497200000164
is in violation.
In the present invention, a violation element is a factor that violates the specification
Figure GDA0003741497200000165
A violation element reveals the nature of the violation of the cloud service event
Figure GDA0003741497200000166
Therefore, to generate the vector samples needed to determine violations accurately, it is necessary to find the factors by which the cloud service event
Figure GDA0003741497200000167
violates the specification (i.e., the violation elements); these become the violation elements of the cloud service event
Figure GDA0003741497200000168
To consider, from multiple aspects, the factors by which a cloud service event may be suspected of violation, the invention constructs the state element set of the cloud service event
Figure GDA0003741497200000171
Step five, extracting the indexes;
In the present invention, from the cloud service event-condition specification
Figure GDA0003741497200000172
and the cloud service event violation element set
Figure GDA0003741497200000173
the violation limit values are extracted as violation indexes, yielding the specification-index set METRIC.
Here:
Figure GDA0003741497200000174
In the present invention, from the instance-level duration element specification sla4inst_time and the duration element-violation element vf_longTail, the duration violation index longTail_metric is extracted:
Figure GDA0003741497200000175
In the present invention, from the instance-level location element specification sla4inst_location and the location element-violation element vf_location, the location violation index location_metric is extracted: location_metric = {machine_id}.
In the present invention, from the job-level number element specification sla4job_number and the number element-violation element vf_number, the number violation index number_metric is extracted: number_metric = {2000, 8000}.
In the present invention, from the instance-level retry element specification sla4inst_retry and the retry element-violation element vf_retry, the retry violation index retry_metric is extracted: retry_metric = {3}.
In the present invention, from the instance-level operation element specification sla4inst_operation and the operation element-violation element vf_operation, the operation violation index operation_metric is extracted: operation_metric = {'Failed', 'Interrupted'}.
In the present invention, from the instance-level CPU load element specification sla4inst_cpu and the CPU load element-violation element vf_cpu, the CPU load violation index cpu_metric is extracted: cpu_metric = {plan_cpu}.
In the present invention, from the instance-level memory load element specification sla4inst_mem and the memory load element-violation element vf_mem, the memory load violation index mem_metric is extracted: mem_metric = {plan_mem}.
Step six, mapping and constructing the state element-violation element contact tuples;
In the invention, from the state element set of the cloud service event
Figure GDA0003741497200000181
and its cloud service event violation element set
Figure GDA0003741497200000182
the state element-violation element contact tuples are mapped, yielding the state element-violation element contact tuple set, denoted PSV.
Here:
Figure GDA0003741497200000183
PSV_TIME represents the "duration state element-duration violation element" contact tuple.
PSV_LOCATION represents the "location state element-location violation element" contact tuple.
PSV_NUMBER represents the "number state element-number violation element" contact tuple.
PSV_RETRY represents the "retry state element-retry violation element" contact tuple.
PSV_OPERATION represents the "operation state element-operation violation element" contact tuple.
PSV_CPU represents the "CPU load state element-CPU load violation element" contact tuple.
PSV_MEM represents the "memory load state element-memory load violation element" contact tuple.
In the invention, from the duration state element TIME of an event and its violation element vf_longTail, the "duration state element-duration violation element" contact tuple PSV_TIME is mapped:
PSV_TIME = (end_time, start_time).
In the invention, from the location state element LOCATION of an event and its violation element vf_location, the "location state element-location violation element" contact tuple PSV_LOCATION is mapped:
PSV_LOCATION = (machine_id, job_name, task_name).
In the invention, from the number state element NUMBER of an event and its violation element vf_number, the "number state element-number violation element" contact tuple PSV_NUMBER is mapped:
PSV_NUMBER = (inst_name).
In the invention, from the retry state element RETRY of an event and its violation element vf_retry, the "retry state element-retry violation element" contact tuple PSV_RETRY is mapped:
PSV_RETRY = (seq_no, total_seq_no).
In the invention, from the operation state element OPERATION of an event and its violation element vf_operation, the "operation state element-operation violation element" contact tuple PSV_OPERATION is mapped:
PSV_OPERATION = (status).
In the invention, from the CPU load state element CPU of an event and its violation element vf_cpu, the "CPU load state element-CPU load violation element" contact tuple PSV_CPU is mapped:
PSV_CPU = (cpu_avg, cpu_max).
In the invention, from the memory load state element MEM of an event and its violation element vf_mem, the "memory load state element-memory load violation element" contact tuple PSV_MEM is mapped:
PSV_MEM = (mem_avg, mem_max).
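The mapping from state elements and violation elements to contact tuples can be sketched as a field-matching step. The field lists are the ones given in this document; the matching rule (keep each violation-element field that also belongs to the corresponding state element, in the order of the violation element) is an assumption chosen because it reproduces the tuples listed above.

```python
EVENT_STATUS = {
    "TIME": ("start_time", "end_time"),
    "LOCATION": ("machine_id", "job_name", "task_name"),
    "NUMBER": ("inst_name",),
    "RETRY": ("seq_no", "total_seq_no"),
    "OPERATION": ("status",),
    "CPU": ("cpu_avg", "cpu_max"),
    "MEM": ("mem_avg", "mem_max"),
}

VIOLATION_ELEMENTS = {
    "TIME": ("end_time", "start_time"),                    # vf_longTail
    "LOCATION": ("machine_id", "job_name", "task_name"),   # vf_location
    "NUMBER": ("inst_name",),                              # vf_number
    "RETRY": ("seq_no", "total_seq_no"),                   # vf_retry
    "OPERATION": ("status",),                              # vf_operation
    "CPU": ("cpu_avg", "cpu_max"),                         # vf_cpu
    "MEM": ("mem_avg", "mem_max"),                         # vf_mem
}


def map_contact_tuples(event_status: dict, violation_elements: dict) -> dict:
    """Build PSV by keeping violation-element fields found in the state element."""
    psv = {}
    for name, vf_fields in violation_elements.items():
        state_fields = set(event_status[name])
        psv["PSV_" + name] = tuple(f for f in vf_fields if f in state_fields)
    return psv


PSV = map_contact_tuples(EVENT_STATUS, VIOLATION_ELEMENTS)
print(PSV["PSV_TIME"])  # ('end_time', 'start_time')
```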
In the invention, if a state element of the cloud service event is a factor involved in violating a specification, that state element of the event is a violation element.
Step seven, constructing the state element-index contact tuples;
Step 701, from the mapped state element-violation element contact tuple set
Figure GDA0003741497200000191
and the extracted specification-index set
Figure GDA0003741497200000192
the state element-index tuple set of the cloud service event is constructed and denoted PSM.
Here:
Figure GDA0003741497200000193
In the present invention, the duration state element-duration violation element index tuple
Figure GDA0003741497200000201
In the present invention, a location state element-location violation element index tuple
Figure GDA0003741497200000202
In the present invention, a number state element-number violation element index tuple
Figure GDA0003741497200000203
In the present invention, retry state element-retry violation element index tuple
Figure GDA0003741497200000204
In the present invention, an operation state element-operation violation element index tuple
Figure GDA0003741497200000205
In the invention, CPU load state element-CPU load violation element index tuple
Figure GDA0003741497200000206
In the invention, the memory load state element-memory load violation element index tuple
Figure GDA0003741497200000207
Step 702, from the state element-index tuple set
Figure GDA0003741497200000208
the Cartesian product of the status event and the index is taken to construct the status-index contact tuple of the cloud service event, denoted RSM;
RSM = EVENT_STATUS × METRIC
EVENT_STATUS represents the instance status event.
METRIC represents the event violation index.
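The pairing of state elements with violation indexes can be illustrated on the element names alone. The sketch below shows both the full Cartesian product and the row-aligned pairing, since the vector sample described later in this document lists seven state-element rows beside seven index rows; which of the two the image-only definition of RSM intends is not recoverable here, so both views are assumptions.

```python
from itertools import product

STATE_ELEMENTS = ["TIME", "LOCATION", "NUMBER", "RETRY", "OPERATION", "CPU", "MEM"]
METRICS = [
    "longTail_metric", "location_metric", "number_metric", "retry_metric",
    "operation_metric", "cpu_metric", "mem_metric",
]

# Full Cartesian product: every state element paired with every index.
rsm_pairs = list(product(STATE_ELEMENTS, METRICS))

# Row-aligned view: each state element beside its own index.
aligned = list(zip(STATE_ELEMENTS, METRICS))

print(len(rsm_pairs))  # 49
print(aligned[3])      # ('RETRY', 'retry_metric')
```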
In the invention, the constructed cloud service event status-index contact tuple
Figure GDA0003741497200000211
follows from the relation among the state elements of the event, the violation elements of the specification and their indexes: a cloud service event is in violation if its state elements fail to meet or exceed the indexes.
Step eight, generating the status-index vectorized sample of the cloud service event;
The vectorization method is analogous to the word2vec method used for natural language: the cloud service events and their service level contract data are quantized into vectors.
Referring to the state element quantization flow diagram shown in FIG. 4, the instance set
Figure GDA0003741497200000212
is read in, and the state elements of each instance status event are traversed
Figure GDA0003741497200000213
If, for any instance
Figure GDA0003741497200000214
the state element
Figure GDA0003741497200000215
is not empty, then: the numerical part of the location state element value and of the number state element value is extracted; the operation state element value is mapped to an integer ("Terminated" state mapping to value 0, "Ready" state mapping to value 1, "Running" state mapping to value 2, "Terminated" state mapping to value 3, "Interrupted" state mapping to value 4, and "Failed" state mapping to value 5); the values of the duration, retry, CPU load and memory load state elements are saved as they are if they are numerical; and any null value of a CPU load or memory load state element is filled with the value 0.
If, for any instance
Figure GDA0003741497200000216
the state element
Figure GDA0003741497200000217
is empty, the traversal is finished and quantization has been completed for
Figure GDA0003741497200000221
covering all of its state elements
Figure GDA0003741497200000222
Finally, the quantization result of the state elements is saved to a file.
The violation indexes that relate to the state of the instance are quantized in the same way as the event state elements, except for the operation violation index. That is, the operation violation index is mapped to the value 0, except that the Failed state and the Interrupted state of the instance are quantized the same as the event state element (i.e., the "Interrupted" state maps to value 4 and the "Failed" state maps to value 5).
Finally, the "status-index" vector sample of the cloud service event is generated.
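The quantization rules above can be sketched as follows. Several points are assumptions: identifiers are reduced to their digits with a regular expression, an identifier without digits becomes 0, and any status not listed in the table falls back to 0. The text assigns the value 0 to a state whose name is ambiguous in this document, so that entry is left out of the table. Function names and the sample record are hypothetical.

```python
import re

# Status codes stated in the text for values 1 to 5.
STATUS_CODE = {"Ready": 1, "Running": 2, "Terminated": 3, "Interrupted": 4, "Failed": 5}


def numeric_part(value: str) -> int:
    """Keep only the digits of an identifier, e.g. 'm_1932' -> 1932."""
    digits = re.sub(r"\D", "", value)
    return int(digits) if digits else 0


def fill_null(value):
    """Replace a missing load value with 0."""
    return 0 if value is None else value


def quantize_event(event: dict) -> list:
    """Turn one event record into a flat numeric state vector."""
    return [
        event["start_time"], event["end_time"],            # duration
        numeric_part(event["machine_id"]),                  # location
        numeric_part(event["job_name"]),
        numeric_part(event["task_name"]),
        numeric_part(event["inst_name"]),                   # number
        event["seq_no"], event["total_seq_no"],             # retry
        STATUS_CODE.get(event["status"], 0),                # operation
        fill_null(event.get("cpu_avg")), fill_null(event.get("cpu_max")),
        fill_null(event.get("mem_avg")), fill_null(event.get("mem_max")),
    ]


def quantize_operation_index(status: str) -> int:
    """Operation violation index: 4 or 5 for Interrupted or Failed, else 0."""
    return STATUS_CODE[status] if status in ("Interrupted", "Failed") else 0


# Hypothetical record, not taken from any real trace.
sample_event = {
    "start_time": 100, "end_time": 160, "machine_id": "m_1932",
    "job_name": "j_7", "task_name": "M1", "inst_name": "ins_42",
    "seq_no": 1, "total_seq_no": 1, "status": "Failed",
    "cpu_avg": None, "cpu_max": 50.0, "mem_avg": 0.3, "mem_max": None,
}
print(quantize_event(sample_event))
```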
In the invention, one job from the Alibaba cluster trace v2018 data set is selected to generate the "status-index" vector sample of a cloud service event.
For example, a cloud service instance status event derived from the Alibaba cluster trace v2018 data set is
Figure GDA0003741497200000223
Here:
Figure GDA0003741497200000224
Vectorizing the event
Figure GDA0003741497200000225
with the method of the invention yields the vector sample, denoted sample;
Here:
Figure GDA0003741497200000226
Row 1 in the left bracket represents the duration state element of the event.
Row 2 in the left bracket represents the location state element of the event.
Row 3 in the left bracket represents the number state element of the event.
Row 4 in the left bracket represents the retry state element of the event.
Row 5 in the left bracket represents the operation state element of the event.
Row 6 in the left bracket represents the CPU load state element of the event.
Row 7 in the left bracket represents the memory load state element of the event.
Row 1 in the right bracket represents the long-tail violation index of the event.
Row 2 in the right bracket represents the location violation index of the event.
Row 3 in the right bracket represents the number violation index of the event.
Row 4 in the right bracket represents the retry violation index of the event.
Row 5 in the right bracket represents the operation violation index of the event.
Row 6 in the right bracket represents the CPU load violation index of the event.
Row 7 in the right bracket represents the memory load violation index of the event.
Step nine, verification;
The vectorization method of cloud service events and service level contracts is embedded in a K-nearest-neighbor (KNN) model to form an improved KNN model. Cloud service events are selected arbitrarily from the Alibaba cluster trace v2018 data set as the training set and test set of the model, in order to judge whether a cloud service event is in violation; a label of '1' indicates a violation and a label of '0' indicates no violation, as shown in Table 1.
TABLE 1 Input vector samples for the improved KNN model generated by the present method
Figure GDA0003741497200000231
Referring to fig. 5 and 6, the experimental results show that, applied with the improved KNN model, the method of the invention can accurately determine violations of cloud service events: the misjudgment rate stays below 0.06%, and the accuracy and recall stay at 99% or above. Moreover, the method achieves low misjudgment and high precision using only a small number of '1' samples. The method can provide a basis for configuring the security group rules of the ECS architecture in areas such as violation detection, anomaly detection and tracing, thereby achieving network access control and improving the state detection and packet filtering capability of the virtual firewall.
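For readers unfamiliar with KNN, the classification step can be sketched with a plain nearest-neighbor vote. This is a generic KNN, not the improved model of the invention, and the two-dimensional vectors below are made-up toy data rather than samples from the data set; the function name is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, labels, query, k=3):
    """Classify query by majority vote among its k nearest training vectors."""
    # Sort training indices by Euclidean distance to the query vector.
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy vectors: label 1 means violation, label 0 means no violation.
train = [[1.0, 0.0], [1.2, 0.1], [0.9, 0.2], [5.0, 4.0], [5.2, 4.1], [4.8, 3.9]]
labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train, labels, [5.1, 4.0]))  # 1
print(knn_predict(train, labels, [1.0, 0.1]))  # 0
```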

Claims (3)

1. A vectorization method of cloud service events and service level contract data, characterized by comprising the following steps:
Step one, formalizing the cloud service event;
Step 101, collecting logs of the cloud server;
collecting the log records of the jobs executed by the cloud server, wherein a job in the log records is denoted JOB, and a job contains multiple tasks; any instance under a task,
Figure FDA0003741497190000011
is recorded as one cloud service event;
Step 102, setting the field content of the cloud service event;
the field content of the cloud service event
Figure FDA0003741497190000012
is denoted
Figure FDA0003741497190000013
wherein
Figure FDA0003741497190000014
in
Figure FDA0003741497190000015
the subscript i denotes the identification number of the task, and the subscript j denotes the identification number of the instance;
start_time represents the start time of the instance;
end_time represents the end time of the instance;
machine_id represents the cloud server identifier;
task_name represents the task name;
job_name represents the job name;
inst_name represents the instance name;
seq_no represents the number of instance retries;
total_seq_no represents the total number of instance retries;
status represents the status of the instance;
cpu_avg represents the average CPU utilization of the instance;
cpu_max represents the maximum CPU utilization of the instance;
mem_avg represents the average memory usage of the instance;
mem_max represents the maximum memory usage of the instance;
Step two, constructing the state elements of the cloud service event;
Step 201, expressing the cloud service event as a subject-predicate sentence;
each field of the cloud service event
Figure FDA0003741497190000016
is a constituent of a sentence; applying sentence-constituent analysis, the subject part and the predicate part of the sentence are separated by a double vertical line; the subject-predicate expression of one cloud service event
Figure FDA0003741497190000021
is denoted SYS_EVENT, where SYS_EVENT = [time period] (specific) instance || [retry] in state <load>;
Step 202, representing the state of the cloud service event by state elements;
the subject-predicate pattern SYS_EVENT = [time period] (specific) instance || [retry] in state <load> is applied to the field semantics of each instance in the instance set
Figure FDA0003741497190000022
to divide them into sentence constituents, and the state element set of the cloud service event is constructed and denoted EVENT_STATUS, which comprises:
Figure FDA0003741497190000023
the duration state element TIME of the cloud service event describes the duration state of the cloud service event, TIME = {start_time, end_time};
the location state element LOCATION of the cloud service event describes the location state of the cloud service event, LOCATION = {machine_id, job_name, task_name};
the number state element NUMBER of the cloud service event describes the number state of the cloud service event, NUMBER = {inst_name};
the retry state element RETRY of the cloud service event describes the retry state of the cloud service event, RETRY = {seq_no, total_seq_no};
the operation state element OPERATION of the cloud service event describes the operation state of the cloud service event, OPERATION = {status};
the CPU load state element CPU of the cloud service event describes the CPU load state of the cloud service event, CPU = {cpu_avg, cpu_max};
the memory load state element MEM of the cloud service event describes the memory load state of the cloud service event, MEM = {mem_avg, mem_max};
for the field content of any cloud service event
Figure FDA0003741497190000024
the constructed cloud service event state element set
Figure FDA0003741497190000025
is:
Figure FDA0003741497190000031
Step three, formalizing the service level contract of the cloud service event;
the big data computing service MaxCompute service level contract is combined with the cloud service event field content
Figure FDA0003741497190000032
to construct the cloud service event-condition specification set SLAS;
wherein
Figure FDA0003741497190000033
Instance-level duration element specification
Figure FDA0003741497190000034
Instance-level location element specification
Figure FDA0003741497190000035
Job-level number element specification
Figure FDA0003741497190000036
Instance-level retry element specification
Figure FDA0003741497190000041
Instance-level operation element specification
Figure FDA0003741497190000042
Instance-level CPU load element specification
Figure FDA0003741497190000043
Instance-level memory load element specification
Figure FDA0003741497190000044
v represents a condition of predicate decision;
the predicate IsLongTail(·) indicates that the instance
Figure FDA0003741497190000045
has a running duration greater than or equal to the long-tail index;
longtail_metric represents, for the instance
Figure FDA0003741497190000046
its long-tail index;
the predicate IsUsued(·) indicates that the instance
Figure FDA0003741497190000047
is in the state Uue;
forbidden_machine represents the identifier of a cloud server that has been disabled;
the predicate IsOverRedNumber(·) indicates that the number of Reduce instances of the job exceeds the number index;
rInstNumberOfJob represents the number of Reduce instances in one job;
rNumber_metric represents the index for the number of Reduce instances in one job;
the predicate IsOverMapNumber(·) indicates that the number of Map instances of the job exceeds the number index;
mInstNumberOfJob represents the number of Map instances in one job;
mNumber_metric represents the index for the number of Map instances in one job;
the predicate IsOverRetry(·) indicates that the number of retries or the total number of retries of the instance exceeds the retry violation index;
retry_metric represents the retry violation index;
the predicate IsFailed(·) indicates that the instance
Figure FDA0003741497190000051
is in the Failed state;
the predicate IsReduceTask(·) indicates that the argument is a Reduce task;
the predicate IsMapTask(·) indicates that the argument is a Map task;
the predicate IsInterrupted(·) indicates that the instance
Figure FDA0003741497190000052
is in the Interrupted state;
the predicate IsOverPlanCPU(·) indicates that the instance
Figure FDA0003741497190000053
has a CPU load that exceeds the planned CPU load limit;
the predicate IsOverPlanEM(·) indicates that the instance
Figure FDA0003741497190000054
has a memory load that exceeds the planned memory load limit;
step four, extracting violation elements;
according to the specifications formulated in step three, the elements that appear in each specification are taken as violation elements;
a violation element is an element of the cloud service event-situation specification
Figure FDA0003741497190000055
that is involved in a violation of the specification; the violation elements extracted from the SLAS are assembled into a cloud service event violation element set
Figure FDA0003741497190000056
an element that violates the instance-level duration element specification sla4inst_time is called the duration violation element vf_longTail:
vf_longTail = <end_time, start_time>;
an element that violates the instance-level location element specification sla4inst_location is called the location violation element vf_location:
vf_location = {machine_id, job_name, task_name};
an element that violates the job-level number element specification sla4job_number is called the number violation element vf_number:
vf_number = {inst_name};
an element that violates the instance-level retry element specification sla4inst_retry is called the retry violation element vf_retry:
vf_retry = {seq_no, total_seq_no};
an element that violates the instance-level operation element specification sla4inst_operation is called the operation violation element vf_operation:
vf_operation = {status};
an element that violates the instance-level CPU load element specification sla4inst_cpu is called the CPU load violation element vf_cpu:
vf_cpu = {cpu_avg, cpu_max};
an element that violates the instance-level memory load element specification sla4inst_mem is called the memory load violation element vf_mem:
vf_mem = {mem_avg, mem_max};
a violation is behavior that fails to meet the cloud service event-situation specification
Figure FDA0003741497190000061
an event violation means that a cloud service event
Figure FDA0003741497190000062
violates the specification
Figure FDA0003741497190000063
in which case the event
Figure FDA0003741497190000064
is in violation;
a violation element is a factor that causes a violation of the specification
Figure FDA0003741497190000065
a violation element reveals, for the cloud service event
Figure FDA0003741497190000066
the nature of its violation; therefore, to generate the vector samples needed to determine violations accurately, it is necessary to find, for the cloud service event
Figure FDA0003741497190000067
the factors of its violation (i.e., the violation elements), which become, for the cloud service event
Figure FDA0003741497190000068
its violation elements; so that the factors by which a cloud service event may be suspected of violation are considered from multiple aspects, a cloud service event state element set is constructed
Figure FDA0003741497190000071
step five, extracting indexes;
from the cloud service event-situation specification
Figure FDA0003741497190000072
and the cloud service event violation element set
Figure FDA0003741497190000073
the violation limit values are extracted as violation indexes to obtain a specification-index set METRIC;
the above-mentioned
Figure FDA0003741497190000074
location_metric represents the instance-level location violation indicator;
number_metric represents the job-level number violation indicator;
operation_metric represents the instance-level operation violation indicator;
cpu_metric represents the instance-level CPU load violation indicator;
mem_metric represents the instance-level memory load violation indicator;
step six, mapping and constructing state element-violation element contact tuples;
according to the cloud service event state element set
Figure FDA0003741497190000081
and its cloud service event violation element set
Figure FDA0003741497190000082
the state element-violation element contact tuples are mapped to obtain a state element-violation element contact tuple set, denoted PSV;
the described
Figure FDA0003741497190000083
PSV_TIME represents the "duration state element-duration violation element" contact tuple;
PSV_LOCATION represents the "location state element-location violation element" contact tuple;
PSV_NUMBER represents the "number state element-number violation element" contact tuple;
PSV_RETRY represents the "retry state element-retry violation element" contact tuple;
PSV_OPERATION represents the "operation state element-operation violation element" contact tuple;
PSV_CPU represents the "CPU load state element-CPU load violation element" contact tuple;
PSV_MEM represents the "memory load state element-memory load violation element" contact tuple;
according to the duration state element TIME of the event and its violation element vf_longTail, the "duration state element-duration violation element" contact tuple PSV_TIME is mapped:
PSV_TIME=(end_time,start_time);
according to the location state element LOCATION of the event and its violation element vf_location, the "location state element-location violation element" contact tuple PSV_LOCATION is mapped:
PSV_LOCATION=(machine_id,job_name,task_name);
according to the number state element NUMBER of the event and its violation element vf_number, the "number state element-number violation element" contact tuple PSV_NUMBER is mapped:
PSV_NUMBER=(inst_name);
according to the retry state element RETRY of the event and its violation element vf_retry, the "retry state element-retry violation element" contact tuple PSV_RETRY is mapped:
PSV_RETRY=(seq_no,total_seq_no);
according to the operation state element OPERATION of the event and its violation element vf_operation, the "operation state element-operation violation element" contact tuple PSV_OPERATION is mapped:
PSV_OPERATION=(status);
according to the CPU load state element CPU of the event and its violation element vf_cpu, the "CPU load state element-CPU load violation element" contact tuple PSV_CPU is mapped:
PSV_CPU=(cpu_avg,cpu_max);
according to the memory load state element MEM of the event and its violation element vf_mem, the "memory load state element-memory load violation element" contact tuple PSV_MEM is mapped:
PSV_MEM=(mem_avg,mem_max);
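The seven mappings above can be sketched as a single table-driven function. The field order of each tuple is copied from the PSV definitions; the names `PSV_FIELDS` and `map_psv` and the use of a dict as the event record are assumptions made for illustration only.

```python
# Field order of each contact tuple, copied from the PSV_* definitions.
PSV_FIELDS = {
    "PSV_TIME": ("end_time", "start_time"),
    "PSV_LOCATION": ("machine_id", "job_name", "task_name"),
    "PSV_NUMBER": ("inst_name",),
    "PSV_RETRY": ("seq_no", "total_seq_no"),
    "PSV_OPERATION": ("status",),
    "PSV_CPU": ("cpu_avg", "cpu_max"),
    "PSV_MEM": ("mem_avg", "mem_max"),
}


def map_psv(event):
    """Build every contact tuple for one event record (a dict).

    A field that is absent from the record appears as None in its tuple.
    """
    return {
        name: tuple(event.get(field) for field in fields)
        for name, fields in PSV_FIELDS.items()
    }
```

Note that PSV_TIME lists end_time before start_time, as in the definition of vf_longTail.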
step seven, constructing state element-index element contact tuples;
step 701, according to the mapped state element-violation element contact tuple set
Figure FDA0003741497190000091
and the extracted specification-index set
Figure FDA0003741497190000101
a state element-index element tuple set of the cloud service event is constructed, denoted PSM;
the above-mentioned
Figure FDA0003741497190000102
PSM_TIME represents the duration state element-duration violation element index tuple;
PSM_LOCATION represents the location state element-location violation element index tuple;
PSM_NUMBER represents the number state element-number violation element index tuple;
PSM_RETRY represents the retry state element-retry violation element index tuple;
PSM_OPERATION represents the operation state element-operation violation element index tuple;
PSM_CPU represents the CPU load state element-CPU load violation element index tuple;
PSM_MEM represents the memory load state element-memory load violation element index tuple;
duration state element-duration violation element index tuple
Figure FDA0003741497190000103
Location state element-location violation element index tuple
Figure FDA0003741497190000104
Number state element-number violation element index tuple
Figure FDA0003741497190000111
Retry state element-retry violation element index tuple
Figure FDA0003741497190000112
Operation state element-operation violation element index tuple
Figure FDA0003741497190000113
CPU load state element-CPU load violation element index tuple
Figure FDA0003741497190000114
Memory load state element-memory load violation element index tuple
Figure FDA0003741497190000115
step 702, according to the state element-index element tuple set
Figure FDA0003741497190000116
the Cartesian product of the condition events and the indexes is taken to construct the condition-index contact tuples of the cloud service event, denoted RSM;
RSM = EVENT_STATUS × METRIC
EVENT_STATUS represents the instance condition events;
METRIC represents the event violation indexes;
the constructed cloud service event condition-index contact tuples are
Figure FDA0003741497190000121
n represents the total number of instances;
plan_cpu represents the instance-level CPU load violation indicator;
plan_mem represents the instance-level memory load violation indicator;
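The Cartesian product of step 702 can be illustrated with `itertools.product`. The exact contents of EVENT_STATUS and METRIC are given in figures that are not reproduced here, so the sample values below are placeholders, and the function name `build_rsm` is hypothetical.

```python
from itertools import product


def build_rsm(event_status, metric):
    """Pair every instance condition event with every violation index."""
    return list(product(event_status, metric.items()))


# Placeholder inputs: two instance condition events and three indexes
# whose values appear in equations (8), (9) and (11).
event_status = [("inst_1", "Failed"), ("inst_2", "Running")]
metric = {"rNumber_metric": 2000, "mNumber_metric": 8000, "retry_metric": 3}

rsm = build_rsm(event_status, metric)
```

With n condition events and m indexes the result holds n × m contact tuples, so for large traces a generator may be preferable to a materialized list.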
step eight, generating condition-index vectorized samples of the cloud service event;
the vectorization method is analogous to the word2vec method used for natural language, and quantizes cloud service events and their service level contract data into vectors;
a set of instances is read in
Figure FDA0003741497190000122
and the state elements of each instance condition event are traversed
Figure FDA0003741497190000123
if, for any instance
Figure FDA0003741497190000124
its state element
Figure FDA0003741497190000125
is not empty, the numerical values contained in the values of the location state element and the number state element are extracted;
the Terminated state is mapped to a value of 0;
the Ready state is mapped to a numerical value of 1;
the Running state is mapped to a numerical value of 2;
the Terminating state is mapped to a value of 3;
the Interrupted state is mapped to a value of 4;
the Failed state is mapped to a value of 5;
if the values of the duration state element, the retry state element, the CPU load state element and the memory load state element are numerical, they are saved as they are; if the values of the CPU load state element or the memory load state element are null, the null values are filled with 0;
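The quantization rules of step eight can be sketched as three small helpers. The status codes are those listed above, with the label for value 4 assumed to be `Interrupted` to match the IsInterrupted predicate. The helper names, and the assumption that identifiers such as `m_1932` carry their numeric part as a run of digits, are illustrative and not taken from the source.

```python
import re

# Status-to-number mapping as listed in step eight.
STATUS_CODES = {
    "Terminated": 0,
    "Ready": 1,
    "Running": 2,
    "Terminating": 3,
    "Interrupted": 4,  # assumed label for value 4
    "Failed": 5,
}


def extract_number(value):
    """Return the first run of digits in an identifier, or None if there is none."""
    match = re.search(r"\d+", str(value))
    return int(match.group()) if match else None


def fill_null(value):
    """Replace a missing CPU or memory load value with 0; keep other values."""
    return 0 if value is None or value == "" else value
```

For example, `extract_number("m_1932")` yields 1932 and `fill_null(None)` yields 0.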
if, for any instance
Figure FDA0003741497190000131
its state element
Figure FDA0003741497190000132
is empty, the traversal is complete and the quantization of the instance
Figure FDA0003741497190000133
has covered all of its state elements
Figure FDA0003741497190000134
finally, the quantization results of the state elements are saved to a file.
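The traversal and the final save can be sketched as a loop that writes one numeric row per instance. The CSV layout, the column order and the function name `save_quantized` are assumptions; the source only states that the quantization results are saved to a file.

```python
import csv
import io

NUMERIC_FIELDS = ["start_time", "end_time", "seq_no", "total_seq_no"]
LOAD_FIELDS = ["cpu_avg", "cpu_max", "mem_avg", "mem_max"]


def save_quantized(instances, out):
    """Write one row per instance to the open text file `out`; return the row count."""
    writer = csv.writer(out)
    writer.writerow(NUMERIC_FIELDS + LOAD_FIELDS)
    count = 0
    for inst in instances:
        # Duration and retry values are saved as they are.
        row = [inst.get(field) for field in NUMERIC_FIELDS]
        # Only CPU and memory load values are null-filled with 0.
        row += [0 if inst.get(field) is None else inst.get(field) for field in LOAD_FIELDS]
        writer.writerow(row)
        count += 1
    return count


# An in-memory buffer stands in for a real output file in this sketch.
buffer = io.StringIO()
```

Passing a file opened with `open(path, "w", newline="")` instead of the buffer would persist the rows to disk.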
2. The method for vectorizing cloud service event and service level contract data according to claim 1, wherein each specification in the cloud service event-situation specification SLAS is defined as follows;
the instance-level duration element specification is formalized as equation (1):
Figure FDA0003741497190000135
Figure FDA0003741497190000136
v represents a condition of predicate decision;
when the predicate decision result of v is
Figure FDA0003741497190000137
equation (1) is violated, which is recorded as a violation of the instance-level duration element specification, i.e., the state element is the violation element vf_longTail;
when the predicate decision result of v is not
Figure FDA0003741497190000138
the instance-level duration element specification is satisfied;
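A minimal sketch of the duration check follows. Equation (1) itself is an image that is not reproduced here, so the comparison is based on the description of the predicate IsLongTail in claim 1 (running duration greater than or equal to the long-tail index). The function name `check_duration` is hypothetical, and the value of longtail_metric is not stated in the text, so it is left as a parameter.

```python
def check_duration(event, longtail_metric):
    """Return vf_longTail as (end_time, start_time) if the instance is long-tailed.

    Returns None when the instance-level duration element specification
    is satisfied.
    """
    duration = event["end_time"] - event["start_time"]
    if duration >= longtail_metric:
        return (event["end_time"], event["start_time"])
    return None
```

Returning the violation element rather than a bare boolean keeps the result directly usable as the vf_longTail tuple.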
the instance-level location element specification is formalized as equation (3):
Figure FDA0003741497190000141
Figure FDA0003741497190000142
v represents a condition of predicate decision;
when the predicate decision result of v is
Figure FDA0003741497190000143
equation (3) is violated, which is recorded as a violation of the instance-level location element specification, i.e., the state element is the violation element vf_location;
when the predicate decision result of v is not
Figure FDA0003741497190000144
the instance-level location element specification is satisfied;
the job-level number element specification is formalized as equation (5):
Figure FDA0003741497190000145
Figure FDA0003741497190000146
Figure FDA0003741497190000147
rNumber_metric=2000 (8)
mNumber_metric=8000 (9)
v represents a condition of predicate decision;
when the predicate decision result of v is
Figure FDA0003741497190000148
equation (5) is violated, which is recorded as a violation of the job-level number element specification, i.e., the state element is the violation element vf_number;
when the predicate decision result of v is not
Figure FDA0003741497190000149
the job-level number element specification is satisfied;
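The job-level number check can be sketched as a per-job count compared against the limits of equations (8) and (9). Equations (5) to (7) are images that are not reproduced here, so the strict "exceeds" comparison follows the predicate descriptions in claim 1. Recognizing Map and Reduce tasks by the first letter of task_name ("M" or "R") is an assumption of this sketch, as is the function name `number_violations`.

```python
from collections import Counter

R_NUMBER_METRIC = 2000  # equation (8)
M_NUMBER_METRIC = 8000  # equation (9)


def number_violations(instances):
    """Return the sorted names of jobs whose Map or Reduce instance count exceeds its limit."""
    counts = Counter()
    for inst in instances:
        kind = inst["task_name"][:1].upper()  # assumed naming convention
        if kind in ("M", "R"):
            counts[(inst["job_name"], kind)] += 1
    limits = {"M": M_NUMBER_METRIC, "R": R_NUMBER_METRIC}
    return sorted({job for (job, kind), n in counts.items() if n > limits[kind]})
```

A job with exactly 2000 Reduce instances is not reported, because the predicate requires the count to exceed the index.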
the instance-level retry element specification is formalized as equation (10):
Figure FDA0003741497190000151
retry_metric=3 (11)
v represents a condition of predicate decision;
when the predicate decision result of v is
Figure FDA0003741497190000152
equation (10) is violated, which is recorded as a violation of the instance-level retry element specification, i.e., the state element is the violation element vf_retry;
when the predicate decision result of v is not
Figure FDA0003741497190000153
the instance-level retry element specification is satisfied;
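The retry check can be sketched directly from the description of the predicate IsOverRetry in claim 1 and the limit in equation (11). Equation (10) is an image that is not reproduced here, so the exact form is assumed; the function name `is_over_retry` is hypothetical.

```python
RETRY_METRIC = 3  # equation (11)


def is_over_retry(seq_no, total_seq_no, retry_metric=RETRY_METRIC):
    """True when the retry count or the total retry count exceeds the retry index."""
    return seq_no > retry_metric or total_seq_no > retry_metric
```

When the check returns True, the pair (seq_no, total_seq_no) is the violation element vf_retry.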
the instance-level operation element specification is formalized as equation (12):
Figure FDA0003741497190000161
retry_metric=3 (13)
v represents a condition of predicate decision;
when the predicate decision result of v is
Figure FDA0003741497190000162
equation (12) is violated, which is recorded as a violation of the instance-level operation element specification, i.e., the state element is the violation element vf_operation;
when the predicate decision result of v is not
Figure FDA0003741497190000171
the instance-level operation element specification is satisfied;
the instance-level CPU load element specification, expressed in Chinese, is formalized as equation (14):
Figure FDA0003741497190000172
v represents a condition of predicate decision;
when the predicate decision result of v is
Figure FDA0003741497190000173
equation (14) is violated, which is recorded as a violation of the instance-level CPU load element specification, i.e., the state element is the violation element vf_cpu;
when the predicate decision result of v is not
Figure FDA0003741497190000174
the instance-level CPU load element specification is satisfied;
the instance-level memory load element specification, expressed in Chinese, is formalized as equation (15):
Figure FDA0003741497190000175
v represents a condition of predicate decision;
when the predicate decision result of v is
Figure FDA0003741497190000176
equation (15) is violated, which is recorded as a violation of the instance-level memory load element specification, i.e., the state element is the violation element vf_mem;
when the predicate decision result of v is not
Figure FDA0003741497190000177
the instance-level memory load element specification is satisfied.
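The CPU and memory load checks share one shape and can be sketched with a single helper. Equations (14) and (15) are images that are not reproduced here, so this sketch assumes that a violation occurs when either the average or the maximum load exceeds the planned limit (plan_cpu or plan_mem); that rule and the function name `is_over_plan` are assumptions. Null values are treated as 0, in line with the null-filling rule of step eight.

```python
def is_over_plan(avg, maximum, plan):
    """True when the average or the maximum load exceeds the planned limit."""
    avg = 0 if avg is None else avg
    maximum = 0 if maximum is None else maximum
    return avg > plan or maximum > plan
```

The same call serves both specifications, for example `is_over_plan(cpu_avg, cpu_max, plan_cpu)` and `is_over_plan(mem_avg, mem_max, plan_mem)`.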
3. The method for vectorizing cloud service event and service level contract data according to claim 1, wherein the cloud server log is taken from the Alibaba cluster trace v2018 data set.
CN202110372833.0A 2021-04-07 2021-04-07 Vectorization method of cloud service event and service level contract data Active CN112948132B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110372833.0A CN112948132B (en) 2021-04-07 2021-04-07 Vectorization method of cloud service event and service level contract data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110372833.0A CN112948132B (en) 2021-04-07 2021-04-07 Vectorization method of cloud service event and service level contract data

Publications (2)

Publication Number Publication Date
CN112948132A CN112948132A (en) 2021-06-11
CN112948132B true CN112948132B (en) 2022-09-06

Family

ID=76230852

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110372833.0A Active CN112948132B (en) 2021-04-07 2021-04-07 Vectorization method of cloud service event and service level contract data

Country Status (1)

Country Link
CN (1) CN112948132B (en)

Also Published As

Publication number Publication date
CN112948132A (en) 2021-06-11

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant