CN111553401B

CN111553401B - QoS prediction method applied to cloud service recommendation and based on graph model

Info

Publication number: CN111553401B
Application number: CN202010322193.8A
Authority: CN
Inventors: 丁丁; 畅振华; 李浥东; 夏有昊
Original assignee: Beijing Jiaotong University
Current assignee: Beijing Jiaotong University
Priority date: 2020-04-22
Filing date: 2020-04-22
Publication date: 2024-02-13
Anticipated expiration: 2040-04-22
Also published as: CN111553401A

Abstract

The invention provides a graph-model-based QoS prediction method for cloud service recommendation. The method comprises the following steps: constructing a full graph model containing multi-source information, in which nodes represent users and services and edges are weighted by the similarity between users, the similarity between services, and the similarity between users and services; dividing the full graph model into a plurality of subgraph models; applying an optimized probabilistic matrix factorization algorithm to the full graph model and to the subgraph models, respectively, to obtain the globally and locally predicted QoS; and adaptively fusing the globally and locally predicted QoS to obtain the final predicted QoS. The method fully considers the influence of multi-source information on QoS and adaptively fuses local and global characteristics to improve QoS prediction accuracy. It can accurately predict missing QoS values, fill in the sparse QoS matrix and increase its density, thereby alleviating, to a certain extent, the QoS sparsity problem in the field of cloud service recommendation.

Description

QoS prediction method applied to cloud service recommendation and based on graph model

Technical Field

The invention relates to the technical field of computer applications, and in particular to a graph-model-based QoS prediction method applied to cloud service recommendation.

Background

SOA (Service-Oriented Architecture) is widely used in distributed computing environments, especially in cloud computing, and services, the core part of SOA, are a popular way of providing configurable functions. As a result, there are many redundant services that are functionally similar but differ in QoS (Quality of Service). Faced with this mass of services, users have great difficulty selecting an appropriate cloud service. Cloud service recommendation systems aim to solve the problem that users cannot quickly select suitable services because of the large number of cloud services and the resulting information overload. Among services with the same function, QoS is the key to recommending an appropriate service to a user. QoS is a set of attributes describing the non-functional properties of a cloud service, including response time, throughput, reputation, invocation failure rate, stability, etc. QoS values of different Web services may vary widely because network conditions change and the users of the services are located in different network environments. Because the number of services is large and invocation is costly, the QoS observed by individual users is extremely sparse, so recommending cloud services directly from observed QoS is not feasible. The core of a cloud service recommendation system based on non-functional properties can therefore be regarded as QoS prediction: the missing QoS values are predicted, and appropriate services are recommended to users according to the predicted values.

Many researchers are currently devoted to QoS prediction. Early work predicted QoS with static approaches, which use arithmetic means, such as the average QoS value over all users or over all services, as the prediction. These methods are simple and feasible and do not need to consider the context of users and services, but they cannot reflect the dynamic characteristics of QoS, and their results are often inaccurate.

Inspired by traditional recommender systems, many researchers have applied collaborative filtering (CF) algorithms to cloud service recommendation. A CF algorithm uses the history of users invoking services to discover similarities between users' experiences or between services, and predicts QoS from these similarities. CF algorithms fall into two categories: memory-based and model-based. Researchers first proposed user-based and service-based collaborative filtering for QoS prediction, applying memory-based prediction to Web service recommendation; however, these studies considered QoS alone and ignored the context information of users and services, so their predictions are limited. Researchers then began to cluster services and users by geographic information and to predict QoS by fusing matrices after clustering, which improved QoS prediction accuracy to a certain extent. Although these studies improved the QoS prediction model and added a great deal of context information, they did not consider how to integrate such multi-source information effectively while preserving the original connections between users and services. Moreover, local and global information is usually fused with linearly tuned parameters, which makes finding suitable parameter values costly and the result not robust. Finally, these methods do not fully address the cold-start problem and often simply ignore it.

FIG. 1 shows an example of four users invoking five services. Previous studies extract one-sided features to cluster users and services, dividing the four users into two groups and the five services into two groups, where the users or services within a group are regarded as closely related neighbors. However, this approach tends to ignore the invocation records generated when users invoke services, which are the most important link between users and services (shown by the dashed lines in FIG. 1).

Therefore, a problem to be solved is how to take the tight connections between users and services into account when predicting QoS, and how to mine the relationships among multi-source information in depth and exploit them fully to improve QoS prediction accuracy.

Disclosure of Invention

The embodiments of the invention provide a graph-model-based QoS prediction method for the field of cloud service recommendation, so as to overcome the above problems in the prior art.

In order to achieve the above purpose, the present invention adopts the following technical scheme.

A graph-model-based QoS prediction method applied to cloud service recommendation comprises the following steps:

constructing a full graph model containing multi-source information, in which nodes represent users and services and edges are weighted by the similarity between users, the similarity between services, and the similarity between users and services;

dividing the full graph model into a plurality of subgraph models;

optimizing the full graph model and the subgraph models, respectively, with a probabilistic matrix factorization algorithm to obtain the globally and locally predicted QoS;

and adaptively fusing the globally and locally predicted QoS to obtain the final predicted QoS.

Preferably, constructing the full graph model containing multi-source information, in which nodes represent users and services and edges are weighted by the similarity between users, the similarity between services, and the similarity between users and services, comprises:

calculating the geographic similarity W_lo(u_i, u_j) between users from the users' longitude and latitude information;

calculating the network-location similarity W_AS(u_i, u_j) between users from the users' autonomous system information;

considering the longitude/latitude information and the autonomous system information of the users together to obtain the final similarity between user u_i and user u_j, as shown in formula (1):

wherein η₁ is a given threshold: an edge weighted by the final similarity is constructed between user u_i and user u_j only when the final similarity is greater than η₁; otherwise there is no edge between them; λ₁ and λ₂ denote the weights of the user's geographic location information and autonomous system information, respectively, in the final similarity;

calculating the semantic similarity W_ws(s_i, s_j) between services from the services' WSDL information;

calculating the network-location similarity W_AS(s_i, s_j) between services from the autonomous system information of the networks in which the services are located;

considering the WSDL information and the autonomous system information of the services together to obtain the final similarity between service s_i and service s_j, as shown in formula (2):

wherein η₂ is a given threshold: an edge weighted by the final similarity is constructed between service s_i and service s_j only when the final similarity is greater than η₂; otherwise there is no edge between them; γ₁ and γ₂ denote the weights of the service's WSDL information and autonomous system information, respectively, in the final similarity.

The similarity between users and services is obtained from the QoS matrix by the following transformation:

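$$E_{u_i,s_j} = rt_{i,j}$$

$$\mathrm{rtmax}_j = \max\{rt_{ij} \mid i = 1, 2, \ldots, m\}$$

$$\mathrm{rtmin}_j = \min\{rt_{ij} \mid i = 1, 2, \ldots, m\}$$

$$\mathrm{rtmax} = (\mathrm{rtmax}_1, \mathrm{rtmax}_2, \ldots, \mathrm{rtmax}_n)$$

$$\mathrm{rtmin} = (\mathrm{rtmin}_1, \mathrm{rtmin}_2, \ldots, \mathrm{rtmin}_n)$$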

the QoS values (response times, RT) of the service invocations are first normalized, and the normalized RT is then converted into a similarity using formula (3), where rtmax_j and rtmin_j denote the maximum and minimum response time of service s_j, rtmax and rtmin collect these maxima and minima over all services, and I_{m×1} denotes an m×1 column vector over the m users;

the similarity between users, the similarity between services and the similarity between users and services are used as the weights of three kinds of edges, and users and services are used as nodes, to construct a full graph model containing the users, the services and the three kinds of edges: G = {V, E}, where V = {U, S}, and U = {u₁, u₂, ..., u_m} and S = {s₁, s₂, ..., s_n} denote the sets of m users and n services, respectively; the edge set is E = {E_uu, E_ss, E_us}, where E_uu, E_ss and E_us denote the edges weighted by the similarity between users, between services, and between users and services, respectively.
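A minimal Python/NumPy sketch of how the graph G = {V, E} described above might be stored as one weighted adjacency matrix, with users as nodes 0..m-1 and services as nodes m..m+n-1. The function name `build_full_graph`, the zero-weight convention for "no edge" and the removal of self-loops are illustrative assumptions, not part of the patent text.

```python
import numpy as np

def build_full_graph(user_sim, service_sim, user_service_sim, eta1, eta2):
    """Assemble the (m+n) x (m+n) weighted adjacency matrix of G = {V, E}.

    user_sim (m x m) and service_sim (n x n) hold final similarities; pairs not
    above eta1 / eta2 get weight 0 (no edge). user_service_sim (m x n) holds the
    E_us weights, with 0 meaning no invocation record.
    """
    m, n = user_service_sim.shape
    E_uu = np.where(user_sim > eta1, user_sim, 0.0)        # thresholded user-user edges
    E_ss = np.where(service_sim > eta2, service_sim, 0.0)  # thresholded service-service edges
    np.fill_diagonal(E_uu, 0.0)                            # ASSUMPTION: no self-loops
    np.fill_diagonal(E_ss, 0.0)
    A = np.zeros((m + n, m + n))
    A[:m, :m] = E_uu
    A[m:, m:] = E_ss
    A[:m, m:] = user_service_sim                           # user -> service edges
    A[m:, :m] = user_service_sim.T                         # undirected: keep A symmetric
    return A

U_sim = np.array([[1.0, 0.9], [0.9, 1.0]])
S_sim = np.array([[1.0, 0.2, 0.7], [0.2, 1.0, 0.1], [0.7, 0.1, 1.0]])
US_sim = np.array([[0.8, 0.0, 0.5], [0.0, 0.6, 0.0]])
A = build_full_graph(U_sim, S_sim, US_sim, eta1=0.5, eta2=0.5)
print(A.shape)
```

Keeping all three edge types in one symmetric matrix is convenient because a spectral graph cut operates directly on such a matrix.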

calculating the geographic similarity W_lo(u_i, u_j) between users from their longitude and latitude information:

dis(u_i, u_j) denotes the Euclidean distance between user u_i and user u_j, which is converted into the geographic similarity W_lo(u_i, u_j) by formula (5); x_i and y_i denote the longitude and latitude of user u_i's geographic location, and x_j and y_j those of user u_j.

calculating the network-location similarity W_AS(u_i, u_j) between user u_i and user u_j from the users' autonomous system information by formula (6):

de-textualizing the WSDL information: removing the structured format information, extracting the feature words from the WSDL information, and converting these keywords into semantic similarity between services with the tf-idf algorithm from the field of natural language processing, using the following two formulas:

in formula (7), M is the total number of web pages searched in Google, f(x) and f(y) are the numbers of hits returned by Google searches for the feature words x and y (log f(x) and log f(y) are their logarithms), and f(x, y) is the number of web pages containing both x and y; in formula (8), the two vectors are the feature word vectors of services s_i and s_j, |·| denotes the cardinality of a vector, and W_ws(s_i, s_j) is the semantic similarity of services s_i and s_j in terms of their WSDL information.

calculating the network-location similarity W_AS(s_i, s_j) between service s_i and service s_j from the services' autonomous system information:

Preferably, dividing the full graph model into a plurality of subgraph models comprises:

converting the full graph model G into a similarity matrix containing the users, the services and the similarities between them;

constructing an adjacency matrix from the similarity matrix, together with a degree matrix;

constructing a Laplacian matrix from the adjacency matrix and the degree matrix;

normalizing the Laplacian matrix;

calculating the K smallest eigenvalues of the Laplacian matrix and their eigenvectors;

combining the K eigenvectors into an eigenvector matrix and normalizing it row by row;

clustering the rows of the normalized eigenvector matrix to obtain the set of K subgraphs {Sub₁, Sub₂, ..., Sub_K}, where K is the number of subgraphs, Sub₁ is the 1st subgraph and Sub_K is the K-th subgraph.

Preferably, optimizing the full graph model and the subgraph models respectively with the probabilistic matrix factorization algorithm to obtain the globally and locally predicted QoS comprises:

setting the optimized probabilistic matrix factorization algorithm as follows:

where R denotes the final predicted QoS value, and r_ij follows a normal distribution with mean 0 and variance σ_R.

μ_B denotes the average deviation of all QoS values, BU the average deviation of the QoS generated by a user invoking services, and BS the average deviation of the QoS generated by a service being invoked by users; the three average deviations are defined as follows:

where |R| is the number of all QoS values, R_(i) is the set of services invoked by user u_i, and R_(j) is the set of users who invoked service s_j;

converting the QoS in the full graph model into matrix form, factorizing the matrix with the optimized probabilistic matrix factorization algorithm to obtain the user latent factor vectors and the service latent factor vectors, multiplying the user and service latent factor vectors to obtain the predicted QoS, and adding μ_B, BU and BS to it to obtain the globally predicted QoS;

converting the QoS in each subgraph model into matrix form, factorizing the matrix in the same way to obtain the user and service latent factor vectors, multiplying them to obtain the predicted QoS, and adding μ_B, BU and BS to it to obtain the locally predicted QoS.

Preferably, adaptively fusing the globally and locally predicted QoS to obtain the final predicted QoS comprises:

constructing, from the locally and globally predicted QoS and under the assumption that they follow Gaussian models, a Gaussian mixture model using the following formulas (12) and (13):

where p(x|t) = N(x|μ_t, Σ_t) is the Gaussian density of the t-th component, and the mixture weights are non-negative and sum to 1; the two mixture weights represent the proportions of the locally and globally predicted QoS in the final predicted QoS, and the component means and variances are those of the Gaussian models of the local and global matrix factorizations, respectively;

and adaptively fusing the locally and globally predicted QoS on the basis of the Gaussian mixture model to obtain the final predicted QoS:

wherein the two predicted terms denote the locally predicted QoS and the globally predicted QoS, respectively, and ε₁ and ε₂ denote their respective weights in the final predicted QoS; in the local matrix, the first bias term is the mean deviation of all QoS values, Sub_BU is the average deviation of the local services invoked by user u_i, and Sub_BS is the average deviation of service s_j over all local users who invoked it; in the global matrix, the first bias term is the mean deviation of all QoS values, G_BU is the average deviation of all services invoked by user u_i, and G_BS is the average deviation of service s_j over all users who invoked it.

The technical solution provided by the embodiments shows that the invention introduces a full-graph concept to integrate diverse multi-source information. The full graph model not only reflects the relationships between users but also extracts the implicit relationships between services, and it fully exploits the QoS matrix to strengthen the connections between users and services. Cutting the full graph weakens loosely related, noisy information and alleviates the cold-start problem. The influence of multi-source information on QoS is fully considered, and local and global characteristics are adaptively fused, improving QoS prediction accuracy. The embodiments can accurately predict missing QoS values, fill in the sparse matrix and increase its density, thereby alleviating, to a certain extent, the QoS sparsity problem in the field of cloud service recommendation.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of four users invoking five services in the prior art;

FIG. 2 is a schematic diagram of the implementation principle of a graph-model-based QoS prediction method for cloud service recommendation according to an embodiment of the invention;

FIG. 3 is a process flow chart of a graph-model-based QoS prediction method for cloud service recommendation according to an embodiment of the invention;

FIG. 4 is a schematic diagram of the subgraphs obtained by segmenting the full graph with spectral clustering according to an embodiment of the invention;

FIG. 5 is a schematic diagram of QoS prediction on the subgraphs and the full graph with the optimized probabilistic matrix factorization algorithm according to an embodiment of the invention;

FIG. 6 is a schematic diagram comparing the QoS prediction results of the GMF model with those of other methods according to an embodiment of the invention;

FIG. 7 is a schematic diagram comparing the QoS prediction results of the GMF model with those of its two modules according to an embodiment of the invention;

FIG. 8 is a schematic diagram analyzing how the number K of subgraphs and the number D of latent factors in the PMF factorization on the full graph and the subgraphs affect the results at a sparsity of 5% according to an embodiment of the invention.

Detailed Description

Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.

As used herein, the singular forms "a", "an", "said" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as will be understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

To facilitate understanding of the embodiments of the invention, several specific embodiments are further explained below with reference to the drawings; they are not to be taken as limiting the embodiments of the invention.

In view of the shortcomings of existing QoS prediction methods, the embodiments of the invention study QoS prediction based on matrix factorization and propose a novel QoS prediction model, GMF (Graph based Matrix Factorization), which performs adaptive matrix factorization on a graph model. GMF effectively integrates the multi-source information of users and services and uses a graph-cutting algorithm to alleviate the cold-start problem. On this basis, the matrix factorization algorithm is improved, and better predictions are obtained through adaptive hybrid matrix factorization on the graph model.

FIG. 2 is a schematic diagram of the implementation principle of the graph-model-based QoS prediction method for cloud service recommendation according to an embodiment of the invention. The processing flow of the method, shown in FIG. 3, comprises the following steps:

Step 1: constructing a full graph model containing multi-source information;

Starting from the original QoS matrix, multi-source information that clearly influences QoS prediction is taken into account, including the structured WSDL (Web Services Description Language) text of services, the geographic locations of users and services, and their network locations. This information is quantified and converted into similarities between users, between services, and between users and services. These similarities are used as edge weights, and users and services as nodes, to construct the full graph model.

The geographic similarity between users is calculated from the longitude and latitude information provided by the users:

wherein x and y denote the longitude and latitude of a user's geographic location; formula (1) computes the Euclidean distance between users, and formula (2) converts the distance into a similarity, where dis(u_i, u_j) is the Euclidean distance between user u_i and user u_j and W_lo(u_i, u_j) is their similarity in geographic location. In formula (2), all values satisfy 0 < W_lo(u_i, u_j) ≤ 1, and a larger value indicates a higher similarity.
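The following sketch illustrates formulas (1) and (2). Formula (1) is the Euclidean distance on (longitude, latitude) described above; formula (2) itself is not reproduced in this text, so the mapping W_lo = 1 / (1 + dis) used here is an assumption, chosen only because it satisfies the stated range 0 < W_lo ≤ 1 and grows as the distance shrinks. The function names are hypothetical.

```python
import math

def geo_distance(xi, yi, xj, yj):
    """Formula (1): Euclidean distance between two (longitude, latitude) points."""
    return math.sqrt((xi - xj) ** 2 + (yi - yj) ** 2)

def geo_similarity(xi, yi, xj, yj):
    """ASSUMED form of formula (2): 1 / (1 + dis), which lies in (0, 1]
    and equals 1 only when the two users are at the same location."""
    return 1.0 / (1.0 + geo_distance(xi, yi, xj, yj))

print(geo_similarity(116.3, 39.9, 116.3, 39.9))  # same location
print(geo_similarity(0.0, 0.0, 3.0, 4.0))        # distance 5
```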

The similarity between the network locations of user u_i and user u_j is calculated from the users' autonomous system information by formula (3):

Considering the longitude/latitude information and the autonomous system information of the users together gives the final similarity between user u_i and user u_j, as shown in formula (4):

wherein η₁ is a given threshold: an edge weighted by the final similarity is constructed between user u_i and user u_j only when the final similarity is greater than η₁; otherwise there is no edge between them. λ₁ and λ₂ denote the weights of the user's geographic location information and autonomous system information, respectively, in the final similarity.
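A sketch of formulas (3) and (4) under stated assumptions. Formula (3) is not reproduced here, so the sketch treats two users as fully similar in network location when they share an autonomous system and dissimilar otherwise. Formula (4) is modelled as the weighted sum λ₁·W_lo + λ₂·W_AS, which is consistent with λ₁ and λ₂ being described as weights but is not confirmed by the text. The AS labels and function names are hypothetical.

```python
def as_similarity(as_i, as_j):
    """ASSUMED form of formula (3): 1.0 for the same autonomous system, else 0.0."""
    return 1.0 if as_i == as_j else 0.0

def user_edge_weight(w_lo, w_as, lam1, lam2, eta1):
    """ASSUMED form of formula (4): final similarity = lam1 * W_lo + lam2 * W_AS.
    Returns the edge weight, or None when the similarity does not exceed eta1 (no edge)."""
    w = lam1 * w_lo + lam2 * w_as
    return w if w > eta1 else None

# Hypothetical AS labels: same AS -> edge kept, different AS -> below threshold
print(user_edge_weight(0.5, as_similarity("AS100", "AS100"), 0.5, 0.5, 0.6))
print(user_edge_weight(0.5, as_similarity("AS100", "AS200"), 0.5, 0.5, 0.6))
```

The service-side similarity can be sketched the same way, with γ₁, γ₂, η₂ and W_ws in place of λ₁, λ₂, η₁ and W_lo.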

Semantic similarity between services is calculated using WSDL information:

First, the WSDL information is de-textualized: the structured format information is removed and the feature words are extracted, including the port type (portType), service name (name), data types used (types), communication protocol binding (binding), messages (message) and similar information. A term frequency-inverse document frequency (tf-idf) algorithm from the field of natural language processing is then used to convert these keywords into semantic similarity information of the services through the following two formulas.

In formula (5), M is the total number of web pages searched in Google, f(x) and f(y) are the numbers of hits returned by Google searches for the feature words x and y (log f(x) and log f(y) are their logarithms), and f(x, y) is the number of web pages containing both x and y. In formula (6), the two vectors are the feature word vectors of services s_i and s_j, |·| denotes the cardinality of a vector, and W_ws(s_i, s_j) is the semantic similarity of services s_i and s_j in terms of their WSDL information.
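The quantities M, f(x), f(y) and f(x, y) in formula (5) are exactly those of the Normalized Google Distance (NGD), so the sketch below assumes formula (5) is NGD. Formula (6) is not reproduced; the aggregation used here, averaging a word-level similarity max(0, 1 - NGD) over all feature-word pairs and dividing by the product of the two vector cardinalities, is an assumption, as are the hit counts in the example.

```python
import math
from itertools import product

def ngd(fx, fy, fxy, M):
    """Normalized Google Distance from hit counts f(x), f(y), f(x, y) and index size M."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(M) - min(lx, ly))

def wsdl_similarity(words_i, words_j, hits, pair_hits, M):
    """ASSUMED form of formula (6): sum of word-pair similarities / (|vec_i| * |vec_j|)."""
    total = 0.0
    for x, y in product(words_i, words_j):
        if x == y:
            total += 1.0                                   # identical feature words
            continue
        fxy = pair_hits.get(frozenset((x, y)), 0)
        if fxy > 0:                                        # words never seen together add 0
            total += max(0.0, 1.0 - ngd(hits[x], hits[y], fxy, M))
    return total / (len(words_i) * len(words_j))

# Hypothetical hit counts for three feature words
hits = {"weather": 1000, "forecast": 500, "temperature": 800}
pair_hits = {frozenset(("weather", "forecast")): 400,
             frozenset(("weather", "temperature")): 300,
             frozenset(("forecast", "temperature")): 200}
sim = wsdl_similarity(["weather", "forecast"], ["weather", "temperature"],
                      hits, pair_hits, M=10**9)
print(round(sim, 3))
```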

The similarity between the network locations of service s_i and service s_j is calculated from the services' autonomous system information:

Considering the WSDL information and the autonomous system information of the services together gives the final similarity between service s_i and service s_j, as shown in formula (8):

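$$E_{u_i,s_j} = rt_{i,j}$$

$$\mathrm{rtmax}_j = \max\{rt_{ij} \mid i = 1, 2, \ldots, m\}$$

$$\mathrm{rtmin}_j = \min\{rt_{ij} \mid i = 1, 2, \ldots, m\}$$

$$\mathrm{rtmax} = (\mathrm{rtmax}_1, \mathrm{rtmax}_2, \ldots, \mathrm{rtmax}_n)$$

$$\mathrm{rtmin} = (\mathrm{rtmin}_1, \mathrm{rtmin}_2, \ldots, \mathrm{rtmin}_n)$$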

The QoS values (response times, RT) of the service invocations are first normalized, and the normalized RT is then converted into a similarity using formula (9), where rtmax_j and rtmin_j denote the maximum and minimum response time of service s_j, rtmax and rtmin collect these maxima and minima over all services, and I_{m×1} denotes an m×1 column vector over the m users.
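A NumPy sketch of the user-service edge weights. The per-service extremes rtmax_j and rtmin_j and the broadcasting vector I_{m×1} follow the definitions above; the exact form of formula (9) is not reproduced, so the min-max normalization followed by `1 - normalized RT` (a faster invocation gives a stronger link) is an assumption. Missing invocations get weight 0, i.e. no edge. The function name is hypothetical.

```python
import numpy as np

def user_service_similarity(rt):
    """rt: m x n response-time matrix, np.nan where user u_i never invoked service s_j.
    Assumes every service has at least one observed response time."""
    m = rt.shape[0]
    rtmax = np.nanmax(rt, axis=0)            # rtmax_j for every service s_j
    rtmin = np.nanmin(rt, axis=0)            # rtmin_j for every service s_j
    ones = np.ones((m, 1))                   # I_{m x 1}: repeats the row vectors for all m users
    span = ones @ (rtmax - rtmin)[None, :]
    span[span == 0] = 1.0                    # guard: service with a single distinct RT value
    normalized = (rt - ones @ rtmin[None, :]) / span
    sim = 1.0 - normalized                   # ASSUMPTION: shorter RT -> stronger edge
    return np.nan_to_num(sim, nan=0.0)       # no invocation record -> weight 0 (no edge)

rt = np.array([[0.5, np.nan],
               [1.5, 2.0],
               [1.0, 4.0]])
W_us = user_service_similarity(rt)
print(W_us)   # rows: users u_1..u_3, columns: services s_1, s_2
```

Under this assumption the slowest observed invocation of each service also gets weight 0; adding a small offset would keep those edges if that is undesirable.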

Step 2: dividing the full graph model into a plurality of subgraph models;

Because the data contain edges between users and services that are only loosely connected, and these edges have no significant, or even a negative, effect on QoS prediction, the embodiments of the invention treat the original data as noisy. To weaken the influence of noisy data on QoS prediction and strengthen the strong connections between users and services, the embodiments provide a graph-cutting method that splits the full graph into several subgraphs and weakens the noisy edge information. The large graph G is thereby split into a set of K subgraphs {Sub₁, Sub₂, ..., Sub_K}, where K is the number of subgraphs. The graph is cut as follows:

Input: full graph G;

Output: subgraphs Sub₁, Sub₂, …, Sub_K;

1) Convert G into a similarity matrix containing the similarities between users, between services, and between users and services;

2) Construct an adjacency matrix from the similarity matrix, together with a degree matrix;

3) Construct a Laplacian matrix from the adjacency matrix and the degree matrix;

4) Normalize the Laplacian matrix;

5) Calculate the K smallest eigenvalues of the Laplacian matrix and their eigenvectors;

6) Combine the K eigenvectors into an eigenvector matrix and normalize it row by row;

7) Cluster the rows of the normalized eigenvector matrix to obtain K subgraphs Sub₁, Sub₂, …, Sub_K.
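Steps 1) to 7) describe normalized spectral clustering. The sketch below implements them with NumPy on the symmetric adjacency matrix of G, using the symmetric normalized Laplacian D^(-1/2)(D - A)D^(-1/2). The text does not name the clustering algorithm of step 7), so plain k-means with deterministic farthest-first initialization is swapped in as an assumption; `spectral_cut` is a hypothetical name.

```python
import numpy as np

def spectral_cut(A, K, iters=100):
    """Split a graph with symmetric weighted adjacency matrix A (steps 1-2) into K subgraphs.
    Returns a subgraph label in 0..K-1 for every node."""
    d = A.sum(axis=1)                                       # node degrees (diagonal of D)
    safe_d = np.where(d > 0, d, 1.0)                        # avoid dividing by zero
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(safe_d), 0.0)
    L = np.diag(d) - A                                      # step 3: Laplacian L = D - A
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]   # step 4: D^-1/2 L D^-1/2
    _, vecs = np.linalg.eigh(L_sym)                         # step 5: eigenvalues ascending
    H = vecs[:, :K]                                         # eigenvectors of K smallest eigenvalues
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    H = H / np.where(norms > 0, norms, 1.0)                 # step 6: row normalization
    # Step 7 (ASSUMPTION: k-means): farthest-first initialization, then Lloyd iterations
    centers = [H[0]]
    for _ in range(1, K):
        dist = np.min([((H - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(H[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((H[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        new_centers = np.array([H[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(K)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Two tightly connected groups of three nodes joined by one weak edge
A = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    A[a, b] = A[b, a] = 1.0
A[2, 3] = A[3, 2] = 0.1
labels = spectral_cut(A, K=2)
print(labels)
```

Each label assigns a node to a subgraph Sub_k; the users and services sharing a label form one local QoS submatrix.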

The value of K determines the number of users and services in each subgraph: the larger K is, the fewer users and services each subgraph contains, and the smaller K is, the more it contains. Experiments showed that prediction accuracy is highest when K is 8. Because graph cutting places closely related users and services in the same subgraph, a user or service without any invocation record can be directly recommended the services or users of its own subgraph. This avoids computing user-user and service-service similarities for it and alleviates the cold-start problem to a certain extent.

Step 3: optimizing the full graph model and the subgraph models, respectively, with a probabilistic matrix factorization (PMF) algorithm to obtain the globally and locally predicted QoS.

The probabilistic matrix factorization model is shown in formula (10):

where R denotes the final predicted QoS value, U and S denote the latent factor vectors of users and services, respectively, and σ_R denotes the variance. The optimized probabilistic matrix factorization algorithm is as follows:

that is, the optimized probabilistic matrix factorization is expressed as follows:

Compared with formula (10), the main improvement of formula (11) is the addition of μ_B, BU and BS, which denote the average deviation of all QoS values, the average QoS deviation generated by a user invoking services, and the average QoS deviation generated by a service being invoked by users, respectively. The three bias terms are defined as follows:

wherein:

where |R| is the number of all QoS values, R_(i) is the set of services invoked by user u_i, and R_(j) is the set of users who invoked service s_j.
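Formula (12) is not reproduced in this text. The sketch below computes the three bias terms under the common assumption that μ_B is the mean of all observed QoS values and that BU_i and BS_j are the mean deviations from μ_B over R_(i) and R_(j); np.nan marks unobserved entries, and the small matrix is made up for illustration.

```python
import numpy as np

def bias_terms(R):
    """R: m x n QoS matrix with np.nan for unobserved entries."""
    observed = ~np.isnan(R)
    mu_B = np.nanmean(R)                                        # mean over all |R| observed values
    dev = np.where(observed, R - mu_B, 0.0)                     # deviation of each value from mu_B
    BU = dev.sum(axis=1) / np.maximum(observed.sum(axis=1), 1)  # ASSUMED: mean over R_(i), per user
    BS = dev.sum(axis=0) / np.maximum(observed.sum(axis=0), 1)  # ASSUMED: mean over R_(j), per service
    return mu_B, BU, BS

R = np.array([[1.0, 3.0, np.nan],
              [np.nan, 2.0, 4.0]])
mu_B, BU, BS = bias_terms(R)
print(mu_B, BU, BS)   # values: mu_B = 2.5, BU = [-0.5, 0.5], BS = [-1.5, 0.0, 1.5]
```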

The invention uses an improved probabilistic matrix factorization that fully considers the influence of bias terms on matrix factorization, where μ_B is the mean of the QoS values, BU_i is the average over all services invoked by user u_i, and BS_j is the average over all users who invoked service s_j. The improved PMF algorithm is applied to the full graph to obtain the globally predicted QoS and to the subgraph models to obtain the locally predicted QoS.

Input: the full graph G or a subgraph

Output: predicted QoS

1) Calculate the overall QoS mean deviation, the QoS mean deviation of each user and the QoS mean deviation of each service;

2) Convert the QoS values in the full graph or subgraph into matrix form;

3) Factorize the matrix with the PMF algorithm to obtain the user latent factor vectors and the service latent factor vectors;

4) Multiply the user latent factor vector by the service latent factor vector to obtain a predicted QoS value, and add the three mean deviations from step 1) to obtain the global or local QoS finally predicted by the improved PMF algorithm.
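The four steps above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the bias terms are taken as the overall mean μ_B plus per-user and per-service mean deviations from it (formula (12) is not reproduced in this text), the latent factors are fitted to the residual left after removing the three bias terms so that the addition in step 4) does not count them twice, and the PMF objective is minimized by stochastic gradient descent on the usual regularized squared error (the MAP form of PMF). The names `bias_terms`, `train_biased_pmf` and `predict` and all hyperparameter values are hypothetical.

```python
import numpy as np


def bias_terms(R, mask):
    """Return (mu_B, BU, BS) from the observed entries of the QoS matrix R.

    ASSUMED definitions: mu_B is the mean of all observed QoS values,
    BU[i] is the mean of (r_ij - mu_B) over the services invoked by user i,
    and BS[j] is the mean of (r_ij - mu_B) over the users who invoked
    service j. Users/services with no observations get a bias of 0.
    """
    mu = R[mask].mean()
    dev = np.where(mask, R - mu, 0.0)          # deviations, 0 where unobserved
    n_user = mask.sum(axis=1)
    n_serv = mask.sum(axis=0)
    bu = np.divide(dev.sum(axis=1), n_user,
                   out=np.zeros(R.shape[0]), where=n_user > 0)
    bs = np.divide(dev.sum(axis=0), n_serv,
                   out=np.zeros(R.shape[1]), where=n_serv > 0)
    return mu, bu, bs


def train_biased_pmf(R, mask, rank=10, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Steps 1)-3): compute the biases, then factorize the bias-free residual."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    mu, bu, bs = bias_terms(R, mask)                  # step 1)
    rows, cols = np.nonzero(mask)                     # step 2): observed cells
    U = 0.1 * rng.standard_normal((m, rank))          # user latent factors
    S = 0.1 * rng.standard_normal((n, rank))          # service latent factors
    for _ in range(epochs):                           # step 3): SGD on MAP loss
        for k in rng.permutation(len(rows)):
            i, j = rows[k], cols[k]
            err = (R[i, j] - mu - bu[i] - bs[j]) - U[i] @ S[j]
            u_old = U[i].copy()
            U[i] += lr * (err * S[j] - reg * U[i])
            S[j] += lr * (err * u_old - reg * S[j])
    return mu, bu, bs, U, S


def predict(model, i, j):
    """Step 4): latent-factor product plus the three bias terms."""
    mu, bu, bs, U, S = model
    return U[i] @ S[j] + mu + bu[i] + bs[j]
```

Run on the QoS matrix of the full graph, this yields globally predicted values; run on the (smaller) matrix of one subgraph, it yields locally predicted values for that subgraph's users and services.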

Step 4: adaptively fuse the globally and locally predicted QoS to obtain the final predicted QoS.

Assuming that the locally and globally predicted QoS each follow a Gaussian model, a Gaussian mixture model (GMM) is constructed from them using equations (13) and (14):

where p(x|t) = N(x | μ_t, Σ_t) is the density of the t-th Gaussian component, and the mixing weights satisfy 0 ≤ ε_t ≤ 1 and ε₁ + ε₂ = 1, with ε₁ and ε₂ representing the proportions of the locally and globally predicted QoS in the final predicted QoS; μ_t and Σ_t are the mean and variance of the Gaussian component for the local (t = 1) and global (t = 2) matrix factorization, respectively.

To fully exploit the potential relationship between local and global information, the embodiment of the invention uses a Gaussian-mixture-model-based method to adaptively fuse the locally and globally predicted QoS into the final predicted QoS.

Here the locally and globally predicted QoS are weighted by ε₁ and ε₂, respectively, which denote their proportions in the final prediction. In the local prediction, the bias terms are the QoS mean deviation of the local matrix, Sub_BU (the average deviation of the local services invoked by user u_i) and Sub_BS (the average deviation of service s_j over all local users who invoke it); in the global prediction, they are the QoS mean deviation of the global matrix, G_BU (the average deviation of all services invoked by user u_i) and G_BS (the average deviation of service s_j over all users who invoke it).
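One way to realise this adaptive weighting, sketched below, is to treat each observed training QoS value as drawn from a two-component Gaussian mixture whose components are centred on the locally and globally predicted values for that entry, and to estimate ε₁, ε₂ and the two component variances by expectation-maximization (EM). The text does not spell out how the GMM parameters are estimated, so the EM formulation, the function names `fit_fusion_weights` and `fuse`, and the initial values are assumptions; only the final weighted sum of the local and global predictions follows the description above.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) evaluated element-wise at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def fit_fusion_weights(observed, local_pred, global_pred, n_iter=200, tol=1e-8):
    """Estimate (eps1, eps2) by EM for a two-component Gaussian mixture.

    ASSUMPTION: component 1 is centred on the local prediction and component 2
    on the global prediction of the same entry; only the mixing weights and
    the two variances are learned.
    """
    x = np.asarray(observed, dtype=float)
    a = np.asarray(local_pred, dtype=float)
    b = np.asarray(global_pred, dtype=float)
    eps1 = 0.5                                    # start from equal weights
    var1 = max(np.mean((x - a) ** 2), 1e-6)
    var2 = max(np.mean((x - b) ** 2), 1e-6)
    for _ in range(n_iter):
        # E-step: responsibility of the local component for each observation
        p1 = eps1 * gaussian_pdf(x, a, var1)
        p2 = (1.0 - eps1) * gaussian_pdf(x, b, var2)
        resp1 = p1 / np.maximum(p1 + p2, 1e-300)
        resp2 = 1.0 - resp1
        # M-step: update the mixing weight and the variances (floored)
        new_eps1 = resp1.mean()
        var1 = max((resp1 * (x - a) ** 2).sum() / max(resp1.sum(), 1e-12), 1e-6)
        var2 = max((resp2 * (x - b) ** 2).sum() / max(resp2.sum(), 1e-12), 1e-6)
        done = abs(new_eps1 - eps1) < tol
        eps1 = new_eps1
        if done:
            break
    return eps1, 1.0 - eps1


def fuse(local_pred, global_pred, eps1, eps2):
    """Final QoS: weighted sum of the local and global predictions."""
    return (eps1 * np.asarray(local_pred, dtype=float)
            + eps2 * np.asarray(global_pred, dtype=float))
```

When the local predictions fit the training data much more tightly than the global ones, EM shifts weight toward ε₁, and vice versa; this is the sense in which the fusion adapts to the data.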

Example two

To verify the effectiveness of the algorithm, this embodiment uses the WS-DREAM dataset, which contains real-world QoS measurements collected from 339 users on 5,825 Web services, as summarized in Table 1:

Table 1 Dataset description

1. Data processing

1.1 Calculating the similarity between users:

1.1.1 Collecting the longitude and latitude information of the users;

1.1.2 Converting the longitude and latitude information of the users into similarity information between users;

1.1.3 Collecting the autonomous system information of the users and calculating its similarity;

1.1.4 Integrating the longitude and latitude information and the autonomous system information to obtain the similarity information between users.

1.2 Calculating the similarity between services:

1.2.1 Collecting WSDL document information;

1.2.2 Preprocessing the unstructured WSDL information;

1.2.3 Calculating the WSDL similarity;

1.2.4 Collecting the autonomous system information of the services and calculating its similarity;

1.2.5 Integrating the WSDL information and the autonomous system information to obtain the final similarity between services;

1.3 Calculating the similarity between users and services:

1.3.1 Normalizing the QoS values;

1.3.2 Converting the normalized QoS information into the similarity between users and services.
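Sub-steps 1.1 and 1.2 amount to computing two similarity matrices and combining them with the weighted, thresholded sums of formulas (1) and (2) in the claims. The sketch below illustrates this for users. Formula (5) is not reproduced in this text, so the conversion 1/(1 + distance) and the 0/1 same-autonomous-system score are hypothetical stand-ins; `thresholded_edges` applies equally to services, with γ₁, γ₂, η₂ and a WSDL similarity matrix in place of the geographic one.

```python
import numpy as np


def geo_similarity(coords):
    """coords: (m, 2) array of (longitude, latitude); returns an m x m matrix.

    Uses the Euclidean distance dis(u_i, u_j) on raw coordinates.
    ASSUMPTION: similarity = 1 / (1 + distance) stands in for formula (5).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return 1.0 / (1.0 + dist)


def as_similarity(as_ids):
    """ASSUMPTION: 1 if two entities share an autonomous system, else 0."""
    as_ids = np.asarray(as_ids)
    return (as_ids[:, None] == as_ids[None, :]).astype(float)


def thresholded_edges(sim_a, sim_b, w_a, w_b, eta):
    """Weighted sum as in formulas (1)/(2); keep an edge only above eta."""
    combined = w_a * sim_a + w_b * sim_b
    weights = np.where(combined > eta, combined, 0.0)   # 0 means "no edge"
    np.fill_diagonal(weights, 0.0)                      # no self-loops
    return weights
```

For example, with λ₁ = λ₂ = 0.5 and η₁ = 0.6, two users at the same coordinates in the same autonomous system get an edge of weight 1.0, while distant users in different autonomous systems get none.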

2. Constructing the full graph

Construct the full graph model from the two types of nodes (users and services) and the edge information between users, between services, and between users and services calculated in step 1.
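Assuming the three edge sets are stored as dense similarity matrices (with 0 meaning "no edge"), the full graph can be held as a single symmetric weighted adjacency matrix, which is also the form spectral clustering takes as input. A minimal NumPy sketch with a hypothetical `build_full_graph` helper:

```python
import numpy as np


def build_full_graph(E_uu, E_ss, E_us):
    """Weighted adjacency matrix of the full graph G = {V, E}, V = U ∪ S.

    Users occupy indices 0..m-1 and services m..m+n-1. E_us is the m x n
    user-service similarity matrix; the graph is undirected, so its
    transpose fills the service-user block.
    """
    E_uu = np.asarray(E_uu, dtype=float)
    E_ss = np.asarray(E_ss, dtype=float)
    E_us = np.asarray(E_us, dtype=float)
    m, n = E_us.shape
    if E_uu.shape != (m, m) or E_ss.shape != (n, n):
        raise ValueError("block shapes do not match")
    return np.block([[E_uu, E_us], [E_us.T, E_ss]])
```

A dense matrix is the simplest choice for a sketch; with 339 users and 5,825 services a sparse representation would save memory.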

3. Partitioning the full graph into subgraphs

As shown in Fig. 4, the full graph is partitioned into subgraphs using spectral clustering;
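Claim 3 outlines the spectral clustering: normalize the Laplacian matrix and take the eigenvectors of its K smallest eigenvalues. The sketch below follows one common normalized variant (symmetric normalized Laplacian, row-normalized eigenvector embedding, then k-means) with a small hand-written k-means so that only NumPy is needed. The choice of variant, the k-means assignment step and the name `spectral_partition` are assumptions, since the text does not say how nodes are assigned to subgraphs once the eigenvectors are computed.

```python
import numpy as np


def spectral_partition(A, K, n_iter=100, seed=0):
    """Split the nodes of a symmetric weighted adjacency matrix A into K groups."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    # Normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    X = vecs[:, :K]                            # eigenvectors of the K smallest
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.where(norms > 0, norms, 1.0)    # row-normalize the embedding
    # Plain k-means on the rows of X
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

Nodes sharing a label form one subgraph; because users and services are nodes of the same graph, each subgraph contains both.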

4. Performing probabilistic matrix factorization on the full graph and on the subgraphs

4.1 Improving the probabilistic matrix factorization algorithm;

4.2 As shown in Fig. 5, predicting QoS on the subgraphs and on the full graph with the probabilistic matrix factorization algorithm improved in 4.1;

5. Adaptively fusing the QoS predicted from the subgraphs and the full graph

5.1 Constructing a Gaussian mixture model from the QoS results predicted on the subgraphs and the full graph;

5.2 Determining the weights of the global and local predictions in the final predicted QoS from the Gaussian mixture model;

6. Performing QoS prediction

QoS prediction is completed through steps 1 to 6 above.

Evaluation index: the embodiment of the invention uses MAE (Mean Absolute Error) and RMSE (Root Mean Square Error) to evaluate the final prediction results.

MAE is defined as follows:

$$\mathrm{MAE} = \frac{1}{N} \sum_{i,j} \left| \gamma_{i,j} - \hat{\gamma}_{i,j} \right|$$

RMSE is defined as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i,j} \left( \gamma_{i,j} - \hat{\gamma}_{i,j} \right)^{2}}$$

where $\gamma_{i,j}$ and $\hat{\gamma}_{i,j}$ denote the true and predicted values, respectively, N denotes the total number of predicted QoS values, and the sums run over the predicted entries. The lower both values are, the more accurate the prediction.
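Both metrics are straightforward to compute over the test entries; a minimal NumPy sketch (function names are illustrative):

```python
import numpy as np


def mae(true, pred):
    """Mean absolute error over the predicted entries."""
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.mean(np.abs(true - pred))


def rmse(true, pred):
    """Root mean square error over the predicted entries."""
    true = np.asarray(true, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.sqrt(np.mean((true - pred) ** 2))
```

For true values [1, 2, 3] and predictions [2, 2, 5], MAE is 1.0 and RMSE is √(5/3) ≈ 1.29; RMSE penalizes the single large error more heavily.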

To simulate a user's invocation of a service in the real world, the entire data set is divided into two parts. One part is used to train the model and the other part is used to test the prediction results. Specifically, we randomly deleted 95%, 90%, 85% and 80% of the data in the call matrix as the test set, and the remaining 5%, 10%, 15% and 20% of the data as the training set. The experimental results are shown in fig. 6 and 7. The results of the parameter comparison for the different conditions are shown in fig. 8.
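The train/test protocol can be sketched as follows, assuming `mask` marks the observed entries of the user-service matrix: a random fraction (5%, 10%, 15% or 20% in these experiments) of the observed entries is kept for training and the remainder forms the test set. The function name and seed handling are illustrative.

```python
import numpy as np


def split_by_density(mask, density, seed=0):
    """Keep a random `density` fraction of observed entries for training."""
    mask = np.asarray(mask, dtype=bool)
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(mask)
    n_train = int(round(density * len(rows)))
    perm = rng.permutation(len(rows))
    train = np.zeros_like(mask, dtype=bool)
    train[rows[perm[:n_train]], cols[perm[:n_train]]] = True
    test = mask & ~train                      # everything else is held out
    return train, test
```

Repeating the split with different seeds and averaging MAE and RMSE reduces the influence of any single random split.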

As can be seen from Fig. 6, as the training set density increases from 5% to 20%, the MAE and RMSE of all methods decrease, because a denser matrix provides more information for predicting missing QoS values. Regardless of the training set density, GMF always achieves the smallest MAE and RMSE, which means that GMF obtains higher prediction accuracy than all comparison methods (including both memory-based and model-based methods). Furthermore, compared with the PMF model (the basis of our approach, which outperforms most of the methods in Table 2), GMF improves the average MAE and RMSE by 10.13% and 9.70%, respectively. Even compared with HMF, which was the first method proposed to combine local and global matrix factorization and has been widely used in many studies, GMF still improves the average MAE and RMSE by 3.842% and 6.617%, respectively.

As can be seen from Fig. 7, GMF outperforms GMF-GMM and GMF-Graph, which confirms that (a) integrating the multi-source information into one graph model significantly improves QoS prediction accuracy, and (b) fusing the local and global matrix factorizations with the GMM also helps to improve QoS prediction accuracy. GMF produces the best performance at every training set density, which can be attributed to the combination of graph-based integration and GMM-based fusion.

In summary, building on existing matrix-factorization-based QoS prediction methods, the embodiment of the invention uses a graph structure to integrate multi-source information and adaptively fuses local and global features to improve the accuracy of service QoS prediction. Compared with the prior art, the invention fully considers multi-source information: it introduces the concept of a full graph to integrate the various kinds of multi-source information. The full graph model not only reflects the relationships between users but also extracts the implicit relationships between services, while retaining the original relationships between users and services.

The embodiment of the invention addresses the data sparsity problem: with the rapid development of cloud computing, the number of cloud services is growing quickly, so the QoS matrix of users invoking services is sparse. The method can accurately predict missing QoS values, fill the sparse matrix and increase its density, thereby solving the QoS sparsity problem in the cloud service recommendation field to a certain extent.

The embodiment of the invention alleviates the cold-start problem: for users without any invocation records and services that have not been invoked by any user, it is difficult to recommend appropriate services to such new users when they first join the cloud service platform, and equally difficult to recommend new services to users. Integrating multi-source information, such as the geographic location and network location of users and the WSDL description information of services, helps users and services find their neighbors. Graph cutting then places users and services with potentially close relationships in the same subgraph, so the services in a subgraph can be recommended directly to the users in it, which alleviates the cold-start problem to a certain extent.

Those of ordinary skill in the art will appreciate that the drawings are schematic diagrams of one embodiment, and the modules or flows in the drawings are not necessarily required to practice the invention.

From the above description of embodiments, it will be apparent to those skilled in the art that the present invention may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present invention.

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus or system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, with reference to the description of method embodiments in part. The apparatus and system embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.

The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims

1. A graph model-based QoS prediction method applied in cloud service recommendation, comprising:

dividing the full graph model into a plurality of sub graph models;

performing self-adaptive fusion processing on the global and local predicted QoS to obtain final predicted QoS;

the construction of the full graph model containing the multi-source information, the full graph model comprising nodes representing users and services, and edges using the similarity between users, the similarity between services and the similarity between users and services as weights, comprises:

$$\lambda_1 W_{lo}(u_i, u_j) + \lambda_2 W_{AS}(u_i, u_j) > \eta_1 \tag{1}$$

The final similarity between service s_i and service s_j is obtained by jointly considering the WSDL information and the autonomous system information of the services, as shown in formula (2):

$$\gamma_1 W_{ws}(s_i, s_j) + \gamma_2 W_{AS}(s_i, s_j) > \eta_2 \tag{2}$$

where η₂ is a given threshold: only when the final similarity value is greater than η₂ is an edge, weighted by the final similarity, constructed between service s_i and service s_j; otherwise there is no edge between them. γ₁ and γ₂ denote the weights of the WSDL information and the autonomous system information of the services in the final similarity, respectively;

The similarity E_{u_i,s_j} between user u_i and service s_j is obtained by converting the QoS matrix:

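$$E_{u_i,s_j} = rt_{i,j}$$

$$\mathrm{rtmax}_j = \max(rt_{ij} \mid i = 1, 2, \ldots, m)$$

$$\mathrm{rtmin}_j = \min(rt_{ij} \mid i = 1, 2, \ldots, m)$$

$$\mathrm{rtmax} = (\mathrm{rtmax}_1, \mathrm{rtmax}_2, \ldots, \mathrm{rtmax}_n)$$

$$\mathrm{rtmin} = (\mathrm{rtmin}_1, \mathrm{rtmin}_2, \ldots, \mathrm{rtmin}_n)$$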

The response times of each service are first normalized, and the normalized response time RT is then converted into a similarity using equation (3), where rtmax_j and rtmin_j denote the maximum and minimum response times of service s_j, rtmax and rtmin denote the vectors of maximum and minimum response times of all services, and I_{m×1} denotes an m×1 column vector over the m users;

the similarity between users, the similarity between services and the similarity between users and services are used as the weights of three types of edges, and the users and services are used as nodes, to construct a full graph model containing the users, the services and the three types of edges: G = {V, E}, where V = {U, S}, and U = {u₁, u₂, ..., u_m} and S = {s₁, s₂, ..., s_n} denote the sets of m users and n services, respectively; the edge set is expressed as E = {E_uu, E_ss, E_us}, where E_uu, E_ss and E_us denote the edges weighted by the similarity between users, between services, and between users and services, respectively.
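The per-service normalization behind equation (3) can be sketched as a min-max scaling of each column between rtmin_j and rtmax_j. The final conversion to a similarity is not fully legible in this text, so the sketch assumes similarity = 1 − normalized RT (a faster response giving a stronger user-service edge) and takes the extremes over observed entries only; both choices, and the name `rt_to_similarity`, are assumptions.

```python
import numpy as np


def rt_to_similarity(rt, mask):
    """Map an m x n response-time matrix to user-service similarities.

    rtmax_j / rtmin_j are taken over the observed entries of column j.
    ASSUMPTION: similarity = 1 - normalized RT; unobserved cells get 0.
    """
    rt = np.asarray(rt, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    rtmax = np.where(mask, rt, -np.inf).max(axis=0)   # rtmax_j per service
    rtmin = np.where(mask, rt, np.inf).min(axis=0)    # rtmin_j per service
    span = rtmax - rtmin
    span = np.where(span > 0, span, 1.0)              # constant or empty column
    norm = (rt - rtmin) / span                        # normalized RT in [0, 1]
    return np.where(mask, 1.0 - norm, 0.0)
```

A service whose observed response times are all equal gets similarity 1 for every user who invoked it, since no user experienced a slower response than any other.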

2. The method of claim 1, wherein constructing the full graph model containing multi-source information, the full graph model comprising nodes representing users and services and edges using the similarity between users, the similarity between services and the similarity between users and services as weights, comprises:

dis(u_i, u_j) denotes the Euclidean distance between user u_i and user u_j, which is converted into the geographic-location similarity W_lo(u_i, u_j) of the users using formula (5); x_i and y_i denote the longitude and latitude of the geographic location of user u_i, and x_j and y_j denote the longitude and latitude of the geographic location of user u_j;

in formula (7), M is the total number of web pages indexed by the Google search engine, f(x) and f(y) are the numbers of hits returned when searching with the feature words x and y, respectively, and f(x, y) is the number of web pages containing both feature words x and y; in formula (8), the two feature word vectors are those of services s_i and s_j, respectively, the vector cardinality is the number of feature words in a vector, and W_ws(s_i, s_j) denotes the semantic similarity between service s_i and service s_j in terms of WSDL information;
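Formula (7) appears to be built on the Normalized Google Distance (NGD) of Cilibrasi and Vitányi, which uses exactly the quantities f(x), f(y), f(x, y) and M. The sketch below computes NGD from such hit counts. How formulas (7) and (8) turn distances into W_ws(s_i, s_j) is not legible here, so `word_similarity` (1 − NGD, floored at 0) and the pairwise average in `wsdl_similarity` are hypothetical stand-ins. In practice the hit counts would come from a search engine rather than the dictionaries used below.

```python
import math


def ngd(fx, fy, fxy, M):
    """Normalized Google Distance from hit counts (all counts positive)."""
    if fxy == 0:
        return math.inf                      # words never co-occur
    lx, ly, lxy, lM = math.log(fx), math.log(fy), math.log(fxy), math.log(M)
    return (max(lx, ly) - lxy) / (lM - min(lx, ly))


def word_similarity(fx, fy, fxy, M):
    """ASSUMPTION: similarity = max(0, 1 - NGD)."""
    return max(0.0, 1.0 - ngd(fx, fy, fxy, M))


def wsdl_similarity(words_i, words_j, hits, pair_hits, M):
    """Hypothetical service-level score: mean similarity over all word pairs.

    hits maps a word to f(word); pair_hits maps frozenset({x, y}) to f(x, y).
    """
    total = 0.0
    for x in words_i:
        for y in words_j:
            fxy = hits[x] if x == y else pair_hits.get(frozenset((x, y)), 0)
            total += word_similarity(hits[x], hits[y], fxy, M)
    return total / (len(words_i) * len(words_j))
```

For example, with f(x) = 1000, f(y) = 100, f(x, y) = 10 and M = 10⁶, NGD = log(100) / log(10⁴) = 0.5.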

3. The method of claim 1, wherein said segmenting the full graph model into a plurality of sub-graph models comprises:

normalizing the Laplacian matrix;

computing the K smallest eigenvalues of the Laplacian matrix and their eigenvectors;

4. The method according to any one of claims 1 to 3, wherein optimizing the full graph model and the subgraph models separately with the probabilistic matrix factorization algorithm to obtain the globally and locally predicted QoS comprises:

R represents the final predicted QoS value, and r_ij follows a normal distribution with mean 0 and variance σ_R;

μ_B denotes the mean deviation of all QoS values, and the three mean deviations are defined as follows:

5. The method of claim 4, wherein adaptively fusing the globally and locally predicted QoS to obtain the final predicted QoS comprises: