CN110659363A - Web service mixed evolution clustering method based on membrane computing - Google Patents

Web service mixed evolution clustering method based on membrane computing Download PDF

Info

Publication number
CN110659363A
CN110659363A CN201910692218.0A CN201910692218A CN110659363A CN 110659363 A CN110659363 A CN 110659363A CN 201910692218 A CN201910692218 A CN 201910692218A CN 110659363 A CN110659363 A CN 110659363A
Authority
CN
China
Prior art keywords
service
word
data
domain
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910692218.0A
Other languages
Chinese (zh)
Other versions
CN110659363B (en
Inventor
陆佳炜
赵伟
周焕
马超治
王小定
徐俊
肖刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201910692218.0A priority Critical patent/CN110659363B/en
Publication of CN110659363A publication Critical patent/CN110659363A/en
Application granted granted Critical
Publication of CN110659363B publication Critical patent/CN110659363B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/02Standardisation; Integration
    • H04L41/0246Exchanging or transporting network management information using the Internet; Embedding network management web servers in network elements; Web-services-based protocols
    • H04L41/0273Exchanging or transporting network management information using the Internet; Embedding network management web servers in network elements; Web-services-based protocols using web services for network management, e.g. simple object access protocol [SOAP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/02Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/51Discovery or management thereof, e.g. service location protocol [SLP] or web services
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/30Definitions, standards or architectural aspects of layered protocol stacks
    • H04L69/32Architecture of open systems interconnection [OSI] 7-layer type protocol stacks, e.g. the interfaces between the data link level and the physical level
    • H04L69/322Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions
    • H04L69/329Intralayer communication protocols among peer entities or protocol data unit [PDU] definitions in the application layer [OSI layer 7]

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A Web service mixed evolution clustering method based on membrane computing comprises the following steps: the first step is as follows: formalized definition; the second step is that: calculating the service similarity; the third step: clustering services; fourthly, the data cell updates the global optimal object according to the operation rule; and fifthly, stopping and outputting, wherein each tissue cell in the system is used as an independent execution unit to perform evolutionary operation in a parallel structure, so that the system is distributed in parallel. The invention can better obtain the characteristics of the service field, can calculate the similarity more accurately and obtain a better clustering result.

Description

Web service mixed evolution clustering method based on membrane computing
Technical Field
The invention relates to a clustering problem of web services, in particular to a mashup service clustering and SOAP service clustering method in REST data services.
Technical Field
With the development of Web 2.0 technology, the number of services and their types on the internet are increasing, which provides the possibility of developing applications of the internet of things in an easier and faster manner, so that how to accurately and effectively discover required atomic services or service combinations becomes a problem. Service clustering techniques can effectively facilitate service discovery, and in recent years, many different types of service clustering methods have been proposed to cluster Mashup services, Web APIs, and Web services.
Currently, most existing methods perform a service clustering operation for SOAP services by calculating functional similarity between Web services using service function descriptions (WSDL documents), and Liu et al extracts four characteristics of Web services, content, context, hostname, and Web service name, from WSDL description text of the Web services for Web service clustering. Elgazzar et al analyzed WSDL documents and clustered them according to functional similarity, Yu and Rege also proposed a clustering method for improving service discovery using a service community learning algorithm, and ontology is also commonly used for semantic similarity calculation and matching between Web services to promote service clustering and discovery. For example, Pop et al have devised a metric to evaluate the degree of match between ontological concepts describing two semantic Web services and cluster them using an ant-based approach to achieve efficient service discovery. Nayak et al propose Web service discovery with additional semantics and clustering based on a clustering hierarchy clustering algorithm.
The existing Mashup service clustering method generally performs clustering by analyzing description texts of services, but does not comprehensively consider the mutual influence of word frequency and correlation in the description text information, the description text information of the services is limited, and other information of some services, such as service APIs (application programming interfaces), service labels and the like, has important embodiment in the function description of the services.
Gao et al propose a new graph theory-based service and Mashup recommendation method, which extracts a theme from the functional description of the service and models the relationship among a user, Mashup, the service and the theme into a quadrilateral graph to improve recommendation performance. Cao et al develops a two-layer topic model by inferring the relationship between Mashup services from Web API calls and labels, and merges the topic distribution of Mashup services at the network layer into the topic probability distribution of the original Mashup services at the content layer.
In addition, Pan W et al propose a novel Mashup service clustering method based on structural similarity and genetic algorithm, describe the relationship between Mashups and Web APIs through a dual-mode graph, apply a SimRank algorithm to quantify the structural similarity between each pair of Mashup services, and finally effectively cluster the Mashup services.
Disclosure of Invention
In order to solve the problem of SOAP service clustering in a web environment, the invention calculates term characteristic values by extracting hidden term information from WSDL documents and dividing the service information into two types, namely service self information and service context information, generates a special Bigraph hierarchical model according to the calculated characteristic values, and calculates the similarity of SOAP services through the Bigraph hierarchical model. For Mashup service, the invention adopts a service feature selection method based on domain perception to obtain a processed service description text, designs a multi-data source LDA topic model based on combining the processed description text, a service API and a label, wherein LDA (latent Dirichlet allocation) is a document topic generation model, also called a three-layer Bayesian probability model, comprising a word, a topic and a document three-layer structure, deduces the topic probability distribution of the service through the model, and calculates the similarity of the service. Meanwhile, a data set is preprocessed by combining a density-based k-means algorithm, an organization P system is utilized, and a Web service mixed evolution clustering method based on membrane calculation is provided by combining an Agnes algorithm based on hierarchical division, a Genetic Algorithm (GA) and a weighted Fuzzy Clustering (FCM) algorithm.
In order to solve the technical problems, the invention provides the following technical scheme:
a Web service mixed evolution clustering method based on membrane computing comprises the following steps:
the first step is as follows: formalized definition, the process is as follows:
1.1 mashup service definition
1.1.1 service document vector model: the preprocessed service document vector model is a four-tuple, RSM ═ RD, RT, RA, T >, where:
RD is a domain feature vector, representing service domain information, defining a service with m domains, and then RD ═ RD1,RD2,…,RDm};
RT is a service description text feature vector, and assuming that there are n service description texts in each domain, the description texts of m domains are represented as RT ═ RT11,RT12,…,RT1n,…,RTmn};
RA is a service API feature vector;
t is a service label feature vector;
each service description text RTijThe characteristic word in (1) is expressed as FWijkWhere i represents a domain variable, j represents a description text variable, and k represents a feature word variable, the service description text RTijMay also be denoted as RTij={FWij1,FWij2,…,FWijsI is more than or equal to 1 and less than or equal to m, j is more than or equal to 1 and less than or equal to n, and s is the number of the characteristic words;
1.1.2 service document Cross-Domain concentration: the cross-domain concentration is denoted as DdepIt represents the inclusion service domain RDiCharacteristic word FW in (1)ijkService description document RTijAnd the proportion of the feature words in all domains of the service is calculated according to the following formula:
Figure BDA0002148215010000021
wherein df (FW)ijk,RDi) Representative service domain RDiIn (1), containing a feature word FWijkDescription of (1)This RTijA number of
Figure BDA0002148215010000022
Representing the inclusion of feature words FW in all domainsijkDescription text RT ofijThe higher the cross-domain concentration ratio is, the higher the concentration ratio of the service document in the domain is, so that the method has stronger field representation;
1.1.3 feature word frequency cross-domain concentration: the cross-domain concentration is denoted as DfreIt stands for the feature word FWijkIn the service domain RDiThe different frequency ratios occurring in all service domains are neutralized, and the calculation is as follows:
Figure BDA0002148215010000031
wherein tf (FW)ijk,RDi) Representing the service domain RDiMiddle and characteristic word FWijkOf a quantity of
Figure BDA0002148215010000032
The number of the feature words appearing in all the service domains is represented, and similarly, the higher feature word frequency cross-domain concentration degree means that the feature words are concentrated to a higher degree in the service domains;
1.1.4 Domain representation of feature words: represents a characteristic word FWijkRepresenting a service domain RDiThe degree of the word frequency is comprehensively calculated according to the cross-domain concentration degree of the service document and the cross-domain concentration degree of the feature word frequency, and the calculation is as follows according to a formula
Dfinal(FWijk,RDi)=a*Ddep(FWijk,RDi)+β*Dfre(FWijk,RDi)
Alpha and beta are weight coefficients, and alpha + beta is 1, the domain representation degrees of all the characteristic words in different service domains are obtained through the formula, and the higher the domain representation degree of the characteristic word is, the more the characteristic word can represent the service domain information; a series of typical characteristic words appear in a service domain, the domain representation degree of the words is high but the clustering effect of the service is general, a threshold value of the domain representation degree of the characteristic words is set, the characteristic words exceeding the threshold value are filtered, and the representation effect of the characteristic words on the service domain is improved;
1.1.5 field efficient feature word set: selecting proper feature word sets for representing all feature word sets in a service domain, sorting the feature words in descending order according to domain representation degree of the feature words, and selecting the feature words with the top percentage P in the service domain as the domain efficient feature word sets required by the invention, as shown in the following
HQ(RDi)={FWij1,FWij2,…,FWijp,}
Wherein P is L P/100, if a description text RT is in the process of simplifying the characteristic wordsijCharacteristic word FWijkNot belonging to HQ (RD)i) It is filtered and the service description document RT is updatedij';
1.2SOAP services terminology definition:
let TL be { T ═ T1,T2,…TnIs a set of terms in the service corpus, n is the number of terms, a ═ a1,a2,…amIs an atomic vocabulary constituting the term TL, i.e. the vocabulary has not been subdivided, m corresponding to the number of all atomic vocabularies, defining the frequency of the term
Figure BDA0002148215010000033
Namely the term TiThe frequency of occurrence is the same as the sum of the frequency of occurrence of all terms in the corpus TL, and the corresponding frequency of atomic vocabulary
Figure BDA0002148215010000034
And calculating the sum of the occurrence times of all the vocabularies, wherein the calculation formula is as follows:
Figure BDA0002148215010000035
Figure BDA0002148215010000041
NumTLis TL number of all terms, NumAAll are the sum of the occurrence times of the atomic vocabularies;
1.3, organization P System (P System) definition:
one degree of 3, i.e., 3, formalized by data cell organization P system is defined as the following octave:
ω=(OB1,OB2,OB3,OR1,OR2,OR3,OR′,OEo)
wherein:
OB1、OB2and OB3Is a set of objects of each tissue cell, namely a data cell set;
OR1、OR2and OR3Representing the clustering rules based on Agnes and k-means algorithm, weighted FCM algorithm and GA algorithm respectively for the evolution rules of each tissue cell;
OR' represents the transport rule of each tissue cell in the whole P system, and the sharing and exchange of objects can be carried out between cells through the transport rule;
OEo is the output area of the system, representing the environment;
the second step is that: service similarity calculation
Judging whether the service is the SOAP service, if the service is the SOAP service, jumping to the step 2.2, and if the service is the mashup service, performing the step 2.1;
2.1 mashup service similarity calculation, the process is as follows:
2.1.1 service Pre-processing
The method comprises the following steps of preprocessing crawled service information, namely a service domain, a description text, an API (application program interface) and a label, extracting accurate and effective characteristic words in the service information, constructing a service description document with more accurate description, and improving service clustering precision, wherein the preprocessing steps are as follows:
2.1.1.1 constructing an initial feature vector, and segmenting a statement and extracting effective words by using a natural language processing package NLTK;
2.1.1.2 remove invalid words such as symbols (+, -, _ etc.) and prepositions (a, the, of, and etc.) that are not useful in characterizing a service, preserving names, verbs and adjectives that may characterize the service's characteristics;
2.1.1.3 merging and processing word stems, wherein some words with the same word stem often have similar meanings, for example, the characteristics of use, used and using expression are the same, and the root of a word with the same meaning is deleted and reserved;
2.1.2 service feature reduction Process
Different services have unique field characteristics, the importance of characteristic words in the same domain is related to the frequency and the domain correlation, when the service characteristic value weight is calculated, only one factor is considered in the traditional TF-IDF calculation method or mutual information method, and the description text of the service is subjected to characteristic simplification processing by comprehensively considering the word frequency and the correlation factor, and the steps are as follows:
2.1.2.1 traverse each service domain RDiEach of the description texts RTijEach feature word FWijkCalculating the feature word FW according to the formula in 1.1.4ijkRepresenting a service domain RDiDegree D offinal(FWijk,RDi) If D isfinal(FWijk,RDi) If R is less than R, deleting the characteristic value, wherein R is a threshold value of the domain representation degree of the characteristic value;
2.1.2.2 all service domains after completing 2.1.2.1 steps, according to Dfinal(FWijk,RDi) Value of (D), to RDiCharacteristic word FWijkSorting in descending order, and selecting the characteristic words of the top percentage P as the service domain RD according to the step 1.5iDomain efficient feature word set HQ (RD)i);
2.1.2.3 repeat step 2.1.2.2 until a domain efficient feature word set HQ (RD) is generated for all service domainsi) Each service domain RDiAccording to HQ (RD)i) Deleting all absent HQ (RD)i) The feature words of (1);
2.1.3 topic clustering model construction:
after the characteristics of the RSM are simplified, obtaining a new service document vector model RSM ═ RD ', RT ', RA ', T >, the method comprises the following steps of simplifying a service description text feature vector RT', a Web API feature vector RA and a label feature vector T, wherein an MD-LDA model is based on hidden Dirichlet distribution (LDA), is a topic model (topicmodel), can give the topic of each document in a document set according to the form of probability distribution, and fuses various data source features of the service, in the MD-LDA model, the relevant word selection method in the service API and the label is consistent with that in the service document description document RT, and each service API or service label has unique contribution to the theme distribution of the document;
therefore, there is a topic distribution, the alpha hyper-parameter of dirichlet corresponds to RA and T in RSM', the beta hyper-parameter of dirichlet corresponds to word distribution in each topic, then a topic is extracted from the topic distribution according to the selected RA or T, and a specific word is generated through the selected topic, thereby generating an MD-LDA model fusing the service description text, the service API and the service label, the generation process is as follows:
2.1.3.1 for RA or T in RSM', the variable dat is defineddWherein dat isd1,2,3, N being the total number of RA and T in RSM', one polynomial θ is selecteddatObeying alpha hyper-parameter distribution of Dirichlet;
2.1.3.2 for topic K in RSM', K is 1,2
Figure BDA0002148215010000051
Obeying beta hyper-parameter distribution of Dirichlet;
2.1.3.3 setting variable D as the document tag in the reduced service document RSM', D being 1,2dRepresents RA or T in each RSM', for each word w in ddiI 1,2,3, M is the total number of words in d;
extract a Web API or tag denoted as xdiWhere obedience is uniformly distributed is denoted as Uniform (dat)d);
Extracting a subject as zdiIts obedient polynomial distribution is noted
Figure BDA0002148215010000052
Distributing;
extracting a word and recording the word as wdiSubject to
Figure BDA0002148215010000053
Distributing;
each topic in the corresponding MD-LDA probability model
Figure BDA0002148215010000054
The above word distribution is related, the extraction of words is independent of the dirichlet parameter β, x denotes the tag set dat from the APIdSelecting RA or label T related to given single word, each RA or T is distributed with theta on a subject, theta is selected from Dirichlet parameter alpha, and the subject z is formed by combining the subject distribution of RA and T and the word distribution of the subjectdi
And extracting the word w from the selected topicdi
As can be seen from the above description, the posterior distribution of the model topic depends on RT ', RA and T in RSM', and the parameters of the MD-LDA model are set as follows:
θdat|α~Dirichlet(α)
Figure BDA0002148215010000061
xdi|datd~Uniform(datd)
Figure BDA0002148215010000062
Figure BDA0002148215010000063
2.1.3.4 parameter reasoning is carried out on the MD-LDA model by a Gibbs sampling method, the sampling method provides a simple and effective method for potential variable estimation, which belongs to a Markov chain Monte Carlo algorithm for obtaining a random sample sequence by multivariate probability distribution, and each step of the Gibbs sampling method follows the following formula distribution;
P(zdi=j,xdi=k|wi=m,z_di,x_di,datd)
wherein z is_diIndicating a processed word wdiThen assign, x, to each word topic_diIndicating a processed word wdiThen assign each word API or label, nzwRepresenting the total number of words w, m, assigned to the topic zxzRepresenting the total number of words in the Web API and tags assigned to topic z,
Figure BDA0002148215010000065
as a subject zdiThe alpha parameter of (a) is,
Figure BDA0002148215010000066
as a word wdiBeta parameter of (a), number of subjects V, alphav,βvFor alpha parameter and beta parameter of the v-th topic, the word distribution of the topic in the sampling process
Figure BDA0002148215010000067
The theme distribution theta of the API and the labels is required to be calculated through the following formula;
Figure BDA0002148215010000068
Figure BDA0002148215010000069
zdiand xdiBy determining z_diAnd x_diTo sample and decide, for eachRSM', the present invention summarizes all θxCalculating the topic distribution of the document d, wherein x belongs to datdTo obtain the final topic probability distribution of all RSMs';
2.1.4 similarity calculation
In fact, the theme distribution of the mashup service document is mapped to the text vector space, so that two service documents RSM can be calculated through corresponding theme probability distribution1' and RSM2' similarity, the topic in this model is the mixed distribution of word vectors, so the relative entropy (KL) distance can be used as the similarity measure, and the following can be calculated:
Figure BDA0002148215010000071
t stands for all common topics in two service documents, pjAnd q isjRespectively representing the distribution of topics in two documents, when pj=qjAt that time, KL distance calculation result DKL(RSM′1,RSM′2) Is 0, since the KL distance is not of a symmetrical nature, i.e. DKL(RSM′1,RSM′2)≠DKL(RSM′2,RSM′1) So a symmetric version thereof is usually used, the calculation formula is as follows:
DKL(RSM′1,RSM′2)=λDKL(RSM′1,λ*RSM′1+(1-λ)RSM′2)+(1-λ)DKL(RSM′2,λ*RSM′1+(1-λ)RSM′2)
therefore, if λ is equal to 0.5, the above formula is converted into a JS distance, which is also called JS divergence (Jensen-Shannon divergence) and is a variation of the KL distance, and the similarity of the text is calculated by using the JS distance as a standard and is used as the similarity of the service, and the final calculation formula is as follows:
2.2 SOAP similarity calculation, the process is as follows:
2.2.1 calculation of self-eigenvalues
Finding a term T in a service corpusiThe information quantity I (P) can be calculated by an information theory methodi) On this basis, the term T can be usediCharacteristic value of (Spe) (T)i) Assigned as follows
Spe(Ti)=I(Pi)
By computing a joint probability distribution P { P }i,qjIs calculated for the term feature value, where piE is P and qj∈Q,piIs to select a word from the term set TL and qjIs to take a word from the atomic vocabulary A, where { p }1,p2,...pnAnd q1,q2,...,qmAre respectively represented by random variables P, Q, PiAnd q isjThe mutual information calculation of (a) can be calculated by the following formula:
the term pi has a characteristic value denoted as I (p)iQ), representing the relationship between pi terms and the vocabulary library Q, the formula for calculating pi feature values in combination with the frequency of terms and vocabularies in the corpus is as follows:
Spe(Ti)≈I(pi,Q)
according to the Bayes' theorem,
Figure BDA0002148215010000081
the final self-information characteristic value SelfSpe (Ti) of the SOAP service is calculated as follows
Figure BDA0002148215010000082
Analyzing conventional WSDL documents generally includes 1 to 2 words, so
Figure BDA0002148215010000083
The vocabulary in the representative terms is approximately set to be 1 for calculation, theta represents a weighted value, the weighted value is set based on an information theory method, and the value range is 0 to 1;
2.2.2 contextual information eigenvalue calculation
According to the information theory approach, the context information characteristic of a service is based on the entropy of the modified term word probability distribution, for which its entropy value is calculated by the following formula:
Figure BDA0002148215010000084
wherein NT represents the term TiModified quantity of (2), (mod)m,Ti) Represents modmModifying the term TiThe entropy value of (d) is determined by all (mod)m,Ti) For the calculation of average information amount, in a specific field, the modifiers of the terms are distributed more closely, so that the entropy value of the terms in a specific field is lower, and the term T can be calculated through the entropy valueiContext information characteristic value ContextSpe (T)i) The following were used:
Figure BDA0002148215010000085
wherein j is more than or equal to 1 and less than or equal to K, K is the sum of the number of modifiers with the same definition,
Figure BDA0002148215010000086
represents each modifier;
2.2.3 hybrid eigenvalue calculation
The self characteristic value and the context information characteristic calculated by the steps 2.2.1 and 2.2.2 can cover the characteristic of the descriptive word and the information which cannot be described by the word, and finally the mixed characteristic value is obtained by the following formula:
the value of the mixing coefficient alpha is between 0 and 1, the mixing coefficient alpha is set to be 0.65 according to experiments, and the values of the self characteristic value, the context characteristic value and the mixing characteristic value of the service are all between 0 and 1 through normalization processing;
2.2.4 Domain weight calculation, the process is as follows:
2.2.4.1 Domain weight calculation
In the process of Bigraph structure generation, a weight based on a domain characteristic value is required, the weight is embodied by terms of the same level, the larger the definition structure similarity is, the larger the weight of terms of the same level is, and the specific calculation method is as follows:
wherein,is a new term TnSet of sibling terms of (1), hybrid spe (T)s) And hybrid Spe (T)n) Respectively representing the characteristic value of each sibling term and the new term, and directly defining the weight value of 0.5G when the newly added term has no sibling termiFor current Bigraph structures, the bipraph (Bigraph) is a bigram, B ═<BP,BL>The method is proposed by Milner of the lottery winning device, BP and BL are respectively a position graph (place graph) and a connection graph (link graph), BP is a triple BP ═<V,E,P>The node set V, the edge set E and the interface P of the graph form a nested node, the nested node is in a parent-child relationship in the position graph, the branch relationship represents the embedding between the nodes, the BL is also a triple formed by the node set V, the edge set E and the interface P of the graph like the BP, and the BL is used for representing the connection relationship between the nodes;
2.2.4.2 term weight value calculation
The similarity of terms is calculated by comparing the word similarity of two terms, as follows:
Figure BDA0002148215010000094
wherein,
Figure BDA0002148215010000095
and
Figure BDA0002148215010000096
respectively represent in the term TiAnd TnThe number of constituent words in (a),
Figure BDA0002148215010000097
representing the number of the same words in the two terms, defining that a new term comprises more related sub-structure similar terms and the weight is higher, and obtaining the term weight value according to the similarity of the terms, wherein the calculation formula is as follows:
Figure BDA0002148215010000098
where NP is the total set of superior, sibling, and subordinate terms of the term, TiRepresents one of these terms;
2.2.5 Bigraph hierarchy of generative terms
Constructing a Bigraph hierarchical structure of different terms, similar to a Bigraph position graph, wherein each node of the Bigraph represents a term object, the value of the node represents the characteristic value of the term object, and the Bigraph hierarchical structure is constructed from top to bottom by the following steps:
2.2.5.1, calculating the mixed characteristic value of the terms in the WSDL document and extracted from Google according to the formula in 2.2.3, putting the mixed characteristic value into an array A, and selecting the previous 3 term objects as three nodes of a Bigraph to form an initial Bigraph structure T according to ascending order;
2.2.5.2 for the remaining term T in array AnAdded to the existing Bigraph hierarchy if TxSatisfy (hybrid Spec (T)n)-0.3<HybridSpe(Tx)<HybridSpe(Tn) +0.3, then TxMarked as target node, TxDetermining T for the existing Bigraph hierarchy terminology through the target nodesnDetermining the position of the target substructure, thereby determining a candidate Bigraph structure;
2.2.5.3 by considering new terms anddomain weight W of candidate Bigraph structuresDS(Gi) And term weight WTS(Gi) Calculating to obtain final node weight through the following formula, thereby finding out the optimal Bigraph structure;
Wf(Gi)=ωWDS(Gi)+(1-ω)WTS(Gi)
where ω is a coefficient, ranging from 0 to 1, by iteratively running 2.2.5.2-2.2.5.3 until all terms are added to the Bigraph hierarchy;
2.2.6 constructing a similarity matrix:
the similarity is calculated using the following formula:
Figure BDA0002148215010000101
where D represents the maximum number of layers of the Bigraph hierarchy constructed by the term, dis (T)1,T2) Stands for two terms T1,T2Calculating the similarity of each characteristic of the SOAP service according to the shortest distance in the Bigraph hierarchical structure, namely the similarity of the SOAP service on a certain characteristic, taking the sum of the characteristic similarities as the similarity of the service, and constructing the similarity relation between the services into a similarity matrix;
the third step: service clustering
The selection of the cluster center point needs to calculate the value of the integral cluster variance for the points in the data set, but a plurality of non-alternative points exist in the data set, and data noise points and isolated points of edges exist, and the points not only influence the selection of the cluster center, but also can additionally increase the calculation cost, and simultaneously need to manually pre-specify the number of data clusters; in view of the above disadvantages, a density-based K-means algorithm is proposed to be improved, by calculating the density number of each point, extracting a data point with a high density number as a cluster center, and by the improved K-means algorithm, pre-processing clustering is performed on an initial data set S to be clustered, wherein S is composed of M data points with a dimension d, and the point density calculation of the density-based K-means algorithm is as follows:
wherein sensitivity (S)i) Is represented as SiThe total number of points in the range of R, and the distance sim (S)i,Sj) Adopt as service SiAnd SjThe similarity of (2);
for this purpose, the clustering process based on the density K-means algorithm is as follows:
3.1 preprocessing the data by adopting a density-based K-means algorithm and calculating different data SiThe distance between the two clusters is divided into different clusters according to the radius range R, and the Density, namely Density (S), is selected to be the highesti) The highest K SiAs a cluster center, clustering the data by similarity, the process is as follows:
3.1.1 calculation of Each data SiDistance of each data cluster center within the tissue object Q, validation SiSorting the data sets based on density according to the number of points in each data cluster;
3.1.2 select the first K S points with the highest density, i.e. the largest number of R range pointskAs a new data cluster center Ck
3.1.3 obtaining each S according to the distance between the divided different clustersiAnd CkSimilarity sim (S) ofi,Ck) According to the average similarity Avesim, if sim (C)k,Si)>Avesim, then SiPartitioning into data clusters CkFinally obtaining N data clusters;
Figure BDA0002148215010000111
3.2 tissue cell O1 evolutionary rule
O1An Agnes algorithm is adopted as an evolution rule to guide and complete the evolution of objects in cells, N initial clusters obtained through a density k-means algorithm are combined through the Agnes algorithm according to a set inter-cluster similarity threshold Cs, and the process is as follows:
3.2.1 clusters C from any two datai,CjAverage similarity dis (C) of inner datai,Cj) Constructing a similarity matrix D
Figure BDA0002148215010000112
Wherein SXAs a data cluster CiData point of (1), SYAs a data cluster CjThe data points in (1), U and V are Ci,CjThe number of data points in;
3.2.2 choosing dis (C)i,Cj) Largest data cluster Ci,CjAccording to the threshold value of similarity between clusters Cs, if dis (C)i,Cj) Cs then cluster the data CiAnd CjMerging;
3.2.3 repeating step 3.2.2 until all data clusters meet the similarity threshold requirement;
3.3 histiocyte O2Rules of evolution
O2The FCM algorithm based on sample weighting is adopted as an evolution rule to guide and complete the evolution of objects in cells, the difference of samples is not considered in the target function and cluster center calculation of the traditional FCM algorithm, all samples are treated in a same view, but the defect that the influence of isolated points or noise data in a data set is easily expanded exists, so that the contribution of some important samples to clustering is reduced, and the clustering precision is reduced; in order to reduce the influence of sample difference on the clustering effect, the invention provides a sample weighting-based FCM clustering algorithm, which improves the clustering effect by reasonably weighting a target function and a clustering center function;
for data set S ═ S1,s2,…,sn},
3.3.1 calculating FCM membership according to the following formula:
Figure BDA0002148215010000121
wherein u isijRepresents a membership value of the ith data belonging to the jth cluster,i.e. dividing the ith data into the data cluster j, | s with the maximum membershipi-tj| is data siTo the cluster center tjN is the number of data, it can be found that the sum of all the data membership degrees is 1, namely, the sum satisfies
Figure BDA0002148215010000122
j=1,2,…,n;
3.3.2 computing weight and entropy information
The entropy of thermodynamics represents the chaos degree of information, the invention effectively analyzes the membership degree of data based on entropy definition, and carries out sample weighting on an FCM target function, firstly, an entropy variable E is definediRepresenting degree of membership uijAnd by calculating the weight wiMeasurement data siFor the degree of influence of this secondary clustering, they are calculated as follows:
Figure BDA0002148215010000123
Figure BDA0002148215010000124
3.3.3 according to Ei,wiCalculating a new objective function
Weight coefficient wiSatisfy the requirement of
Figure BDA0002148215010000125
The target function F (S, t) of the newly defined FCM is formulated as follows
Figure BDA0002148215010000126
m is a weighting index and is an integer greater than or equal to 1, and in order to solve the extreme value of the objective function under the constraint condition, a Lagrange multiplier method is used for constructing a new objective function as follows:
Figure BDA0002148215010000127
the optimization conditions for extremizing the objective function are as follows:
Figure BDA0002148215010000128
Figure BDA0002148215010000129
Figure BDA0002148215010000131
calculating a new cluster center tjComprises the following steps:
Figure BDA0002148215010000132
updating membership uijDividing the ith data into data clusters Cj with the maximum membership degree
3.3.4 if | F (S, t)i-1-F(S,t)iIf | is greater than the set threshold value, repeating the step 3.3.3, otherwise, ending the algorithm and outputting a result F (S, t)iRepresenting the FCM objective function value obtained by the ith iteration;
3.4 tissue cell O3 evolutionary rule
O3The method adopts three genetic operations of selection, crossing and variation of a Genetic Algorithm (GA) as an evolution rule to guide and complete the evolution of each object in cells, and the evolution steps are as follows:
3.4.1 O3combining m objects in the self cells and objects transferred by other two histiocytes into a new object evolutionary pool P;
3.4.2 O3selecting, crossing and mutating the new object evolution pool P, wherein the selecting operation is carried out by adopting an optimal storage strategy, the crossing and mutating operations adopt integer form crossing and single point mutation,the specific method comprises the following steps:
3.4.2.1 calculates an evaluation value p of each object kkN is the number of data clusters, tiIs the center of the ith data cluster, pmThe smaller the classification method is, the more appropriate the classification method is, the easier the object is to be inherited to the next generation;
Figure BDA0002148215010000134
3.4.2.2 define the fitness function fitness of each object kk
fitnessk=α(1-α)index-1
Wherein alpha is the value range of the set parameter from 0 to 1, and index is the iteration number;
3.4.2.3 selecting operation according to the ratio of object fitness
Figure BDA0002148215010000135
Wherein u is the total number of objects in the object pool, and for each object, a random number p is generated cyclically and randomly, if p is<CifkThen the object is inherited to the next generation;
3.4.2.4, determining the crossing position in the crossing operation by the crossing probability Pc, randomly selecting two objects from the evolution pool to carry out the crossing operation, traversing each component of the object, if the random number p generated by the cycle is less than Pc, exchanging the components of the two objects after the position at the position, and ending the traversal;
3.4.2.5 define the mutation probability PmSetting a random probability P for each object, and if the probability P is less than the mutation probability PmIf z is according to the mutation probability PmThe determined variance point (i.e., a component) of the object has a variance value of zθThe mutated object is represented as:
Figure BDA0002148215010000141
wherein, delta belongs to [0,1], is a random number generated randomly, and the sign of + and-appears according to probability;
3.4.3 repeating steps 3.4.1-3.4.2, O to keep the scale of the objects in the evolutionary pool stable3Screening the evolved objects, eliminating the objects according to the fitness of the objects, and reserving m objects with the highest fitness to form an object evolution pool P';
the fourth step: the data cell updates the global optimal object according to the operation rule
The invention defines a transport rule in a designed tissue P system to guide the information exchange between tissue cells, wherein the rule is as follows:
(x,T1,T2,...Tm,/T′1,T′2,...T′m,y),x≠y,x,y=1,2,3.
this transport rule indicates that histiocyte x and histiocyte y can carry out object transport in both directions, where T1,T2,...TmFor the m objects of tissue cell x, like T1’,T2’,...Tm' m objects of histiocyte y; the following effects can be achieved by the transfer rule:
4.1) m subjects T in tissue cell x1,T2,...TmIs transported into the tissue cell y;
4.2) m subjects T in histiocyte y1’,T2’,...Tm' is transported into tissue cells x;
(x,Txbest/Tbest,OEo),x≠y,x,y=1,2,3.
this transport rule represents the transport of histiocyte x and the systemic environment, where TxbestFor the current calculation of the locally optimal object in the tissue cell x, TbestFor the global optimal object in the current environment, the optimal object in the tissue cell x is transported to the environment by the transport rule, and the environment is updated at the same timeA global optimal object;
the fifth step: shutdown and output
The method comprises the steps of defining a series of calculation steps as a calculation, starting from the histiocytes containing an initial data cell object set, in each calculation, meaning that one or more evolution rules are acted on the current data cell object set, automatically stopping the system when a stopping constraint condition of the system is reached, and presenting the calculation result in the external environment of the system.
In order to reduce the complexity of the system, a simple shutdown condition based on maximum execution calculation is adopted, specifically, the system is stopped when the organization P executes to the set maximum calculation number, and a global optimal object set in the current environment is output
The invention has the beneficial effects that: the method obtains the description text of the processed mashup service by adopting a domain-awareness-based service feature selection method, forms a service description document by using high-efficiency information required by a service feature simplification algorithm extraction method, can effectively remove useless information in the description document, and can better obtain the feature of the service field by considering the importance, the frequency and the domain correlation of feature words in the same domain when simplifying the feature words compared with the traditional TF-IDF calculation method or mutual information method.
For SOAP services, a special Bigraph hierarchical model is generated by extracting hidden term information from a WSDL document, and service information is divided into two types, namely service self information and service context information, by means of a composition word based on the special Bigraph hierarchical model, so that a new term feature value calculation method is introduced. Most terms are composite terms with a set of modifiers, self-information is important to represent a set of internal features in a domain corpus. The context information helps to make up for the deficiency of the information of the service itself. The final feature value is calculated by a combination of the self information and the context information. The similarity can be calculated more accurately.
Meanwhile, by using the k-means based on density and the P-based organization system, the advantages of three clustering algorithms can be effectively combined by taking a hierarchical Agnes algorithm, a Genetic Algorithm (GA) algorithm and a weighted Fuzzy Clustering (FCM) algorithm as evolution rules, so that a better clustering result is obtained.
Detailed Description
The present invention is further explained below.
A Web service mixed evolution clustering method based on membrane computing comprises the following steps:
the first step is as follows: formalized definition, the process is as follows:
1.1 mashup service definition
1.1.1 service document vector model: the preprocessed service document vector model is a four-tuple, RSM ═ RD, RT, RA, T >, where:
RD is a domain feature vector, representing service domain information, defining a service with m domains, and then RD ═ RD1,RD2,…,RDm};
RT is a service description text feature vector, and assuming that there are n service description texts in each domain, the description texts of m domains are represented as RT ═ RT11,RT12,…,RT1n,…,RTmn};
RA is a service API feature vector;
t is a service label feature vector;
each service description text RTijThe characteristic word in (1) is expressed as FWijkWhere i represents a domain variable, j represents a description text variable, and k represents a feature word variable, the service description text RTijMay also be denoted as RTij={FWij1,FWij2,…,FWijsI is more than or equal to 1 and less than or equal to m, j is more than or equal to 1 and less than or equal to n, and s is the number of the characteristic words;
1.1.2 service document Cross-Domain concentration: the cross-domain concentration is denoted as DdepIt represents the inclusion service domain RDiCharacteristic word FW in (1)ijkService description document RTijAnd the proportion of the feature words in all domains of the service is calculated according to the following formula:
Figure BDA0002148215010000161
wherein df (FW)ijk,RDi) Representative service domain RDiIn (1), containing a feature word FWijkDescription text RT ofijA number of
Figure BDA0002148215010000162
Representing the inclusion of feature words FW in all domainsijkDescription text RT ofijThe higher the cross-domain concentration ratio is, the higher the concentration ratio of the service document in the domain is, so that the method has stronger field representation;
1.1.3 feature word frequency cross-domain concentration: the cross-domain concentration is denoted as DfreIt stands for the feature word FWijkIn the service domain RDiThe different frequency ratios occurring in all service domains are neutralized, and the calculation is as follows:
Figure BDA0002148215010000163
wherein tf (FW)ijk,RDi) Representing the service domain RDiMiddle and characteristic word FWijkOf a quantity of
Figure BDA0002148215010000164
The number of the feature words appearing in all the service domains is represented, and similarly, the higher feature word frequency cross-domain concentration degree means that the feature words are concentrated to a higher degree in the service domains;
1.1.4 Domain representation of feature words: represents a characteristic word FWijkRepresenting a service domain RDiThe degree of the word frequency is comprehensively calculated according to the cross-domain concentration degree of the service document and the cross-domain concentration degree of the feature word frequency, and the calculation is as follows according to a formula
Dfinal(FWijk,RDi)=α*Ddep(FWijk,RDi)+β*Dfre(FWijk,RDi)
α and β are weighting coefficients, and α + β is 1, the domain representation degrees of all feature words in different service domains can be obtained through the above formula, the higher the domain representation degree of a feature word is, the more the feature word can represent the service domain information, it needs to be noted that a series of typical feature words appear in one service domain, the domain representation degrees of these words are very high but the clustering effect of the service is general, a threshold value of the domain representation degree of a feature word is set, the feature words exceeding the threshold value are filtered, and the representation effect of the feature words on the service domain is improved;
1.1.5 field efficient feature word set: selecting proper feature word sets for representing all feature word sets in a service domain, sorting the feature words in descending order according to domain representation degree of the feature words, and selecting the feature words with the top percentage P in the service domain as the domain efficient feature word sets required by the invention, as shown in the following
HQ(RDi)={FWij1,FWij2,...,FWijp,}
Wherein P is L P/100, if a description text RT is in the process of simplifying the characteristic wordsijCharacteristic word FWijkNot belonging to HQ (RD)i) It is filtered and the service description document RT is updatedij′;
1.2SOAP service term definition:
let TL be { T ═ T1,T2,...TnIs a set of terms in the service corpus, n is the number of terms, a ═ a1,a2,...amIs an atomic vocabulary constituting the term TL, i.e. the vocabulary has not been subdivided, m corresponding to the number of all atomic vocabularies, defining the frequency of the term
Figure BDA0002148215010000165
Namely the term TiThe frequency of occurrence is the same as the sum of the frequency of occurrence of all terms in the corpus TL, and the corresponding frequency of atomic vocabulary
Figure BDA0002148215010000171
And calculating the sum of the occurrence times of all the vocabularies, wherein the calculation formula is as follows:
Figure BDA0002148215010000172
Figure BDA0002148215010000173
NumTLnumber of all terms for TL, NumAAll are the sum of the occurrence times of the atomic vocabularies;
1.3, organization P System (P System) definition:
one degree of 3, i.e., 3, formalized by data cell organization P system is defined as the following octave:
ω=(OB1,OB2,OB3,OR1,OR2,OR3,OR′,OEo)
wherein:
OB1、OB2and OB3Is a set of objects of each tissue cell, namely a data cell set;
OR1、OR2and OR3Representing the clustering rules based on Agnes and k-means algorithm, weighted FCM algorithm and GA algorithm respectively for the evolution rules of each tissue cell;
OR' represents the transport rule of each tissue cell in the whole P system, and the sharing and exchange of objects can be carried out between cells through the transport rule;
OEo is the output area of the system, representing the environment;
the second step is that: service similarity calculation
Judging whether the service is the SOAP service, if the service is the SOAP service, jumping to the step 2.2, and if the service is the mashup service, performing the step 2.1;
2.1 mashup service similarity calculation
2.1.1 service Pre-processing
The method comprises the following steps of preprocessing crawled service information, namely a service domain, a description text, an API (application program interface) and a label, extracting accurate and effective characteristic words in the service information, constructing a service description document with more accurate description, and improving service clustering precision, wherein the preprocessing steps are as follows:
2.1.1.1 constructing an initial feature vector, and segmenting a statement and extracting effective words by using a natural language processing package NLTK;
2.1.1.2 remove invalid words such as symbols (+, -, etc.) and prepositions (a, the, of, and etc.), which are useless in characterizing the service, keeping names, verbs and adjectives that can characterize the service's characteristics;
2.1.1.3 merging and processing word stems, wherein some words with the same word stem often have similar meanings, for example, the characteristics of use, used and using expression are the same, and the root of a word with the same meaning is deleted and reserved;
2.1.2 service feature reduction Process
Different services have unique field characteristics, the importance of characteristic words in the same domain is related to the frequency and the domain correlation, when the service characteristic value weight is calculated, only one factor is considered in the traditional TF-IDF calculation method or mutual information method, and the description text of the service is subjected to characteristic simplification processing by comprehensively considering the word frequency and the correlation factor, and the steps are as follows:
2.1.2.1 traverse each service domain RDiEach of the description texts RTijEach feature word FWijkCalculating the feature word FW according to the formula in 1.1.4ijkRepresenting a service domain RDiDegree D offinal(FWijk,RDi) If D isfinal(FWijk,RDi) If R is less than R, deleting the characteristic value, wherein R is a threshold value of the domain representation degree of the characteristic value;
2.1.2.2 all service domains after completing 2.1.2.1 steps, according to Dfinal(FWijk,RDi) Value of (D), to RDiCharacteristic word FWijkSorting in descending order, and selecting the characteristic words of the top percentage P as the service domain RD according to the step 1.5iDomain efficient feature word set HQ (RD)i);
2.1.2.3 repeat step 2.1.2.2 until a domain efficient feature word set HQ (RD) is generated for all service domainsi) Each service domain RDiAccording to HQ (RD)i) Deleting all absent HQ (RD)i) Is characterized byA word;
2.1.3 topic clustering model construction:
after the characteristic simplification of the RSM is completed, a new service document vector model RSM ' < RD ', RT ', RA ', T ' >, is obtained, the invention constructs an extended LDA topic model based on a plurality of data sources, which is marked as MD-LDA, the method comprises the following steps of simplifying a service description text feature vector RT', a Web API feature vector RA and a label feature vector T, wherein an MD-LDA model is based on hidden Dirichlet distribution (LDA), is a topic model (topic model), can give the topic of each document in a document set according to the form of probability distribution, and fuses various data source features of the service, in the MD-LDA model, the relevant word selection method in the service API and the label is consistent with that in the service document description document RT, and each service API or service label has unique contribution to the theme distribution of the document;
therefore, there is a topic distribution, the alpha hyper-parameter of dirichlet corresponds to RA and T in RSM', the beta hyper-parameter of dirichlet corresponds to word distribution in each topic, then a topic is extracted from the topic distribution according to the selected RA or T, and a specific word is generated through the selected topic, thereby generating an MD-LDA model fusing the service description text, the service API and the service label, the generation process is as follows:
2.1.3.1 for RA or T in RSM', the variable dat is defineddWherein dat isd1,2,3, N being the total number of RA and T in RSM', one polynomial θ is selecteddatObeying alpha hyper-parameter distribution of Dirichlet;
2.1.3.2 for topic K in RSM', K is 1,2
Figure BDA0002148215010000181
Obeying beta hyper-parameter distribution of Dirichlet;
2.1.3.3 setting variable D as the document tag in the reduced service document RSM', D being 1,2dRepresents RA or T in each RSM', for each word w in ddiI 1,2,3, M is the total number of words in d;
extract a Web API or tag denoted as xdiWhere obedience is uniformly distributed is denoted as Uniform (dat)d);
Extracting a subject as zdiIts obedient polynomial distribution is notedDistributing;
extracting a word and recording the word as wdiSubject toDistributing;
each topic in the corresponding MD-LDA probability model
Figure BDA0002148215010000192
The above word distribution is related, the extraction of words is independent of the dirichlet parameter β, x denotes the tag set dat from the APIdSelecting RA or label T related to given single word, each RA or T is distributed with theta on a subject, theta is selected from Dirichlet parameter alpha, and the subject z is formed by combining the subject distribution of RA and T and the word distribution of the subjectdi
And extracting the word w from the selected topicdi
As can be seen from the above description, the posterior distribution of the model topic depends on RT ', RA and T in RSM', and the parameters of the MD-LDA model are set as follows:
θdat|α~Dirichlet(α)
Figure BDA0002148215010000193
xdi|datd~Uniform(datd)
Figure BDA0002148215010000194
Figure BDA0002148215010000195
2.1.3.4 parameter reasoning for MD-LDA model by Gibbs sampling method, which provides a simple and effective method for potential variable estimation, belonging to a Markov chain Monte Carlo algorithm for obtaining random sample sequence by multivariate probability distribution, wherein each step of Gibbs sampling method follows the following formula distribution:
P(zdi=j,xdi=k|wi=m,z_di,x_di,datd)
Figure BDA0002148215010000196
wherein z is_diIndicating a processed word wdiThen assign, x, to each word topic_diIndicating a processed word wdiThen assign each word API or label, nzwRepresenting the total number of words w, m, assigned to the topic zxzRepresenting the total number of words in the Web API and tags assigned to topic z,
Figure BDA0002148215010000197
as a subject zdiThe alpha parameter of (a) is,
Figure BDA0002148215010000198
as a word wdiBeta parameter of (a), number of subjects V, alphav,βvFor alpha parameter and beta parameter of the u-th topic, the word distribution of the topic during sampling
Figure BDA0002148215010000199
The theme distribution theta of the API and the labels is required to be calculated through the following formula;
Figure BDA00021482150100001910
Figure BDA00021482150100001911
zdiand xdiBy determining z_diAnd x_diTo sample decisions, for each RSM ', the invention summarizes all θ' sxCalculate the topic distribution of document d, where χ ∈ dardTo obtain the final topic probability distribution of all RSMs';
2.1.4 similarity calculation
In fact, the theme distribution of the mashup service document is mapped to the text vector space, so that two service documents RSM can be calculated through corresponding theme probability distribution1' and RSM2The similarity of' the topic in the model is the mixed distribution of word vectors, so the relative entropy (KL) distance can be used as the similarity measurement standard, and the specific calculation is as follows;
t stands for all common topics in two service documents, pjAnd q isjRespectively representing the distribution of topics in two documents, when pj=qjAt that time, KL distance calculation result DKL(RSM′1,RSM′2) Is 0, since the KL distance is not of a symmetrical nature, i.e. DKL(RSM′1,RSM′2)≠DKL(RSM′2,RSM′1) So a symmetric version thereof is usually used, the calculation formula is as follows:
DKL(RSM′1,RSM′2)=λDKL(RSM′1,λ*RSM′1+(1-λ)RSM′2)+(1-λ)DKL(RSM′2,λ*RSM′1+(1-λ)RSM′2)
therefore, if λ is equal to 0.5, the above formula is converted into a JS distance, which is also called JS divergence (Jensen-Shannon divergence) and is a variation of the KL distance, and the similarity of the text is calculated by using the JS distance as a standard and is used as the similarity of the service, and the final calculation formula is as follows:
Figure BDA0002148215010000202
2.2 SOAP similarity calculation
2.2.1 calculation of self-eigenvalues
Finding a term T in a service corpusiThe information quantity I (P) can be calculated by an information theory methodi) On this basis, the term T can be usediCharacteristic value of (Spe) (T)i) Assigned as follows
Spe(Ti)=I(Pi)
By computing a joint probability distribution P { P }i,qjIs calculated for the term feature value, where piE is P and qj∈Q,piIs to select a word from the term set TL and qjIs to take a word from the atomic vocabulary A, where { p }1,p2,...pnAnd q1,q2,...,qmAre respectively represented by random variables P, Q, PiAnd q isjThe mutual information calculation of (a) is calculated by the following formula:
Figure BDA0002148215010000211
the term pi has a characteristic value denoted as I (p)iQ), representing the relationship between pi terms and the vocabulary library Q, the formula for calculating pi feature values in combination with the frequency of terms and vocabularies in the corpus is as follows:
Spe(Ti)≈I(Pi,Q)
according to the Bayes' theorem,
Figure BDA0002148215010000212
the final self-information characteristic value SelfSpe (Ti) of the SOAP service is calculated as follows
Figure BDA0002148215010000213
Analyzing conventional WSDL documents generally includes 1 to 2 words, so
Figure BDA0002148215010000214
The vocabulary in the representative terms is approximately set to be 1 for calculation, theta represents a weighted value, the weighted value is set based on an information theory method, and the value range is 0 to 1;
2.2.2 contextual information eigenvalue calculation
According to the information theory approach, the context information characteristic of a service is based on the entropy of the modified term word probability distribution, for which its entropy value is calculated by the following formula:
Figure BDA0002148215010000215
wherein NT represents the term TiModified quantity of (2), (mod)m,Ti) Represents modmModifying the term TiThe entropy value of (d) is determined by all (mod)m,Ti) For the calculation of average information amount, in a specific field, the modifiers of the terms are distributed more closely, so that the entropy value of the terms in a specific field is lower, and the term T can be calculated through the entropy valueiContext information characteristic value ContextSpe (T)i) The following were used:
Figure BDA0002148215010000216
wherein j is more than or equal to 1 and less than or equal to K, K is the sum of the number of modifiers with the same definition,
Figure BDA0002148215010000217
represents each modifier;
2.2.3 hybrid eigenvalue calculation
The self characteristic value and the context information characteristic calculated by the steps 2.2.1 and 2.2.2 can cover the characteristic of the descriptive word and the information which cannot be described by the word, and finally the mixed characteristic value is obtained by the following formula:
Figure BDA0002148215010000221
the value of the mixing coefficient alpha is between 0 and 1, the mixing coefficient alpha is set to be 0.65 according to experiments, and the values of the self characteristic value, the context characteristic value and the mixing characteristic value of the service are all between 0 and 1 through normalization processing;
2.2.4 Domain weight calculation:
2.2.4.1 Domain weight calculation
In the process of Bigraph structure generation, a weight based on a domain characteristic value is required, the weight is embodied by terms of the same level, the larger the definition structure similarity is, the larger the weight of terms of the same level is, and the calculation method is as follows:
Figure BDA0002148215010000222
wherein,
Figure BDA0002148215010000223
is a new term TnSet of sibling terms of (1), hybrid spe (T)s) And hybrid Spe (T)n) Respectively representing the characteristic value of each sibling term and the new term, and directly defining the weight value as 0.5, G, if the newly added term has no sibling termiFor current Bigraph structures, the bipraph (Bigraph) is a bigram, B ═<BP,BL>The method is proposed by Milner of the lottery winning device, BP and BL are respectively a position graph (place graph) and a connection graph (link graph), BP is a triple BP ═<V,E,P>The node set V, the edge set E and the interface P of the graph form a nested node, the nested node is in a parent-child relationship in the position graph, the branch relationship represents the embedding between the nodes, the BL is also a triple formed by the node set V, the edge set E and the interface P of the graph like the BP, and the BL is used for representing the connection relationship between the nodes;
2.2.4.2 term weight value calculation
The similarity of terms is calculated by comparing the word similarity of two terms, as follows:
Figure BDA0002148215010000224
wherein,and
Figure BDA0002148215010000226
respectively represent in the term TiAnd TnThe number of constituent words in (a),representing the number of the same words in the two terms, defining that a new term comprises more related sub-structure similar terms and the weight is higher, and obtaining the term weight value according to the similarity of the terms, wherein the calculation formula is as follows:
Figure BDA0002148215010000228
where NP is the total set of superior, sibling, and subordinate terms of the term, TiRepresents one of these terms;
2.2.5 Bigraph hierarchy of generative terms
A Bigraph hierarchy construction algorithm of terms is provided, a Bigraph hierarchy structure of different terms is constructed, a position graph similar to the Bigraph is constructed, each node of the Bigraph represents a term object, the value of the node represents the characteristic value of the term object, and the Bigraph hierarchy structure is constructed from top to bottom, and the method comprises the following steps:
2.2.5.1, calculating the mixed characteristic value of the terms in the WSDL document and extracted from Google according to the formula in 2.2.3, putting the mixed characteristic value into an array A, and selecting the first 3 term objects as three nodes of a Bigraph to form an initial Bigraph structure T according to ascending order;
2.2.5.2 for the remaining term T in array AnAdded to the existing Bigraph hierarchy if TxSatisfy (hybrid Spec (T)n)-0.3<HybridSpe(Tx)<HybridSpe(Tn) +0.3, then TxMarked as target node, TxFor the existing Bigraph hierarchy terminology, T is determined by these target nodesnDetermining the position of the target substructure, thereby determining a candidate Bigraph structure;
2.2.5.3 by comprehensively considering the new term and the domain weight W of the candidate Bigraph structureDS(Gi) And term weight WTS(Gi) Calculating to obtain final node weight through the following formula, thereby finding out the optimal Bigraph structure;
Wf(Gi)=ωWDS(Gi)+(1-ω)WTS(Gi)
where ω is a coefficient, ranging from 0 to 1, by iteratively running 2.2.5.2-2.2.5.3 until all terms are added to the Bigraph hierarchy;
2.2.6 constructing a similarity matrix:
the similarity is calculated using the following formula:
Figure BDA0002148215010000231
where D represents the maximum number of layers of the Bigraph hierarchy constructed by the term, dis (T)1,T2) Stands for two terms T1,T2Calculating the similarity of each characteristic of the SOAP service according to the shortest distance in the Bigraph hierarchical structure, namely the similarity of the SOAP service on a certain characteristic, taking the sum of the characteristic similarities as the similarity of the service, and constructing the similarity relation between the services into a similarity matrix;
the third step: service clustering
The selection of the clustering center point needs to calculate the value of the integral cluster variance for the points in the data set, but a plurality of non-alternative points exist in the data set, data noise points and isolated points of edges exist, the points not only can influence the selection of the cluster center, but also can additionally increase the calculation cost, and meanwhile, the number of data clusters needs to be artificially pre-specified, and in view of the defects, a density-based K-means algorithm is provided for improvement, the density number of each point is calculated, data points with high density are extracted as the cluster center, the initial data set S to be clustered is subjected to pre-processing clustering through the improved K-means algorithm, the S is composed of M data points with dimensionality d, and the point density calculation of the density-based K-means algorithm is as follows:
Figure BDA0002148215010000232
wherein sensitivity (S)i) Is represented as SiThe total number of points in the range of R, and the distance sim (S)i,Sj) Adopt as service SiAnd SjThe similarity of (2);
for this purpose, the clustering process based on the density K-means algorithm is as follows:
3.1 preprocessing the data by adopting a density-based K-means algorithm and calculating different data SiThe distance between the two clusters is divided into different clusters according to the radius range R, and the Density, namely Density (S), is selected to be the highesti) The highest K SiAs a cluster center, clustering the data by similarity, the process is as follows:
3.1.1 calculation of Each data SiDistance of each data cluster center within the tissue object Q, validation SiSorting the data sets based on density according to the number of points in each data cluster;
3.1.2 select the first K S points with the highest density, i.e. the largest number of R range pointskAs a new data cluster center Ck
3.1.3 obtaining each S according to the distance between the divided different clustersiAnd CkSimilarity sim (S) ofi,Ck) According to the average similarity Avesim, if sim (C)k,Si) If Avesim is greater, then S isiPartitioning into data clusters CkFinally obtaining N data clusters;
Figure BDA0002148215010000241
3.2 tissue cell O1 evolutionary rule
O1An Agnes algorithm is adopted as an evolution rule to guide and complete the evolution of objects in cells, N initial clusters obtained by a density k-means algorithm are combined through the Agnes algorithm according to a set inter-cluster similarity threshold Cs, and the process is as follows:
3.2.1 clusters C from any two datai,CjAverage similarity dis (C) of inner datai,Cj) Constructing a similarity matrix D
Figure BDA0002148215010000242
Wherein SXAs a data cluster CiData point of (1), SYAs a data cluster CjThe data points in (1), U and V are Ci,CjThe number of data points in;
3.2.2 choosing dis (C)i,Cj) Largest data cluster Ci,CjAccording to the threshold value of similarity between clusters Cs, if dis (C)i,Cj) Cs then cluster the data CiAnd CjMerging;
3.2.3 repeating step 3.2.2 until all data clusters meet the similarity threshold requirement;
3.3 histiocyte O2Rules of evolution
O2The FCM algorithm based on sample weighting is adopted as an evolution rule to guide and complete the evolution of objects in cells, the difference of samples is not considered in the target function and cluster center calculation of the traditional FCM algorithm, all samples are treated in a same view, but the defect that the influence of isolated points or noise data in a data set is easily expanded exists, so that the contribution of some important samples to clustering is reduced, and the clustering precision is reduced; in order to reduce the influence of sample difference on the clustering effect, an FCM clustering algorithm based on sample weighting is provided, and the clustering effect is improved by reasonably weighting a target function and a clustering center function;
for data set S ═ S1,S2,...,sn},
3.3.1 calculating FCM membership according to the following formula:
Figure BDA0002148215010000251
wherein u isijRepresenting the membership value of the ith data belonging to the jth cluster, i.e. the ith data is divided into the data cluster j, | s with the maximum membershipi-tj| is data siTo the cluster center tjN is the number of data, it can be found that the sum of all the data membership degrees is 1, namely, the sum satisfies
3.3.2 computing weight and entropy information
The entropy of thermodynamics represents the chaos degree of information, the invention effectively analyzes the membership degree of data based on entropy definition, and carries out sample weighting on an FCM target function, firstly, an entropy variable E is definediRepresenting degree of membership uijAnd by calculating the weight wiMeasurement data siFor the degree of influence of this secondary clustering, they are calculated as follows:
Figure BDA0002148215010000254
3.3.3 according to Ei,wiCalculating a new objective function
Weight coefficient wiSatisfy the requirement of
Figure BDA0002148215010000255
The target function F (S, t) of the newly defined FCM is formulated as follows
Figure BDA0002148215010000256
m is a weighting index and is an integer greater than or equal to 1, and in order to solve the extreme value of the objective function under the constraint condition, a Lagrange multiplier method is used for constructing a new objective function as follows:
Figure BDA0002148215010000257
the optimization conditions for extremizing the objective function are as follows:
Figure BDA0002148215010000258
Figure BDA0002148215010000261
Figure BDA0002148215010000262
calculating a new cluster center tjComprises the following steps:
updating membership uijDividing the ith data into data clusters Cj with the maximum membership degree
3.3.4 if | F (S, t)i-1-F(S,t)iIf | is greater than the set threshold value, repeating the step 3.3.3, otherwise, ending the algorithm and outputting a result F (S, t)iRepresenting the FCM objective function value obtained by the ith iteration;
3.4 tissue cell O3 evolutionary rule
O3The method adopts three genetic operations of selection, crossing and variation of a Genetic Algorithm (GA) as an evolution rule to guide and complete the evolution of each object in cells, and the evolution steps are as follows:
3.4.1 O3m objects in self cell and the cellThe objects transferred by the two histiocytes are combined into a new object evolution pool P;
3.4.2 O3and performing selection, crossing and mutation operations on the new object evolution pool P, wherein the selection operation is performed by adopting an optimal storage strategy, and the crossing and mutation operations adopt integer form crossing and single point mutation, and the method comprises the following steps:
3.4.2.1 calculates an evaluation value p of each object kkN is the number of data clusters, tiIs the center of the ith data cluster, pmThe smaller the classification method is, the more appropriate the classification method is, the easier the object is to be inherited to the next generation;
Figure BDA0002148215010000265
3.4.2.2 define the fitness function fitness of each object kk
fitnessk=α(1-α)index-1
Wherein alpha is the value range of the set parameter from 0 to 1, and index is the iteration number;
3.4.2.3 selecting operation according to the ratio of object fitness
Figure BDA0002148215010000266
Wherein u is the total number of objects in the object pool, and for each object, a random number p is generated cyclically and randomly, if p is<CifkThen the object is inherited to the next generation;
3.4.2.4 the crossing position in the crossing operation is determined by the crossing probability Pc, two objects are randomly selected from the evolution pool to carry out the crossing operation, each component of the object is traversed, if the random number p is generated in a circulating way, if p is less than Pc, the components of the two objects behind the position are exchanged at the position, and the traversal is finished;
3.4.2.5 define the mutation probability PmSetting a random probability P for each object, and if the probability P is less than the mutation probability PmIf z is according to the mutation probability PmThe determined change point of the object (i.e. a certain component),the value after mutation is zθThe mutated object is represented as:
Figure BDA0002148215010000271
wherein, delta belongs to [0,1], is a random number generated randomly, and the sign of + and-appears according to probability;
3.4.3 repeating steps 3.4.1-3.4.2, O to keep the scale of the objects in the evolutionary pool stable3Screening the evolved objects, eliminating the objects according to the fitness of the objects, and reserving m objects with the highest fitness to form an object evolution pool P';
the fourth step: the data cell updates the global optimal object according to the operation rule
Transport channels exist among cell membranes of tissue cells in the system, different objects are shared and exchanged among different tissue cells, transport rules defined by the system are required to support, and transport rules are defined in a designed tissue P system to guide the information exchange among the tissue cells, wherein the rules are as follows:
(x,T1,T2,…Tm,/T′1,T′2,…T′m,y),x≠y,x,y=1,2,3
this transport rule indicates that histiocyte x and histiocyte y can carry out object transport in both directions, where T1,T2,…TmFor the m objects of tissue cell x, like T1’,T2’,…Tm' is the m objects of the tissue cell y, by which the following effects can be achieved:
4.1) m subjects T in tissue cell x1,T2,…TmIs transported into the tissue cell y;
4.2) m subjects T in histiocyte y1’,T2’,…Tm' is transported into tissue cells x;
(x,Txbest/Tbest,OEo),x≠y,x,y=1,2,3
the transport rule represents histiocytes x andthe system environment is in transit, wherein TxbestFor the current calculation of the locally optimal object in the tissue cell x, TbestFor the global optimal object in the current environment, the optimal object in the tissue cell x is transferred into the environment through the transfer rule, and the global optimal object of the environment is updated at the same time;
the fifth step: shutdown and output
In the system, a series of calculation steps are defined as a calculation, starting from the histiocytes containing an initial data cell object set, in each calculation, one or more evolution rules are acted on the current data cell object set, when the shutdown constraint condition of the system is reached, the system is automatically shut down, and the calculation result is presented in the external environment of the system.
In order to reduce the complexity of the system, a simple shutdown condition based on maximum execution of the calculation is adopted, specifically, the system is stopped when the organization P performs to the set maximum calculation number, and the global optimal object set in the current environment is output.

Claims (10)

1. A Web service mixed evolution clustering method based on membrane computing is characterized by comprising the following steps:
the first step is as follows: formalized definition;
the second step is that: calculating the service similarity;
the third step: service clustering
The selection of the cluster center point needs to calculate the value of the integral cluster variance for the points in the data set, but a plurality of non-alternative points exist in the data set, and data noise points and isolated points of edges exist, and the points not only influence the selection of the cluster center, but also can additionally increase the calculation cost, and simultaneously need to manually pre-specify the number of data clusters; in view of the above disadvantages, a density-based K-means algorithm is proposed to be improved, by calculating the density number of each point, extracting a data point with a high density number as a cluster center, and by the improved K-means algorithm, pre-processing clustering is performed on an initial data set S to be clustered, wherein S is composed of M data points with a dimension d, and the point density calculation of the density-based K-means algorithm is as follows:
Figure FDA0002148213000000011
wherein sensitivity (S)i) Is represented as SiThe total number of points in the range of R, and the distance sim (S)i,Sj) Adopt as service SiAnd SjThe similarity of (2);
the fourth step: the data cell updates the global optimal object according to the operation rule
Transport channels exist among cell membranes of tissue cells in the system, different objects are shared and exchanged among different tissue cells, transport rules defined by the system are required to support, and transport rules are defined in a designed tissue P system to guide the information exchange among the tissue cells, wherein the rules are as follows:
(x,T1,T2,…Tm,/T′1,T′2,…T′m,y),x≠y,x,y=1,2,3.
this transport rule represents a bi-directional object transport of histiocyte x and histiocyte y, where T1,T2,…TmFor the m objects of tissue cell x, like T1’,T2’,…Tm' m objects of histiocyte y; the following effects are achieved by the transfer rule:
4.1) m subjects T in tissue cell x1,T2,…TmIs transported into the tissue cell y;
4.2) m subjects T in histiocyte y1’,T2’,…Tm' is transported into tissue cells x;
(x,Txbest/Tbest,OEo),x≠y,x,y=1,2,3.
this transport rule represents the transport of histiocyte x and the systemic environment, where TxbestFor the current calculation of locally optimal pairs in tissue cells xElephant, TbestFor the global optimal object in the current environment, the optimal object in the tissue cell x is transferred into the environment through the transfer rule, and the global optimal object of the environment is updated at the same time;
the fifth step: shutdown and output
Each histiocyte in the system is used as an independent execution unit to perform evolutionary operation in a parallel structure, so the system is distributed in parallel, in the system, a series of calculation steps are defined as a calculation, starting from the histiocyte containing an initial data cell object set, in each calculation, one or more evolutionary rules are acted on the current data cell object set, when a shutdown constraint condition of the system is reached, the system is automatically shut down, and the calculation result is presented in the external environment of the system.
2. The membrane computing-based Web services mixed-evolution clustering method according to claim 1, wherein in the first step, formally defined procedure is as follows:
1.1 mashup service definition;
1.2SOAP services terminology definition:
let TL be { T ═ T1,T2,…TnIs a set of terms in the service corpus, n is the number of terms, a ═ a1,a2,…amIs an atomic vocabulary constituting the term TL, i.e. the vocabulary has not been subdivided, m corresponding to the number of all atomic vocabularies, defining the frequency of the term
Figure FDA0002148213000000021
Namely the term TiThe frequency of occurrence is the same as the sum of the frequency of occurrence of all terms in the corpus TL, and the corresponding frequency of atomic vocabulary
Figure FDA0002148213000000022
And calculating the sum of the occurrence times of all the vocabularies, wherein the calculation formula is as follows:
Figure FDA0002148213000000024
NumTLnumber of all terms for TL, NumAAll are the sum of the occurrence times of the atomic vocabularies;
1.3 organization P System (P System) definition:
one degree of 3, i.e., 3, formalized by data cell organization P system is defined as the following octave:
ω=(OB1,OB2,OB3,OR1,OR2,OR3,OR′,OEo)
wherein:
OB1、OB2and OB3Is a set of objects of each tissue cell, namely a data cell set;
OR1、OR2and OR3Representing the clustering rules based on Agnes and k-means algorithm, weighted FCM algorithm and GA algorithm respectively for the evolution rules of each tissue cell;
OR' represents the transport rule of each tissue cell in the whole P system, and the sharing and exchange of objects can be carried out between cells through the transport rule;
and OEo is the output area of the system, representing the environment.
3. The membrane computing-based Web services mixed evolution clustering method of claim 2, wherein the process of 1.1 is as follows:
1.1.1 service document vector model: the preprocessed service document vector model is a four-tuple, RSM ═ RD, RT, RA, T >, where:
RD is domain feature vector, representing service domain information, defining a service having m domains, then
RD={RD1,RD2,…,RDm};
RT is the service description text feature vector,assuming that there are n service description texts in each domain, the description texts of m domains are denoted as RT ═ RT11,RT12,…,RT1n,…,RTmn};
RA is a service API feature vector;
t is a service label feature vector;
each service description text RTijThe characteristic word in (1) is expressed as FWijkWhere i represents a domain variable, j represents a description text variable, and k represents a feature word variable, the service description text RTijMay also be denoted as RTij={FWij1,FWij2,…,FWijsI is more than or equal to 1 and less than or equal to m, j is more than or equal to 1 and less than or equal to n, and s is the number of the characteristic words;
1.1.2 service document Cross-Domain concentration: the cross-domain concentration is denoted as DdepIt represents the inclusion service domain RDiCharacteristic word FW in (1)ijkService description document RTijAnd the proportion of the feature words in all domains of the service is calculated according to the following formula:
Figure FDA0002148213000000031
wherein df (FW)ijk,RDi) Representative service domain RDiIn (1), containing a feature word FWijkDescription text RT ofijA number of
Figure FDA0002148213000000032
Representing the inclusion of feature words FW in all domainsijkDescription text RT ofijThe higher the cross-domain concentration ratio is, the higher the concentration ratio of the service document in the domain is, so that the method has stronger field representation;
1.1.3 feature word frequency cross-domain concentration: the cross-domain concentration is denoted as DfreIt stands for the feature word FWijkIn the service domain RDiThe different frequency ratios occurring in all service domains are neutralized, and the calculation is as follows:
Figure FDA0002148213000000033
wherein tf (FW)ijk,RDi) Representing the service domain RDiMiddle and characteristic word FWijkOf a quantity of
Figure FDA0002148213000000034
The number of the feature words appearing in all the service domains is represented, and similarly, the higher feature word frequency cross-domain concentration degree means that the feature words are concentrated to a higher degree in the service domains;
1.1.4 Domain representation of feature words: represents a characteristic word FWijkRepresenting a service domain RDiThe degree of the word frequency is comprehensively calculated according to the cross-domain concentration degree of the service document and the cross-domain concentration degree of the feature word frequency, and the calculation is as follows according to a formula
Dfinal(FWijk,RDi)=α*Ddep(FWijk,RDi)+β*Dfre(FWijk,RDi)
Alpha and beta are weight coefficients, and alpha + beta is 1, the domain representation degrees of all the feature words in different service domains are obtained through the formula, the higher the domain representation degree of the feature words is, the more the feature words can represent the service domain information, it needs to be noted that a series of typical feature words appear in one service domain, the domain representation degrees of the words are high but the clustering effect of the service is general, a threshold value of the domain representation degree of the feature words is set, the feature words exceeding the threshold value are filtered, and the representation effect of the feature words on the service domain is improved;
1.1.5 field efficient feature word set: representing all feature word sets in a service domain, selecting a proper feature word set, sorting in descending order according to the domain representation degree of the feature words, and selecting the feature words of the top P percent in the service domain as a required domain efficient feature word set as shown in the following
HQ(RDi)={FWij1,FWij2,…,FWijp,}
Wherein P is L P/100, if a description text RT is in the process of simplifying the characteristic wordsijCharacteristic word FWijkNot belonging to HQ (RD)i) Then it is passed throughFiltering and updating service description document RTij'。
4. The membrane computing-based Web service mixed evolution clustering method according to any one of claims 1 to 3, wherein in the second step, whether the Web service is a SOAP service is judged, if the SOAP service is the SOAP service, the step 2.2 is skipped, and if the SOAP service is a mashup service, the step 2.1 is performed;
2.1 mashup service similarity calculation
2.1.1 service Pre-processing
By preprocessing the crawled service information, namely the domain, the description text, the API and the label of the service, accurate and effective characteristic words in the service information are extracted, a more accurate service description document is constructed, and the service clustering precision is improved;
2.1.2 service feature reduction Process
Different services have unique field characteristics, the importance of characteristic words in the same field is related to frequency and field correlation, when calculating the service characteristic value weight, the traditional TF-IDF calculation method or mutual information method only considers one factor, and the description text of the service is subjected to characteristic simplification processing by comprehensively considering the word frequency and the correlation factor
2.1.3 topic clustering model construction:
after the characteristics of the RSM are simplified, a new service document vector model RSM '< RD', RT ', RA', T '>, is obtained, an extended LDA topic model based on a plurality of data sources is constructed and marked as MD-LDA, wherein the extended LDA topic model comprises a simplified service description text characteristic vector RT', a Web API characteristic vector RA and a label characteristic vector T, the MD-LDA model is based on hidden Dirichlet distribution, the hidden Dirichlet distribution LDA is a topic model, topics of each document in a document set are given out according to a probability distribution mode, various data source characteristics of the service are fused, in the MD-LDA model, word selection methods related to the service API and the labels are consistent with the service document description document RT, and each service API or service label has unique contribution to the topic distribution of the document;
therefore, a topic distribution is provided, the alpha hyper-parameter of the Dirichlet corresponds to RA and T in RSM', the beta hyper-parameter of the Dirichlet corresponds to word distribution in each topic, then a topic is extracted from the topic distribution according to the RA or T selected in the distribution, and a specific word is generated through the selected topic, so that an MD-LDA model fusing a service description text, a service API and a service label is generated;
2.1.4 similarity calculation
In fact, the theme distribution of the mashup service document is mapped to the text vector space, so that two service documents RSM are calculated through corresponding theme probability distribution1' and RSM2The similarity of the' is that the topic in the model is the mixed distribution of word vectors, so the relative entropy KL distance is used as the similarity measurement standard, and the calculation is shown as follows;
Figure FDA0002148213000000051
t stands for all common topics in two service documents, pjAnd q isjRespectively representing the distribution of topics in two documents, when pj=qjAt that time, KL distance calculation result DKL(RSM′1,RSM′2) Is 0, since the KL distance is not of a symmetrical nature, i.e. DKL(RSM′1,RSM′2)≠DKL(RSM′2,RSM′1) So a symmetric version thereof is usually used, the calculation formula is as follows:
DKL(RSM′1,RSM′2)=λDKL(RSM′1,λ*RSM′1+(1-λ)RSM′2)
+(1-λ)DKL(RSM′2,λ*RSM′1+(1-λ)RSM′2)
therefore, if λ is equal to 0.5, the above formula is converted into a JS distance, which is also called JS divergence (Jensen-Shannon divergence) and is a variation of the KL distance, and the similarity of the text is calculated by using the JS distance as a standard and is used as the similarity of the service, and the final calculation formula is as follows:
Figure FDA0002148213000000052
2.2 SOAP similarity calculation
2.2.1 calculation of self-eigenvalues
Finding a term T in a service corpusiThe information quantity I (P) is calculated by an information theory methodi) On this basis, the term T will be usediCharacteristic value of (Spe) (T)i) Assigned as follows
Spe(Ti)=I(Pi)
By computing a joint probability distribution P { P }i,qjIs calculated for the term feature value, where piE is P and qj∈Q,piIs to select a word from the term set TL and qjIs to take a word from the atomic vocabulary A, where { p }1,p2,…pnAnd q1,q2,…,qmAre respectively represented by random variables P, Q, PiAnd q isjThe mutual information calculation of (a) is calculated by the following formula;
Figure FDA0002148213000000053
the term pi has a characteristic value denoted as I (p)iQ), representing the relationship between pi terms and the vocabulary library Q, the formula for calculating pi feature values in combination with the frequency of terms and vocabularies in the corpus is as follows:
Spe(Ti)≈I(pi,Q)
according to the Bayes' theorem,
Figure FDA0002148213000000061
the final self-information characteristic value SelfSpe (Ti) of the SOAP service is calculated as follows
Figure FDA0002148213000000062
Analyzing conventional WSDL documents generally includes 1 to 2 words, so
Figure FDA0002148213000000063
The vocabulary in the representative terms is approximately set to be 1 for calculation, theta represents a weighted value, the weighted value is set based on an information theory method, and the value range is 0 to 1;
2.2.2 contextual information eigenvalue calculation
According to the information theory approach, the context information characteristic of a service is based on the entropy of the modified term word probability distribution, for which its entropy value is calculated by the following formula:
Figure FDA0002148213000000064
wherein NT represents the term TiModified quantity of (2), (mod)m,Ti) Represents modmModifying the term TiThe entropy value of (d) is determined by all (mod)m,Ti) For the calculation of average information amount, in a specific field, the modifiers of the terms are distributed more closely, so that the entropy value of the terms in a specific field is lower, and the term T is calculated through the entropy valueiContext information characteristic value ContextSpe (T)i) The following were used:
Figure FDA0002148213000000065
wherein j is more than or equal to 1 and less than or equal to K, K is the sum of the number of modifiers with the same definition,represents each modifier;
2.2.3 hybrid eigenvalue calculation
Covering the characteristics of the descriptive words and the information which cannot be described by the words by the self characteristic values and the context information characteristics calculated in the steps 2.2.1 and 2.2.2, and finally obtaining the mixed characteristic values by the following formula as follows:
Figure FDA0002148213000000067
the value of the mixing coefficient alpha is between 0 and 1, the mixing coefficient alpha is set to be 0.65 according to experiments, and the values of the self characteristic value, the context characteristic value and the mixing characteristic value of the service are all between 0 and 1 through normalization processing;
2.2.4 calculating the domain weight;
2.2.5 Bigraph hierarchy of generative terms
Providing a term Bigraph hierarchical structure algorithm, constructing Bigraph hierarchical structures of different terms, wherein each node of the Bigraph represents a term object, the value of the node represents the characteristic value of the term object, and the Bigraph hierarchical structure is constructed from top to bottom;
2.2.6 constructing a similarity matrix:
the similarity is calculated using the following formula:
Figure FDA0002148213000000071
where D represents the maximum number of layers of the Bigraph hierarchy constructed by the term, dis (T)1,T2) Stands for two terms T1,T2And calculating the similarity of each characteristic of the SOAP service according to the shortest distance in the Bigraph hierarchical structure, namely the similarity of the SOAP service on a certain characteristic, taking the sum of the characteristic similarities as the similarity of the service, and constructing the similarity relation between the services into a similarity matrix.
5. The membrane computing-based Web services mixed evolution clustering method of claim 4, wherein in 2.1.1, the preprocessing steps are as follows:
2.1.1.1 constructing an initial feature vector, and segmenting a statement and extracting effective words by using a natural language processing package NLTK;
2.1.1.2 remove invalid words such as symbols (+, -, _ etc.) and prepositions (a, the, of, and etc.) that are not useful in characterizing a service, preserving names, verbs and adjectives that characterize the service's characteristics;
2.1.1.3 merging and processing word stems, wherein some words with the same word stem often have similar meanings, for example, the characteristics of use, used and using expressions are the same, and the root of a word with the same meaning is deleted to be reserved.
6. The membrane computing-based Web services mixed evolution clustering method of claim 4, wherein the step of 2.1.2 is as follows:
2.1.2.1 traverse each service domain RDiEach of the description texts RTijEach feature word FWijkCalculating the feature word FW according to the formula in 1.1.4ijkRepresenting a service domain RDiDegree D offinal(FWijk,RDi) If D isfinal(FWijk,RDi)<R, deleting the characteristic value, wherein R is a threshold value of the domain representation degree of the characteristic value;
2.1.2.2 all service domains after completing 2.1.2.1 steps, according to Dfinal(FWijk,RDi) Value of (D), to RDiCharacteristic word FWijkSorting in descending order, and selecting the characteristic words of the top percentage P as the service domain RD according to the step 1.5iDomain efficient feature word set HQ (RD)i);
2.1.2.3 repeat step 2.1.2.2 until a domain efficient feature word set HQ (RD) is generated for all service domainsi) Each service domain RDiAccording to HQ (RD)i) Deleting all absent HQ (RD)i) The feature words of (1).
7. The membrane computing-based Web services mixed evolution clustering method of claim 4, wherein in the 2.1.3, the generation process is as follows:
2.1.3.1 for RA or T in RSM', the variable dat is defineddWherein dat isd1,2,3, …, N being the total number of RA and T in RSM', a polynomial θ is selecteddatSubject to the alpha hyper-parametric distribution of dirichlet, 2.1.3.2 for topic K in RSM ', satisfying K1, 2, …, K being the number of topics in RSM', one polynomial is selected
Figure FDA0002148213000000081
Obeying beta hyper-parameter distribution of Dirichlet;
2.1.3.3 setting variable D as document tag in the simplified service document RSM ', D is 1,2, …, D is the total number of RSM', defining variable datdRepresents RA or T in each RSM', for each word w in ddiI ═ 1,2,3, …, M being the total number of words in d;
extract a Web API or tag denoted as xdiWhere obedience is uniformly distributed is denoted as Uniform (dat)d);
Extracting a subject as zdiIts obedient polynomial distribution is noted
Figure FDA0002148213000000082
Distributing;
extracting a word and recording the word as wdiSubject to
Figure FDA0002148213000000083
Distributing;
each topic in the corresponding MD-LDA probability model
Figure FDA0002148213000000084
The above word distribution is related, the extraction of words is independent of the dirichlet parameter β, x denotes the tag set dat from the APIdSelecting RA or label T related to given single word, each RA or T is distributed with theta on a subject, theta is selected from Dirichlet parameter alpha, and the subject z is formed by combining the subject distribution of RA and T and the word distribution of the subjectdiAnd extracting the word w from the selected subjectdi
As can be seen from the above description, the posterior distribution of the model topic depends on RT ', RA and T in RSM', and the parameters of the MD-LDA model are set as follows:
θdat|α~Dirichlet(α)
Figure FDA0002148213000000085
xdi|datd~Uniform(datd)
Figure FDA0002148213000000086
2.1.3.4 parameter reasoning for MD-LDA model by Gibbs sampling method, which provides a simple and effective method for potential variable estimation, belonging to a Markov chain Monte Carlo algorithm for obtaining random sample sequence by multivariate probability distribution, wherein each step of Gibbs sampling method follows the following formula distribution:
Figure FDA0002148213000000088
wherein z is_diIndicating a processed word wdiThen assign, x, to each word topic_diIndicating a processed word wdiThen assign each word API or label, nzwRepresenting the total number of words w, m, assigned to the topic zxzRepresenting the total number of words in the Web API and tags assigned to topic z,
Figure FDA0002148213000000091
as a subject zdiThe alpha parameter of (a) is,
Figure FDA0002148213000000092
as a word wdiBeta parameter of (a), number of subjects V, alphavvFor alpha parameter and beta parameter of the v-th topic, the word distribution of the topic in the sampling process
Figure FDA0002148213000000093
And subject distribution theta of API, label needs to pass throughThe following formula is obtained;
Figure FDA0002148213000000094
Figure FDA0002148213000000095
zdiand xdiBy determining z_diAnd x_diTo sample decisions, for each RSM ', sum all θ' sxCalculating the topic distribution of the document d, wherein x belongs to datdThus resulting in a final topic probability distribution for all RSMs'.
8. The membrane computing-based Web services mixed evolution clustering method of claim 4, wherein the process of 2.2.4 is as follows:
2.2.4.1 Domain weight calculation
In the process of Bigraph structure generation, a weight based on a domain characteristic value is required, the weight is embodied by terms of the same level, the larger the definition structure similarity is, the larger the weight of terms of the same level is, and the calculation method is as follows:
Figure FDA0002148213000000096
wherein,
Figure FDA0002148213000000097
is a new term TnSet of sibling terms of (1), hybrid spe (T)s) And hybrid Spe (T)n) Respectively representing the characteristic value of each sibling term and the new term, and directly defining the weight value as 0.5, G, if the newly added term has no sibling termiFor current Bigraph structures, the even graph Bigraph is a bituple B ═<BP,BL>Proposed by Milner, Bonus awards, BP, BL are respectively a location graph (place graph) and a link graph (link graph)<V,E,P>The node set V, the edge set E and the interface P of the graphThe nested nodes are in a parent-child relationship in the position diagram, the embedding between the nodes is represented by a branch relationship, BL is a triple composed of a node set V, an edge set E and an interface P of the diagram like BP, and BL is used for representing the connection relationship between the nodes;
2.2.4.2 term weight value calculation
The similarity of terms is calculated by comparing the word similarity of two terms, as follows:
Figure FDA0002148213000000098
wherein,
Figure FDA0002148213000000099
andrespectively represent in the term TiAnd TnThe number of constituent words in (a),
Figure FDA00021482130000000911
representing the number of the same words in the two terms, defining that the more related sub-structure similar terms a new term contains, the higher the weight, and obtaining the term weight value according to the similarity of the terms, wherein the calculation formula is as follows:
Figure FDA0002148213000000101
where NP is the total set of superior, sibling, and subordinate terms of the term, TiRepresents one of these terms.
9. The membrane computing-based Web services mixed evolution clustering method of claim 4, wherein the step of 2.2.5 is as follows:
2.2.5.1, calculating the mixed characteristic value of the terms in the WSDL document and extracted from Google according to the formula in 2.2.3, putting the mixed characteristic value into an array A, and selecting the previous 3 term objects as three nodes of a Bigraph to form an initial Bigraph structure T according to ascending order;
2.2.5.2 for the remaining term T in array AnAdded to the existing Bigraph hierarchy if TxSatisfy (hybrid Spec (T)n)-0.3<HybridSpe(Tx)<HybridSpe(Tn) +0.3, then TxMarked as target node, TxDetermining T for the existing Bigraph hierarchy terminology through the target nodesnDetermining the position of the target substructure, thereby determining a candidate Bigraph structure;
2.2.5.3 by comprehensively considering the new term and the domain weight W of the candidate Bigraph structureDS(Gi) And term weight WTS(Gi) Calculating to obtain final node weight through the following formula, thereby finding out the optimal Bigraph structure;
Wf(Gi)=ωWDS(Gi)+(1-ω)WTS(Gi)
where ω is a coefficient, ranging from 0 to 1, by iteratively running 2.2.5.2-2.2.5.3 until all terms are added to the Bigraph hierarchy.
10. The membrane computing-based Web service mixed evolution clustering method according to one of the claims 1 to 3, wherein in the third step, the clustering process based on the density K-means algorithm is as follows:
3.1 preprocessing the data by adopting a density-based K-means algorithm and calculating different data SiThe distance between the two clusters is divided into different clusters according to the radius range R, and the Density, namely Density (S), is selected to be the highesti) The highest K SiAs a cluster center, clustering the data through similarity;
3.1.1 calculation of Each data SiDistance of each data cluster center within the tissue object Q, validation SiSorting the data sets based on density according to the number of points in each data cluster;
3.1.2 select the first K S points with the highest density, i.e. the largest number of R range pointskAs newData cluster center Ck
3.1.3 obtaining each S according to the distance between the divided different clustersiAnd CkSimilarity sim (S) ofi,Ck) According to the average similarity Avesim, if sim (C)k,Si)>Avesim, then SiPartitioning into data clusters CkFinally obtaining N data clusters;
Figure FDA0002148213000000102
3.2 tissue cell O1 evolutionary rule
O1An Agnes algorithm is adopted as an evolution rule to guide and complete the evolution of objects in cells, N initial clusters obtained by a density k-means algorithm are combined through the Agnes algorithm according to a set inter-cluster similarity threshold Cs, and the process is as follows:
3.2.1 clusters C from any two datai,CjAverage similarity dis (C) of inner datai,Cj) Constructing a similarity matrix D
Figure FDA0002148213000000111
Wherein SXAs a data cluster CiData point of (1), SYAs a data cluster CjThe data points in (1), U and V are Ci,CjThe number of data points in;
3.2.2 choosing dis (C)i,Cj) Largest data cluster Ci,CjAccording to the threshold value of similarity between clusters Cs, if dis (C)i,Cj)>Cs then clusters the data CiAnd CjMerging;
3.2.3 repeating step 3.2.2 until all data clusters meet the similarity threshold requirement;
3.3 histiocyte O2Rules of evolution
O2An FCM algorithm based on sample weighting is adopted as an evolution rule to guide and complete the evolution of objects in cells, and a target function and a cluster center of a traditional FCM algorithmThe calculation does not consider the difference of the samples, and all the samples are treated in a same-view way, but the defect that the influence of isolated points or noise data in a data set is easily expanded exists, so that the contribution of some important samples to clustering is reduced, and the clustering precision is reduced; in order to reduce the influence of sample difference on the clustering effect, an FCM clustering algorithm based on sample weighting is provided, and the clustering effect is improved by reasonably weighting a target function and a clustering center function;
for data set S ═ S1,s2,…,sn},
3.3.1 calculating FCM membership according to the following formula:
Figure FDA0002148213000000112
wherein u isijRepresenting the membership value of the ith data belonging to the jth cluster, i.e. the ith data is divided into the data cluster j, | s with the maximum membershipi-tj| is data siTo the cluster center tjN is the number of data, it can be found that the sum of all the data membership degrees is 1, namely, the sum satisfies
Figure FDA0002148213000000113
3.3.2 computing weight and entropy information
The entropy of thermodynamics represents the chaos degree of information, the data membership is effectively analyzed based on entropy definition, the FCM target function is subjected to sample weighting, and an entropy variable E is defined firstlyiRepresenting degree of membership uijAnd by calculating the weight wiMeasurement data siThe degree of influence on the secondary clusters is shown in the following calculation formula:
Figure FDA0002148213000000114
Figure FDA0002148213000000121
3.3.3 according to Ei,wiCalculating a new objective function
Weight coefficient wiSatisfy the requirement of
Figure FDA0002148213000000122
The target function F (S, t) of the newly defined FCM is formulated as follows
m is a weighting index which is an integer greater than or equal to 1, and in order to solve the extreme value of the objective function under the constraint condition, a Lagrange multiplier method is used for constructing a new objective function as follows:
Figure FDA0002148213000000124
the optimization conditions for extremizing the objective function are as follows:
Figure FDA0002148213000000125
Figure FDA0002148213000000127
calculating a new cluster center tjComprises the following steps:
Figure FDA0002148213000000128
updating membership uijDividing the ith data into data clusters Cj with the maximum membership degree
Figure FDA0002148213000000129
3.3.4 if | F (S, t)i-1-F(S,t)iIf | is greater than the set threshold value, repeating the step 3.3.3, otherwise, ending the algorithm and outputting a result F (S, t)iRepresenting the FCM objective function value obtained by the ith iteration;
3.4 tissue cell O3 evolutionary rule
O3The method adopts three genetic operations of selection, crossing and variation of a Genetic Algorithm (GA) as an evolution rule to guide and complete the evolution of each object in cells, and the evolution steps are as follows:
3.4.1 O3combining m objects in the self cells and objects transferred by other two histiocytes into a new object evolutionary pool P;
3.4.2 O3and performing selection, crossing and mutation operations on the new object evolution pool P, wherein the selection operation is performed by adopting an optimal storage strategy, and the crossing and mutation operations adopt integer form crossing and single point mutation, and the method comprises the following steps:
3.4.2.1 calculates an evaluation value p of each object kkN is the number of data clusters, tiIs the center of the ith data cluster, pmThe smaller the classification method is, the more appropriate the classification method is, the easier the object is to be inherited to the next generation;
Figure FDA0002148213000000131
3.4.2.2 define the fitness function fitness of each object kk
fitnessk=α(1-α)index-1
Wherein alpha is the value range of the set parameter from 0 to 1, and index is the iteration number;
3.4.2.3 selecting operation according to the ratio of object fitness
Figure FDA0002148213000000132
Where u is the total number of objects in the pool of objects, for eachCyclically and randomly generating a random number p if p<CifkThen the object is inherited to the next generation;
3.4.2.4 the crossing position in the crossing operation is determined by the crossing probability Pc, two objects are randomly selected from the evolution pool to carry out the crossing operation, each component of the object is traversed, if the random number p is generated in a circulating way, if p is less than Pc, the components of the two objects behind the position are exchanged at the position, and the traversal is finished;
3.4.2.5 define the mutation probability PmSetting a random probability P for each object, and if the probability P is less than the mutation probability PmIf z is according to the mutation probability PmThe determined variation point of the object, i.e. a certain component, the value after variation is zθThe mutated object is represented as:
Figure FDA0002148213000000133
wherein, delta belongs to [0,1], is a random number generated randomly, and the sign of + and-appears according to probability;
3.4.3 repeating steps 3.4.1-3.4.2, O to keep the scale of the objects in the evolutionary pool stable3And screening the evolved objects, eliminating the objects according to the fitness of the objects, and reserving m objects with the highest fitness to form an object evolution pool P' again.
CN201910692218.0A 2019-07-30 2019-07-30 Web service mixed evolution clustering method based on membrane computing Active CN110659363B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910692218.0A CN110659363B (en) 2019-07-30 2019-07-30 Web service mixed evolution clustering method based on membrane computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910692218.0A CN110659363B (en) 2019-07-30 2019-07-30 Web service mixed evolution clustering method based on membrane computing

Publications (2)

Publication Number Publication Date
CN110659363A true CN110659363A (en) 2020-01-07
CN110659363B CN110659363B (en) 2021-11-23

Family

ID=69036387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910692218.0A Active CN110659363B (en) 2019-07-30 2019-07-30 Web service mixed evolution clustering method based on membrane computing

Country Status (1)

Country Link
CN (1) CN110659363B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111475608A (en) * 2020-02-28 2020-07-31 浙江工业大学 Mashup service characteristic representation method based on functional semantic correlation calculation
CN111475609A (en) * 2020-02-28 2020-07-31 浙江工业大学 Improved K-means service clustering method around topic modeling
CN114722897A (en) * 2022-03-01 2022-07-08 西北工业大学 Method for improving battlefield comprehensive situation information processing efficiency
WO2022156328A1 (en) * 2021-01-19 2022-07-28 青岛科技大学 Restful-type web service clustering method fusing service cooperation relationships
TWI845028B (en) * 2022-11-21 2024-06-11 英業達股份有限公司 Service plan automatic generation system and operation method thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150327135A1 (en) * 2014-04-24 2015-11-12 Futurewei Technologies, Inc. Apparatus and method for dynamic hybrid routing in sdn networks to avoid congestion and balance loads under changing traffic load
CN106250933A (en) * 2016-08-12 2016-12-21 西华大学 Method, system and the FPGA processor of data clusters based on FPGA
CN107357793A (en) * 2016-05-10 2017-11-17 腾讯科技(深圳)有限公司 Information recommendation method and device
CN108108688A (en) * 2017-12-18 2018-06-01 青岛联合创智科技有限公司 A kind of limbs conflict behavior detection method based on the extraction of low-dimensional space-time characteristic with theme modeling
CN108470034A (en) * 2018-02-01 2018-08-31 百度在线网络技术(北京)有限公司 A kind of smart machine service providing method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150327135A1 (en) * 2014-04-24 2015-11-12 Futurewei Technologies, Inc. Apparatus and method for dynamic hybrid routing in sdn networks to avoid congestion and balance loads under changing traffic load
CN107357793A (en) * 2016-05-10 2017-11-17 腾讯科技(深圳)有限公司 Information recommendation method and device
CN106250933A (en) * 2016-08-12 2016-12-21 西华大学 Method, system and the FPGA processor of data clusters based on FPGA
CN108108688A (en) * 2017-12-18 2018-06-01 青岛联合创智科技有限公司 A kind of limbs conflict behavior detection method based on the extraction of low-dimensional space-time characteristic with theme modeling
CN108470034A (en) * 2018-02-01 2018-08-31 百度在线网络技术(北京)有限公司 A kind of smart machine service providing method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
谭伟 等: "服务环境下多粒度制造资源自适应组织与发现", 《计算机集成制造系统》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111475608A (en) * 2020-02-28 2020-07-31 浙江工业大学 Mashup service characteristic representation method based on functional semantic correlation calculation
CN111475609A (en) * 2020-02-28 2020-07-31 浙江工业大学 Improved K-means service clustering method around topic modeling
CN111475609B (en) * 2020-02-28 2022-04-05 浙江工业大学 Improved K-means service clustering method around topic modeling
CN111475608B (en) * 2020-02-28 2022-06-17 浙江工业大学 Mashup service characteristic representation method based on functional semantic correlation calculation
WO2022156328A1 (en) * 2021-01-19 2022-07-28 青岛科技大学 Restful-type web service clustering method fusing service cooperation relationships
CN114722897A (en) * 2022-03-01 2022-07-08 西北工业大学 Method for improving battlefield comprehensive situation information processing efficiency
TWI845028B (en) * 2022-11-21 2024-06-11 英業達股份有限公司 Service plan automatic generation system and operation method thereof

Also Published As

Publication number Publication date
CN110659363B (en) 2021-11-23

Similar Documents

Publication Publication Date Title
CN110659363B (en) Web service mixed evolution clustering method based on membrane computing
CN110647626B (en) REST data service clustering method based on Internet service domain
CN110533072B (en) SOAP service similarity calculation and clustering method based on Bigraph structure in Web environment
CN107688870B (en) Text stream input-based hierarchical factor visualization analysis method and device for deep neural network
CN110046228B (en) Short text topic identification method and system
CN110264372B (en) Topic community discovery method based on node representation
CN108427756B (en) Personalized query word completion recommendation method and device based on same-class user model
Basha et al. A roadmap towards implementing parallel aspect level sentiment analysis
Özbakır et al. Exploring comprehensible classification rules from trained neural networks integrated with a time-varying binary particle swarm optimizer
CN116521882A (en) Domain length text classification method and system based on knowledge graph
Palanivinayagam et al. An optimized iterative clustering framework for recognizing speech
CN113468291A (en) Patent network representation learning-based automatic patent classification method
Yuan et al. Dde-gan: Integrating a data-driven design evaluator into generative adversarial networks for desirable and diverse concept generation
Chen et al. DPM-IEDA: dual probabilistic model assisted interactive estimation of distribution algorithm for personalized search
Viadinugroho et al. A weighted metric scalarization approach for multiobjective BOHB hyperparameter optimization in LSTM model for sentiment analysis
Mnih et al. Learning label trees for probabilistic modelling of implicit feedback
CN117194771B (en) Dynamic knowledge graph service recommendation method for graph model characterization learning
CN116304063A (en) Simple emotion knowledge enhancement prompt tuning aspect-level emotion classification method
Chen et al. Gaussian mixture embedding of multiple node roles in networks
CN111538898B (en) Web service package recommendation method and system based on combined feature extraction
Lin et al. Copula guided parallel gibbs sampling for nonparametric and coherent topic discovery
Papenkov et al. Multi-Industry Simplex: A Probabilistic Extension of GICS
CN117033775B (en) Knowledge graph-based industrial software component recommendation method and system
Schmaltz et al. Approximate Conditional Coverage via Neural Model Approximations
Granata Tentative reflections on construction of assessment models for Buildings’ sustainability certification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant