CN110533072A - SOAP service similarity calculation and clustering method based on Bigraph structure in a Web environment - Google Patents
SOAP service similarity calculation and clustering method based on Bigraph structure in a Web environment
- Publication number
- CN110533072A (application number CN201910692227.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- term
- bigraph
- cluster
- similarity
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
- G06N3/126—Evolutionary algorithms, e.g. genetic algorithms or genetic programming
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Probability & Statistics with Applications (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Genetics & Genomics (AREA)
- Physiology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A SOAP service similarity calculation and clustering method based on Bigraph structure comprises the following steps: step 1, formal definitions; step 2, feature value calculation; step 3, domain weight calculation; step 4, generating the Bigraph hierarchical structure of terms; step 5, constructing the similarity matrix; step 6, service clustering; step 7, the data cells update the global optimal object according to the transport rules; step 8, each tissue cell runs as an independent execution unit and evolves in parallel. A sequence of computation steps is defined as one computation, which starts from the tissue cells containing the initial data-cell object sets; in each computation, one or more evolution rules are applied to the current data-cell object sets. When the halting condition of the system is reached, the system halts automatically and the computation result is presented in the external environment of the system. The invention computes similarity more accurately and obtains better clustering results.
Description
Technical field
The present invention relates to the problem of Web service similarity clustering, in particular to SOAP service similarity clustering.
Background technique
With the development of Web 2.0 technology, the number and variety of services on the Internet keep increasing. This makes it possible to develop Internet of Things applications more easily and quickly, but it also makes it a challenge to find the required atomic services or service compositions accurately and efficiently. Service clustering can effectively facilitate service discovery, and in recent years many different types of service clustering methods have been proposed for clustering Mashup services, Web APIs and Web services.
Existing methods mainly use information in service descriptions, such as Mashup descriptions, API descriptions and WSDL documents, and take the similarity of the service descriptions as the functional similarity of the services when clustering. Other methods further exploit the information in user-annotated tags to improve clustering performance. Clearly, both service descriptions and service tags are textual information. In general, these methods infer service similarity from semantic similarity to guide clustering. In fact, most of the similarity measures they propose for quantifying the similarity of service descriptions and tags are based on the semantic information in the text. In addition, Pan W et al. proposed a novel Mashup service clustering method based on structural similarity and a genetic algorithm. Their method describes the relationships between Mashups and Web APIs with a two-mode graph and quantifies the structural similarity between each pair of Mashup services using the SimRank algorithm, so that Mashup services are clustered effectively. Lu Jiawei et al. coupled isolated services into a global social service network and computed the social similarity between services. They proposed a service clustering method for global social service networks that jointly considers service descriptions, service domains and QoS information when computing similarity, which improves clustering precision.
Currently, most existing methods compute the functional similarity between SOAP services from the service functional descriptions (WSDL documents) and then perform service clustering. Liu et al. extracted four features of Web services from their WSDL descriptions to cluster Web services: content, context, host name and Web service name. Elgazzar et al. analyzed WSDL documents and clustered them according to functional similarity, and Yu and Rege proposed a clustering method that improves service discovery with a service community learning algorithm. In addition, ontologies are commonly used for semantic similarity computation and matching between Web services, to promote service clustering and discovery. For example, Pop et al. designed a module that evaluates the degree of match between the ontological concepts describing two semantic Web services, and clustered the services with an ant-based method to achieve efficient service discovery. Nayak et al. proposed Web service discovery with additional semantics and clustering, based on an agglomerative hierarchical clustering algorithm.
Beyond functional similarity, clustering methods also use the tag information of services. For example, Wu et al. proposed a method called WTCluster that uses tags to promote Web service clustering and discovery. It integrates tag data and WSDL documents with an LDA model and obtains the probabilistic topic distributions of Web services, which improves clustering precision. Aznag et al. proposed a non-logic-based matching method that uses correlated topic models to extract topics from semantic service descriptions and models the correlations between the extracted topics.
Many researchers also use non-functional factors to refine and enhance service discovery and clustering, such as service context, relationships between services and quality of service (QoS). Zhou et al. proposed an improved fuzzy C-means algorithm that clusters services based on service elements such as inputs, outputs and semantic relationships. Skoutas et al. ranked and clustered Web services using multi-criteria dominance relationships. Chen et al. described a hybrid QoS prediction method that alleviates the data sparsity problem of collaborative filtering. Kumara et al. proposed a cluster-based service recommendation method that clusters services using the semantic similarity and relatedness between them. Through filtering, it then selects the service clusters with better QoS values to serve the currently invoked service.
Summary of the invention:
To solve the problem of clustering SOAP services in a Web environment, the present invention extracts hidden term information from WSDL documents. It divides the service information into two classes, service self-information and service context information, and uses them to compute term feature values. The computed feature values generate a special Bigraph hierarchical model, from which the similarity of SOAP services is computed. A density-based k-means algorithm preprocesses the data set. A tissue P system then combines the hierarchical Agnes algorithm, a genetic algorithm (GA) and a sample-weighted fuzzy C-means (FCM) algorithm for clustering. Together these form a SOAP service similarity calculation and clustering method based on Bigraph structure.
In order to solve the above technical problem, the present invention provides the following technical solutions:
A SOAP service similarity calculation and clustering method based on Bigraph structure comprises the following steps:
Step 1: formal definitions
1.1 Term definitions:
Let TL = {T_1, T_2, …, T_n} be the set of terms of the services in the corpus, where n is the number of terms. Let A = {a_1, a_2, …, a_m} be the set of atomic words that make up the terms in TL, i.e., words that cannot be further subdivided, where m is the number of atomic words. The frequency of term T_i is defined as the number of occurrences of T_i divided by the total number of occurrences of all terms in corpus TL. Correspondingly, the frequency of an atomic word a_j is its number of occurrences divided by the total number of occurrences of all atomic words:

$$P(T_i) = \frac{count(T_i)}{Num_{TL}} \tag{1}$$

$$P(a_j) = \frac{count(a_j)}{Num_A} \tag{2}$$

where count(·) is the number of occurrences, Num_TL is the total number of term occurrences in TL, and Num_A is the total number of atomic-word occurrences;
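The sketch below computes the term and atomic-word frequencies defined in 1.1. It assumes that WSDL terms are camelCase identifiers that split into atomic words at case boundaries. The patent does not specify its tokenizer, so the regular expression and the helper names `split_term` and `term_and_word_frequencies` are illustrative.

```python
import re
from collections import Counter

# Assumed tokenizer: split camelCase/PascalCase names such as "FictionNovelAuthor"
# into lowercase atomic words; runs of capitals ("HTTP") stay together.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_term(term: str) -> list:
    return [w.lower() for w in _WORD.findall(term)]


def term_and_word_frequencies(corpus_terms):
    """Return (term frequency map, atomic-word frequency map).

    Each frequency is an occurrence count divided by the total number of
    occurrences of all terms (Num_TL) or of all atomic words (Num_A).
    """
    term_counts = Counter(corpus_terms)
    word_counts = Counter(w for t in corpus_terms for w in split_term(t))
    num_tl = sum(term_counts.values())
    num_a = sum(word_counts.values())
    p_term = {t: c / num_tl for t, c in term_counts.items()}
    p_word = {w: c / num_a for w, c in word_counts.items()}
    return p_term, p_word


corpus = ["NovelAuthor", "NovelAuthor", "FictionNovelAuthor", "BookTitle"]
p_term, p_word = term_and_word_frequencies(corpus)
print(p_term["NovelAuthor"])  # 0.5
```

Because both maps are relative frequencies, each sums to 1 over its vocabulary.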
1.2 Definition of the tissue P system:
A tissue P system of degree 3, i.e., with 3 data cells, can be formally defined as the following eight-tuple:
ω = (OB_1, OB_2, OB_3, OR_1, OR_2, OR_3, OR', OE_o)
where:
OB_1, OB_2 and OB_3 are the object sets of the tissue cells, i.e., the collections of data cells;
OR_1, OR_2 and OR_3 are the evolution rules of the tissue cells. They are, respectively, the clustering rules based on the Agnes and k-means algorithms, on the weighted FCM algorithm, and on the GA algorithm;
OR' denotes the transport rules among the tissue cells of the whole P system; through the transport rules, objects can be shared and exchanged between cells;
OE_o = 0 is the output region of the system and represents the environment;
1.3 Definition of tissue objects
In a data clustering algorithm, the tissue P system searches for the optimal clustering of the data set being clustered. The clustering is therefore represented by a group of cluster centers. A tissue-cell object T in the P system is defined as an N×d-dimensional vector:
T = (t_11, t_12, …, t_1d, …, t_i1, t_i2, …, t_id, …, t_N1, t_N2, …, t_Nd)
where N is the number of clusters represented by data cell T. The N clusters C_1, C_2, …, C_N have the cluster centers t_1, t_2, …, t_N. Like a data point, each cluster center in an object is a d-dimensional vector, so t_i can be written as (t_i1, t_i2, …, t_id), i = 1, 2, …, N, where t_id is the d-th component of the i-th cluster center.
OB_i denotes the object set in the i-th evolution membrane of the P system. It contains a group of objects that undergo evolution reactions through the evolution mechanisms of the different tissue cells. The initial number of objects in each evolution membrane is m, and these objects form its object set Q. During the evolution of the P system, the system needs a measure to evaluate the quality of the current objects. The quality of an object is judged by the clustering performance function J_m, the total within-cluster variance of the samples:

$$J_m = \sum_{i=1}^{N} \sum_{s_j \in C_i} \lVert s_j - t_i \rVert^2$$

where s_j is a data point in a cluster. The smaller J_m is, the better the object. Ranking the objects gives each evolution membrane its best object, the local optimal object OB_ibest. The best object overall is kept in the system environment as the global optimal object, denoted T_best. When the whole system halts, the global optimal object in the environment is the required solution, i.e., the optimal cluster centers;
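The following is a minimal sketch of how an object T could be decoded into its N cluster centers and scored by J_m. It assumes that each data point is charged to its nearest center with squared Euclidean distance, the usual within-cluster variance. The helper names are hypothetical.

```python
# Hypothetical helpers: an object T is a flat list of N*d numbers (N centers of dimension d).

def decode(obj, n_clusters, dim):
    """Split T = (t_11..t_1d, ..., t_N1..t_Nd) into N cluster centers."""
    if len(obj) != n_clusters * dim:
        raise ValueError("object length must be N*d")
    return [obj[i * dim:(i + 1) * dim] for i in range(n_clusters)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def j_m(obj, data, n_clusters, dim):
    """Total within-cluster variance; each point is charged to its nearest center."""
    centers = decode(obj, n_clusters, dim)
    return sum(min(squared_distance(s, t) for t in centers) for s in data)


def local_best(objects, data, n_clusters, dim):
    """OB_ibest: the object with the smallest J_m in one evolution membrane."""
    return min(objects, key=lambda o: j_m(o, data, n_clusters, dim))


data = [(0, 0), (0, 2), (10, 0), (10, 2)]
good, bad = [0, 1, 10, 1], [5, 1, 5, 1]
print(j_m(good, data, 2, 2), j_m(bad, data, 2, 2))  # 4 104
```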
Step 2: feature value calculation. The process is as follows:
2.1 Self-information feature value calculation
For a term T_i found in the service corpus, its information content I(P_i) is computed by information theory. On this basis, the feature value Spe(T_i) of term T_i is assigned as follows:
Spe(T_i) = I(P_i)   (3)
The term feature value is computed from the joint probability distribution P{p_i, q_j}, where p_i ∈ P and q_j ∈ Q. Here p_i is a term selected from the term set TL and q_j is a word taken from the atomic vocabulary A. The sets {p_1, p_2, …, p_n} and {q_1, q_2, …, q_m} are represented by the random variables P and Q, respectively, and the mutual information of p_i and q_j is computed by formula (4).
The feature value of term p_i is expressed as I(p_i, Q), which captures the relationship between term p_i and the vocabulary Q. Combining the frequencies of terms and words in the corpus, the feature value of p_i is computed as follows:
Spe(T_i) ≈ I(P_i, Q)   (5)
Applying Bayes' theorem gives formula (6), and the self-information feature value SelfSpe(T_i) of a SOAP service is finally computed by formula (7).
In typical WSDL documents, a term contains 1 to 2 words, so the quantity representing the words in a term is approximated as 1 in the calculation. θ is a weight set on the basis of information theory, with a value range of 0 to 1;
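The following sketch covers formula (3) only. It assumes that the information content is the standard self-information I(P_i) = -log2 P(T_i), with P estimated by relative frequency. The mutual-information refinement of formulas (5) to (7) and the weight θ are not modelled, because their exact form is not given in this text.

```python
import math
from collections import Counter


def self_information(corpus_terms):
    """Spe(T_i) = I(P_i) with I(P_i) = -log2 P(T_i), P estimated by relative frequency."""
    counts = Counter(corpus_terms)
    total = sum(counts.values())
    return {t: -math.log2(c / total) for t, c in counts.items()}


spe = self_information(["NovelAuthor", "NovelAuthor", "FictionNovelAuthor", "BookTitle"])
print(spe["NovelAuthor"], spe["FictionNovelAuthor"])  # 1.0 2.0
```

Rarer, more specific compound terms receive higher values, which matches the ordering argued for NovelAuthor and FictionNovelAuthor in the embodiment.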
2.2 Context information feature value calculation
According to information theory, the context information feature of a service is the entropy of the probability distribution of the modifiers of a term. Its entropy is computed as follows:

$$H(T_i) = -\sum_{m=1}^{NT} P(mod_m, T_i) \log P(mod_m, T_i) \tag{8}$$

where NT is the number of modifiers of term T_i and P(mod_m, T_i) is the probability that mod_m modifies term T_i. The entropy is the average information over all P(mod_m, T_i). Within a specific domain the modifier distribution of a term is more concentrated, so the entropy of the term in that domain is lower. The context information feature value ContextSpe(T_i) of term T_i is computed from the entropy by formula (9), where 1 ≤ j ≤ K, K is the total number of modifiers with the same definition, and the summed symbol represents each modifier.
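The following is a minimal sketch of the modifier entropy in 2.2. It makes two assumptions:

- Every word preceding the head word of a compound term counts as one of its modifiers.
- P(mod_m, T_i) is estimated by relative frequency.

`modifiers_of` is a hypothetical helper. The conversion of the entropy into ContextSpe(T_i) by formula (9) is not reproduced.

```python
import math
from collections import Counter


def modifiers_of(head, terms):
    """Hypothetical extraction: every word preceding `head` in a compound term
    (given as a list of atomic words ending in `head`) counts as a modifier."""
    return [w for words in terms if words and words[-1] == head for w in words[:-1]]


def modifier_entropy(modifiers):
    """Shannon entropy (bits) of the modifier distribution, with P(mod_m, T_i)
    estimated by relative frequency over the observed modifiers."""
    counts = Counter(modifiers)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


terms = [["novel", "author"], ["fiction", "novel", "author"], ["book", "title"]]
mods = modifiers_of("author", terms)
print(mods)  # ['novel', 'fiction', 'novel']
```

A term whose modifiers concentrate on a few words gets a low entropy, which is the "more concentrated within a domain" case described above.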
2.3 Hybrid feature value calculation
Formulas (7) and (9) give the self-information feature value and the context information feature value. Together they cover the features of the words in a term and the information that the words themselves cannot describe. The hybrid feature value HybridSpe(T_i) is finally obtained by formula (10), which combines the two with a mixing coefficient α.
The mixing coefficient α lies between 0 and 1 and is set to 0.65 experimentally. After normalization, the self-information, context and hybrid feature values of services all lie between 0 and 1;
Step 3: domain weight calculation. The process is as follows:
3.1 Domain weight calculation
The weight is determined by sibling terms: the greater the structural similarity to the sibling terms, the greater the weight. It is computed by formula (11), where the sibling set is the set of terms at the same level as a new term T_n. HybridSpe(T_s) and HybridSpe(T_n) are the feature values of each sibling term and of the new term, respectively. If the newly added term has no sibling terms, its weight is defined directly as 0.5. G_i is the current Bigraph structure.
A bigraph, proposed by Turing Award winner Milner, is a pair B = <BP, BL>, where BP and BL are the place graph and the link graph, respectively. BP is a triple BP = <V, E, P> consisting of the node set V, the edge set E and the interface P. Nesting of nodes in the place graph expresses parent-child relationships, represented as branches between nodes. BL is likewise a triple consisting of the node set V, the edge set E and the interface P, and expresses the link relationships between nodes;
3.2 Term weight calculation
The similarity of two terms is computed by comparing their words, as in formula (12). In that formula, the two word counts are the numbers of constituent words in terms T_i and T_n, and the shared count is the number of words the two terms have in common. The more terms of similar composition a candidate substructure contains for the new term, the higher the weight. The term weight is obtained from the term similarities by formula (13), where NP is the set of parent, sibling and child terms and T_i is one of these terms;
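Formulas (12) and (13) are not reproduced in this text, so the following sketch assumes two forms:

- A Dice-style overlap, 2·|shared| / (|T_i| + |T_n|), for the term similarity.
- A plain mean over the set NP for the term weight.

Both are illustrative stand-ins that use only the quantities named in 3.2.

```python
def term_similarity(words_i, words_n):
    """Assumed Dice-style overlap: 2 * |shared words| / (|T_i| + |T_n|)."""
    a, b = set(words_i), set(words_n)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def term_weight(new_words, np_terms):
    """Assumed W_TS: mean similarity between the new term and the parent, sibling
    and child terms (the set NP) of one candidate substructure."""
    if not np_terms:
        return 0.0
    return sum(term_similarity(new_words, w) for w in np_terms) / len(np_terms)


print(term_similarity(["novel", "author"], ["fiction", "novel", "author"]))  # 0.8
```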
Step 4: generate the Bigraph hierarchical structure of terms
A Bigraph hierarchical structure of the different terms is built, similar to the place graph of a bigraph. Each node of the Bigraph represents a term object, and the value of the node is the feature value of that term object. The Bigraph hierarchical structure is built top-down as follows:
4.1 Compute, by formula (10), the hybrid feature values of the terms extracted from the WSDL documents and from Google, put them into array A and sort them in ascending order. The first 3 term objects become three nodes of the Bigraph and form the initial Bigraph structure T;
4.2 Add each remaining term T_n in array A to the existing Bigraph hierarchical structure. If a term T_x at some Bigraph level satisfies HybridSpe(T_n) - 0.3 < HybridSpe(T_x) < HybridSpe(T_n) + 0.3, mark T_x as a target node. The target nodes determine the target substructure positions for T_n, and hence the candidate substructures;
4.3 Combine the domain weight W_DS(G_i) and the term weight W_TS(G_i) of the new term and each candidate substructure into the final node weight by formula (14), and choose the optimal substructure:
W_f(G_i) = ω W_DS(G_i) + (1 - ω) W_TS(G_i)   (14)
where ω is a coefficient in the range 0 to 1. Steps 4.2 and 4.3 are iterated until all terms have been added to the Bigraph hierarchy;
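The following sketch implements steps 4.1 to 4.3 under stated assumptions:

- The 3-node seed is the lowest-valued term as root, with the next two terms as its children.
- Choosing a substructure means attaching T_n as a child of the best-scoring candidate node.
- When no target node falls in the ±0.3 window, every existing node is a candidate.

`w_ds` and `w_ts` are hypothetical callables that stand in for formulas (11) and (13).

```python
def build_hierarchy(hybrid, w_ds, w_ts, omega=0.5, window=0.3):
    """Sketch of steps 4.1-4.3; returns a parent map (the root maps to None).

    hybrid: dict term -> HybridSpe value.
    w_ds, w_ts: hypothetical callables (new_term, candidate_parent, parent_map) -> weight,
    standing in for the domain weight W_DS and the term weight W_TS.
    """
    order = sorted(hybrid, key=hybrid.get)      # 4.1: ascending HybridSpe
    if not order:
        return {}
    root, rest = order[0], order[1:]
    parent = {root: None}
    for t in rest[:2]:                          # assumed shape of the 3-node seed
        parent[t] = root
    for t_n in rest[2:]:                        # 4.2: insert the remaining terms
        targets = [t_x for t_x in parent
                   if hybrid[t_n] - window < hybrid[t_x] < hybrid[t_n] + window]
        candidates = targets or list(parent)    # assumed fallback: no target node found
        # 4.3: W_f = omega * W_DS + (1 - omega) * W_TS; attach under the best candidate
        parent[t_n] = max(candidates, key=lambda g: omega * w_ds(t_n, g, parent)
                          + (1 - omega) * w_ts(t_n, g, parent))
    return parent


hybrid = {"author": 0.1, "book": 0.15, "title": 0.2,
          "novelauthor": 0.35, "fictionnovelauthor": 0.6}
tree = build_hierarchy(hybrid, lambda n, g, p: 0.5,
                       lambda n, g, p: 1.0 if n.endswith(g) else 0.0)
print(tree["fictionnovelauthor"])  # novelauthor
```

In the usage example, a toy W_TS that rewards suffix matches attaches FictionNovelAuthor under NovelAuthor, which in turn sits under Author.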
Step 5: construct the similarity matrix
Similarity is computed by formula (15), where D is the maximum number of levels of the Bigraph hierarchical structure formed by the terms and dis(T_1, T_2) is the shortest distance between terms T_1 and T_2 in that structure. This gives the similarity of two SOAP services on one feature. The similarity is computed for each feature, the sum of the feature similarities is taken as the service similarity, and the similarities between all services are assembled into the similarity matrix;
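Formula (15) is not reproduced in this text. The following sketch therefore assumes the linear normalization sim = 1 - dis(T_1, T_2) / (2D), where D is the number of levels and 2D bounds the longest path in the hierarchy. It also assumes that each service contributes one term per feature, with features aligned by position. The helper names are hypothetical.

```python
from collections import deque


def tree_distance(parent, a, b):
    """Shortest path length (in edges) between two terms of the hierarchy (BFS)."""
    adj = {t: set() for t in parent}
    for child, par in parent.items():
        if par is not None:
            adj[child].add(par)
            adj[par].add(child)
    seen, queue = {a: 0}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return seen[node]
        for nxt in adj[node]:
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    raise ValueError("terms are not connected")


def depth(parent, t):
    d = 1
    while parent[t] is not None:
        t, d = parent[t], d + 1
    return d


def term_sim(parent, a, b):
    """Assumed normalisation: 1 - dis(a, b) / (2D), D = number of levels."""
    D = max(depth(parent, t) for t in parent)
    return 1 - tree_distance(parent, a, b) / (2 * D)


def service_similarity_matrix(services, parent):
    """services: one list of feature terms per service (aligned by feature);
    service similarity = sum of the per-feature term similarities."""
    n = len(services)
    return [[sum(term_sim(parent, x, y) for x, y in zip(services[i], services[j]))
             for j in range(n)] for i in range(n)]


parent = {"author": None, "book": "author", "novelauthor": "author",
          "fictionnovelauthor": "novelauthor"}
print(service_similarity_matrix([["book"], ["fictionnovelauthor"]], parent))
```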
Step 6: service clustering
Selecting cluster centers requires computing the total within-cluster variance over the points of the data set. However, the data set contains many poor candidate points, such as noise and isolated edge points. These points affect the choice of cluster centers and add computational cost, and the number of clusters must also be specified manually in advance. To address these shortcomings, the present invention proposes a density-based improvement of the K-means algorithm: the density of each point is computed, and high-density data points are taken as cluster centers. The improved K-means algorithm pre-clusters the initial data set S, which consists of M data points of dimension d. In the density-based K-means algorithm, Density(S_i) is the total number of points within radius R of S_i. The distance is computed from sim(S_i, S_j), the similarity of services S_i and S_j.
The clustering process of the density-based K-means algorithm is therefore as follows:
6.1 Preprocess the data with the density-based K-means algorithm. Compute the distances between the data points S_i and divide the data into clusters according to the radius R. Choose the K points S_i with the highest density Density(S_i) as cluster centers, and finally cluster the data by similarity, as follows:
6.1.1 According to formula 16, compute the distance between each data point S_i in tissue object Q and each cluster center, count the points around S_i in each cluster, and sort the data set by density;
6.1.2 Choose the K points S_k with the highest density, i.e., with the most points within range R, as the new cluster centers C_k;
6.1.3 From the distances between the divided clusters, obtain the similarity sim(S_i, C_k) between each S_i and C_k. With the average similarity Avesim, if sim(C_k, S_i) > Avesim, assign S_i to cluster C_k. This finally yields N clusters;
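The following sketch covers steps 6.1.1 to 6.1.3 on a precomputed similarity matrix and rests on three assumptions:

- "Within radius R" means sim(S_i, S_j) ≥ r_sim.
- Avesim is the mean similarity between points and the chosen centers.
- A point joins its most similar center if that similarity exceeds Avesim; otherwise it is left out as noise.

The sketch also adds one rule beyond the text: it skips candidates already within R of a chosen center, so that the K centers do not all fall in the densest cluster.

```python
def density(sim, i, r_sim):
    """Density(S_i): how many other points lie 'within R' of S_i, where 'within R'
    is taken to mean sim(S_i, S_j) >= r_sim (assumption)."""
    return sum(1 for j in range(len(sim)) if j != i and sim[i][j] >= r_sim)


def density_kmeans_preprocess(sim, k, r_sim):
    """Steps 6.1.1-6.1.3 on a precomputed service-similarity matrix (needs k < n)."""
    n = len(sim)
    ranked = sorted(range(n), key=lambda i: density(sim, i, r_sim), reverse=True)
    centers = []
    for i in ranked:
        # Added assumption: skip points already within R of a chosen center.
        if all(sim[i][c] < r_sim for c in centers):
            centers.append(i)
        if len(centers) == k:
            break
    # Avesim: mean similarity between points and the chosen centers (assumed definition).
    pairs = [sim[i][c] for i in range(n) for c in centers if i != c]
    ave = sum(pairs) / len(pairs)
    clusters = {c: [c] for c in centers}
    for i in range(n):
        if i in clusters:
            continue
        best = max(centers, key=lambda c: sim[i][c])
        if sim[i][best] > ave:        # sim(C_k, S_i) > Avesim: join the most similar center
            clusters[best].append(i)
    return list(clusters.values())    # points below Avesim everywhere are left out


A, B = {0, 1, 2}, {3, 4}
sim = [[1.0 if i == j else 0.9 if {i, j} <= A else 0.8 if {i, j} <= B else 0.1
        for j in range(5)] for i in range(5)]
print(density_kmeans_preprocess(sim, 2, 0.5))  # [[0, 1, 2], [3, 4]]
```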
6.2 Evolution rule of tissue cell O_1
O_1 uses Agnes as its evolution rule to guide the evolution of the objects in the cell. With an inter-cluster similarity threshold Cs, the Agnes algorithm merges the N initial clusters obtained by the density-based k-means algorithm, as follows:
6.2.1 Construct the similarity matrix D from the average similarity dis(C_i, C_j) between the data of any two clusters C_i and C_j:

$$dis(C_i, C_j) = \frac{1}{U \cdot V} \sum_{S_X \in C_i} \sum_{S_Y \in C_j} sim(S_X, S_Y) \tag{17}$$

where S_X is a data point in cluster C_i, S_Y is a data point in cluster C_j, and U and V are the numbers of data points in C_i and C_j, respectively;
6.2.2 Select the clusters C_i, C_j with the largest dis(C_i, C_j). If dis(C_i, C_j) > Cs for the inter-cluster similarity threshold Cs, merge C_i and C_j;
6.2.3 Repeat step 6.2.2 until the similarity threshold requirement is satisfied between all clusters;
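The following sketch implements the O_1 merging loop of steps 6.2.1 to 6.2.3 on a precomputed similarity matrix. It uses the average cross-cluster similarity dis(C_i, C_j) described in 6.2.1 and stops once no pair of clusters exceeds Cs. The function names are illustrative.

```python
def average_similarity(sim, ci, cj):
    """dis(C_i, C_j): mean similarity over all U*V cross-cluster pairs."""
    return sum(sim[x][y] for x in ci for y in cj) / (len(ci) * len(cj))


def agnes_merge(sim, clusters, cs):
    """Repeatedly merge the most similar pair of clusters while dis > Cs."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        best, pair = float("-inf"), None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_similarity(sim, clusters[a], clusters[b])
                if d > best:
                    best, pair = d, (a, b)
        if best <= cs:                        # 6.2.3: threshold satisfied everywhere
            break
        a, b = pair
        clusters[a].extend(clusters.pop(b))   # 6.2.2: merge (b > a, so index a is stable)
    return clusters


A, B = {0, 1, 2}, {3, 4}
sim = [[1.0 if i == j else 0.9 if {i, j} <= A else 0.8 if {i, j} <= B else 0.1
        for j in range(5)] for i in range(5)]
print(agnes_merge(sim, [[0, 1], [2], [3, 4]], cs=0.5))  # [[0, 1, 2], [3, 4]]
```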
6.3 Evolution rule of tissue cell O_2
O_2 uses a sample-weighted FCM algorithm as its evolution rule to guide the evolution of the objects in the cell. The objective function and cluster-center computation of traditional FCM ignore differences between samples and treat all samples equally. As a result, traditional FCM is easily affected by isolated points or noise in the data set, which reduces the contribution of important samples to the clustering and lowers clustering accuracy. To reduce the influence of sample differences on the clustering result, the present invention proposes a sample-weighted FCM clustering algorithm. It improves the clustering by weighting the objective function and the cluster-center function appropriately;
For the data set S = {s_1, s_2, …, s_n}:
6.3.1 Compute the FCM membership degrees u_ij, where u_ij is the degree of membership of the i-th data point in the j-th cluster. The i-th data point is assigned to the cluster j with the largest membership. ||s_i - t_j|| is the Euclidean distance from data point s_i to cluster center t_j, and n is the number of data points. The memberships of each data point sum to 1, i.e., Σ_{j=1}^{N} u_ij = 1 for i = 1, 2, …, n;
6.3.2 Compute the weights and the entropy information
Thermodynamic entropy represents the degree of disorder of information. Based on the definition of entropy, the present invention analyzes the data memberships and applies sample weighting to the FCM objective function. First, an entropy change E_i is defined to represent the effectiveness of the memberships u_ij. A weight w_i is then computed to measure the influence of data point s_i on the current clustering.
6.3.3 Compute the new objective function from E_i and w_i
Under the constraint on the weight coefficients w_i, the new FCM objective function F(S, t) is defined by formula (22), where m is the weighting exponent, an integer greater than or equal to 1. To find the extremum of the objective function under the constraints, the method of Lagrange multipliers is applied to the new objective function. Solving the resulting optimality conditions gives the new cluster centers t_j. The memberships u_ij are then updated, and the i-th data point is assigned to the cluster with the largest membership.
6.3.4 If |F(S, t)_{i-1} - F(S, t)_i| is greater than the set threshold, repeat step 6.3.3. Otherwise terminate the algorithm and output the result, where F(S, t)_i denotes the FCM objective function value obtained at the i-th iteration;
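The entropy change E_i, the weight w_i and formula (22) are not reproduced in this text. The following sample-weighted FCM sketch therefore assumes common forms:

- E_i is the membership entropy -Σ_j u_ij ln u_ij.
- w_i is proportional to 1 - E_i / ln N, so crisper samples weigh more, normalized to sum to 1.
- F = Σ_i Σ_j w_i u_ij^m ||s_i - t_j||².

Under this objective, w_i scales a whole row of F, so the membership update is the standard FCM one. Only the center update is weighted. The sketch needs m > 1 and at least two centers.

```python
import math


def weighted_fcm(data, centers, m=2.0, eps=1e-6, max_iter=100):
    """Sample-weighted fuzzy C-means sketch (needs at least 2 centers, m > 1, max_iter >= 1).

    Assumed forms, not taken from the patent:
      E_i = -sum_j u_ij * ln(u_ij)            (membership entropy of sample i)
      w_i proportional to 1 - E_i / ln(N)     (crisper samples weigh more), sum w_i = 1
      F   = sum_i sum_j w_i * u_ij**m * ||s_i - t_j||**2
    """
    n, N, dim = len(data), len(centers), len(data[0])
    centers = [list(c) for c in centers]
    prev_f = None
    for _ in range(max_iter):
        # Membership update; w_i scales a whole row of F, so it cancels here.
        u = []
        for s in data:
            d = [max(math.dist(s, t), 1e-12) for t in centers]
            u.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(N))
                      for j in range(N)])
        # Entropy-based sample weights, normalised to sum to 1.
        ent = [-sum(x * math.log(x) for x in row if x > 0) for row in u]
        raw = [max(1 - e / math.log(N), 0.0) + 1e-12 for e in ent]
        total = sum(raw)
        w = [r / total for r in raw]
        # Weighted center update: t_j = sum_i w_i u_ij^m s_i / sum_i w_i u_ij^m.
        for j in range(N):
            coef = [w[i] * u[i][j] ** m for i in range(n)]
            norm = sum(coef)
            centers[j] = [sum(c * s[k] for c, s in zip(coef, data)) / norm for k in range(dim)]
        f = sum(w[i] * u[i][j] ** m * math.dist(data[i], centers[j]) ** 2
                for i in range(n) for j in range(N))
        if prev_f is not None and abs(prev_f - f) <= eps:   # step 6.3.4 stopping test
            break
        prev_f = f
    labels = [max(range(N), key=lambda j: row[j]) for row in u]
    return centers, labels, f


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
cents, labels, f_val = weighted_fcm(pts, [(1, 0), (9, 1)])
print(labels)  # [0, 0, 1, 1]
```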
6.4 Evolution rule of tissue cell O_3
O_3 uses the three genetic operations of GA (selection, crossover and mutation) as its evolution rule to guide the evolution of each object in the cell. The evolution steps are as follows:
6.4.1 O_3 merges the m objects in its own cell with the objects transported from the other two tissue cells into a new object evolution pool P;
6.4.2 O_3 performs selection, crossover and mutation on the new object evolution pool P. Selection uses an elitist preservation strategy, while crossover and mutation use integer-coded crossover and single-point mutation, as follows:
6.4.2.1 Compute the evaluation value p_k of each object k, where N is the number of clusters and t_i is the center of the i-th cluster. The smaller p_k is, the more suitable the clustering, and the more likely the object is to be passed on to the next generation.
6.4.2.2 Define the fitness function fitness_k of each object k:
fitness_k = α(1 - α)^(index - 1)   (30)
where α is a parameter with a value range of 0 to 1 and index is the number of iterations.
6.4.2.3 Selection: compute the cumulative fitness Cif_k from the proportion of the total fitness held by each object, where u is the total number of objects in the object pool. For each object, a random number p is generated in a loop; if p < Cif_k, the object is passed on to the next generation;
6.4.2.4 The crossover position is determined by the crossover probability Pc. Two objects are randomly selected from the evolution pool for crossover. The components of the objects are traversed, and a random number p is generated at each position. If p < Pc, the components of the two objects after this position are exchanged and the traversal ends;
6.4.2.5 Define the mutation probability Pm. For each object, a random probability p is generated. If p is less than Pm, let z be the mutation point (i.e., one component) of the object determined according to Pm. The value after mutation is z_θ, where δ ∈ [0, 1] is a randomly generated number and the + and - signs occur according to a given probability;
6.4.3 Repeat steps 6.4.1 to 6.4.2. To keep the number of objects in the evolution pool stable, O_3 screens the evolved objects and eliminates objects according to their fitness. The m objects with the highest fitness form a new object evolution pool P';
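The following is a minimal sketch of one O_3 generation. It makes several assumptions:

- `index` in fitness_k is read as the object's rank after sorting by p_k, as in ranking-based fitness assignment.
- Selection is roulette-wheel selection on the cumulative fitness Cif_k.
- The mutation form is z·(1 ± δ), with equal-probability signs.
- The score function (for example J_m) is supplied by the caller.

Elitist screening keeps the m best objects of the old pool plus the offspring.

```python
import random


def rank_fitness(scores, alpha=0.3):
    """fitness = alpha * (1 - alpha) ** (index - 1), with index taken as the rank
    after sorting by evaluation value (smaller is better); an assumption."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    fit = [0.0] * len(scores)
    for rank, k in enumerate(order, start=1):
        fit[k] = alpha * (1 - alpha) ** (rank - 1)
    return fit


def roulette_select(pool, fit, rng):
    """Pick len(pool) objects: the first k with p < Cif_k (cumulative share of fitness)."""
    total, acc, cum = sum(fit), 0.0, []
    for f in fit:
        acc += f / total
        cum.append(acc)
    return [list(pool[next((i for i, c in enumerate(cum) if p < c), len(pool) - 1)])
            for p in (rng.random() for _ in pool)]


def crossover(a, b, pc, rng):
    """Traverse components; at the first position with p < Pc, swap the tails after it."""
    for pos in range(len(a) - 1):
        if rng.random() < pc:
            return a[:pos + 1] + b[pos + 1:], b[:pos + 1] + a[pos + 1:]
    return a[:], b[:]


def mutate(obj, pm, rng):
    """Single-point mutation z -> z * (1 +/- delta), delta ~ U[0, 1) (assumed form)."""
    obj = obj[:]
    if rng.random() < pm:
        z, delta = rng.randrange(len(obj)), rng.random()
        obj[z] *= (1 + delta) if rng.random() < 0.5 else (1 - delta)
    return obj


def ga_generation(pool, score, m, pc=0.8, pm=0.1, seed=0):
    """One O_3 generation: select, cross, mutate, then keep the m best (elitist screening)."""
    rng = random.Random(seed)
    parents = roulette_select(pool, rank_fitness([score(o) for o in pool]), rng)
    children = []
    for i in range(0, len(parents) - 1, 2):
        children.extend(crossover(parents[i], parents[i + 1], pc, rng))
    children = [mutate(c, pm, rng) for c in children]
    return sorted(pool + children, key=score)[:m]


pool = [[0.0, 0.0], [5.0, 5.0], [1.0, 2.0], [9.0, 9.0]]
score = lambda o: (o[0] - 1) ** 2 + (o[1] - 2) ** 2
print(ga_generation(pool, score, m=2, seed=1)[0])  # [1.0, 2.0]
```

Because the old pool takes part in the final screening, the best objective value never gets worse from one generation to the next.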
Step 7: the data cells update the global optimal object according to the transport rules
Transport channels exist between the cell membranes of the tissue cells in the system, so objects can be shared and exchanged between different tissue cells. This requires the system to define transport rules. The designed tissue P system defines the following transport rules to guide the exchange of information between tissue cells:
(x, T_1, T_2, …, T_m / T'_1, T'_2, …, T'_m, y), x ≠ y, x, y = 1, 2, 3.
This transport rule means that tissue cells x and y can transport objects in both directions. T_1, T_2, …, T_m are the m objects of tissue cell x and, likewise, T'_1, T'_2, …, T'_m are the m objects of tissue cell y. The rule achieves the following:
7.1) the m objects T_1, T_2, …, T_m in tissue cell x are transported into tissue cell y;
7.2) the m objects T'_1, T'_2, …, T'_m in tissue cell y are transported into tissue cell x;
(x, T_xbest / T_best, OE_o), x = 1, 2, 3.
This transport rule means that tissue cell x exchanges objects with the system environment. T_xbest is the local optimal object in the currently computing tissue cell x, and T_best is the global optimal object in the current environment. Through this rule, the optimal object in tissue cell x is transported into the environment, and the global optimal object of the environment is updated at the same time;
Step 8: halting and output
Each tissue cell in the system runs as an independent execution unit and evolves in parallel, so the system is parallel and distributed. Within the system, a sequence of computation steps is defined as one computation. A computation starts from the tissue cells containing the initial data-cell object sets, and in each computation one or more evolution rules are applied to the current data-cell object sets. When the halting condition of the system is reached, the system halts automatically and the computation result is presented in the external environment of the system.
To reduce the complexity of the system, a simple halting condition based on the maximum number of computations is used. The tissue P system halts when it reaches the set maximum number of computations and outputs the global optimal object set in the current environment.
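The following sketch shows the two transport rules of step 7 and the step-count halting condition of step 8. It makes several assumptions:

- Cells exchange k objects in a ring. The rule in the text swaps all m objects, which corresponds to k = m.
- A cell's local best is sent to the environment, and a copy of T_best comes back in its place.
- The cells are evolved one after another here rather than in parallel.

`TissueCell.evolve` is a hypothetical hook for OR_1, OR_2 or OR_3.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class TissueCell:
    objects: List[list]
    evolve: Callable[[List[list]], List[list]]   # hypothetical hook for OR_1 / OR_2 / OR_3


@dataclass
class Environment:
    best: Optional[list] = None
    best_score: float = float("inf")


def antiport(x: TissueCell, y: TissueCell, k: int) -> None:
    """(x, T_1..T_k / T'_1..T'_k, y): cells x and y swap k objects in both directions."""
    out_x, out_y = x.objects[:k], y.objects[:k]
    x.objects = x.objects[k:] + out_y
    y.objects = y.objects[k:] + out_x


def exchange_with_env(cell: TissueCell, env: Environment, score) -> None:
    """(x, T_xbest / T_best, OE_o): T_xbest goes out, the better one is kept as T_best,
    and a copy of T_best comes back into the cell in place of T_xbest."""
    idx = min(range(len(cell.objects)), key=lambda i: score(cell.objects[i]))
    local = cell.objects[idx]
    if score(local) < env.best_score:
        env.best, env.best_score = list(local), score(local)
    cell.objects[idx] = list(env.best)


def run(cells, score, max_steps, k=1):
    """Halt after max_steps computations and output T_best from the environment."""
    env = Environment()
    for _ in range(max_steps):
        for c in cells:                         # each cell applies its own evolution rule
            c.objects = c.evolve(c.objects)
        for i in range(len(cells)):             # assumed ring topology for the antiport rule
            antiport(cells[i], cells[(i + 1) % len(cells)], k)
        for c in cells:
            exchange_with_env(c, env, score)
    return env.best, env.best_score


halve = lambda objs: [[v * 0.5 for v in o] for o in objs]
sq = lambda o: sum(v * v for v in o)
cells = [TissueCell([[4.0], [2.0]], halve), TissueCell([[8.0], [6.0]], halve)]
best, best_score = run(cells, sq, max_steps=3)
print(best, best_score)
```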
Beneficial effects of the invention: a special Bigraph hierarchical model is generated by extracting hidden term information from WSDL documents. Service information is divided into two classes according to the constituent words: service self-information and service context information. On this basis, a new method for computing term feature (specificity) values is introduced. Most terms are composite terms made up of a set of modifiers, and these modifiers form an important set of internal features that represent domain corpus information. Context information helps make up for the shortcomings of the service's self-information. The final feature value is computed by combining self-information and context information, which allows similarity to be computed more accurately.
In addition, a tissue P system is used with three clustering algorithms as evolution rules: density-based k-means combined with hierarchical Agnes clustering, a sample-weighted fuzzy C-means (FCM) algorithm, and a genetic algorithm (GA). This effectively combines the advantages of the three clustering algorithms and yields better clustering results.
Brief description of the drawings:
Fig. 1 is a flow chart of the SOAP service similarity calculation based on the Bigraph structure.
Fig. 2 shows an example term set.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings.
Referring to Fig. 1 and Fig. 2, this method calculates the similarity of SOAP services and clusters them in a Web environment, based on the Bigraph structure. Service self-information comprises the term information of the constituent words and the internal structure of each term. The most important terms are composite terms, which help convey a term's meaning. The internal structure of each constituent word or term determines the feature value of the service: if a term contains several words, the term as a whole is more specific than any of its internal words. Hence, a term that contains more semantic words has higher specificity.
For example, consider three composite terms: T1 = NovelAuthor, T2 = FictionNovelAuthor and T3 = ScienceFictionNovelAuthor. In these terms, Novel, Fiction and Science are modifiers, and Author is the head (defining) term. The composite term T1 combines the modifier Novel with the head term Author, so Author and T1 can be regarded as standing in a subordinate (parent-child) hierarchical relationship. Similarly, T2 is formed by adding the new modifier Fiction to T1, and T3 is formed by adding Science to T2. As the number of modifiers increases, a composite term has a more specific meaning and therefore a higher feature value. The feature values of T1, T2 and T3 are therefore ordered T1 < T2 < T3.
With reference to Fig. 1, the implementation steps are as follows:
First step: formal definitions
1.1 Term definition:
Let TL = {T_1, T_2, …, T_n} be a set of terms from the service corpus (Fig. 2 shows an example of TL), where n is the number of terms. Let A = {a_1, a_2, …, a_m} be the atomic vocabulary that forms the terms of TL, i.e., words that cannot be subdivided further, where m is the number of atomic words. The frequency of term T_i is its number of occurrences divided by the total number of occurrences of all terms in corpus TL. Correspondingly, the frequency of an atomic word is its number of occurrences divided by the total number of occurrences of all atomic words:
$$P(T_i) = \frac{\mathrm{num}(T_i)}{Num_{TL}}, \qquad P(a_j) = \frac{\mathrm{num}(a_j)}{Num_A}$$
Here num(·) is a number of occurrences, Num_TL is the total number of term occurrences in TL, and Num_A is the total number of atomic-word occurrences. In Fig. 2, Num_TL = 8 and Num_A = 21;
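The frequency definitions above can be sketched in Python as follows. This illustration rests on assumptions and is not the patent's implementation. The text does not say how a term is split into atomic words, so the hypothetical helper `split_camel_case` simply splits CamelCase identifiers such as `FictionNovelAuthor`. The input `term_occurrences` is assumed to list every occurrence of a term in TL.

```python
import re
from collections import Counter


def split_camel_case(term):
    """Split a CamelCase term such as 'NovelAuthor' into atomic words (assumed rule)."""
    return re.findall(r"[A-Z][a-z0-9]*|[a-z0-9]+", term)


def frequencies(term_occurrences):
    """Return P(T_i), P(a_j), Num_TL and Num_A as relative frequencies."""
    term_counts = Counter(term_occurrences)
    word_counts = Counter(w for t in term_occurrences for w in split_camel_case(t))
    num_tl = sum(term_counts.values())   # total term occurrences in TL
    num_a = sum(word_counts.values())    # total atomic-word occurrences
    p_term = {t: c / num_tl for t, c in term_counts.items()}
    p_word = {w: c / num_a for w, c in word_counts.items()}
    return p_term, p_word, num_tl, num_a


p_term, p_word, num_tl, num_a = frequencies(
    ["NovelAuthor", "FictionNovelAuthor", "ScienceFictionNovelAuthor"])
print(num_tl, num_a, p_word["Author"])
```

With the three composite terms from the example above, Num_TL = 3 and Num_A = 9. Author occurs in every term, so P(Author) = 3/9.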
1.2 Tissue P system (P System) definition:
A tissue P system of degree 3, i.e., one with 3 data cells, can be formally defined as the following 8-tuple:
$$\omega = (OB_1, OB_2, OB_3, OR_1, OR_2, OR_3, OR', OE_o)$$
where:
OB_1, OB_2 and OB_3 are the object sets of the tissue cells, i.e., the data-cell sets;
OR_1, OR_2 and OR_3 are the evolution rules of the tissue cells. They are, respectively, the clustering rule based on the Agnes and k-means algorithms, the clustering rule based on the weighted FCM algorithm, and the clustering rule based on the GA algorithm;
OR' denotes the transport rules of the tissue cells in the whole P system, through which objects can be shared and exchanged between cells;
OE_o = 0 is the output region of the system and denotes the environment;
1.3 Object definition
In data clustering, the tissue P system searches for the optimal cluster centers of the data set to be clustered, so the cluster centers of the data can be represented by one object. The invention defines a tissue-cell object T in the P system as an N×d-dimensional vector:
$$T = (t_{11}, t_{12}, \ldots, t_{1d}, \ldots, t_{i1}, t_{i2}, \ldots, t_{id}, \ldots, t_{N1}, t_{N2}, \ldots, t_{Nd})$$
where N is the number of clusters encoded by object T, and the N clusters C_1, C_2, …, C_N have cluster centers t_1, t_2, …, t_N. Like a data point, each cluster center in the object is a d-dimensional vector, so t_i can be written as (t_{i1}, t_{i2}, …, t_{id}), i = 1, 2, …, N, where t_{id} is the d-th component of the i-th cluster center;
OB_i is the object set of the i-th evolution membrane of the P system. It contains a group of objects that evolve through the evolution mechanisms of the different tissue cells. The initial number of objects in each evolution membrane is m, and these objects form its object set Q. During the evolution of the P system, a metric is needed to evaluate the quality of the current objects. The invention judges object quality with the clustering performance function J_m, computed as the total within-cluster variance of the samples, where s_j denotes a data point in a cluster. The smaller J_m is, the better the object. Ranking the objects by this criterion gives each evolution membrane its own best object, the local optimal object OB_ibest. The environment of the system keeps one best object, the global optimal object, denoted T_best. When the whole system halts, the global optimal object in the environment is the required solution, i.e., the optimal cluster centers;
Second step: feature value calculation
The corresponding features are extracted from the WSDL document, for example the five features shown in Fig. 1: "service name", "port name", "operation name", "input information" and "output information". A service corpus is constructed from them, and the feature values are calculated as follows:
2.1 Self-information feature value calculation
For a term T_i in the service corpus, its information content I(P_i) can be calculated with information theory. On this basis, the feature value Spe(T_i) of term T_i is assigned as follows:
$$Spe(T_i) = I(P_i) \qquad (3)$$
The term feature value is computed from the joint probability distribution P{p_i, q_j}, where p_i ∈ P and q_j ∈ Q. Here p_i is a term selected from the term set TL, and q_j is a word taken from the atomic vocabulary A. The sets {p_1, p_2, …, p_n} and {q_1, q_2, …, q_m} are represented by the random variables P and Q respectively, and the mutual information of p_i and q_j is calculated by the following formula:
The feature value of term p_i is expressed as I(p_i, Q), which represents the relationship between term p_i and the vocabulary Q. Combining the frequencies of terms and words in the corpus, the feature value of p_i is calculated as follows:
$$Spe(T_i) \approx I(p_i, Q) \qquad (5)$$
According to Bayes' theorem:
The self-information feature value SelfSpe(T_i) of the SOAP service is finally calculated as follows:
Terms in typical WSDL documents usually contain one or two words, so the factor representing the words in a term is approximated as 1 in the calculation. θ is a weight set according to information theory, with a value range of 0 to 1;
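Formulas (4), (6) and (7) are not reproduced in this text. The sketch below therefore only illustrates two standard information-theoretic quantities that the passage relies on: the information content -log P of a term, and pointwise mutual information. The way the joint probability is estimated in the usage lines is a hypothetical choice. The θ-weighted combination into SelfSpe(T_i) is deliberately left out.

```python
import math


def information_content(p):
    """Information content of an event with probability p, in bits: -log2 p."""
    return -math.log2(p)


def pmi(p_joint, p_term, p_word):
    """Pointwise mutual information I(p_i, q_j) = log2 P(p_i, q_j) / (P(p_i) P(q_j))."""
    return math.log2(p_joint / (p_term * p_word))


# Hypothetical estimates from the three-term example: Num_TL = 3, Num_A = 9,
# and P(p_i, q_j) = (occurrences of word q_j inside term p_i) / Num_A.
p_t3 = 1 / 3                     # P(ScienceFictionNovelAuthor)
print(information_content(p_t3))
print(pmi(1 / 9, p_t3, 3 / 9))   # q_j = Author, shared by every term
print(pmi(1 / 9, p_t3, 1 / 9))   # q_j = Science, a rare modifier
```

In this toy estimate, the shared head word Author carries no extra information about the term (PMI 0), while the rare modifier Science carries log2 3 bits.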
2.2 Context information feature value calculation
Following information theory, the context-information feature of a service is the entropy of the probability distribution of the modifiers of a term. The entropy is calculated by the following formula:
where NT is the number of modifiers of term T_i, P(mod_m, T_i) is the probability that mod_m modifies term T_i, and the entropy is the average information over all P(mod_m, T_i). Within a specific domain, the modifiers of a term are more concentrated, so the entropy of the term in that domain is lower. From the entropy, the context-information feature value ContextSpe(T_i) of term T_i is calculated as follows:
where 1 ≤ j ≤ K, K is the total number of modifiers sharing the same head term, and mod_j denotes each modifier;
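The entropy in this step can be sketched as below. The sketch assumes that P(mod_m, T_i) is estimated as the relative frequency of each modifier among the observed modifiers of T_i. Formula (9) for ContextSpe(T_i) is not reproduced here, so the sketch stops at the entropy itself. A lower value indicates a more concentrated, domain-specific modifier distribution.

```python
import math
from collections import Counter


def modifier_entropy(modifiers):
    """H(T_i) = -sum_m P(mod_m, T_i) * log2 P(mod_m, T_i).

    `modifiers` lists the modifier observed each time head term T_i occurs;
    P(mod_m, T_i) is estimated by relative frequency (assumption)."""
    n = len(modifiers)
    probs = [c / n for c in Counter(modifiers).values()]
    return sum(-p * math.log2(p) for p in probs)


print(modifier_entropy(["Novel", "Novel", "Novel", "Novel"]))     # concentrated
print(modifier_entropy(["Novel", "Travel", "Music", "Science"]))  # spread out
```

The first call gives 0 bits, because a single repeated modifier is completely predictable. The second gives 2 bits, for four equally likely modifiers.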
2.3 Hybrid feature value calculation
Together, the self-information feature value from formula (7) and the context-information feature value from formula (9) cover both the features expressed by the words and the information that the words themselves cannot describe. The hybrid feature value is finally obtained by formula (10):
The mixing coefficient α lies between 0 and 1 and is set to 0.65 based on experiments. After normalization, the self-information, context and hybrid feature values of a service all lie between 0 and 1.
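Formula (10) is not reproduced in this text. This sketch assumes a common choice, the linear mixture HybridSpe = α·SelfSpe + (1 - α)·ContextSpe with α = 0.65, applied after min-max normalization so that all three values lie in [0, 1].

```python
def min_max(values):
    """Scale a {term: value} dict into [0, 1]; constant inputs map to 0."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def hybrid_spe(self_spe, context_spe, alpha=0.65):
    """Assumed formula: alpha * SelfSpe + (1 - alpha) * ContextSpe on normalized values."""
    s, c = min_max(self_spe), min_max(context_spe)
    return {t: alpha * s[t] + (1 - alpha) * c[t] for t in s}


h = hybrid_spe({"Author": 1.0, "NovelAuthor": 3.0},
               {"Author": 5.0, "NovelAuthor": 1.0})
print(h)
```

Because both inputs are normalized first and α lies in [0, 1], the mixture stays in [0, 1], which matches the range stated above.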
Third step: domain weight calculation
3.1 Domain weight value calculation
Generating the Bigraph structure requires a weight based on domain feature values. The size of the weight is reflected by sibling terms (terms at the same level): the greater the structural similarity to the sibling terms, the larger the weight. The calculation is as follows:
where the sibling set is the set of terms at the same level as a new term T_n. HybridSpe(T_s) and HybridSpe(T_n) are the feature values of each sibling term and of the new term respectively. If the newly added term has no sibling terms, its weight is set directly to 0.5. G_i is the current Bigraph structure;
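The domain-weight formula is missing from this text, so the function below is a hypothetical stand-in. It only reproduces the stated behaviour: sibling terms with feature values closer to HybridSpe(T_n) yield a larger weight, and a term without siblings gets 0.5. The "1 minus the mean gap" form is an assumption, not the patent's formula.

```python
def domain_weight(new_value, sibling_values):
    """Hypothetical W_DS(G_i): 1 minus the mean gap between HybridSpe(T_n)
    and the sibling values HybridSpe(T_s); 0.5 when T_n has no siblings."""
    if not sibling_values:
        return 0.5
    gaps = [abs(v - new_value) for v in sibling_values]
    return 1.0 - sum(gaps) / len(gaps)   # inputs lie in [0, 1], so the result does too


print(domain_weight(0.5, []))           # no siblings
print(domain_weight(0.5, [0.3, 0.7]))
```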
3.2 Term weight value calculation
The similarity of two terms is calculated by comparing their words, as follows:
where the first two quantities are the numbers of constituent words in terms T_i and T_n respectively, and the third is the number of words the two terms share. The more related substructure terms the new term resembles, the higher its weight. The term weight value is obtained from the term similarity as follows:
where NP is the set of the parent, sibling and child terms of the term, and T_i is one of these terms;
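A sketch of the term similarity and term weight follows. It assumes a Dice-style word overlap: twice the number of shared words, divided by the total word count of both terms. It also assumes that the term weight is the mean similarity over NP. Both forms are consistent with the description but are not the patent's own formulas.

```python
def term_similarity(words_i, words_n):
    """Assumed Dice-style overlap: 2 * shared words / (|T_i| + |T_n|)."""
    shared = len(set(words_i) & set(words_n))
    return 2 * shared / (len(words_i) + len(words_n))


def term_weight(new_words, related_terms):
    """Assumed W_TS: mean similarity between the new term and each term in NP
    (its parent, sibling and child terms); 0.0 when NP is empty (assumption)."""
    if not related_terms:
        return 0.0
    return sum(term_similarity(t, new_words) for t in related_terms) / len(related_terms)


new = ["Fiction", "Novel", "Author"]
print(term_similarity(["Novel", "Author"], new))
print(term_weight(new, [["Novel", "Author"], ["Science", "Fiction", "Novel", "Author"]]))
```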
Fourth step: generating the Bigraph hierarchical structure of terms
The invention proposes a layered term-Bigraph construction algorithm that builds the Bigraph hierarchical structure of the different terms. As in the place graph of a Bigraph, each node of the Bigraph represents a term object, and the value of a node is the feature value of that term object. The Bigraph hierarchy is constructed top-down as follows:
4.1 Calculate, by formula (10), the hybrid feature values of the terms extracted from the WSDL documents and from Google. Put them into an array A and sort them in ascending order. The first 3 term objects become three nodes that form the initial Bigraph structure T;
4.2 Add each remaining term T_n in array A to the existing Bigraph hierarchy. If a term T_x in some level of the Bigraph satisfies HybridSpe(T_n) - 0.3 < HybridSpe(T_x) < HybridSpe(T_n) + 0.3, mark T_x as a target node. These target nodes determine the target substructure positions for T_n and thus the candidate Bigraph structures;
4.3 Jointly consider the domain weight W_DS(G_i) and the term weight W_TS(G_i) between the new term and each candidate Bigraph structure. Compute the final node weight with the following formula to find the best Bigraph structure:
$$W_f(G_i) = \omega W_{DS}(G_i) + (1 - \omega) W_{TS}(G_i)$$
Wherein, ω is coefficient, and range runs 4.2-4.3 until all terms are added to 0 to 1, by iteration
In Bigraph level;
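The top-down construction in steps 4.1 to 4.3 can be sketched as follows. Several assumptions are hypothetical:
- a virtual root holds the initial three nodes;
- a new term is attached as a child of the chosen target node;
- a term with no target node is attached to the root;
- the two weight functions are supplied by the caller.

The ±0.3 window and the W_f combination follow the text. The default ω = 0.5 is an illustrative choice.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    term: str
    value: float                                   # hybrid feature value
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)


def attach(parent, term, value):
    """Create a node for `term` under `parent` and return it."""
    node = Node(term, value, parent)
    parent.children.append(node)
    return node


def build_hierarchy(values: dict,
                    w_ds: Callable[[str, Node], float],
                    w_ts: Callable[[str, Node], float],
                    omega: float = 0.5, window: float = 0.3) -> Node:
    """Top-down sketch of steps 4.1-4.3; returns a hypothetical virtual root."""
    root = Node("<root>", float("-inf"))
    ordered = sorted(values.items(), key=lambda kv: kv[1])   # 4.1: ascending order
    nodes = [attach(root, t, v) for t, v in ordered[:3]]      # initial structure
    for term, v in ordered[3:]:
        # 4.2: target nodes lie strictly within +/- window of the new value
        targets = [n for n in nodes if v - window < n.value < v + window]
        if targets:
            # 4.3: keep the candidate with the largest W_f
            best = max(targets, key=lambda n: omega * w_ds(term, n)
                       + (1 - omega) * w_ts(term, n))
        else:
            best = root                                       # assumption
        nodes.append(attach(best, term, v))
    return root


values = {"Author": 0.10, "Title": 0.15, "Price": 0.20,
          "NovelAuthor": 0.35, "FictionNovelAuthor": 0.60}
root = build_hierarchy(values,
                       w_ds=lambda term, n: 0.5,
                       w_ts=lambda term, n: 1.0 if term.endswith(n.term) else 0.0)
print([c.term for c in root.children], root.children[0].children[0].term)
```

In the usage lines, the toy term weight favours a parent whose term is a suffix of the new term. NovelAuthor therefore ends up under Author, and FictionNovelAuthor, whose only target within the window is NovelAuthor, ends up one level further down.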
Fifth step: constructing the similarity matrix
Similarity is calculated with the following formula:
where D is the maximum number of levels of the Bigraph hierarchy formed by the terms, and dis(T_1, T_2) is the shortest distance between two terms T_1 and T_2 in the Bigraph hierarchy. This gives the similarity of two SOAP services on one feature. The similarity is calculated for each feature, and the sum of the feature similarities is taken as the similarity of the services. The pairwise similarities between services are then assembled into a similarity matrix;
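The similarity formula for this step is not reproduced here. The sketch assumes a simple normalization, 1 - dis(T1, T2)/(2D), where 2D is the longest possible path between two nodes of a tree with D levels below the root. It also assumes one shared hierarchy, given as a child-to-parent map. Summing the per-feature values and filling the matrix follow the text.

```python
def ancestors(parent, node):
    """Path from node up to the root in a child -> parent map."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(parent, a, b):
    """dis(T1, T2): number of edges on the shortest path between a and b."""
    depth_from_a = {n: i for i, n in enumerate(ancestors(parent, a))}
    for j, n in enumerate(ancestors(parent, b)):
        if n in depth_from_a:            # lowest common ancestor reached
            return depth_from_a[n] + j
    raise ValueError("terms are not in the same hierarchy")


def feature_similarity(parent, a, b, max_depth):
    """Hypothetical normalization: 1 - dis(T1, T2) / (2 * D)."""
    return 1.0 - tree_distance(parent, a, b) / (2 * max_depth)


def similarity_matrix(parent, services, max_depth):
    """services[k] lists one term per feature (service name, port name, ...);
    service similarity is the sum of the per-feature similarities."""
    return [[sum(feature_similarity(parent, x, y, max_depth) for x, y in zip(s1, s2))
             for s2 in services] for s1 in services]


parent = {"Author": "<root>", "Title": "<root>",
          "NovelAuthor": "Author", "FictionNovelAuthor": "NovelAuthor"}
services = [["NovelAuthor", "Title"], ["Author", "Title"], ["FictionNovelAuthor", "Author"]]
M = similarity_matrix(parent, services, max_depth=3)
for row in M:
    print([round(x, 3) for x in row])
```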
Sixth step: service clustering
Selecting the cluster centers requires computing the total within-cluster variance for the points in the data set. However, the data set contains many points that are unsuitable as centers, such as noise points and isolated edge points. These points not only disturb the selection of cluster centers but also add computational cost. In addition, the number of clusters has to be specified manually in advance. To address these drawbacks, an improved, density-based K-means algorithm is proposed. It computes the density of each point and takes high-density data points as cluster centers. The improved K-means algorithm pre-clusters the initial data set S to be clustered, where S consists of M data points of dimension d. The point density in the density-based K-means algorithm is calculated as follows:
where Density(S_i) is the total number of points within radius R of S_i, and the distance is measured by sim(S_i, S_j), the similarity between services S_i and S_j;
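A sketch of the density count on a precomputed similarity matrix follows. The text measures "distance" with sim(S_i, S_j), so this sketch assumes that a point S_j lies within the R-range of S_i when sim(S_i, S_j) ≥ R. That reading and the function name are assumptions.

```python
def densities(sim, radius):
    """Density(S_i): number of other points S_j with sim(S_i, S_j) >= radius.

    Assumption: a point is inside the R-range of S_i when its similarity to
    S_i is at least `radius`, because the text measures distance with sim."""
    n = len(sim)
    return [sum(1 for j in range(n) if j != i and sim[i][j] >= radius)
            for i in range(n)]


sim = [[1.0, 0.9, 0.8, 0.1, 0.2],
       [0.9, 1.0, 0.85, 0.15, 0.1],
       [0.8, 0.85, 1.0, 0.2, 0.1],
       [0.1, 0.15, 0.2, 1.0, 0.9],
       [0.2, 0.1, 0.1, 0.9, 1.0]]
print(densities(sim, 0.5))
```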
The clustering process of the density-based K-means algorithm is therefore as follows:
6.1 Preprocess the data with the density-based K-means algorithm. Compute the distances between data points S_i and divide the data into clusters according to radius R. Choose the K points S_i with the highest density Density(S_i) as cluster centers, and finally cluster the data by similarity, as follows:
6.1.1 Using formula 16, compute the distance from each data point S_i to each cluster center in the organization object Q, count the points within range R of S_i, and sort the data set by density;
6.1.2 Choose the K points S_k with the highest density, i.e., those with the most points within range R, as the new cluster centers C_k;
6.1.3 From the distances between the divided clusters, obtain the similarity sim(S_i, C_k) between each S_i and C_k. Let Avesim be the average similarity. If sim(C_k, S_i) > Avesim, assign S_i to cluster C_k. This finally yields N clusters;
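Steps 6.1.1 to 6.1.3 can be sketched as below, taking the similarity matrix and the per-point densities as inputs. The sketch makes three assumptions:
- Avesim is the mean similarity over all distinct pairs;
- each point joins its most similar center only when that similarity exceeds Avesim;
- points that pass no center are returned as unassigned rather than forced into a cluster.

The densities in the usage lines are supplied by hand for illustration.

```python
def preselect(sim, density, k):
    """Sketch of steps 6.1.1-6.1.3 on an n x n similarity matrix.

    Returns ({center: [member indices]}, [unassigned indices])."""
    n = len(sim)
    # 6.1.1-6.1.2: the K densest points become the initial centers C_k
    centers = sorted(range(n), key=lambda i: density[i], reverse=True)[:k]
    # Assumption: Avesim is the mean similarity over all distinct pairs
    pairs = [sim[i][j] for i in range(n) for j in range(n) if i != j]
    avesim = sum(pairs) / len(pairs)
    clusters = {c: [c] for c in centers}
    unassigned = []
    for i in range(n):
        if i in clusters:
            continue
        best = max(centers, key=lambda c: sim[c][i])
        if sim[best][i] > avesim:        # 6.1.3: sim(C_k, S_i) > Avesim
            clusters[best].append(i)
        else:
            unassigned.append(i)
    return clusters, unassigned


sim = [[1.0, 0.9, 0.8, 0.1, 0.2],
       [0.9, 1.0, 0.85, 0.15, 0.1],
       [0.8, 0.85, 1.0, 0.2, 0.1],
       [0.1, 0.15, 0.2, 1.0, 0.9],
       [0.2, 0.1, 0.1, 0.9, 1.0]]
clusters, unassigned = preselect(sim, density=[3, 2, 2, 3, 1], k=2)
print(clusters, unassigned)
```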
6.2 Evolution rule of tissue cell O_1
O_1 uses Agnes as its evolution rule to guide the evolution of the objects in the cell. Using the inter-cluster similarity threshold Cs, the Agnes algorithm merges the N initial clusters obtained by the density-based k-means algorithm, as follows:
6.2.1 Construct the similarity matrix D from the average similarity dis(C_i, C_j) between the data in any two clusters C_i and C_j:
where S_X is a data point in cluster C_i, S_Y is a data point in cluster C_j, and U and V are the numbers of data points in C_i and C_j respectively;
6.2.2 Select the clusters C_i and C_j with the largest dis(C_i, C_j). Using the inter-cluster similarity threshold Cs, merge C_i and C_j if dis(C_i, C_j) > Cs;
6.2.3 Repeat step 6.2.2 until all clusters satisfy the inter-cluster similarity threshold;
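The Agnes merge in steps 6.2.1 to 6.2.3 can be sketched as average-linkage agglomeration on the similarity matrix. Following the definition of U and V, this sketch takes dis(C_i, C_j) as the mean of sim(S_X, S_Y) over all U·V cross-cluster pairs. Clusters are lists of point indices, and the function names are illustrative.

```python
def average_similarity(sim, ci, cj):
    """dis(C_i, C_j): mean of sim(S_X, S_Y) over S_X in C_i and S_Y in C_j."""
    return sum(sim[x][y] for x in ci for y in cj) / (len(ci) * len(cj))


def agnes_merge(sim, clusters, cs):
    """Repeatedly merge the most similar pair of clusters while their
    average similarity exceeds the threshold Cs (steps 6.2.1-6.2.3)."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        i, j = max(((a, b) for a in range(len(clusters))
                    for b in range(a + 1, len(clusters))),
                   key=lambda p: average_similarity(sim, clusters[p[0]], clusters[p[1]]))
        if average_similarity(sim, clusters[i], clusters[j]) <= cs:
            break                                  # threshold met for every pair
        clusters[i].extend(clusters.pop(j))        # j > i, so index i stays valid
    return clusters


sim = [[1.0, 0.9, 0.8, 0.1, 0.2],
       [0.9, 1.0, 0.85, 0.15, 0.1],
       [0.8, 0.85, 1.0, 0.2, 0.1],
       [0.1, 0.15, 0.2, 1.0, 0.9],
       [0.2, 0.1, 0.1, 0.9, 1.0]]
print(agnes_merge(sim, [[0, 1], [2], [3], [4]], cs=0.5))
```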
6.3 Evolution rule of tissue cell O_2
O_2 uses a sample-weighted FCM algorithm as its evolution rule to guide the evolution of the objects in the cell. The objective function and cluster-center calculation of the traditional FCM algorithm ignore the differences between samples and treat all samples equally. This makes the algorithm sensitive to isolated points and noise in the data set, reduces the contribution of important samples to the clustering, and lowers clustering accuracy. To reduce the influence of sample differences on the clustering result, a sample-weighted FCM clustering algorithm is proposed. It improves the clustering by appropriately weighting the objective function and the cluster-center function;
For the data set S = {s_1, s_2, …, s_n}:
6.3.1 Compute the FCM membership degrees with the following formula:
where u_ij is the degree to which the i-th data point belongs to the j-th cluster, and the i-th data point is assigned to the cluster j with the largest membership. ||s_i - t_j|| is the Euclidean distance from data point s_i to cluster center t_j, and n is the number of data points. The membership degrees of each data point sum to 1, i.e., Σ_j u_ij = 1;
6.3.2 Calculate the weights and the entropy information
Entropy, a concept taken from thermodynamics, measures the degree of disorder of information. The membership degrees are analyzed with an entropy-based definition, and the FCM objective function is weighted per sample. First, the entropy change E_i is defined to represent the validity of the membership degrees u_ij. The weight w_i is then calculated to measure the influence of data point s_i on the current clustering. They are calculated as follows:
6.3.3 Calculate the new objective function from E_i and w_i
With the weight coefficients w_i subject to their constraint, the new FCM objective function F(S, t) is defined as follows:
where m is the weighting exponent, an integer greater than or equal to 1. To find the extremum of the objective function under these constraints, the method of Lagrange multipliers is used to construct the following function from the new objective function:
The optimality conditions for the extremum of the objective function are as follows:
The new cluster center t_j is calculated as:
Update the membership degrees u_ij, and assign the i-th data point to the cluster with the largest membership;
6.3.4 If |F(S, t)_{i-1} - F(S, t)_i| is greater than the preset threshold, repeat step 6.3.3. Otherwise terminate the algorithm and output the result, where F(S, t)_i denotes the FCM objective value obtained in the i-th iteration;
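A compact, pure-Python sketch of a sample-weighted FCM loop follows. The membership and weighted-center updates use the standard FCM forms. The entropy-based weights are a hypothetical choice, because the patent's formulas for E_i and w_i are not reproduced here. In this sketch, E_i is the entropy of point i's membership row, and w_i is proportional to exp(-E_i), scaled to sum to n. The weights enter the usual FCM objective as per-sample factors.

```python
import math


def memberships(data, centers, m):
    """Standard FCM membership u_ij of point i in cluster j; each row sums to 1."""
    u = []
    for s in data:
        d = [math.dist(s, t) for t in centers]
        if 0.0 in d:                               # point coincides with a center
            k = d.index(0.0)
            u.append([1.0 if j == k else 0.0 for j in range(len(centers))])
        else:
            u.append([1.0 / sum((d[j] / dk) ** (2 / (m - 1)) for dk in d)
                      for j in range(len(centers))])
    return u


def entropy_weights(u):
    """Hypothetical weights: E_i = entropy of membership row i; w_i is
    proportional to exp(-E_i), and the weights are scaled to sum to n."""
    e = [-sum(x * math.log(x) for x in row if x > 0) for row in u]
    raw = [math.exp(-ei) for ei in e]
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]


def weighted_fcm(data, centers, m=2.0, eps=1e-6, max_iter=100):
    """Sample-weighted FCM sketch; returns (centers, memberships, F)."""
    prev_f = math.inf
    for _ in range(max_iter):
        u = memberships(data, centers, m)
        w = entropy_weights(u)
        new_centers = []
        for j in range(len(centers)):
            coef = [w[i] * u[i][j] ** m for i in range(len(data))]
            total = sum(coef)
            new_centers.append([sum(c * s[k] for c, s in zip(coef, data)) / total
                                for k in range(len(data[0]))])
        centers = new_centers
        u = memberships(data, centers, m)
        # weighted objective F(S, t) = sum_i sum_j w_i * u_ij^m * ||s_i - t_j||^2
        f = sum(w[i] * u[i][j] ** m * math.dist(data[i], centers[j]) ** 2
                for i in range(len(data)) for j in range(len(centers)))
        if abs(prev_f - f) <= eps:                 # step 6.3.4 stopping rule
            break
        prev_f = f
    return centers, u, f


data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, u, f = weighted_fcm(data, [[1.0, 1.0], [9.0, 9.0]])
print([[round(x, 2) for x in c] for c in centers])
```

On these two well-separated groups, the centers settle close to the group means (1/3, 1/3) and (31/3, 31/3).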
6.4 Evolution rule of tissue cell O_3
O_3 uses the three GA genetic operations (selection, crossover and mutation) as its evolution rule to guide the evolution of each object in the cell. The evolution steps are as follows:
6.4.1 O_3 merges the m objects in its own cell with the objects transported from the other two tissue cells into a new object evolution pool P;
6.4.2 O_3 performs selection, crossover and mutation on the new object evolution pool P. Selection uses an elitist (best-preservation) strategy, and crossover and mutation use integer-coded crossover and single-point mutation, as follows:
6.4.2.1 Calculate the evaluation value p_k of each object k, where N is the number of clusters and t_i is the center of the i-th cluster. The smaller p_k is, the more suitable the clustering and the more likely the object is to be passed on to the next generation.
6.4.2.2 Define the fitness function fitness_k of each object k:
$$fitness_k = \alpha (1 - \alpha)^{index - 1}$$
where α is a preset parameter with a value range of 0 to 1, and index is the number of iterations;
6.4.2.3 Selection is performed according to each object's share of the total fitness:
where u is the total number of objects in the object pool. For each object, a random number p is generated in a loop; if p < C_ifk, the object is passed on to the next generation;
6.4.2.4 The crossover position is determined by the crossover probability Pc. Two objects are selected at random from the evolution pool for crossover, and each component of the objects is traversed. If a newly generated random number p satisfies p < Pc, the two objects exchange their components after that position, and the traversal ends;
6.4.2.5 Define the mutation probability Pm, and generate a random probability p for each object. If p is less than Pm, let z be the mutation point of the object determined by Pm (i.e., one of its components). The value after mutation is z_θ, and the mutated object is expressed as:
where δ ∈ [0, 1] is a randomly generated number, and the + or - sign is chosen at random;
6.4.3 Repeat steps 6.4.1 and 6.4.2. To keep the number of objects in the evolution pool stable, O_3 screens the evolved objects and eliminates objects according to fitness. It keeps the m objects with the highest fitness, which form a new object evolution pool P';
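The GA operators of step 6.4.2 can be sketched as follows. Several details are assumptions:
- `index` in fitness_k is read as the object's rank after sorting by p_k. This is the usual meaning of this ranking formula, although the text above calls it the number of iterations.
- Selection is a standard roulette wheel over fitness shares.
- Mutation uses z ± δ·z, because the exact mutation formula is missing.

A seeded `random.Random` keeps runs reproducible.

```python
import random


def rank_fitness(scores, alpha=0.1):
    """fitness_k = alpha * (1 - alpha) ** (index - 1), where index is taken to be
    the rank of object k after sorting by p_k (smaller p_k ranks first)."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    fit = [0.0] * len(scores)
    for index, k in enumerate(order, start=1):
        fit[k] = alpha * (1 - alpha) ** (index - 1)
    return fit


def roulette_select(pool, fit, rng):
    """Standard roulette wheel: pick one object with probability fit_k / sum(fit)."""
    p = rng.random() * sum(fit)
    cumulative = 0.0
    for obj, f in zip(pool, fit):
        cumulative += f
        if p < cumulative:
            return obj
    return pool[-1]                  # guard against floating-point round-off


def crossover(a, b, pc, rng):
    """Scan the components; at the first position where a fresh random number
    is below Pc, swap every component after it and stop."""
    a, b = list(a), list(b)
    for pos in range(len(a)):
        if rng.random() < pc:
            a[pos + 1:], b[pos + 1:] = b[pos + 1:], a[pos + 1:]
            break
    return a, b


def mutate(obj, pm, rng):
    """With probability Pm, replace one random component z by z +/- delta * z,
    delta drawn uniformly from [0, 1) (assumed mutation formula)."""
    obj = list(obj)
    if rng.random() < pm:
        z = rng.randrange(len(obj))
        sign = 1 if rng.random() < 0.5 else -1
        obj[z] += sign * rng.random() * obj[z]
    return obj


rng = random.Random(42)
pool = [[0.0, 0.5, 10.0, 10.5], [5.0, 5.0, 6.0, 6.0], [1.0, 1.0, 9.0, 9.0]]
fit = rank_fitness([1.0, 50.0, 4.0])          # p_k values of the three objects
parent_a, parent_b = roulette_select(pool, fit, rng), roulette_select(pool, fit, rng)
child_a, child_b = crossover(parent_a, parent_b, pc=0.3, rng=rng)
print(mutate(child_a, pm=0.1, rng=rng))
```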
Seventh step: data cells update the global optimal object according to the transport rules
Transport channels exist between the cell membranes of the tissue cells in the system, so different objects can be shared and exchanged between tissue cells. This requires the support of transport rules defined by the system. In the designed tissue P system, transport rules are defined to guide the exchange of information between tissue cells, as follows:
$$(x, T_1, T_2, \ldots, T_m / T'_1, T'_2, \ldots, T'_m, y), \quad x \neq y,\ x, y = 1, 2, 3$$
This transport rule means that tissue cells x and y can transport objects in both directions. T_1, T_2, …, T_m are the m objects of tissue cell x and, similarly, T'_1, T'_2, …, T'_m are the m objects of tissue cell y. The rule has the following effects:
7.1) the m objects T_1, T_2, …, T_m in tissue cell x are transported to tissue cell y;
7.2) the m objects T'_1, T'_2, …, T'_m in tissue cell y are transported to tissue cell x.
$$(x, T_{xbest} / T_{best}, OE_o), \quad x = 1, 2, 3$$
This transport rule means that tissue cell x exchanges objects with the system environment. T_xbest is the local optimal object in the tissue cell x currently being computed, and T_best is the global optimal object in the current environment. Through this rule, the optimal object in tissue cell x is transported to the environment, and the global optimal object of the environment is updated at the same time;
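The two transport rules can be sketched on a dictionary that maps cell labels 1 to 3 to lists of objects. The sketch assumes that the first rule swaps the complete object sets of cells x and y. It also assumes that `cost` is the J_m evaluation, so a lower value is better. The function names are illustrative.

```python
def exchange(cells, x, y):
    """Rule (x, T_1..T_m / T'_1..T'_m, y): cells x and y swap their object sets."""
    if x == y:
        raise ValueError("x and y must be different cells")
    cells[x], cells[y] = cells[y], cells[x]


def update_environment(cells, x, cost, t_best):
    """Rule (x, T_xbest / T_best, OE_o): send cell x's best object T_xbest to the
    environment and return the better of it and the current T_best."""
    t_xbest = min(cells[x], key=cost)
    if t_best is None or cost(t_xbest) < cost(t_best):
        return t_xbest
    return t_best


def cost(obj):
    """Stand-in for J_m: here simply the first component of the object."""
    return obj[0]


cells = {1: [(3.0,), (1.0,)], 2: [(5.0,)], 3: [(0.5,)]}
exchange(cells, 1, 2)
t_best = None
for x in cells:
    t_best = update_environment(cells, x, cost, t_best)
print(cells[1], t_best)
```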
Eighth step: halting and output
Each tissue cell in the system runs as an independent execution unit and evolves in parallel, so the system is parallel and distributed. In this system, a sequence of computation steps is defined as one computation. A computation starts from the tissue cells that contain the initial data-cell object sets, and in each step one or more evolution rules are applied to the current data-cell object sets. When the halting condition of the system is reached, the system halts automatically and the result is presented in the external environment of the system.
To reduce the complexity of the system, a simple halting condition based on the maximum number of computation steps is used. The tissue P system halts when it has executed the preset maximum number of computation steps, and it outputs the global optimal object set in the current environment.
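The halting scheme can be sketched as a loop that runs a fixed maximum number of computation steps. This skeleton is hypothetical:
- the cells are simulated sequentially rather than in parallel;
- each cell's evolution rule is passed in as a function;
- the exchange between cells is modelled as rotating the object sets around a ring of cells.

```python
from typing import Callable, Dict, List, Optional, Sequence

Obj = Sequence[float]


def run_tissue_p_system(cells: Dict[int, List[Obj]],
                        rules: Dict[int, Callable[[List[Obj]], List[Obj]]],
                        cost: Callable[[Obj], float],
                        max_steps: int) -> Optional[Obj]:
    """Run at most max_steps computation steps and return T_best.

    Each step: apply every cell's evolution rule, send each cell's best
    object to the environment, then rotate the object sets around a ring of
    cells (a hypothetical communication schedule)."""
    t_best = None
    for _ in range(max_steps):               # halting: maximum number of steps
        for x, rule in rules.items():        # evolution (parallel in a real P system)
            cells[x] = rule(cells[x])
        for objs in cells.values():          # cell -> environment rule
            local = min(objs, key=cost)
            if t_best is None or cost(local) < cost(t_best):
                t_best = local
        ids = sorted(cells)
        sets = [cells[i] for i in ids]
        for i, objs in zip(ids, sets[-1:] + sets[:-1]):
            cells[i] = objs
    return t_best


def shrink(objs):
    """Toy evolution rule: decrease every object's single component by one."""
    return [[o[0] - 1.0] for o in objs]


cells = {1: [[5.0], [3.0]], 2: [[4.0]], 3: [[6.0]]}
print(run_tissue_p_system(cells, {1: shrink, 2: shrink, 3: shrink},
                          cost=lambda o: abs(o[0]), max_steps=2))
```

With this toy rule and two computation steps, the object returned from the environment is [1.0]. Because a fixed step budget is the only halting test, the running time is bounded in advance regardless of how the objects evolve.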
Claims (10)
1. A SOAP service similarity calculation and clustering method based on a Bigraph structure, characterized in that the method comprises the following steps:
First step: formal definitions;
Second step: feature value calculation;
Third step: domain weight calculation;
Fourth step: generating the Bigraph hierarchical structure of terms:
the Bigraph hierarchical structure of the different terms is constructed. As in the place graph of a Bigraph, each node of the Bigraph represents a term object, and the value of a node is the feature value of that term object. The Bigraph hierarchy is constructed top-down;
Fifth step: constructing the similarity matrix:
Similarity is calculated with the following formula:
where D is the maximum number of levels of the Bigraph hierarchy formed by the terms, and dis(T_1, T_2) is the shortest distance between two terms T_1 and T_2 in the Bigraph hierarchy. This gives the similarity of two SOAP services on one feature. The similarity is calculated for each feature, and the sum of the feature similarities is taken as the similarity of the services. The pairwise similarities between services are then assembled into a similarity matrix;
Sixth step: service clustering
Selecting the cluster centers requires computing the total within-cluster variance for the points in the data set. However, the data set contains many points that are unsuitable as centers, such as noise points and isolated edge points. These points not only disturb the selection of cluster centers but also add computational cost. In addition, the number of clusters has to be specified manually in advance. Considering these drawbacks, the invention proposes an improved, density-based K-means algorithm. It computes the density of each point and extracts high-density data points as cluster centers. The improved K-means algorithm pre-clusters the initial data set S to be clustered, where S consists of M data points of dimension d. The point density in the density-based K-means algorithm is calculated as follows:
where Density(S_i) is the total number of points within radius R of S_i, and the distance is measured by sim(S_i, S_j), the similarity between services S_i and S_j;
Seventh step: data cells update the global optimal object according to the transport rules
Transport channels exist between the cell membranes of the tissue cells in the system, so different objects can be shared and exchanged between tissue cells. This requires the support of transport rules defined by the system. In the designed tissue P system, transport rules are defined to guide the exchange of information between tissue cells, as follows:
$$(x, T_1, T_2, \ldots, T_m / T'_1, T'_2, \ldots, T'_m, y), \quad x \neq y,\ x, y = 1, 2, 3$$
This transport rule means that tissue cells x and y can transport objects in both directions. T_1, T_2, …, T_m are the m objects of tissue cell x and, similarly, T'_1, T'_2, …, T'_m are the m objects of tissue cell y. The rule has the following effects:
7.1) the m objects T_1, T_2, …, T_m in tissue cell x are transported to tissue cell y;
7.2) the m objects T'_1, T'_2, …, T'_m in tissue cell y are transported to tissue cell x;
$$(x, T_{xbest} / T_{best}, OE_o), \quad x = 1, 2, 3$$
This transport rule means that tissue cell x exchanges objects with the system environment. T_xbest is the local optimal object in the tissue cell x currently being computed, and T_best is the global optimal object in the current environment. Through this rule, the optimal object in tissue cell x is transported to the environment, and the global optimal object of the environment is updated at the same time;
Eighth step: halting and output
Each tissue cell in the system runs as an independent execution unit and evolves in parallel, so the system is parallel and distributed. In this system, a sequence of computation steps is defined as one computation. A computation starts from the tissue cells that contain the initial data-cell object sets, and in each step one or more evolution rules are applied to the current data-cell object sets. When the halting condition of the system is reached, the system halts automatically and the result is presented in the external environment of the system.
2. The SOAP service similarity calculation and clustering method based on a Bigraph structure according to claim 1, characterized in that, in the first step, the formal definitions are as follows:
1.1 Term definition:
Let TL = {T_1, T_2, …, T_n} be a set of terms in the service corpus, where n is the number of terms. Let A = {a_1, a_2, …, a_m} be the atomic vocabulary that forms the terms of TL, i.e., words that cannot be subdivided further, where m is the number of atomic words. The frequency of term T_i is its number of occurrences divided by the total number of occurrences of all terms in corpus TL. Correspondingly, the frequency of an atomic word is its number of occurrences divided by the total number of occurrences of all atomic words:
$$P(T_i) = \frac{\mathrm{num}(T_i)}{Num_{TL}}, \qquad P(a_j) = \frac{\mathrm{num}(a_j)}{Num_A}$$
Here num(·) is a number of occurrences, Num_TL is the total number of term occurrences in TL, and Num_A is the total number of atomic-word occurrences;
1.2 Tissue P system definition:
A tissue P system of degree 3, i.e., one with 3 data cells, can be formally defined as the following 8-tuple:
$$\omega = (OB_1, OB_2, OB_3, OR_1, OR_2, OR_3, OR', OE_o)$$
where:
OB_1, OB_2 and OB_3 are the object sets of the tissue cells, i.e., the data-cell sets;
OR_1, OR_2 and OR_3 are the evolution rules of the tissue cells. They are, respectively, the clustering rule based on the Agnes and k-means algorithms, the clustering rule based on the weighted FCM algorithm, and the clustering rule based on the GA algorithm;
OR' denotes the transport rules of the tissue cells in the whole P system, through which objects can be shared and exchanged between cells;
OE_o = 0 is the output region of the system and denotes the environment;
1.3 Object definition
In data clustering, the tissue P system searches for the optimal cluster centers of the data set to be clustered. The cluster centers of the data are therefore represented by one object, and a tissue-cell object T in the P system is defined as an N×d-dimensional vector:
$$T = (t_{11}, t_{12}, \ldots, t_{1d}, \ldots, t_{i1}, t_{i2}, \ldots, t_{id}, \ldots, t_{N1}, t_{N2}, \ldots, t_{Nd})$$
where N is the number of clusters encoded by object T, and the N clusters C_1, C_2, …, C_N have cluster centers t_1, t_2, …, t_N. Like a data point, each cluster center in the object is a d-dimensional vector, so t_i can be written as (t_{i1}, t_{i2}, …, t_{id}), i = 1, 2, …, N, where t_{id} is the d-th component of the i-th cluster center;
OB_i is the object set of the i-th evolution membrane of the P system. It contains a group of objects that evolve through the evolution mechanisms of the different tissue cells. The initial number of objects in each evolution membrane is m, and these objects form its object set Q. During the evolution of the P system, a metric is needed to evaluate the quality of the current objects. Object quality is judged with the clustering performance function J_m, computed as the total within-cluster variance of the samples, where s_j denotes a data point in a cluster. The smaller J_m is, the better the object. Ranking the objects by this criterion gives each evolution membrane its own best object, the local optimal object OB_ibest. The environment of the system keeps one best object, the global optimal object, denoted T_best. When the whole system halts, the global optimal object in the environment is the required solution, i.e., the optimal cluster centers;
3. The SOAP service similarity calculation and clustering method based on a Bigraph structure according to claim 2, characterized in that, in the second step, the feature values are calculated as follows:
2.1 Self-information feature value calculation
For a term T_i in the service corpus, its information content I(P_i) is calculated with information theory. On this basis, the feature value Spe(T_i) of term T_i can be assigned as follows:
$$Spe(T_i) = I(P_i) \qquad (3)$$
The term feature value is computed from the joint probability distribution P{p_i, q_j}, where p_i ∈ P and q_j ∈ Q. Here p_i is a term selected from the term set TL, and q_j is a word taken from the atomic vocabulary A. The sets {p_1, p_2, …, p_n} and {q_1, q_2, …, q_m} are represented by the random variables P and Q respectively, and the mutual information of p_i and q_j is calculated by the following formula:
The feature value of term p_i is expressed as I(p_i, Q), which represents the relationship between term p_i and the vocabulary Q. Combining the frequencies of terms and words in the corpus, the feature value of p_i is calculated as follows:
$$Spe(T_i) \approx I(p_i, Q) \qquad (5)$$
According to Bayes' theorem:
The self-information feature value SelfSpe(T_i) of the SOAP service is finally calculated as follows:
Terms in typical WSDL documents usually contain one or two words, so the factor representing the words in a term is approximated as 1 in the calculation. θ is a weight set according to information theory, with a value range of 0 to 1;
2.2 Contextual information feature value calculation
According to the information-theoretic method, the contextual information feature of a service is the entropy of the probability distribution of the modifiers of a term. This entropy is calculated by formula (8), where NT is the number of modifiers of term T_i and P(mod_m, T_i) is the probability that mod_m modifies term T_i. The entropy is the average information over all P(mod_m, T_i). Within a specific domain the modifier distribution of a term is more concentrated, so the entropy of the term in that domain is lower. From the entropy, the contextual information feature value ContextSpe(T_i) of term T_i is calculated by formula (9), where 1 ≤ j ≤ K, K is the total number of modifiers with the same definition, and the indexed symbol in formula (9) represents each modifier;
2.3 Hybrid feature value calculation
The self feature value and the contextual information feature value, calculated by formulas (7) and (9), cover respectively the features of the descriptive words and the information that the words themselves cannot describe. The hybrid feature value is finally obtained by formula (10). The mixing coefficient α lies between 0 and 1 and is set to 0.65 based on experiments. After normalization, the self feature value, contextual feature value and hybrid feature value of a service all lie between 0 and 1.
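Formula (10) is not reproduced in this text. The sketch below assumes it is a convex combination, HybridSpe = α · SelfSpe + (1 - α) · ContextSpe, applied after min-max normalization of both inputs to [0, 1]. Only α = 0.65 comes from the text; the combination rule and the function names are assumptions.

```python
from typing import Dict

ALPHA = 0.65  # mixing coefficient reported in the text


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Normalize values to [0, 1]; all-equal inputs map to 0.0 (arbitrary choice)."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def hybrid_specificity(self_spe: Dict[str, float],
                       context_spe: Dict[str, float],
                       alpha: float = ALPHA) -> Dict[str, float]:
    """Assumed formula (10): alpha * self + (1 - alpha) * context, after normalization."""
    s, c = min_max(self_spe), min_max(context_spe)
    return {t: alpha * s[t] + (1 - alpha) * c[t] for t in s}


hybrid = hybrid_specificity({"weather": 2.0, "forecast": 4.0},
                            {"weather": 0.2, "forecast": 0.1})
print(hybrid)  # roughly {'weather': 0.35, 'forecast': 0.65}
```

Because both inputs are normalized first and α lies in [0, 1], the hybrid value also stays in [0, 1], consistent with the range stated above.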
4. The SOAP service similarity calculation and clustering method based on the Bigraph structure according to claim 3, characterized in that, in the third step, the domain weight calculation process is as follows:
3.1 Domain weight calculation
The weight is reflected by sibling terms: the greater the structural similarity between a term and its sibling terms, the greater the weight. It is calculated by formula (11). There, the sibling-term set is the set of sibling terms of a new term T_n, and HybridSpe(T_s) and HybridSpe(T_n) are the feature values of each sibling term and of the new term respectively. If the newly added term has no sibling terms, the weight is directly defined as 0.5. G_i is the current Bigraph structure.
A bigraph B = <BP, BL> is a pair proposed by the Turing Award laureate Robin Milner, where BP and BL are the place graph and the link graph respectively. BP is a triple BP = <V, E, P> consisting of the node set V, the edge set E and the interface P. Nested nodes in the place graph have parent-child relationships, and nesting between nodes is expressed by branch relationships. Like BP, BL is a triple consisting of the node set V, the edge set E and the interface P; BL represents the connection relationships between nodes;
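The pair B = <BP, BL> described above can be sketched as two small data structures. The place graph records nesting as child-to-parent edges, and the link graph records which nodes each link connects. This is a simplified illustration: the interfaces P are omitted, and all class and method names are hypothetical. The `siblings` helper shows how the sibling terms used by the domain weight in 3.1 could be read off the place graph.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class PlaceGraph:
    """Nesting structure: each node points to its parent (None = top level)."""
    nodes: Set[str] = field(default_factory=set)
    parent: Dict[str, Optional[str]] = field(default_factory=dict)

    def add(self, node: str, parent: Optional[str] = None) -> None:
        if parent is not None and parent not in self.nodes:
            raise KeyError(parent)
        self.nodes.add(node)
        self.parent[node] = parent

    def children(self, node: str) -> List[str]:
        return sorted(n for n, p in self.parent.items() if p == node)

    def siblings(self, node: str) -> List[str]:
        p = self.parent[node]
        return sorted(n for n, q in self.parent.items() if q == p and n != node)


@dataclass
class LinkGraph:
    """Connectivity: each named link (hyperedge) joins a set of nodes."""
    links: Dict[str, Set[str]] = field(default_factory=dict)

    def connect(self, link: str, *nodes: str) -> None:
        self.links.setdefault(link, set()).update(nodes)

    def linked(self, a: str, b: str) -> bool:
        return any(a in ns and b in ns for ns in self.links.values())


@dataclass
class Bigraph:
    place: PlaceGraph = field(default_factory=PlaceGraph)
    link: LinkGraph = field(default_factory=LinkGraph)


b = Bigraph()
b.place.add("weather")
b.place.add("forecast", "weather")
b.place.add("temperature", "weather")
b.link.connect("e1", "forecast", "temperature")
print(b.place.siblings("forecast"))  # ['temperature']
```

Keeping nesting and linking in separate structures mirrors the key property of bigraphs: the place graph and the link graph share nodes but are otherwise independent.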
3.2 Term weight calculation
The similarity of two terms is calculated by comparing their words, as given by formula (12). There, the first two quantities are the numbers of constituent words in terms T_i and T_n respectively, and the third is the number of words the two terms share. The more similar terms the related substructure of a new term contains, the higher its weight. The term weight is obtained from the term similarities by formula (13), where NP is the full set of parent, sibling and child terms of the term, and T_i is one term in this set.
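Formulas (12) and (13) are not reproduced in this text. The sketch below assumes formula (12) is the Dice coefficient over constituent words, 2 · |shared| / (|T_i| + |T_n|), which uses exactly the three quantities named above. It also assumes formula (13) averages that similarity over the related terms in NP. The camelCase splitter for WSDL names is likewise an illustrative assumption, and all function names are hypothetical.

```python
import re
from typing import List, Sequence


def split_words(term: str) -> List[str]:
    """Split a WSDL-style identifier such as 'getWeatherForecast' into lowercase words."""
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", term)]


def term_similarity(a: str, b: str) -> float:
    """Assumed formula (12): Dice coefficient over constituent words."""
    wa, wb = split_words(a), split_words(b)
    if not wa or not wb:
        return 0.0
    shared = len(set(wa) & set(wb))
    return 2 * shared / (len(wa) + len(wb))


def term_weight(new_term: str, related: Sequence[str]) -> float:
    """Assumed formula (13): mean similarity to the parent/sibling/child terms in NP."""
    if not related:
        return 0.0
    return sum(term_similarity(new_term, t) for t in related) / len(related)


print(term_weight("WeatherForecast", ["weatherReport", "stockQuote"]))  # 0.25
```

Under these assumptions, a candidate substructure whose neighbouring terms share more words with the new term receives a higher W_TS.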
5. The SOAP service similarity calculation and clustering method based on the Bigraph structure according to claim 4, characterized in that, in the fourth step, the Bigraph hierarchy of terms is generated as follows:
4.1 The hybrid feature values of the terms extracted from the WSDL documents and from Google are calculated, placed in array A and sorted in ascending order. The first 3 term objects are selected as three nodes of the Bigraph to form the initial Bigraph structure T;
4.2 Each remaining term T_n in array A is added to the existing Bigraph hierarchy. If T_x, a term already in the Bigraph hierarchy, satisfies
HybridSpe(T_n) - 0.3 < HybridSpe(T_x) < HybridSpe(T_n) + 0.3,
then T_x is marked as a target node. These target nodes determine the target substructure position of T_n and hence the candidate substructures;
4.3 Taking into account both the domain weight W_DS(G_i) and the term weight W_TS(G_i) of the new term with respect to each candidate substructure, the final node weight is calculated by formula (14) to find the optimal substructure:
W_f(G_i) = ω W_DS(G_i) + (1 - ω) W_TS(G_i)    (14)
where ω is a coefficient ranging from 0 to 1. Steps 4.2-4.3 are run iteratively until all terms have been added to the Bigraph hierarchy.
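Steps 4.1 to 4.3 can be sketched as an insertion loop. The ±0.3 window and formula (14) come from the text. The following are assumptions: the hierarchy is kept as a child-to-parent map; each target node is a candidate parent for T_n; a term with no target node is placed at the top level. The weight functions `w_ds` and `w_ts` are hypothetical callables supplied by the caller, standing in for formulas (11) and (13).

```python
from typing import Callable, Dict, Optional

WINDOW = 0.3  # target-node window from step 4.2
Hierarchy = Dict[str, Optional[str]]  # term -> parent term (None = top level)
WeightFn = Callable[[str, str, Hierarchy], float]


def build_hierarchy(hybrid: Dict[str, float], w_ds: WeightFn, w_ts: WeightFn,
                    omega: float = 0.5) -> Hierarchy:
    order = sorted(hybrid, key=hybrid.get)           # 4.1: ascending HybridSpe
    parent: Hierarchy = {t: None for t in order[:3]}  # initial three nodes
    for t_n in order[3:]:
        # 4.2: nodes whose HybridSpe lies inside (h(T_n) - 0.3, h(T_n) + 0.3)
        targets = [t_x for t_x in parent if abs(hybrid[t_x] - hybrid[t_n]) < WINDOW]
        if not targets:
            parent[t_n] = None  # assumption: no target node -> new top-level node
            continue
        # 4.3: formula (14), W_f = omega * W_DS + (1 - omega) * W_TS
        parent[t_n] = max(targets, key=lambda c: omega * w_ds(t_n, c, parent)
                          + (1 - omega) * w_ts(t_n, c, parent))
    return parent


# Toy weight functions, for illustration only.
toy_ds = lambda t, c, g: 0.5  # text: 0.5 when there are no sibling terms
toy_ts = lambda t, c, g: 1.0 if c.lower() in t.lower() else 0.0
tree = build_hierarchy({"service": 0.0, "data": 0.05, "info": 0.1, "weather": 0.3,
                        "weatherForecast": 0.5, "x": 0.95}, toy_ds, toy_ts)
print(tree["weatherForecast"])  # weather
```

Processing terms in ascending order of hybrid feature value means general terms enter the hierarchy before more specific ones, so specific terms find their parents already in place.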
6. The SOAP service similarity calculation and clustering method based on the Bigraph structure according to any one of claims 1 to 5, characterized in that, in the sixth step, the clustering process based on the density K-means algorithm is as follows:
6.1 Data preprocessing: the density-based K-means algorithm calculates the distances between the data S_i and divides the data into clusters according to the radius R. The K data S_i with the highest density Density(S_i) are chosen as cluster centres, and the data are finally clustered by similarity;
6.2 Evolution rule of tissue cell O1
O1 uses AGNES as its evolution rule to guide the evolution of the objects in the cell. According to the preset inter-cluster similarity threshold Cs, the AGNES algorithm merges the N initial clusters obtained by the density K-means algorithm;
6.3 Evolution rule of tissue cell O2
O2 uses a sample-weighted FCM algorithm as its evolution rule to guide the evolution of the objects in the cell. The objective function and cluster-centre calculation of the traditional FCM algorithm ignore differences between samples and treat all samples identically. The algorithm is therefore easily affected by isolated points or noise in the data set, which reduces the contribution of significant samples to the clustering and lowers clustering accuracy. To reduce the influence of sample differences on the clustering result, a sample-weighted FCM clustering algorithm is proposed; it improves the clustering by appropriately weighting the objective function and the cluster-centre function;
6.4 Evolution rule of tissue cell O3
O3 uses the three genetic operations of GA (selection, crossover and mutation) as its evolution rules to guide the evolution of each object in the cell.
7. The SOAP service similarity calculation and clustering method based on the Bigraph structure according to claim 6, characterized in that process 6.1 is as follows:
6.1.1 According to formula (16), the distance between the organization object Q in each data S_i and each data-cluster centre is calculated. The number of points in each data cluster around S_i is then determined, and the data set is sorted by density;
6.1.2 The first K data S_k with the highest density, i.e., with the most points within range R, are chosen as the new data-cluster centres C_k;
6.1.3 From the distances between the divided clusters, the similarity sim(S_i, C_k) between each S_i and C_k is obtained. With respect to the average similarity Avesim, if sim(C_k, S_i) > Avesim, S_i is assigned to data cluster C_k, finally giving N data clusters;
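Steps 6.1.1 to 6.1.3 can be sketched as density-based seeding followed by an Avesim-thresholded assignment. The sketch makes several assumptions:
- Density is the number of neighbours within radius R, using Euclidean distance (`math.dist`, Python 3.8+).
- Similarity is 1 / (1 + distance).
- Avesim is the mean similarity over all point-centre pairs.
- Each point goes to its most similar centre and is left unassigned (-1) if that similarity does not exceed Avesim.

One addition is not in the text: the seeding loop skips candidates within R of an already chosen seed, so the K centres do not all land in the same dense region. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def density_seeds(points: Sequence[Point], radius: float, k: int) -> List[Point]:
    """6.1.1-6.1.2: rank points by neighbour count within radius, keep up to k seeds."""
    density = [sum(1 for q in points if math.dist(p, q) <= radius) - 1 for p in points]
    order = sorted(range(len(points)), key=lambda i: -density[i])
    seeds: List[Point] = []
    for i in order:
        # Assumption (not in the text): skip points already covered by a chosen seed.
        if all(math.dist(points[i], s) > radius for s in seeds):
            seeds.append(points[i])
        if len(seeds) == k:
            break
    return seeds


def assign_by_avesim(points: Sequence[Point], centres: Sequence[Point]) -> List[int]:
    """6.1.3: assign a point to its best centre only if sim > Avesim; -1 otherwise."""
    sims = [[1.0 / (1.0 + math.dist(p, c)) for c in centres] for p in points]
    avesim = sum(map(sum, sims)) / (len(points) * len(centres))
    labels = []
    for row in sims:
        best = max(range(len(centres)), key=row.__getitem__)
        labels.append(best if row[best] > avesim else -1)
    return labels


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10)]
seeds = density_seeds(pts, radius=1.5, k=2)
print(seeds, assign_by_avesim(pts, seeds))
```

Points that fall below Avesim could instead be left for the later AGNES and FCM stages. That handling is a design choice the text does not specify.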
8. The SOAP service similarity calculation and clustering method based on the Bigraph structure according to claim 7, characterized in that, in 6.2, the evolution rule of tissue cell O1 proceeds as follows:
6.2.1 A similarity matrix D is constructed from the average similarity dis(C_i, C_j) between the data of any two data clusters C_i and C_j. Here S_X is a data point in cluster C_i, S_Y is a data point in cluster C_j, and U and V are the numbers of data points in C_i and C_j respectively;
6.2.2 The pair of data clusters C_i, C_j with the maximum dis(C_i, C_j) is selected. According to the inter-cluster similarity threshold Cs, if dis(C_i, C_j) > Cs, clusters C_i and C_j are merged;
6.2.3 Step 6.2.2 is repeated until all data clusters satisfy the inter-cluster similarity threshold.
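The AGNES merging of 6.2.1 to 6.2.3 can be sketched as average-linkage agglomeration with a stopping threshold. The formula for dis(C_i, C_j) is not reproduced in this text. The sketch assumes it is the mean pairwise similarity over the U × V point pairs, which matches the variables the text names. Point similarity is assumed to be 1 / (1 + Euclidean distance). Function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sim(a: Point, b: Point) -> float:
    return 1.0 / (1.0 + math.dist(a, b))


def avg_similarity(ci: Sequence[Point], cj: Sequence[Point]) -> float:
    """Assumed dis(C_i, C_j): mean similarity over all U * V point pairs."""
    return sum(sim(x, y) for x in ci for y in cj) / (len(ci) * len(cj))


def agnes_merge(clusters: Sequence[Sequence[Point]], cs: float) -> List[List[Point]]:
    """6.2.2-6.2.3: repeatedly merge the most similar pair while dis > Cs."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), avg_similarity(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1])
        if best <= cs:
            break  # every remaining pair already satisfies the threshold
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters


merged = agnes_merge([[(0, 0)], [(0, 1)], [(10, 10)]], cs=0.3)
print(merged)  # [[(0, 0), (0, 1)], [(10, 10)]]
```

A larger Cs stops merging earlier and leaves more clusters; a very small Cs merges everything into one cluster.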
9. The SOAP service similarity calculation and clustering method based on the Bigraph structure according to claim 7, characterized in that, in 6.3, for the data set S = {s_1, s_2, ..., s_n}, the process is as follows:
6.3.1 The FCM memberships are calculated. Here u_ij is the degree to which the i-th data point belongs to the j-th cluster (the i-th data point is assigned to the cluster j with the maximum membership), ||s_i - t_j|| is the Euclidean distance from data point s_i to the cluster centre t_j, and n is the number of data points. The memberships of each data point sum to 1, i.e., Σ_j u_ij = 1;
6.3.2 Calculation of weights and entropy information
Entropy represents the degree of disorder of information. Based on the definition of entropy, the invention analyses the memberships of the data and applies sample weighting to the FCM objective function. First, the entropy change E_i is defined to represent the validity of the memberships u_ij. The weight w_i is then calculated to measure the influence of data point s_i on the current clustering;
6.3.3 A new objective function is calculated from E_i and w_i
Subject to the constraint on the weight coefficients w_i, the new FCM objective function F(S, t) is defined as formula (22), where m is the weighting exponent, an integer greater than or equal to 1. To find the extremum of the objective function under the constraint, the Lagrange multiplier method is used to construct a new objective function. The optimality conditions for its extremum yield the new cluster centres t_j. The memberships u_ij are then updated, and the i-th data point is assigned to the data cluster with the maximum membership;
6.3.4 If |F(S, t)_{i-1} - F(S, t)_i| is greater than the preset threshold, step 6.3.3 is repeated; otherwise the algorithm terminates and outputs the result. Here F(S, t)_i denotes the FCM objective function value obtained in the i-th iteration.
10. The SOAP service similarity calculation and clustering method based on the Bigraph structure according to claim 9, characterized in that, in 6.4, the evolution steps are as follows:
6.4.1 O3 merges the m objects in its own cell with the objects transported from the other two tissue cells into a new object evolution pool P;
6.4.2 O3 performs selection, crossover and mutation on the new evolution pool P. Selection uses the elitist (optimal preservation) strategy, and crossover and mutation use integer-coded crossover and single-point mutation, as follows:
6.4.2.1 The evaluation value p_k of each object k is calculated, where N is the number of data clusters and t_i is the centre of the i-th data cluster. The smaller p_k is, the more suitable the classification, and the more likely the object is passed on to the next generation;
6.4.2.2 The fitness function fitness_k of each object k is defined as
fitness_k = α(1 - α)^(index - 1)    (30)
where α is a preset parameter ranging from 0 to 1 and index is the number of iterations;
6.4.2.3 Selection is performed according to the share of each object's fitness, which gives the cumulative fitness Cif_k (31), where u is the total number of objects in the object pool. For each object, a random number p is generated in a loop; if p < Cif_k, the object is passed on to the next generation;
6.4.2.4 The crossover position is determined by the crossover probability Pc. Two objects are randomly selected from the evolution pool for crossover, and each component of the objects is traversed. If the generated random number p satisfies p < Pc, the components of the two objects after this position are exchanged and the traversal ends;
6.4.2.5 A mutation probability Pm is defined, and a random probability p is generated for each object. If p is less than Pm, let z be the mutation point (i.e., a component) of the object determined by Pm. The mutated value is z_θ, and the mutated object is expressed by formula (32), where δ ∈ [0, 1] is a randomly generated number and the + or - sign occurs with a given probability;
6.4.3 Steps 6.4.1-6.4.2 are repeated. To keep the size of the object population in the evolution pool stable, O3 screens the evolved objects and eliminates them according to fitness. It retains the m objects with the highest fitness to reconstitute the object evolution pool P'.
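The GA loop of 6.4.1 to 6.4.3 can be sketched with the operators described above: rank-based fitness α(1 - α)^(index - 1), roulette selection on cumulative fitness with elitism, single-point crossover on integer vectors, single-point mutation, and truncation back to m objects. Several parts are assumptions:
- The text calls index the number of iterations. The sketch uses each object's rank after sorting by p_k, so that fitness differs between objects.
- The evaluation p_k is passed in as a caller-supplied function.
- The mutation z_θ = round(z ± δ·z) stands in for formula (32), which is not reproduced here.

All names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Obj = List[int]


def rank_fitness(pool: List[Obj], evaluate: Callable[[Obj], float],
                 alpha: float = 0.3) -> Tuple[List[Obj], List[float]]:
    """Sort by p_k (smaller is better); fitness = alpha * (1 - alpha)^(rank - 1)."""
    ranked = sorted(pool, key=evaluate)
    return ranked, [alpha * (1 - alpha) ** (r - 1) for r in range(1, len(ranked) + 1)]


def roulette(ranked: List[Obj], fit: List[float], n: int, rng: random.Random) -> List[Obj]:
    total, acc, cum = sum(fit), 0.0, []
    for f in fit:
        acc += f / total
        cum.append(acc)  # cumulative fitness Cif_k
    chosen = [ranked[0][:]]  # elitism: the best object always survives
    while len(chosen) < n:
        p = rng.random()
        k = next((i for i, c in enumerate(cum) if p < c), len(cum) - 1)
        chosen.append(ranked[k][:])
    return chosen


def crossover(a: Obj, b: Obj, pc: float, rng: random.Random) -> Tuple[Obj, Obj]:
    """Walk the components; at the first p < Pc swap the tails and stop."""
    a, b = a[:], b[:]
    for pos in range(len(a) - 1):
        if rng.random() < pc:
            a[pos + 1:], b[pos + 1:] = b[pos + 1:], a[pos + 1:]
            break
    return a, b


def mutate(obj: Obj, pm: float, rng: random.Random) -> Obj:
    """Assumed form of (32): z_theta = round(z +/- delta * z), delta in [0, 1)."""
    obj = obj[:]
    if rng.random() < pm:
        z = rng.randrange(len(obj))
        sign = 1 if rng.random() < 0.5 else -1
        obj[z] = round(obj[z] + sign * rng.random() * obj[z])
    return obj


def evolve(pool: List[Obj], evaluate: Callable[[Obj], float], m: int,
           pc: float = 0.8, pm: float = 0.1, generations: int = 20, seed: int = 0) -> List[Obj]:
    rng = random.Random(seed)
    for _ in range(generations):
        ranked, fit = rank_fitness(pool, evaluate)
        parents = roulette(ranked, fit, len(pool), rng)
        children: List[Obj] = []
        for a, b in zip(parents[::2], parents[1::2]):
            children.extend(crossover(a, b, pc, rng))
        children = [mutate(c, pm, rng) for c in children]
        pool = sorted(parents + children, key=evaluate)[:m]  # 6.4.3: keep the m best
    return pool


initial = [[0, 9, 3], [7, 1, 2], [4, 4, 8], [1, 6, 5]]
score = lambda obj: sum(abs(x - 5) for x in obj)  # toy stand-in for p_k
result = evolve(initial, score, m=4)
print(result[0], score(result[0]))
```

Because the elite object is copied into the parents and truncation keeps the m lowest p_k values, the best evaluation in the pool never gets worse from one generation to the next.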
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910692227.XA CN110533072B (en) | 2019-07-30 | 2019-07-30 | SOAP service similarity calculation and clustering method based on Bigraph structure in Web environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110533072A true CN110533072A (en) | 2019-12-03 |
CN110533072B CN110533072B (en) | 2022-09-23 |
Family
ID=68660492
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910692227.XA Active CN110533072B (en) | 2019-07-30 | 2019-07-30 | SOAP service similarity calculation and clustering method based on Bigraph structure in Web environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110533072B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113628225A (en) * | 2021-08-24 | 2021-11-09 | Hefei University of Technology | Fuzzy C-means image segmentation method and system based on structural similarity and image region block |
CN114362973A (en) * | 2020-09-27 | 2022-04-15 | Institute of Software, Chinese Academy of Sciences | K-means and FCM clustering combined flow detection method and electronic device |
WO2022156328A1 (en) * | 2021-01-19 | 2022-07-28 | Qingdao University of Science and Technology | Restful-type web service clustering method fusing service cooperation relationships |
CN115148330A (en) * | 2022-05-24 | 2022-10-04 | Peking Union Medical College Hospital, Chinese Academy of Medical Sciences | POP treatment scheme forming method and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120023141A1 (en) * | 2007-08-06 | 2012-01-26 | Atasa Ltd. | System and method for representing, organizing, storing and retrieving information |
CN107135092A (en) * | 2017-03-15 | 2017-09-05 | Zhejiang University of Technology | A Web service clustering method for the global social service network |
CN109005049A (en) * | 2018-05-25 | 2018-12-14 | Zhejiang University of Technology | Service composition method based on a Bigraph consistency algorithm in an Internet environment |
Non-Patent Citations (2)
Title |
---|
DOMINIK WACHHOLDER et al.: "Bigraph-Ensured Interoperability for System(-of-Systems) Emergence", OTM 2014 Workshops, LNCS 8842 * |
WU Haihua et al.: "Blog similarity analysis based on the novel clustering algorithm Increase K-Means", Journal of Xiamen University (Natural Science Edition) * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114362973A (en) * | 2020-09-27 | 2022-04-15 | Institute of Software, Chinese Academy of Sciences | K-means and FCM clustering combined flow detection method and electronic device |
CN114362973B (en) * | 2020-09-27 | 2023-02-28 | Institute of Software, Chinese Academy of Sciences | K-means and FCM clustering combined flow detection method and electronic device |
WO2022156328A1 (en) * | 2021-01-19 | 2022-07-28 | Qingdao University of Science and Technology | Restful-type web service clustering method fusing service cooperation relationships |
CN113628225A (en) * | 2021-08-24 | 2021-11-09 | Hefei University of Technology | Fuzzy C-means image segmentation method and system based on structural similarity and image region block |
CN113628225B (en) * | 2021-08-24 | 2024-02-20 | Hefei University of Technology | Fuzzy C-means image segmentation method and system based on structural similarity and image region block |
CN115148330A (en) * | 2022-05-24 | 2022-10-04 | Peking Union Medical College Hospital, Chinese Academy of Medical Sciences | POP treatment scheme forming method and system |
CN115148330B (en) * | 2022-05-24 | 2023-07-25 | Peking Union Medical College Hospital, Chinese Academy of Medical Sciences | POP treatment scheme forming method and system |
Also Published As
Publication number | Publication date |
---|---|
CN110533072B (en) | 2022-09-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106779087B (en) | A kind of general-purpose machinery learning data analysis platform | |
CN110533072A (en) | Based on the SOAP service similarity calculation and clustering method of Bigraph structure under Web environment | |
CN103559504B (en) | Image target category identification method and device | |
Baradwaj et al. | Mining educational data to analyze students' performance | |
Huang et al. | A graph neural network-based node classification model on class-imbalanced graph data | |
CN110826639B (en) | Zero sample image classification method trained by full data | |
CN110298434A (en) | A kind of integrated deepness belief network based on fuzzy division and FUZZY WEIGHTED | |
CN110532429B (en) | Online user group classification method and device based on clustering and association rules | |
Astudillo et al. | On achieving semi-supervised pattern recognition by utilizing tree-based SOMs | |
CN111723285A (en) | Depth spectrum convolution collaborative filtering recommendation method based on scores | |
Wang et al. | I am going MAD: Maximum discrepancy competition for comparing classifiers adaptively | |
CN112183652A (en) | Edge end bias detection method under federated machine learning environment | |
CN103136309B (en) | Social intensity is modeled by kernel-based learning algorithms | |
Rizzo et al. | Approximate classification with web ontologies through evidential terminological trees and forests | |
González-Almagro et al. | Semi-supervised constrained clustering: An in-depth overview, ranked taxonomy and future research directions | |
Ye et al. | Rebalanced zero-shot learning | |
CN110659363A (en) | Web service mixed evolution clustering method based on membrane computing | |
CN110110628A (en) | A kind of detection method and detection device of frequency synthesizer deterioration | |
CN108388769A (en) | The protein function module recognition method of label propagation algorithm based on side driving | |
CN114896514B (en) | Web API label recommendation method based on graph neural network | |
Govindarajan | Text mining technique for data mining application | |
CN115840853A (en) | Course recommendation system based on knowledge graph and attention network | |
Mendoza et al. | Predicting affinity ties in a surname network | |
Shahzad | Classification and Associative Classification Rule Discovery Using Ant Colony Optimization | |
Dattachaudhuri et al. | Transparent neural based expert system for credit risk (TNESCR): an automated credit risk evaluation system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |