CN103077202B - A simulation information clustering method based on Web services - Google Patents

Info

Publication number
CN103077202B
Authority
CN
China
Prior art keywords
service
simulation
node
similarity
sws
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210580107.9A
Other languages
Chinese (zh)
Other versions
CN103077202A (en)
Inventor
毕敬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Simulation Center
Original Assignee
Beijing Simulation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Simulation Center
Priority to CN201210580107.9A
Publication of CN103077202A
Application granted
Publication of CN103077202B
Legal status: Active
Anticipated expiration

Abstract

The present invention relates to a simulation information clustering method based on Web services. The method clusters the large number of simulation services available in a simulation service registry according to their functional information. It proposes a hierarchical measure of the functional similarity between simulation services, and a preprocessing method for service description information that takes coding style and naming conventions into account. On this basis, a similarity graph is built by setting a functional similarity threshold parameter, yielding a simulation service functional similarity network. Finally, a structural network clustering algorithm is applied to cluster the simulation services in the registry. Experimental results show that the proposed clustering method effectively improves the clustering of Web services and the discovery performance of simulation services, so that a required service can be found quickly and accurately among a large number of available simulation services.

Description

A simulation information clustering method based on Web services
Technical field
The present invention relates to a simulation information clustering method, and in particular to a simulation information clustering method based on Web services.
Background
As the number of simulation systems grows, the structural differences between simulation systems distributed across different departments keep increasing, and these legacy simulation systems often need to be interconnected. Newly developed simulation systems also need to communicate with legacy systems so that existing systems are used effectively. However, the heterogeneity among legacy simulation systems, and between legacy and newly developed simulation systems, makes communication between them difficult. By exploiting the loose coupling, dynamic extensibility and technology neutrality of Web service technology, a simulation system can be encapsulated as a service, that is, turned into a simulation service. This shields the distribution and heterogeneity of simulation systems and allows them to be used transparently.
Meanwhile, as the number of available simulation services grows rapidly, finding a required service quickly and accurately among the many simulation services in a simulation service registry becomes both necessary and difficult. Because simulation services are developed by different organizations, their description information lacks a unified standard, is highly arbitrary, and depends to some extent on the preferences of the developers. Clustering the simulation services registered in a registry according to their functional information can therefore improve the discovery performance of simulation services. Existing simulation service clustering methods, however, have shortcomings. First, when measuring the functional similarity of simulation services, existing methods consider only the information of some of the elements in the Web Services Description Language (WSDL) document. Second, current methods do not consider coding style and naming conventions when preprocessing simulation service information.
Summary of the invention
In view of the above deficiencies of the prior art, the present invention provides a simulation information clustering method based on Web services, to solve the problem of finding a required service quickly and accurately among a large number of available simulation services in a simulation service registry.
The object of the present invention is achieved through the following technical solution:
A simulation information clustering method based on Web services, comprising the following steps:
1) for any two simulation services, analyze the information of the corresponding elements in the two WSDL documents associated with them, compute the similarity of the corresponding elements of the two WSDL documents, and take the combined value of these similarities as the functional similarity of the two simulation services;
2) preprocess the information of the corresponding elements extracted from the WSDL documents, and then compute the similarity at each level mentioned in step 1;
3) based on the functional similarity between any two simulation services obtained in step 2, set a functional similarity threshold parameter $\sigma$ and build the similarity graph $G_\sigma$, obtaining the simulation service functional similarity network;
4) partition the nodes according to how they share neighbor nodes in the simulation service functional similarity network, thereby clustering the simulation services in the simulation service registry.
Further, the similarities at the different levels comprise element similarity, segment similarity, WSDL similarity and simulation service similarity.
Further, the information preprocessing method comprises the following steps:
301) extract distinctive domain feature terms;
302) normalize words of differing styles and forms by converting them to camel case;
303) split each normalized camel-case word into its sub-words, denoted subnames;
304) apply stemming to the subnames to reduce them to their root forms;
305) build a stop-word list to remove the interference of common words; every word that appears in the stop-word list is filtered out.
Further, the similarity graph $G_\sigma$ is built from the functional similarity threshold parameter $\sigma$ as follows: for any two simulation services $SWS_a$ and $SWS_b$, if $Sim(SWS_a, SWS_b) \geq \sigma$, an edge connects $SWS_a$ and $SWS_b$; otherwise no edge connects them.
Further, the simulation services in the simulation service registry are clustered by the following steps:
501) first traverse all nodes in graph $G_\sigma$ to find all structure-connected clusters that satisfy the given parameters. Initially, all nodes are marked as unclassified. The clustering algorithm classifies each node as either a cluster member or a non-member. For each node that is still unclassified, the algorithm checks whether it is a core node; if so, a new cluster is grown from this node, otherwise the node is marked as a non-member;
502) to find a new cluster, start from any core node v and find all nodes that are structure-reachable from v: insert all neighbor nodes adjacent to v into a queue and, for each node in the queue, insert into the queue all still-unclassified nodes that are directly reachable from it, repeating until the queue is empty;
503) non-member nodes are further divided into hubs and outliers: if a non-member (isolated) node is connected to two or more clusters it is classified as a hub, otherwise it is an outlier.
The advantages of the present invention are:
1. The method can identify the clusters, hubs and outliers in the graph, and ensures that the simulation services within each identified cluster are sufficiently similar to one another. In addition, each cluster contains one or more core simulation services, which can be returned to the simulation service consumer as the best-matching services.
2. The simulation services in a service registry can be clustered, so that simulation services meeting a user's specific functional requirements can be found quickly and accurately among a large number of available services.
Brief description of the drawings
Fig. 1 is a flow diagram of the simulation information clustering method based on Web services;
Fig. 2 is a flow diagram of the algorithm for building the simulation service functional similarity network;
Fig. 3 shows the clustering result obtained with the present invention for part of the simulation services.
Detailed description
The embodiments herein use simulation services as an example to illustrate the method; the present invention is also applicable to other related applications. Fig. 1 is a flow diagram of the simulation information clustering method based on Web services of the present invention. The steps of the method are described in detail below.
Step 1: for any two simulation services, analyze the information of the corresponding elements in the WSDL documents associated with them, compute the similarity of those corresponding elements, and take the combined value of these similarities as the functional similarity of the two simulation services.
Each simulation service has an associated WSDL document that describes the function, interfaces and parameters of the service. Different elements contribute differently when measuring the functional similarity between simulation services, so the similarity of corresponding elements is computed separately, and the combined value of these similarities is taken as the functional similarity of the two simulation services. The functional similarity of simulation services is defined according to the structure of existing WSDL documents, as follows:
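As a hedged illustration of the input to this step, the sketch below pulls (tag name, name, documentation) triples out of a WSDL 1.1 document using Python's standard `xml.etree.ElementTree`. The sample document, the function name `extract_elements` and the rule for which elements to keep are assumptions made for illustration; the source parses WSDL with dom4j and does not list the exact elements it extracts.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace

# Hypothetical sample document, not taken from the source.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="RadarSim">
  <message name="GetTrackRequest"/>
  <portType name="RadarPort">
    <operation name="getTrack">
      <documentation>Returns the current track list</documentation>
    </operation>
  </portType>
</definitions>"""


def extract_elements(wsdl_text):
    """Return (tag name, name attribute, documentation text) per named WSDL element."""
    prefix = "{" + WSDL_NS + "}"
    records = []
    for el in ET.fromstring(wsdl_text).iter():
        name = el.get("name")
        if not el.tag.startswith(prefix) or name is None:
            continue  # skip non-WSDL and unnamed elements
        doc = el.find(prefix + "documentation")  # direct child only
        text = (doc.text or "").strip() if doc is not None else ""
        records.append((el.tag[len(prefix):], name, text))
    return records


for record in extract_elements(SAMPLE_WSDL):
    print(record)
```

Each triple carries the tag name needed for the tag comparison and the name and documentation text that feed the text similarity.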
Compute the element similarity Sim(e). Given an element $e_a$ in $WSDL_a$ and an element $e_b$ in $WSDL_b$, the element similarity $Sim(e_a, e_b)$ is computed by formula (1):
$$Sim(e_a, e_b) = \begin{cases} 0, & e_a.Tagname \neq e_b.Tagname \\ SimText(e_a, e_b), & e_a.Tagname = e_b.Tagname \end{cases} \tag{1}$$
where $e_a.Tagname$ is the tag name of element $e_a$, and $SimText(e_a, e_b)$ is the text similarity based on the element names and the corresponding documentation text, obtained with the proposed information preprocessing method. Specifically, the information extracted from $e_a$ and $e_b$ is first preprocessed to obtain a keyword list for each element. TF-IDF (term frequency-inverse document frequency) weighting is then applied in a vector space model, and the cosine similarity between $e_a$ and $e_b$ is computed; the result is $SimText(e_a, e_b)$.
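A minimal sketch of formula (1), assuming keyword lists have already been produced by preprocessing. The source names only TF-IDF weighting and cosine similarity; the smoothed IDF variant, the `corpus` argument (keyword lists of other elements used for document frequencies) and all function names here are illustrative assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each keyword list to a {term: weight} vector (smoothed IDF, an assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def element_similarity(tag_a, keywords_a, tag_b, keywords_b, corpus=()):
    """Formula (1): 0 for different tag names, otherwise the text similarity."""
    if tag_a != tag_b:
        return 0.0
    vectors = tfidf_vectors(list(corpus) + [keywords_a, keywords_b])
    return cosine(vectors[-2], vectors[-1])


print(element_similarity("operation", ["track", "radar"],
                         "operation", ["track", "sonar"]))
```

Elements with different tag names never reach the text comparison, which keeps, for example, an operation from being matched against a message.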
Compute the segment similarity Sim(seg). After the element similarities are obtained, the elements in a WSDL document are grouped into several segments, and the segment similarity is computed as a weighted average. Given two segments, $seg_a$ in $WSDL_a$ and $seg_b$ in $WSDL_b$, the segment similarity $Sim(seg_a, seg_b)$ is computed by formula (2):
$$Sim(seg_a, seg_b) = \sum_{i=1}^{n_{seg}} \varepsilon_i \cdot Sim(e_{ai}, e_{bi}) \tag{2}$$
where $n_{seg}$ is the number of elements in segment $seg_a$ or $seg_b$, $\varepsilon_i$ is the weight assigned to the element similarity $Sim(e_{ai}, e_{bi})$ $(i = 1, 2, \ldots, n_{seg})$, and $e_{ai}$ ($e_{bi}$) denotes an element of $seg_a$ ($seg_b$).
Compute the WSDL similarity Sim(WSDL). After the similarities of the different segments in the WSDL documents are obtained, the WSDL similarity is computed as a weighted average, as in formula (3):
$$Sim(WSDL_a, WSDL_b) = \sum_{i=1}^{n} \lambda_i \cdot Sim(seg_{ai}, seg_{bi}) \tag{3}$$
where $n$ is the number of segments in $WSDL_a$ or $WSDL_b$, $\lambda_i$ is the weight assigned to the segment similarity $Sim(seg_{ai}, seg_{bi})$ $(i = 1, 2, \ldots, n)$, and $seg_{ai}$ ($seg_{bi}$) denotes a segment of $WSDL_a$ ($WSDL_b$).
Compute the simulation service functional similarity Sim(SWS). Once the WSDL similarity is known, the simulation service functional similarity Sim(SWS) follows from formula (4):
$$Sim(SWS_a, SWS_b) = Sim(WSDL_a, WSDL_b) \tag{4}$$
The resulting $Sim(SWS_a, SWS_b)$ is the similarity between the two services. The similarity between any two simulation services can be measured in this way.
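The layered aggregation in formulas (2) to (4) can be sketched as two nested weighted sums. The weights and similarity values below are made-up numbers, and the assumption that each weight set sums to 1 is ours; the source does not state how the weights $\varepsilon_i$ and $\lambda_i$ are chosen.

```python
def weighted_sum(similarities, weights):
    """Weighted combination used by both formula (2) and formula (3)."""
    if len(similarities) != len(weights):
        raise ValueError("one weight is required per similarity value")
    return sum(w * s for w, s in zip(weights, similarities))


def service_similarity(element_sims, element_weights, segment_weights):
    """Formulas (2)-(4): element similarities -> segments -> WSDL -> service.

    element_sims[i] holds the element similarities of segment i,
    element_weights[i] the matching epsilon weights,
    segment_weights the lambda weights (one per segment).
    """
    segment_sims = [weighted_sum(s, w) for s, w in zip(element_sims, element_weights)]
    return weighted_sum(segment_sims, segment_weights)


# Hypothetical values: two segments, the first with two elements.
sim = service_similarity(
    element_sims=[[1.0, 0.5], [0.2]],
    element_weights=[[0.5, 0.5], [1.0]],
    segment_weights=[0.6, 0.4],
)
print(round(sim, 2))  # 0.6 * 0.75 + 0.4 * 0.2 = 0.53
```

Because formula (4) equates the service similarity with the WSDL similarity, no further step is needed after the segment-level sum.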
Step 2: information preprocessing
To compute the similarities at each level mentioned in step 1 more quickly and accurately, an information preprocessing method is proposed that takes coding style and naming conventions fully into account. To improve the clustering result, every word extracted from a WSDL document undergoes special processing. The steps are as follows:
201) To convert the semi-structured information contained in a WSDL document into structured data stored in a database, the WSDL document is parsed with the dom4j tool and the corresponding information is stored in the database. The different pieces of information can then be extracted to compute $Sim_i(seg_a, seg_b)$ $(i = 1, 2, \ldots, n)$. This information consists of the element names and the corresponding documentation text.
202) To preprocess well, the documentation text is split on spaces into a list of strings, each of which is processed in the same way as an element name. For convenience, the resulting strings and the element names are collectively denoted names. To improve clustering, a name is not simply treated as a single string but is processed specially in the following steps.
203) Distinctive domain feature terms (such as SMS, ICD, AMI and EMBL) are extracted from the names. These terms are highly representative and largely indicate the function of the simulation service.
204) Names of any style are converted to standard camel case, and each camel-case name is split into several subnames. For example, given the word SMSaudio_System, the domain feature term SMS is extracted first, the remaining audio_System is converted to the standard word AudioSystem, and this word is finally split into two sub-words: audio and system.
205) The subnames are processed with the stemming algorithm proposed by Porter to reduce them to their root forms.
206) A stop-word list is built to remove the interference of common words; every word that appears in the stop-word list is filtered out. The above processing yields the corresponding keyword lists, from which the proposed element similarity computation gives the similarity at each level and, in turn, the functional similarity of the simulation services.
Through the above steps, the functional similarity of any two simulation services can be obtained.
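The name-processing steps can be sketched as a small pipeline. This is an illustration under stated assumptions: the stop-word list is hypothetical, `naive_stem` is a crude suffix stripper standing in for the Porter stemmer named in the text, and domain terms are matched by plain substring search.

```python
import re

DOMAIN_TERMS = ("SMS", "ICD", "AMI", "EMBL")         # examples given in the text
STOP_WORDS = {"the", "a", "of", "get", "set", "by"}  # hypothetical stop-word list


def naive_stem(word):
    """Crude suffix stripping; a stand-in for the Porter stemmer, not equivalent to it."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(name):
    """Turn one name into keywords: domain terms, camel-case split, stem, filter."""
    keywords = []
    for term in DOMAIN_TERMS:                 # extract domain feature terms
        if term in name:
            keywords.append(term)
            name = name.replace(term, " ")
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", name) if p]
    camel = "".join(p[0].upper() + p[1:] for p in parts)   # normalize to camel case
    subnames = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", camel)
    for sub in subnames:                      # stem, then drop stop words
        word = naive_stem(sub.lower())
        if word not in STOP_WORDS:
            keywords.append(word)
    return keywords


print(preprocess("SMSaudio_System"))   # ['SMS', 'audio', 'system']
print(preprocess("getTrackingData"))   # ['track', 'data']
```

The first call reproduces the SMSaudio_System example from the text; a real implementation would swap in a proper Porter stemmer and a curated stop-word list.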
Step 3: build the simulation service functional similarity network
Once the similarity between any two simulation services is known, the similarity graph $G_\sigma$ can be built from the chosen functional similarity threshold parameter $\sigma$, whose value is derived from the proposed functional similarity model. $G_\sigma$ consists of simulation services, each of which is a node of the graph. For any two simulation services $SWS_a$ and $SWS_b$, if $Sim(SWS_a, SWS_b) \geq \sigma$, an edge connects $SWS_a$ and $SWS_b$; otherwise no edge connects them. The simulation service functional similarity network is built in this way; Fig. 2 shows the flow for building it.
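The thresholding rule translates directly into code. In this sketch the graph is an adjacency dictionary, and the service names, similarity values and the threshold 0.7 are made-up illustrations; the source does not give a concrete value for $\sigma$.

```python
from itertools import combinations


def build_similarity_graph(services, sim, sigma):
    """Connect two services with an undirected edge when sim(a, b) >= sigma."""
    graph = {s: set() for s in services}   # every service is a node, even if isolated
    for a, b in combinations(services, 2):
        if sim(a, b) >= sigma:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical pairwise similarities between three services.
pair_sims = {
    frozenset(("WS1", "WS2")): 0.9,
    frozenset(("WS1", "WS3")): 0.4,
    frozenset(("WS2", "WS3")): 0.7,
}
graph = build_similarity_graph(
    ["WS1", "WS2", "WS3"],
    lambda a, b: pair_sims[frozenset((a, b))],
    sigma=0.7,
)
print(graph)
```

A pair whose similarity equals the threshold exactly (WS2 and WS3 here) is connected, matching the "greater than or equal to" condition.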
Step 4: cluster the simulation services
Building on the simulation service functional similarity model and the information preprocessing method described above, a structural network clustering method is used to cluster the simulation services in the simulation service registry.
The present invention clusters nodes according to the neighbor relationships between nodes rather than the direct connections between them. How nodes share neighbor nodes determines how they are partitioned. The steps are as follows:
401) first traverse all nodes in graph $G_\sigma$ to find all structure-connected clusters that satisfy the given parameters. Initially, all nodes are marked as unclassified. The clustering algorithm classifies each node as either a cluster member or a non-member. For each node that is still unclassified, the algorithm checks whether it is a core node; if so, a new cluster is grown from this node, otherwise the node is marked as a non-member;
402) to find a new cluster, start from any core node v and find all nodes that are structure-reachable from v: insert all neighbor nodes adjacent to v into a queue and, for each node in the queue, insert into the queue all still-unclassified nodes that are directly reachable from it, repeating until the queue is empty;
403) non-member nodes are further divided into hubs and outliers: if a non-member (isolated) node is connected to two or more clusters it is classified as a hub, otherwise it is an outlier.
The method can identify the clusters, hubs and outliers in the graph, and ensures that the simulation services within each identified cluster are sufficiently similar to one another. In addition, each cluster contains one or more core simulation services, which can be returned to the simulation service consumer as the best-matching services. The method is fast and efficient, traversing each node in the graph only once. Identifying hub services is also valuable, because a hub service belongs to no cluster yet is connected to several different clusters. If all simulation services in the clusters connected to a hub service fail, the hub service can act as a substitute to satisfy the consumer's request. An outlier service, by contrast, is functionally similar to very few services and should therefore be treated as noise. The algorithm can identify outlier services and remove them from the clusters, improving the clustering of simulation services.
Through the above steps, the simulation services in a service registry can be clustered, so that simulation services meeting a user's specific functional requirements can be found quickly and accurately among a large number of available services. Fig. 3 shows the result of applying the invention to cluster part of the simulation services in a service registry. In the figure, black dots, triangles and circles represent hub services, outlier services and functional clusters respectively. The three black dots $WS_{136}$, $WS_{212}$ and $WS_{458}$ are hub services that connect different functional clusters. The size of each circle $C_i$ $(i = 1, 2, \ldots, 18)$ indicates the number of services in the corresponding functional cluster, and the number inside the circle gives that count. A solid line between two circles indicates that the corresponding functional clusters contain functionally similar services. Similarly, the connections of hub and outlier services are drawn as dashed lines. Because outlier services are numerous and isolated, only some of them are shown in the figure. As the figure shows, the clustering results verify the validity of the present invention.
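The core, hub and outlier roles described in this step resemble the published SCAN structural clustering scheme, so the sketch below follows that scheme. This is an assumption: the text does not define the structural similarity measure or name its parameters, so `structural_similarity`, `eps` (similarity threshold), `mu` (minimum neighborhood size for a core node) and their default values are all illustrative.

```python
import math
from collections import deque


def structural_similarity(graph, u, v):
    """Shared closed-neighborhood ratio of u and v (assumed measure)."""
    nu, nv = graph[u] | {u}, graph[v] | {v}
    return len(nu & nv) / math.sqrt(len(nu) * len(nv))


def eps_neighbors(graph, v, eps):
    """Nodes in the closed neighborhood of v that are at least eps-similar to v."""
    return {w for w in graph[v] | {v} if structural_similarity(graph, v, w) >= eps}


def cluster_services(graph, eps=0.7, mu=2):
    """Return {node: cluster id, "hub" or "outlier"} for an undirected graph."""
    label = {}      # absent = unclassified, None = non-member, int = cluster id
    next_id = 0
    for v in graph:
        if v in label:
            continue
        if len(eps_neighbors(graph, v, eps)) < mu:
            label[v] = None                 # not a core node: non-member for now
            continue
        queue = deque([v])                  # grow a new cluster from core node v
        while queue:
            y = queue.popleft()
            reachable = eps_neighbors(graph, y, eps)
            if len(reachable) < mu:
                continue                    # only core nodes reach other nodes
            for x in reachable:
                if x not in label:
                    queue.append(x)         # still unclassified: expand it later
                if label.get(x) is None:
                    label[x] = next_id      # unclassified or non-member joins
        next_id += 1
    result = {}
    for v, cid in label.items():
        if cid is not None:
            result[v] = cid
            continue
        touched = {label[w] for w in graph[v] if label[w] is not None}
        result[v] = "hub" if len(touched) >= 2 else "outlier"
    return result


# Hypothetical graph: two triangles bridged by h, plus a pendant node o.
demo = {
    "a": {"b", "c", "h", "o"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"e", "f", "h"}, "e": {"d", "f"}, "f": {"d", "e"},
    "h": {"a", "d"}, "o": {"a"},
}
roles = cluster_services(demo)
print(roles["h"], roles["o"])  # hub outlier
```

Seeding the queue with the core node itself is equivalent to inserting its neighbors first, since the first expansion enqueues them. In the demo, h touches both triangles and becomes a hub, while o touches only one cluster and becomes an outlier.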
It should be understood that the above detailed description of the technical solution of the present invention by way of preferred embodiments is illustrative and not restrictive. Having read this specification, a person of ordinary skill in the art may modify the technical solutions described in the embodiments or make equivalent substitutions for some of their technical features; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (3)

1. A simulation information clustering method based on Web services, characterized in that the method comprises the following steps:
1) for any two simulation services, analyze the information of the corresponding elements in the two WSDL documents associated with them, compute the similarity of the corresponding elements of the two WSDL documents, and take the combined value of these similarities as the functional similarity of the two simulation services;
2) preprocess the information of the corresponding elements extracted from the WSDL documents, and then compute the similarity at each level in step 1);
3) based on the functional similarity between any two simulation services obtained in step 2), set a functional similarity threshold parameter $\sigma$ and build the similarity graph $G_\sigma$, obtaining the simulation service functional similarity network;
the similarity graph $G_\sigma$ is built from the functional similarity threshold parameter $\sigma$ as follows: for any two simulation services $SWS_a$ and $SWS_b$, where $Sim(SWS_a, SWS_b)$ is the functional similarity between $SWS_a$ and $SWS_b$, if $Sim(SWS_a, SWS_b) \geq \sigma$, an edge connects $SWS_a$ and $SWS_b$; otherwise no edge connects them;
4) partition the nodes according to how they share neighbor nodes in the simulation service functional similarity network, thereby clustering the simulation services in the simulation service registry;
the simulation services in the simulation service registry are clustered by the following steps:
501) first traverse all nodes in graph $G_\sigma$ to find all structure-connected clusters that satisfy the given parameters; initially, all nodes are marked as unclassified; the clustering algorithm classifies each node as either a cluster member or a non-member; for each node that is still unclassified, the algorithm checks whether it is a core node; if so, a new cluster is grown from this node, otherwise the node is marked as a non-member;
502) to find a new cluster, start from any core node v and find all nodes that are structure-reachable from v: insert all neighbor nodes adjacent to v into a queue and, for each node in the queue, insert into the queue all still-unclassified nodes that are directly reachable from it, repeating until the queue is empty;
503) non-member nodes are further divided into hubs and outliers: if a non-member (isolated) node is connected to two or more clusters it is classified as a hub, otherwise it is an outlier.
2. The simulation information clustering method based on Web services according to claim 1, characterized in that the similarities at each level comprise element similarity, segment similarity, WSDL similarity and the functional similarity of simulation services.
3. The simulation information clustering method based on Web services according to claim 1, characterized in that the information preprocessing method comprises the following steps:
301) extract distinctive domain feature terms;
302) normalize words of differing styles and forms by converting them to camel case;
303) split each normalized camel-case word into its sub-words, denoted subnames;
304) apply stemming to the subnames to reduce them to their root forms;
305) build a stop-word list to remove the interference of common words; every word that appears in the stop-word list is filtered out.
CN201210580107.9A 2012-12-27 2012-12-27 A simulation information clustering method based on Web services Active CN103077202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210580107.9A CN103077202B (en) 2012-12-27 2012-12-27 A simulation information clustering method based on Web services

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210580107.9A CN103077202B (en) 2012-12-27 2012-12-27 A simulation information clustering method based on Web services

Publications (2)

Publication Number Publication Date
CN103077202A CN103077202A (en) 2013-05-01
CN103077202B true CN103077202B (en) 2016-03-30

Family

ID=48153732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210580107.9A Active CN103077202B (en) 2012-12-27 2012-12-27 A simulation information clustering method based on Web services

Country Status (1)

Country Link
CN (1) CN103077202B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105068858A (en) * 2015-07-29 2015-11-18 北京世冠金洋科技发展有限公司 Multi-source heterogeneous system emulation method and apparatus

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2043009A1 (en) * 2007-09-28 2009-04-01 Alcatel Lucent Method for building semantic referential gathering semantic service descriptions
CN101695082A (en) * 2009-09-30 2010-04-14 北京航空航天大学 Service organization method based on relation mining and device thereof

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
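Ranking and Clustering Web Services Using; Dimitrios Skoutas et al.; IEEE Transactions on Services Computing; 20100930; Vol. 3 (No. 3); full text *
Web Service Discovery with additional Semantics; Richi Nayak; IEEE/WIC/ACM International Conference on Web Intelligence; 20071231; full text *
Research on clustering methods based on unsupervised decision trees (基于无监督决策树聚类方法的研究); 张婵婵; China Master's Theses Full-text Database, Information Science and Technology; 20101115 (No. 11); Section 2.2, paragraph 9 *
Research on service description technology based on the Semantic Web (基于语义Web的服务描述技术研究); 张君泉; China Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology; 20070215 (No. 2); Sections 4.1 and 4.3 *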

Also Published As

Publication number Publication date
CN103077202A (en) 2013-05-01

Similar Documents

Publication Publication Date Title
CN106909643B (en) Knowledge graph-based social media big data topic discovery method
CN109858030B (en) Two-way intent slot value cross-correlation task-based dialog understanding system and method
CN106533742B (en) Weighting directed complex networks networking method based on time sequence model characterization
CN103942308A (en) Method and device for detecting large-scale social network communities
WO2019114700A1 (en) Traffic analysis method, public service traffic attribution method and corresponding computer system
CN107797998B (en) Rumor-containing user generated content identification method and device
CN104156436A (en) Social association cloud media collaborative filtering and recommending method
CN110598070B (en) Application type identification method and device, server and storage medium
CN102043863B (en) Method for Web service clustering
CN111325022B (en) Method and device for identifying hierarchical address
CN103838857B (en) Automatic service combination system and method based on semantics
JP6199517B1 (en) Determination apparatus, determination method, and determination program
CN104731958A (en) User-demand-oriented cloud manufacturing service recommendation method
CN112667750A (en) Method and device for determining and identifying message category
CN112214316A (en) Data resource allocation method based on Internet of things and cloud computing server
Shen et al. The analysis of intelligent real-time image recognition technology based on mobile edge computing and deep learning
CN111274823A (en) Text semantic understanding method and related device
CN103077202B (en) A simulation information clustering method based on Web services
CN105302866A (en) OSN community discovery method based on LDA Theme model
CN103064907A (en) System and method for topic meta search based on unsupervised entity relation extraction
CN117251779A (en) Node classification method based on global perception neural network
CN105160037A (en) Traveling route screening methods and systems
CN114296785A (en) Log data modeling method and system
CN115204279A (en) Abnormal network traffic identification method based on VAE and COD-SNN
CN104063215A (en) RESTful Web service matching system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant