CN105138594B

CN105138594B - Web service discovery method based on sparse label learning

Info

Publication number: CN105138594B
Application number: CN201510466572.3A
Authority: CN
Inventors: 尹建伟; 罗威; 邓水光; 李莹; 吴健; 吴朝晖
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2015-07-31
Filing date: 2015-07-31
Publication date: 2018-06-19
Anticipated expiration: 2035-07-31
Also published as: CN105138594A

Abstract

The invention discloses a Web service discovery method based on sparse label learning. It aims to overcome the limitation that current research relies on a single service data source, and makes full use of text information to optimize the service discovery process. The method first uses open-source tools to extract the text of service description files and their associated tags, then uses a sparse model to mine the hidden relationships between service description files and tags, and finally achieves accurate tag prediction through optimization learning. By fully mining WSDL text features, the invention effectively improves tag prediction accuracy. In addition, its two-stage hybrid intelligent algorithm can respond in real time to personalized service queries from multiple users, and the generated tag prediction lists help improve the efficiency of Web service discovery.

Description

Web service discovery method based on sparse label learning

Technical field

The invention belongs to the technical field of computer services, and specifically relates to a Web service discovery method based on sparse label learning.

Background

With the continuing technological revolution of the Web 2.0 era, the main forms of Internet software production, operation, and use are changing dramatically. New forms of service discovery based on dynamic aggregation, automatic composition, and elastic scaling of Web services have become an important trend in future network application development. All of these Web service technologies depend on a service search engine to discover and manage services. In recent years, service discovery through search engines has become a focus of both industry and academia.

At present, Web services are mainly aggregated and managed by search engines. In practice, the user submits search keywords, and the search engine discovers services by string-matching the keywords against the contents of WSDL (Web Services Description Language) files. This scheme is inefficient for two reasons: (1) the Web service architectures of modern enterprises are complex, so a typical WSDL file contains a large amount of redundant text, and matching strings directly against it wastes resources; (2) with the rapid growth of the Internet, the number of Web services is increasing exponentially, so matching against every WSDL document is too slow. In real-world conditions, industry needs an efficient service indexing strategy; relying solely on raw WSDL text seriously hinders the development of service computing. New service discovery techniques are therefore a key driver of Web service research.

In the prior art, academia has made significant progress in using tags for service indexing. However, academic work generally assumes that the service tags annotated on WSDL documents are sufficient and accurate, and this premise falls short in practice:

1. Tags are in fact scarce. They depend on manual annotation, which is far too slow compared with the growth of big-data services, so tags always remain scarce.

2. Because tags are annotated manually, they are arbitrary and non-standardized, so matching query requests against tags alone directly degrades service discovery.

Summary of the invention

To address the above technical problems in the prior art, the present invention proposes a Web service discovery method based on sparse label learning, which effectively improves the accuracy of tag prediction and thereby the efficiency of Web service discovery.

A Web service discovery method based on sparse label learning comprises the following steps:

(1) collect the WSDL document of each Web service in the service set, together with its manually annotated service tags;

(2) preprocess the WSDL document and service tags of each Web service;

(3) for each tag in the tag library, minimize the following objective function L to obtain the weight vector w of that tag with respect to the service set:

$$L(w) = \sum_{d=1}^{D}\left[\log\left(1 + e^{w^{T} v_d}\right) - y_d\, w^{T} v_d\right] + \alpha \lVert w \rVert_1$$

where v_d is the text feature vector of the WSDL document of the d-th Web service in the service set; D is the total number of Web services in the service set; y_d = 1 if the tag has been manually annotated as a service tag of the d-th Web service, and y_d = 0 otherwise; α is a preset regularization factor; and ^T denotes vector transposition;

(4) for each tag in the tag library, take the inner product of the tag's weight vector w with the text feature vector of the WSDL document of each Web service in the service set, obtaining the labeling probability of the tag for each Web service;

Given a preset probability threshold, select from the service set the Web services whose labeling probability exceeds the threshold, and assign the tag to those Web services as a predicted tag;

(5) the service search engine receives the user's target query request. If the service set is smaller than a certain size, the engine string-matches the query directly against the WSDL document information of each Web service in the service set; if the service set exceeds that size, the engine instead string-matches the query against the predicted tags of each Web service in the service set. Finally, the matched Web services are presented to the user.
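Step (4) above can be illustrated with a minimal Python sketch. It assumes dense list-based feature vectors, one weight vector per tag already learned in step (3), and that the inner product wᵀv is mapped to a probability with the logistic function (an assumption consistent with the log objective; the text only says "inner product"). The function names and the 0.5 default threshold are hypothetical.

```python
import math


def tag_probability(w, v):
    # Inner product w^T v, mapped into (0, 1) by a numerically stable logistic function.
    z = sum(wi * vi for wi, vi in zip(w, v))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def services_for_tag(w, features, threshold=0.5):
    """Return ids of services whose labeling probability for this tag exceeds the threshold."""
    return [sid for sid, v in features.items() if tag_probability(w, v) > threshold]


def predicted_tags(weights_by_tag, features, threshold=0.5):
    """Map each service id to the list of tags predicted for it."""
    result = {sid: [] for sid in features}
    for tag, w in weights_by_tag.items():
        for sid in services_for_tag(w, features, threshold):
            result[sid].append(tag)
    return result
```

Raising the threshold trades recall for precision: with w = [2, -1], a service with features [1, 1] scores about 0.73, so it is kept at 0.5 but dropped at 0.8.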

In step (2), the WSDL document and service tags of each Web service are preprocessed. For the WSDL document, an XML (Extensible Markup Language) tool extracts its feature information and builds the corresponding text feature vector; for the service tags, an open-source text normalization tool common in natural language processing (such as word stemming) normalizes the tags.

In step (3), the objective function L is minimized by the following iterative algorithm:

$$\tilde{w}_t = w_t - \theta \sum_{d=1}^{D}\left(\sigma(w_t^{T} v_d) - y_d\right) v_d, \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

$$w_{t+1}(i) = \begin{cases} \tilde{w}_t(i) - \theta\alpha, & \tilde{w}_t(i) > \theta\alpha \\ 0, & \lvert \tilde{w}_t(i) \rvert \le \theta\alpha \\ \tilde{w}_t(i) + \theta\alpha, & \tilde{w}_t(i) < -\theta\alpha \end{cases}$$

where w_t and w_{t+1} are the tag's weight vectors with respect to the service set at the t-th and (t+1)-th iterations; w̃_t is the weight vector obtained from w_t after gradient descent; w_{t+1}(i) is the i-th element of w_{t+1}; w̃_t(i) is the i-th element of w̃_t; t is the iteration count; i is a natural number with 1 ≤ i ≤ N, where N is the dimension of the weight vector w; and θ is a preset iteration factor.
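One iteration of the two-stage update can be sketched as a single Python function. This is an illustrative sketch, not the patented implementation: it assumes the default log objective, so the gradient step subtracts θ·Σ_d (σ(wᵀv_d) − y_d)·v_d, and it assumes the sparse step is elementwise soft-thresholding at θα. The name `prox_step` is hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function sigma(z) = 1 / (1 + e^(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prox_step(w, V, y, theta, alpha):
    """One iteration: gradient step on the log loss, then soft-thresholding at theta * alpha."""
    grad = [0.0] * len(w)
    for v_d, y_d in zip(V, y):
        r = sigmoid(sum(wi * vi for wi, vi in zip(w, v_d))) - y_d  # sigma(w^T v_d) - y_d
        for i, vi in enumerate(v_d):
            grad[i] += r * vi
    # Stage 1: w_tilde = w_t - theta * gradient of the log loss.
    w_tilde = [wi - theta * gi for wi, gi in zip(w, grad)]
    # Stage 2: shrink each element toward zero by theta * alpha, clipping at zero.
    tau = theta * alpha
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in w_tilde]
```

Starting from w = [0, 0] with two one-hot services labeled 1 and 0, θ = 1 and α = 0.1, one step yields roughly [0.4, -0.4]; with α = 0.6 the shrinkage exceeds the gradient step and both weights stay at zero.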

In step (5), the matched Web services are finally packaged into an HTML page, which the service search engine presents to the user.

By fully mining WSDL text features, the invention effectively improves tag prediction accuracy. In addition, by using a two-stage hybrid intelligent algorithm, it can respond in real time to personalized service queries from multiple users, and the generated tag prediction lists help improve the efficiency of Web service discovery.

Brief description of the drawings

Fig. 1 is a flow diagram of the service discovery method based on sparse label learning according to the present invention.

Fig. 2 is a schematic diagram of the internal flow of WTLearning (Web Service Tag Learning), the core sparse label learning module.

Detailed description of embodiments

To describe the present invention more specifically, its technical solution is described in detail below with reference to the drawings and a specific embodiment.

As shown in Fig. 1, the Web service discovery method based on sparse label learning of the present invention comprises the following parts:

Step 1: The service search engine collects the WSDL documents provided by service developers and, for each service file, manages the tag information provided by users. Assume the developers provide a total of D services as the engine's candidate set; there are then D WSDL documents, each describing the corresponding service. In the initialization phase, users annotate the D service documents with tags describing what each service does, and the service search engine's mechanisms ensure tag quality. After collection, a one-to-many mapping is established between the D WSDL documents and their tags.

Step 2: The search engine preprocesses the collected WSDL documents and tags.

For the WSDL documents, the engine extracts text with XML tools and builds a bag-of-words (BoW) dictionary model. This model ignores the grammar and word order of the text and represents the content of a WSDL document as an unordered set of words. Specifically, for WSDL document d, the engine uses the dictionary model to build a vector v_d whose length is the total number of words V in the dictionary. Each element is 0 or 1: 1 if the corresponding word occurs in the document, and 0 otherwise. After this processing, the engine has converted the D WSDL documents into D text feature vectors.
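A minimal bag-of-words sketch using Python's standard `xml.etree.ElementTree`. Which parts of a WSDL file to read is an assumption here: this version takes `name` attributes and element text, splits camelCase identifiers, and ignores element tags and namespaces. The helper names `wsdl_tokens` and `build_bow` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def wsdl_tokens(wsdl_xml):
    """Collect lower-cased words from 'name' attributes and element text."""
    root = ET.fromstring(wsdl_xml)
    words = set()
    for elem in root.iter():
        for piece in (elem.get("name", ""), elem.text or ""):
            # Split camelCase identifiers such as "getForecast" -> "get Forecast".
            piece = re.sub(r"([a-z])([A-Z])", r"\1 \2", piece)
            words.update(tok.lower() for tok in re.findall(r"[A-Za-z]+", piece))
    return words


def build_bow(wsdl_docs):
    """Return (vocabulary, binary vectors) where v_d[i] = 1 iff word i occurs in document d."""
    token_sets = [wsdl_tokens(doc) for doc in wsdl_docs]
    vocab = sorted(set().union(*token_sets))
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for tokens in token_sets:
        v = [0] * len(vocab)
        for tok in tokens:
            v[index[tok]] = 1
        vectors.append(v)
    return vocab, vectors
```

Because the vectors are binary and word order is discarded, two documents that use the same words in a different order map to the same vector, which is exactly the BoW simplification described above.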

For the tags, the engine applies word stemming, a technique common in natural language processing. This tag normalization strips symbols and stop words from the words, ensuring the quality of the input text.
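The following sketch shows the shape of this tag normalization. The suffix-stripping rule and the stop-word list are toy assumptions standing in for a real stemmer such as Porter's; a production system would use an established NLP library instead. The function names are hypothetical.

```python
import re

# Illustrative stop-word subset (assumption, not an exhaustive list).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def crude_stem(word):
    # Toy suffix stripping standing in for a real stemmer (e.g. Porter's).
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize_tag(tag):
    """Lower-case, drop symbols and stop words, then stem each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", tag.lower())
    return [crude_stem(tok) for tok in tokens if tok not in STOP_WORDS]
```

For example, the hand-written tags "Weather-Forecasts" and "weather forecast" both normalize to the same token list, which is what makes manually entered tags comparable.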

Step 3: The service search engine receives the target user's query request q and processes the service search in the background.

Step 4: The query request received in step 3 is analyzed as follows:

(1) If the candidate service set contains fewer than 1000 services (D < 1000), the service search engine string-matches the query request directly against the WSDL text.

(2) If the candidate service set contains 1000 or more services (D ≥ 1000), the service search engine runs the online WTLearning module of step 5 to predict tags and associates the resulting tags with the corresponding WSDL documents; the engine then string-matches the query request against the tags.
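A sketch of this two-branch routing, assuming "string matching" means case-insensitive substring containment and that predicted tags are stored per service id. The function name and data layout are hypothetical; the 1000-service cutoff follows the text.

```python
def search(query, wsdl_text, predicted_tags, size_threshold=1000):
    """Return ids of services matching the query.

    Below size_threshold the query is matched against the raw WSDL text;
    otherwise it is matched only against each service's predicted tags.
    """
    q = query.lower()
    if len(wsdl_text) < size_threshold:
        return [sid for sid, text in wsdl_text.items() if q in text.lower()]
    return [sid for sid in wsdl_text
            if any(q in tag.lower() for tag in predicted_tags.get(sid, []))]
```

The tag branch scans a few short tags per service instead of the full, often redundant WSDL text, which is the efficiency argument made in the background section.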

Step 5: The core module WTLearning performs online sparse label learning. As shown in Fig. 2, the WTLearning sub-process comprises the following parts:

5.1 Select a suitable objective function kernel for the target user's query request received in step 3. In general, the module allows user-defined objective functions, such as the classical 0-1 loss function and the logistic objective function. To simplify operation for users, the system provides the following default log objective function:

$$L_1(w) = \sum_{d=1}^{D}\left[\log\left(1 + e^{w^{T} v_d}\right) - y_d\, w^{T} v_d\right] \qquad (1)$$

where D is the total number of WSDL documents; for tag t, y_d = 1 if WSDL document d is annotated with t, and y_d = 0 otherwise; v_d is the text feature vector of the corresponding WSDL document, whose length is the total number of words V; and w is the target weight vector for tag t.

Experiments show that the choice of objective function has an important effect on final prediction accuracy. For ordinary users, the module provides formula (1) as the default function template. Experiments show that this definition lets the engine solve the objective function quickly while still recommending high-accuracy tags that meet user needs.

5.2 To make the target feature vector sparse, so that it suits high-dimensional data and operation in big-data environments, this embodiment adds the classical sparsity constraint to formula (1) as an extra term:

$$\alpha \lVert w \rVert_1 \qquad (2)$$

where ||w||₁ is the ℓ1 norm of the target vector w, defined as:

$$\lVert w \rVert_1 = \sum_{i=1}^{N} \lvert w(i) \rvert \qquad (3)$$

Here α is the regularization factor, which controls how closely the target vector w fits the data. The sparsity constraint makes the solved feature vector contain more zero values, which increases the module's operational flexibility in big-data environments. Finally, the module combines formulas (1) and (2) into the engine's default objective function:

$$L(w) = \sum_{d=1}^{D}\left[\log\left(1 + e^{w^{T} v_d}\right) - y_d\, w^{T} v_d\right] + \alpha \lVert w \rVert_1 \qquad (4)$$

5.3 Hybrid intelligent solution:

Conventional methods can hardly solve a complex objective function such as formula (4), so this embodiment solves for the target vector w with a two-stage hybrid intelligent algorithm.

In the first stage, gradient descent is applied to formula (1): taking the partial derivative with respect to the vector w to be solved gives the update

$$\tilde{w}_t = w_t - \theta \sum_{d=1}^{D}\left(\sigma(w_t^{T} v_d) - y_d\right) v_d, \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \qquad (5)$$

where θ is the iteration factor, which controls the rate of gradient descent.

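In the second stage, a constraint method handles the sparse term of formula (2) by shrinking each element of w̃_t toward zero:

$$w_{t+1}(i) = \begin{cases} \tilde{w}_t(i) - \theta\alpha, & \tilde{w}_t(i) > \theta\alpha \\ 0, & \lvert \tilde{w}_t(i) \rvert \le \theta\alpha \\ \tilde{w}_t(i) + \theta\alpha, & \tilde{w}_t(i) < -\theta\alpha \end{cases}$$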

5.4 In each iteration, the module substitutes the newly generated target vector w_{t+1} into formula (4) and recomputes L, updating the value of formula (4). The iteration terminates when:

$$\lvert L' - L \rvert \le \varepsilon \qquad (9)$$

where L' and L are the values of formula (4) after and before the current iteration, and ε is the iteration threshold, usually ε = 0.001.

If the loss function satisfies this termination condition, the iteration ends; otherwise, the process returns to the gradient descent of sub-step 5.3 and continues iterating until the condition is met.
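Sub-steps 5.3 and 5.4 together form a proximal-gradient loop. The sketch below assumes the default log objective (1) with the ℓ1 term (2), dense list-based vectors, a zero initial vector, and soft-thresholding as the constraint step. ε = 0.001 follows the text, while the values of θ, α, and the iteration cap, as well as the name `train_tag`, are hypothetical.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function sigma(z) = 1 / (1 + e^(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _softplus(z):
    # Numerically stable log(1 + e^z).
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def objective(w, V, y, alpha):
    # Formula (4): log loss (1) plus the L1 term (2).
    loss = sum(_softplus(_dot(w, v)) - yd * _dot(w, v) for v, yd in zip(V, y))
    return loss + alpha * sum(abs(x) for x in w)


def train_tag(V, y, alpha=0.01, theta=0.5, eps=1e-3, max_iter=1000):
    """Learn one tag's weight vector, stopping when |L' - L| <= eps."""
    w = [0.0] * len(V[0])
    L = objective(w, V, y, alpha)
    for _ in range(max_iter):
        # Stage 1: gradient step on the log loss.
        grad = [0.0] * len(w)
        for v, yd in zip(V, y):
            r = _sigmoid(_dot(w, v)) - yd
            for i, vi in enumerate(v):
                grad[i] += r * vi
        w_tilde = [wi - theta * gi for wi, gi in zip(w, grad)]
        # Stage 2: soft-thresholding for the L1 term.
        tau = theta * alpha
        w = [math.copysign(max(abs(x) - tau, 0.0), x) for x in w_tilde]
        L_new = objective(w, V, y, alpha)
        if abs(L_new - L) <= eps:  # termination condition (9)
            break
        L = L_new
    return w
```

On a toy set where the third feature appears in every document, that feature carries no information about the tag, and the soft-thresholding step keeps its weight at exactly zero; this is the sparsity effect described in 5.2.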

5.5 By solving formula (4), the module generates a target vector w for each specific tag t. For a target WSDL service description file, the engine generates its text feature vector v using XML extraction. For each learned target vector w, the module computes the probability that the WSDL document is annotated with the corresponding tag from the dot product of w and v, sorts the tags by this probability, and takes the top five (TOP-5) as the tag prediction result for the service.
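The TOP-5 selection can be sketched as below, assuming one learned weight vector per tag stored in a dictionary. Because the logistic function is monotonic, ranking by the raw dot product wᵀv gives the same order as ranking by probability. Ties are broken alphabetically, which is an arbitrary assumption; the function name is hypothetical.

```python
def predict_top_k(weights_by_tag, v, k=5):
    """Return the k tags whose weight vectors score highest against feature vector v."""
    scores = {tag: sum(wi * vi for wi, vi in zip(w, v)) for tag, w in weights_by_tag.items()}
    # Highest score first; ties broken by tag name.
    return sorted(scores, key=lambda tag: (-scores[tag], tag))[:k]
```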

The online tag prediction engine is the core of the present invention. In practice, the engine must handle real-time query requests from many users, which requires the algorithm to reduce computational time complexity while improving prediction accuracy. The time complexity of the algorithm lies mainly in formula (5). It can be shown mathematically that each iteration takes O(ρd) time, where ρ is the tag density and d is a constant, the dimension of the latent feature space. The per-iteration time complexity is therefore linear in the original tag density. Since the original tags are generally very sparse, each iteration is cheap. Experiments also show that the prediction algorithm usually meets the termination condition within about 15 iterations. In summary, the prediction algorithm of the present invention can respond in real time to online service query requests from multiple users.

Step 6: Obtain the list of services that satisfy the user's request, package it into an HTML page, and present the results to the user through the front-end display engine.

To quantify the quality of the prediction method based on sparse label learning relative to conventional tag prediction methods, we evaluate prediction accuracy with the F score commonly used in search engine evaluation. The F1 score is defined as:

$$F_1 = \frac{2PR}{P + R}$$

where P is the precision of the tag prediction list generated by the module and R is its recall; the F1 score combines precision and recall to measure the module's predictive ability.
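A sketch of the metric for a single service, assuming precision and recall are computed over the sets of predicted and true tags. How the experiments aggregated scores across services is not stated in the text, so this shows only the per-service computation; the function name is hypothetical.

```python
def precision_recall_f1(predicted, actual):
    """Per-service precision, recall, and F1 over sets of tags."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(actual) if actual else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For instance, predicting four tags of which two are among three true tags gives P = 0.5, R = 2/3, and F1 = 4/7.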

The data set used in the experiments contains 339 WSDL documents with 4825 corresponding tags as the training set, plus another 5120 WSDL documents as the test set. During testing, tags of the training set were randomly sampled in given proportions. The results are shown in Table 1:

Table 1 (F1 scores; columns give the proportion of training-set tags sampled)

Method              5%       10%      15%      20%
LDA                 0.5123   0.6325   0.7060   0.6822
WTCluster           0.7916   0.7311   0.6910   0.6310
Present invention   0.8813   0.8794   0.8787   0.8784

Compared with the existing methods LDA (Latent Dirichlet Allocation) and WTCluster (Web Service Tag Cluster), the method of the present invention achieves higher F1 values, i.e. its predictions are more accurate.

It should be understood that the above description of the embodiments is intended to help those skilled in the art understand and use the invention. Those skilled in the art can readily make various modifications to the above embodiments and apply the general principles described herein to other embodiments without creative effort. Therefore, the invention is not limited to the above embodiments; improvements and modifications made by those skilled in the art according to the disclosure of the invention shall all fall within the protection scope of the invention.

Claims

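1. A Web service discovery method based on sparse label learning, comprising the following steps:

(1) collecting the WSDL document of each Web service in the service set and its manually annotated service tags;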

(2) preprocessing the WSDL document and service tags of each Web service, wherein for the WSDL document, an XML tool is used to extract its feature information and build the corresponding text feature vector, and for the service tags, an open-source text normalization tool common in natural language processing is used to normalize them;

(3) for each tag in the tag library, minimizing the following objective function L to obtain the weight vector w of the tag with respect to the service set:

$$L(w) = \sum_{d=1}^{D}\left[\log\left(1 + e^{w^{T} v_d}\right) - y_d\, w^{T} v_d\right] + \alpha \lVert w \rVert_1$$

where v_d is the text feature vector of the WSDL document of the d-th Web service in the service set; D is the total number of Web services in the service set; y_d = 1 if the tag has been manually annotated as a service tag of the d-th Web service, and y_d = 0 otherwise; α is a preset regularization factor; and T denotes vector transposition; the objective function L is minimized using the following iterative equations:

$$\tilde{w}_t = w_t - \theta \sum_{d=1}^{D}\left(\sigma(w_t^{T} v_d) - y_d\right) v_d, \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

$$w_{t+1}(i) = \begin{cases} \tilde{w}_t(i) - \theta\alpha, & \tilde{w}_t(i) > \theta\alpha \\ 0, & \lvert \tilde{w}_t(i) \rvert \le \theta\alpha \\ \tilde{w}_t(i) + \theta\alpha, & \tilde{w}_t(i) < -\theta\alpha \end{cases}$$

where w_t and w_{t+1} are the tag's weight vectors with respect to the service set at the t-th and (t+1)-th iterations; w̃_t is the weight vector obtained from w_t after gradient descent; w_{t+1}(i) is the i-th element of w_{t+1}; w̃_t(i) is the i-th element of w̃_t; t is the iteration count; i is a natural number with 1 ≤ i ≤ N, where N is the dimension of the weight vector w; and θ is a preset iteration factor;

(4) for each tag in the tag library, taking the inner product of the tag's weight vector w with the text feature vector of the WSDL document of each Web service in the service set, obtaining the labeling probability of the tag for each Web service;

setting a probability threshold, selecting from the service set the Web services whose labeling probability exceeds the threshold, and assigning the tag to those Web services as a predicted tag;

(5) receiving a user's target query request through the service search engine, wherein if the service set is smaller than a certain size, the service search engine string-matches the query directly against the WSDL document information of each Web service in the service set, and if the service set exceeds that size, the service search engine string-matches the query against the predicted tags of each Web service in the service set; and finally presenting the matched Web services to the user.

2. The Web service discovery method according to claim 1, characterized in that in step (5), the matched Web services are finally packaged into an HTML page and presented to the user by the service search engine.