CN105138594B - Web service discovery method based on sparse tag learning - Google Patents

Web service discovery method based on sparse tag learning

Info

Publication number
CN105138594B
Authority
CN
China
Prior art keywords
service
label
web service
services set
web
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510466572.3A
Other languages
Chinese (zh)
Other versions
CN105138594A (en)
Inventor
尹建伟
罗威
邓水光
李莹
吴健
吴朝晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201510466572.3A
Publication of CN105138594A
Application granted
Publication of CN105138594B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a Web service discovery method based on sparse tag learning. The method moves beyond the current practice in which service discovery relies on a single data source, and makes full use of text information to optimize the service discovery process. It first uses open-source tools to extract the text of service description files and their associated tags, then uses a sparse model to mine the hidden relationships between service description files and tags, and finally achieves accurate tag prediction through optimization learning. The invention fully exploits WSDL text features and thereby effectively improves the accuracy of tag prediction. In addition, by using a two-stage hybrid intelligent algorithm, it can respond in real time to personalized service queries from multiple users, and the generated tag prediction lists help improve the efficiency of Web service discovery.

Description

Web service discovery method based on sparse tag learning
Technical field
The invention belongs to the technical field of computer services, and specifically relates to a Web service discovery method based on sparse tag learning.
Background art
With the continuing development of the Web 2.0 technology revolution, the production, operation, and use of Internet software are undergoing great change. New forms of service discovery based on dynamic aggregation, automatic composition, and elastic scaling of Web services are becoming an important trend in future network application development. All of these Web service technologies build on service search engines that discover and manage services. In recent years, discovering services with search engines has become a focus of both industry and academia.
At present, Web services are mainly aggregated and managed by search engines. In practice, a user submits search keywords, and the search engine searches for and discovers services by string matching against the content of WSDL (Web Services Description Language) files. However, this scheme is very inefficient, for the following reasons: (1) The Web service architectures of modern enterprises and organizations are complex, so a typical WSDL file contains a great deal of redundant text, and direct string matching wastes resources. (2) With the growth of the Internet, the number of Web services is increasing exponentially, and matching against all WSDL files is too inefficient. In practice, industry needs an efficient service indexing strategy; the problems caused by relying solely on WSDL text seriously hinder the development of the services computing field. Novel service discovery technology is therefore a driving force for Web service research.
In the prior art, academia has made significant progress in exploring the use of tags for service indexing. However, academia generally assumes that the service tags annotating WSDL files are sufficient and accurate, and this premise has some shortcomings in practice:
1. In fact, tags are scarce. Tags depend on manual annotation, which is far too slow compared with the growth of services in the big-data era, so tags are always scarce.
2. Because tags are annotated manually, they are arbitrary and non-standardized; simply matching the query against the tags directly degrades service discovery.
Summary of the invention
In view of the above technical problems in the prior art, the present invention proposes a Web service discovery method based on sparse tag learning, which can effectively improve the accuracy of tag prediction and thereby improve the efficiency of Web service discovery.
A Web service discovery method based on sparse tag learning comprises the following steps:
(1) collect the WSDL file and the manually annotated service tags of each Web service in the service set;
(2) preprocess the WSDL file and service tags of each Web service;
(3) for any tag in the tag library, minimize the following objective function L to obtain the weight vector w of that tag with respect to the service set;
where: $v_d$ is the text feature vector of the WSDL file of the d-th Web service in the service set, and D is the total number of Web services in the service set; $y_d = 1$ if the tag has been manually annotated as a service tag of the d-th Web service, and $y_d = 0$ otherwise; α is a preset regularization factor, and the superscript T denotes vector transposition;
(4) for any tag in the tag library, take the inner product of the tag's weight vector w with the text feature vector of each Web service's WSDL file in the service set, to obtain the marking probability of the tag with respect to each Web service;
set a probability threshold, select from the service set the Web services whose marking probability exceeds the threshold, and assign the tag to those Web services as a predicted tag;
(5) receive the user's target query through the service search engine: if the size of the service set is below a given scale, the service search engine string-matches the target query directly against the WSDL file content of each Web service in the service set; if the size of the service set is above that scale, the service search engine string-matches the target query against the predicted tags of each Web service in the service set; finally, the matched Web services are presented to the user.
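Step (4) above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `services_for_tag`, the default threshold of 0.5, and the use of a sigmoid to map the inner product to a value in (0, 1) are all assumptions made for the example.

```python
import math

def services_for_tag(w, vectors, threshold=0.5):
    """Return the services whose marking probability for one tag exceeds threshold.

    w        -- weight vector learned for the tag
    vectors  -- dict: service name -> text feature vector of its WSDL file
    The sigmoid squashing is an assumption; the text only names the inner product.
    """
    selected = []
    for name, v in vectors.items():
        score = sum(wi * vi for wi, vi in zip(w, v))  # inner product w . v
        # numerically stable sigmoid
        if score >= 0:
            prob = 1.0 / (1.0 + math.exp(-score))
        else:
            prob = math.exp(score) / (1.0 + math.exp(score))
        if prob > threshold:
            selected.append(name)
    return selected
```

Each selected service would then receive the tag as one of its predicted tags.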
In step (2), the WSDL file and service tags of each Web service are preprocessed as follows: for WSDL files, an XML (Extensible Markup Language) tool is used to extract the feature information of the file and build the corresponding text feature vector; for service tags, an open-source text normalization tool commonly used in natural language processing (for example, word stemming) is used to normalize the tags.
In step (3), the objective function L is minimized by the following iterative algorithm:
where: $w_t$ and $w_{t+1}$ are the weight vectors of the tag with respect to the service set at the t-th and (t+1)-th iterations, respectively; the intermediate weight vector is $w_t$ after the gradient descent step; $w_{t+1}(i)$ is the value of the i-th element of $w_{t+1}$, and the i-th element of the intermediate weight vector is denoted likewise; t is the iteration count; i is a natural number with 1 ≤ i ≤ N, where N is the dimension of the weight vector w; and θ is a preset iteration factor.
In step (5), the matched Web services are finally packaged into an HTML page and presented to the user by the service search engine.
The invention fully exploits WSDL text features and thereby effectively improves the accuracy of tag prediction. In addition, by using a two-stage hybrid intelligent algorithm, it can respond in real time to personalized service queries from multiple users, and the generated tag prediction lists help improve the efficiency of Web service discovery.
Description of the drawings
Fig. 1 is a flow diagram of the service discovery method based on sparse tag learning according to the present invention.
Fig. 2 is an internal flow diagram of WTLearning (Web Service Tag Learning), the core module for sparse tag learning.
Detailed description
To describe the present invention more specifically, its technical solution is described in detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 1, the Web service discovery method based on sparse tag learning comprises the following parts:
Step 1: The service search engine collects the WSDL files provided by service developers. For each service file, the engine manages the tag information provided by users. Suppose developers provide D services in total as the candidate set of the search engine; there are then D WSDL files describing the corresponding services. In the initialization phase, users annotate the D service files with tags that describe what each service does; the quality of the tags in this process is ensured by a mechanism of the service search engine. After collection, the D WSDL files and the tags form a one-to-many mapping.
Step 2: The search engine preprocesses the collected WSDL files and tags.
For WSDL files, the engine extracts text information with XML tools and builds a bag-of-words (BoW) dictionary model. The model ignores the grammar and word order of the text and represents the content of a WSDL file as an unordered set of words. Specifically, for WSDL file d, the engine uses the dictionary model to build the corresponding vector $v_d$, whose length is the total number of words in the WSDL dictionary. Each entry of this vector is 0 or 1: 1 if the corresponding word occurs in the file, and 0 otherwise. After this processing, the engine has converted the D WSDL files into D text feature vectors.
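The binary bag-of-words construction can be sketched as follows, using only the Python standard library. The helper names (`wsdl_tokens`, `build_vocabulary`, `binary_vector`) and the tokenization rules (element names, attribute values, and text, split on camelCase) are assumptions chosen for illustration; the patent only says that XML tools extract the text.

```python
import re
import xml.etree.ElementTree as ET

def wsdl_tokens(wsdl_xml):
    """Collect lowercase word tokens from element names, attribute values and text."""
    root = ET.fromstring(wsdl_xml)
    pieces = []
    for elem in root.iter():
        pieces.append(elem.tag.split('}')[-1])  # drop any XML namespace prefix
        pieces.extend(elem.attrib.values())
        if elem.text:
            pieces.append(elem.text)
    tokens = set()
    for piece in pieces:
        # split identifiers such as "GetForecast" into "get" and "forecast"
        for word in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", piece):
            tokens.add(word.lower())
    return tokens

def build_vocabulary(token_sets):
    """Map every distinct word to a fixed position in the feature vector."""
    return {w: i for i, w in enumerate(sorted(set().union(*token_sets)))}

def binary_vector(tokens, vocab):
    """Entry is 1 if the word occurs in the file and 0 otherwise; word order is ignored."""
    vec = [0] * len(vocab)
    for w in tokens:
        if w in vocab:
            vec[vocab[w]] = 1
    return vec
```

A typical use would call `wsdl_tokens` on each of the D files, build one shared vocabulary from all token sets, and then call `binary_vector` once per file.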
For tags, the engine normalizes them with word stemming, a technique commonly used in natural language processing; this technique strips symbols and stop words from the words, ensuring the quality of the input text.
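A rough sketch of tag normalization is given below. It is deliberately simplified: the stop-word list and the suffix-stripping rule are hypothetical stand-ins for a real open-source stemmer, which the patent mentions but does not name.

```python
import re

STOP_WORDS = {"a", "an", "and", "for", "of", "the", "to"}  # illustrative list, not from the patent
SUFFIXES = ("ing", "es", "s")  # crude stand-in for a real stemming algorithm

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def normalize_tag(tag):
    """Lowercase the tag, drop symbols and stop words, and stem what remains."""
    words = re.findall(r"[a-z0-9]+", tag.lower())
    return " ".join(stem(w) for w in words if w not in STOP_WORDS)
```

With these rules, tags that differ only in punctuation, case, or simple inflection collapse to the same normalized string.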
Step 3: The service search engine receives the query q of the target user, and the back end performs the service search.
Step 4: The query received in Step 3 is analyzed:
(1) If the candidate service set size D is less than 1000, the service search engine string-matches the WSDL text directly against the query.
(2) If the candidate service set size D is greater than or equal to 1000, the service search engine runs the online WTLearning module of Step 5 to predict tags, associates the resulting tags with the corresponding WSDL files, and then string-matches the tags against the query.
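The dispatch in Step 4 can be sketched as below. The function name `search`, the dictionary-based data layout, and the case-insensitive substring test are assumptions for illustration; only the threshold of 1000 comes from the text.

```python
SCALE_THRESHOLD = 1000  # from the embodiment: smaller candidate sets are matched on raw WSDL text

def search(query, services, predicted_tags, threshold=SCALE_THRESHOLD):
    """Return the names of matching services.

    services       -- dict: service name -> WSDL text
    predicted_tags -- dict: service name -> list of predicted tags
    """
    q = query.lower()
    if len(services) < threshold:
        # small candidate set: string-match the query against the WSDL text itself
        return [name for name, text in services.items() if q in text.lower()]
    # large candidate set: string-match the query against the predicted tags only
    return [name for name, tags in predicted_tags.items()
            if any(q in tag.lower() for tag in tags)]
```

Matching against a handful of short tags per service, rather than against every full WSDL file, is what keeps the second branch cheap as the candidate set grows.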
Step 5: The core module WTLearning is the entity that performs online sparse tag learning. As shown in Fig. 2, the sub-process of the WTLearning module comprises the following parts:
5.1 Select a suitable objective function kernel for the target user query received in Step 3. In general, the module allows user-defined objective functions, such as the classical 0-1 loss function and the logistic objective function. For ease of use, the system provides the following default log objective function:
where: D is the total number of WSDL files. For tag t, $y_d = 1$ if WSDL file d is annotated with t, and $y_d = 0$ otherwise. $v_d$ is the text feature vector of the corresponding WSDL file, whose length is the total number of words V. w is the target weight vector for tag t.
Experiments show that the choice of objective function has an important influence on the final prediction accuracy. For ordinary users, the module provides formula (1) as the default function template. Experiments show that this definition allows the engine to solve the objective function quickly while still recommending high-precision tags that meet user needs.
5.2 To increase the sparsity of the target feature vector so that it suits high-dimensional data and eases operation in a big-data environment, this embodiment adds a classical sparsity constraint as an additional term to formula (1):
$$\alpha \|w\|_1 \quad (2)$$
where $\|w\|_1$ is the 1-norm of the target vector w, defined as:
$$\|w\|_1 = \sum_{i=1}^{N} |w(i)| \quad (3)$$
Here α is the regularization factor, which controls the degree of fitting of the target vector w. The sparsity constraint causes the solved feature vector to contain more zero values, which makes the module more flexible to operate in a big-data environment. Finally, the module combines formula (1) and formula (2) into the engine's default objective function, as follows:
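The text at this point does not reproduce formulas (1) and (4). One form consistent with the surrounding definitions is sketched below, assuming the standard logistic log-loss for labels $y_d \in \{0, 1\}$. It is an assumption made for illustration and may differ from the patent's exact expression.

```latex
% Assumed form of formula (1): logistic log-loss over the D WSDL files
L_{\log}(w) = \sum_{d=1}^{D} \Bigl[ \log\bigl(1 + \exp(w^{T} v_d)\bigr) - y_d \, w^{T} v_d \Bigr]

% Assumed form of formula (4): formula (1) plus the sparsity term of formula (2)
L(w) = L_{\log}(w) + \alpha \|w\|_1
```

Under this assumption, each summand equals the negative log-likelihood of $y_d$ when the probability of the tag is modeled as $1 / (1 + \exp(-w^{T} v_d))$.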
5.3 Hybrid intelligent solving:
Conventional methods essentially cannot solve a complex objective function of the form of formula (4); this embodiment therefore uses a two-stage hybrid intelligent algorithm to solve for the target vector w.
In the first stage, gradient descent is applied to formula (1): the partial derivative with respect to the vector w being solved for is taken as follows:
where: θ is the iteration factor, which controls the rate of gradient descent.
In the second stage, a constraint method then handles the sparse term of formula (2):
5.4 In each iteration, the module substitutes the newly generated target vector into formula (4) and computes L, thereby updating the value of formula (4). The termination condition of the iteration is:
$$L_t - L_{t+1} \le \varepsilon \quad (9)$$
where: $L_t$ and $L_{t+1}$ are the values of formula (4) at iterations t and t+1, and ε is the iteration threshold, usually ε = 0.001.
If the loss function satisfies the above termination condition, the iterative process ends. Otherwise, the process returns to the gradient descent of sub-step 5.3 and continues iterating until the condition is met.
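The two-stage iteration of sub-steps 5.3 and 5.4 can be sketched as follows. Because the update formulas are not reproduced in this text, the sketch substitutes standard choices and labels them as assumptions: the gradient of the logistic log-loss for the first stage, and element-wise soft-thresholding for the second stage. The names `fit_tag_weights`, `soft_threshold`, and `objective` are hypothetical; α, θ, and ε correspond to the regularization factor, iteration factor, and iteration threshold described above.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log1pexp(z):
    # numerically stable log(1 + exp(z))
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def objective(w, V, y, alpha):
    """Assumed objective: logistic log-loss plus alpha times the 1-norm of w."""
    loss = 0.0
    for v, yd in zip(V, y):
        z = sum(wi * vi for wi, vi in zip(w, v))
        loss += log1pexp(z) - yd * z
    return loss + alpha * sum(abs(wi) for wi in w)

def soft_threshold(x, k):
    """Shrink x toward zero by k; values within [-k, k] become zero."""
    return math.copysign(max(abs(x) - k, 0.0), x)

def fit_tag_weights(V, y, alpha=0.1, theta=0.1, eps=1e-3, max_iter=1000):
    """V: list of text feature vectors; y: 0/1 labels for one tag."""
    n = len(V[0])
    w = [0.0] * n
    prev = objective(w, V, y, alpha)
    for _ in range(max_iter):
        # stage 1 (assumed form): gradient descent step on the log-loss
        grad = [0.0] * n
        for v, yd in zip(V, y):
            z = sum(wi * vi for wi, vi in zip(w, v))
            r = sigmoid(z) - yd
            for i, vi in enumerate(v):
                grad[i] += r * vi
        w_half = [wi - theta * gi for wi, gi in zip(w, grad)]
        # stage 2 (assumed form): soft-thresholding handles the sparsity term
        w = [soft_threshold(x, theta * alpha) for x in w_half]
        cur = objective(w, V, y, alpha)
        if prev - cur <= eps:  # stop once the decrease in L is at most eps
            break
        prev = cur
    return w
```

The thresholding stage is what produces exact zeros in w: any coordinate whose gradient-updated value stays within θα of zero is set to zero, so uninformative words drop out of the learned vector.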
5.5 For each specific tag t, the module generates the corresponding target vector w by solving formula (4). For a target service description file (WSDL), the engine generates the text feature vector v using XML extraction. For the learned target vector w, the module computes the probability that this WSDL file is annotated with the tag as the dot product of w and v, ranks the tags by this probability, and takes the top 5 as the tag prediction result for the service.
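The ranking in sub-step 5.5 amounts to scoring every tag against one service and keeping the best five. A minimal sketch, in which the function name `predict_tags` and the alphabetical tie-breaking are assumptions:

```python
def predict_tags(v, tag_weights, k=5):
    """Return the k highest-scoring tags for one service.

    v           -- text feature vector of the service's WSDL file
    tag_weights -- dict: tag -> learned weight vector w
    """
    scores = {tag: sum(wi * vi for wi, vi in zip(w, v))  # dot product of w and v
              for tag, w in tag_weights.items()}
    # sort by descending score; ties are broken alphabetically for determinism
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Because only the ordering matters here, the raw dot product can be ranked directly without first converting it to a probability.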
The online tag prediction algorithm engine is the core of the present invention. In practice, the engine must handle real-time queries from many users, which requires the algorithm to reduce computational time complexity while improving prediction accuracy. The time complexity of the algorithm of the present invention lies mainly in formula (5). It can be proven mathematically that the time complexity of each iteration is O(ρd), where ρ is the tag density and d is a constant, the dimension of the implicit feature space. The time complexity of each iteration is thus linear in the original tag density. In general, the original tags are very sparse, so the time complexity of a single iteration is very low. Experiments also show that the prediction algorithm of the present invention usually meets the preset condition within about 15 iterations. In summary, the prediction algorithm of the present invention can respond in real time to online service queries from multiple users.
Step 6: The engine obtains the list of services that satisfy the user's request, packages it into an HTML page, and presents the result to the user through the front-end display engine.
To quantify the difference in quality between the prediction method based on sparse tag learning of the present invention and conventional tag prediction methods, we assess prediction accuracy using the F score commonly used for search engines. To better explain the result, the F score is first defined:
$$F_1 = \frac{2PR}{P + R}$$
where: P is the precision of the tag prediction list generated by the module, R is the recall of the tag list generated by the module, and the F1 score measures the predictive ability of the module by combining precision and recall.
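As a small worked example of the metric, the sketch below computes precision, recall, and F1 for one predicted tag list against a reference tag list. The function name `f1_score` and the convention of returning 0 when nothing matches are assumptions for illustration.

```python
def f1_score(predicted, relevant):
    """F1 = 2PR / (P + R) for one predicted tag list against the reference tags."""
    predicted, relevant = set(predicted), set(relevant)
    hits = len(predicted & relevant)
    if hits == 0:
        return 0.0  # convention used here when precision and recall are both zero
    p = hits / len(predicted)  # precision: share of predicted tags that are correct
    r = hits / len(relevant)   # recall: share of reference tags that were predicted
    return 2 * p * r / (p + r)
```

For instance, predicting three tags of which two appear among four reference tags gives P = 2/3 and R = 1/2, so F1 = 4/7.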
The data set used in the experiment contains 339 WSDL files with 4825 corresponding tags as the training set, and another 5120 WSDL files as the test set. During testing, we randomly sampled the tags of the training set at given proportions. The test results are shown in Table 1:
Table 1: F1 scores by proportion of training tags sampled
Method                  5%      10%     15%     20%
LDA                     0.5123  0.6325  0.7060  0.6822
WTCluster               0.7916  0.7311  0.6910  0.6310
The present invention   0.8813  0.8794  0.8787  0.8784
Compared with the existing methods LDA (Latent Dirichlet Allocation) and WTCluster (Web Service Tag Cluster), the method of the present invention achieves a higher F1 value, i.e. its predictions are more accurate.
It should be understood that the above description of the embodiment is intended to help those of ordinary skill in the art understand and use the present invention. Those skilled in the art can obviously make various modifications to the above embodiment and apply the general principles described here to other embodiments without creative effort. The present invention is therefore not limited to the above embodiment, and improvements and modifications made by those skilled in the art in accordance with the disclosure of the present invention shall all fall within the scope of protection of the present invention.

Claims (2)

1. A Web service discovery method based on sparse tag learning, comprising the following steps:
(1) collecting the WSDL file and the manually annotated service tags of each Web service in the service set;
(2) preprocessing the WSDL file and service tags of each Web service, wherein for WSDL files, an XML tool is used to extract the feature information of the file and build the corresponding text feature vector; and for service tags, an open-source text normalization tool commonly used in natural language processing is used to normalize the tags;
(3) for any tag in the tag library, minimizing the following objective function L to obtain the weight vector w of that tag with respect to the service set;
where: $v_d$ is the text feature vector of the WSDL file of the d-th Web service in the service set, and D is the total number of Web services in the service set; $y_d = 1$ if the tag has been manually annotated as a service tag of the d-th Web service, and $y_d = 0$ otherwise; α is a preset regularization factor, and the superscript T denotes vector transposition; the objective function L is specifically minimized using the following iterative equations:
where: $w_t$ and $w_{t+1}$ are the weight vectors of the tag with respect to the service set at the t-th and (t+1)-th iterations, respectively; the intermediate weight vector is $w_t$ after the gradient descent step; $w_{t+1}(i)$ is the value of the i-th element of $w_{t+1}$, and the i-th element of the intermediate weight vector is denoted likewise; t is the iteration count; i is a natural number with 1 ≤ i ≤ N, where N is the dimension of the weight vector w; and θ is a preset iteration factor;
(4) for any tag in the tag library, taking the inner product of the tag's weight vector w with the text feature vector of each Web service's WSDL file in the service set, to obtain the marking probability of the tag with respect to each Web service;
setting a probability threshold, selecting from the service set the Web services whose marking probability exceeds the threshold, and assigning the tag to those Web services as a predicted tag;
(5) receiving the user's target query through the service search engine: if the size of the service set is below a given scale, the service search engine string-matches the target query directly against the WSDL file content of each Web service in the service set; if the size of the service set is above that scale, the service search engine string-matches the target query against the predicted tags of each Web service in the service set; finally, the matched Web services are presented to the user.
2. The Web service discovery method according to claim 1, characterized in that: in step (5), the matched Web services are finally packaged into an HTML page and presented to the user by the service search engine.
CN201510466572.3A 2015-07-31 2015-07-31 Web service discovery method based on sparse tag learning Active CN105138594B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510466572.3A CN105138594B (en) 2015-07-31 2015-07-31 Web service discovery method based on sparse tag learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510466572.3A CN105138594B (en) 2015-07-31 2015-07-31 Web service discovery method based on sparse tag learning

Publications (2)

Publication Number Publication Date
CN105138594A CN105138594A (en) 2015-12-09
CN105138594B true CN105138594B (en) 2018-06-19

Family

ID=54723943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510466572.3A Active CN105138594B (en) 2015-07-31 2015-07-31 Web service discovery method based on sparse tag learning

Country Status (1)

Country Link
CN (1) CN105138594B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101833561A (en) * 2010-02-12 2010-09-15 西安电子科技大学 Natural language processing oriented Web service intelligent agent

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6364718B1 (en) * 2001-02-02 2002-04-02 Molex Incorporated Keying system for electrical connector assemblies

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Accelerated Sparse Learning on Tag Annotation for Web Service Discovery;Wei Lo 等;《2015 IEEE International Conference on Web Services》;20150702;265-272 *
基于二分图匹配的语义Web服务发现方法;邓水光 等;《计算机学报》;20080831;第31卷(第8期);1364-1375 *

Also Published As

Publication number Publication date
CN105138594A (en) 2015-12-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant