CN105138594B - Web service discovery method based on sparse tag learning - Google Patents
Web service discovery method based on sparse tag learning
- Publication number
- CN105138594B (application CN201510466572.3A)
- Authority
- CN
- China
- Prior art keywords
- service
- label
- web service
- services set
- web
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a Web service discovery method based on sparse tag learning. The method aims to move beyond the current reliance on a single source of service data and to make full use of text information to optimize service discovery. The method first uses open-source tools to extract the text of service description files and their associated tags, then uses a sparse model to mine the hidden relationship between service description files and tags, and finally obtains an accurate tag prediction function through optimization learning. The invention fully mines WSDL text features and thereby effectively improves the accuracy of tag prediction; in addition, by using a two-stage hybrid intelligent algorithm, the invention can respond in real time to personalized service queries from multiple users, and the generated tag prediction lists help improve the efficiency of Web service discovery.
Description
Technical field
The invention belongs to the technical field of computer services, and in particular relates to a Web service discovery method based on sparse tag learning.
Background
With the continuing development of the Web 2.0 era, the main forms of Internet software, together with the ways it is produced, operated, supplied, and used, are changing greatly. New forms of service discovery based on dynamic aggregation, automatic composition, and elastic scaling of Web services are becoming an important trend in future network application development. These Web service technologies all rest on service search engines that discover and manage services. In recent years, discovering services through search engines has become a focus of both industry and academia.
At present, Web services are mainly aggregated and managed by search engines. In practice, a user submits search keywords, and the search engine searches for and discovers services by string matching against the contents of WSDL (Web Services Description Language) files. This scheme is very inefficient, for the following reasons: (1) The Web service architectures of modern enterprises and organizations are complex, so a typical WSDL file contains a great deal of redundant text, and direct string matching wastes resources. (2) With the Internet flourishing, the number of Web services is growing exponentially, and matching against every WSDL file is too slow. Industry therefore needs an efficient service indexing strategy; the problems caused by relying on WSDL text alone seriously hinder the development of services computing. New service discovery technology is thus a booster for Web service research.
In the prior art, academia has made significant progress in exploring the use of tags for service indexing. However, academia generally assumes that the service tags attached to WSDL files are sufficient and accurate, and this premise has shortcomings in practice:
1. Tags are in fact scarce. Tags depend on manual annotation, which is far too inefficient to keep pace with the growth of services at big-data scale, so tags remain scarce.
2. Because tags are assigned manually, they are arbitrary and non-standardized; simply matching query requests against tags directly degrades service discovery.
Summary of the invention
In view of the above technical problems in the prior art, the present invention proposes a Web service discovery method based on sparse tag learning, which effectively improves the accuracy of tag prediction and thereby improves the efficiency of Web service discovery.
A Web service discovery method based on sparse tag learning comprises the following steps:
(1) collecting the WSDL file and the manually assigned service tags of each Web service in the service set;
(2) preprocessing the WSDL file and the service tags of each Web service;
(3) for any tag in the tag library, minimizing the following objective function L to obtain the weight vector w of that tag relative to the service set;
Wherein: v_d is the text feature vector of the WSDL file of the d-th Web service in the service set, and D is the total number of Web services in the service set; y_d = 1 if the tag has been manually assigned as a service tag of the d-th Web service, and y_d = 0 otherwise; α is a preset regularization factor; and T denotes vector transposition;
(4) for any tag in the tag library, taking the inner product of the tag's weight vector w with the text feature vector of each Web service's WSDL file in the service set, thereby obtaining the marking probability of the tag for each Web service;
by setting a probability threshold, selecting from the service set the Web services whose marking probability exceeds the threshold, and taking the tag as a predicted tag of those Web services;
(5) receiving a target query request from a user through the service search engine; if the service set is smaller than a certain scale, the service search engine directly string-matches the target query request against the WSDL file information of each Web service in the service set; if the service set is larger than that scale, the service search engine string-matches the target query request against the predicted tags of each Web service in the service set; finally, the matched Web services are presented to the user.
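Step (4) can be illustrated with a minimal Python sketch. The text says only that an inner product yields a marking probability; mapping the inner product to (0, 1) with a sigmoid is an assumption made here, and the function names, the sample weights, and the threshold value are hypothetical.

```python
import math

def marking_probability(w, v):
    """Map the inner product w . v to (0, 1) with a sigmoid (an assumption)."""
    z = sum(wi * vi for wi, vi in zip(w, v))
    return 1.0 / (1.0 + math.exp(-z))

def predict_services_for_tag(w, feature_vectors, threshold=0.5):
    """Return indices of services whose marking probability exceeds threshold."""
    return [d for d, v in enumerate(feature_vectors)
            if marking_probability(w, v) > threshold]

# Hypothetical weight vector for one tag and three binary WSDL feature vectors.
w = [2.0, -1.0, 0.0]
vectors = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
selected = predict_services_for_tag(w, vectors, threshold=0.6)
```

The tag would then be recorded as a predicted tag for every service index in `selected`.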
In step (2), the WSDL file and service tags of each Web service are preprocessed as follows: for the WSDL file, an XML (Extensible Markup Language) tool extracts the feature information of the WSDL file and builds the corresponding text feature vector; for the service tags, an open-source text normalization tool common in natural language processing (for example, word stemming) normalizes the service tags.
In step (3), the objective function L is minimized by the following iterative algorithm:
Wherein: w_t and w_{t+1} are the weight vectors of the tag relative to the service set at the t-th and (t+1)-th iterations, respectively; the intermediate weight vector is the result of applying the gradient descent step to w_t; w_{t+1}(i) is the i-th element of w_{t+1}, and the elements of the intermediate weight vector are indexed by i in the same way; t is the iteration number; i is a natural number with 1 ≤ i ≤ N, where N is the dimension of the weight vector w; and θ is a preset iteration factor.
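The iterative equations themselves are missing from this text. A standard scheme that matches the symbols defined above is a gradient step followed by element-wise soft-thresholding (proximal gradient descent for an ℓ1 penalty). It is given here only as an assumption: the notation \tilde{w}_t for the intermediate vector and L_{\log} for the objective without the α term are chosen for this sketch, and the granted patent's exact equations may differ.

```latex
\tilde{w}_t = w_t - \theta \, \nabla L_{\log}(w_t), \qquad
w_{t+1}(i) = \operatorname{sign}\!\big(\tilde{w}_t(i)\big)\,
             \max\!\big(\lvert \tilde{w}_t(i) \rvert - \theta\alpha,\; 0\big),
\quad 1 \le i \le N
```

Under this reading, any element whose magnitude after the gradient step is at most θα is set exactly to zero, which is how the weight vector becomes sparse.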
In step (5), the matched Web services are finally packaged into an HTML page format and presented to the user by the service search engine.
The invention fully mines WSDL text features and thereby effectively improves the accuracy of tag prediction; in addition, by using a two-stage hybrid intelligent algorithm, the invention can respond in real time to personalized service queries from multiple users, and the generated tag prediction lists help improve the efficiency of Web service discovery.
Description of the drawings
Fig. 1 is a flow diagram of the service discovery method based on sparse tag learning according to the present invention.
Fig. 2 is a diagram of the internal flow of the core sparse tag learning module, WTLearning (Web Service Tag Learning).
Detailed description
To describe the present invention more concretely, its technical solution is described in detail below with reference to the drawings and a specific embodiment.
As shown in Fig. 1, the Web service discovery method based on sparse tag learning of the present invention comprises the following parts:
Step 1: The service search engine collects the WSDL files provided by service developers. For each service file, the engine has its users provide tag information. Suppose developers provide D services in total as the search engine's candidate set; there are then D WSDL files describing the corresponding services. In the initialization phase, users tag the D service files to indicate what each service does, and the service search engine's own mechanisms ensure the quality of the tags. After collection, the D WSDL files and the tags form a one-to-many mapping.
Step 2: The search engine preprocesses the collected WSDL files and tags.
For WSDL files, the engine extracts the text with an XML tool and builds a bag-of-words (BoW) dictionary model. The model ignores grammar and word order and represents the content of a WSDL file as an unordered set of words. Specifically, for WSDL file d, the engine uses the dictionary model to build the corresponding vector v_d, whose length is the total number of words in the dictionary. Each entry of this vector is 0 or 1: it is 1 if the corresponding word occurs in the file and 0 otherwise. After processing, the engine has converted the D WSDL files into D text feature vectors.
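The binary bag-of-words construction can be sketched in Python with the standard-library XML parser. This is an illustration under assumptions: the patent does not name its XML tool, and the choice to harvest words from element names, attribute values, and element text, as well as the function names and the two toy WSDL fragments, are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def wsdl_words(wsdl_xml):
    """Collect lowercase words from tag names, attribute values, and text."""
    words = set()
    for elem in ET.fromstring(wsdl_xml).iter():
        local_name = elem.tag.split('}')[-1]  # drop the XML namespace prefix
        pieces = [local_name, elem.text or ''] + list(elem.attrib.values())
        for piece in pieces:
            words.update(t.lower() for t in re.findall(r'[A-Za-z]+', piece))
    return words

def binary_vectors(wsdl_docs):
    """Build the shared dictionary and one 0/1 vector per WSDL document."""
    word_sets = [wsdl_words(doc) for doc in wsdl_docs]
    dictionary = sorted(set().union(*word_sets))
    vectors = [[1 if word in ws else 0 for word in dictionary]
               for ws in word_sets]
    return dictionary, vectors

# Two toy WSDL-like fragments (hypothetical, far smaller than real files).
docs = [
    '<definitions name="Weather"><message name="GetForecast"/></definitions>',
    '<definitions name="Stock"><message name="GetQuote"/></definitions>',
]
dictionary, vectors = binary_vectors(docs)
```

Each vector has one entry per dictionary word, so all D vectors share the same length, as the step requires.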
For tags, the engine normalizes them using word stemming, a technique common in natural language processing; it strips symbols and stop words from the words, ensuring the quality of the input text.
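A rough Python sketch of this tag normalization follows. The stop-word list and the suffix rules are toy stand-ins chosen for illustration; a production system would more likely use an established open-source stemmer, and none of the names below come from the patent.

```python
import re

STOP_WORDS = {'a', 'an', 'the', 'of', 'and', 'for', 'to'}  # illustrative list
SUFFIXES = ('ing', 'es', 's')  # toy stemming rules, not a real stemmer

def toy_stem(word):
    """Strip one common suffix; a stand-in for a real stemming algorithm."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def normalize_tag(tag):
    """Lowercase, drop symbols and stop words, then stem each remaining word."""
    words = re.findall(r'[a-z0-9]+', tag.lower())
    return [toy_stem(w) for w in words if w not in STOP_WORDS]

print(normalize_tag('Mapping & Geocoding!'))
print(normalize_tag('The Weather-Forecasts'))
```

Normalizing tags this way lets variants such as "Forecasts" and "forecast" collapse to one form before learning and matching.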
Step 3: The service search engine receives the target user's query request q, and the back end performs the service search.
Step 4: The query request received in Step 3 is analyzed:
(1) If the size D of the candidate service set is less than 1000, the service search engine directly string-matches the WSDL text against the query request.
(2) If the size D of the candidate service set is greater than or equal to 1000, the service search engine invokes the online WTLearning module of Step 5 to predict tags, associates the resulting tags with the corresponding WSDL files, and then string-matches the tags against the query request.
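The size-based dispatch of Step 4 can be sketched as below. Case-insensitive substring search is assumed as the string-matching rule, the predicted tags are taken as already computed, and the function and variable names are hypothetical.

```python
SCALE_THRESHOLD = 1000  # candidate-set size at which tag matching takes over

def search(query, services, predicted_tags, threshold=SCALE_THRESHOLD):
    """services: {name: WSDL text}; predicted_tags: {name: list of tags}."""
    q = query.lower()
    if len(services) < threshold:
        # Small set: match the query against the raw WSDL text.
        return [name for name, text in services.items() if q in text.lower()]
    # Large set: match the query against the predicted tags only.
    return [name for name in services
            if any(q in tag.lower() for tag in predicted_tags.get(name, []))]

services = {'svcA': '<definitions name="WeatherService"/>',
            'svcB': '<definitions name="StockQuote"/>'}
tags = {'svcA': ['forecast'], 'svcB': ['finance', 'weather']}
small = search('weather', services, tags)               # WSDL text path
large = search('weather', services, tags, threshold=1)  # forces the tag path
```

The two calls return different services for the same query, which shows that the path taken depends only on the size of the candidate set relative to the threshold.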
Step 5: The core module WTLearning carries out online sparse tag learning. As shown in Fig. 2, the sub-flow of the WTLearning module comprises the following parts:
5.1 On receiving the target user's query request from Step 3, the module selects a suitable objective function kernel. In general, the module allows user-defined objective functions, such as the classical 0-1 loss function and the logistic objective function. For ease of use, the system provides the following default log objective function:
Wherein: D is the total number of WSDL files. For tag t, y_d = 1 if WSDL file d is marked with t, and y_d = 0 otherwise. v_d is the corresponding WSDL text feature vector, whose length is the total number of words V. w is the target weight vector for tag t.
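The default objective, formula (1), is not reproduced in this text. As a reading aid, a logistic log loss that fits the definitions above (binary labels y_d, feature vectors v_d, weight vector w) is sketched here; it is an assumption, not the patent's confirmed expression, and the name L_{\log} is chosen for this sketch.

```latex
L_{\log}(w) = \sum_{d=1}^{D} \left[ \log\!\left(1 + e^{\,w^{T} v_d}\right) - y_d \, w^{T} v_d \right]
```

This is the negative log-likelihood of a model that assigns tag t to file d with probability 1 / (1 + e^{-w^{T} v_d}); its gradient with respect to w is \sum_{d} \big(1/(1 + e^{-w^{T} v_d}) - y_d\big)\, v_d.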
Experiments show that the choice of objective function has an important influence on the final prediction precision. For ordinary users, the module provides formula (1) as the default function template. Experiments show that this definition both lets the engine solve the objective function quickly and yields high-precision tag recommendations that meet user needs.
5.2 To make the target weight vector sparser, so that it suits high-dimensional data and is easier to operate on in a big-data environment, this embodiment adds the classical sparsity constraint as an increment to formula (1):
$$\alpha \lVert w \rVert_1 \tag{2}$$
Wherein: ||w||_1 is the Frobenius 1-norm (the ℓ1 norm) of the target vector w, defined as:
$$\lVert w \rVert_1 = \sum_{i=1}^{N} \lvert w(i) \rvert \tag{3}$$
where N is the dimension of w. Here α is the regularization factor, which controls the fitting rate of the target vector w. The sparsity constraint causes the solved weight vector to contain more zero values, which increases the module's operating flexibility in a big-data environment. Finally, the module adds formula (2) to formula (1) to obtain the engine's default objective function, formula (4).
5.3 Hybrid intelligent solving:
Conventional methods essentially cannot solve a complex objective function of the form of formula (4), so this embodiment uses a two-stage hybrid intelligent algorithm to solve for the target vector w.
In the first stage, gradient descent is applied to formula (1): the partial derivative is taken with respect to the vector w to be solved, as follows:
Wherein: θ is the iteration factor, which controls the rate of gradient descent.
In the second stage, a constraint method then handles the sparse term of formula (2):
5.4 At each iteration, the module substitutes the newly generated target vector into formula (4) and computes L, thereby updating the value of formula (4). The termination condition of the iteration is:
$$L' - L \le \varepsilon \tag{9}$$
Wherein: ε is the iteration threshold, usually ε = 0.001.
If the loss function satisfies the termination condition, the iteration ends. If not, the module returns to the gradient descent of sub-step 5.3 and continues iterating until the condition is met.
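Sub-steps 5.3 and 5.4 can be put together in a short Python sketch. Several points are assumptions, since the formulas are missing from this text: the log loss is taken to be the logistic loss, the second stage is implemented as soft-thresholding, and L′ in formula (9) is read as the objective value from the previous iteration. Function names and default parameter values other than ε = 0.001 are hypothetical.

```python
import math

def log_loss(w, X, y):
    """Logistic log loss summed over all documents (assumed form of formula (1))."""
    total = 0.0
    for v, label in zip(X, y):
        z = sum(wi * vi for wi, vi in zip(w, v))
        total += math.log(1.0 + math.exp(z)) - label * z
    return total

def objective(w, X, y, alpha):
    """Log loss plus the L1 penalty alpha * ||w||_1 (assumed form of formula (4))."""
    return log_loss(w, X, y) + alpha * sum(abs(wi) for wi in w)

def solve(X, y, alpha=0.1, theta=0.1, eps=0.001, max_iter=1000):
    w = [0.0] * len(X[0])
    prev = objective(w, X, y, alpha)
    for _ in range(max_iter):
        # Stage 1: one gradient descent step on the log loss.
        grad = [0.0] * len(w)
        for v, label in zip(X, y):
            z = sum(wi * vi for wi, vi in zip(w, v))
            err = 1.0 / (1.0 + math.exp(-z)) - label
            for i, vi in enumerate(v):
                grad[i] += err * vi
        w_half = [wi - theta * gi for wi, gi in zip(w, grad)]
        # Stage 2: soft-thresholding handles the sparse term.
        w = [math.copysign(max(abs(x) - theta * alpha, 0.0), x) for x in w_half]
        cur = objective(w, X, y, alpha)
        if prev - cur <= eps:  # stop once the decrease is at most eps
            break
        prev = cur
    return w

# Toy data: four documents, three dictionary words, one tag.
X = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0]]
y = [1, 1, 0, 0]
w = solve(X, y)
```

In this toy run the third word never occurs in any document, so its weight stays exactly zero, while a sufficiently large α drives every weight to zero; both behaviors illustrate the sparsity that sub-step 5.2 aims for.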
5.5 For each specific tag t, the module generates the corresponding target vector w by solving formula (4). For a target service description file (WSDL), the engine generates the text feature vector v using XML extraction. With the learned target vector w, the module obtains the probability that this WSDL file carries the tag from the dot product of w and v, ranks the tags by that probability, and takes the TOP-5 as the tag prediction result for the service.
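The ranking in sub-step 5.5 can be sketched as follows. Tags are ranked here by the raw dot product, which gives the same order as any monotonic mapping of it to a probability; the tag names, weights, and function name are hypothetical.

```python
def top_tags(tag_weights, v, k=5):
    """Rank tags by the dot product of each tag's weight vector with v."""
    scores = {tag: sum(wi * vi for wi, vi in zip(w, v))
              for tag, w in tag_weights.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# One learned weight vector per tag (illustrative values only).
tag_weights = {
    'weather': [1.5, 0.0, 0.2],
    'finance': [-0.5, 2.0, 0.0],
    'maps': [0.3, 0.0, 1.0],
}
v = [1, 0, 1]  # binary feature vector of the target WSDL file
ranked = top_tags(tag_weights, v, k=2)
```

With the default k=5 the function returns the TOP-5 list described above; a smaller k is used here only because the toy tag library has three tags.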
The online tag prediction algorithm engine is the core of the present invention. In practice, the engine must handle real-time query requests from many users, which requires the algorithm to reduce computational time complexity while improving prediction precision. The time complexity of the algorithm lies mainly in formula (5). It can be proven mathematically that the time complexity of each iteration is O(ρd), where ρ is the tag density and d is a constant, the dimension of the latent feature space. The time complexity of each iteration is thus linear in the density of the original tags. In general, the original tags are very sparse, so the time complexity of a single iteration is very low. Meanwhile, experiments show that the prediction algorithm of the present invention usually meets the preset condition within about 15 iterations. In summary, the prediction algorithm of the present invention can respond in real time to online service queries from multiple users.
Step 6: The engine obtains the list of services that satisfy the user's request, packages it into an HTML page format, and presents the result to the user through the front-end display engine.
To quantify the difference in quality between the prediction method based on sparse tag learning of the present invention and conventional tag prediction methods, we assess prediction accuracy with the F score commonly used for search engines. For clarity, the F score is first defined:
$$F_1 = \frac{2PR}{P + R}$$
Wherein: P is the precision of the tag prediction list generated by the module, R is the recall of the tag list generated by the module, and the F1 score weighs the module's predictive ability by combining precision and recall.
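For a single service, precision, recall, and the F1 score (their harmonic mean) can be computed as in this short Python sketch. How the patent averages scores over the test set is not stated, so the sketch covers one predicted list only; the function name and sample tags are hypothetical.

```python
def f1_score(predicted, actual):
    """Precision, recall, and F1 for one predicted tag list against the true tags."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

p, r, f1 = f1_score(['weather', 'forecast', 'maps', 'api'],
                    ['weather', 'forecast'])
```

Here two of the four predicted tags are correct and both true tags are found, so precision is 0.5, recall is 1.0, and F1 is 2/3.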
The data set used in the experiment contains 339 WSDL files with their 4825 corresponding tags as the training set, and a further 5120 WSDL files as the test set. During testing, we randomly sample the training-set tags at the given proportions. The test results are shown in Table 1:
Table 1
Method | 5% | 10% | 15% | 20% |
---|---|---|---|---|
LDA | 0.5123 | 0.6325 | 0.7060 | 0.6822 |
WTCluster | 0.7916 | 0.7311 | 0.6910 | 0.6310 |
The present invention | 0.8813 | 0.8794 | 0.8787 | 0.8784 |
Compared with the existing methods LDA (Latent Dirichlet Allocation) and WTCluster (Web Service Tag Cluster), the method of the present invention achieves a higher F1 value, i.e. its predictions are more accurate.
The above description of the embodiment is intended to help those of ordinary skill in the art understand and use the present invention. Those skilled in the art can obviously make various modifications to the embodiment and apply the general principles described here to other embodiments without creative effort. The present invention is therefore not limited to the above embodiment; improvements and modifications made by those skilled in the art based on the disclosure of the present invention all fall within its scope of protection.
Claims (2)
1. A Web service discovery method based on sparse tag learning, comprising the following steps:
(1) collecting the WSDL file and the manually assigned service tags of each Web service in the service set;
(2) preprocessing the WSDL file and the service tags of each Web service, wherein for the WSDL file, an XML tool extracts the feature information of the WSDL file and builds the corresponding text feature vector, and for the service tags, an open-source text normalization tool common in natural language processing normalizes the service tags;
(3) for any tag in the tag library, minimizing the following objective function L to obtain the weight vector w of that tag relative to the service set;
Wherein: v_d is the text feature vector of the WSDL file of the d-th Web service in the service set, and D is the total number of Web services in the service set; y_d = 1 if the tag has been manually assigned as a service tag of the d-th Web service, and y_d = 0 otherwise; α is a preset regularization factor; and T denotes vector transposition; the objective function L is specifically minimized using the following iterative equations:
Wherein: w_t and w_{t+1} are the weight vectors of the tag relative to the service set at the t-th and (t+1)-th iterations, respectively; the intermediate weight vector is the result of applying the gradient descent step to w_t; w_{t+1}(i) is the i-th element of w_{t+1}, and the elements of the intermediate weight vector are indexed by i in the same way; t is the iteration number; i is a natural number with 1 ≤ i ≤ N, where N is the dimension of the weight vector w; and θ is a preset iteration factor;
(4) for any tag in the tag library, taking the inner product of the tag's weight vector w with the text feature vector of each Web service's WSDL file in the service set, thereby obtaining the marking probability of the tag for each Web service;
by setting a probability threshold, selecting from the service set the Web services whose marking probability exceeds the threshold, and taking the tag as a predicted tag of those Web services;
(5) receiving a target query request from a user through the service search engine; if the service set is smaller than a certain scale, the service search engine directly string-matches the target query request against the WSDL file information of each Web service in the service set; if the service set is larger than that scale, the service search engine string-matches the target query request against the predicted tags of each Web service in the service set; finally, the matched Web services are presented to the user.
2. The Web service discovery method according to claim 1, characterized in that: in step (5), the matched Web services are finally packaged into an HTML page format and then presented to the user by the service search engine.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510466572.3A CN105138594B (en) | 2015-07-31 | 2015-07-31 | Web service discovery method based on sparse tag learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510466572.3A CN105138594B (en) | 2015-07-31 | 2015-07-31 | Web service discovery method based on sparse tag learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105138594A CN105138594A (en) | 2015-12-09 |
CN105138594B true CN105138594B (en) | 2018-06-19 |
Family
ID=54723943
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510466572.3A Active CN105138594B (en) | 2015-07-31 | 2015-07-31 | Web service discovery method based on sparse tag learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105138594B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101833561A (en) * | 2010-02-12 | 2010-09-15 | Xidian University | Natural language processing oriented Web service intelligent agent |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6364718B1 (en) * | 2001-02-02 | 2002-04-02 | Molex Incorporated | Keying system for electrical connector assemblies |
Non-Patent Citations (2)
Title |
---|
Accelerated Sparse Learning on Tag Annotation for Web Service Discovery; Wei Lo et al.; 2015 IEEE International Conference on Web Services; 2015-07-02; 265-272 * |
Semantic Web service discovery method based on bipartite graph matching; Deng Shuiguang et al.; Chinese Journal of Computers; 2008-08-31; Vol. 31, No. 8; 1364-1375 * |
Also Published As
Publication number | Publication date |
---|---|
CN105138594A (en) | 2015-12-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |