WO2020042332A1 - Word vector-based event-driven service matching method - Google Patents

Word vector-based event-driven service matching method

Info

Publication number
WO2020042332A1
Authority
WO
WIPO (PCT)
Prior art keywords
event
service
word
word vectors
frequency
Prior art date
Application number
PCT/CN2018/113227
Other languages
English (en)
French (fr)
Inventor
刘发贵
邓达成
Original Assignee
华南理工大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华南理工大学 filed Critical 华南理工大学
Priority to US17/266,979 priority Critical patent/US20210312133A1/en
Publication of WO2020042332A1 publication Critical patent/WO2020042332A1/zh

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Definitions

  • the invention belongs to the field of event-driven service discovery in the semantic Internet of things, and particularly relates to an event-driven service matching method based on word vectors.
  • events reflect changes in the state of an observed object.
  • the key is to match the services available to respond based on the event.
  • Services in Semantic Internet of Things are the products of semantic description of Internet of Things services using Semantic Web technology.
  • the requester of the service is not an explicitly stated service requirement, but an event that occurs in the IoT environment.
  • the relationship between events and services is mainly constructed through manual selection, predefined rules, and other forms, so as to achieve the purpose of service matching.
  • however, these methods rely too heavily on prior knowledge.
  • as the types and number of events and services increase, the accuracy and efficiency of service matching will face huge challenges. Therefore, automatic event-driven service matching through semantic technology has become an urgent problem.
  • in semantics-based service matching, the similarity between a service and a request can be used as an important basis for matching.
  • a structured knowledge base or an unstructured corpus is usually used.
  • a corpus-based method can learn word vectors from a large number of corpora and perform service matching by calculating the similarity of the word vectors. This method is characterized by ensuring sufficient vocabulary coverage and low training costs for word vectors.
  • among current word vector training models is the continuous bag-of-words (CBOW) model proposed by Mikolov et al.
  • this model models the training process of word vectors as a neural network: following the N-gram model, it takes the context of a word in the corpus (the n neighboring words before and after it) as the input of the neural network, trains the word vectors by maximizing the log-likelihood of the word, and finally projects the implicit semantics of words into a low-dimensional, continuous vector space.
  • some researchers have proposed integrating the knowledge base into the training of word vectors, so that the trained word vectors carry more semantic information.
  • Lu et al. proposed a Multiple Semantic Fusion (MSF) model. This model fuses semantic information into word vectors through different vector operations, and then uses the obtained word vectors to calculate the similarity between services and requests, which serves as the main basis for service matching.
  • MSF Multiple Semantic Fusion
  • the present invention proposes a word vector-based event-driven service matching method, which treats high-frequency words and low-frequency words differently and proposes a hybrid word vector training algorithm.
  • in the high-frequency word processing stage, the continuous bag-of-words (CBOW) model is used to train high-frequency word vectors; in the low-frequency word processing stage, a Semantic Generation Model (SGM) is used to construct low-frequency word vectors.
  • in the joint processing stage, a Cosine Similarity Retrofitting (CSR) model performs joint optimization on the high-frequency and low-frequency word vectors to obtain high-quality word vectors; the invention further defines an event recognition service and an event handling service and establishes an event-driven service matching model.
  • the word vectors are used to calculate the matching degree between services, solving the problem of automatic service matching and improving the efficiency and accuracy of service matching.
  • the present invention is achieved through the following technical solutions.
  • An event-driven service matching method based on word vectors which includes two parts: using a hybrid word vector training algorithm to obtain high-quality word vectors and using an event-driven service matching model for event-driven service matching.
  • the method of using the hybrid word vector training algorithm to obtain high-quality word vectors includes: classifying words into two types, high-frequency words and low-frequency words; and using the adjacency relations between words in the corpus and the semantic relations between words in the dictionary to obtain word vectors through three stages: high-frequency word processing, low-frequency word processing, and joint processing;
  • the event-driven service matching model defines two types of event-related services, the event recognition service and the event handling service, and uses word vectors to calculate the degree of matching between services; when the degree of matching is higher than a given threshold, the services are matched successfully.
  • in the high-frequency word processing stage, the continuous bag-of-words (CBOW) model is used to train high-frequency word vectors according to the adjacency relations between words in the corpus.
  • in the low-frequency word processing stage, a Semantic Generation Model (SGM) constructs low-frequency word vectors from the semantic relations in the dictionary and the high-frequency word vectors already obtained; in the joint processing stage, a Cosine Similarity Retrofitting (CSR) model jointly optimizes the high-frequency and low-frequency word vectors.
  • an event is used as the output of the Event Recognition Service (ERS) and as the input of the Event Handling Service (EHS), respectively, which is expressed in description logic (a formalism representing concepts and the relations between concepts) through the hasOutput and hasInput relations.
  • Event is a concept representing an event
  • ERS is a concept representing an event recognition service
  • EHS is a concept representing an event processing service
  • hasOutput represents an output relationship
  • hasInput represents an input relationship.
  • E_r and E_h are events, which respectively represent the output of the event recognition service and the input of the event handling service; τ represents the threshold, and Sim(E_r, E_h) represents the matching degree between the event recognition service and the event handling service.
  • a represents an attribute of the event
  • attr(E_r) represents the attribute set of E_r
  • W_a represents the weight of attribute a
  • the similarity between attribute a of event E_r and attribute i of event E_h is obtained by calculating the cosine similarity of the word vectors corresponding to the attributes.
  • the present invention has the following advantages and technical effects:
  • Figure 1 is a diagram of an event-driven service matching architecture based on word vectors
  • Figure 2 is a diagram of the hybrid word vector training algorithm
  • Figure 3 is a schematic diagram of the CSR model.
  • the event-driven service matching architecture proposed in this embodiment, as shown in Figure 1, includes two parts: hybrid word vector training and service matching. First, considering the impact of word frequency, high-quality word vectors are trained from the corpus and the dictionary through the hybrid word vector training algorithm. Then, using the obtained word vectors, the event-driven service matching model completes the automatic matching of services.
  • the hybrid word vector training algorithm is shown in Figure 2.
  • the algorithm contains three stages: high-frequency word processing, low-frequency word processing, and joint processing.
  • in the high-frequency word processing stage, CBOW is used to train high-frequency word vectors
  • in the low-frequency word processing stage, the SGM model is used to construct low-frequency word vectors
  • in the joint processing stage, the CSR model is used to jointly optimize the high-frequency and low-frequency word vectors to obtain the final word vectors;
  • the adjacent relationship between words and words is obtained from the corpus and trained using the CBOW model.
  • the core idea is to use the level of joint probability of a group of words to determine the possibility that it conforms to the laws of natural language.
  • the goal of training is to maximize the probability of occurrence of all words in the corpus.
  • the objective function is a log-likelihood function expressed as follows:
  • Repeat step 2) until all high-frequency words in the corpus have been trained, yielding the word vectors of the high-frequency words.
  • in the low-frequency word processing stage, the Semantic Generation Model (SGM) constructs low-frequency word vectors from the semantic relations in the dictionary and the high-frequency word vectors.
  • n is the number of categories of semantic relations
  • ω_k is the weight of each semantic relation.
  • when four kinds of relations are considered, ω_k is set to 0.25, indicating that the relations are equally important.
  • the set consists of all high-frequency words that have the semantic relation R_k with the low-frequency word; e(w_i) represents the word vector of word w_i, taken from the word vectors obtained in the high-frequency word processing stage.
  • the word vectors of high-frequency words and low-frequency words are jointly processed in order to incorporate the two remaining types of semantic relation information, <high, high> and <low, low>, into the word vectors.
  • the present invention proposes a Cosine Similarity Retrofitting (CSR) model to optimize word vectors.
  • CSR Cosine Similarity Retrofitting
  • the set W = {w_1, w_2, ..., w_N} represents the words in the vocabulary, the word vectors corresponding to the words represent the vertices V, and the set of semantic relations between words represents the edges of the graph.
  • An example of a simple CSR model is shown in Figure 3.
  • the initial word vector and the modified word vector of word w_i are denoted by a hatted vector and v_i, respectively, and the solid edges are a subset of E.
  • the purpose of the model is to keep each modified word vector close to its corresponding initial word vector while making the similarity between word vectors that share semantic relations stronger.
  • the correlation formula that defines all words in the vocabulary is expressed as:
  • N is the number of words in the vocabulary
  • the hatted vector represents the initial word vector of word w_i
  • v_i represents the modified word vector of word w_i
  • v_j represents the modified word vector of a word w_j adjacent to w_i
  • CosSim(v_i, v_j) represents the cosine similarity of the modified word vectors v_i and v_j.
  • the gradient ascent method is used to find an approximate optimal solution of the correlation formula.
  • the iterative steps are as follows:
  • the learning rate is set to 0.005
  • the modified word vector is obtained by iteration and used as the final word vector after joint processing.
  • an event is a special requestor of a service. Although the event information can indicate the status change of related objects, it cannot be directly expressed as a service request.
  • the present invention defines two types of services related to events: the Event Recognition Service (ERS) and the Event Handling Service (EHS).
  • ERS Event Recognition Service
  • EHS Event Handling Service
  • the event is used as the Output attribute of the ERS and the Input attribute of the EHS, respectively, and an event-driven Semantic IoT service matching model is proposed.
  • OWL-S is used to describe the services. Following the notation of description logic, the event recognition service and the event handling service are defined as follows:
  • E_r and E_h respectively represent the output of the ERS and the input of the EHS
  • τ represents the threshold
  • Sim(E_r, E_h) represents the matching degree between the ERS and the EHS.
  • the service matching degree Sim (E r , E h ) is expressed as:
  • attr(E_r) represents the attribute set of E_r (including time, location, object, etc.), and W_a represents the weight of attribute a; the per-attribute term represents the similarity between E_r and E_h on attribute a.
  • the similarity between attribute a of event E_r and attribute i of event E_h can be obtained by calculating the cosine similarity of the word vectors corresponding to the attributes.
  • in the word vector training process, the present invention fully considers the influence of word frequency on the training result, using the CBOW model and the SGM model to obtain the word vectors of high-frequency and low-frequency words respectively, and then optimizing the word vectors through the CSR model.
  • this improves the quality of the word vectors.
  • the present invention defines an event recognition service and an event handling service, establishes an event-driven service matching model, and calculates the service matching degree through the word vectors, solving the problem of automatic service matching and improving the efficiency and accuracy of service matching. The event-driven matching model achieves automatic matching of services.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a word vector-based event-driven service matching method, comprising: (1) implementation of a hybrid word vector training algorithm and (2) implementation of an event-driven service matching model. The hybrid word vector training algorithm considers the influence of word frequency on word vector training and uses the adjacency relations between words in a corpus and the semantic relations between words in a dictionary to train word vectors through three stages: high-frequency word processing, low-frequency word processing, and joint processing. The event-driven service matching model defines two kinds of event-related services, an event recognition service and an event handling service, and uses the word vectors to compute the matching degree between the two services; matching succeeds when the matching degree exceeds a given threshold. The present invention improves the quality of the word vectors and further improves the accuracy and efficiency of service matching.

Description

Word vector-based event-driven service matching method

Technical Field

The present invention belongs to the field of event-driven service discovery in the Semantic Internet of Things, and specifically relates to a word vector-based event-driven service matching method.

Background Art

In an IoT environment, events reflect changes in the state of observed objects. To respond to events quickly through services, the key is to match, based on the event, the services that can respond to it. A service in the Semantic Internet of Things is the product of semantically describing an IoT service with Semantic Web technology. Unlike traditional service discovery, the requester of a service is not an explicitly stated service requirement but an event occurring in the IoT environment. At present, the association between events and services is mainly constructed through manual selection, predefined rules and similar means in order to achieve service matching. However, these approaches rely too heavily on prior knowledge; as the types and number of events and services grow, the accuracy and efficiency of service matching face enormous challenges. Automatic, event-driven service matching through semantic technology has therefore become an urgent problem to solve.

In semantics-based service matching, the similarity between a service and a request can serve as an important basis for matching. When computing semantic similarity, a structured knowledge base or an unstructured corpus is usually used. Corpus-based methods can learn word vectors from large corpora and perform service matching by computing the similarity of the word vectors; such methods guarantee sufficient vocabulary coverage, and the training cost of the word vectors is relatively low. Among current models for training word vectors, the continuous bag-of-words (CBOW) model proposed by Mikolov et al. models the training process as a neural network: following the N-gram model, it takes the context of a word in the corpus (the n neighboring words before and after it) as the input of the network, trains the word vectors by maximizing the log-likelihood of the word, and finally projects the implicit semantics of words into a low-dimensional, continuous vector space. To further improve the quality of word vectors, some researchers have proposed integrating knowledge bases into word vector training so that the trained vectors carry more semantic information. Lu et al. proposed the Multiple Semantic Fusion (MSF) model, which fuses semantic information into word vectors through different vector operations and then uses the resulting vectors to compute the similarity between services and requests as the main basis for service matching. Faruqui et al. proposed a Retrofitting model, which retrains existing word vectors using the semantic relations between words in a dictionary, thereby injecting semantic information into the vectors. However, most current word vector training methods do not consider the influence of word frequency on the training result and treat all words identically. Wang et al. accordingly pointed out that, compared with high-frequency words, low-frequency words may be trained poorly because they have less context information.

Summary of the Invention

To improve the efficiency and accuracy of event-driven service matching, the present invention proposes a word vector-based event-driven service matching method that treats high-frequency and low-frequency words differently and proposes a hybrid word vector training algorithm: in the high-frequency word processing stage, the continuous bag-of-words (CBOW) model is used to train high-frequency word vectors; in the low-frequency word processing stage, a Semantic Generation Model (SGM) is used to construct low-frequency word vectors; and in the joint processing stage, a Cosine Similarity Retrofitting (CSR) model is used to jointly optimize the high-frequency and low-frequency word vectors, thereby obtaining high-quality word vectors. The invention further defines an event recognition service and an event handling service, establishes an event-driven service matching model, and computes the service matching degree through the word vectors, solving the problem of automatic service matching and improving the efficiency and accuracy of service matching.

The present invention is realized through the following technical solutions.

A word vector-based event-driven service matching method comprises two parts: obtaining high-quality word vectors with a hybrid word vector training algorithm, and performing event-driven service matching with an event-driven service matching model;

obtaining high-quality word vectors with the hybrid word vector training algorithm includes: dividing words into two classes, high-frequency words and low-frequency words, and using the adjacency relations between words in the corpus and the semantic relations between words in the dictionary to train word vectors through three stages: high-frequency word processing, low-frequency word processing, and joint processing;

the event-driven service matching model defines two kinds of event-related services, an event recognition service and an event handling service, and uses the word vectors to compute the matching degree between services; when the matching degree is higher than a given threshold, the services are matched successfully.

Further, in the high-frequency word processing stage, high-frequency word vectors are trained with the continuous bag-of-words (CBOW) model according to the adjacency relations between words in the corpus.

Further, in the low-frequency word processing stage, low-frequency word vectors are constructed with the Semantic Generation Model (SGM) according to the semantic relations between words in the dictionary and the high-frequency word vectors already obtained.

Further, in the joint processing stage, the Cosine Similarity Retrofitting (CSR) model is used to jointly optimize the high-frequency and low-frequency word vectors.
Further, in the event-driven service matching model, an event (Event) is used as the output of the Event Recognition Service (ERS) and as the input of the Event Handling Service (EHS), respectively, which is expressed in description logic (a formalism representing concepts and the relations between concepts) as
Figure PCTCN2018113227-appb-000001
(the hasOutput relation) and
Figure PCTCN2018113227-appb-000002
(the hasInput relation). Here, Event is the concept representing an event, ERS is the concept representing the event recognition service, EHS is the concept representing the event handling service, hasOutput denotes the output relation, and hasInput denotes the input relation. The service matching model is given as follows:
Figure PCTCN2018113227-appb-000003
Here, E_r and E_h are both events, representing the output of the event recognition service and the input of the event handling service, respectively; τ denotes the threshold; and Sim(E_r, E_h) denotes the matching degree between the event recognition service and the event handling service.
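As an editorial aid (not part of the original disclosure), the description-logic axioms and the matching rule referenced by the formula images PCTCN2018113227-appb-000001 to -000003 above plausibly take the following form; this is a hedged reconstruction from the surrounding definitions, not a reproduction of the original images:

```latex
% Assumed reconstruction: an ERS outputs events, an EHS consumes events.
\mathrm{ERS} \sqsubseteq \exists \mathrm{hasOutput}.\mathrm{Event}
\qquad
\mathrm{EHS} \sqsubseteq \exists \mathrm{hasInput}.\mathrm{Event}

% Matching rule: an ERS/EHS pair is matched when the matching degree of the
% ERS output event E_r and the EHS input event E_h reaches the threshold tau.
\mathrm{match}(\mathrm{ERS}, \mathrm{EHS}) \iff \mathrm{Sim}(E_r, E_h) \geq \tau
```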
Further, the service matching degree Sim(E_r, E_h) is expressed as:
Figure PCTCN2018113227-appb-000004
where a denotes an attribute of the event, attr(E_r) denotes the attribute set of E_r, and W_a denotes the weight of attribute a, specifically:
Figure PCTCN2018113227-appb-000005
and the quantity shown in formula image PCTCN2018113227-appb-000006 denotes the similarity between E_r and E_h on attribute a, specifically:
Figure PCTCN2018113227-appb-000007
where the quantity shown in formula image PCTCN2018113227-appb-000008 denotes the similarity between attribute a of event E_r and attribute i of event E_h, obtained by computing the cosine similarity of the word vectors corresponding to the attributes, specifically:
Figure PCTCN2018113227-appb-000009
where x and y respectively denote the word vectors corresponding to the two attributes (formula images PCTCN2018113227-appb-000010 and PCTCN2018113227-appb-000011), and ||x|| and ||y|| respectively denote the norms of x and y.
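Likewise, as an editorial aid, the omitted formula images PCTCN2018113227-appb-000004 to -000009 are consistent with a weighted-sum structure over the attributes of E_r together with the standard cosine similarity; the exact forms of W_a and of the per-attribute similarity exist only in the omitted images, so the following is a hedged sketch rather than the official formulas:

```latex
% Assumed overall structure of the matching degree.
\mathrm{Sim}(E_r, E_h) \;=\; \sum_{a \in attr(E_r)} W_a \cdot \mathrm{Sim}_a(E_r, E_h)

% Cosine similarity of the word vectors x and y of two attributes; this part
% follows directly from the stated use of ||x|| and ||y||.
\mathrm{CosSim}(x, y) \;=\; \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}
```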
Compared with the prior art, the present invention has the following advantages and technical effects:

In the word vector training process, the present invention fully considers the influence of word frequency on the training result: the CBOW model and the SGM model are used to obtain the word vectors of high-frequency and low-frequency words, respectively, and the word vectors are then optimized through the CSR model. With the obtained word vectors, an event-driven matching model is established to achieve automatic matching of services. The present invention improves the quality of the word vectors and further improves the efficiency and accuracy of service matching.

Brief Description of the Drawings

Figure 1 is a diagram of the word vector-based event-driven service matching architecture;

Figure 2 is a diagram of the hybrid word vector training algorithm;

Figure 3 is a schematic diagram of the CSR model.

Detailed Description of the Embodiments

To make the technical solutions and advantages of the present invention clearer, a further detailed description is given below with reference to the accompanying drawings. The implementation and protection of the present invention are not limited thereto; it should be noted that any process not described in particular detail below can be implemented by those skilled in the art with reference to the prior art.

1. Event-driven service matching architecture

The event-driven service matching architecture proposed in this embodiment, as shown in Figure 1, comprises two parts: hybrid word vector training and service matching. First, considering the influence of word frequency, high-quality word vectors are trained from the corpus and the dictionary through the hybrid word vector training algorithm. Then, using the obtained word vectors and the event-driven service matching model, automatic matching of services is completed.

2. Hybrid word vector training algorithm

The hybrid word vector training algorithm is shown in Figure 2 and comprises three stages: high-frequency word processing, low-frequency word processing, and joint processing. In the high-frequency word processing stage, CBOW is used to train high-frequency word vectors; in the low-frequency word processing stage, the SGM model is used to construct low-frequency word vectors; in the joint processing stage, the CSR model is used to jointly optimize the high-frequency and low-frequency word vectors to obtain the final word vectors.

2.1 High-frequency word processing

In the high-frequency word processing stage, the adjacency relations between words are obtained from the corpus and used for training with the CBOW model. Its core idea is to use the joint probability of a group of words to judge how likely the group is to conform to the regularities of natural language. The goal of training is to maximize the probability of occurrence of all words in the corpus. For a word w_t in the vocabulary, the objective function is the log-likelihood function expressed as follows:
Figure PCTCN2018113227-appb-000012
where w_t is the target word, T is the total number of words in the corpus, and
Figure PCTCN2018113227-appb-000013
denotes the context of word w_t; c denotes the window size (i.e., the c words before and after w_t serve as the context), and c = 5 already represents the context information fairly fully; the conditional probability
Figure PCTCN2018113227-appb-000014
is expressed by the formula:
Figure PCTCN2018113227-appb-000015
where
Figure PCTCN2018113227-appb-000016
and e(w) respectively denote the input and output word vectors of word w in the CBOW model, and N denotes the total size of the vocabulary. The specific training steps are as follows:
1) For each high-frequency word in the corpus, initialize its word vector; the dimensionality of the word vectors is set to D = 400, which satisfies the representation requirements while keeping the amount of computation moderate;
2) Extract the context of any high-frequency word from the corpus as input and maximize the log-likelihood function through the back-propagation algorithm, thereby correcting the word vectors;
3) Repeat step 2) until all high-frequency words in the corpus have been trained, yielding the word vectors of the high-frequency words.
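As a minimal illustrative sketch of this stage (not part of the patent), the CBOW training described above can be approximated with the gensim library; the toy corpus, the frequency threshold, and the use of gensim itself are assumptions introduced here for illustration only:

```python
from collections import Counter
from gensim.models import Word2Vec  # assumed tooling; the patent does not name a library

# Hypothetical tokenized corpus: a list of token lists.
sentences = [["fire", "detected", "warehouse", "sensor", "event"],
             ["sprinkler", "service", "handles", "fire", "event"]] * 40

# Split the vocabulary into high- and low-frequency words by a frequency threshold
# (the patent does not fix a concrete threshold; 2 is a placeholder value).
FREQ_THRESHOLD = 2
counts = Counter(w for s in sentences for w in s)
high_freq = {w for w, c in counts.items() if c >= FREQ_THRESHOLD}
low_freq = {w for w, c in counts.items() if c < FREQ_THRESHOLD}

# CBOW (sg=0) with the parameters stated in the description:
# vector dimension D = 400 and context window c = 5.
model = Word2Vec(sentences=sentences, vector_size=400, window=5, sg=0,
                 min_count=FREQ_THRESHOLD)

# Word vectors of the high-frequency words after this stage.
high_freq_vectors = {w: model.wv[w] for w in high_freq if w in model.wv}
```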
2.2 Low-frequency word processing stage
In the low-frequency word processing stage, the semantic relations between <high, low>-frequency word pairs in the dictionary, together with the word vectors obtained in the high-frequency word training stage, are used by the proposed Semantic Generation Model (SGM) to construct the word vectors of the low-frequency words. The SGM is shown below:
Figure PCTCN2018113227-appb-000017
where n denotes the number of categories of semantic relations and ω_k denotes the weight of each semantic relation; when four kinds of relations are considered, ω_k is set to 0.25, indicating that the relations are equally important, and
Figure PCTCN2018113227-appb-000018
denotes the set of all high-frequency words that have the semantic relation R_k with the low-frequency word; e(w_i) denotes the word vector of word w_i, which comes from the word vectors obtained in the high-frequency word processing stage. The specific processing steps are as follows:
1) For each low-frequency word w and each semantic relation R_k, extract from the dictionary the high-frequency words that have the relation R_k with w to form the set
Figure PCTCN2018113227-appb-000019
2) Construct the word vector e(w) of w with the SGM model.
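The exact SGM formula is contained in the omitted image PCTCN2018113227-appb-000017; a plausible reading of the surrounding text is a weighted combination of the mean vectors of the related high-frequency words per relation, sketched below under that assumption (the function and variable names are illustrative, not from the patent):

```python
import numpy as np

OMEGA = 0.25  # weight of each of the four semantic relations, as stated above

def sgm_vector(relations, high_freq_vectors, dim=400):
    """Construct the vector of one low-frequency word from related high-frequency words.

    relations: dict mapping a relation name R_k (e.g. synonym, hypernym) to the set
               of high-frequency words standing in R_k to the low-frequency word.
    high_freq_vectors: dict word -> np.ndarray from the high-frequency (CBOW) stage.
    Assumption: the omitted formula sums, over relations, OMEGA times the mean vector
    of the related high-frequency words; the patent image may differ in detail.
    """
    vec = np.zeros(dim)
    for related_words in relations.values():
        vecs = [high_freq_vectors[w] for w in related_words if w in high_freq_vectors]
        if vecs:
            vec += OMEGA * np.mean(vecs, axis=0)
    return vec
```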
2.3 Joint processing stage
After the initial high- and low-frequency word vectors have been obtained, only the semantic relations between <high, low>-frequency word pairs in the knowledge base have been used. To make full use of the knowledge base to correct the initial vectors, the word vectors of high-frequency and low-frequency words are processed jointly so that the two remaining kinds of semantic relation information, <high, high> and <low, low>, are also incorporated into the word vectors. The present invention proposes the Cosine Similarity Retrofitting (CSR) model to optimize the word vectors. Its core idea is to map the relations between words onto a graph: the set W = {w_1, w_2, ..., w_N} represents the words in the vocabulary, the word vectors corresponding to the words represent the vertices V, and the set of semantic relations between the words
Figure PCTCN2018113227-appb-000020
represents the edges of the graph. A simple example of the CSR model is shown in Figure 3, where
Figure PCTCN2018113227-appb-000021
and v_i respectively denote the initial word vector and the modified word vector of word w_i, and the solid edges form a subset of E.
The purpose of the model is to keep each modified word vector close to its corresponding initial word vector while making the similarity between word vectors that share a semantic relation stronger. Here, cosine similarity is used to evaluate the strength of association between words: the larger the similarity, the closer the association. The correlation formula defined over all words in the vocabulary is expressed as:
Figure PCTCN2018113227-appb-000022
where N denotes the number of words in the vocabulary; the vector shown in formula image PCTCN2018113227-appb-000023 denotes the initial word vector of word w_i; v_i denotes the modified word vector of word w_i; v_j denotes the modified word vector of a word w_j adjacent to w_i; α and β denote the weights of the two kinds of association and are set to α = β = 0.5, indicating that the two relations are equally important;
Figure PCTCN2018113227-appb-000024
denotes the cosine similarity between the modified word vector v_i and the corresponding initial word vector (formula image PCTCN2018113227-appb-000025), and CosSim(v_i, v_j) denotes the cosine similarity between the modified word vectors v_i and v_j.
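Editorially, the omitted correlation formula (image PCTCN2018113227-appb-000022) is consistent with the following form, where v̂_i is the initial vector of w_i and the inner sum runs over the graph neighbours of w_i; this is a hedged reconstruction, not the original image:

```latex
% Assumed reconstruction of the CSR correlation (objective) function.
\Psi(V) \;=\; \sum_{i=1}^{N} \Big[\, \alpha \, \mathrm{CosSim}(v_i, \hat{v}_i)
      \;+\; \beta \sum_{j \,:\, (w_i, w_j) \in E} \mathrm{CosSim}(v_i, v_j) \,\Big],
\qquad \alpha = \beta = 0.5
```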
Then, the gradient ascent method is used to find an approximate optimal solution of the correlation formula. The iterative steps are as follows:
1) Taking the partial derivative of the correlation formula with respect to v_i gives the following formula:
Figure PCTCN2018113227-appb-000026
where |v_i| denotes the norm of the modified word vector v_i, the quantity shown in formula images PCTCN2018113227-appb-000027 and PCTCN2018113227-appb-000028 denotes the norm of the corresponding initial word vector, and |v_j| denotes the norm of the modified word vector v_j.
2) From the partial derivative with respect to v_i, the iterative update formula is obtained as follows:
Figure PCTCN2018113227-appb-000029
where η denotes the learning rate, which may be set to η = 0.005.
3) The number of iterations T is used as the termination condition and is set to T = 10, which achieves good convergence within a short time; the modified word vectors obtained by the iteration are taken as the final word vectors after joint processing.
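A minimal sketch of this joint processing stage is given below; since the exact partial-derivative and update formulas are in the omitted images, the sketch simply performs gradient ascent on the assumed objective above with α = β = 0.5, η = 0.005 and T = 10, and should be read as an illustration rather than the patent's exact update rule:

```python
import numpy as np

ALPHA, BETA = 0.5, 0.5   # weights of the two kinds of association
ETA, T = 0.005, 10       # learning rate and iteration count from the description

def _grad_cos(v, u):
    """Gradient of CosSim(v, u) with respect to v."""
    nv, nu = np.linalg.norm(v), np.linalg.norm(u)
    return u / (nv * nu) - (v @ u) * v / (nv ** 3 * nu)

def csr_retrofit(initial_vectors, edges):
    """Jointly optimize word vectors by gradient ascent on the assumed CSR objective
    sum_i [ALPHA * CosSim(v_i, v_hat_i) + BETA * sum_j CosSim(v_i, v_j)].

    initial_vectors: dict word -> np.ndarray (the CBOW/SGM vectors, i.e. v_hat_i)
    edges: dict word -> set of semantically related words (the graph edges E)
    """
    v = {w: vec.copy() for w, vec in initial_vectors.items()}  # modified vectors v_i
    for _ in range(T):
        for w, vec in v.items():
            grad = ALPHA * _grad_cos(vec, initial_vectors[w])
            for neighbour in edges.get(w, ()):
                if neighbour in v:
                    grad += BETA * _grad_cos(vec, v[neighbour])
            v[w] = vec + ETA * grad  # one in-place gradient ascent sweep
    return v
```

The output of csr_retrofit, applied to the union of the high-frequency (CBOW) and low-frequency (SGM) vectors, plays the role of the final word vectors used by the service matching model in Section 3.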
3. Event-driven service matching model
In event-driven service provisioning, an event is a special kind of service requester. Although the information carried by an event can indicate a state change of the related object, it cannot be directly expressed as a service request. For this purpose, two kinds of event-related services are defined: the Event Recognition Service (ERS) and the Event Handling Service (EHS). The event is used as the Output attribute of the ERS and as the Input attribute of the EHS, respectively, and an event-driven Semantic IoT service matching model is proposed. For service description, OWL-S is used; following the notation of description logic, the event recognition service and the event handling service are defined as follows:
Figure PCTCN2018113227-appb-000030
Figure PCTCN2018113227-appb-000031
Then, the event-driven service matching model is as follows:
Figure PCTCN2018113227-appb-000032
Here, E_r and E_h respectively denote the output of the ERS and the input of the EHS, τ denotes the threshold, and Sim(E_r, E_h) denotes the matching degree between the ERS and the EHS; matching succeeds when the matching degree exceeds the threshold.
The service matching degree Sim(E_r, E_h) is expressed as:
Figure PCTCN2018113227-appb-000033
where attr(E_r) denotes the attribute set of E_r (including time, location, object, etc.) and W_a denotes the weight of attribute a, specifically:
Figure PCTCN2018113227-appb-000034
and the quantity shown in formula image PCTCN2018113227-appb-000035 denotes the similarity between E_r and E_h on attribute a, specifically:
Figure PCTCN2018113227-appb-000036
where the quantity shown in formula image PCTCN2018113227-appb-000037 denotes the similarity between attribute a of event E_r and attribute i of event E_h, which can be obtained by computing the cosine similarity of the word vectors corresponding to the attributes, specifically:
Figure PCTCN2018113227-appb-000038
where x and y respectively denote the word vectors corresponding to the two attributes (formula images PCTCN2018113227-appb-000039 and PCTCN2018113227-appb-000040).
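As an illustrative sketch of how the matching degree can be computed from the word vectors (not part of the original disclosure), the following assumes equal attribute weights W_a = 1/|attr(E_r)|, a max-over-attributes per-attribute similarity, and a placeholder threshold τ; these choices are consistent with, but not guaranteed by, the omitted formula images:

```python
import numpy as np

TAU = 0.8  # matching threshold; the concrete value is an assumption

def cos_sim(x, y):
    """Cosine similarity of two attribute word vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def matching_degree(er_attrs, eh_attrs, vectors):
    """Sim(E_r, E_h): each attribute of E_r, weighted by W_a = 1/|attr(E_r)|,
    is scored against its best-matching attribute of E_h."""
    weight = 1.0 / len(er_attrs)
    return sum(weight * max(cos_sim(vectors[a], vectors[i]) for i in eh_attrs)
               for a in er_attrs)

def match(er_attrs, eh_attrs, vectors, tau=TAU):
    """The ERS and the EHS are matched when Sim(E_r, E_h) reaches the threshold."""
    return matching_degree(er_attrs, eh_attrs, vectors) >= tau
```

For example, er_attrs could be the attribute words of the output event of a fire-detection ERS and eh_attrs the attribute words of the input event of a sprinkler-control EHS, with vectors taken from the jointly optimized word vectors of Section 2; this scenario is purely hypothetical.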
In the word vector training process, the present invention fully considers the influence of word frequency on the training result: the CBOW model and the SGM model are used to obtain the word vectors of high-frequency and low-frequency words, respectively, and the word vectors are then optimized through the CSR model, which improves their quality. The present invention defines an event recognition service and an event handling service, establishes an event-driven service matching model, and computes the service matching degree through the word vectors, solving the problem of automatic service matching and improving the efficiency and accuracy of service matching; the established event-driven matching model achieves automatic matching of services.

Claims (6)

  1. A word vector-based event-driven service matching method, characterized by comprising two parts: obtaining high-quality word vectors with a hybrid word vector training algorithm, and performing event-driven service matching with an event-driven service matching model;
    wherein obtaining high-quality word vectors with the hybrid word vector training algorithm comprises: dividing words into two classes, high-frequency words and low-frequency words, and using the adjacency relations between words in a corpus and the semantic relations between words in a dictionary to train word vectors through three stages: high-frequency word processing, low-frequency word processing, and joint processing;
    and wherein the event-driven service matching model defines two kinds of event-related services, an event recognition service and an event handling service, and uses the word vectors to calculate the matching degree between services; when the matching degree is higher than a given threshold, the services are matched successfully.
  2. The word vector-based event-driven service matching method according to claim 1, characterized in that, in the high-frequency word processing stage, high-frequency word vectors are trained with the continuous bag-of-words (CBOW) model according to the adjacency relations between words in the corpus.
  3. The word vector-based event-driven service matching method according to claim 1, characterized in that, in the low-frequency word processing stage, low-frequency word vectors are constructed with the Semantic Generation Model (SGM) according to the semantic relations between words in the dictionary and the high-frequency word vectors already obtained.
  4. The word vector-based event-driven service matching method according to claim 1, characterized in that, in the joint processing stage, the Cosine Similarity Retrofitting (CSR) model is used to jointly optimize the high-frequency word vectors and the low-frequency word vectors.
  5. The word vector-based event-driven service matching method according to claim 1, characterized in that, in the event-driven service matching model, an event (Event) is used as the output of the Event Recognition Service (ERS) and as the input of the Event Handling Service (EHS), respectively, which is expressed in description logic as
    Figure PCTCN2018113227-appb-100001
    Figure PCTCN2018113227-appb-100002
    wherein Event is the concept representing an event, ERS is the concept representing the event recognition service, EHS is the concept representing the event handling service, hasOutput denotes the output relation, and hasInput denotes the input relation; the service matching model is given as follows:
    Figure PCTCN2018113227-appb-100003
    wherein E_r and E_h are both events, respectively representing the output of the event recognition service and the input of the event handling service, τ denotes the threshold, and Sim(E_r, E_h) denotes the matching degree between the event recognition service and the event handling service.
  6. The word vector-based event-driven service matching method according to claim 5, characterized in that the service matching degree Sim(E_r, E_h) is expressed as:
    Figure PCTCN2018113227-appb-100004
    wherein a denotes an attribute of the event, attr(E_r) denotes the attribute set of E_r, and W_a denotes the weight of attribute a, specifically:
    Figure PCTCN2018113227-appb-100005
    and the quantity shown in formula image PCTCN2018113227-appb-100006 denotes the similarity between E_r and E_h on attribute a, specifically:
    Figure PCTCN2018113227-appb-100007
    wherein the quantity shown in formula image PCTCN2018113227-appb-100008 denotes the similarity between attribute a of event E_r and attribute i of event E_h, obtained by calculating the cosine similarity of the word vectors corresponding to the attributes, specifically:
    Figure PCTCN2018113227-appb-100009
    wherein x and y respectively denote the word vectors corresponding to the two attributes (formula images PCTCN2018113227-appb-100010 and PCTCN2018113227-appb-100011), and ||x|| and ||y|| respectively denote the norms of x and y.
PCT/CN2018/113227 2018-08-31 2018-10-31 Word vector-based event-driven service matching method WO2020042332A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/266,979 US20210312133A1 (en) 2018-08-31 2018-10-31 Word vector-based event-driven service matching method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811014545.2A CN109271497B (zh) 2018-08-31 2018-08-31 Word vector-based event-driven service matching method
CN201811014545.2 2018-08-31

Publications (1)

Publication Number Publication Date
WO2020042332A1 true WO2020042332A1 (zh) 2020-03-05

Family

ID=65154993

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/113227 WO2020042332A1 (zh) 2018-08-31 2018-10-31 Word vector-based event-driven service matching method

Country Status (3)

Country Link
US (1) US20210312133A1 (zh)
CN (1) CN109271497B (zh)
WO (1) WO2020042332A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110377914B (zh) * 2019-07-25 2023-01-06 腾讯科技(深圳)有限公司 字符识别方法、装置及存储介质
CN110941698B (zh) * 2019-11-18 2022-09-27 陕西师范大学 一种基于bert下卷积神经网络的服务发现方法
US11275776B2 (en) 2020-06-11 2022-03-15 Capital One Services, Llc Section-linked document classifiers
US11941565B2 (en) 2020-06-11 2024-03-26 Capital One Services, Llc Citation and policy based document classification
CN111966797B (zh) * 2020-07-23 2023-04-07 天津大学 利用引入了语义信息的词向量进行机器阅读理解的方法
CN113095084B (zh) * 2021-03-16 2022-09-23 重庆邮电大学 一种物联网中语义服务匹配方法、装置及存储介质
CN115880120B (zh) * 2023-02-24 2023-05-16 江西微博科技有限公司 一种在线政务服务系统及服务方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491285A (zh) * 2016-06-11 2017-12-19 苹果公司 智能设备仲裁和控制
CN107562772A (zh) * 2017-07-03 2018-01-09 南京柯基数据科技有限公司 事件抽取方法、装置、系统和存储介质
US20180068371A1 (en) * 2016-09-08 2018-03-08 Adobe Systems Incorporated Learning Vector-Space Representations of Items for Recommendations using Word Embedding Models
CN108369574A (zh) * 2015-09-30 2018-08-03 苹果公司 智能设备识别

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7343280B2 (en) * 2003-07-01 2008-03-11 Microsoft Corporation Processing noisy data and determining word similarity
US20150046152A1 (en) * 2013-08-08 2015-02-12 Quryon, Inc. Determining concept blocks based on context
US20180357531A1 (en) * 2015-11-27 2018-12-13 Devanathan GIRIDHARI Method for Text Classification and Feature Selection Using Class Vectors and the System Thereof
CN108228554A (zh) * 2016-12-09 2018-06-29 富士通株式会社 基于语义表示模型来生成词向量的方法、装置和电子设备
KR20180077690A (ko) * 2016-12-29 2018-07-09 주식회사 엔씨소프트 문서의 내러티브 학습 장치 및 방법, 문서의 내러티브 생성 장치 및 방법
CN107451911A (zh) * 2017-07-19 2017-12-08 唐周屹 一种基于财务流水数据提供实时可视化信息的方法和系统
CN107908716A (zh) * 2017-11-10 2018-04-13 国网山东省电力公司电力科学研究院 基于词向量模型的95598工单文本挖掘方法和装置
CN110019471B (zh) * 2017-12-15 2024-03-08 微软技术许可有限责任公司 从结构化数据生成文本
CN108345585A (zh) * 2018-01-11 2018-07-31 浙江大学 一种基于深度学习的自动问答方法
US11080598B2 (en) * 2018-05-15 2021-08-03 Sap Se Automated question generation using semantics and deep learning
JP7173149B2 (ja) * 2018-08-30 2022-11-16 富士通株式会社 生成方法、生成プログラムおよび情報処理装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108369574A (zh) * 2015-09-30 2018-08-03 苹果公司 智能设备识别
CN107491285A (zh) * 2016-06-11 2017-12-19 苹果公司 智能设备仲裁和控制
US20180068371A1 (en) * 2016-09-08 2018-03-08 Adobe Systems Incorporated Learning Vector-Space Representations of Items for Recommendations using Word Embedding Models
CN107562772A (zh) * 2017-07-03 2018-01-09 南京柯基数据科技有限公司 事件抽取方法、装置、系统和存储介质

Also Published As

Publication number Publication date
CN109271497B (zh) 2021-10-26
CN109271497A (zh) 2019-01-25
US20210312133A1 (en) 2021-10-07

Similar Documents

Publication Publication Date Title
WO2020042332A1 (zh) 一种基于词向量的事件驱动服务匹配方法
CN112084790B (zh) 一种基于预训练卷积神经网络的关系抽取方法及系统
CN110033008B (zh) 一种基于模态变换与文本归纳的图像描述生成方法
CN109359302B (zh) 一种领域化词向量的优化方法及基于其的融合排序方法
CN111241294A (zh) 基于依赖解析和关键词的图卷积网络的关系抽取方法
US20140032207A1 (en) Information Classification Based on Product Recognition
CN113239131B (zh) 基于元学习的少样本知识图谱补全方法
CN111914555B (zh) 基于Transformer结构的自动化关系抽取系统
CN115409124B (zh) 基于微调原型网络的小样本敏感信息识别方法
CN113051368B (zh) 双塔模型训练方法、检索方法、装置及电子设备
CN114332519A (zh) 一种基于外部三元组和抽象关系的图像描述生成方法
CN109214444B (zh) 基于孪生神经网络和gmm的游戏防沉迷判定系统及方法
CN110705272A (zh) 一种面向汽车发动机故障诊断的命名实体识别方法
CN116152554A (zh) 基于知识引导的小样本图像识别系统
CN108052683A (zh) 一种基于余弦度量规则的知识图谱表示学习方法
CN113987203A (zh) 一种基于仿射变换与偏置建模的知识图谱推理方法与系统
CN112597979B (zh) 一种实时更新余弦夹角损失函数参数的人脸识别方法
CN115188440A (zh) 一种相似病历智能匹配方法
CN109189915B (zh) 一种基于深度相关匹配模型的信息检索方法
CN115630304A (zh) 一种文本抽取任务中的事件分割抽取方法及系统
CN115827890A (zh) 一种基于网络社交平台的热点事件知识图谱链接估计方法
CN110298545B (zh) 一种基于神经网络的专利评价方法、系统和介质
CN113963235A (zh) 一种跨类别图像识别模型重用方法和系统
CN117743694B (zh) 基于图节点特征增强的多层迁移学习跨域推荐方法及系统
CN109710843A (zh) 一种在大数量人才简历中提高搜索匹配度的方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18931435

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 15/06/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18931435

Country of ref document: EP

Kind code of ref document: A1