WO2020244023A1 - Information aggregation method, apparatus and device based on knowledge graph - Google Patents

Information aggregation method, apparatus and device based on knowledge graph

Info

Publication number
WO2020244023A1
Authority
WO
WIPO (PCT)
Prior art keywords
knowledge graph
query
web service
information
result
Prior art date
Application number
PCT/CN2019/095563
Other languages
English (en)
French (fr)
Inventor
盛寅
莫海健
毛亿
刘岩
田云钢
Original Assignee
中国电子科技集团公司第二十八研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国电子科技集团公司第二十八研究所
Priority to GB2013426.8A (patent GB2589431A)
Publication of WO2020244023A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/972 Access to data in other repository systems, e.g. legacy data or dynamic Web page generation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Definitions

  • The invention belongs to the technical field of information search, and specifically relates to an information aggregation method, apparatus and device based on a knowledge graph.
  • A knowledge graph describes concepts, entities and their relationships in the objective world in a structured form and expresses Internet information in a form closer to human cognition, providing a better way to organize, manage and understand the massive information on the Internet.
  • Knowledge graphs have revitalized Internet semantic search, have shown great power in intelligent question answering, and have become infrastructure for knowledge-driven intelligent applications on the Internet. Together with big data and deep learning, knowledge graphs have become one of the core driving forces for the development of the Internet and artificial intelligence.
  • The present invention proposes an information aggregation method based on a knowledge graph that obtains information from both the knowledge graph and Web services, and uses distributed data on the network to improve query results when the knowledge graph is incomplete.
  • Another object of the present invention is to provide an information aggregation apparatus and a computer device based on a knowledge graph.
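  • According to a first aspect of the invention, an information aggregation method based on a knowledge graph is provided, which includes the following steps: adding Web service description information to the knowledge graph; performing an information query based on the knowledge graph and obtaining associated Web service information according to the input query sentence; and fusing the knowledge graph query result with the returned Web service query results.
  • Adding Web service description information to the knowledge graph includes: creating a description entity for each Web service in the knowledge graph, whose attributes include the service ID, service name and WSDL address provided by the service publisher; and adding relationships between the Web service entity and other entities to describe the data that the Web service can provide.
  • The information query based on the knowledge graph includes: segmenting the user's query sentence with a word segmentation tool; adding type descriptions to the segmentation result; and constructing a knowledge graph query statement from the annotated result to query the relevant information in the knowledge graph.
  • Obtaining the associated Web service information according to the input query sentence includes: calculating the similarity between the input query sentence and each Web service description from the segmentation result; and returning several Web service query results ranked by similarity.
  • Fusing the knowledge graph query result with the returned Web service query results includes: if only the knowledge graph or only one Web service returns a result, no data fusion is needed; if the knowledge graph and the Web services return inconsistent results, a truth discovery algorithm is used to return the most reliable result.
  • For each returned result, the truth discovery algorithm assigns weights to indicators such as the reliability of each data source and the number of times it has been requested, calculates each data source's vote for the result, and returns the result with the most votes.
  • The method further includes: when the knowledge graph is inconsistent with the query results returned by the Web services, returning the most credible result to the other data sources at the same time, so as to give the data source administrators a reference for modification.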
  • According to a second aspect of the invention, an information aggregation apparatus based on a knowledge graph is provided.
  • The apparatus comprises a knowledge graph construction module, a query module and an information fusion module, wherein the knowledge graph construction module is used to add Web service description information to the knowledge graph; the query module is used to query information based on the knowledge graph and to obtain associated Web service information according to the input query sentence; and the information fusion module is used to fuse the knowledge graph query result with the returned Web service query results.
  • Adding Web service description information to the knowledge graph by the knowledge graph construction module includes: creating a description entity for each Web service in the knowledge graph, whose attributes include the service ID, service name and WSDL address; and adding relationships between the Web service entity and other entities to describe the data that the Web service can provide.
  • The apparatus further includes an update module, which is used to update the knowledge graph when the knowledge graph is inconsistent with the query results returned by the Web services and the information provided by the Web services is the most recent.
  • According to a third aspect of the invention, a computer device is provided, comprising:
  • one or more processors; a memory; and
  • one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by the one or more processors, and when executed by the processors the programs implement the steps described in the first aspect of the invention.
  • The present invention adds Web service description information to the knowledge graph and provides a retrieval scheme for each kind of data source, so that a user query can return both the knowledge graph query result and the Web-service-based query results; the variety of data sources yields richer query results. It also provides a data fusion scheme for the retrieval results of different data sources and a knowledge graph data update scheme, so that information queries can be more accurate.
  • The method has good operability and scalability.
  • Fig. 1 is a flow chart of the information aggregation method based on a knowledge graph according to the present invention;
  • Fig. 2 is a diagram of the knowledge graph entity construction process according to an embodiment of the present invention;
  • Fig. 3 is a schematic diagram of the knowledge graph entity construction result according to an embodiment of the present invention;
  • Fig. 4 is a schematic diagram of the knowledge graph relationship construction result according to an embodiment of the present invention;
  • Fig. 5 is a schematic diagram of the knowledge graph Web service construction result according to an embodiment of the present invention;
  • Fig. 6 is a schematic diagram of the process of information query and aggregation based on a knowledge graph according to an embodiment of the present invention;
  • Fig. 7 is a structural block diagram of the information aggregation apparatus based on a knowledge graph according to an embodiment of the present invention.
  • Referring to Fig. 1, in one embodiment the information aggregation method based on the knowledge graph includes the following steps:
  • Step S10: adding Web service description information to the knowledge graph.
  • The method can add Web service description information to an existing knowledge graph, or first construct a local knowledge graph and then add Web service description information to it.
  • Existing knowledge graphs include representative large-scale network knowledge bases such as DBpedia, Freebase and YAGO, as well as knowledge graphs built by users themselves.
  • The attribute value of each entity can be a simple type such as a number or string, or another entity.
  • The relationship name is generally related to the data type of the other entity. Take the flight plan information in Table 2 as an example: if a flight plan has the departure airport Beijing Capital International Airport and the landing airport Shanghai Hongqiao Airport, the flight plan is linked to the Beijing Capital International Airport entity through DepartFrom and to the Shanghai Hongqiao Airport entity through ArriveAt.
  • The construction statement based on neo4j is: MATCH (n:FlightPlan {ID: "MU564"}), (m:Airport {ICAOID: "ZBAA"}) CREATE (n)-[r:ArriveAt]->(m) RETURN r.
  • The result of the creation is shown in Fig. 4. The other information contained in Fig. 4, such as weather and runways, is not listed here in tabular form.
  • When a Web service publisher adds a Web service to the knowledge graph, what is actually added is the description information of the Web service, not all the information that the Web service can provide. When users need to query related data, they find a suitable Web service and send a request to it.
  • The method of adding Web service description information is: first create a description entity for each Web service in the knowledge graph, whose attributes include the service ID, service name and WSDL address; then add relationships between the Web service entity and other entities to describe the data that the Web service can provide.
  • The service entity is constructed based on neo4j with the statement: CREATE (n:WebService {ID: "华北Metar", name: "华北气象查询服务", wsdl: "http://WebServiceURL/NorthChinaMetar?wsdl"}).
  • Step S20: performing an information query based on the knowledge graph, and obtaining associated Web service information according to the input query sentence.
  • Referring to Fig. 6, the process of information query and aggregation based on the knowledge graph constructed in step S10 is shown.
  • When a user submits a query, a word segmentation tool is first used to segment the user's query sentence.
  • Several word segmentation tools are currently available, such as jieba and HanLP, and one can be selected according to specific business needs.
  • Table 3 shows the correspondence between user query sentence templates and knowledge graph query statements.
  • After the user enters a query sentence, it is compared with the user query sentence templates to calculate the similarity.
  • Current word segmentation software generally supports sentence similarity calculation directly. The most similar sentence template is selected, and the keywords in the corresponding knowledge graph query statement are replaced with the keywords in the user's query sentence. For example, the user's query sentence "What is the landing airport of MU5183?" is most similar to the flight landing airport template.
  • According to the segmentation result, MU5183 matches the flight plan number format.
  • The flight plan number can be recognized with a regular expression: two letters followed by four digits is treated as a flight plan number.
  • After the type description is added, the segmentation result is "What is the landing airport of flight plan MU5183". Replacing FlightPlanNo in the knowledge graph query statement with MU5183 gives the query statement MATCH (n:FlightPlan {ID: "MU5183"})-[r:ArriveAt]->(m:Airport) RETURN m, which returns the result.
  • The Web service discovery process is similar to the template matching method: the Web service most similar to the user's query sentence is found and its service information is returned.
  • When the user searches for "What is the landing airport of MU5183", the most similar service is the flight data query service.
  • The user's sentence is matched against the templates in Table 4 to find the most relevant service query statement for the landing airport of MU5183. According to Table 5, the best match is the flight departure and landing airport query service.
  • The user calls the service according to the WSDL (Web Services Description Language) file that is generated automatically when the service is published.
  • The WSDL file contains elements such as the service's messages and operations, and describes how the service is called.
  • Step S30: fusing the knowledge graph query result with the returned Web service query results.
  • For each returned result, the truth discovery algorithm calculates each data source's vote based on indicators such as the reliability of all data sources (the knowledge graph and each Web service) and the number of times they have been requested, and returns the result with the most votes.
  • Taking a query for the next day's weather at Beijing Capital International Airport as an example, the query result in the knowledge graph is light rain, the airport weather query service result is moderate rain, and the North China meteorological service result is light rain.
  • The returned results fall into two categories, light rain and moderate rain, and each data source votes for the category it returned.
  • According to Table 6, with weights of 100 for reliability and 0.5 for the number of requests, the weighted sum gives the votes of the three data sources as 230, 175 and 128 respectively.
  • Light rain therefore receives 358 votes and moderate rain 175.
  • The credible result is light rain.
  • Step S40: updating the knowledge graph.
  • When the knowledge graph is inconsistent with the query results returned by the Web services, the most credible result from data fusion is returned to each data source (knowledge graph or Web service), giving the data publishers suggestions for modifying their data.
  • Referring to Fig. 7, in another embodiment an information aggregation apparatus based on a knowledge graph is provided, which includes a knowledge graph construction module, a query module, an information fusion module and an update module.
  • The knowledge graph construction module is used to add Web service description information to the knowledge graph;
  • the query module is used to query information based on the knowledge graph and to obtain associated Web service information according to the input query sentence;
  • the information fusion module is used to fuse the knowledge graph query result with the returned Web service query results;
  • the update module is used to update the knowledge graph.
  • The knowledge graph construction module can add Web service description information to an existing knowledge graph.
  • The method of adding Web service description information is: first create a description entity for each Web service in the knowledge graph, whose attributes include the service ID, service name and WSDL address; then add relationships between the Web service entity and other entities to describe the data that the Web service can provide.
  • Alternatively, the knowledge graph construction module can construct a local knowledge graph and then add Web service description information to it.
  • Take the construction of a knowledge graph in the air traffic management field as an example.
  • Air traffic management information contains a large amount of structured data, such as flight plans, airport information, geographic information, airlines and weather information.
  • These structured data can be added to the knowledge graph as entities.
  • The attribute value of each entity can be a simple type such as a number or string, or another entity.
  • For attributes of simple types, the value is used directly as a property of the entity itself when the entity is created.
  • When the attribute value of an entity is another entity, a relationship between the entities needs to be constructed.
  • The relationship name is generally related to the data type of the other entity. For specific creation examples, refer to the description in the method embodiment above, which is not repeated here.
  • The query module uses a word segmentation tool to segment the user's query sentence, adds the necessary data type descriptions to the segmentation result, constructs a knowledge graph query statement from the annotated result, and queries the relevant information in the knowledge graph.
  • There are several ways to construct the query statement; the easiest to implement is the template matching method.
  • After the user enters a query sentence, it is compared with the user query sentence templates to calculate the similarity. Current word segmentation software generally supports sentence similarity calculation directly.
  • The query module selects the most similar sentence template and replaces the keywords in the corresponding knowledge graph query statement with the keywords in the user's query sentence.
  • The Web service discovery process is similar to the template matching method.
  • The query module finds the Web service most similar to the user's query sentence and returns its service information.
  • The user calls the service according to the WSDL address.
  • The information fusion module works as follows: if only the knowledge graph or only one Web service returns a query result, there is a single final result and no fusion is required; if the knowledge graph and the Web services return consistent results, there is no data conflict and no fusion is required either; if the results are inconsistent, the truth discovery algorithm is used to return the most reliable result. For each returned result, the truth discovery algorithm calculates each data source's vote based on indicators such as the reliability of all data sources (the knowledge graph and each Web service) and the number of times they have been requested, and returns the result with the most votes.
  • When the knowledge graph is inconsistent with the query results returned by the Web services, the update module returns the most credible result from data fusion to each data source, giving the data publishers suggestions for modifying their data.
  • Based on the same technical concept as the method embodiment, according to another embodiment of the invention a computer device is provided, which includes: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and when executed by the processors the programs implement each step in the method embodiment.
  • Those skilled in the art should understand that the embodiments of the present invention may be provided as methods, systems, or computer program products. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to its embodiments. Each flow and/or block in the flowcharts and/or block diagrams, and combinations of them, can be implemented by computer program instructions.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, which implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps is executed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

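An information aggregation method, apparatus and device based on a knowledge graph. The method includes: adding Web service description information to the knowledge graph (S10); performing an information query based on the knowledge graph and obtaining associated Web service information according to the input query sentence (S20); and fusing the knowledge graph query result with the returned Web service query results (S30). By adding Web service description information to the knowledge graph and providing a retrieval scheme for each kind of data source, the method can return both the knowledge graph query result and the Web-service-based query results when a user queries, and the variety of data sources yields richer query results. The method also provides a scheme for fusing the retrieval results of different data sources and a scheme for updating the knowledge graph data, which makes information queries more accurate, and it has good operability and scalability.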

Description

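Information aggregation method, apparatus and device based on knowledge graph
Technical Field
The invention belongs to the technical field of information search, and specifically relates to an information aggregation method, apparatus and device based on a knowledge graph.
Background
A knowledge graph describes concepts, entities and their relationships in the objective world in a structured form and expresses Internet information in a form closer to human cognition, providing a better way to organize, manage and understand the massive information on the Internet. Knowledge graphs have revitalized Internet semantic search, have shown great power in intelligent question answering, and have become infrastructure for knowledge-driven intelligent applications on the Internet. Together with big data and deep learning, knowledge graphs have become one of the core driving forces for the development of the Internet and artificial intelligence.
However, current knowledge-graph-based search relies too heavily on the completeness of the knowledge graph: when some information is missing from the knowledge graph or is not updated in time, search quality suffers. In addition, much information is currently stored in a distributed manner on the network and is difficult to store completely in a knowledge graph. How to improve query satisfaction when the knowledge graph is incomplete is therefore of great significance. At the same time, how to make use of distributed data on the network is an urgent problem in the field of knowledge discovery.
Summary of the Invention
Object of the invention: in view of the problems in the prior art, the present invention proposes an information aggregation method based on a knowledge graph that obtains information from both the knowledge graph and Web services, and uses distributed data on the network to improve query results when the knowledge graph is incomplete.
Another object of the present invention is to provide an information aggregation apparatus and a computer device based on a knowledge graph.
Technical solution: according to a first aspect of the invention, an information aggregation method based on a knowledge graph is provided, which includes the following steps:
adding Web service description information to the knowledge graph;
performing an information query based on the knowledge graph, and obtaining associated Web service information according to the input query sentence;
fusing the knowledge graph query result with the returned Web service query results.
Further, adding Web service description information to the knowledge graph includes:
creating a description entity for each Web service in the knowledge graph, whose attributes include the service ID, service name and WSDL address provided by the service publisher;
adding relationships between the Web service entity and other entities to describe the data that the Web service can provide.
Further, the information query based on the knowledge graph includes:
segmenting the user's query sentence with a word segmentation tool;
adding type descriptions to the segmentation result, constructing a knowledge graph query statement from the annotated result, and querying the relevant information in the knowledge graph.
Further, obtaining the associated Web service information according to the input query sentence includes:
calculating the similarity between the input query sentence and each Web service description from the segmentation result;
returning several Web service query results ranked by Web service similarity.
Further, fusing the knowledge graph query result with the returned Web service query results includes:
if only the knowledge graph or only one Web service returns a query result, the final query result needs no data fusion;
if the knowledge graph and the Web services return inconsistent query results, a truth discovery algorithm is used to return the most reliable result.
For each returned result, the truth discovery algorithm assigns weights to indicators such as the reliability of each data source and the number of times it has been requested, calculates each data source's vote for the result, and returns the result with the most votes.
Further, the method also includes: when the knowledge graph is inconsistent with the query results returned by the Web services, returning the most credible result to the other data sources at the same time, so as to give the data source administrators a reference for modification.
According to a second aspect of the invention, an information aggregation apparatus based on a knowledge graph is provided. The apparatus includes a knowledge graph construction module, a query module and an information fusion module, wherein the knowledge graph construction module is used to add Web service description information to the knowledge graph; the query module is used to query information based on the knowledge graph and to obtain associated Web service information according to the input query sentence; and the information fusion module is used to fuse the knowledge graph query result with the returned Web service query results.
Further, adding Web service description information to the knowledge graph by the knowledge graph construction module includes: creating a description entity for each Web service in the knowledge graph, whose attributes include the service ID, service name and WSDL address; and adding relationships between the Web service entity and other entities to describe the data that the Web service can provide.
Further, the apparatus also includes an update module, which is used to update the knowledge graph when the knowledge graph is inconsistent with the query results returned by the Web services and the information provided by the Web services is the most recent.
According to a third aspect of the invention, a computer device is provided, which includes:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by the one or more processors, and when executed by the processors the programs implement the steps described in the first aspect of the invention.
Beneficial effects: the present invention adds Web service description information to the knowledge graph and provides a retrieval scheme for each kind of data source, so that a user query can return both the knowledge graph query result and the Web-service-based query results; the variety of data sources yields richer query results. It also provides a data fusion scheme for the retrieval results of different data sources and a knowledge graph data update scheme, so that information queries can be more accurate. The method has good operability and scalability.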
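Brief Description of the Drawings
Fig. 1 is a flow chart of the information aggregation method based on a knowledge graph according to the present invention;
Fig. 2 is a diagram of the knowledge graph entity construction process according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the knowledge graph entity construction result according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the knowledge graph relationship construction result according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the knowledge graph Web service construction result according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the process of information query and aggregation based on a knowledge graph according to an embodiment of the present invention;
Fig. 7 is a structural block diagram of the information aggregation apparatus based on a knowledge graph according to an embodiment of the present invention.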
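Detailed Description
The technical solution of the invention is further described below with reference to the drawings. It should be understood that the embodiments provided below serve only to disclose the invention thoroughly and completely and to fully convey its technical concept to those skilled in the art; the invention may also be implemented in many different forms and is not limited to the embodiments described here. The terms used in the exemplary embodiments shown in the drawings do not limit the invention.
Referring to Fig. 1, in one embodiment the information aggregation method based on a knowledge graph includes the following steps:
Step S10: adding Web service description information to the knowledge graph.
The method can add Web service description information to an existing knowledge graph, or first construct a local knowledge graph and then add Web service description information to it. Existing knowledge graphs include representative large-scale network knowledge bases such as DBpedia, Freebase and YAGO, and may also be knowledge graphs built by users themselves. Referring to Fig. 2, in one embodiment, taking the construction of a knowledge graph in the air traffic management field as an example, air traffic management information contains a large amount of structured data, such as flight plans, airport information, geographic information, airlines and weather information. These structured data can be added to the knowledge graph as entities. The attribute value of each entity can be a simple type such as a number or string, or another entity.
For attributes of simple types, the value is used directly as a property of the entity itself when the entity is created. Taking the airport information in Table 1 as an example, the Beijing Capital International Airport entity is created in neo4j with: CREATE (n:Airport {ICAOID: "ZBAA", IATAID: "PEK", name: "北京首都国际机场"}). Other entities can be created in a similar way. Because neo4j provides a Java interface, this process can be automated by a program. The created entities are shown in Fig. 3.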
Table 1 Airport information
Airport name                    IATA code   ICAO code
Capital International Airport   PEK         ZBAA
Pudong International Airport    PVG         ZSPD
……
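The entity-creation step can be sketched in Python as follows. This is only an illustration under stated assumptions: the source mentions neo4j's Java interface, whereas this sketch merely builds parameterized Cypher text and does not talk to a database; the helper name `create_entity_statement` and the `AIRPORTS` list are hypothetical, and the rows mirror Table 1 and the CREATE statement in the text.

```python
# Rows taken from Table 1; the property names follow the CREATE statement in the text.
AIRPORTS = [
    {"ICAOID": "ZBAA", "IATAID": "PEK", "name": "北京首都国际机场"},
    {"ICAOID": "ZSPD", "IATAID": "PVG", "name": "浦东国际机场"},
]


def create_entity_statement(label, properties):
    """Return (cypher, params) that creates one node with simple-typed properties.

    Hypothetical helper: values are passed as query parameters ($name style)
    instead of being spliced into the statement text.
    """
    assignments = ", ".join(f"{key}: ${key}" for key in properties)
    return f"CREATE (n:{label} {{{assignments}}})", dict(properties)


# One statement per table row; a graph client would execute these in turn.
statements = [create_entity_statement("Airport", row) for row in AIRPORTS]
```

Each tuple pairs the statement text with its parameter map, which is the shape most graph database clients accept; executing them is left out because the source does not specify a client.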
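When the attribute value of an entity is another entity, a relationship between the entities needs to be constructed. The relationship name is generally related to the data type of the other entity. Take the flight plan information in Table 2 as an example: if a flight plan has the departure airport Beijing Capital International Airport and the landing airport Shanghai Hongqiao Airport, the flight plan is linked to the Beijing Capital International Airport entity through DepartFrom and to the Shanghai Hongqiao Airport entity through ArriveAt. The construction statement based on neo4j is: MATCH (n:FlightPlan {ID: "MU564"}), (m:Airport {ICAOID: "ZBAA"}) CREATE (n)-[r:ArriveAt]->(m) RETURN r. The result of the creation is shown in Fig. 4. The other information contained in Fig. 4, such as weather and runways, is not listed here in tabular form.
Table 2 Flight plan information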
Figure PCTCN2019095563-appb-000001
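A minimal Python sketch of how the two airport relationships of one flight plan might be generated. The flight plan ID `MU0001` and the dictionary layout are hypothetical (Table 2 is only available as an image); ZBAA and ZSSS are the ICAO codes of Beijing Capital and Shanghai Hongqiao, matching the worked example in the text. The sketch only builds Cypher text and parameters.

```python
ALLOWED_RELATIONS = ("DepartFrom", "ArriveAt")


def relation_statement(plan_id, relation, icao):
    """Return (cypher, params) linking a FlightPlan node to an Airport node."""
    if relation not in ALLOWED_RELATIONS:
        raise ValueError(f"unknown relation: {relation}")
    # Relationship types cannot be query parameters in Cypher, so the
    # (validated) relation name is placed into the statement text.
    cypher = (
        "MATCH (n:FlightPlan {ID: $plan_id}), (m:Airport {ICAOID: $icao}) "
        f"CREATE (n)-[r:{relation}]->(m) RETURN r"
    )
    return cypher, {"plan_id": plan_id, "icao": icao}


def flight_plan_relations(plan):
    """Build the DepartFrom and ArriveAt statements for one flight plan."""
    return [
        relation_statement(plan["ID"], "DepartFrom", plan["departure"]),
        relation_statement(plan["ID"], "ArriveAt", plan["arrival"]),
    ]


# Hypothetical flight plan: departs Beijing Capital (ZBAA), lands at Shanghai Hongqiao (ZSSS).
example = flight_plan_relations({"ID": "MU0001", "departure": "ZBAA", "arrival": "ZSSS"})
```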
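When a Web service publisher adds a Web service to the knowledge graph, what is actually added is the description information of the Web service, not all the information that the Web service can provide. When users need to query related data, they find a suitable Web service and send a request to it. The method of adding Web service description information is: first create a description entity for each Web service in the knowledge graph, whose attributes include the service ID, service name and WSDL address; then add relationships between the Web service entity and other entities to describe the data that the Web service can provide. In the embodiment, the service entity is constructed based on neo4j with the statement: CREATE (n:WebService {ID: "华北Metar", name: "华北气象查询服务", wsdl: "http://WebServiceURL/NorthChinaMetar?wsdl"}).
After the Web service entity is created, its relationships with other entities need to be added to support more precise service discovery. The relationship between a Web service and other entities is generally named hasDescription. For the North China meteorological service, the description can be created with MATCH (n:WebService {ID: "华北Metar"}), (m:Metar {ID: "气象信息"}) CREATE (n)-[r:hasDescription]->(m) RETURN r. Because the Web services involved are mainly data services rather than computing services, computing functions do not need to be described. The Web service with its description information added is shown in Fig. 5.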
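The registration of a Web service description can be sketched as a small Python data class that emits the entity statement followed by one hasDescription statement per described entity. The class name `WebServiceDescription` and its field names are hypothetical; the property names (ID, name, wsdl), the hasDescription relationship and the sample values come from the text. No database call is made.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WebServiceDescription:
    service_id: str
    name: str
    wsdl: str
    # (label, ID) pairs of the entities whose data the service can provide.
    describes: Tuple[Tuple[str, str], ...] = ()

    def statements(self):
        """Return the (cypher, params) pairs that register this service."""
        result = [(
            "CREATE (n:WebService {ID: $id, name: $name, wsdl: $wsdl})",
            {"id": self.service_id, "name": self.name, "wsdl": self.wsdl},
        )]
        for label, entity_id in self.describes:
            # Labels cannot be parameters in Cypher, so check them before use.
            if not label.isidentifier():
                raise ValueError(f"invalid label: {label}")
            result.append((
                f"MATCH (n:WebService {{ID: $id}}), (m:{label} {{ID: $entity_id}}) "
                "CREATE (n)-[r:hasDescription]->(m) RETURN r",
                {"id": self.service_id, "entity_id": entity_id},
            ))
        return result


north_china_metar = WebServiceDescription(
    service_id="华北Metar",
    name="华北气象查询服务",
    wsdl="http://WebServiceURL/NorthChinaMetar?wsdl",
    describes=(("Metar", "气象信息"),),
)
registration = north_china_metar.statements()
```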
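Step S20: performing an information query based on the knowledge graph, and obtaining associated Web service information according to the input query sentence.
Referring to Fig. 6, the process of information query and aggregation based on the knowledge graph constructed in step S10 is shown. When a user submits a query, a word segmentation tool is first used to segment the user's query sentence. Several word segmentation tools are currently available, such as jieba and HanLP, and one can be selected according to specific business needs.
After segmentation, the necessary data type descriptions are added to the segmentation result, and a knowledge graph query statement is constructed from the annotated result. There are several ways to construct the query statement; the easiest to implement is the template matching method. Table 3 shows the correspondence between user query sentence templates and knowledge graph query statements. After the user enters a query sentence, it is compared with the user query sentence templates to calculate the similarity. Current word segmentation software generally supports sentence similarity calculation directly. The most similar sentence template is selected, and the keywords in the corresponding knowledge graph query statement are replaced with the keywords in the user's query sentence. For example, the user's query sentence "What is the landing airport of MU5183" is most similar to the flight landing airport template, and according to the segmentation result MU5183 matches the flight plan number format. The flight plan number can be recognized with a regular expression: two letters followed by four digits is treated as a flight plan number. After the type description is added, the segmentation result is "What is the landing airport of flight plan MU5183". Replacing FlightPlanNo in the knowledge graph query statement with MU5183 gives the query statement MATCH (n:FlightPlan {ID: "MU5183"})-[r:ArriveAt]->(m:Airport) RETURN m, which returns the result.
Table 3 Knowledge graph query statements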
Figure PCTCN2019095563-appb-000002
Figure PCTCN2019095563-appb-000003
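The template matching described for step S20 can be sketched in Python as below. Several things are assumptions: Table 3 is only available as an image, so the two templates are hypothetical; the standard library's `difflib.SequenceMatcher` stands in for the sentence similarity that a word segmentation tool such as jieba or HanLP would provide; and the landing airport template uses ArriveAt, following the relationship definitions given for step S10. The regular expression implements the "two letters followed by four digits" rule from the text.

```python
import re
from difflib import SequenceMatcher

# Two letters followed by four digits, not embedded in a longer alphanumeric token.
FLIGHT_PLAN_NO = re.compile(r"(?<![A-Za-z0-9])[A-Z]{2}\d{4}(?!\d)")

# Hypothetical templates: user sentence pattern -> knowledge graph query statement.
TEMPLATES = {
    "FlightPlanNo的降落机场是什么":
        "MATCH (n:FlightPlan {ID: $FlightPlanNo})-[r:ArriveAt]->(m:Airport) RETURN m",
    "FlightPlanNo的起飞机场是什么":
        "MATCH (n:FlightPlan {ID: $FlightPlanNo})-[r:DepartFrom]->(m:Airport) RETURN m",
}


def build_graph_query(sentence):
    """Return (cypher, params) for the best matching template, or None."""
    match = FLIGHT_PLAN_NO.search(sentence)
    if match is None:
        return None
    # Add the type description by replacing the recognized number with its type name.
    normalized = FLIGHT_PLAN_NO.sub("FlightPlanNo", sentence, count=1)
    best = max(
        TEMPLATES,
        key=lambda template: SequenceMatcher(None, normalized, template).ratio(),
    )
    return TEMPLATES[best], {"FlightPlanNo": match.group(0)}


landing_query = build_graph_query("MU5183的降落机场是什么")
```

Passing the flight plan number as a query parameter rather than splicing it into the statement is a design choice of this sketch, not something the source prescribes.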
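The Web service discovery process is similar to the template matching method: the Web service most similar to the user's query sentence is found and its service information is returned. When the user searches for "What is the landing airport of MU5183", the most similar service is the flight data query service, and the user's sentence is matched against the templates in Table 4 to find the most relevant service query statement for the landing airport of MU5183. According to Table 5, the best match is the flight departure and landing airport query service. The user calls the service according to the WSDL (Web Services Description Language) file that is generated automatically when the service is published. The WSDL file contains elements such as the service's messages and operations, and describes how the service is called.
Table 4 Web service query statements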
Figure PCTCN2019095563-appb-000004
Table 5 Web service list
Figure PCTCN2019095563-appb-000005
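Service discovery by similarity ranking can be sketched as follows. Tables 4 and 5 are only available as images, so the service list here is hypothetical: the service names are taken from the running text, and two of the three WSDL addresses are invented placeholders in the style of the one address the text gives. Jaccard similarity over character bigrams is a stand-in for whatever similarity measure the segmentation tool provides.

```python
def bigrams(text):
    """Set of adjacent character pairs in the text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def similarity(query, description):
    """Jaccard similarity of character bigrams (an assumed stand-in measure)."""
    a, b = bigrams(query), bigrams(description)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical registry; only the NorthChinaMetar address appears in the text.
SERVICES = [
    {"name": "航班起飞降落机场查询服务", "wsdl": "http://WebServiceURL/FlightAirport?wsdl"},
    {"name": "机场天气查询服务", "wsdl": "http://WebServiceURL/AirportWeather?wsdl"},
    {"name": "华北气象查询服务", "wsdl": "http://WebServiceURL/NorthChinaMetar?wsdl"},
]


def discover_services(query, services, top_k=2):
    """Return the top_k services ranked by similarity to the query sentence."""
    ranked = sorted(
        services,
        key=lambda service: similarity(query, service["name"]),
        reverse=True,
    )
    return ranked[:top_k]


candidates = discover_services("MU5183的降落机场是什么", SERVICES)
```

For the sample sentence, the flight departure and landing airport query service ranks first because it shares the most bigrams with the query; the caller would then read that service's WSDL address to invoke it.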
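Step S30: fusing the knowledge graph query result with the returned Web service query results.
If only the knowledge graph or only one Web service returns a query result, there is a single final result and no data fusion is needed. If the knowledge graph and all Web services return consistent results, there is no data conflict and no fusion is needed either. If the results are inconsistent, a truth discovery algorithm is used to return the most reliable result. For each returned result, the truth discovery algorithm calculates each data source's vote based on indicators such as the reliability of all data sources (the knowledge graph and each Web service) and the number of times they have been requested, and returns the result with the most votes.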
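Taking a query for the next day's weather at Beijing Capital International Airport as an example, the query result in the knowledge graph is light rain, the airport weather query service result is moderate rain, and the North China meteorological service result is light rain. The returned results fall into two categories, light rain and moderate rain, and each data source votes for the category it returned.
According to Table 6, with weights of 100 for reliability and 0.5 for the number of requests, the weighted sum gives the votes of the three data sources as 230, 175 and 128 respectively. Light rain therefore receives 358 votes and moderate rain 175. The credible result is light rain.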
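Table 6 Reliability and number of requests of the data sources
Data source                          Reliability   Number of requests
Knowledge graph                      80%           300
Airport weather query service        85%           180
North China meteorological service   78%           100
……                                   ……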
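The weighted vote in the worked example can be reproduced with a short Python sketch. The reliability and request figures are those of Table 6 and the weights (100 and 0.5) are those stated in the text; the function names and the dictionary layout are hypothetical, and the sketch covers only this weighted-sum vote, not truth discovery algorithms in general.

```python
from collections import defaultdict

RELIABILITY_WEIGHT = 100
REQUEST_WEIGHT = 0.5

# Figures from Table 6 (reliability expressed as a fraction).
SOURCES = {
    "knowledge graph": {"reliability": 0.80, "requests": 300},
    "airport weather query service": {"reliability": 0.85, "requests": 180},
    "North China meteorological service": {"reliability": 0.78, "requests": 100},
}


def vote(source):
    """Weighted sum of a data source's reliability and request count."""
    stats = SOURCES[source]
    return RELIABILITY_WEIGHT * stats["reliability"] + REQUEST_WEIGHT * stats["requests"]


def fuse(answers):
    """Return (result, tally) for a mapping of data source -> returned result.

    A single or unanimous answer is returned without voting (empty tally);
    otherwise each source adds its vote to the result it returned.
    """
    if not answers:
        return None, {}
    distinct = set(answers.values())
    if len(distinct) == 1:
        return distinct.pop(), {}
    tally = defaultdict(float)
    for source, result in answers.items():
        tally[result] += vote(source)
    return max(tally, key=tally.get), dict(tally)


result, tally = fuse({
    "knowledge graph": "light rain",
    "airport weather query service": "moderate rain",
    "North China meteorological service": "light rain",
})
```

With these inputs the individual votes are 230, 175 and 128, so light rain collects 358 against 175 for moderate rain, matching the figures in the text.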
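Step S40: updating the knowledge graph.
When the knowledge graph is inconsistent with the query results returned by the Web services, the most credible result from data fusion is returned to each data source (knowledge graph or Web service), giving the data publishers suggestions for modifying their data.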
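One way to picture the feedback of step S40 is a function that lists, for every data source that disagreed with the fused result, what it returned and what is suggested instead. This is a hypothetical sketch: the source does not specify the format of a modification suggestion, so the dictionary keys below are assumptions.

```python
def modification_suggestions(answers, trusted_result):
    """List one suggestion per data source whose answer differs from the fused result.

    answers maps data source name -> result it returned; the output format
    (source / current / suggested) is an assumption made for illustration.
    """
    return [
        {"source": source, "current": result, "suggested": trusted_result}
        for source, result in answers.items()
        if result != trusted_result
    ]


suggestions = modification_suggestions(
    {
        "knowledge graph": "light rain",
        "airport weather query service": "moderate rain",
        "North China meteorological service": "light rain",
    },
    "light rain",
)
```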
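Referring to Fig. 7, in another embodiment an information aggregation apparatus based on a knowledge graph is provided, which includes a knowledge graph construction module, a query module, an information fusion module and an update module. The knowledge graph construction module is used to add Web service description information to the knowledge graph; the query module is used to query information based on the knowledge graph and to obtain associated Web service information according to the input query sentence; the information fusion module is used to fuse the knowledge graph query result with the returned Web service query results; and the update module is used to update the knowledge graph.
The knowledge graph construction module can add Web service description information to an existing knowledge graph. The method of adding Web service description information is: first create a description entity for each Web service in the knowledge graph, whose attributes include the service ID, service name and WSDL address; then add relationships between the Web service entity and other entities to describe the data that the Web service can provide.
Alternatively or optionally, the knowledge graph construction module can construct a local knowledge graph and then add Web service description information to it. Taking the construction of a knowledge graph in the air traffic management field as an example, air traffic management information contains a large amount of structured data, such as flight plans, airport information, geographic information, airlines and weather information. These structured data can be added to the knowledge graph as entities. The attribute value of each entity can be a simple type such as a number or string, or another entity. For attributes of simple types, the value is used directly as a property of the entity itself when the entity is created. When the attribute value of an entity is another entity, a relationship between the entities needs to be constructed. The relationship name is generally related to the data type of the other entity. For specific creation examples, refer to the description in the method embodiment above, which is not repeated here.
The query module uses a word segmentation tool to segment the user's query sentence, adds the necessary data type descriptions to the segmentation result, constructs a knowledge graph query statement from the annotated result, and queries the relevant information in the knowledge graph. There are several ways to construct the query statement; the easiest to implement is the template matching method. After the user enters a query sentence, it is compared with the user query sentence templates to calculate the similarity. Current word segmentation software generally supports sentence similarity calculation directly. The query module selects the most similar sentence template and replaces the keywords in the corresponding knowledge graph query statement with the keywords in the user's query sentence. The Web service discovery process is similar to the template matching method: the query module finds the Web service most similar to the user's query sentence and returns its service information. The user calls the service according to the WSDL address.
The information fusion module works as follows: if only the knowledge graph or only one Web service returns a query result, there is a single final result and no fusion is required; if the knowledge graph and the Web services return consistent results, there is no data conflict and no fusion is required either; if the results are inconsistent, the truth discovery algorithm is used to return the most reliable result. For each returned result, the truth discovery algorithm calculates each data source's vote based on indicators such as the reliability of all data sources (the knowledge graph and each Web service) and the number of times they have been requested, and returns the result with the most votes.
When the knowledge graph is inconsistent with the query results returned by the Web services, the update module returns the most credible result from data fusion to each data source, giving the data publishers suggestions for modifying their data.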
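How the query, fusion and update modules might be wired together at query time is sketched below in Python. The class name `AggregationApparatus`, its field names and the callable signatures are all hypothetical; the knowledge graph construction module is left out because it runs before any query is answered. The modules are injected as plain callables so that any implementation can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class AggregationApparatus:
    # Query module: knowledge graph lookup and Web service lookup.
    query_graph: Callable[[str], Optional[str]]
    query_services: Callable[[str], Dict[str, str]]
    # Information fusion module: picks the most credible result.
    fuse: Callable[[Dict[str, str]], str]
    # Update module: tells a data source which result was judged most credible.
    notify: Callable[[str, str], None]

    def answer(self, sentence):
        """Query every data source, fuse the results, notify sources that disagree."""
        answers = dict(self.query_services(sentence))
        graph_result = self.query_graph(sentence)
        if graph_result is not None:
            answers["knowledge graph"] = graph_result
        if not answers:
            return None
        trusted = self.fuse(answers)
        for source, result in answers.items():
            if result != trusted:
                self.notify(source, trusted)
        return trusted
```

A caller would construct the object with its own lookup, fusion and notification functions and then call `answer` with the user's query sentence.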
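Based on the same technical concept as the method embodiment, according to another embodiment of the invention a computer device is provided, which includes: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and when executed by the processors the programs implement each step in the method embodiment.
Those skilled in the art should understand that the embodiments of the present invention may be provided as methods, systems, or computer program products. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to its embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce a means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, which implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps is executed on the computer or other programmable equipment to produce computer-implemented processing, and the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the invention and not to limit it. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the specific implementations of the invention can still be modified or equivalently replaced, and any modification or equivalent replacement that does not depart from the spirit and scope of the invention shall fall within the scope of protection of the claims of the invention.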

Claims (11)

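  1. An information aggregation method based on a knowledge graph, characterized in that the method comprises the following steps:
    adding Web service description information to the knowledge graph;
    performing an information query based on the knowledge graph, and obtaining associated Web service information according to an input query sentence;
    fusing the knowledge graph query result with the returned Web service query results.
  2. The information aggregation method based on a knowledge graph according to claim 1, characterized in that adding Web service description information to the knowledge graph comprises:
    creating a description entity for each Web service in the knowledge graph, the attributes of the entity including the service ID, service name and WSDL address provided by the service publisher;
    adding relationships between the Web service entity and other entities to describe the data that the Web service can provide.
  3. The information aggregation method based on a knowledge graph according to claim 1, characterized in that performing an information query based on the knowledge graph comprises:
    segmenting the user's query sentence with a word segmentation tool;
    adding type descriptions to the segmentation result, constructing a knowledge graph query statement from the annotated result, and querying the relevant information in the knowledge graph.
  4. The information aggregation method based on a knowledge graph according to claim 3, characterized in that obtaining associated Web service information according to the input query sentence comprises:
    calculating the similarity between the input query sentence and each Web service description from the segmentation result;
    returning several Web service query results ranked by Web service similarity.
  5. The information aggregation method based on a knowledge graph according to claim 1, characterized in that fusing the knowledge graph query result with the returned Web service query results comprises:
    if only the knowledge graph or only one Web service returns a query result, performing no data fusion;
    if the knowledge graph and the Web services return inconsistent query results, using a truth discovery algorithm to return the most reliable result.
  6. The information aggregation method based on a knowledge graph according to claim 5, characterized in that, for each returned result, the truth discovery algorithm assigns weights to indicators including the reliability of each data source and the number of times it has been requested, calculates each data source's vote for the result, and returns the result with the most votes.
  7. The information aggregation method based on a knowledge graph according to claim 1, characterized in that the method further comprises: when the knowledge graph is inconsistent with the query results returned by the Web services, returning the most credible result to the other data sources at the same time, so as to give the data source administrators a reference for modification.
  8. An information aggregation apparatus based on a knowledge graph, characterized in that the apparatus comprises a knowledge graph construction module, a query module and an information fusion module, wherein the knowledge graph construction module is used to add Web service description information to the knowledge graph; the query module is used to query information based on the knowledge graph and to obtain associated Web service information according to an input query sentence; and the information fusion module is used to fuse the knowledge graph query result with the returned Web service query results.
  9. The information aggregation apparatus based on a knowledge graph according to claim 8, characterized in that adding Web service description information to the knowledge graph by the knowledge graph construction module comprises: creating a description entity for each Web service in the knowledge graph, the attributes of the entity including the service ID, service name and WSDL address; and adding relationships between the Web service entity and other entities to describe the data that the Web service can provide.
  10. The information aggregation apparatus based on a knowledge graph according to claim 8, characterized in that the apparatus further comprises an update module, which is used, when the knowledge graph is inconsistent with the query results returned by the Web services, to return the most credible result to the other data sources at the same time, so as to give the data source administrators a reference for modification.
  11. A computer device, characterized in that the device comprises:
    one or more processors;
    a memory; and
    one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by the one or more processors, and when executed by the processors the programs implement the steps according to any one of claims 1-7.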
PCT/CN2019/095563 2019-06-06 2019-07-11 基于知识图谱的信息汇聚方法、装置和设备 WO2020244023A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB2013426.8A GB2589431A (en) 2019-06-06 2019-07-11 Information Aggregation Method and Apparatus Based on Knowledge Graph and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910491557.2 2019-06-06
CN201910491557.2A CN110222127B (zh) 2019-06-06 2019-06-06 基于知识图谱的信息汇聚方法、装置和设备

Publications (1)

Publication Number Publication Date
WO2020244023A1 true WO2020244023A1 (zh) 2020-12-10

Family

ID=67815925

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/095563 WO2020244023A1 (zh) 2019-06-06 2019-07-11 基于知识图谱的信息汇聚方法、装置和设备

Country Status (3)

Country Link
CN (1) CN110222127B (zh)
GB (1) GB2589431A (zh)
WO (1) WO2020244023A1 (zh)

Also Published As

Publication number Publication date
GB2589431A (en) 2021-06-02
CN110222127A (zh) 2019-09-10
CN110222127B (zh) 2021-07-09
GB202013426D0 (en) 2020-10-14

Legal Events

Date Code Title Description
ENP Entry into the national phase
  Ref document number: 202013426; Country of ref document: GB; Kind code of ref document: A; Free format text: PCT FILING DATE = 20190711
121 EP: The EPO has been informed by WIPO that EP was designated in this application
  Ref document number: 19931986; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
  Ref country code: DE
122 EP: PCT application non-entry in European phase
  Ref document number: 19931986; Country of ref document: EP; Kind code of ref document: A1