CN114969263A - Construction method, construction device and application of urban traffic knowledge map - Google Patents


Publication number
CN114969263A
Authority
CN
China
Prior art keywords
knowledge
data
traffic
urban traffic
entities
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210617739.1A
Other languages
Chinese (zh)
Inventor
谭墍元
邱倩倩
罗文秀
郭伟伟
薛晴婉
郑国荣
张帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Technology
Original Assignee
North China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Technology
Priority to CN202210617739.1A
Publication of CN114969263A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A30/00 Adapting or protecting infrastructure or their operation
    • Y02A30/60 Planning or developing urban green infrastructure

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for constructing an urban traffic knowledge graph, comprising the following steps: constructing an urban traffic ontology with an ontology construction tool to form the knowledge graph schema layer; acquiring urban traffic data, extracting entities, attributes, and the relations between entities, and constructing the knowledge graph data layer; combining the schema layer and the data layer to generate the urban traffic knowledge graph and storing it in a database; and using a knowledge representation model to infer new knowledge from the urban traffic knowledge graph and adding it to the graph. The invention also discloses a device for constructing, and an application of, the urban traffic knowledge graph. By using a knowledge graph to form a traffic knowledge system, the invention integrates multi-source heterogeneous traffic big data and mines latent relations between traffic entities through a knowledge reasoning model based on representation learning, thereby achieving effective fusion and organization of multi-source travel data in the traffic field and open sharing of traffic data.

Figure 202210617739

Description

A construction method, construction device, and application of an urban traffic knowledge graph

Technical Field

The invention belongs to the technical field of intelligent transportation, and in particular relates to a construction method, construction device, and application of an urban traffic knowledge graph.

Background

Urban traffic is strongly coupled: integrated management and control requires coordinating people, vehicles, roads, and the environment. Massive traffic travel data are correlated in time and space, yet multi-source data are insufficiently fused, so a knowledge graph can be used to fuse travel data. The data organization of a knowledge graph is structured: it can describe entities that exist in the real world, the attributes of those entities, and the relations between pairs of entities. However, the large-scale knowledge bases built so far still suffer from data sparsity despite their size.

A better method for constructing an urban traffic knowledge graph is therefore urgently needed to solve the problem of sparse data in the knowledge graph.

Summary of the Invention

To overcome the drawbacks of the prior art described above, the invention discloses a method for constructing an urban traffic knowledge graph that can accurately infer new knowledge. The specific technical solution is as follows:

A method for constructing an urban traffic knowledge graph, comprising the following steps:

constructing an urban traffic ontology with an ontology construction tool to form the knowledge graph schema layer;

acquiring urban traffic data, extracting entities, attributes, and the relations between entities, and constructing the knowledge graph data layer;

combining the schema layer and the data layer to generate the urban traffic knowledge graph, and storing it in a database;

using a knowledge representation model to infer new knowledge from the urban traffic knowledge graph and adding it to the graph.
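The combination of the schema layer and the data layer can be pictured with a small sketch. It assumes that the schema layer lists the classes and, for each relation, the class of its head and of its tail, and that the data layer holds typed entities and extracted triples. All class, relation, and entity names are hypothetical; the patent does not prescribe this representation.

```python
# Schema layer: classes and relation signatures (head class, tail class).
# All names below are hypothetical examples, not taken from the patent.
SCHEMA_CLASSES = {"Road", "PointOfInterest", "SubwayStation", "SubwayLine"}
SCHEMA_RELATIONS = {
    "located_on": ("PointOfInterest", "Road"),
    "belongs_to": ("SubwayStation", "SubwayLine"),
}

# Data layer: entity -> class, plus extracted (head, relation, tail) triples.
ENTITY_CLASS = {
    "poi_school_1": "PointOfInterest",
    "road_a": "Road",
    "station_x": "SubwayStation",
    "line_1": "SubwayLine",
}
TRIPLES = [
    ("poi_school_1", "located_on", "road_a"),
    ("station_x", "belongs_to", "line_1"),
    ("station_x", "located_on", "line_1"),  # violates the schema on purpose
]


def combine_layers(classes, relations, entity_class, triples):
    """Keep triples that conform to the schema; return (graph, rejected)."""
    graph, rejected = [], []
    for head, rel, tail in triples:
        signature = relations.get(rel)
        head_cls = entity_class.get(head)
        tail_cls = entity_class.get(tail)
        ok = (
            signature is not None
            and head_cls in classes
            and tail_cls in classes
            and (head_cls, tail_cls) == signature
        )
        (graph if ok else rejected).append((head, rel, tail))
    return graph, rejected


graph, rejected = combine_layers(
    SCHEMA_CLASSES, SCHEMA_RELATIONS, ENTITY_CLASS, TRIPLES
)
print(len(graph), "triples accepted;", len(rejected), "rejected")
```

In a real system the accepted triples would then be written to the database mentioned in the third step; the in-memory lists here only stand in for that store.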

Further, the urban traffic ontology is constructed as follows:

the urban traffic ontology is built with the seven-step method, and a quality assessment is performed before instances are created;

the quality assessment specifically comprises:

drawing a tree structure diagram to verify that transitivity holds in the class hierarchy;

checking that the scope and expression of each ontology are consistent wherever they are used, and that no classes or ontologies are redundant;

checking that attribute descriptions are complete, that attribute constraints are logical, and that attributes are shareable;

checking the extensibility of the ontology;

checking the completeness, uniqueness, and logical consistency of the relations between classes.
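The first check, that the class hierarchy forms a proper tree in which subclass membership is transitive, lends itself to automation. The following is a minimal sketch under the assumption that the hierarchy is available as a child-to-parent mapping; the function names and the example classes are hypothetical.

```python
def check_class_hierarchy(parent_of):
    """Validate a hierarchy given as {child: parent}; the root has parent None.

    Returns a list of problems; an empty list means the hierarchy is a tree.
    """
    problems = []
    roots = [c for c, p in parent_of.items() if p is None]
    if len(roots) != 1:
        problems.append(f"expected exactly one root, found {len(roots)}")
    for cls in parent_of:
        seen = {cls}
        parent = parent_of[cls]
        while parent is not None:
            if parent not in parent_of:
                problems.append(f"{cls}: ancestor {parent} is undefined")
                break
            if parent in seen:
                problems.append(f"{cls}: cycle through {parent}")
                break
            seen.add(parent)
            parent = parent_of[parent]
    return problems


def ancestors(parent_of, cls):
    """All superclasses of cls, nearest first (assumes a valid hierarchy)."""
    chain = []
    parent = parent_of[cls]
    while parent is not None:
        chain.append(parent)
        parent = parent_of[parent]
    return chain


# Hypothetical hierarchy for illustration.
hierarchy = {
    "TrafficEntity": None,
    "Road": "TrafficEntity",
    "UrbanRoad": "Road",
    "Expressway": "UrbanRoad",
}
print(check_class_hierarchy(hierarchy))
print(ancestors(hierarchy, "Expressway"))
```

Transitivity shows up in `ancestors`: because Expressway is a subclass of UrbanRoad and UrbanRoad of Road, Expressway is also reported as a subclass of Road and TrafficEntity.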

Further, new knowledge is inferred from the urban traffic knowledge graph with the knowledge representation model as follows:

dividing the triple data in the urban traffic knowledge graph into a training set, a validation set, and a test set;

constructing negative samples of the triple data and filtering false negatives out of the negative samples;

setting the hyperparameters of the knowledge representation model;

training the knowledge representation model on the training set and negative samples with mini-batch stochastic gradient descent, using the Adadelta method to adapt the learning rate during training;

tuning the hyperparameters of the trained model with the validation set and negative samples;

evaluating the trained model with the test set and negative samples;

using the trained model to mine implicit relations and missing entities in the urban traffic knowledge graph and adding them to the graph.
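Two mechanical pieces of this procedure, the data split and the Adadelta update, can be sketched briefly. The 80/10/10 split ratio, the seed, and the values of `rho` and `eps` are assumptions chosen for illustration; the text above does not specify them. The update rule is the standard Adadelta rule, which scales each step by the ratio of running root-mean-square values of past steps and past gradients.

```python
import math
import random


def split_triples(triples, train=0.8, valid=0.1, seed=0):
    """Shuffle and split triples into train/validation/test lists."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


class Adadelta:
    """Standard Adadelta update for a flat list of parameters."""

    def __init__(self, size, rho=0.95, eps=1e-6):
        self.rho, self.eps = rho, eps
        self.avg_sq_grad = [0.0] * size  # running average of g^2
        self.avg_sq_step = [0.0] * size  # running average of delta^2

    def step(self, params, grads):
        """Update params in place from the gradients of one mini-batch."""
        rho, eps = self.rho, self.eps
        for i, g in enumerate(grads):
            self.avg_sq_grad[i] = rho * self.avg_sq_grad[i] + (1 - rho) * g * g
            rms_step = math.sqrt(self.avg_sq_step[i] + eps)
            rms_grad = math.sqrt(self.avg_sq_grad[i] + eps)
            delta = -(rms_step / rms_grad) * g
            self.avg_sq_step[i] = rho * self.avg_sq_step[i] + (1 - rho) * delta * delta
            params[i] += delta


# Toy usage: ten hypothetical triples and one step on f(x) = x^2 at x = 1.
data = [(f"e{i}", "r", f"e{i + 1}") for i in range(10)]
train_set, valid_set, test_set = split_triples(data)
optimizer = Adadelta(size=1)
x = [1.0]
optimizer.step(x, [2.0 * x[0]])
```

No global learning rate appears in the update, which is the sense in which the step size adapts during training.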

Further, the negative samples of the triple data are constructed as follows:

for the triples with a given relation in the training set, validation set, or test set, the probabilities of selecting the head entity and the tail entity for replacement are computed according to a Bernoulli distribution, and the entity with the higher probability is replaced;

under the relation type constraint, the relation determines which entities may be used as replacements, as shown in the following formula:

$$\Delta' = \{(h', r, t) \mid h' \in E_{[d_r]}\} \cup \{(h, r, t') \mid t' \in E_{[r_r]}\}$$

where $\Delta'$ is the set of constructed negative triples; $d_r$ is the ordered index of all entities that satisfy the domain constraint of relation type $r$; $r_r$ is the ordered index of all entities that satisfy the range constraint of relation type $r$; $E_{[d_r]}$ and $E_{[r_r]}$ denote the entities selected by those indices; $h$ is the head entity of a triple and $h'$ the head entity of a negative triple; $t$ is the tail entity of a triple and $t'$ the tail entity of a negative triple; and $r$ is the relation.
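The sampling step can be illustrated with a short sketch. Two assumptions are made that the text does not state: the head-replacement probability follows the common Bernoulli scheme tph / (tph + hpt), where tph is the average number of tails per head and hpt the average number of heads per tail for the relation, and the domain and range of a relation are approximated by the entities observed as its heads and tails. The sketch samples the side to corrupt with that probability; always replacing the higher-probability side, as the text words it, corresponds to thresholding the same value at 0.5.

```python
import random
from collections import defaultdict


def relation_stats(triples):
    """Per relation: head-replacement probability, domain set, range set."""
    tails_of = defaultdict(set)   # (relation, head) -> tails
    heads_of = defaultdict(set)   # (relation, tail) -> heads
    domain = defaultdict(set)     # relation -> entities seen as head
    range_ = defaultdict(set)     # relation -> entities seen as tail
    for h, r, t in triples:
        tails_of[(r, h)].add(t)
        heads_of[(r, t)].add(h)
        domain[r].add(h)
        range_[r].add(t)
    p_head = {}
    for r in domain:
        tph_counts = [len(v) for (rel, _), v in tails_of.items() if rel == r]
        hpt_counts = [len(v) for (rel, _), v in heads_of.items() if rel == r]
        tph = sum(tph_counts) / len(tph_counts)   # average tails per head
        hpt = sum(hpt_counts) / len(hpt_counts)   # average heads per tail
        p_head[r] = tph / (tph + hpt)
    return p_head, domain, range_


def corrupt(triple, p_head, domain, range_, rng):
    """Return one negative triple, or None if no type-compatible entity exists."""
    h, r, t = triple
    if rng.random() < p_head[r]:
        candidates = sorted(domain[r] - {h})
        return (rng.choice(candidates), r, t) if candidates else None
    candidates = sorted(range_[r] - {t})
    return (h, r, rng.choice(candidates)) if candidates else None


# Hypothetical triples for illustration.
known = [
    ("poi_1", "located_on", "road_a"),
    ("poi_2", "located_on", "road_a"),
    ("poi_3", "located_on", "road_b"),
    ("station_x", "belongs_to", "line_1"),
]
p_head, domain, range_ = relation_stats(known)
negative = corrupt(known[0], p_head, domain, range_, random.Random(0))
```

A sampled negative such as (poi_2, located_on, road_a) can coincide with a known true triple, which is why false negatives are filtered afterwards.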

Further, false negatives are filtered out of the negative samples as follows:

the triple data and the negative triples in the negative samples are imported into a relational database, the database's query function is used to find duplicate data, and the duplicates are removed from the negative samples;

where duplicate data are data that appear both in the triple data and in the negative triple data.
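A minimal sketch of this filtering step follows, assuming an in-memory SQLite database as the relational database; the table and column names are hypothetical, and the text does not name a specific database product.

```python
import sqlite3


def filter_false_negatives(positives, negatives):
    """Drop negatives that also occur among the positive triples, using SQL."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE positive (h TEXT, r TEXT, t TEXT)")
    con.execute("CREATE TABLE negative (h TEXT, r TEXT, t TEXT)")
    con.executemany("INSERT INTO positive VALUES (?, ?, ?)", positives)
    con.executemany("INSERT INTO negative VALUES (?, ?, ?)", negatives)
    # Keep only negative rows with no identical row in the positive table.
    rows = con.execute(
        """
        SELECT n.h, n.r, n.t
        FROM negative AS n
        WHERE NOT EXISTS (
            SELECT 1 FROM positive AS p
            WHERE p.h = n.h AND p.r = n.r AND p.t = n.t
        )
        ORDER BY n.rowid
        """
    ).fetchall()
    con.close()
    return rows


positives = [("poi_1", "located_on", "road_a"), ("poi_2", "located_on", "road_a")]
negatives = [("poi_2", "located_on", "road_a"), ("poi_1", "located_on", "road_b")]
clean_negatives = filter_false_negatives(positives, negatives)
```

The `NOT EXISTS` subquery is one way to express the duplicate lookup; an `EXCEPT` query or a join would serve equally well.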

Further,

the knowledge representation model is the TransD model, defined as follows:

Mapping matrices:

$$M_{rh} = r_p h_p^{\top} + I^{m \times n}$$

$$M_{rt} = r_p t_p^{\top} + I^{m \times n}$$

where $M_{rh}$ is the head entity mapping matrix, $M_{rt}$ is the tail entity mapping matrix, $r_p$ is the relation vector, $h_p$ is the head entity mapping vector, $t_p$ is the tail entity mapping vector, and $I^{m \times n}$ is the identity matrix;

Projection of the entity vectors into the relation space:

$$h_{\perp} = M_{rh} h$$

$$t_{\perp} = M_{rt} t$$

where $h_{\perp}$ is the head entity vector after mapping by $M_{rh}$ and $t_{\perp}$ is the tail entity vector after mapping by $M_{rt}$;

Score function:

$$f_r(h, t) = -\lVert h_{\perp} + r - t_{\perp} \rVert_2^2$$

Loss function:

$$L = \sum_{\xi \in \Delta} \sum_{\xi' \in \Delta'} \left[\gamma + f_r(\xi') - f_r(\xi)\right]_+$$

where $\gamma$ is a hyperparameter denoting the maximum margin between correct triples and negative triples, $[x]_+ = \max(0, x)$, $\Delta$ is the set of correct triples, and $\Delta'$ is the set of constructed negative triples.
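As a rough illustration of the TransD definitions, the following pure-Python sketch computes the projection, the score, and the margin loss. It assumes the standard TransD form with a squared L2 distance and pairs each positive score with one negative score, which is a common simplification of the double sum in the loss. The vectors in the usage lines are arbitrary illustrative values.

```python
def project(entity, entity_p, relation_p):
    """TransD projection (r_p e_p^T + I^{m x n}) e without building the matrix."""
    m, n = len(relation_p), len(entity)
    dot = sum(a * b for a, b in zip(entity_p, entity))  # e_p^T e
    # Row i of I^{m x n} e is e[i] for i < n and 0 otherwise.
    return [relation_p[i] * dot + (entity[i] if i < n else 0.0) for i in range(m)]


def score(h, h_p, r, r_p, t, t_p):
    """f_r(h, t) = -||h_perp + r - t_perp||_2^2; larger means more plausible."""
    h_perp = project(h, h_p, r_p)
    t_perp = project(t, t_p, r_p)
    return -sum((a + b - c) ** 2 for a, b, c in zip(h_perp, r, t_perp))


def margin_loss(pos_scores, neg_scores, gamma=1.0):
    """Sum of [gamma + f(negative) - f(positive)]_+ over paired samples."""
    return sum(max(0.0, gamma + n - p) for p, n in zip(pos_scores, neg_scores))


# Entity space n = 2, relation space m = 3 (illustrative numbers only).
projected = project([1.0, 2.0], [0.5, 0.5], [1.0, 0.0, 2.0])
perfect = score([1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0])
loss = margin_loss([-1.0], [-0.5], gamma=1.0)
```

Avoiding the explicit matrix uses the identity $(r_p e_p^{\top} + I)e = r_p (e_p^{\top} e) + I e$, so each projection costs only a dot product and a vector addition.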

Further, the trained knowledge representation model is evaluated with the test set and negative samples as follows:

for any triple in the test set, the score function of the trained knowledge reasoning model is used to compute the score of that triple and the scores of the negative triples constructed from it and the entities in the knowledge graph, and the triple and its negative triples are ranked by score in descending order;

one or more of the following metrics are used to measure performance on the link prediction task: mean rank (MR), mean reciprocal rank (MRR), Hits@1, Hits@3, and Hits@10.
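The ranking metrics named above have simple definitions, sketched here under the assumption that a higher score means a more plausible triple and that ties are resolved in favor of the true triple; the function names are hypothetical.

```python
def rank_of_true(true_score, negative_scores):
    """1-based rank of the true triple when scores are sorted in descending order."""
    return 1 + sum(1 for s in negative_scores if s > true_score)


def link_prediction_metrics(ranks, ks=(1, 3, 10)):
    """Mean rank, mean reciprocal rank, and Hits@k for a list of 1-based ranks."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
    }
    for k in ks:
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


example_rank = rank_of_true(-0.5, [-0.1, -2.0, -0.7])
example_metrics = link_prediction_metrics([1, 2, 4, 20])
```

A lower MR and a higher MRR or Hits@k indicate better link prediction; MRR is less sensitive than MR to a few very poorly ranked triples.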

Further,

urban traffic comprises public transport and road traffic, and a public transport knowledge graph and a road traffic knowledge graph are constructed for them respectively;

public transport data are obtained by using web crawler technology to collect the subway line and station information of the target city, and by acquiring the subway card-swipe data for those lines within the target period;

road traffic data are obtained by retrieving the target road network data from a map database and by using a map API to retrieve point-of-interest information and traffic situation data for the target roads.

The invention also discloses a device for constructing an urban traffic knowledge graph, comprising:

an ontology construction module, configured to construct the urban traffic ontology with an ontology construction tool to form the knowledge graph schema layer;

a data acquisition module, configured to acquire urban traffic data, extract traffic entities, attributes, and the relations between entities, and construct the knowledge graph data layer;

a storage module, configured to combine the schema layer and the data layer into the urban traffic knowledge graph and store it in a database;

a reasoning module, configured to use the knowledge representation model to infer new knowledge from the urban traffic knowledge graph and add it to the graph.

The invention also discloses the application of any of the above methods for constructing an urban traffic knowledge graph in the field of urban traffic.

By adopting the above technical solution, the invention achieves the following beneficial effects:

By using a knowledge graph to form a traffic knowledge system, the invention integrates multi-source heterogeneous traffic big data, models the spatiotemporal relations between traffic entities, and mines latent relations between traffic entities through a knowledge reasoning model based on representation learning. This achieves effective fusion and organization of multi-source travel data in the traffic field, forms a knowledge network for the field, and enables open sharing of traffic data.

When constructing the ontology, the invention adds a quality assessment step to the original seven-step method, which strictly controls the quality of the ontology before it enters the knowledge base. By evaluating the ontology in terms of structural richness, logical relations, and other supporting aspects, the standardized ontology becomes reusable and the accuracy and validity of ontology construction are ensured, so that the accuracy of new knowledge subsequently inferred from a knowledge graph built on this ontology is also improved.

By applying relation type constraints, the invention raises the probability that, when negative samples are constructed, an entity of the same type is drawn to replace an entity of the original triple. This helps push entities of the same type further apart, that is, it increases the differences between their vector representations. Using prior knowledge of the relation to decide which entities may serve as replacements can significantly improve the prediction accuracy of the knowledge reasoning model.

Description of the Drawings

FIG. 1 is a flowchart of urban traffic knowledge graph construction according to an embodiment of the application;

FIG. 2 is a flowchart of urban traffic ontology construction according to an embodiment of the application;

FIG. 3 is a class hierarchy diagram of the urban traffic ontology according to an embodiment of the application;

FIG. 4 is an inter-class relation diagram of the urban traffic ontology according to an embodiment of the application;

FIG. 5 is a schematic visualization of road network information according to an embodiment of the application;

FIG. 6 is a schematic visualization of road traffic situation information according to an embodiment of the application;

FIG. 7 is an inter-class relation diagram of the public transport ontology according to an embodiment of the application;

FIG. 8 is a diagram of the relations between public transport users and trips, and between trips and stations, according to an embodiment of the application;

FIG. 9 is an entity and relation diagram of the road traffic knowledge graph according to an embodiment of the application;

FIG. 10 is a diagram of the association between roads and traffic situations according to an embodiment of the application;

FIG. 11 is an explanatory diagram of the TransD model according to an embodiment of the application;

FIG. 12 is a diagram of the model training process according to an embodiment of the application;

FIG. 13 is a flowchart of link prediction task evaluation according to an embodiment of the application;

FIG. 14 is a curve of the loss value during model training according to an embodiment of the application;

FIG. 15 is a comparison of the mean reciprocal rank results of several reasoning models according to an embodiment of the application;

FIG. 16 is a comparison of the Hits@10 results of several reasoning models according to an embodiment of the application;

FIG. 17 is a comparison of the average morning-peak speed on a street before and after the start of the school term according to an embodiment of the application;

FIG. 18 is a schematic diagram of the road traffic knowledge graph according to an embodiment of the application.

Detailed Description

The technical solutions in the embodiments of the invention are described clearly and completely below. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

As shown in FIG. 1, the invention discloses a method for constructing an urban traffic knowledge graph, comprising the following steps:

constructing an urban traffic ontology with an ontology construction tool to form the knowledge graph schema layer;

acquiring urban traffic data, extracting entities, attributes, and the relations between entities, and constructing the knowledge graph data layer;

combining the schema layer and the data layer to generate the urban traffic knowledge graph, and storing it in a database;

using a knowledge representation model to infer new knowledge from the urban traffic knowledge graph and adding it to the graph.

By using a knowledge graph to form a traffic knowledge system, the invention integrates multi-source heterogeneous traffic big data, models the spatiotemporal relations between traffic entities, and mines latent relations between traffic entities through a knowledge reasoning model based on representation learning. This achieves effective fusion and organization of multi-source travel data in the traffic field, forms a knowledge network for the field, and enables open sharing of traffic data.

To aid understanding of the technical solution, the construction method is explained further below.

I. Construct the urban traffic ontology (that is, the urban traffic travel ontology) with an ontology construction tool to form the knowledge graph schema layer;

The ontology for the urban traffic travel domain is constructed for specific business scenarios. Taking into account the data resources obtained, the standardization of domain terminology, and the broad applicability of concept categories, the knowledge hierarchy of the traffic domain is abstracted, and the classes contained in the ontology, the attributes of those classes, and the relations between the classes are defined.

The ontology defines the template of the knowledge graph and describes its top-level structure. Constructing an ontology therefore makes it possible to analyze the system or hierarchy of domain knowledge and to reuse that knowledge.

As shown in FIG. 2, this application builds the ontology with the seven-step method published by Stanford University; the ontology may be built manually. A quality assessment stage is added to the seven-step method. The specific construction process is as follows:

Step 1: Determine the domain of the ontology and clarify its purpose. The ontology in this application covers the urban traffic domain;

Step 2: Check for reusable ontologies. Investigate whether a relevant ontology has already been built; if so, it can be imported directly to save construction cost and time;

Step 3: Identify the important terms, making sure the list of terms is comprehensive;

Step 4: Define the classes and their hierarchy. This application uses a top-down approach: the broadest concepts in the domain are defined first and then refined, as shown in FIG. 3;

Step 5: Determine the attributes of the classes. Classes alone do not provide sufficient information, so attributes must be defined to describe them further. Determining class attributes also includes defining the relations between classes. In the urban traffic ontology, two classes are linked by a relation; for example, if a point of interest is located on a certain road, the relation between the point of interest and the urban road is "located on". FIG. 4 shows the relations between all classes in the urban traffic ontology.

Note that classes are inherited: a subclass inherits the attributes of its parent class. An attribute should therefore be placed in the broadest class to which it applies, as close to the top of the hierarchy as possible, so that all subclasses inherit it. Table 1 gives specific examples of several classes and attributes;

Table 1

Figure BDA0003673931880000051

Figure BDA0003673931880000061

Step 6: Define the constraints on attributes, such as restrictions on the type of an attribute value and on the number of values an attribute may take;

Step 7: After the ontology passes the quality assessment, create concrete instances of the classes.
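Attribute inheritance (the note after Step 5) and attribute constraints (Step 6) can be sketched together. The classes, attributes, and constraint values below are hypothetical and are not taken from Table 1, whose contents were not preserved; the constraint format with a type and a minimum and maximum count is likewise an assumption.

```python
# Hypothetical classes, attributes, and constraints for illustration only.
PARENT = {"TrafficEntity": None, "Road": "TrafficEntity", "UrbanRoad": "Road"}
OWN_PROPERTIES = {
    "TrafficEntity": {"name": {"type": str, "min": 1, "max": 1}},
    "Road": {"length_km": {"type": float, "min": 0, "max": 1}},
    "UrbanRoad": {"lane_count": {"type": int, "min": 0, "max": 1}},
}


def all_properties(cls):
    """Properties declared on cls plus those inherited from its ancestors."""
    props = {}
    while cls is not None:
        for name, constraint in OWN_PROPERTIES.get(cls, {}).items():
            props.setdefault(name, constraint)  # nearest declaration wins
        cls = PARENT[cls]
    return props


def validate_instance(cls, values):
    """Check {property: [values]} against type and cardinality constraints."""
    errors = []
    props = all_properties(cls)
    for name in values:
        if name not in props:
            errors.append(f"{name}: not defined for {cls}")
    for name, c in props.items():
        vals = values.get(name, [])
        if not c["min"] <= len(vals) <= c["max"]:
            errors.append(
                f"{name}: expected {c['min']}..{c['max']} values, got {len(vals)}"
            )
        for v in vals:
            if not isinstance(v, c["type"]):
                errors.append(f"{name}: {v!r} is not {c['type'].__name__}")
    return errors


print(sorted(all_properties("UrbanRoad")))
print(validate_instance("UrbanRoad", {"name": ["Road A"], "lane_count": [4]}))
```

Because `name` is declared once on the broadest class, every subclass receives it without repetition, which is the reason for placing attributes near the top of the hierarchy.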

The quality assessment should follow the principles of clarity, consistency, extensibility, conciseness, and independent shareability. The specific assessment method is:

(1) First evaluate the class hierarchy. Draw a tree structure diagram that identifies the root node and child nodes, with each subtree corresponding to independent, modular knowledge in the domain, so that the hierarchy is clear. Verify that transitivity holds in the hierarchy and that the class hierarchy contains no cycles;

(2) Evaluate the ontology to confirm that it is expressed clearly and unambiguously. Check that the scope and expression of each ontology are consistent wherever they are used, including the consistency of relation logic, and check for classes that are defined more than once and ontologies that should be merged, so as to avoid redundancy;

(3) Evaluate the defined attributes and attribute constraints. Check that attribute descriptions are complete and that attribute constraints are logical. Evaluate the shareability of attributes, that is, whether they apply broadly to multiple classes rather than to a single class only;

(4) Check the extensibility of the ontology, ensuring that it can be flexibly refined and updated as classes and attributes are added and modified;

(5) Check the logical consistency of the relations between classes and evaluate their completeness, that is, whether the ontology covers all relations that may exist between classes. Check the uniqueness of the relations between classes, that is, whether only one relation exists between each pair of classes.
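Parts of checks (2) and (5) can be automated once the classes and inter-class relations are listed. The sketch below flags class names that look like duplicates, relations that refer to undefined classes, and class pairs linked by more than one relation. The input format and the names are assumptions; completeness still has to be judged by a domain expert, since it depends on which relations ought to exist.

```python
from collections import defaultdict


def check_relations(classes, relations):
    """relations: list of (domain_class, relation_name, range_class)."""
    issues = []
    # Redundancy: class names that differ only by case or surrounding spaces.
    by_key = defaultdict(list)
    for cls in classes:
        by_key[cls.strip().lower()].append(cls)
    for names in by_key.values():
        if len(names) > 1:
            issues.append(f"possibly duplicate classes: {sorted(names)}")
    # Consistency: every relation must connect defined classes.
    known = set(classes)
    pair_relations = defaultdict(set)
    for domain, name, range_ in relations:
        for cls in (domain, range_):
            if cls not in known:
                issues.append(f"{name}: undefined class {cls}")
        pair_relations[(domain, range_)].add(name)
    # Uniqueness: at most one relation between an ordered pair of classes.
    for (domain, range_), names in sorted(pair_relations.items()):
        if len(names) > 1:
            issues.append(f"{domain} -> {range_}: multiple relations {sorted(names)}")
    return issues


# Hypothetical ontology fragment with three deliberate defects.
found = check_relations(
    ["PointOfInterest", "Road", "road "],
    [
        ("PointOfInterest", "located_on", "Road"),
        ("PointOfInterest", "near", "Road"),
        ("Road", "connects", "Junction"),
    ],
)
```

An empty result does not prove the ontology is sound; it only means none of these three mechanical defects was found.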

As shown in FIG. 2, if the ontology fails the quality assessment, construction returns to Step 3 and is repeated until the ontology passes; only then are concrete instances created.

When constructing the ontology, the invention adds a quality assessment step to the original seven-step method, which strictly controls the quality of the ontology before it enters the knowledge base. By evaluating the ontology in terms of structural richness, logical relations, and other supporting aspects, the standardized ontology becomes reusable and the accuracy and validity of ontology construction are ensured, so that the accuracy of new knowledge subsequently inferred from a knowledge graph built on this ontology is also improved.

The ontology construction tool described in this application may be Protégé 5.5.0; an ontology built with this software lets classes, attributes, and relations be viewed directly.

2. Acquire urban traffic data, extract the traffic entities, attributes and relationships between entities, and build the knowledge graph data layer;

Urban traffic comprises public transport and urban road traffic, so the urban traffic data to be acquired in this application includes public transport data (i.e., public transport travel data) and road traffic data (i.e., road traffic travel data). Because the card-swipe data needed for public transport and the traffic situation data needed for road traffic cover different times and places, a public transport data layer and a road traffic data layer are built separately, and a public transport knowledge graph and a road traffic knowledge graph are built separately.

S1. Build the public transport data layer: acquisition of public transport travel data;

Public transport travel data is acquired by using web crawler technology to obtain the subway line and station information of the target city, and by obtaining and preprocessing the subway card-swipe data of those lines within the target time period;

The specific method is as follows:

S11. Obtain subway line and station information;

This application uses web crawler technology to obtain data such as the subway stations and lines of the target city. A web crawler locates a website by its unique address (URL) and automatically fetches and downloads the target information from it. The specific procedure is:

Step 1: Construct the request parameters, mainly the key value, city code and city name. After the parameters are URL-encoded, send a request to the target HTTP interface;

Step 2: Receive the response returned for the HTTP request and parse the returned data, which is in JSON format;

Step 3: Store the parsed data in a PostgreSQL database. As shown in Table 2, the stored data includes the names and numbers of the lines and stations, the station sequence number, the station longitude and latitude, whether the station is a transfer station, and the lines passing through the station.
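Steps 1 and 2 can be sketched in Python as follows. This is only a minimal illustration: the endpoint `BASE_URL`, the parameter names, the city code value and the shape of the JSON response are all assumptions, since the text does not name the HTTP interface or its response schema. The PostgreSQL write of Step 3 is omitted.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; the text does not name the actual HTTP interface.
BASE_URL = "https://example.com/api/subway"


def build_request_url(key: str, city_code: str, city_name: str) -> str:
    """Step 1: URL-encode the parameters and append them to the endpoint."""
    # Parameter names are assumptions chosen for illustration.
    params = {"key": key, "citycode": city_code, "city": city_name}
    return f"{BASE_URL}?{urlencode(params)}"


def parse_lines(payload: str) -> list:
    """Step 2: flatten a JSON response into one row per (line, station)."""
    rows = []
    for line in json.loads(payload)["lines"]:
        for seq, station in enumerate(line["stations"], start=1):
            rows.append({
                "line_name": line["name"],
                "station_name": station["name"],
                "station_seq": seq,
                "lon": station["lon"],
                "lat": station["lat"],
            })
    return rows


# Made-up response, used only to exercise the parser.
SAMPLE = json.dumps({"lines": [{"name": "Line 1", "stations": [
    {"name": "A", "lon": 114.0, "lat": 22.5},
    {"name": "B", "lon": 114.1, "lat": 22.6},
]}]})

url = build_request_url("MY_KEY", "0755", "深圳")
rows = parse_lines(SAMPLE)
```

Each row in `rows` corresponds to one record of the kind described for Table 2 and could then be inserted into the database.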

Table 2. Stored line and station information

Figure BDA0003673931880000071

S12. Obtain the subway AFC card-swipe data and preprocess it;

The method of obtaining and preprocessing subway card-swipe data is described in detail below, taking Shenzhen as the target city and 7:00-9:00 from January 25, 2016 (Monday) to January 29, 2016 (Friday) as the target time. The same method can be used to acquire and preprocess data for other cities.

The Shenzhen Metro charges by distance in segments, so passengers must swipe their cards both when entering and when exiting a station. As of 2020, Shenzhen had 8 subway lines (Shenzhen Metro Lines 1-5, 7, 9 and 11) and 182 subway stations. The data used in this application is Shenzhen Tong card-swipe data covering January 25, 2016 (Monday) to January 29, 2016 (Friday), i.e., the swipe records of 5 working days. The data format is shown in Table 3.

Table 3. Subway AFC card-swipe data format

Figure BDA0003673931880000072

Figure BDA0003673931880000081

In the raw subway AFC card-swipe data, the entry/exit type (i.e., the transaction type) is distinguished by the field COST_TYPE: COST_TYPE=21 marks an entry record, at which point the transaction is in the started state, and COST_TYPE=22 marks an exit record, at which point the transaction is completed. Entry and exit records must therefore appear in pairs.

(1) Target data filtering: select the swipe records within the morning peak (7:00-9:00), count the number of swipes of each IC card within the morning peak of each day, and keep the records of cards with exactly 2 swipes (one entry and one exit);

(2) Abnormal data removal: remove swipe records whose entry and exit do not appear as a pair, i.e., an entry without an exit or an exit without an entry. Delete records with a missing station location, records whose entry and exit stations are the same, and records whose entry and exit times differ by more than 5 hours;

(3) Trip ID assignment: merge each two time-adjacent entry and exit rows into one row containing the start and end times and station names, and add a trip ID column to distinguish a user's multiple trips within the week.
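The three preprocessing steps above can be sketched as a single pass over the swipe records. Only the COST_TYPE codes 21 and 22 come from the text; the record keys (`card_id`, `time`, `station`, `cost_type`) and the output field names are hypothetical, because the actual column names appear only in the table images.

```python
from collections import defaultdict
from datetime import datetime, timedelta

ENTRY, EXIT = 21, 22  # COST_TYPE codes given in the text


def build_trips(records):
    """Turn raw swipe records into trips.

    Each record is a dict with the hypothetical keys
    card_id, time (datetime), station and cost_type.
    """
    # (1) Keep morning-peak swipes, grouped by card and calendar day.
    groups = defaultdict(list)
    for rec in records:
        if 7 <= rec["time"].hour < 9:
            groups[(rec["card_id"], rec["time"].date())].append(rec)

    trips = []
    for (card_id, _day), recs in sorted(groups.items(), key=lambda kv: kv[0]):
        if len(recs) != 2:  # exactly one entry and one exit expected
            continue
        first, second = sorted(recs, key=lambda r: r["time"])
        # (2) Drop unpaired, incomplete or implausible records.
        if (first["cost_type"], second["cost_type"]) != (ENTRY, EXIT):
            continue
        if not first["station"] or not second["station"]:
            continue
        if first["station"] == second["station"]:
            continue
        if second["time"] - first["time"] > timedelta(hours=5):
            continue
        # (3) Merge the pair into one row and assign a trip ID.
        trips.append({
            "trip_id": len(trips) + 1,
            "card_id": card_id,
            "start_time": first["time"],
            "start_station": first["station"],
            "end_time": second["time"],
            "end_station": second["station"],
        })
    return trips
```

Grouping by card and day before pairing keeps the one-entry-one-exit check local to each morning peak, so the same card can still contribute up to five trips over the week.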

The format of the preprocessed subway card-swipe data is shown in Table 4.

Table 4. Preprocessed subway card-swipe data

Figure BDA0003673931880000082

S2. Build the road traffic data layer: acquisition and preprocessing of road traffic travel data;

Road traffic travel data is acquired by obtaining the target road network data from a map database and using a map API to obtain the target point-of-interest information and traffic situation data on the target roads.

The specific steps are:

S21. Acquisition of the target road network data;

Step 1: Taking the area within Beijing's Fifth Ring Road as the target, obtain the road data for that area from a map database. For example, the road data can be downloaded from the OpenStreetMap open-source map database via the BBBike website. BBBike supports multiple export formats (such as OSM, Shapefile and GeoJSON) and allows the extent of the downloaded map to be customized, which makes it a good choice for downloading target road network data. The download website and map database named here are only examples and are not limiting.

Step 2: Use OSM2GMNS to output road network data conforming to the GMNS standard from the downloaded target road network. GMNS, the General Modeling Network Specification, defines a flexible and unified format for representing multimodal transportation networks. OSM2GMNS also provides a function for simplifying intersections. The output files comprise road nodes (node.csv) and road connection arcs (link.csv); the main fields are described in Table 5 and Table 6.

Table 5. Field descriptions of road nodes

Figure BDA0003673931880000091

Table 6. Field descriptions of road connection arcs

Figure BDA0003673931880000092

The node type (OSM_HIGHWAY) is one of: intersection with a highway, intersection with traffic signal control, or intersection without traffic signal control. The link class (LINK_TYPE_NAME) is one of: highway, arterial road, secondary arterial road, branch road, or residential road.

Step 3: Import the data into a relational database and visualize it in QGIS.

Figure 5 takes the intersection of Yuquan Road and Shijingshan Road as an example: the dashed lines are road connection arcs, the dots are road nodes, and there is one connection arc between every two nodes.

S22. Acquisition of target road points of interest;

A point of interest (POI) is any point with practical meaning in the real world, such as a facility or building related to people's daily lives: a parking lot, a school, a hospital and so on. POI data generally contains basic attributes such as name, type, address, and longitude and latitude.

Points of interest can be obtained through the POI search interface of a map API. The acquisition process is described below, taking as an example the use of the AutoNavi (Gaode) Maps API to obtain information on the primary and secondary schools within Beijing's Fifth Ring Road:

Step 1: Determine the query area. Use QGIS to draw the boundary of Beijing's Fifth Ring Road, export it in GeoJSON format, and convert it into a CSV file with longitude in one column and latitude in the other (the boundary coordinate pairs).

Step 2: Because each request returns at most 1000 POIs, the large area must be divided into multiple small grid cells. Set the cell size to 10 km × 10 km, map the boundary coordinates onto the grid, and obtain the lower-left vertex coordinates of every cell in the area.

Step 3: Determine the codes of the POI types to be queried (141202 for secondary schools, 141203 for primary schools). Then obtain the POI information of each cell in turn. The request parameters are the key value, the vertex coordinate pair and the POI type code; after the parameters are URL-encoded, a request is sent to the target HTTP interface.

Step 4: Parse the data returned in JSON format and store it in the database. The final counts within the Fifth Ring Road are 598 primary schools and 442 secondary schools (some schools have several campuses, in which case one school name corresponds to multiple POI records). The POI fields are described in Table 7.
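The grid division of Step 2 can be illustrated with a short Python sketch. It is a simplification under stated assumptions: it tiles a rectangular bounding box rather than clipping cells to the drawn Fifth Ring Road boundary, and it converts kilometres to degrees with the approximate constant 111.32 km per degree of latitude. The function name and the bounding-box values in the usage line are hypothetical.

```python
import math


def grid_lower_left_corners(min_lon, min_lat, max_lon, max_lat, cell_km):
    """Return the lower-left (lon, lat) of each cell tiling the bounding box."""
    km_per_deg_lat = 111.32  # approximate length of one degree of latitude
    mid_lat = math.radians((min_lat + max_lat) / 2)
    # A degree of longitude shrinks with the cosine of the latitude.
    km_per_deg_lon = km_per_deg_lat * math.cos(mid_lat)
    d_lat = cell_km / km_per_deg_lat
    d_lon = cell_km / km_per_deg_lon
    n_rows = math.ceil((max_lat - min_lat) / d_lat)
    n_cols = math.ceil((max_lon - min_lon) / d_lon)
    return [(min_lon + col * d_lon, min_lat + row * d_lat)
            for row in range(n_rows) for col in range(n_cols)]


# Illustrative bounding box, not the actual Fifth Ring Road extent.
corners = grid_lower_left_corners(116.0, 39.0, 116.5, 39.5, cell_km=10)
```

Each corner, together with the cell size, defines one rectangular query; cells that fall entirely outside the drawn boundary could be discarded before any requests are sent.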

Table 7. POI field descriptions

Figure BDA0003673931880000101

S23. Acquisition of target road traffic situation data;

The steps for acquiring the target road traffic data are described below, taking as an example a study of the association between students' school commutes and the road traffic situation:

According to Beijing's schedule for the autumn 2020 semester, primary, junior high and senior high schools began reopening on August 29, and by September 7 all grades of primary, junior high and senior high schools had reopened. In Beijing, primary school students usually arrive at school later than 7:50 and are dismissed before 16:30; secondary school students arrive later than 7:30 and are dismissed before 17:30.

To study this association, traffic situation data was collected for the week before schools reopened in 2020, August 24 (Monday) to August 28 (Friday), and for a week after reopening, September 21 (Monday) to September 25 (Friday). Each day, data was collected from 6:30 to 10:30 (4 hours) and from 16:00 to 21:00 (5 hours). The spatial scope is the roads in the area within Beijing's Fifth Ring Road.

Traffic situation data is obtained through the traffic situation interface of the AutoNavi Maps API. The method is similar to that for POIs: data is queried by rectangular area, but the diagonal of the rectangle must be shorter than 10 km, so the large area again has to be divided into multiple small grid cells to work within this limit.

Step 1: Determine the query area. Use QGIS to draw the boundary of Beijing's Fifth Ring Road, export the data, and convert it into a CSV file with longitude in one column and latitude in the other, i.e., the boundary coordinate pairs.

Step 2: Divide the area into multiple 7 km × 7 km grid cells, map the boundary coordinates onto the grid, and obtain the lower-left corner coordinates of every cell in the area.

Step 3: Obtain the traffic situation data of the roads in each cell in turn. The request parameters are the key value, the road level, the vertex coordinate pair of the rectangular area and the return data format; after the parameters are URL-encoded, a request is sent to the target HTTP interface. Because the traffic situation service of the AutoNavi Maps API limits individual developers to 2,000 calls per day and blocks keys that exceed the quota, the crawler must switch to the next key once a single key has made 2,000 requests and then continue sending requests.

Step 4: Road condition information is updated every 2 minutes. Parse the returned JSON data and store it in the database. As shown in Table 8, the returned fields include the road name, direction description, driving angle, road condition and speed (km/h).
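Two details of these steps lend themselves to a small sketch. First, a 7 km × 7 km cell has a diagonal of 7√2 ≈ 9.9 km, which satisfies the 10 km limit, whereas a 10 km × 10 km cell (diagonal ≈ 14.1 km) would not. Second, the key switch of Step 3 can be handled by a small counter. The class and function names below are hypothetical, and the sketch does not reset the counters when a new day starts.

```python
import math

DAILY_QUOTA = 2000  # per-key daily call limit stated in the text


def cell_diagonal_km(side_km: float) -> float:
    """Diagonal of a square grid cell with the given side length."""
    return side_km * math.sqrt(2)


class KeyRotator:
    """Hand out API keys, moving to the next key once one reaches its quota."""

    def __init__(self, keys, quota=DAILY_QUOTA):
        self.keys = list(keys)
        self.quota = quota
        self.index = 0  # position of the key currently in use
        self.used = 0   # calls already made with the current key

    def next_key(self) -> str:
        if self.used >= self.quota:
            self.index += 1
            self.used = 0
        if self.index >= len(self.keys):
            raise RuntimeError("all API keys have reached their daily quota")
        self.used += 1
        return self.keys[self.index]
```

A crawler would call `next_key()` once per request, so that no single key is used for more than `quota` calls.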

Table 8. Traffic situation field descriptions

Figure BDA0003673931880000111

The driving angle (ANGLE) reflects the direction of travel of vehicles on the road: due east is zero degrees, angles are positive in the counterclockwise direction, and the value range is [0, 360]. The traffic status (STATUS) reflects the traffic state of the road: 0 means unknown, 1 means free-flowing, 2 means slow-moving, 3 means congested, and 4 means severely congested.
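Because ANGLE is measured counterclockwise from due east, it differs from the usual compass bearing (clockwise from due north). The following sketch, whose names are hypothetical, converts between the two conventions and decodes STATUS using the codes listed above.

```python
# STATUS codes as listed in the text.
STATUS_LABELS = {
    0: "unknown",
    1: "free-flowing",
    2: "slow-moving",
    3: "congested",
    4: "severely congested",
}


def angle_to_bearing(angle_deg: float) -> float:
    """Convert ANGLE (0 = east, counterclockwise positive) to a compass
    bearing (0 = north, clockwise positive), both in degrees."""
    return (90.0 - angle_deg) % 360.0
```

For example, an ANGLE of 90 (pointing north) maps to a bearing of 0, and an ANGLE of 180 (pointing west) maps to a bearing of 270.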

Figure 6 shows the traffic situation data at one moment visualized in QGIS. Within Beijing's Fifth Ring Road, the roads for which traffic situation data is returned are mainly urban expressways and arterial roads.

S24. Preprocessing of road traffic travel data:

(1) Unify the data sampling interval: real-time traffic situation data for the roads is obtained with web crawler technology. The data on the AutoNavi Maps platform is updated every 2 minutes, but because the network connection is unstable, the interval between returned traffic situation records varies between 4 and 7 minutes. To simplify subsequent analysis, the sampling interval is fixed at 5 minutes.

(2) Data matching and filtering: because the roads for which traffic situation data is returned are mostly highways and main roads, the records are matched by road name against the road network connection arcs, and only the road network data (nodes and connection arcs) that has traffic situation data is kept. Invalid and redundant data is removed, which improves data quality.
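The text does not say how the irregular 4 to 7 minute records are brought onto the fixed 5-minute interval. One possible approach, shown here purely as an assumption, is to snap each record to the nearest 5-minute slot and keep the record closest to each slot; slots that receive no record are simply left empty in this sketch.

```python
from datetime import datetime, timedelta


def snap_to_interval(ts: datetime, minutes: int = 5) -> datetime:
    """Round a timestamp to the nearest multiple of `minutes` within its day."""
    step = minutes * 60
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    seconds = (ts - midnight).total_seconds()
    return midnight + timedelta(seconds=round(seconds / step) * step)


def resample(records, minutes: int = 5):
    """records: iterable of (timestamp, value) pairs for one road.

    Returns one (slot, value) pair per occupied slot, keeping the
    record whose timestamp lies closest to the slot.
    """
    best = {}
    for ts, value in records:
        slot = snap_to_interval(ts, minutes)
        gap = abs((ts - slot).total_seconds())
        if slot not in best or gap < best[slot][0]:
            best[slot] = (gap, value)
    return [(slot, best[slot][1]) for slot in sorted(best)]
```

Interpolating between neighbouring records would be an alternative when empty slots are not acceptable for the later analysis.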

3. Combine the knowledge graph schema layer and the knowledge graph data layer to generate the urban traffic knowledge graph, and store it in a database;

Because the card-swipe data and the traffic situation data cover different times and places, separate public transport and urban road traffic data layers were built. Likewise, a public transport knowledge graph and a road traffic knowledge graph must be generated separately and stored in the database separately; that is, the urban traffic knowledge graph of this application comprises a public transport knowledge graph and a road traffic knowledge graph.

(1) Selection of the database;

The database used to store the knowledge graph may be a graph database or another type of database such as a relational database. A graph database, such as Neo4j or TitanDB, is preferred in this application. Graph databases use the property graph model, in which nodes and edges make up the graph and information is expressed intuitively through the graph structure. This data structure can effectively store and express the knowledge in the knowledge graph and the associations between entities: nodes represent entities and edges represent the relationships between them, so the knowledge graph is stored more intuitively. For storing knowledge graphs, a graph database therefore performs better than relational and other databases.

(2) Storage of the public transport knowledge graph;

Combine the knowledge graph schema layer with the public transport knowledge graph data layer to generate the public transport knowledge graph, and store it in the database;

Figure 7 shows the inter-class relationships of the urban traffic ontology for public transport. The classes in the ontology correspond to the node labels in the graph database, and entities are stored as nodes in the knowledge graph. The inter-class relationships, i.e., the relationships between entities, fall into the following categories: the membership relationship between a subway station and a line (subway_station-belong-subway_line), the adjacency relationship between subway station entities (subway_station-belong-subway_line), the ownership relationship between a public transport user and a trip (user-has-trip), and the start and end relationships between a trip and a station (trip-start_from/end_at-subway_station).

The same public transport user makes multiple trips within a week, so the relationship between users and trips is one-to-many. As shown in Figure 8, each trip has a start or end relationship with a station.
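As an illustration of how one preprocessed trip record maps onto these relationships, the sketch below emits the three triples for a single trip. The relationship names `has`, `start_from` and `end_at` are taken from the text; the record keys and the `label:id` node naming are assumptions made for the example.

```python
def trip_to_triples(trip):
    """Build (head, relation, tail) triples for one preprocessed trip.

    `trip` is a dict with the hypothetical keys
    card_id, trip_id, start_station and end_station.
    """
    user = f"user:{trip['card_id']}"
    trip_node = f"trip:{trip['trip_id']}"
    return [
        (user, "has", trip_node),
        (trip_node, "start_from", f"subway_station:{trip['start_station']}"),
        (trip_node, "end_at", f"subway_station:{trip['end_station']}"),
    ]


triples = trip_to_triples(
    {"card_id": "A", "trip_id": 1, "start_station": "X", "end_station": "Y"}
)
```

A user with several trips yields several `has` triples sharing the same head, which is the one-to-many relationship described above.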

(3) Storage of the road traffic knowledge graph;

Combine the knowledge graph schema layer with the road traffic knowledge graph data layer to generate the road traffic knowledge graph, and store it in the database;

The entities in the road traffic knowledge graph include the urban roads, points of interest and traffic situations of the domain ontology, together with added entities that represent spatiotemporal relationship data. Figure 9 shows the entities contained in the road traffic knowledge graph and the associations between them.

The association between a road and a traffic situation is expressed as a multi-step relationship path. Date and time are the temporal attributes of the traffic situation: the road entity is first linked to a date entity, the date entity is then linked to a time entity, and the time entity is finally linked to the traffic situation entity. The road entity is thus associated with the traffic situation through a three-step relationship path, as shown in Figure 10.
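The three-step path can be sketched as a chain of triples. The relationship names (`has_date`, `has_time`, `has_state`) are hypothetical, since the text does not name them, and scoping the date and time nodes to one road is an assumption made here so that the path from a road to a traffic state is unambiguous.

```python
def traffic_path_triples(road, date, time, status):
    """Build the road -> date -> time -> traffic situation path as triples."""
    # Node identifiers are scoped to the road so that two roads observed
    # at the same date and time do not share intermediate nodes.
    date_node = f"{road}|{date}"
    time_node = f"{date_node}|{time}"
    state_node = f"{time_node}|status={status}"
    return [
        (road, "has_date", date_node),
        (date_node, "has_time", time_node),
        (time_node, "has_state", state_node),
    ]


path = traffic_path_triples("Shijingshan Road", "2020-08-24", "07:30", 3)
```

Following the three triples from the road node leads to exactly one traffic situation node for the given date and time.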

4. Use a knowledge representation model to infer new knowledge and add it to the knowledge graph.

Although the public transport and road traffic knowledge graphs built in this application fuse multi-source data, they suffer from data sparsity, so the missing entities and latent relationships in the knowledge graphs need to be mined and added to them.

This application uses a knowledge reasoning model to mine the implicit relationships in the urban traffic knowledge graph in order to complete it, and verifies the effectiveness of the model with a link prediction task on the urban traffic knowledge graph. The specific steps are as follows:

S1. Construction of the knowledge reasoning model

Any conventional knowledge reasoning model, such as TransE or TransH, can be used for knowledge reasoning in this application. Given the characteristics of the reasoning task here, however, the TransD model is preferred. Its main idea is to use dynamic mapping matrices built from projection vectors to encode entities as low-dimensional embedding vectors in the relation space, while taking into account that entities and relationships have different types and attributes, as shown in Figure 11.

In the TransD model, the first vector ($h$, $r$, $t$) captures the actual meaning of the entity or relation, and the second vector ($h_p$, $r_p$, $t_p$) is used to construct the mapping matrices. Each mapping matrix is determined jointly by the projection vectors of the entity and the relation and maps the entity from the entity space into the vector space. Each mapping matrix is initialized with the identity matrix $I$, and the matrix multiplication is replaced by vector operations, which effectively reduces the amount of computation.

$$M_{rh} = r_p h_p^{\top} + I^{m \times n}$$

$$M_{rt} = r_p t_p^{\top} + I^{m \times n}$$

where $M_{rh}$ is the head entity mapping matrix, $M_{rt}$ is the tail entity mapping matrix, $r_p$ is the relation vector, $h_p^{\top}$ is the head entity mapping vector, $t_p^{\top}$ is the tail entity mapping vector, and $I^{m \times n}$ is the identity matrix;

The entity vectors are projected into the relation space as:

$$h_{\perp} = M_{rh} h$$

$$t_{\perp} = M_{rt} t$$

where $h_{\perp}$ is the head entity vector after mapping by $M_{rh}$, and $t_{\perp}$ is the tail entity vector after mapping by $M_{rt}$;
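The remark that vector operations can replace the matrix multiplication can be made concrete. Assuming the mapping matrix has the standard TransD form $M = r_p e_p^{\top} + I^{m \times n}$ for an entity $e$ with projection vector $e_p$, the product $M e$ equals $(e_p^{\top} e)\, r_p + I^{m \times n} e$, so the matrix never has to be formed. The pure-Python sketch below (function names are hypothetical) computes the projection both ways so that the two results can be compared.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(e, e_p, r_p):
    """Compute (r_p e_p^T + I^{m x n}) e using vector operations only."""
    m = len(r_p)
    scale = dot(e_p, e)  # scalar e_p^T e
    # I^{m x n} e keeps the first min(m, n) entries of e and zero-pads to m.
    padded = (list(e) + [0.0] * m)[:m]
    return [scale * r_i + e_i for r_i, e_i in zip(r_p, padded)]


def project_explicit(e, e_p, r_p):
    """Same projection, forming the m x n mapping matrix explicitly."""
    m, n = len(r_p), len(e)
    matrix = [[r_p[i] * e_p[j] + (1.0 if i == j else 0.0) for j in range(n)]
              for i in range(m)]
    return [dot(row, e) for row in matrix]


h = [1.0, 2.0, 3.0]      # entity embedding, n = 3
h_p = [0.5, 0.0, -1.0]   # entity projection vector
r_p = [2.0, 1.0]         # relation projection vector, m = 2
h_perp = project(h, h_p, r_p)
```

The vector form needs one inner product and one scaled addition instead of the m × n multiplications of the explicit form.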

The TransD model treats the relation vector $r$ of a triple $(h, r, t)$ as a translation between the projection of the head entity vector $h$ and the projection of the tail entity vector $t$ in the relation space; that is, in the relation space the sum of the projected head entity vector and the relation vector is approximately equal to the projected tail entity vector. A score function based on the L2 (Euclidean) distance is therefore defined to measure the distance between these two vectors:

Figure BDA0003673931880000141

The model imposes L2-norm constraints on the vectors, which keeps the model parameters small, avoids overfitting and gives the model stronger generalization ability.

$$\|h\|_2 \le 1,\quad \|t\|_2 \le 1,\quad \|r\|_2 \le 1,\quad \|h_{\perp}\|_2 \le 1,\quad \|t_{\perp}\|_2 \le 1$$

From the score function above, a correct triple should score as high as possible and an incorrect triple as low as possible. This application therefore defines a margin-based ranking loss function, and minimizing the loss is the training objective of the model.

Figure BDA0003673931880000142

Here $L$ is the loss function and $\gamma$ is a hyperparameter denoting the maximum margin between correct and negative triples; $[x]_+ = \max(0, x)$; $\Delta$ is the set of correct triples and $\Delta'$ is the set of constructed negative triples.
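The score and loss equations appear only as images here. Assuming they follow the original TransD formulation, which matches the surrounding description, the score is $f_r(h, t) = -\|h_{\perp} + r - t_{\perp}\|_2^2$ and the loss sums $[\gamma + f_r(\xi') - f_r(\xi)]_+$ over correct triples $\xi \in \Delta$ and negative triples $\xi' \in \Delta'$. The sketch below implements that assumed form; pairing each correct triple with exactly one negative triple is a simplification made for the example.

```python
def score(h_perp, r, t_perp):
    """Assumed TransD score: minus the squared L2 distance between
    h_perp + r and t_perp, so a higher score means a better fit."""
    return -sum((a + b - c) ** 2 for a, b, c in zip(h_perp, r, t_perp))


def margin_loss(pos_scores, neg_scores, gamma):
    """Sum of [gamma + f(negative) - f(positive)]_+ over paired triples."""
    return sum(max(0.0, gamma + f_neg - f_pos)
               for f_pos, f_neg in zip(pos_scores, neg_scores))


f_pos = score([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])  # h_perp + r equals t_perp
f_neg = score([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])  # far from t_perp
loss = margin_loss([f_pos], [f_neg], gamma=1.0)
```

A pair contributes to the loss only while the negative triple scores within γ of the correct one, which is what pushes the two apart during training.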

S2. Dataset split, model parameter settings and negative sample construction;

S21. Dataset split;

Export the entity and relationship data of the urban traffic knowledge graph stored in the database. For example, when the graph database Neo4j is used to store the knowledge graph, its APOC (A Package of Components) plug-in can be used to export the entity and relationship data, and the database's search and filter functions can be used to preprocess the data into CSV format. Entity data is stored as the entity name plus its ID, relationship data as the relationship name plus its ID, and triples as head entity ID, tail entity ID and relationship ID. The triples are divided into a training set, a test set and a validation set in a predetermined ratio chosen according to actual needs, for example 85% training, 10% test and 5% validation.
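The split itself can be sketched as a seeded shuffle followed by slicing. The 85/10/5 ratio is the example given in the text; the function name, the fixed seed and the choice to give any rounding remainder to the validation set are assumptions.

```python
import random


def split_triples(triples, ratios=(0.85, 0.10, 0.05), seed=42):
    """Shuffle the triples and split them into train, test and validation sets."""
    items = list(triples)
    random.Random(seed).shuffle(items)  # seeded for a reproducible split
    n_train = round(len(items) * ratios[0])
    n_test = round(len(items) * ratios[1])
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    valid = items[n_train + n_test:]  # remainder goes to validation
    return train, test, valid


# Triples stored as (head entity ID, tail entity ID, relationship ID).
all_triples = [(i, i + 1, 0) for i in range(200)]
train, test, valid = split_triples(all_triples)
```

With 200 triples this yields 170 training, 20 test and 10 validation triples.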

S22. Model parameter settings

The TransD model has several hyperparameters, i.e., parameters that must be set manually for training: the learning rate α, the embedding dimension k, the number of samples per batch batch_size, and the margin γ. The effect of each parameter and its candidate values are as follows:

① Learning rate α: if the learning rate is too large, the loss may fail to converge; if it is too small, the model takes longer to converge. The candidate values in this application are {0.001, 0.01, 0.1};

② Embedding dimension k: if the dimension is too low, the representation capacity is insufficient; if it is too high, the model overfits easily. The candidate values are {20, 50, 100, 150};

③ Batch size batch_size: if the batch size is too large, the program may crash because memory is limited; if it is too small, convergence becomes harder, because parameters estimated from too few samples are not representative, which hurts generalization. The candidate values are {64,128,256,;

④ Margin γ: the candidate values are {0.25, 0.5, 1, 2, 3}.

S23. Construction of negative samples

Since the urban traffic knowledge graph contains only correct triples, incorrect triples must be constructed as negative samples for training, validating and evaluating the model. The negative samples in this application are constructed as follows:

S231. Bernoulli negative sampling

Because relations are of different kinds, when a negative triple is constructed from a triple with a given relation, the side with fewer entities should have a higher probability of being selected for replacement. For all triples with a given relation r, the following quantities are counted:

① the average number of tail entities associated with each head entity, denoted tpt;

② the average number of head entities associated with each tail entity, denoted hpt.

Figure BDA0003673931880000151

The random variable X takes only the two values 0 and 1, with the corresponding probabilities:

Figure BDA0003673931880000152

Figure BDA0003673931880000153

The construction of the final negative sample follows a Bernoulli distribution with parameter p; the distribution law of the random variable X is:

$P(X = x) = p^{x}(1 - p)^{1 - x}, \quad x = 0, 1$

For a correct triple with relation r, the head entity is selected with probability p and the tail entity with probability 1 - p; the entity with the higher probability is replaced to construct the negative triple.
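The statistics described in S231 can be sketched as follows. Because the formula for p is given only as an image, this sketch assumes the convention of the TransH paper, p = tph / (tph + hpt), where tph is the average number of tails per head (denoted tpt in the text above). The function names and the toy triples are hypothetical, and the sketch draws the side at random, which is one possible reading of the selection rule.

```python
import random
from collections import defaultdict


def bernoulli_head_probability(triples, relation):
    """Return p, the probability of replacing the head, for one relation.

    Assumption: p = tph / (tph + hpt), where tph is the average number of
    tails per head and hpt is the average number of heads per tail.
    """
    tails_of = defaultdict(set)
    heads_of = defaultdict(set)
    for h, r, t in triples:
        if r == relation:
            tails_of[h].add(t)
            heads_of[t].add(h)
    if not tails_of:
        raise ValueError(f"no triples with relation {relation!r}")
    tph = sum(len(ts) for ts in tails_of.values()) / len(tails_of)
    hpt = sum(len(hs) for hs in heads_of.values()) / len(heads_of)
    return tph / (tph + hpt)


def choose_side(p, rng=random):
    """Draw X ~ Bernoulli(p): 'head' when X = 1, otherwise 'tail'."""
    return "head" if rng.random() < p else "tail"


# Hypothetical toy data: one line serves three stations, another serves one.
toy = [
    ("line1", "serves", "A"),
    ("line1", "serves", "B"),
    ("line1", "serves", "C"),
    ("line2", "serves", "A"),
]
p = bernoulli_head_probability(toy, "serves")
```

In this one-to-many toy relation there are fewer heads than tails, so p exceeds 0.5 and the head side is the one more likely to be replaced.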

S232. Relation type constraint

The type constraint of a relation is expressed by defining the entity types that the relation should connect. Using this prior knowledge of the relation type, the relation determines which entities may be used for replacement. The following variables are defined:

① $\mathrm{domain}_r$: the ordered index of all entities within the domain constraint of relation type r;

② $\mathrm{range}_r$: the ordered index of all entities within the range constraint of relation type r.

For all triples with a given relation r, when a negative triple is constructed, the probability of selecting the head or the tail entity for replacement is computed from the Bernoulli distribution. If the head entity is selected, the replacement is drawn from the subset of entities in the domain of that relation type; if the tail entity is selected, it is drawn from the subset of entities in the range of that relation type, as shown in the following formula:

Figure BDA0003673931880000154

where Δ′ is the set of constructed negative triples; $d_r$ is the ordered index of all entities within the domain constraint of relation type r; $r_r$ is the ordered index of all entities within the range constraint of relation type r; h and t are the head and tail entities of the triple; h′ and t′ are the head and tail entities of the negative triple; and r is the relation.
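A type-constrained replacement of this kind might look like the sketch below. The function name `corrupt`, the dictionary layout of `domain` and `range_`, and the example entities are all hypothetical; only the relation name time_speed comes from the embodiment later in the description.

```python
import random


def corrupt(triple, p_head, domain, range_, rng=random):
    """Build one negative triple under a relation type constraint.

    domain and range_ map each relation to the list of entities allowed as
    its head and as its tail respectively (domain_r and range_r in the text).
    Returns None when no other entity of the required type exists.
    """
    h, r, t = triple
    if rng.random() < p_head:
        # Replace the head with another entity from the relation's domain.
        candidates = [e for e in domain[r] if e != h]
        if not candidates:
            return None
        return (rng.choice(candidates), r, t)
    # Otherwise replace the tail with another entity from the relation's range.
    candidates = [e for e in range_[r] if e != t]
    if not candidates:
        return None
    return (h, r, rng.choice(candidates))


# Hypothetical entities for the time_speed relation.
domain = {"time_speed": ["07:00", "07:15"]}
range_ = {"time_speed": ["35km/h", "40km/h", "45km/h"]}
positive = ("07:15", "time_speed", "35km/h")
negative = corrupt(positive, 0.5, domain, range_, random.Random(0))
```

Restricting the candidates to the relation's domain or range means a time is only ever swapped for another time, and a speed for another speed.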

In the prior art, negative triples are constructed as follows: for any triple (h, r, t), an entity is drawn at random from the set E of all entities and substituted for the head or tail entity of the original triple, yielding an incorrect triple. However, because relations may be one-to-many, many-to-one or many-to-many, this random sampling introduces many erroneous negative samples, i.e. false negatives (triples that appear both among the correct triples and among the negative triples). With Bernoulli negative sampling, this application replaces the side with fewer entities when constructing a negative triple for a given relation, which greatly reduces the number of false negatives among the negative samples.

In addition, the relation type constraint raises the probability that an entity of the same type is drawn to replace the original one when a negative sample is constructed. This helps to pull apart entities of the same type, i.e. to increase the difference between their vector representations. Letting the relation decide, from prior knowledge, which entities are used for replacement can significantly improve the prediction accuracy of the model.

Even when negative samples are constructed with the relation type constraint, a small number of false negatives cannot be avoided. This application therefore also discloses a method for filtering false negatives out of the negative samples: the triple data (i.e. the triples in the training, validation and test sets) and the negative triples in the negative samples are imported into a relational database, for example PostgreSQL; the query function of the relational database is used to find the duplicate data, and the duplicates are removed from the negative samples. Here, duplicate data means data that exists both in the triple data and in the negative triple data.

The filtering operation avoids interference from such negative samples during model training, validation and prediction, and further improves the accuracy of the model.
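The filtering step can be illustrated with a small relational query. The sketch below uses Python's built-in SQLite module as a stand-in for PostgreSQL, purely so that it runs without a database server; the table names `pos` and `neg` and the function name are hypothetical.

```python
import sqlite3


def filter_false_negatives(positives, negatives):
    """Remove from `negatives` every triple that also occurs in `positives`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pos (h TEXT, r TEXT, t TEXT)")
    conn.execute("CREATE TABLE neg (h TEXT, r TEXT, t TEXT)")
    conn.executemany("INSERT INTO pos VALUES (?, ?, ?)", positives)
    conn.executemany("INSERT INTO neg VALUES (?, ?, ?)", negatives)
    # A negative triple that matches a correct triple is a false negative.
    conn.execute(
        """
        DELETE FROM neg
        WHERE EXISTS (
            SELECT 1 FROM pos
            WHERE pos.h = neg.h AND pos.r = neg.r AND pos.t = neg.t
        )
        """
    )
    rows = conn.execute("SELECT h, r, t FROM neg ORDER BY rowid").fetchall()
    conn.close()
    return rows


kept = filter_false_negatives(
    positives=[("a", "r", "b"), ("a", "r", "c")],
    negatives=[("a", "r", "b"), ("a", "r", "d"), ("e", "r", "c")],
)
```

For data that fits in memory, a set difference in application code would give the same result; the database variant follows the approach named in the description.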

S3. Training of the knowledge inference model

The training set divided in S21 and the model parameters set in S22 are fed into the inference model constructed in S1 for training. The negative sample construction method of S23 is applied to the triples in the training set to build the training-set negative samples, which are fed into the knowledge inference model together with the training set.

During training, mini-batch stochastic gradient descent (Mini-batch Gradient Descent) can be used to update the parameters and minimize the loss function. The model iteratively updates the vector representations until the loss value converges or the maximum number of training iterations is reached. After training, the embedding representations of entities and relations are obtained. The training process is shown in Figure 12.

An unsuitable manually set learning rate harms the learning result: if it is too large, the model may fail to converge and the loss keeps oscillating; if it is too small, the model converges slowly and needs a long training time. When updating the parameters with mini-batch stochastic gradient descent, this application uses the AdaDelta method to adapt the learning rate during training, which prevents an unsuitable manually set learning rate from affecting the learning result.
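To show how AdaDelta adapts the step size without a hand-tuned learning rate, the sketch below applies the standard AdaDelta update rule to a one-dimensional toy objective. The objective f(x) = x^2 merely stands in for the model loss, and the decay rate 0.95 and epsilon 1e-6 are commonly used defaults, not values taken from this application.

```python
import math


def adadelta_minimize(grad, x0, steps=100, rho=0.95, eps=1e-6):
    """Minimize a scalar function with AdaDelta, given its gradient."""
    x = x0
    avg_sq_grad = 0.0  # running average of squared gradients
    avg_sq_step = 0.0  # running average of squared parameter updates
    history = [x]
    for _ in range(steps):
        g = grad(x)
        avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g * g
        # The ratio of the two running RMS values plays the role of the learning rate.
        step = -math.sqrt(avg_sq_step + eps) / math.sqrt(avg_sq_grad + eps) * g
        avg_sq_step = rho * avg_sq_step + (1 - rho) * step * step
        x += step
        history.append(x)
    return history


# Toy objective f(x) = x^2 with gradient 2x.
trajectory = adadelta_minimize(lambda x: 2 * x, x0=1.0)
```

In practice a deep learning framework's built-in AdaDelta optimizer would be applied to the embedding parameters of each mini-batch rather than this hand-written loop.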

S4. Validation of the knowledge inference model

The validation set divided in S2 is fed into the knowledge inference model trained in S3 for preliminary validation and evaluation, and the hyperparameters are adjusted according to the validation results.

S5. Evaluation of the knowledge inference model

This application verifies the effectiveness of the model through the link prediction task and selects several evaluation metrics to assess the overall capability of the inference model.

S51. Link prediction

Link prediction is the task of predicting an entity that has a specific relation with a given entity. For a triple (h, r, t), it means predicting the head entity h given the relation r and the tail entity t, written (?, r, t); or predicting the tail entity t given the head entity h and the relation r, written (h, r, ?); or predicting the relation r given the head entity h and the tail entity t, written (h, ?, t).

S52. Constructing and ranking the test-set negative samples

The negative sample construction method and the filtering operation of S23 are applied to the triples in the test set to build the test-set negative samples. Taking tail entity prediction as an example, the relation type determines which entities of the entity set are chosen to replace the tail entity. After filtering, the score function of the trained knowledge inference model is used to compute the score of the triple and the scores of the negative triples constructed from that triple and the entities in the knowledge graph, and the triple and its negative triples are ranked by score from largest to smallest.

Rankings for head entity prediction and for predicting the relation between entities can be obtained in the same way.
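The ranking step for tail prediction can be sketched as follows. The function name, the candidate list and the toy score table are hypothetical; the sketch assumes that a larger score means a more plausible triple, in line with the descending ranking described above.

```python
def filtered_tail_rank(score, h, r, t, candidates, known_triples):
    """Rank of the true tail t among candidate tails, best score first.

    score(h, r, t) is assumed to return larger values for more plausible
    triples. Candidates that form another known correct triple are skipped,
    which corresponds to the filtering operation.
    """
    true_score = score(h, r, t)
    rank = 1
    for c in candidates:
        if c == t or (h, r, c) in known_triples:
            continue
        if score(h, r, c) > true_score:
            rank += 1
    return rank


# Hypothetical scores for four candidate tails of the query (h, r, ?).
toy_scores = {"a": 0.80, "b": 0.90, "c": 0.50, "d": 0.85}


def toy_score(h, r, t):
    return toy_scores[t]


known = {("h", "r", "a"), ("h", "r", "b")}
rank_filtered = filtered_tail_rank(toy_score, "h", "r", "a", list(toy_scores), known)
rank_raw = filtered_tail_rank(toy_score, "h", "r", "a", list(toy_scores), {("h", "r", "a")})
```

Here candidate "b" outscores the true tail "a" but is itself a correct triple, so filtering lifts the rank of "a" from 3 to 2.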

S53. Evaluation metrics

This application may use one or more of mean rank (MR), mean reciprocal rank (MRR), hits@1, hits@3 and hits@10 to measure how well the link prediction task is completed.

① Mean rank (MR): the mean rank is the rank position of the correct test-set triples among the sets of test-set negative triples obtained by replacing their head or tail entities; the higher the correct triples are ranked, the better the model.

② Mean reciprocal rank (MRR): the mean reciprocal rank reflects the overall ranking of all test-set triples within their constructed lists of test-set negative triples, as shown in the following formula.

Figure BDA0003673931880000171

where T is the set of all triples in the test set, |T| is the number of triples in T, and $\mathrm{rank}(h, r_i, t)$ is the rank of the correct triple.

③ Hits@1: the proportion of test-set triples that are ranked first, out of all test-set triples, as shown in the following formula.

Figure BDA0003673931880000172

where Ind(x) is the indicator function, which checks whether the correct triple (h, r, t) is ranked first: it outputs 1 if so and 0 otherwise.

Hits@3 and hits@10 are defined analogously. For example, hits@10 is the proportion of test-set triples ranked within the top ten, out of all test-set triples, as shown in the following formula.

Figure BDA0003673931880000173

The higher the hits@10 value, the better the model.

In summary, the procedure for evaluating the model with the link prediction task is shown in Figure 13.
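Given the rank of each correct test triple, the metrics of S53 reduce to a few averages. The sketch below uses the standard definitions of MR, MRR and hits@k; the function name and the example ranks are hypothetical.

```python
def link_prediction_metrics(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR and hits@k from the ranks of the correct triples."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,                          # mean rank, lower is better
        "MRR": sum(1.0 / rank for rank in ranks) / n,  # mean reciprocal rank
    }
    for k in ks:
        # Share of test triples whose correct answer is ranked within the top k.
        metrics[f"hits@{k}"] = sum(1 for rank in ranks if rank <= k) / n
    return metrics


example = link_prediction_metrics([1, 2, 10, 50])
```

A single badly ranked triple (rank 50 here) dominates MR, whereas MRR and hits@k are far less sensitive to such outliers, which is one reason to report several metrics together.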

To verify how the choice of knowledge inference model, the filtering operation and the relation type constraint improve the prediction results, this application compares the prediction results with and without filtering, with and without the relation type constraint, and across several inference models. The results are as follows:

(1) Effect of the filtering operation on the prediction results

Table 9. Comparison of prediction results with and without the filtering operation

Figure BDA0003673931880000174

Figure BDA0003673931880000181

As shown in Table 9, filtering improves the mean reciprocal rank (MRR) by 30% and hits@10 by 10.2% compared with no filtering; hits@3 and hits@1 also improve. This shows that the filtering operation effectively improves the prediction accuracy of the model.

(2) Effect of the relation type constraint on the prediction results

With the filtering operation applied, the prediction results obtained with and without the relation type constraint during negative triple construction are compared. With the relation type constraint, prior knowledge of the relation type is used and the relation determines which entities are used for replacement when a negative triple is constructed. Without it, the replacement entity is drawn at random from all entities. The comparison is shown in Table 10.

Table 10. Comparison of prediction results with and without the relation type constraint

Figure BDA0003673931880000182

As shown in Table 10, with the relation type constraint the mean rank (MR) of the correct triple is clearly better than without it, for both head and tail entity prediction, and the same holds for the other evaluation metrics. This shows that the relation type constraint improves the model on the link prediction task.

(3) Effect of different knowledge inference models on the prediction results

This application compares the TransD model with the classic inference models TransE and TransH, all with the filtering operation and the relation type constraint. The results of the three models on the entity link prediction task are shown in Tables 11 and 12.

The TransE model represents a relation as a translation between the head and tail entity vectors. The TransH model represents a relation as a translation of the entity vectors on a relation-specific hyperplane. The TransD model uses dynamic mapping matrices to project the entity vectors into the relation vector space and treats the relation vector as a translation between the projected entity vectors. The optimal hyperparameter settings of the different models are shown in Table 13.

Table 13. Hyperparameter values of the different models

Figure BDA0003673931880000191

This application plots how the loss function value of the TransD model decreases during training until it converges, as shown in Figure 14; the horizontal axis is the number of training iterations and the vertical axis is the loss function value.
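For orientation, the TransD projection and score can be sketched in a few lines. This follows the formulation in the original TransD paper (mapping matrix r_p e_p^T + I, score equal to the negative squared distance), which is an assumption here because the patent's own formulas are given only as images; the function names and example vectors are hypothetical.

```python
def project(e, e_p, r_p):
    """TransD projection (r_p e_p^T + I^{m x n}) e, computed without the matrix.

    e and e_p are the entity vector and its mapping vector (dimension n);
    r_p is the relation mapping vector (dimension m).
    """
    m, n = len(r_p), len(e)
    dot = sum(a * b for a, b in zip(e_p, e))
    # I^{m x n} e keeps the first min(m, n) coordinates of e and zero-pads the rest.
    padded = [e[i] if i < n else 0.0 for i in range(m)]
    return [dot * r_p[i] + padded[i] for i in range(m)]


def transd_score(h, h_p, r, r_p, t, t_p):
    """Negative squared distance between h_proj + r and t_proj (higher is better)."""
    h_proj = project(h, h_p, r_p)
    t_proj = project(t, t_p, r_p)
    return -sum((a + b - c) ** 2 for a, b, c in zip(h_proj, r, t_proj))


# With zero mapping vectors the projection is the identity, as in TransE.
perfect = transd_score([1.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [2.0, 1.0], [0.0, 0.0])
worse = transd_score([1.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0])
```

Setting the mapping vectors to zero reduces the projection to the identity, which shows how TransD generalizes the plain translation of TransE.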

On the public transport travel knowledge graph dataset constructed in this application, the performance and accuracy of the different inference models are compared on the link prediction task using the evaluation metrics. Table 11 shows the results of head entity prediction and Table 12 the results of tail entity prediction.

Table 11. Results of head entity prediction

Figure BDA0003673931880000192

Table 12. Results of tail entity prediction

Figure BDA0003673931880000193

Taking MRR and hits@10 as representative metrics, bar charts are drawn, as shown in Figures 15 and 16, which allow a direct comparison of the prediction results of the different models. The TransD model performs best and outperforms the other two inference models on all evaluation metrics. For example, the MRR of the TransD model is 6.6% higher than that of the TransH model for head entity prediction and 22.4% higher for tail entity prediction.

In summary, by choosing the TransD model as the knowledge inference model, constructing negative samples with the relation type constraint, and removing false negatives from the negative samples by filtering, this application effectively improves the accuracy of the prediction results. A knowledge inference model trained with the method of this application predicts better, so the entities and relations inferred when completing the urban traffic knowledge graph are more accurate.

This application also discloses an apparatus for constructing an urban traffic knowledge graph, comprising an ontology construction module, a data acquisition module, a storage module and an inference module.

The ontology construction module constructs the urban traffic ontology with an ontology construction tool to form the schema layer of the knowledge graph. The data acquisition module acquires urban traffic data, extracts traffic entities, attributes and relations between entities, and constructs the data layer of the knowledge graph. The storage module combines the schema layer and the data layer into the urban traffic knowledge graph and stores it in a database. The inference module uses the knowledge representation model to infer new knowledge in the urban traffic knowledge graph and adds it to the graph.

This application also discloses the use of any of the above methods for constructing an urban traffic knowledge graph in the field of urban traffic. For example, the construction method can be used to predict the traffic state so that travel times can be planned sensibly; the knowledge graph can be used to query the shortest-distance path between subway stations; and it can be used to query the companions of commuters based on travel-chain similarity.

Example 1

The method of this application is used below to predict the speed of a road near a primary and secondary school at 7:15 after the start of the school term, in order to verify whether the knowledge inference model can complete the speed value of a road at a given moment.

Comparing the actual average road speed before and after the start of term shows that, after term starts, the drop in average road speed during the morning peak begins about half an hour earlier, as shown in Figure 17. After the start of term, the average road speed falls from 45 km/h to 40 km/h from 7:00 in the morning, and by 7:15 it has actually dropped to 35 km/h.

The method of this application is now used to predict the speed of this road at 7:15 after the start of term. The entities and relations contained in the road traffic knowledge graph are shown in Figure 18: the school is located on the road, and there is a relation path between the road and the speed values. One road and its associated entities are selected as the experimental dataset, with 53 entities and 523 relation pairs, split into a training set of 417, a validation set of 53 and a test set of 53. For the relation between time and speed (time_speed), time is taken as the head entity and the speed value is predicted as the tail entity. Table 14 below lists the five top-ranked entities for tail entity prediction; the correct tail entity is shown in bold.

Table 14. Top five entities in tail entity prediction

Figure BDA0003673931880000201

As shown in Table 14, the top-ranked tail entity predicted by the method of this application is the correct tail entity. This shows that the predictions of the knowledge inference model are fairly accurate and that the model can be used to complete the speed value of a road at a given moment.

The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art may still modify the technical solutions described in those embodiments or make equivalent substitutions for some or all of their technical features; such modifications or substitutions do not take the essence of the corresponding technical solutions outside the scope of the technical solutions of the embodiments of the invention, and they shall all be covered by the claims and the description of the invention.

Claims (10)

1. A method for constructing an urban traffic knowledge graph, characterized by comprising the following steps:
constructing an urban traffic ontology with an ontology construction tool to form a knowledge graph schema layer;
acquiring urban traffic data, extracting entities, attributes and relations between entities, and constructing a knowledge graph data layer;
combining the knowledge graph schema layer with the knowledge graph data layer to generate the urban traffic knowledge graph, and storing the urban traffic knowledge graph in a database;
and inferring new knowledge in the urban traffic knowledge graph with a knowledge representation model, and adding it to the urban traffic knowledge graph.
2. The construction method according to claim 1, characterized in that the urban traffic ontology is constructed as follows:
constructing the urban traffic ontology by the seven-step method, and performing a quality evaluation before creating instances;
the quality evaluation specifically comprises:
verifying, by drawing a tree structure diagram, whether transitivity holds in the class hierarchy;
checking whether the scope and the expression of each ontology are consistent wherever it is used, and whether classes or ontologies are redundant;
checking whether the description of each attribute is complete, whether the attribute constraints are logical, and whether the attributes are shareable;
checking the extensibility of the ontology;
checking the completeness, uniqueness and logical consistency of the relations between classes.
3. The construction method according to claim 1, characterized in that inferring new knowledge in the urban traffic knowledge graph with the knowledge representation model specifically comprises:
dividing the triple data in the urban traffic knowledge graph into a training set, a validation set and a test set;
constructing negative samples of the triple data, and filtering out the false negatives in the negative samples;
setting the hyperparameters of the knowledge representation model;
training the knowledge representation model with the training set and the negative samples by mini-batch stochastic gradient descent, and adapting the learning rate during training with the AdaDelta method;
tuning the hyperparameters of the trained knowledge representation model with the validation set and the negative samples;
evaluating the trained knowledge representation model with the test set and the negative samples;
and mining implicit relations and missing entities of the urban traffic knowledge graph with the trained knowledge representation model, and adding them to the urban traffic knowledge graph.
4. The construction method according to claim 3, characterized in that
the negative samples of the triple data are constructed as follows:
for triples with a given relation in the training set, validation set or test set, calculating from the Bernoulli distribution the probability of selecting the head entity or the tail entity for replacement, and replacing the entity with the higher probability;
according to the relation type constraint, the relation determines which entities are used for replacement, as shown in the following formula:
Figure FDA0003673931870000011
where Δ′ is the set of constructed negative triples; $d_r$ is the ordered index of all entities within the domain constraint of relation type r; $r_r$ is the ordered index of all entities within the range constraint of relation type r; h and t are the head and tail entities of the triple; h′ and t′ are the head and tail entities of the negative triple; and r is the relation.
5. The construction method according to claim 3, characterized in that the false negatives in the negative samples are filtered out as follows:
importing the triple data and the negative triple data of the negative samples into a relational database, finding the duplicate data with the query function of the relational database, and removing the duplicate data from the negative samples;
wherein the duplicate data is data that exists both in the triple data and in the negative triple data.
6. The construction method according to claim 3, characterized in that
the knowledge representation model is a TransD model, and the TransD knowledge representation model is as follows:
mapping matrices:
Figure FDA0003673931870000021
Figure FDA0003673931870000022
where $M_{rh}$ is the mapping matrix of the head entity, $M_{rt}$ is the mapping matrix of the tail entity, $r_p$ is the relation vector,
Figure FDA0003673931870000023
is the mapping vector of the head entity,
Figure FDA0003673931870000024
is the mapping vector of the tail entity, and $I^{m \times n}$ is the identity matrix;
projecting the entity vectors into the relation space:
$h_{\perp} = M_{rh} h$
$t_{\perp} = M_{rt} t$
where $h_{\perp}$ is the head entity vector after mapping by $M_{rh}$, and $t_{\perp}$ is the tail entity vector after mapping by $M_{rt}$;
the score function:
Figure FDA0003673931870000025
the loss function:
Figure FDA0003673931870000026
where γ is a hyperparameter denoting the maximum margin between correct and negative triples, $[x]_+ = \max(0, x)$, Δ is the set of correct triples, and Δ′ is the set of constructed negative triples.
7. The construction method according to claim 6, characterized in that
evaluating the trained knowledge representation model with the test set and the negative samples specifically comprises:
for any triple in the test set, calculating, with the score function of the trained knowledge inference model, the score of the triple and the scores of the negative triples constructed from the triple and the entities in the knowledge graph, and ranking the triple and the negative triples by score from largest to smallest;
and measuring how well the link prediction task is completed with one or more of the evaluation metrics mean rank, mean reciprocal rank, hits@1, hits@3 and hits@10.
8. The construction method according to claim 1, characterized in that
the urban traffic comprises public transport and road traffic, and a public transport knowledge graph and a road traffic knowledge graph are constructed for them respectively;
the public transport data is acquired by obtaining the subway line and station information of a target city with web crawler technology, and obtaining the subway card-swiping data of the subway lines within a target time period;
the road traffic data is acquired by obtaining target road network data from a map database, and obtaining target point-of-interest information and traffic situation data on a target road through a map API.
9. A construction device for an urban traffic knowledge map, characterized by comprising:
an ontology construction module, used for constructing the urban traffic ontology with an ontology construction tool to form a knowledge map schema layer;
a data acquisition module, used for acquiring urban traffic data, extracting traffic entities, attributes and relationships among the entities, and constructing a knowledge map data layer;
a storage module, used for combining the knowledge map schema layer with the knowledge map data layer to generate an urban traffic knowledge map and storing the urban traffic knowledge map in a database;
and a reasoning module, used for inferring new knowledge in the urban traffic knowledge map with the knowledge representation model and supplementing the urban traffic knowledge map.
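The way the four modules fit together can be sketched roughly as follows. This is an in-memory illustration only, not the claimed device. It assumes the first layer lists the allowed relations with their head and tail classes, the data layer supplies typed entities and triples, storage keeps the triples that conform, and reasoning adds candidate triples that a scoring function rates highly. All class names, function names and sample entities are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class SchemaLayer:
    # relation name -> (class of head entity, class of tail entity)
    relations: Dict[str, Tuple[str, str]]


@dataclass
class DataLayer:
    entity_class: Dict[str, str]  # entity name -> ontology class
    triples: List[Triple]


def build_graph(schema: SchemaLayer, data: DataLayer) -> Set[Triple]:
    """Keep only triples whose relation and entity classes match the schema."""
    graph: Set[Triple] = set()
    for h, r, t in data.triples:
        expected = schema.relations.get(r)
        actual = (data.entity_class.get(h), data.entity_class.get(t))
        if expected == actual:
            graph.add((h, r, t))
    return graph


def complete_graph(graph: Set[Triple], candidates: Iterable[Triple],
                   score: Callable[[Triple], float], threshold: float) -> Set[Triple]:
    """Add candidate triples whose score reaches the threshold."""
    return graph | {c for c in candidates if score(c) >= threshold}


schema = SchemaLayer({"located_on": ("Station", "Line")})
data = DataLayer(
    entity_class={"Station A": "Station", "Station B": "Station", "Line 1": "Line"},
    triples=[("Station A", "located_on", "Line 1"),
             ("Line 1", "located_on", "Station A")],  # violates the schema
)
graph = build_graph(schema, data)
graph = complete_graph(graph, [("Station B", "located_on", "Line 1")],
                       score=lambda triple: 0.9, threshold=0.5)
```

In practice the storage step would write to a graph database and the scoring function would come from the trained knowledge representation model. Both are replaced here by stand-ins.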
10. Use of the urban traffic knowledge map construction method according to any one of claims 1 to 8 in the field of urban traffic.
CN202210617739.1A 2022-06-01 2022-06-01 Construction method, construction device and application of urban traffic knowledge map Pending CN114969263A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210617739.1A CN114969263A (en) 2022-06-01 2022-06-01 Construction method, construction device and application of urban traffic knowledge map

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210617739.1A CN114969263A (en) 2022-06-01 2022-06-01 Construction method, construction device and application of urban traffic knowledge map

Publications (1)

Publication Number Publication Date
CN114969263A true CN114969263A (en) 2022-08-30

Family

ID=82960512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210617739.1A Pending CN114969263A (en) 2022-06-01 2022-06-01 Construction method, construction device and application of urban traffic knowledge map

Country Status (1)

Country Link
CN (1) CN114969263A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115269931A (en) * 2022-09-28 2022-11-01 深圳技术大学 Rail transit station data map system based on service drive and construction method thereof
CN115269931B (en) * 2022-09-28 2022-11-29 深圳技术大学 Rail transit station data map system based on service drive and construction method thereof
CN116796006A (en) * 2023-07-07 2023-09-22 北京华录高诚科技有限公司 Public transport travel crowd image analysis method and system based on knowledge graph
CN116796006B (en) * 2023-07-07 2024-01-23 北京华录高诚科技有限公司 Public transport travel crowd image analysis method and system based on knowledge graph
CN118228812A (en) * 2024-05-24 2024-06-21 水利部交通运输部国家能源局南京水利科学研究院 Intelligent water conservancy-oriented AI knowledge base construction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination