CN114969263A - Construction method, construction device and application of urban traffic knowledge map - Google Patents

Publication number
CN114969263A
Authority
CN
China
Prior art keywords
knowledge
data
traffic
urban traffic
knowledge map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210617739.1A
Other languages
Chinese (zh)
Inventor
谭墍元
邱倩倩
罗文秀
郭伟伟
薛晴婉
郑国荣
张帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Technology
Original Assignee
North China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Technology filed Critical North China University of Technology
Priority to CN202210617739.1A priority Critical patent/CN114969263A/en
Publication of CN114969263A publication Critical patent/CN114969263A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/40 Business processes related to the transportation industry
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A30/00 Adapting or protecting infrastructure or their operation
    • Y02A30/60 Planning or developing urban green infrastructure

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for constructing an urban traffic knowledge graph, which comprises the following steps: constructing an urban traffic ontology by using an ontology construction tool to form a knowledge graph schema layer; acquiring urban traffic data, extracting entities, attributes and the relationships among entities, and constructing a knowledge graph data layer; combining the knowledge graph schema layer with the knowledge graph data layer to generate an urban traffic knowledge graph, and storing the urban traffic knowledge graph in a database. The invention further discloses a construction device and an application of the urban traffic knowledge graph. By using the knowledge graph to form a traffic knowledge system, the invention integrates multi-source heterogeneous traffic big data and mines the latent relationships among traffic entities with a knowledge reasoning model based on representation learning, thereby effectively fusing and organizing multi-source travel data in the traffic field and enabling open sharing of traffic field data.

Description

Construction method, construction device and application of urban traffic knowledge map
Technical Field
The invention belongs to the technical field of intelligent traffic, and particularly relates to a construction method, a construction device and application of an urban traffic knowledge graph.
Background
Urban traffic is strongly coupled: people, vehicles, roads and the environment must be coordinated to achieve integrated management and control. Massive traffic data is correlated in space and time, but multi-source data is insufficiently fused. A knowledge graph can be used to fuse traffic data, because it organizes data in a structured form that describes real-world entities, their attributes, and the association relationships between pairs of entities. However, the large-scale knowledge bases constructed so far suffer from data sparsity despite their size.
Therefore, a better construction method of the urban traffic knowledge graph is urgently needed to solve the problem of data sparseness in the knowledge graph.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention discloses a method for constructing an urban traffic knowledge graph, which can accurately infer new knowledge. The specific technical scheme is as follows:
A construction method of an urban traffic knowledge graph comprises the following steps:
constructing an urban traffic ontology by using an ontology construction tool to form a knowledge graph schema layer;
acquiring urban traffic data, extracting entities, attributes and the relationships among entities, and constructing a knowledge graph data layer;
combining the knowledge graph schema layer with the knowledge graph data layer to generate an urban traffic knowledge graph, and storing the urban traffic knowledge graph in a database;
and inferring new knowledge in the urban traffic knowledge graph by using a knowledge representation model, and supplementing the urban traffic knowledge graph.
Further, the method for constructing the urban traffic ontology comprises the following steps:
constructing the urban traffic ontology by the seven-step method, and performing quality evaluation before creating instances;
the quality evaluation specifically comprises:
verifying whether the transitivity of the class hierarchy holds by drawing a tree structure diagram;
checking whether the scope of application and the mode of expression of each ontology are consistent wherever it is used, and whether redundant classes or ontologies appear;
checking whether the description information of each attribute is complete, whether the attribute constraints are logical, and whether the attributes are shareable;
checking the extensibility of the ontology;
checking the completeness, uniqueness and logical consistency of the relationships between the classes.
Further, the specific method for inferring new knowledge in the urban traffic knowledge graph by using the knowledge representation model comprises the following steps:
dividing the triple data in the urban traffic knowledge graph into a training set, a validation set and a test set;
constructing negative samples of the triple data, and filtering out false negatives from the negative samples;
setting the hyperparameters of the knowledge representation model;
training the knowledge representation model by mini-batch stochastic gradient descent using the training set and the negative samples, and adaptively adjusting the learning rate during training by the AdaDelta method;
tuning the hyperparameters of the trained knowledge representation model using the validation set and the negative samples;
evaluating the trained knowledge representation model using the test set and the negative samples;
and mining implicit relations and missing entities of the urban traffic knowledge graph by using the trained knowledge representation model, and supplementing the urban traffic knowledge graph.
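The first of the steps above, dividing the triples into three sets, can be sketched as follows. This is a minimal illustration only: the 8:1:1 ratio, the fixed seed and the function name `split_triples` are assumptions, since the text does not state how the split is made.

```python
import random

def split_triples(triples, train=0.8, valid=0.1, seed=0):
    """Shuffle (head, relation, tail) triples and cut them into three sets.

    The ratio and the seed are illustrative assumptions, not values
    taken from the source text.
    """
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    train_set = shuffled[:n_train]
    valid_set = shuffled[n_train:n_train + n_valid]
    test_set = shuffled[n_train + n_valid:]   # whatever remains
    return train_set, valid_set, test_set
```

Every triple ends up in exactly one of the three sets, so the test set never overlaps the data used for training or tuning.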
Further, the method for constructing the negative samples of the triple data comprises the following steps:
for the triples of a given relation in the training set, validation set or test set, calculating, according to a Bernoulli distribution, the probability of selecting the head entity or the tail entity for replacement, and replacing the entity with the higher probability;
according to the relation type constraint, letting the relation decide which entities may be used for replacement, as shown in the following formula:
Figure BDA0003673931880000021
where Δ' is the set of constructed negative triples; $d_r$ is the ordered index of all entities that satisfy the domain constraint of relation type r; $r_r$ is the ordered index of all entities that satisfy the range constraint of relation type r; h is the head entity of a triple, h' is the head entity of a negative triple, t is the tail entity of a triple, t' is the tail entity of a negative triple, and r is the relation.
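The two ideas above can be sketched together in a few lines. This is a hedged illustration, not the patented procedure: the head-replacement probability tph / (tph + hpt) is the commonly used Bernoulli scheme and is assumed here, and the names `bernoulli_head_prob`, `corrupt`, `domain` and `range_` are hypothetical.

```python
import random
from collections import defaultdict

def bernoulli_head_prob(triples):
    """Per relation, probability of replacing the head: tph / (tph + hpt).

    tph = average number of tails per head, hpt = average number of heads
    per tail. This particular formula is an assumption.
    """
    tails_per_head = defaultdict(set)   # (relation, head) -> tails
    heads_per_tail = defaultdict(set)   # (relation, tail) -> heads
    for h, r, t in triples:
        tails_per_head[(r, h)].add(t)
        heads_per_tail[(r, t)].add(h)
    prob = {}
    for rel in {r for _, r, _ in triples}:
        tph_counts = [len(v) for (r, _), v in tails_per_head.items() if r == rel]
        hpt_counts = [len(v) for (r, _), v in heads_per_tail.items() if r == rel]
        tph = sum(tph_counts) / len(tph_counts)
        hpt = sum(hpt_counts) / len(hpt_counts)
        prob[rel] = tph / (tph + hpt)
    return prob

def corrupt(triple, head_prob, domain, range_, rng):
    """Build one negative triple under the relation type constraint.

    domain[r] and range_[r] list the entities allowed as head and tail of
    relation r. Returns None if no replacement candidate exists.
    """
    h, r, t = triple
    if rng.random() < head_prob[r]:
        candidates = [e for e in domain[r] if e != h]
        return (rng.choice(candidates), r, t) if candidates else None
    candidates = [e for e in range_[r] if e != t]
    return (h, r, rng.choice(candidates)) if candidates else None
```

Restricting the candidates to `domain[r]` or `range_[r]` means a point of interest is only ever swapped for another point of interest, and a road for another road, which is the effect the relation type constraint is meant to have.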
Further, the specific method for filtering out false negatives from the negative samples comprises:
importing the triple data and the negative triple data of the negative samples into a relational database, finding duplicated data with a query in the relational database, and removing the duplicated data from the negative samples;
wherein the duplicated data is data that exists in both the triple data and the negative triple data.
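The filtering step can be sketched with an in-memory SQLite database standing in for the relational database. The table names `pos` and `neg` and the function name are illustrative assumptions; the text does not give the actual query.

```python
import sqlite3

def filter_false_negatives(positives, negatives):
    """Remove from `negatives` every triple that also exists in `positives`.

    Returns (kept_negatives, removed_duplicates). SQLite is used here only
    as a convenient stand-in for the relational database.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pos (h TEXT, r TEXT, t TEXT)")
    conn.execute("CREATE TABLE neg (h TEXT, r TEXT, t TEXT)")
    conn.executemany("INSERT INTO pos VALUES (?, ?, ?)", positives)
    conn.executemany("INSERT INTO neg VALUES (?, ?, ?)", negatives)
    # Triples present in both tables are false negatives.
    duplicates = conn.execute(
        "SELECT n.h, n.r, n.t FROM neg AS n "
        "JOIN pos AS p ON n.h = p.h AND n.r = p.r AND n.t = p.t "
        "ORDER BY n.rowid"
    ).fetchall()
    conn.execute(
        "DELETE FROM neg WHERE EXISTS ("
        "SELECT 1 FROM pos "
        "WHERE pos.h = neg.h AND pos.r = neg.r AND pos.t = neg.t)"
    )
    kept = conn.execute("SELECT h, r, t FROM neg ORDER BY rowid").fetchall()
    conn.close()
    return kept, duplicates
```

A negative triple that coincides with a known fact would otherwise push a true statement away during training, which is why it is removed before the model sees it.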
Further, the knowledge representation model is a TransD model, and the TransD knowledge representation model is as follows:
mapping matrices:
$$M_{rh} = r_p h_p^{\top} + I^{m \times n}$$
$$M_{rt} = r_p t_p^{\top} + I^{m \times n}$$
wherein $M_{rh}$ is the mapping matrix for the head entity, $M_{rt}$ is the mapping matrix for the tail entity, $r_p$ is the relation mapping vector, $h_p$ is the head entity mapping vector, $t_p$ is the tail entity mapping vector, and $I^{m \times n}$ is an identity matrix;
projecting the entity vectors into the relation space:
$$h_{\perp} = M_{rh} h$$
$$t_{\perp} = M_{rt} t$$
wherein $h_{\perp}$ is the head entity vector mapped by $M_{rh}$, and $t_{\perp}$ is the tail entity vector mapped by $M_{rt}$;
the score function:
$$f_r(h, t) = -\lVert h_{\perp} + r - t_{\perp} \rVert_2^2$$
the loss function:
$$L = \sum_{\xi \in \Delta} \sum_{\xi' \in \Delta'} \left[ \gamma + f_r(\xi') - f_r(\xi) \right]_+$$
where γ is a hyperparameter indicating the margin (maximum separation) between correct and negative triples, $[x]_+ = \max(0, x)$, ξ and ξ' denote a correct triple and a negative triple respectively, Δ represents the set of correct triples, and Δ' represents the set of constructed negative triples.
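As a minimal sketch of the projection, score and margin loss, assuming the standard TransD form in which the mapping matrix is $r_p e_p^{\top} + I^{m \times n}$ and the score is the negated squared L2 distance, the computation can be written without building the matrix explicitly. The function names are illustrative.

```python
def project(entity, entity_p, relation_p):
    """Return (r_p e_p^T + I^{m x n}) e for an n-dim entity and m-dim relation.

    (r_p e_p^T) e equals r_p scaled by the dot product e_p . e, so the
    m x n matrix never has to be materialized.
    """
    m, n = len(relation_p), len(entity)
    dot = sum(a * b for a, b in zip(entity_p, entity))
    # I^{m x n} e keeps the first min(m, n) components and zero-pads the rest.
    padded = [entity[i] if i < n else 0.0 for i in range(m)]
    return [relation_p[i] * dot + padded[i] for i in range(m)]

def score(h, h_p, t, t_p, r, r_p):
    """Negated squared L2 distance; a larger score means a more plausible triple."""
    h_perp = project(h, h_p, r_p)
    t_perp = project(t, t_p, r_p)
    return -sum((a + b - c) ** 2 for a, b, c in zip(h_perp, r, t_perp))

def margin_loss(pos_score, neg_score, gamma):
    """Hinge term [gamma + f(negative) - f(positive)]_+ for one triple pair."""
    return max(0.0, gamma + neg_score - pos_score)
```

The loss term is zero once the correct triple scores at least γ higher than its negative counterpart, so only pairs that violate the margin contribute gradients.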
Further, the specific method for evaluating the trained knowledge representation model by using the test set and the negative samples comprises the following steps:
for any triple in the test set, calculating, with the score function of the trained knowledge representation model, the score of the triple and the scores of the negative triples constructed from that triple and the entities in the knowledge graph, and ranking the triple and the negative triples by score from large to small;
and measuring the performance on the link prediction task by one or more of the following evaluation indexes: mean rank (MR), mean reciprocal rank (MRR), Hits@1, Hits@3 and Hits@10.
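The ranking and the five indexes can be computed as in the sketch below. It assumes that a larger score is better, as the ranking rule above implies, and that ties are resolved in favour of the true triple; the function names are illustrative.

```python
def rank_of_true(true_score, negative_scores):
    """Rank of the true triple when all scores are sorted from large to small."""
    return 1 + sum(1 for s in negative_scores if s > true_score)

def link_prediction_metrics(ranks, ks=(1, 3, 10)):
    """Mean rank, mean reciprocal rank and Hits@k over a list of ranks."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
    }
    for k in ks:
        metrics[f"Hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics
```

A lower MR and a higher MRR or Hits@k indicate better link prediction; MRR is less sensitive than MR to a few very poorly ranked triples.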
Further,
the urban traffic comprises public transport and road traffic, and a public transport knowledge graph and a road traffic knowledge graph are constructed for them respectively;
the method for acquiring public transport data comprises acquiring the subway line and station information of a target city by web crawler technology, and acquiring the subway card swiping data of the subway lines within a target time period;
the method for acquiring road traffic data comprises acquiring target road network data from a map database, and acquiring target point-of-interest information and traffic situation data on target roads by using a map API.
The invention also discloses a device for constructing the urban traffic knowledge graph, which comprises the following modules:
the ontology construction module is used for constructing the urban traffic ontology by using the ontology construction tool to form a knowledge graph schema layer;
the data acquisition module is used for acquiring urban traffic data, extracting traffic entities, attributes and the relationships among entities, and constructing a knowledge graph data layer;
the storage module is used for combining the knowledge graph schema layer with the knowledge graph data layer to generate an urban traffic knowledge graph and storing the urban traffic knowledge graph in a database;
and the reasoning module is used for inferring new knowledge in the urban traffic knowledge graph by using the knowledge representation model and supplementing the urban traffic knowledge graph.
The invention also discloses an application of the urban traffic knowledge graph construction method in the field of urban traffic.
By adopting the technical scheme, the invention has the beneficial effects that:
By using the knowledge graph to form a traffic knowledge system, the invention integrates multi-source heterogeneous traffic big data, models the spatio-temporal relationships among traffic entities, and mines the latent relationships among traffic entities with a knowledge reasoning model based on representation learning, thereby effectively fusing and organizing multi-source travel data in the traffic field, forming a knowledge network of the traffic field and enabling open sharing of traffic field data.
When the ontology is constructed, a quality evaluation step is added to the original seven-step method. The quality evaluation strictly controls the quality of the ontology added to the knowledge base, and evaluating the ontology in terms of structural richness, logical relations and similar aspects makes the standardized ontology reusable. This ensures the accuracy and validity of the ontology construction, and in turn improves accuracy when the knowledge graph built on the ontology is later used to infer new knowledge.
By using the relation type constraint, the invention raises the probability that an entity of the same type is drawn to replace the original entity of a triple when negative samples are constructed. This helps enlarge the distance between entities of the same type, that is, it increases the difference between their vector representations. Because prior knowledge of the relation is used and the relation decides which entities are used for replacement, the prediction accuracy of the knowledge reasoning model can be significantly improved.
Drawings
FIG. 1 is a flow chart of urban traffic knowledge base construction according to an embodiment of the present application;
FIG. 2 is a flow chart of an embodiment of the present application for building an urban traffic ontology;
FIG. 3 is a class hierarchy diagram of an urban traffic ontology according to an embodiment of the present application;
FIG. 4 is a diagram illustrating the relationship between classes of an urban traffic ontology according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a road network information visualization according to an embodiment of the present application;
FIG. 6 is a schematic view of a road traffic situation information visualization according to an embodiment of the present application;
FIG. 7 is a diagram illustrating inter-class relationships of a mass transit ontology according to an embodiment of the present application;
FIG. 8 is a diagram illustrating the relationships between a user and a trip, and between a trip and a station, in public transport according to an embodiment of the present application;
FIG. 9 is an entity and relationship diagram of a road traffic knowledge-graph according to an embodiment of the present application;
FIG. 10 is a diagram illustrating a relationship between roads and traffic situations according to an embodiment of the present application;
FIG. 11 is a diagram illustrating a TransD model according to one embodiment of the present application;
FIG. 12 is a diagram of a model training process according to an embodiment of the present application;
FIG. 13 is a flow diagram of link prediction task evaluation according to an embodiment of the present application;
FIG. 14 is a graph of the variation of the loss values in the model training according to an embodiment of the present application;
FIG. 15 is a comparison graph of mean reciprocal rank results for various inference models in accordance with an embodiment of the present application;
FIG. 16 is a comparison of top ten hit results for various inference models in accordance with an embodiment of the present application;
FIG. 17 is a graph of the average speed of a street during the morning rush hour before and after the start of school according to one embodiment of the present application;
FIG. 18 is a schematic view of a road traffic knowledge map according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, the invention discloses a method for constructing an urban traffic knowledge graph, which comprises the following steps:
constructing an urban traffic ontology by using an ontology construction tool to form a knowledge graph schema layer;
acquiring urban traffic data, extracting entities, attributes and the relationships among entities, and constructing a knowledge graph data layer;
combining the knowledge graph schema layer with the knowledge graph data layer to generate an urban traffic knowledge graph, and storing the urban traffic knowledge graph in a database;
and inferring new knowledge in the urban traffic knowledge graph by using a knowledge representation model, and supplementing the urban traffic knowledge graph.
By using the knowledge graph to form a traffic knowledge system, the invention integrates multi-source heterogeneous traffic big data, models the spatio-temporal relationships among traffic entities, and mines the latent relationships among traffic entities with a knowledge reasoning model based on representation learning, thereby effectively fusing and organizing multi-source travel data in the traffic field, forming a knowledge network of the traffic field and enabling open sharing of traffic field data.
To facilitate understanding of the technical scheme of the invention, the construction method is further explained below.
Firstly, an urban traffic ontology (namely an urban traffic travel ontology) is constructed by using an ontology construction tool to form the knowledge graph schema layer.
Constructing the ontology of the urban traffic travel field means comprehensively considering the data resources available for specific service scenarios, taking into account the standardization of domain terms and the broad applicability of concept categories, abstracting the knowledge hierarchy of the traffic field, and defining the classes contained in the ontology, the attributes of the classes and the association relationships among the classes.
The ontology defines the schema of the knowledge graph and describes its top-level structure, so constructing the ontology makes it possible to analyze the system or hierarchy of the domain knowledge and to reuse the domain knowledge.
As shown in FIG. 2, in the present application the ontology is constructed using the seven-step method published by Stanford University; the ontology may be constructed manually, and a quality evaluation part is added on the basis of the seven-step method. The specific ontology construction flow is as follows:
the first step: determining the domain of the ontology and the purpose of the domain ontology; the ontology of the present application relates to the field of urban traffic;
the second step: checking whether a reusable ontology exists by investigating whether a related ontology has already been constructed, and if so, importing it directly to save construction cost and time;
the third step: listing the important terms, ensuring that the list is comprehensive;
the fourth step: defining the classes and their hierarchy; the present application adopts a top-down method, namely defining the broadest concepts in the domain first and then refining them, as shown in FIG. 3;
the fifth step: determining the attributes of the classes. Classes alone cannot provide sufficient information, so attributes need to be defined to further describe them. Defining the attributes of a class also includes defining the relationships between classes: in the urban traffic ontology, a relationship links two classes. For example, a point of interest is located on a certain road, so the relationship between the point of interest and the urban road is "located on". FIG. 4 shows the relationships among all classes in the urban traffic ontology.
It should also be noted that classes have inheritance: a subclass inherits the attributes of its parent class. Attributes should therefore be placed in the broadest class possible, the closer to the top level the better, so that subclasses can inherit them. Table 1 gives specific examples of several classes and attributes;
TABLE 1
Figure BDA0003673931880000051
Figure BDA0003673931880000061
the sixth step: defining attribute constraints, such as restricting the type of attribute values and the range of the number of attribute values;
the seventh step: creating specific instances of the classes after the quality evaluation.
The quality evaluation follows the principles of clarity, consistency, extensibility, simplicity and independent shareability. The specific quality evaluation method comprises the following steps:
(1) evaluating the class hierarchy: root nodes and child nodes are identified by drawing a tree structure diagram, in which each subtree corresponds to an independent, modular piece of domain knowledge; this ensures that the hierarchy is clear, checks whether the transitivity of the hierarchy holds, and avoids cycles in the class hierarchy;
(2) evaluating the ontologies: confirming that each ontology is expressed clearly, and checking whether the scope of application and the mode of expression of each ontology are consistent wherever it is used, including the consistency of relational logic, whether any class is defined more than once, and whether any ontologies need to be merged, so as to avoid redundancy;
(3) evaluating the defined attributes and attribute constraints: checking whether the description information of each attribute is complete and whether the attribute constraints are logical, and evaluating the shareability of the attributes, which should be widely applicable to several classes rather than limited to a single class;
(4) checking the extensibility of the ontology, ensuring that it can be continuously and flexibly refined and updated as classes and attributes are added and modified;
(5) checking the logical consistency of the relationships among the classes; evaluating the completeness of the relationships among the classes, namely checking that the ontology covers all possible relationships among the classes; and checking the uniqueness of the relationships among the classes, namely checking that each relationship between classes is unique.
As shown in FIG. 2, if the ontology fails the quality evaluation, the flow returns to the third step and the ontology is constructed again until it passes the quality evaluation, after which specific instances are created.
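Check (1) can be partly automated. The sketch below is an illustration under stated assumptions, not part of the described method: it represents the class hierarchy as a child-to-parent mapping, detects cycles, and lists all ancestors of a class so that transitivity can be inspected. The class names used with it are hypothetical.

```python
def find_cycle(parent_of):
    """Return a class that lies on a cycle of the hierarchy, or None.

    parent_of maps each class name to its direct superclass
    (None for a root class).
    """
    for start in parent_of:
        seen = set()
        node = start
        while node is not None:
            if node in seen:
                return node
            seen.add(node)
            node = parent_of.get(node)
    return None

def ancestors(cls, parent_of):
    """All superclasses of cls, nearest first (stops if a cycle is met)."""
    chain = []
    node = parent_of.get(cls)
    while node is not None and node not in chain:
        chain.append(node)
        node = parent_of.get(node)
    return chain

# Hypothetical fragment of a class hierarchy, for illustration only.
hierarchy = {"Thing": None, "Road": "Thing", "Expressway": "Road"}
```

With this fragment, `ancestors("Expressway", hierarchy)` lists both `Road` and `Thing`, which is the transitivity the tree diagram is meant to confirm, and `find_cycle(hierarchy)` returns `None` because the hierarchy is a proper tree.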
When the ontology is constructed, a quality evaluation step is added to the original seven-step method. The quality evaluation strictly controls the quality of the ontology added to the knowledge base, and evaluating the ontology in terms of structural richness, logical relations and similar aspects makes the standardized ontology reusable. This ensures the accuracy and validity of the ontology construction, and in turn improves accuracy when the knowledge graph built on the ontology is later used to infer new knowledge.
The ontology construction tool may be Protégé 5.5.0; an ontology built with this software allows the classes, attributes and relationships to be viewed visually.
Secondly, urban traffic data is acquired, traffic entities, attributes and the relationships among entities are extracted, and the knowledge graph data layer is constructed.
Urban traffic comprises public transport and urban road traffic, so the urban traffic data to be acquired comprises public transport data (namely public transport travel data) and road traffic data (namely road traffic travel data). Because the card swiping data acquired for public transport and the traffic situation data acquired for road traffic differ in time and space, a public transport data layer and a road traffic data layer need to be constructed separately, and a public transport knowledge graph and a road traffic knowledge graph are likewise constructed separately.
S1, constructing a public transport data layer: acquiring public transport travel data;
The method for acquiring the public transport travel data comprises acquiring the subway line and station information of a target city by web crawler technology, and acquiring and preprocessing the subway card swiping data of the subway lines within a target time period.
The specific method comprises the following steps:
S11, acquiring subway line and station information;
The data of the subway stations, subway lines and the like of the target city is acquired by web crawler technology. A web crawler locates a website by its unique address (URL) and automatically captures and downloads target information from it. The specific operation method comprises the following steps:
Step1, first constructing the parameters to be passed, which mainly comprise a key, a city code and a city name, URL-encoding the parameters, and sending a request to the target HTTP interface;
Step2, receiving the response returned by the HTTP request and parsing the returned data, whose format is JSON;
Step3, storing the parsed data in a PostgreSQL database. As shown in Table 2, the stored data comprises the names and numbers of the lines and stations, the sequence numbers and the longitude and latitude of the stations, whether a station is a transfer station, the lines passing through each station, and the like.
TABLE 2 Stored line and station information
Figure BDA0003673931880000071
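Step1 and Step2 above can be sketched with the Python standard library. Everything specific in this sketch is a placeholder assumption: the endpoint URL, the parameter names `key`, `citycode` and `cityname`, and the JSON layout are hypothetical, because the text does not name the interface or its response schema. The network call itself and the PostgreSQL insert are omitted.

```python
import json
from urllib.parse import urlencode

def build_request_url(base_url, key, city_code, city_name):
    """URL-encode the request parameters (Step1).

    The parameter names are hypothetical placeholders.
    """
    params = {"key": key, "citycode": city_code, "cityname": city_name}
    return f"{base_url}?{urlencode(params)}"

def parse_stations(payload):
    """Flatten a JSON response into one row per station (Step2).

    Assumes a hypothetical layout:
    {"lines": [{"name": ..., "stations": [{"name", "lon", "lat", "transfer"}]}]}
    """
    data = json.loads(payload)
    rows = []
    for line in data["lines"]:
        for st in line["stations"]:
            rows.append((line["name"], st["name"], st["lon"], st["lat"], st["transfer"]))
    return rows
```

The response text would normally come from an HTTP client such as `urllib.request.urlopen(url).read()`, and each returned row would then be written to the database in Step3.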
S12, acquiring and preprocessing subway AFC card swiping data;
Taking Shenzhen as the target city and 7:00 to 9:00 on each day from January 25, 2016 (Monday) to January 29, 2016 (Friday) as the target time, the method for acquiring and preprocessing subway card swiping data is described in detail below; the same method can be used to acquire and preprocess data for other cities.
The Shenzhen subway charges by mileage segment, so passengers need to swipe their cards both when entering and when leaving a station. By 2020, Shenzhen had 8 subway lines (Shenzhen subway lines 1 to 5, 7, 9 and 11) and 182 subway stations. The data in the present application is Shenzhen Tong (Shenzhen general card) swiping data covering January 25, 2016 (Monday) to January 29, 2016 (Friday), that is, the card swiping records of 5 working days in total. The data format is shown in Table 3.
TABLE 3 subway AFC card swiping data format
Figure BDA0003673931880000072
Figure BDA0003673931880000081
In the original subway AFC card swiping data, the entry and exit types (i.e. transaction types) are distinguished by the field COST_TYPE: COST_TYPE = 21 indicates an entry record, with the transaction in the started state, and COST_TYPE = 22 indicates an exit record, with the transaction in the completed state. Entry and exit records therefore need to appear in pairs.
(1) Screening target data: screening out the card swiping records in the morning rush hour (7:00 to 9:00), counting the number of swipes of each IC card in the morning rush hour of each day, and selecting the records with exactly 2 swipes (one entry and one exit).
(2) Eliminating abnormal data: removing records in which entry and exit are not paired, i.e. an entry without an exit or an exit without an entry; deleting records with a missing station; and deleting records whose entry and exit stations are the same or whose entry and exit times differ by more than 5 hours.
(3) Adding a trip ID: merging each pair of temporally adjacent entry and exit rows into one row that contains the start and end times and station names, and adding a trip ID column to distinguish multiple trips of a user within the week.
The format of the preprocessed subway card swiping data is shown in Table 4.
TABLE 4 preprocessed subway swipe data
Figure BDA0003673931880000082
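The three preprocessing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the applicant's implementation: apart from COST_TYPE, the field names (card_id, time, station) and the output column names are hypothetical because Table 3 is not reproduced here, and the sample records are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

ENTRY, EXIT = 21, 22  # COST_TYPE values described in the text


def preprocess(records):
    """Return one merged trip row per valid entry/exit pair in the morning peak."""
    # (1) keep morning-peak records (7:00 to 9:00) and group them by card and day
    groups = defaultdict(list)
    for r in records:
        if 7 <= r["time"].hour < 9:
            groups[(r["card_id"], r["time"].date())].append(r)

    trips = []
    for key in sorted(groups):
        recs = groups[key]
        if len(recs) != 2:  # exactly one entry and one exit are expected
            continue
        first, second = sorted(recs, key=lambda r: r["time"])
        # (2) drop unpaired, incomplete or implausible records
        if (first["COST_TYPE"], second["COST_TYPE"]) != (ENTRY, EXIT):
            continue
        if not first["station"] or not second["station"]:
            continue
        if first["station"] == second["station"]:
            continue
        if second["time"] - first["time"] > timedelta(hours=5):
            continue
        # (3) merge the pair into one row and add a trip ID
        trips.append({
            "trip_id": len(trips) + 1,
            "card_id": key[0],
            "start_time": first["time"],
            "end_time": second["time"],
            "origin": first["station"],
            "destination": second["station"],
        })
    return trips


# Made-up sample: card A has a valid pair, card B has an entry only.
sample = [
    {"card_id": "A", "time": datetime(2016, 1, 25, 7, 10), "station": "X", "COST_TYPE": 21},
    {"card_id": "A", "time": datetime(2016, 1, 25, 7, 50), "station": "Y", "COST_TYPE": 22},
    {"card_id": "B", "time": datetime(2016, 1, 25, 8, 0), "station": "X", "COST_TYPE": 21},
]
sample_trips = preprocess(sample)
```

Because the records are already restricted to a two-hour window, the 5-hour check rarely triggers here; it is kept only to mirror the rule stated in step (2).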
S2, constructing a road traffic data layer: acquiring and preprocessing road traffic travel data;
The road traffic travel data are acquired by obtaining the target road network data from a map database and obtaining the target point-of-interest information and the traffic situation data on the target roads through a map API.
The specific steps are as follows:
S21, acquiring target road network data;
Step1, taking the area within the Fifth Ring Road of Beijing as the target area, road data for this area are obtained from a map database, for example by downloading the OpenStreetMap open-source map data for the area from the BBBike website. The BBBike website supports multiple export formats (such as OSM, Shapefile and GeoJSON) and lets the user customize the extent of the downloaded map area, so it is the preferred site for downloading the target road network data; the download site and map database named here are exemplary and not limiting.
Step2, OSM2GMNS is used to convert the downloaded target road network into road network data that meet the GMNS standard. GMNS stands for General Modeling Network Specification, which defines a flexible and unified format for representing multimodal traffic networks. OSM2GMNS also provides a function for simplifying intersections. The output files comprise road nodes (node.csv) and road links, i.e. connecting arcs (link.csv); the main fields are described in Table 5 and Table 6.
TABLE 5 field description of road nodes
Figure BDA0003673931880000091
TABLE 6 field description of road junction arcs
Figure BDA0003673931880000092
The node type (OSM_HIGHWAY) includes intersections with highways, intersections controlled by traffic signals and intersections not controlled by traffic signals; the road link class (LINK_TYPE_NAME) is divided into highway, main road, secondary main road, branch road and residential road.
Step3, the data are imported into a relational database and displayed visually in QGIS.
As shown in FIG. 5, taking the intersection of Yuquan Road and Shijingshan Road as an example, the dotted lines in the figure are road links, the dots are road nodes, and there is one link between every two nodes.
S22, acquiring interest points of the target road;
A point of interest (POI) generally refers to a point in the real world that has practical significance, such as a facility or building related to people's daily life, for example a parking lot, school or hospital. POI data generally include basic attributes such as name, type, address, and longitude and latitude.
Points of interest can be obtained through the POI search interface of a map API. This application explains the acquisition process by taking as an example the use of the Gaode Map (Amap) API to obtain information on primary and middle schools within the Fifth Ring Road of Beijing.
Step1, determine the query area: draw the boundary of the Fifth Ring Road of Beijing in QGIS, export it in GeoJSON format, and process it into a csv file of boundary coordinate pairs with longitude in one column and latitude in the other.
Step2, because each request returns at most 1000 POI records, the large area needs to be divided into multiple small grid cells. The cell size is set to 10 km by 10 km, the boundary coordinates are mapped onto the grid, and the lower-left vertex coordinates of each cell in the area are obtained.
Step3, determine the codes of the POI types to be queried (141202 for middle schools and 141203 for primary schools). Then obtain the POI information in each cell in turn; the request parameters are the API key, the vertex coordinate pairs and the queried POI type code, and the parameters are URL-encoded before the request is sent to the target HTTP interface.
Step4, the data returned in JSON format are parsed and stored in the database. There are 598 primary schools and 442 middle schools within the Fifth Ring Road (a school may have several campuses, i.e. one school name may correspond to multiple POI records). The POI information fields are described in Table 7.
TABLE 7 POI information field description
Figure BDA0003673931880000101
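The grid division in Step2 can be sketched as below. This is a simplified illustration, not the procedure of the application: it covers a rectangular bounding box instead of the exact ring-road boundary, it converts kilometres to degrees with an approximate constant, and the bounding-box coordinates used in the example are hypothetical.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def grid_lower_left_corners(min_lon, min_lat, max_lon, max_lat, cell_km):
    """Return the lower-left (lon, lat) corner of every cell covering the box."""
    mid_lat = (min_lat + max_lat) / 2
    dlat = cell_km / KM_PER_DEG_LAT
    # a degree of longitude shrinks with the cosine of the latitude
    dlon = cell_km / (KM_PER_DEG_LAT * math.cos(math.radians(mid_lat)))
    n_rows = math.ceil((max_lat - min_lat) / dlat)
    n_cols = math.ceil((max_lon - min_lon) / dlon)
    return [
        (min_lon + col * dlon, min_lat + row * dlat)
        for row in range(n_rows)
        for col in range(n_cols)
    ]


# Hypothetical bounding box roughly around central Beijing, 10 km cells.
corners = grid_lower_left_corners(116.20, 39.75, 116.55, 40.03, 10)
```

Each returned corner, together with the cell size, defines one rectangular query; cells that lie entirely outside the real boundary could be dropped in a further step.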
S23, acquiring traffic situation data of the target road;
This application explains the steps for acquiring the traffic situation data of the target roads by taking as an example a study of the association between students going to and from school and the road traffic situation:
According to Beijing's arrangement for the start of the 2020 autumn semester, primary, junior middle and senior high schools began to reopen on August 29, and all grades of primary, junior middle and senior high schools were in session from September 7. In Beijing, primary school students usually arrive at school later than 7:50 and are dismissed earlier than 16:30; middle school students arrive later than 7:30 and are dismissed earlier than 17:30.
To study the association between students and the road traffic situation, traffic situation data were collected for a week before the 2020 school start, August 24 (Monday) to August 28 (Friday), and for a week after the school start, September 21 (Monday) to September 25 (Friday). The daily collection periods were 6:30 to 10:30 (4 hours) and 16:00 to 21:00 (5 hours), and the data collected were the traffic situation data of roads within the Fifth Ring Road of Beijing.
The traffic situation data are acquired through the traffic situation interface of the Gaode Map (Amap) API. The method is similar to that used for POIs: data are queried by rectangular area, but the diagonal of the rectangle must be shorter than 10 kilometers, so the large area needs to be divided into multiple small grid cells to get around this limit on the query area.
Step1, determine the query area: draw the boundary of the Fifth Ring Road of Beijing in QGIS, export the data and process it into a csv file of boundary coordinate pairs with longitude in one column and latitude in the other.
Step2, divide the area into multiple small grid cells of 7 km by 7 km, map the boundary coordinates onto the grid, and obtain the lower-left corner coordinates of each cell in the area.
Step3, obtain the traffic situation data of the roads in each cell in turn. The request parameters are the API key, the road grade, the vertex coordinate pairs of the rectangular area and the format of the returned data; the parameters are URL-encoded before the request is sent to the target HTTP interface. Because the traffic situation service of the Gaode Map API limits individual developers to 2000 calls per day and blocks calls beyond that, once a single key has been used for 2000 requests the program switches to the next key and continues sending requests.
Step4, the road condition information is updated every 2 minutes; the returned JSON data are parsed and stored in the database. As shown in Table 8, the returned result includes fields such as road name, direction description, driving angle, road condition and speed (km/h).
TABLE 8 traffic situation information field description
Figure BDA0003673931880000111
The driving angle (ANGLE) reflects the driving direction of vehicles on the road: due east is taken as zero and counterclockwise rotation is positive, with a value range of [0, 360]. The traffic state (STATUS) reflects the traffic state of a road, where 0 represents an unknown state, 1 smooth traffic, 2 slow-moving traffic, 3 congestion and 4 severe congestion.
FIG. 6 shows the traffic situation data at a certain time visualized in QGIS. It can be seen that the roads for which traffic situation data are returned within the Fifth Ring Road of Beijing are mainly urban expressways and main roads.
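The key-switching rule from Step3 of the traffic situation acquisition can be sketched as follows. The class name KeyPool and its interface are hypothetical, the key strings are placeholders, and no real API call is made; only the 2000-call quota and the 7 km cell size come from the text.

```python
import math

DAILY_QUOTA = 2000  # per-key daily call limit stated in the text


class KeyPool:
    """Hands out API keys, moving to the next key once one reaches its quota."""

    def __init__(self, keys, quota=DAILY_QUOTA):
        self.keys = list(keys)
        self.quota = quota
        self.used = 0   # requests made with the current key
        self.index = 0  # position of the current key

    def next_key(self):
        if self.used >= self.quota:
            self.index += 1
            self.used = 0
        if self.index >= len(self.keys):
            raise RuntimeError("all keys have reached their daily quota")
        self.used += 1
        return self.keys[self.index]


# A 7 km by 7 km cell has a diagonal of about 9.9 km, under the 10 km limit.
cell_diagonal_km = math.hypot(7, 7)
```

Calling next_key() before every request keeps the bookkeeping in one place; a production crawler would also need to reset the counters each day, which is omitted here.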
S24, preprocessing road traffic travel data:
(1) Unifying the data sampling interval: real-time road traffic situation data are collected with web crawler technology. Data on the Gaode Map platform are updated every 2 minutes, but because of unstable network connections the interval between returned traffic situation records varies between 4 and 7 minutes. For convenience of subsequent analysis, the sampling interval needs to be fixed at 5 minutes.
(2) Data matching and screening: because the roads returned by the traffic situation interface are mostly expressways and main roads, the road network data (road network nodes and road network links) that have traffic situation data are selected by matching road names against the road network link data, and invalid and redundant data are removed to improve data quality.
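The text does not state how the irregular 4 to 7 minute records are brought onto a fixed 5-minute interval. One simple option, shown here purely as an assumption, is to take for each 5-minute slot the record nearest in time; the function name and the sample speeds are hypothetical.

```python
from datetime import datetime, timedelta


def resample_nearest(samples, start, end, step=timedelta(minutes=5)):
    """Return one (slot, value) pair per fixed slot using the nearest record.

    samples is a non-empty list of (timestamp, value) pairs.
    """
    out = []
    slot = start
    while slot <= end:
        nearest = min(samples, key=lambda s: abs(s[0] - slot))
        out.append((slot, nearest[1]))
        slot += step
    return out


# Made-up speed records (km/h) with irregular spacing.
raw = [
    (datetime(2020, 9, 21, 6, 30), 40),
    (datetime(2020, 9, 21, 6, 34), 38),
    (datetime(2020, 9, 21, 6, 41), 30),
    (datetime(2020, 9, 21, 6, 46), 28),
]
fixed = resample_nearest(raw, datetime(2020, 9, 21, 6, 30), datetime(2020, 9, 21, 6, 45))
```

Interpolating between neighbouring records would be an equally plausible choice; the nearest-record rule is used only because it needs no assumption about how speed changes between samples.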
The knowledge graph mode layer is combined with the knowledge graph data layer to generate the urban traffic knowledge graph, which is stored in a database;
Because the public transport data layer and the urban road traffic data layer are constructed separately, owing to the spatio-temporal differences between card-swiping data and traffic situation data, a public transport knowledge graph and a road traffic knowledge graph also need to be generated separately and stored separately in the database; that is, the urban traffic knowledge graph comprises a public transport knowledge graph and a road traffic knowledge graph.
(1) Selecting a database;
The database used to store the knowledge graph may be a graph database or another storage database such as a relational database; a graph database is preferred, and may be chosen from Neo4j, TitanDB and the like. A graph database adopts the property graph model, in which a graph consists of nodes and edges and information is expressed intuitively through the graph structure. This data structure can effectively store and express the knowledge in the knowledge graph and the associations between entities: for example, nodes in the graph database represent entities and edges represent the relationships between entities. The knowledge graph can thus be stored more intuitively, so a graph database stores a knowledge graph better than other storage databases such as relational databases.
(2) Storing a public transport knowledge graph;
The knowledge graph mode layer is combined with the public transport knowledge graph data layer to generate the public transport knowledge graph, which is stored in the database;
FIG. 7 illustrates the inter-class relationships of the urban traffic ontology with respect to public transport. The classes in the ontology correspond to node labels in the graph database, and entities are stored as nodes in the knowledge graph. The inter-class relationships, i.e. the relationships between entities, fall into the following categories: the membership relationship between subway stations and lines (subway_station-following-subway_line), the adjacency relationship between subway station entities (subway_station-following-subway_line), the ownership relationship between public transport users and trips (user-has-trip), and the start and end relationships between trips and stations (trip-start_from/end_at-subway_start).
The same public transport user makes multiple trips in a week, so there is a one-to-many relationship between users and trips; as shown in FIG. 8, each trip has a start or end relationship with stations.
(3) Storing a road traffic knowledge map;
The knowledge graph mode layer is combined with the road traffic knowledge graph data layer to generate the road traffic knowledge graph, which is stored in the database;
Entities in the road traffic knowledge graph include the urban roads, points of interest and traffic situations of the domain ontology, plus entities that represent spatio-temporal relationship data; FIG. 9 shows the entities contained in the road traffic knowledge graph and the associations between them.
The association between a road and its traffic situation is represented by a multi-step relationship path. Date and time are the temporal attributes of the traffic situation: the road entity is first associated with a date entity, the date entity is then associated with a time entity, and finally the time entity is associated with the traffic situation entity, so the road entity is associated with the traffic situation through a three-step relationship path, as shown in FIG. 10.
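The three-step path from road to traffic situation can be illustrated with plain triples. The relationship names (has_date, has_time, has_situation), the node encoding and the road name are hypothetical; the application stores these in a graph database, whereas this sketch keeps them in a Python list only to show the path structure.

```python
def build_path(road, date, time, status):
    """Return the three triples linking a road to one traffic situation record."""
    road_node = ("road", road)
    date_node = ("date", road, date)
    time_node = ("time", road, date, time)
    situation_node = ("situation", road, date, time, status)
    return [
        (road_node, "has_date", date_node),
        (date_node, "has_time", time_node),
        (time_node, "has_situation", situation_node),
    ]


def follow(triples, start, relations):
    """Walk from start along the given relations and return the final node."""
    node = start
    for rel in relations:
        node = next(t for h, r, t in triples if h == node and r == rel)
    return node


path = build_path("Road A", "2020-09-21", "07:30", 3)
end_node = follow(path, ("road", "Road A"), ["has_date", "has_time", "has_situation"])
```

Here end_node carries STATUS 3 (congestion) as its last element, reached from the road in exactly three hops.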
Fourth, new knowledge is inferred with a knowledge representation model to supplement the knowledge graph.
Although the public transport knowledge graph and the road traffic knowledge graph constructed above fuse multi-source data, they still suffer from data sparsity, so the missing entities and latent relations in the knowledge graph need to be mined to supplement it.
Implicit relations in the urban traffic knowledge graph are mined with a knowledge inference model to complete the knowledge graph, and the effectiveness of the model is verified through a link prediction task on the urban traffic knowledge graph. The specific steps are as follows:
S1, constructing the knowledge inference model
Conventional knowledge inference models such as TransE and TransH can be used for knowledge inference and mining, but given the characteristics of knowledge inference in this application, the TransD model is preferred. Its main idea is to use dynamic mapping matrices constructed from projection vectors to encode entities as low-dimensional embedding vectors in the relation space, while taking into account that entities and relations have different types and attributes, as shown in FIG. 11.
In the TransD model, the first vector (h, r, t) represents the actual meaning of an entity or relation, and the second vector (h_p, r_p, t_p) is used to construct the mapping matrices. A mapping matrix is jointly determined by the projection vectors of the entity and the relation and maps the entity from the entity space into the relation vector space. Each mapping matrix is initialized with the identity matrix I, and the matrix multiplication is replaced by vector operations, which effectively reduces the amount of computation.
$$M_{rh} = r_p h_p^{\top} + I^{m \times n}$$
$$M_{rt} = r_p t_p^{\top} + I^{m \times n}$$
where $M_{rh}$ is the mapping matrix of the head entity, $M_{rt}$ is the mapping matrix of the tail entity, $r_p$ is the relation projection vector, $h_p$ is the mapping vector of the head entity, $t_p$ is the mapping vector of the tail entity, and $I^{m \times n}$ is the identity matrix;
The entity vectors are projected into the relation space as:
$$h_{\perp} = M_{rh} h$$
$$t_{\perp} = M_{rt} t$$
where $h_{\perp}$ is the head entity vector obtained by mapping the head entity with $M_{rh}$, and $t_{\perp}$ is the tail entity vector obtained by mapping the tail entity with $M_{rt}$;
The TransD model regards the relation vector r in the triple (h, r, t) as a translation in the relation space from the projection of the head entity vector h to the projection of the tail entity vector t; that is, the sum of the projected head entity vector and the relation vector is approximately equal to the projected tail entity vector, $h_{\perp} + r \approx t_{\perp}$. A score function based on the L2 (Euclidean) distance is therefore defined to measure the distance between these two vectors:
$$f_r(h, t) = -\lVert h_{\perp} + r - t_{\perp} \rVert_2^2$$
The model adds L2-norm constraints to the vectors, which restricts the model's parameters, helps avoid overfitting and gives the model strong generalization ability:
$$\lVert h \rVert_2 \le 1,\ \lVert t \rVert_2 \le 1,\ \lVert r \rVert_2 \le 1,\ \lVert h_{\perp} \rVert_2 \le 1,\ \lVert t_{\perp} \rVert_2 \le 1$$
According to the score function above, a correct triple is expected to have a larger score and an incorrect triple a smaller score. This application therefore defines a margin-based ranking loss function, and minimizing its value is the training objective of the model.
$$L = \sum_{(h, r, t) \in \Delta} \sum_{(h', r, t') \in \Delta'} \left[ \gamma + f_r(h', t') - f_r(h, t) \right]_+$$
where L is the loss function and γ is a hyper-parameter representing the margin (maximum separation) between correct and negative triples; $[x]_+ = \max(0, x)$; Δ is the set of correct triples and Δ' is the set of constructed negative triples.
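The remark that matrix multiplication can be replaced by vector operations can be checked numerically. The sketch below assumes the published TransD formulation, in which the mapping matrix is r_p e_p^T + I and the score is the negative squared L2 distance; the function names and the sample vectors are hypothetical, and plain Python lists are used instead of a tensor library.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(e, e_p, r_p):
    """Compute (r_p e_p^T + I) e without building the m x n matrix."""
    m, n = len(r_p), len(e)
    scale = dot(e_p, e)  # e_p^T e is a scalar
    padded = [e[i] if i < n else 0.0 for i in range(m)]  # I^{m x n} e
    return [scale * r_p[i] + padded[i] for i in range(m)]


def project_with_matrix(e, e_p, r_p):
    """Same projection, but with the mapping matrix built explicitly."""
    m, n = len(r_p), len(e)
    matrix = [
        [r_p[i] * e_p[j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(m)
    ]
    return [dot(row, e) for row in matrix]


def score(h, h_p, r, r_p, t, t_p):
    """Negative squared L2 distance between h_perp + r and t_perp."""
    h_proj = project(h, h_p, r_p)
    t_proj = project(t, t_p, r_p)
    return -sum((a + b - c) ** 2 for a, b, c in zip(h_proj, r, t_proj))


def margin_loss(pos_score, neg_score, gamma):
    """Hinge term [gamma + f(negative) - f(correct)]_+ for one pair."""
    return max(0.0, gamma + neg_score - pos_score)


# Entity dimension n = 2, relation dimension m = 3 (made-up values).
fast = project([1.0, 2.0], [0.5, -1.0], [0.2, 0.1, 0.3])
slow = project_with_matrix([1.0, 2.0], [0.5, -1.0], [0.2, 0.1, 0.3])
```

The vector form needs one dot product and one scaled addition per entity, whereas the explicit form first builds m times n matrix entries, which is the saving the text refers to.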
S2, dividing a data set, setting model parameters and constructing a negative sample;
S21, dividing the data set;
For example, when the knowledge graph is stored in the graph database Neo4j, the entity and relation data can be exported with Neo4j's APOC (Awesome Procedures On Cypher) plugin and preprocessed into csv format using the database's search and filtering functions. Entity data are stored as the entity name plus the corresponding entity id, relation data as the relation name plus the corresponding relation id, and triple data as head entity id plus tail entity id plus relation id. The triples are divided into a training set, a test set and a validation set according to a predetermined ratio chosen according to actual needs, for example 85% training set, 10% test set and 5% validation set.
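The 85% / 10% / 5% split can be sketched as follows. The function name, the fixed random seed and the integer triples are illustrative assumptions; the application does not say whether the triples are shuffled before splitting.

```python
import random


def split_triples(triples, ratios=(0.85, 0.10, 0.05), seed=0):
    """Shuffle the triples and cut them into training, test and validation sets."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_test = round(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]  # whatever remains, about 5%
    return train, test, valid


# Triples stored as (head id, tail id, relation id), as described above.
all_triples = [(i, i + 1, 0) for i in range(100)]
train_set, test_set, valid_set = split_triples(all_triples)
```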
S22, setting model parameters
The TransD model has many hyper-parameters, i.e. parameters that must be set manually for training. They include the learning rate α, the embedding dimension k, the number of samples per batch batch_size and the margin (interval) γ. The influence of each parameter on the model and its setting range are as follows:
① Learning rate α: if the learning rate is too large, the loss value may fail to converge; if it is too small, the training time required for convergence increases. The setting range in this application is {0.001, 0.01, 0.1};
② Embedding dimension k: an embedding dimension that is too low limits the model's representation capacity, while one that is too high easily leads to overfitting. The setting range is {20, 50, 100, 150};
③ Number of samples per batch batch_size: a batch size that is too large may crash the program because memory is limited, while one that is too small makes convergence harder, because parameters estimated from too few samples are not representative and this harms the generalization of the model. The setting range is {64, 128, 256};
④ Margin γ: the setting range is {0.25, 0.5, 1, 2, 3}.
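If the four setting ranges above were searched exhaustively, the candidate configurations could be enumerated as below. Whether the application actually uses an exhaustive grid search is not stated, so this is only an assumed illustration; the dictionary keys are hypothetical names.

```python
from itertools import product

SEARCH_SPACE = {
    "learning_rate": [0.001, 0.01, 0.1],
    "embedding_dim": [20, 50, 100, 150],
    "batch_size": [64, 128, 256],
    "margin": [0.25, 0.5, 1, 2, 3],
}


def candidate_configs(space=SEARCH_SPACE):
    """Yield every combination of hyper-parameter values as a dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(candidate_configs())
```

The ranges give 3 x 4 x 3 x 5 = 180 combinations, each of which would be trained and then compared on the validation set.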
S23 construction of negative sample
Because the urban traffic knowledge graph contains only correct triples, incorrect triples need to be constructed as negative samples to take part in the training, validation and evaluation of the model. The negative samples are constructed as follows:
S231, Bernoulli negative sampling
Considering the different relation types, when a negative triple is constructed for a triple with a given relation, the side with fewer entities should have a greater probability of being selected for replacement. Among all triples with a given relation r, the following quantities are computed:
① the average number of tail entities associated with each head entity, denoted tph;
② the average number of head entities associated with each tail entity, denoted hpt.
Figure BDA0003673931880000151
The random variable X takes only two values, 0 and 1, and the corresponding probability is:
Figure BDA0003673931880000152
Figure BDA0003673931880000153
Finally, the construction of negative samples follows a Bernoulli distribution with parameter p, and the distribution law of the random variable X is:
$$P(X = x) = p^{x} (1 - p)^{1 - x}, \quad x = 0, 1$$
For a correct triple with relation r, the head entity is selected with probability p and the tail entity with probability 1 - p; the entity with the higher probability is replaced, which yields a negative triple.
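Since the formulas for the probabilities are not reproduced in the text, the sketch below assumes the commonly used Bernoulli scheme in which the head is replaced with probability p = tph / (tph + hpt), where tph is the average number of tails per head and hpt the average number of heads per tail. The function names and the sample triples are hypothetical.

```python
import random
from collections import defaultdict


def head_replace_probability(triples, relation):
    """Return p = tph / (tph + hpt) for one relation (needs at least one triple)."""
    tails_of = defaultdict(set)
    heads_of = defaultdict(set)
    for h, r, t in triples:
        if r == relation:
            tails_of[h].add(t)
            heads_of[t].add(h)
    tph = sum(len(s) for s in tails_of.values()) / len(tails_of)
    hpt = sum(len(s) for s in heads_of.values()) / len(heads_of)
    return tph / (tph + hpt)


def corrupt(triple, entities, p_head, rng):
    """Replace the head with probability p_head, otherwise the tail."""
    h, r, t = triple
    if rng.random() < p_head:
        return (rng.choice([e for e in entities if e != h]), r, t)
    return (h, r, rng.choice([e for e in entities if e != t]))


# One-to-many example: user u1 has three trips, user u2 has one.
kg = [("u1", "has", "trip1"), ("u1", "has", "trip2"),
      ("u1", "has", "trip3"), ("u2", "has", "trip4")]
p = head_replace_probability(kg, "has")
negative = corrupt(kg[0], ["u1", "u2", "trip1", "trip2", "trip3", "trip4"], p, random.Random(0))
```

For this one-to-many relation tph = 2 and hpt = 1, so p = 2/3: the head side, which has fewer distinct entities, is replaced more often, as the text requires.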
S232, relation type constraint
The type constraint of a relation is expressed by defining the entity types that the relation should connect. Using prior knowledge of relation types, the relation determines which entities may be used for replacement. The following variables are defined:
① domain_r: the ordered indices of all entities that satisfy the domain constraint of relation type r;
② range_r: the ordered indices of all entities that satisfy the range constraint of relation type r.
For all triples with a given relation r, when a negative triple is constructed, the probability of selecting the head or the tail entity for replacement is computed according to the Bernoulli distribution. If the head entity is selected, the replacement is drawn from the subset of entities in the domain of the relation type; if the tail entity is selected, the replacement is drawn from the subset of entities in the range of the relation type, as shown in the following formula:
Figure BDA0003673931880000154
where Δ' is the set of constructed negative triples; domain_r is the ordered index of all entities that satisfy the domain constraint of relation type r; range_r is the ordered index of all entities that satisfy the range constraint of relation type r; h is the head entity of the triple, h' is the head entity of the negative triple, t is the tail entity of the triple, t' is the tail entity of the negative triple, and r is the relation.
In the prior art, negative triples are constructed as follows: for any triple (h, r, t), an entity is randomly drawn from the set E of all entities to replace the head or tail entity of the original triple, which yields an incorrect triple. However, because relation types include one-to-many, many-to-one and many-to-many, constructing negative samples by random sampling introduces many erroneous negative samples, namely false negatives (data present in both the triple set and the negative triple set). With Bernoulli negative sampling, when a negative triple is constructed for a triple with a given relation, the side with fewer entities is selected for replacement, which greatly reduces the number of false negatives in the negative samples.
Meanwhile, the relation type constraint increases the probability that an entity of the same type is drawn to replace the entity of the original triple when a negative sample is constructed. This helps enlarge the distance between entities of the same type, i.e. it increases the difference between their vector representations. Using prior knowledge of the relation, so that the relation determines which entities are used for replacement, can markedly improve the accuracy of model prediction.
Even when negative samples are constructed with the relation type constraint, a small number of false negatives still cannot be avoided. This application therefore also discloses a method for filtering false negatives out of the negative samples: the triple data (i.e. the triples in the training, validation and test sets) and the negative triple data in the negative samples are imported into a relational database, for example PostgreSQL; duplicate data are found with the query function of the relational database and removed from the negative samples. Here duplicate data are data present in both the triple data and the negative triple data.
This filtering operation avoids interference from false negative samples during model training, validation and prediction, and further improves the accuracy of the model.
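Type-constrained replacement and false-negative filtering can be sketched together. The application performs the filtering with queries in a relational database such as PostgreSQL; the sketch below swaps in an in-memory Python set purely for illustration, and the relation name belongs_to, the entity names and the function names are hypothetical.

```python
def type_constrained_candidates(triple, domain_r, range_r, replace_head):
    """Build negative triples whose replacement respects the relation's types."""
    h, r, t = triple
    if replace_head:
        return [(e, r, t) for e in domain_r if e != h]
    return [(h, r, e) for e in range_r if e != t]


def filter_false_negatives(negatives, known_triples):
    """Drop negatives that also occur among the known correct triples."""
    known = set(known_triples)
    return [n for n in negatives if n not in known]


# A transfer station s1 belongs to two lines, so (s1, belongs_to, line2) is true.
known = [("s1", "belongs_to", "line1"), ("s1", "belongs_to", "line2")]
stations = ["s1", "s2"]              # domain of belongs_to
lines = ["line1", "line2", "line3"]  # range of belongs_to

raw_negatives = type_constrained_candidates(known[0], stations, lines, replace_head=False)
clean_negatives = filter_false_negatives(raw_negatives, known)
```

Replacing the tail within the range never produces a nonsensical triple such as (s1, belongs_to, s2), and the filter then removes the remaining false negative (s1, belongs_to, line2).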
S3, training the knowledge inference model
The training set divided in S21 and the model parameters set in S22 are fed into the inference model constructed in S1. Negative samples for the training set are constructed from the triples in the training set with the negative sample construction method of S23, and the training set together with its negative samples is used to train the knowledge inference model.
During training, the parameters can be updated by mini-batch stochastic gradient descent to minimize the loss function. The model updates the vector representations iteratively until its loss value converges or the maximum number of training iterations is reached. After training, the embedded representations of the entities and relations are obtained; the model training process is shown in FIG. 12.
During training, an improperly chosen manual learning rate harms the learning effect: a learning rate that is too large may prevent the model from converging, with the loss value oscillating continuously, while one that is too small slows convergence and requires a longer training time. Therefore, when updating the parameters with mini-batch stochastic gradient descent, this application uses the AdaDelta method to adjust the learning rate adaptively during training, which prevents an improperly set manual learning rate from affecting the learning effect.
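The adaptive behaviour of AdaDelta can be seen in a scalar sketch of its update rule. The decay rate rho = 0.95 and the constant eps = 1e-6 are assumed illustrative values, not settings taken from the application, and the quadratic objective is only a toy example.

```python
import math


def adadelta_step(x, grad, state, rho=0.95, eps=1e-6):
    """Apply one AdaDelta update to a scalar parameter x."""
    # running average of squared gradients
    state["g2"] = rho * state["g2"] + (1 - rho) * grad ** 2
    # step size is the ratio of the RMS of past updates to the RMS of gradients
    delta = -math.sqrt(state["dx2"] + eps) / math.sqrt(state["g2"] + eps) * grad
    # running average of squared updates
    state["dx2"] = rho * state["dx2"] + (1 - rho) * delta ** 2
    return x + delta


# Toy objective f(x) = x^2 with gradient 2x, starting from x = 1.
x = 1.0
state = {"g2": 0.0, "dx2": 0.0}
for _ in range(20):
    x = adadelta_step(x, 2 * x, state)
```

No learning rate appears in the update: the effective step size is derived from the running averages, which is why the method is insensitive to a manually chosen rate.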
S4, validating the knowledge inference model
The validation set divided in S2 is fed into the knowledge inference model trained in S3 for preliminary validation and evaluation, and the hyper-parameters are adjusted according to the validation result.
S5, evaluating the knowledge inference model
The effectiveness of the model is verified through a link prediction task, and several evaluation indexes are selected to evaluate the overall capability of the inference model;
S51, link prediction
Link prediction is the task of predicting the other entity that has a specific relation with a given entity; i.e. for a triple (h, r, t), the head entity h is predicted given the relation r and the tail entity t, denoted (?, r, t).
S52, constructing and ranking negative samples of the test set
Using the negative sample construction method and the filtering operation of S23, negative samples of the test set are constructed for the triples in the test set. Taking tail entity prediction as an example, the relation type determines which entities in the entity set are selected to replace the tail entity. After the filtering operation, the scores of each test triple and of the negative triples constructed from that triple and the entities in the knowledge graph are computed with the score function of the trained knowledge inference model, and the triple and its negative triples are ranked by score from largest to smallest;
The rankings for head entity prediction or for predicting the relation between entities can be obtained by the same method.
S53, evaluation index
One or more of the following evaluation indexes can be used to measure performance on the link prediction task: mean rank (MR), mean reciprocal rank (MRR), top-one hit rate (hits@1), top-three hit rate (hits@3) and top-ten hit rate (hits@10).
① Mean rank (MR): the mean rank is the average position of the correct test set triples among the test set negative triples obtained by replacing head or tail entities; the higher the ranking (i.e. the smaller the MR value), the better the model.
② Mean reciprocal rank (MRR): the mean reciprocal rank reflects the overall ranking of all test set triples within their constructed lists of test set negative triples, as shown in the following formula.
$$\mathrm{MRR} = \frac{1}{|T|} \sum_{i=1}^{|T|} \frac{1}{\mathrm{rank}(h, r, t)_i}$$
where T is the set of all triples in the test set, |T| is the number of triples in T, and $\mathrm{rank}(h, r, t)_i$ is the rank of the i-th correct triple.
③ Top-one hit rate (hits@1): the top-one hit rate is the ratio of the number of test set triples ranked first to the total number of triples in the test set, as shown in the following formula.
$$\mathrm{hits@1} = \frac{1}{|T|} \sum_{i=1}^{|T|} \mathrm{ind}\big(\mathrm{rank}(h, r, t)_i \le 1\big)$$
where ind(x) is the indicator function, which outputs 1 if the correct triple (h, r, t) is ranked first and 0 otherwise.
Similarly, a top-three hit rate and a top-ten hit rate can be defined. For example, the top-ten hit rate (hits@10) is the ratio of the number of test set triples ranked within the top ten to the total number of triples in the test set, as shown in the following formula.
$$\mathrm{hits@10} = \frac{1}{|T|} \sum_{i=1}^{|T|} \mathrm{ind}\big(\mathrm{rank}(h, r, t)_i \le 10\big)$$
A higher top-ten hit rate indicates better model performance.
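The evaluation indexes can be computed from the list of ranks as in the sketch below. The function names and the sample ranks are hypothetical, and ties are resolved optimistically (a negative triple with the same score does not push the correct triple down), which is one of several possible conventions.

```python
def rank_of_correct(correct_score, negative_scores):
    """1-based rank of the correct triple when scores are sorted large to small."""
    return 1 + sum(1 for s in negative_scores if s > correct_score)


def link_prediction_metrics(ranks, ks=(1, 3, 10)):
    """Compute MR, MRR and hits@k from the ranks of the correct triples."""
    n = len(ranks)
    metrics = {
        "MR": sum(ranks) / n,
        "MRR": sum(1 / r for r in ranks) / n,
    }
    for k in ks:
        metrics[f"hits@{k}"] = sum(1 for r in ranks if r <= k) / n
    return metrics


# Made-up ranks of four correct test triples.
example_metrics = link_prediction_metrics([1, 2, 5, 20])
```

For these ranks MR is 7.0, MRR is 0.4375, and hits@1, hits@3 and hits@10 are 0.25, 0.5 and 0.75; a single badly ranked triple (rank 20) dominates MR but barely moves MRR, which is why both are reported.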
In summary, the flow of the evaluation model using the link prediction task is shown in fig. 13.
To verify how the choice of knowledge inference model, the filtering operation and the relation type constraint improve the prediction results, the effects of the filtering operation, the relation type constraint and different inference models on the results of the prediction task are compared. The results are as follows:
(1) influence of filtering operation on prediction result
TABLE 9 comparison of predicted results with and without Filter operation
Figure BDA0003673931880000174
Figure BDA0003673931880000181
As shown in Table 9, with the filtering operation the mean reciprocal rank (MRR) improves by 30% and the top-ten hit rate (hits@10) improves by 10.2% compared with no filtering; the top-three and top-one hit rates also improve. This shows that the filtering operation effectively improves the accuracy of model prediction.
(2) Influence of the relationship type constraint on the prediction results
On the basis of the filtering operation, the prediction results obtained with and without the relationship type constraint when constructing negative triples are compared. Using the relationship type constraint means that, given prior knowledge of the relationship type, the relationship determines which entities may be used as replacements when a negative triple is constructed. Not using the relationship type constraint means that the replacement entity is drawn at random from all entities. The comparison of prediction results is shown in Table 10.
Table 10 Comparison of prediction results with and without the relationship type constraint
Figure BDA0003673931880000182
As shown in Table 10, the mean rank (MR) of the head or tail entity of the correct triple is clearly better with the relationship type constraint than without it, and the same holds for the remaining evaluation indexes. This shows that the relationship type constraint improves the model's performance in the link prediction task.
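Negative-triple construction under a relationship type constraint can be sketched as follows. The sketch assumes the Bernoulli scheme commonly used with translation models, in which the head is replaced with probability tph / (tph + hpt), where tph is the average number of tails per head and hpt the average number of heads per tail for the relation. That formula, the sampling step, the function names, and the `domain` and `range_` dictionaries are all illustrative assumptions, not details confirmed by the application.

```python
import random
from collections import defaultdict


def head_replace_prob(triples, relation):
    """Bernoulli parameter for one relation: tph / (tph + hpt)."""
    tails_of_head = defaultdict(set)
    heads_of_tail = defaultdict(set)
    for h, r, t in triples:
        if r == relation:
            tails_of_head[h].add(t)
            heads_of_tail[t].add(h)
    tph = sum(len(v) for v in tails_of_head.values()) / len(tails_of_head)
    hpt = sum(len(v) for v in heads_of_tail.values()) / len(heads_of_tail)
    return tph / (tph + hpt)


def corrupt(triple, triples, domain, range_, rng):
    """Build one negative triple, drawing the replacement only from entities
    allowed by the relation's domain (head side) or range (tail side).
    Assumes each candidate list holds at least one entity besides the original."""
    h, r, t = triple
    if rng.random() < head_replace_prob(triples, r):
        return (rng.choice([e for e in domain[r] if e != h]), r, t)
    return (h, r, rng.choice([e for e in range_[r] if e != t]))


# Hypothetical one-to-many relation: one line has three stations.
triples = [
    ("line1", "has_station", "A"),
    ("line1", "has_station", "B"),
    ("line1", "has_station", "C"),
]
domain = {"has_station": ["line1", "line2"]}       # entities allowed as head
range_ = {"has_station": ["A", "B", "C", "D"]}     # entities allowed as tail
print(head_replace_prob(triples, "has_station"))
print(corrupt(triples[0], triples, domain, range_, random.Random(0)))
```

A negative produced this way can still coincide with a correct triple, for example (line1, has_station, B), which is why false negatives are filtered out afterwards.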
(3) Influence of different knowledge inference models on the prediction results
The present application compares the TransD model with the classical inference models TransE and TransH, applying the filtering operation and the relationship type constraint to all three. The results of the three models on the entity link prediction task are shown in Tables 11-12.
The TransE model represents a relation as a translation between the head and tail entity vectors. The TransH model represents a relation as a translation of the entity vectors on a relation-specific hyperplane. The TransD model projects the entity vectors into the relation vector space through dynamic mapping matrices and treats the relation vector as a translation between the projected entity vectors. The optimal hyper-parameter settings for the different models are shown in Table 13.
Table 13 Hyper-parameter values for the different models
Figure BDA0003673931880000191
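The difference between the translation models can be made concrete with a small pure-Python sketch of the TransE and TransD scores (TransH is omitted for brevity). The sketch assumes the published TransD formulation, in which each mapping matrix is r_p e_p^T + I^(m×n) and the score is the negative squared L2 distance; the function names and example vectors are hypothetical, and the exact formulas of the application appear only in its figures.

```python
from typing import List

Vector = List[float]


def transe_score(h: Vector, r: Vector, t: Vector) -> float:
    """TransE: negative squared L2 norm of h + r - t."""
    return -sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t))


def transd_project(e: Vector, e_p: Vector, r_p: Vector) -> Vector:
    """Compute (r_p e_p^T + I^(m x n)) e without building the matrix."""
    m = len(r_p)
    scale = sum(a * b for a, b in zip(e_p, e))  # scalar e_p^T e
    # I^(m x n) e keeps the first min(m, n) components of e and pads with zeros.
    padded = [e[i] if i < len(e) else 0.0 for i in range(m)]
    return [scale * r_p[i] + padded[i] for i in range(m)]


def transd_score(h: Vector, h_p: Vector, r: Vector, r_p: Vector,
                 t: Vector, t_p: Vector) -> float:
    """TransD: TransE-style score computed on the projected entity vectors."""
    h_proj = transd_project(h, h_p, r_p)
    t_proj = transd_project(t, t_p, r_p)
    return -sum((a + b - c) ** 2 for a, b, c in zip(h_proj, r, t_proj))


def margin_loss(pos_score: float, neg_score: float, gamma: float) -> float:
    """Hinge term [gamma + f(negative) - f(correct)]_+ for one pair of triples."""
    return max(0.0, gamma + neg_score - pos_score)


# Hypothetical 2-dimensional embeddings.
h, r, t = [1.0, 0.0], [0.5, 0.5], [1.0, 1.0]
h_p, r_p, t_p = [0.1, 0.2], [0.3, 0.1], [0.2, 0.1]
print(transe_score(h, r, t))
print(transd_score(h, h_p, r, r_p, t, t_p))
```

When all mapping vectors are zero and the entity and relation dimensions are equal, the projection leaves the entity vectors unchanged and the TransD score reduces to the TransE score, which is a convenient sanity check.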
The present application plots the loss function value of the TransD model during training, which decreases continuously until convergence, as shown in Fig. 14. In the figure, the horizontal axis represents the number of training iterations and the vertical axis represents the loss function value of the model.
On the public transportation travel knowledge graph data set constructed by the present application, the performance and accuracy of the different inference models are compared on the link prediction task according to the evaluation indexes. Table 11 shows the results of predicting the head entity, and Table 12 shows the results of predicting the tail entity.
Table 11 Results of predicting the head entity
Figure BDA0003673931880000192
Table 12 Results of predicting the tail entity
Figure BDA0003673931880000193
The evaluation indexes MRR and hits@10 are selected as representatives and plotted as bar charts to compare the prediction results of the different models visually, as shown in Figs. 15-16. The TransD model performs best and outperforms the other two inference models on all evaluation indexes. For example, the mean reciprocal rank (MRR) of the TransD model is 6.6% higher than that of the TransH model when predicting the head entity, and 22.4% higher when predicting the tail entity.
In summary, the present application selects the TransD model as the knowledge inference model, constructs negative samples under the relationship type constraint, and eliminates false negatives from the negative samples through the filtering operation, which effectively improves the accuracy of the prediction results. The knowledge inference model trained in this way predicts better, and the entities and inter-entity relationships it infers when supplementing the urban traffic knowledge map are more accurate.
The application also discloses a construction device for the urban traffic knowledge map, which comprises an ontology construction module, a data acquisition module, a storage module and an inference module.
The ontology construction module is used for constructing the urban traffic ontology with an ontology construction tool to form a knowledge map mode layer; the data acquisition module is used for acquiring urban traffic data, extracting traffic entities, attributes and relationships among entities, and constructing a knowledge map data layer; the storage module is used for combining the knowledge map mode layer with the knowledge map data layer to generate the urban traffic knowledge map and storing the urban traffic knowledge map in a database; and the inference module is used for inferring new knowledge in the urban traffic knowledge map with the knowledge representation model and supplementing the urban traffic knowledge map.
The application also discloses an application of the urban traffic knowledge map construction method in the field of urban traffic. For example, the knowledge map of the present application can be used to predict traffic states so that travel times can be arranged reasonably, to query the shortest-distance path between subway stations, and to query the travel companions of commuters based on travel-chain similarity.
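As an illustration of the shortest-distance path query between subway stations, the following sketch runs Dijkstra's algorithm over an adjacency map that is assumed to have been extracted from the knowledge map. The station names and distances are hypothetical, and the application does not specify how the query is actually executed against the database.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (total distance, station list) or None."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nxt, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    if target not in dist:
        return None  # target cannot be reached from source
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


# Hypothetical station graph; edge weights are distances in metres.
stations = {
    "A": {"B": 1200, "C": 4000},
    "B": {"A": 1200, "C": 1500},
    "C": {"A": 4000, "B": 1500},
}
print(shortest_path(stations, "A", "C"))
```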
Example 1
Next, the method of the present application is used to predict the speed value of a road near a primary and secondary school at 7:15 after the start of the school term, in order to verify whether the knowledge inference model of the present application can complete the speed value of the road at a given time.
Comparing the actual average speeds of the road before and after the start of the school term shows that, after the term starts, the decline in average road speed during the morning peak begins about half an hour earlier, as shown in Fig. 17. After the start of term, the average speed of the road falls from 45 km/h to 40 km/h from 7:00 in the morning, and has actually fallen to 35 km/h by 7:15.
Next, the method described in the present application is used to predict the speed of the road at 7:15 after the start of term. The entities contained in the road traffic knowledge map and the relationships among them are shown in Fig. 18: the school is located on the road, and a relationship path exists between the road and the speed value. A road and its related entities are selected as the experimental data set, comprising 53 entities and 523 relation pairs, divided into a training set of 417, a verification set of 53 and a test set of 53. With the time as the head entity, the speed value is predicted as the tail entity according to the association relation (time_speed) between time and speed. Table 14 below shows the entities ranked in the top five when predicting the tail entity; bold in the table indicates the correct tail entity.
Table 14 Names of the entities ranked in the top five when predicting the tail entity
Figure BDA0003673931880000201
As shown in Table 14, the first-ranked tail entity predicted by the method of the present application is the correct tail entity, which indicates that the prediction of the knowledge inference model of the present application is relatively accurate and that the model can be used to complete the speed value of the road at a given time.
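The data handling in Example 1 can be sketched as follows: 523 triples are split into 417/53/53, and candidate tail entities are ranked by score for a given head and relation. The placeholder triples, the function names and the toy scoring function are hypothetical stand-ins; in the application the scores come from the trained TransD model.

```python
import random


def split_triples(triples, n_valid, n_test, seed=0):
    """Shuffle with a fixed seed and cut into training, verification and test sets."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test


def top_k_tails(score_fn, head, relation, candidates, k=5):
    """Rank candidate tail entities from largest to smallest score; keep the top k."""
    ranked = sorted(candidates, key=lambda t: score_fn(head, relation, t), reverse=True)
    return ranked[:k]


# 523 placeholder triples split as in Example 1.
triples = [(f"e{i}", "rel", f"e{i + 1}") for i in range(523)]
train, valid, test = split_triples(triples, n_valid=53, n_test=53)
print(len(train), len(valid), len(test))


# Toy scorer standing in for a trained model: prefers speeds close to 35 km/h.
def toy_score(head, relation, tail_speed):
    return -abs(tail_speed - 35)


print(top_k_tails(toy_score, "07:15", "time_speed", [45, 40, 35, 30, 25, 20]))
```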
The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments, or may equally substitute some or all of the technical features; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims (10)

1. A construction method of an urban traffic knowledge graph, characterized by comprising the following steps:
constructing an urban traffic ontology by using an ontology construction tool to form a knowledge map mode layer;
acquiring urban traffic data, extracting entities, attributes and relationships among entities, and constructing a knowledge map data layer;
combining the knowledge map mode layer with the knowledge map data layer to generate an urban traffic knowledge map, and storing the urban traffic knowledge map in a database;
and inferring new knowledge in the urban traffic knowledge map by using a knowledge representation model, and supplementing the urban traffic knowledge map.
2. The construction method according to claim 1, characterized in that: the method for constructing the urban traffic ontology comprises the following steps:
constructing the urban traffic ontology by adopting a seven-step method, and performing quality evaluation before creating instances;
the quality assessment specifically comprises:
verifying whether the transitivity of the hierarchical structure of the class is established or not by drawing a tree structure diagram;
checking whether the application range and the expression mode of each ontology are consistent when used everywhere, and whether redundancy of classes and ontologies appears;
checking whether the description information of the attribute is complete, whether the attribute constraint accords with the logic and whether the attribute has the sharing property;
checking the expandability of the ontology;
the integrity, uniqueness and logical consistency of the relationships between the classes are checked.
3. The construction method according to claim 1, characterized in that: the specific method for reasoning out new knowledge in the urban traffic knowledge map by using the knowledge representation model comprises the following steps:
dividing triple data in the urban traffic knowledge map into a training set, a verification set and a test set;
constructing negative samples of the triple data, and filtering false negative examples out of the negative samples;
setting the hyper-parameters of the knowledge representation model;
training the knowledge representation model by mini-batch stochastic gradient descent using the training set and the negative samples, and adaptively adjusting the learning rate during training by the AdaDelta method;
carrying out hyper-parameter adjustment on the trained knowledge representation model by using the verification set and the negative samples;
evaluating the trained knowledge representation model by using the test set and the negative sample;
and mining implicit relations and missing entities of the urban traffic knowledge graph by using the trained knowledge representation model, and supplementing the urban traffic knowledge graph.
4. The construction method according to claim 3, wherein:
the method for constructing the negative sample of the triple data comprises the following steps:
calculating the probability of selecting a head entity or a tail entity to complete replacement operation according to Bernoulli distribution for triples with a certain relation in a training set, a verification set or a test set, and replacing the entities with higher probability;
according to the relationship type constraint, the relationship decides which entities to replace, as shown in the following formula:
Figure FDA0003673931870000011
where Δ' is the set of constructed negative triples; d_r is the ordered index of all entities that satisfy the domain constraint of relationship type r; r_r is the ordered index of all entities that satisfy the range constraint of relationship type r; h is the head entity of the triple, h' is the head entity of the negative triple, t is the tail entity of the triple, t' is the tail entity of the negative triple, and r is the relationship.
5. The construction method according to claim 3, wherein: the specific method for filtering the false negative examples in the negative sample comprises the following steps:
importing the triple data and the negative triple data in the negative samples into a relational database, finding the repeated data with a query function of the relational database, and removing the repeated data from the negative samples;
wherein the repeated data are data present in both the triple data and the negative triple data.
6. The construction method according to claim 3, wherein:
the knowledge representation model is a TransD model, and the TransD knowledge representation model is as follows:
mapping matrices:
Figure FDA0003673931870000021
Figure FDA0003673931870000022
wherein M_rh is the mapping matrix of the head entity, M_rt is the mapping matrix of the tail entity, r_p is the relation vector, h_p is the mapping vector of the head entity, t_p is the mapping vector of the tail entity, and I^(m×n) is the identity matrix;
projecting the entity vector into a relationship space:
h_⊥ = M_rh h
t_⊥ = M_rt t
where h_⊥ is the head entity vector after mapping by M_rh, and t_⊥ is the tail entity vector after mapping by M_rt;
the score function:
Figure FDA0003673931870000025
loss function:
Figure FDA0003673931870000026
where γ is a hyper-parameter indicating the margin (maximum separation) between the correct and negative triples, [x]_+ = max(0, x), Δ represents the set of correct triples, and Δ' represents the set of constructed negative triples.
7. The construction method according to claim 6, wherein:
the specific method for evaluating the trained knowledge representation model by using the test set and the negative samples comprises the following steps:
for any triple in the test set, calculating, according to the score function of the trained knowledge inference model, the score of the triple and the scores of the negative triples constructed from the triple and the entities in the knowledge graph, and ranking the triple and the negative triples from the largest score to the smallest;
and measuring the performance on the link prediction task by one or more of the evaluation indexes mean rank, mean reciprocal rank, top-one hit rate, top-three hit rate and top-ten hit rate.
8. The construction method according to claim 1, characterized in that:
the urban traffic comprises public traffic and road traffic, and a public traffic knowledge map and a road traffic knowledge map are respectively constructed for the public traffic and the road traffic;
the method for acquiring the public traffic data comprises the steps of acquiring subway line and station information of a target city through a web crawler technology, and acquiring subway card swiping data of the subway line within a target time;
the method for acquiring the road traffic data comprises the steps of acquiring target road network data from a map database, and acquiring target interest point information and traffic situation data on a target road by utilizing a map API.
9. A construction device of an urban traffic knowledge map, characterized by comprising:
an ontology construction module, used for constructing the urban traffic ontology with an ontology construction tool to form a knowledge map mode layer;
the data acquisition module is used for acquiring urban traffic data, extracting traffic entities, attributes and relationships among the entities and constructing a knowledge map data layer;
the storage module is used for combining the knowledge map mode layer with the knowledge map data layer to generate an urban traffic knowledge map and storing the urban traffic knowledge map into a database;
and the reasoning module is used for reasoning new knowledge in the urban traffic knowledge map by using the knowledge representation model and supplementing the urban traffic knowledge map.
10. Application of the urban traffic knowledge map construction method of any one of claims 1 to 8 in the field of urban traffic.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210617739.1A CN114969263A (en) 2022-06-01 2022-06-01 Construction method, construction device and application of urban traffic knowledge map

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210617739.1A CN114969263A (en) 2022-06-01 2022-06-01 Construction method, construction device and application of urban traffic knowledge map

Publications (1)

Publication Number Publication Date
CN114969263A true CN114969263A (en) 2022-08-30

Family

ID=82960512

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210617739.1A Pending CN114969263A (en) 2022-06-01 2022-06-01 Construction method, construction device and application of urban traffic knowledge map

Country Status (1)

Country Link
CN (1) CN114969263A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115269931A (en) * 2022-09-28 2022-11-01 深圳技术大学 Rail transit station data map system based on service drive and construction method thereof
CN115269931B (en) * 2022-09-28 2022-11-29 深圳技术大学 Rail transit station data map system based on service drive and construction method thereof
CN116796006A (en) * 2023-07-07 2023-09-22 北京华录高诚科技有限公司 Public transport travel crowd image analysis method and system based on knowledge graph
CN116796006B (en) * 2023-07-07 2024-01-23 北京华录高诚科技有限公司 Public transport travel crowd image analysis method and system based on knowledge graph
CN118228812A (en) * 2024-05-24 2024-06-21 水利部交通运输部国家能源局南京水利科学研究院 Intelligent water conservancy-oriented AI knowledge base construction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination