CN117033366B

CN117033366B - Knowledge-graph-based ubiquitous space-time data cross verification method and device

Info

Publication number: CN117033366B
Application number: CN202311294674.2A
Authority: CN
Inventors: 王昊; 王宇翔; 周令泉; 刘凯; 李小涵; 廖通逵; 刘福权; 胡晓燕
Original assignee: Aerospace Hongtu Information Technology Co Ltd
Current assignee: Aerospace Hongtu Information Technology Co Ltd
Priority date: 2023-10-09
Filing date: 2023-10-09
Publication date: 2023-12-29
Anticipated expiration: 2043-10-09
Also published as: CN117033366A

Abstract

The invention provides a method and a device for cross-verifying ubiquitous space-time data based on a knowledge graph, wherein the method comprises the following steps: acquiring multi-source space-time data, and unifying coordinates and time of the multi-source space-time data; constructing a space-time knowledge graph based on multi-source space-time data with unified coordinates and time; and carrying out entity space information verification and entity time sequence information verification on the space-time knowledge graph to obtain high-quality space-time knowledge, and storing the space-time knowledge into a space-time database. The invention relieves the problems of uneven quality of multi-source data, repeated knowledge from different data sources and inaccurate correlation between knowledge.

Description

Knowledge-graph-based ubiquitous space-time data cross verification method and device

Technical Field

The invention relates to the technical field of space-time data processing, in particular to a ubiquitous space-time data cross-validation method and device based on a knowledge graph.

Background

In recent years, as the ways of generating data and the fields they cover have kept expanding, spatio-temporal data that describe and record the complexity of human society, the computer world and the physical world have grown rapidly, becoming ever larger in scale and richer in semantics. Spatio-temporal data obtained from the Internet and similar channels include not only structured data, such as vectors, images and grids, that carry an explicit and standardized spatial reference and are stored in common formats, but also large amounts of semi-structured or unstructured data, such as texts, documents and pictures, that are stored in non-standard formats yet still carry explicit time and location information. Specific types include: geographic information data, such as topographic data, remote sensing images and digital elevation models (DEMs); meteorological data, such as air temperature and rainfall; traffic data, such as road networks, vehicle positions and traffic flow; and other data, such as social media data, demographic data, economic data and sensor data. The traditional way of processing spatio-temporal data is to bring the acquired multi-source heterogeneous data under a unified data standard, covering data structure and time and space references, store the entity information or spatio-temporal information described by the data in a database, and then select suitable data for subsequent application analysis according to the requirements at hand. However, existing processing methods suffer from uneven quality of multi-source data, duplicated knowledge from different data sources, and insufficient association between pieces of knowledge.

Disclosure of Invention

In view of the above, the present invention aims to provide a method and a device for cross-verifying ubiquitous spatio-temporal data based on a knowledge graph, so as to alleviate the problems of poor quality of multi-source data, repeated knowledge from different data sources, and inaccurate correlation between knowledge.

In order to achieve the above object, the technical scheme adopted by the embodiment of the invention is as follows:

in a first aspect, an embodiment of the present invention provides a method for cross-verifying ubiquitous spatio-temporal data based on a knowledge graph, including: acquiring multi-source space-time data, and unifying coordinates and time of the multi-source space-time data; constructing a space-time knowledge graph based on multi-source space-time data with unified coordinates and time; and carrying out entity space information verification and entity time sequence information verification on the space-time knowledge graph to obtain high-quality space-time knowledge, and storing the space-time knowledge into a space-time database.

In one embodiment, constructing a spatio-temporal knowledge graph based on multi-source spatio-temporal data unified in coordinates and time includes: acquiring a space entity and a time entity based on multi-source space-time data with unified coordinates and time, and determining a space-time knowledge graph according to the relation between the time entity and the space entity; mapping time information, space information and attribute information in the multi-source space-time data with the space-time knowledge graph to obtain a space-time data triplet of the space-time knowledge graph; and carrying out knowledge combination on the space-time data triples of the space-time knowledge graph.

In one embodiment, knowledge combining based on spatiotemporal data triplets of a spatiotemporal knowledge graph includes: calculating the spatial similarity of the spatial entities; and carrying out knowledge combination on the spatial entities with spatial similarity exceeding the similarity threshold.

In one embodiment, computing spatial similarity of spatial entities includes: when the space entity is a point entity, calculating the space similarity of the point entity by adopting the Euclidean distance; when the space entity is a line entity, decomposing the line entity into a discrete point set formed by break points, and calculating the space similarity of the line entity by utilizing the Hausdorff distance; when the space entity is a face entity, calculating the feature similarity of the face entity, and carrying out weighted summation based on the feature similarity to obtain the space similarity of the face entity; wherein, the feature similarity includes: distance similarity, shape similarity, and size similarity.

In one embodiment, performing entity space information verification on the spatio-temporal knowledge graph includes: for the same spatial entity, extracting multi-source position information of the spatial entity, and determining the position information as candidate address information; performing word segmentation on the candidate address information, and calculating the confidence of the candidate address information based on a standard address database and the word segmentation results; sorting the candidate address information in descending order of confidence, and selecting a preset number of candidate address information as new candidate addresses based on the sorting result; based on the geographic coordinate information of the new candidate addresses, calculating the relative distance between every pair of geographic coordinates, and determining the two points with the minimum relative distance as the final candidate addresses; determining the address information with the higher confidence among the two final candidate addresses as the unique position information identifier of the spatial entity; and constructing a multi-scale geocode index based on the unique position information identifier.

In one embodiment, for the same spatial entity, extracting multi-source location information of the spatial entity further comprises: for location information lacking address name information, location information corresponding to the geographic location coordinates is determined based on the geographic location coordinates and the open source map service.

In one embodiment, performing entity time sequence information verification on the spatio-temporal knowledge graph includes: acquiring a heterogeneous time sequence data set describing the same entity information; constructing a corresponding fitting model for each heterogeneous time sequence in the data set; fitting the heterogeneous time sequence data in the data set based on the fitting models to obtain a plurality of fitting value sets, and calculating the mean square error between each fitting value set and the heterogeneous time sequence data set; based on the minimum mean square error criterion, determining the fitting value set with the minimum mean square error as the best fitting value set, and calculating the relative error between the best fitting value set and the heterogeneous time sequence data set; and merging data whose relative error is less than the error threshold.

In a second aspect, an embodiment of the present invention provides a knowledge-graph-based ubiquitous spatiotemporal data cross-validation apparatus, including: the data acquisition module is used for acquiring multi-source space-time data and unifying the coordinates and time of the multi-source space-time data; the map construction module is used for constructing a space-time knowledge map based on multi-source space-time data with unified coordinates and time; and the verification module is used for carrying out entity space information verification and entity time sequence information verification on the space-time knowledge graph to obtain high-quality space-time knowledge, and storing the space-time knowledge into a space-time database.

In a third aspect, an embodiment of the present invention provides an electronic device comprising a processor and a memory storing computer executable instructions executable by the processor to perform the steps of the method of any one of the first aspects described above.

In a fourth aspect, embodiments of the present invention provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor performs the steps of the method of any of the first aspects provided above.

The embodiment of the invention has the following beneficial effects:

The method and device for cross-verifying ubiquitous spatio-temporal data based on a knowledge graph provided by the embodiments of the invention first acquire multi-source spatio-temporal data and unify the coordinates and time of the multi-source spatio-temporal data; then construct a spatio-temporal knowledge graph based on the multi-source spatio-temporal data with unified coordinates and time; and finally perform entity space information verification and entity time sequence information verification on the spatio-temporal knowledge graph to obtain high-quality spatio-temporal knowledge, and store the spatio-temporal knowledge in a spatio-temporal database. By establishing the spatio-temporal knowledge graph, the method integrates heterogeneous spatio-temporal data from different knowledge sources under the same framework specification, thereby obtaining high-quality spatio-temporal knowledge. At the same time, at the time and space levels respectively, it cross-verifies the spatial position information and cross-verifies the multi-source time sequence information that describes the same information under the same entity, thereby evaluating the quality of the heterogeneous information and alleviating the problems of uneven quality of multi-source data, duplicated knowledge from different data sources, and inaccurate association between pieces of knowledge.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flowchart of a ubiquitous space-time data cross-validation method based on a knowledge-graph, which is provided by an embodiment of the invention;

FIG. 2 is a flowchart of another ubiquitous spatiotemporal data cross-validation method based on knowledge-graph according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a ubiquitous space-time data cross-validation device based on a knowledge graph according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

At present, the existing multi-source space-time data processing method mainly has the following problems.

1) Missing association between multi-source data: because spatio-temporal data come from a wide range of sources, the quality of the knowledge is uneven, and data from different sources that describe the same knowledge often differ considerably. Meanwhile, each data source tends to focus only on information within its own field, ignoring potential dependencies and interactions between data sources.

2) Lack of data verification at the time and space levels: because data sources differ in how they are acquired, or in how they are applied and processed, simply aggregating multi-source data from different fields can bias the aggregated result and produce multi-source inconsistencies, which show up as trends in the multi-source measurements that run in parallel, run in opposite directions, or deviate randomly. Because of the multiplicity of sources and the variability of data quality, the same spatio-temporal entity often corresponds to several pieces of spatial position information. Traditional data processing often skips verification of spatial position information and uses the position information of a single data source only, so the position information is likely to be wrong. For multi-source time sequence data, verification is usually performed only on the time sequence information of a single data source; although this checks that the data are reasonable within their own field, it ignores the correctness of, and correlation between, the multi-source associated information corresponding to the same entity, which greatly reduces the credibility of the fused data.

3) Spatio-temporal data management and indexing lack a systematic, structured approach: as the scope of spatio-temporal information keeps expanding and data are widely associated and fused on the basis of temporal and spatial relationships, the attributes of data elements and the recorded entity information keep growing richer, and the processing of multi-source spatio-temporal information exhibits complex, lineage-based and nonlinear characteristics. How to effectively manage the data characteristics of multi-source spatio-temporal information and its iterative fusion process, and how to better organize and manage spatio-temporal data, mine data relationships, and present data content and relationships in a structured way under a management architecture centered on data lineage, is a new requirement for spatio-temporal data management.

Based on this, the method and device for cross-verifying ubiquitous spatio-temporal data based on a knowledge graph provided by the embodiments of the invention can alleviate the problems of uneven quality of multi-source data, duplicated knowledge from different data sources, and inaccurate association between pieces of knowledge.

For the sake of understanding the present embodiment, a method for cross-verifying ubiquitous spatio-temporal data based on a knowledge graph disclosed in the present embodiment is first described in detail, and the method may be executed by an electronic device, such as a smart phone, a computer, a tablet computer, etc. Referring to a flowchart of a knowledge-graph-based ubiquitous spatio-temporal data cross-validation method shown in fig. 1, it is shown that the method mainly includes the following steps S101 to S103:

Step S101: acquiring multi-source space-time data, and unifying the coordinates and time of the multi-source space-time data.

In one embodiment, the spatio-temporal range of interest can be defined according to actual needs. Multi-source spatio-temporal data with time and space features, including geographic information data, meteorological data, traffic data, population data, economic data and the like, are then obtained using methods such as web crawlers, templated feature words, and spatial filter conditions applied during crawling. The coordinate and time systems of the multi-source data are then unified. Specifically, coordinates may be unified using the China Geodetic Coordinate System 2000 (CGCS2000) with the Gauss-Krüger projection and, for land areas, the 1985 National Elevation Datum; time may be unified to Beijing time.
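As an illustration of the time-unification part of this step only, the following minimal sketch converts ISO 8601 timestamps to Beijing time (UTC+8). The function name `to_beijing_time` and the per-source parameter `source_utc_offset_hours` are hypothetical and do not come from the patent; coordinate unification is not shown.

```python
from datetime import datetime, timedelta, timezone

# Beijing time is UTC+8 with no daylight saving, so a fixed offset suffices.
BEIJING = timezone(timedelta(hours=8))


def to_beijing_time(timestamp: str, source_utc_offset_hours: float = 0.0) -> datetime:
    """Parse an ISO 8601 timestamp and express it in Beijing time.

    If the string carries no UTC offset, `source_utc_offset_hours` (an
    assumed, per-source setting) is taken as the offset of the source clock.
    """
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        source_zone = timezone(timedelta(hours=source_utc_offset_hours))
        parsed = parsed.replace(tzinfo=source_zone)
    return parsed.astimezone(BEIJING)


# A UTC reading and a local reading from a UTC-4 source, both unified.
print(to_beijing_time("2023-10-09T00:30:00+00:00").isoformat())
print(to_beijing_time("2023-10-08T20:00:00", -4).isoformat())
```

Once every source is expressed in the same time system, records from different sources can be compared on a common time axis.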

Step S102: constructing a space-time knowledge graph based on the multi-source space-time data with unified coordinates and time.

In one embodiment, a top-down method is adopted to construct a space-time knowledge graph including a concept layer and a data layer according to the acquired multi-source space-time data, and a large-scale unified knowledge base is created to realize knowledge extraction, knowledge fusion and index establishment of multi-source heterogeneous space-time data.

Step S103: performing entity space information verification and entity time sequence information verification on the space-time knowledge graph to obtain high-quality space-time knowledge, and storing the space-time knowledge in a space-time database.

In one embodiment, a method of confidence coefficient combined with relative distance calculation can be adopted, and multi-source position information corresponding to the geocode of the same spatial entity is cross-verified by utilizing the uniqueness of the geocode of the spatial entity, so that the accuracy of the data position information contained in the same spatial entity is ensured; meanwhile, a joint cross verification method can be utilized, a best fit model is extracted by analyzing the fitting effect of the fitting model established by different data source data on other data, and the heterogeneous data fusion and the verification of entity time sequence information are realized by error analysis of the fitting result of the best fit model on the different data source data.
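The joint cross-verification of time sequence information can be sketched as follows. This is a minimal sketch under stated assumptions: a straight-line least-squares fit stands in for the fitting model, which the text does not specify; the mean square error of each fit is taken over all sources; and the relative error is averaged per source. The names `fit_line` and `cross_validate` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def fit_line(t: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line y = slope * t + intercept (assumed model)."""
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


def cross_validate(
    t: Sequence[float],
    sources: Dict[str, Sequence[float]],
    error_threshold: float,
) -> Tuple[str, List[str]]:
    """Return the best-fitting source and the sources accepted for merging."""
    fitted: Dict[str, List[float]] = {}
    mse: Dict[str, float] = {}
    for name, values in sources.items():
        slope, intercept = fit_line(t, values)
        fitted[name] = [slope * ti + intercept for ti in t]
        # Mean square error of this fit against every series in the data set.
        squared = [
            (f - v) ** 2
            for series in sources.values()
            for f, v in zip(fitted[name], series)
        ]
        mse[name] = sum(squared) / len(squared)

    best = min(mse, key=mse.get)  # minimum mean square error criterion

    accepted = []
    for name, values in sources.items():
        # Mean relative error against the best fit (fitted values assumed non-zero).
        relative = sum(
            abs(v - f) / abs(f) for v, f in zip(values, fitted[best])
        ) / len(values)
        if relative < error_threshold:
            accepted.append(name)
    return best, accepted


times = [0, 1, 2, 3]
readings = {
    "a": [10, 12, 14, 16],
    "b": [10.1, 11.9, 14.1, 15.9],
    "c": [20, 18, 25, 30],
}
print(cross_validate(times, readings, error_threshold=0.05))
```

In this example the sources `a` and `b` agree closely and are accepted, while `c` deviates strongly from the best fit and is left out of the merge.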

According to the ubiquitous space-time data cross-validation method based on the knowledge graph, provided by the embodiment of the invention, the space-time knowledge graph is established, and the space-time data from different knowledge sources are subjected to heterogeneous data integration under the same frame specification, so that high-quality space-time knowledge is formed; meanwhile, in the time and space layers, through cross verification of space position information and cross verification of multi-source time sequence information describing the same information under the same entity, the evaluation of heterogeneous information is realized, and therefore the problems of poor quality of multi-source data, repeated knowledge from different data sources and inaccurate association between the knowledge are solved.

In one embodiment, for the aforementioned step S102, that is, when constructing a spatio-temporal knowledge graph based on multi-source spatio-temporal data with unified coordinates and time, the following means may be adopted, including but not limited to:

firstly, acquiring space entities and time entities based on multi-source space-time data with unified coordinates and time, and determining a space-time knowledge graph according to the relation between the time entities and the space entities.

In a specific implementation, a large-scale unified knowledge base is created from the top level down. The relations among the spatial information, time information and attribute information are analyzed to obtain the spatial entities and time entities, and the spatio-temporal knowledge that these entities comprise is summarized. The spatio-temporal knowledge graph is then represented with time concepts and space concepts and implemented as an ontology, which can be built by manual editing with computer assistance.

And then mapping the time information, the space information and the attribute information in the multi-source space-time data with the space-time knowledge graph to obtain a space-time data triplet of the space-time knowledge graph.

In a specific implementation, the time information, space information and attribute information in the multi-source spatio-temporal data are mapped to the ontology in the spatio-temporal knowledge graph; the attribute information serves as the attribute value and, together with the time and space mappings of the data, forms a triple representation.
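The mapping from a source record to triples might look like the following minimal sketch. The predicate names `hasTime` and `hasLocation`, the record keys, and the function `record_to_triples` are hypothetical placeholders for whatever the ontology actually defines.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, Any]


def record_to_triples(entity_id: str, record: Dict[str, Any]) -> List[Triple]:
    """Turn one source record into (subject, predicate, object) triples."""
    triples: List[Triple] = []
    # Hypothetical ontology predicates for the time and space concepts.
    if "time" in record:
        triples.append((entity_id, "hasTime", record["time"]))
    if "location" in record:
        triples.append((entity_id, "hasLocation", record["location"]))
    # Every remaining field is treated as an attribute value of the entity.
    for key, value in record.items():
        if key not in ("time", "location"):
            triples.append((entity_id, key, value))
    return triples


record = {
    "time": "2023-10-09T08:00:00+08:00",
    "location": (116.39, 39.91),
    "air_temperature_c": 18.5,
}
for triple in record_to_triples("station_001", record):
    print(triple)
```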

And finally, carrying out knowledge combination on the space-time data triples of the space-time knowledge graph.

In a specific implementation, multi-source spatio-temporal data from different data sources may contain duplicates. To reduce duplicated knowledge, the embodiment of the invention can perform entity alignment and coreference resolution. Specifically, a similarity-based method can be used to merge entities that have the same meaning but different identifiers (i.e., that come from different data sources). This disambiguates and resolves coreferences among entities with the same name, multiple names, abbreviations and the like, fuses and associates the multi-source spatio-temporal information about the same entity or concept, and on that basis achieves convergence and fusion of the multi-source data.

In one embodiment, in knowledge-combining spatio-temporal data triplets based on spatio-temporal knowledge-maps, the following approaches may be used, including but not limited to: firstly, calculating the spatial similarity of spatial entities; then, the spatial entities with spatial similarity exceeding the similarity threshold are knowledge combined.

For different entity types, different similarity calculation methods can be adopted, and the method can be specifically divided into the following three types:

(1) When the space entity is a point entity, the Euclidean distance is adopted to calculate the space similarity of the point entity.

Specifically, the Euclidean distance is calculated as follows:

$$D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

where $D$ denotes the Euclidean distance between point entity $(x_1, y_1)$ and point entity $(x_2, y_2)$, i.e., the spatial similarity.
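A minimal sketch of merging point entities by Euclidean distance follows. Because the similarity here is a distance, the sketch assumes that two points count as similar enough when their distance is at or below a threshold `max_distance`; that reading, the greedy grouping strategy, and the function names are assumptions for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def euclidean_distance(p: Point, q: Point) -> float:
    """Euclidean distance between two points in a projected plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def group_point_entities(points: List[Point], max_distance: float) -> List[List[int]]:
    """Greedily group point indices.

    A point joins the first group whose representative (its first member)
    lies within `max_distance`; otherwise it starts a new group.
    """
    groups: List[List[int]] = []
    for index, point in enumerate(points):
        for group in groups:
            if euclidean_distance(points[group[0]], point) <= max_distance:
                group.append(index)
                break
        else:
            groups.append([index])
    return groups


points = [(0.0, 0.0), (3.0, 4.0), (0.5, 0.0), (100.0, 100.0)]
print(euclidean_distance(points[0], points[1]))
print(group_point_entities(points, max_distance=1.0))
```

Each resulting group holds the indices of point entities that would be merged into one piece of knowledge.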

(2) When the space entity is a line entity, decomposing the line entity into a discrete point set formed by folding points, and calculating the space similarity of the line entity by utilizing the Hausdorff distance.

Specifically, for a line entity, the similarity of the line entity can be calculated by decomposing the line entity into a discrete point set consisting of folding points and utilizing Hausdorff distance. The Hausdorff distance calculation principle is as follows:

Given two point sets $A = \{a_1, a_2, \dots, a_p\}$ and $B = \{b_1, b_2, \dots, b_q\}$, the one-way Hausdorff distances between the point sets are:

$$h(A, B) = \max_{a \in A}\Big\{\min_{b \in B}\{\lVert a - b \rVert\}\Big\}$$

$$h(B, A) = \max_{b \in B}\Big\{\min_{a \in A}\{\lVert b - a \rVert\}\Big\}$$

where $a$ and $b$ are any points of the point sets $A$ and $B$ respectively, $\lVert \cdot \rVert$ denotes the Euclidean distance between two points, and $\max\{\cdot\}$ and $\min\{\cdot\}$ select the maximum and minimum of a set of distances. The Hausdorff distance is the larger of the two one-way Hausdorff distances, also known as the bidirectional Hausdorff distance, expressed as:

$$H(A, B) = \max\{h(A, B),\, h(B, A)\}$$

where $H(A, B)$ is the spatial similarity of the line entities.
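The Hausdorff computation for two line entities, each given as a list of vertices, can be sketched directly from the definition. This brute-force version compares every pair of points and is meant as an illustration rather than an optimized implementation; the function names are chosen here for clarity.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """One-way Hausdorff distance: the largest nearest-neighbour distance from a to b."""
    return max(
        min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in b)
        for p in a
    )


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Bidirectional Hausdorff distance: the larger of the two one-way distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


line_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
line_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (5.0, 1.0)]
print(directed_hausdorff(line_a, line_b))
print(hausdorff(line_a, line_b))
```

The two one-way distances differ here because `line_b` extends well beyond `line_a`, which is why the bidirectional form takes the larger value.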

(3) When the space entity is a face entity, calculating the feature similarity of the face entity, and carrying out weighted summation based on the feature similarity to obtain the space similarity of the face entity; wherein, the feature similarity includes: distance similarity, shape similarity, and size similarity.

Specifically, for a face entity, the spatial similarity is calculated from the distance similarity, the shape similarity and the size similarity.

First, the distance similarity is calculated. For an entity to be matched (i.e., an entity that also appears in a different data source), its minimum bounding rectangle (MBR) is determined, and the entities that intersect this MBR are taken as the candidate matching entities (the candidate set). The shape center point of each candidate matching entity is then determined by the template accumulation method, and the distance similarity is calculated by the following formula:

$$S_{d}(A, B) = 1 - \frac{d(C_A, C_B)}{U}$$

where $d(C_A, C_B)$ denotes the distance between the shape center point $C_A$ of entity $A$ and the shape center point $C_B$ of candidate matching entity $B$, and $U$ denotes the maximum distance between any two boundary points of the polygon of the entity to be matched.

Next, the shape similarity is calculated. The candidate matching entity $B$ is translated and rotated so that its shape center point and the orientation of its principal shape axis are aligned with those of entity $A$. Let the boundary point coordinates of candidate matching entity $B$ be $(x, y)$; the aligned boundary point coordinates $(x', y')$ are:

$$x' = x\cos\theta - y\sin\theta + x_0, \qquad y' = x\sin\theta + y\cos\theta + y_0$$

where $\theta$ denotes the rotation angle of candidate matching entity $B$, and $(x_0, y_0)$ denotes the shape center point coordinates of the translated candidate matching entity $B$, i.e., the shape center point coordinates of entity $A$.

Further, a shape description function is set as:

$$r = f(l)$$

The value of the shape description function is the distance $r$ from a point $P$ on the polygon boundary to the shape center $C$, and its parameter is the arc length $l$ measured along the boundary from a chosen matching start point $P_0$ to the point $P$.

The shape similarity is calculated from the absolute distance between the vectors, as follows:

$$S_{s}(A, B) = 1 - \frac{1}{n\,U}\sum_{i=1}^{n}\left| f_A(l_i) - f_B(l_i) \right|$$

where $A$ and $B$ denote the face entities to be matched, $f_A(l_i)$ denotes the value of the shape description function of entity $A$ at point $l_i$, $f_B(l_i)$ denotes the value of the shape description function of entity $B$ at point $l_i$, $n$ is the number of boundary points of the shapes to be matched at which the shape description function is evaluated, and $U$ is the maximum value of the shape description function of the entities to be matched.

Then, the size similarity of the two face entities is calculated from their areas, as follows:

$$S_{a}(A, B) = \frac{\min\{\mathrm{Area}(A),\, \mathrm{Area}(B)\}}{\max\{\mathrm{Area}(A),\, \mathrm{Area}(B)\}}$$

where $\mathrm{Area}(A)$ denotes the area of the entity $A$ to be matched, and $\mathrm{Area}(B)$ denotes the area of the entity $B$ to be matched.

Finally, from the three feature similarities, the total similarity of the entities $A$ and $B$ to be matched, i.e., the spatial similarity of the face entities, is calculated as:

$$S(A, B) = \sum_{i=1}^{q} w_i\, S_i(A, B)$$

where $q$ is the number of feature similarities employed ($q = 3$ in this embodiment), $S_i$ denotes the $i$-th feature similarity, and $w_i$ denotes the weight of that feature similarity, which may be determined by expert experience.
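The weighted summation for face entities can be sketched as below. The sketch assumes that the size similarity is the ratio of the smaller area to the larger one, computes polygon areas with the shoelace formula, and takes the distance and shape similarities as given numbers; the weights and all function names are illustrative assumptions, not values from the patent.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(vertices: Sequence[Point]) -> float:
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    count = len(vertices)
    for i in range(count):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % count]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def size_similarity(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Assumed form: smaller area divided by larger area, in (0, 1]."""
    area_a, area_b = polygon_area(a), polygon_area(b)
    return min(area_a, area_b) / max(area_a, area_b)


def weighted_similarity(similarities: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the feature similarities."""
    if len(similarities) != len(weights):
        raise ValueError("one weight is required per feature similarity")
    return sum(w * s for w, s in zip(weights, similarities))


square_a = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
square_b = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
size = size_similarity(square_a, square_b)
# Distance and shape similarities (0.9 and 0.8) are placeholder inputs.
print(size, weighted_similarity([0.9, 0.8, size], [0.5, 0.25, 0.25]))
```

Keeping the weights as an explicit argument reflects the statement that they may be set by expert experience.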

In one embodiment, for the aforementioned step S103, entity space information verification on the spatio-temporal knowledge graph may be performed in ways including, but not limited to, the following steps a1 to a6:

step a1: for the same spatial entity, multi-source location information of the spatial entity is extracted and the location information is determined as candidate address information.

In the implementation, under the same spatial entity, the multi-source position information corresponding to the spatial entity can be extracted by using a natural language processing technology, including: address name information and geographical coordinate information, and determines location information as candidate address information.

For location information lacking address name information, location information corresponding to the geographic location coordinates is determined based on the geographic location coordinates and the open source map service. Specifically, for the location information lacking address name information, the address information under the geographic location coordinate may be acquired by using an open-source map service. If the space entity corresponds to only one position information source, the position information is used as the position information of the space entity, and the process is ended.

Step a2: and performing word segmentation on the candidate address information, and calculating the confidence coefficient of the candidate address information based on the standard address database and the word segmentation result.

In a specific implementation, the standard address database can be combined with normalized address coding, and matching is performed with a word segmentation algorithm and a confidence screening method to calculate the confidence of the candidate address information. In the confidence algorithm, $D$ denotes the confidence, $\alpha$ denotes a weight coefficient, $n$ denotes the number of word segments of the candidate address information, and the remaining two terms are a position coefficient and a similarity. The similarity is determined from $Le$, the word segment length of the candidate address information to be matched, and $La$, the length of the address data in the standard address database. In this embodiment, when the candidate address information to be matched completely contains the address data, $\alpha = 0.98$ and $n = 1$; for address word segments, $\alpha = 0.98$; and for ordinary word segments, $\alpha = 0.60$.

Step a3: and sequencing the candidate address information according to the order of the confidence level from the high confidence level to the low confidence level, and selecting the preset number of candidate address information as new candidate addresses based on the sequencing result.

In a specific implementation, the candidate address information can be sorted by confidence in descending order, and the three candidates (three being the preset number) with the highest confidence are selected as the new candidate addresses. If there are only two pieces of candidate address information, proceed directly to step a5.

Step a4: based on the geographical coordinate information of the new candidate address, a relative distance between every two geographical coordinate information is calculated, and two points with the minimum relative distance are determined as final candidate addresses.

In specific implementation, starting from the geographic coordinate information of the new candidate addresses obtained through confidence screening, the relative distance between the coordinates of every two candidate addresses can be calculated as a Euclidean distance, and the two points with the minimum relative distance are selected as the final candidate addresses. The Euclidean distance is calculated as follows:

$$D = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

where D represents the Euclidean distance between geographic coordinate information (x₁, y₁) and geographic coordinate information (x₂, y₂).

Step a5: Determine the address information with the higher confidence among the two final candidate addresses as the unique location information identifier of the space entity.

In specific implementation, the address information with the higher confidence can be selected from the two final candidate addresses as the final valid spatial data for matching and linking, and is used as the unique location information identifier of the space entity.
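Steps a3 to a5 can be illustrated with a minimal Python sketch. It assumes that the confidence of each candidate has already been computed in step a2 and that all coordinates share one reference frame; the `Candidate` type, the function name and the parameter `preset_number` are hypothetical names introduced only for this illustration and do not come from the embodiment.

```python
from itertools import combinations
from math import hypot
from typing import NamedTuple


class Candidate(NamedTuple):
    address: str       # candidate address text
    confidence: float  # confidence from step a2 (assumed to be precomputed)
    x: float           # geographic coordinates in the unified reference frame
    y: float


def select_unique_location(candidates, preset_number=3):
    """Apply steps a3 to a5 to a list of candidates and return the winner."""
    if not candidates:
        raise ValueError("at least one candidate address is required")
    # Step a3: rank by confidence, high to low, and keep the top few.
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    shortlist = ranked[:preset_number]
    if len(shortlist) <= 2:
        # With one or two candidates there is no pair to choose: skip to a5.
        finalists = shortlist
    else:
        # Step a4: keep the pair of candidates that lie closest together.
        finalists = min(
            combinations(shortlist, 2),
            key=lambda pair: hypot(pair[0].x - pair[1].x, pair[0].y - pair[1].y),
        )
    # Step a5: the more confident of the finalists is the unique location.
    return max(finalists, key=lambda c: c.confidence)
```

In this sketch the candidate with the highest confidence does not necessarily win: if its coordinates lie far from the other two shortlisted candidates, step a4 removes it before step a5 is applied.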

Step a6: a multi-scale geocode index is constructed based on the unique location information identification.

In specific implementation, a multi-scale geocoding index mechanism may be constructed based on the unique geographic location information identifier. The tile pyramid method can be adopted: a tile pyramid is a multi-resolution hierarchical model in which the resolution decreases from the bottom layer to the top layer while the geographic extent represented remains unchanged. The space-time data instances and the space-time ontology in the space-time knowledge graph are mapped to each other, and each space-time data instance carries a series of tile codes at different scales. In other words, the tile containing the instance can be found at every resolution level of the tile pyramid, which supports multi-scale spatial queries of space-time data instances.
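The idea of one tile code per pyramid level can be sketched as follows. The embodiment does not specify a tiling scheme, so this sketch assumes the widely used Web Mercator XYZ scheme, in which level 0 is the single top tile and each level below doubles the number of tiles per axis; the function names and the default zoom range are hypothetical.

```python
import math


def tile_code(lon, lat, zoom):
    """Return the (zoom, x, y) tile containing a WGS84 point (XYZ scheme)."""
    n = 2 ** zoom  # number of tiles along each axis at this level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the eastern or southern edge stay in range.
    return zoom, min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def multi_scale_index(lon, lat, min_zoom=0, max_zoom=5):
    """List one tile code per pyramid level for a single location."""
    return [tile_code(lon, lat, z) for z in range(min_zoom, max_zoom + 1)]
```

A multi-scale query can then match a data instance against a query tile at any level by comparing the code stored for that level, without recomputing geometry.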

Further, for the aforementioned step S103, that is, when performing entity time sequence information verification on the space-time knowledge graph, the following steps b1 to b5 may be used, without limitation:

Step b1: a heterogeneous time series data set describing the same entity information is obtained.

In specific implementation, a heterogeneous time series data set $R = \{R_1, R_2, \ldots, R_n\}$ ($n \ge 2$) describing the same entity information is acquired, where $R_1, R_2, \ldots, R_n$ are the time series from different sources that describe the same attribute information of the same spatial entity.

Step b2: Construct a corresponding fitting model for each heterogeneous time series in the heterogeneous time series data set.

In specific implementation, starting from the heterogeneous time series $R_1$, each time series $R_i = \{x_{i1}, x_{i2}, \ldots\}$ is taken in turn as a sample, and a corresponding fitting model $f(R_i)$ is established for each heterogeneous time series according to the variation characteristics of the sample data $R_i$.

Step b3: Fit the heterogeneous time series in the heterogeneous time series data set based on the fitting models to obtain a plurality of fitting value sets, and calculate the mean square error between each fitting value set and the heterogeneous time series data set.

In specific implementation, each fitting model $f(R_i)$ is used to fit the heterogeneous time series other than $R_i$, yielding a plurality of fitting value sets:

$$\hat{R}_i = \{\hat{y}_j\}$$

The values in the fitting value set $\hat{R}_i$ are compared with the corresponding values in the true value set (namely the heterogeneous time series data set R), and the mean square error is calculated as follows:

$$\mathrm{MSE}_i = \frac{1}{N_i} \sum_{j=1}^{N_i} \left( y_j - \hat{y}_j \right)^2$$

where $\mathrm{MSE}_i$ represents the mean square error of the data fitted with $R_i$ as the sample, $N_i$ represents the amount of all heterogeneous time series data except $R_i$, $y_j$ represents a true value, and $\hat{y}_j$ represents the corresponding fitted value. On this basis, the embodiment of the invention obtains the mean square errors $\mathrm{MSE}_1, \mathrm{MSE}_2, \ldots, \mathrm{MSE}_n$ corresponding to the fitting values of all heterogeneous time series in the data set R.

Step b4: Based on the minimum mean square error criterion, determine the fitting value set with the minimum mean square error as the best fitting value set, and calculate the relative error between the best fitting value set and the heterogeneous time series data set.

In specific implementation, based on the minimum mean square error criterion, the fitting value set $\hat{R}_k$ corresponding to the fitting model $f(R_k)$ with the minimum mean square error is taken as the best fitting value set, and the relative error between each fitted value in $\hat{R}_k$ and the corresponding true value is calculated as follows:

$$\delta_j = \frac{\left| \hat{y}_j - y_j \right|}{\left| y_j \right|}$$

where $\delta_j$ represents the relative error, $\hat{y}_j$ a fitted value, and $y_j$ the corresponding true value.

Step b5: data having a relative error less than the error threshold is combined.

In practice, the confidence level α=0.1, i.e. when the relative error is<0.1, regarding the data as normal data and carrying out data merging; if->>0.1, marking the data as an abnormal constant Data merging is not performed. Based on the conditions, the normal data of the same information corresponding to the same space entity can be summarized into the same time sequence uniformly according to the time distribution for subsequent analysis and application.
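Steps b1 to b5 can be sketched in Python under explicit assumptions. The embodiment does not fix the form of the fitting model, so a least-squares straight line stands in for it here; each source is assumed to be a list of (time, value) pairs; and the merged series is assumed to contain the sample series of the best model plus every accepted point from the other sources. The function names are hypothetical.

```python
def fit_line(series):
    """Least-squares line v = a*t + b through (t, v) pairs; returns (a, b)."""
    n = len(series)
    mean_t = sum(t for t, _ in series) / n
    mean_v = sum(v for _, v in series) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in series)
    if var_t == 0:
        return 0.0, mean_v  # constant model if all timestamps coincide
    cov = sum((t - mean_t) * (v - mean_v) for t, v in series)
    a = cov / var_t
    return a, mean_v - a * mean_t


def cross_validate(sources, threshold=0.1):
    """Return (best_index, merged, flagged) for a list of (t, v) series."""
    if len(sources) < 2:
        raise ValueError("need at least two sources")
    models = [fit_line(s) for s in sources]  # step b2
    # Step b3: score each model on the points of all *other* sources.
    mses = []
    for i, (a, b) in enumerate(models):
        others = [p for j, s in enumerate(sources) if j != i for p in s]
        mses.append(sum((v - (a * t + b)) ** 2 for t, v in others) / len(others))
    best = min(range(len(sources)), key=mses.__getitem__)  # step b4
    a, b = models[best]
    merged, flagged = list(sources[best]), []
    for j, s in enumerate(sources):
        if j == best:
            continue
        for t, v in s:
            # Relative error of the fitted value against the true value.
            rel = abs((a * t + b) - v) / abs(v) if v != 0 else float("inf")
            (merged if rel < threshold else flagged).append((t, v))  # step b5
    return best, sorted(merged), flagged
```

Points in `flagged` are kept rather than discarded, in line with the embodiment marking abnormal data without merging it. A true value of zero makes the relative error undefined, so this sketch flags such points; a real implementation would need its own rule for that case.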

The method provided by the embodiment of the invention has the following technical effects:

(1) The space-time knowledge graph makes space-time data management refined and structurally organized. A space-time knowledge graph is built from the distinctive temporal and spatial characteristics of space-time data; scattered and unordered space-time data are associated by means of the entity alignment method inherent to the space-time knowledge graph; and a multi-scale geocoding index structure is built on space-time geocoding in combination with the tile pyramid structure, so that the storage, sharing and classification of geospatial knowledge change from scattered and unordered to structured. Meanwhile, knowledge from different knowledge sources undergoes heterogeneous data integration, disambiguation, processing, reasoning verification, updating and other steps under the same framework specification, which fuses data, information, methods, experience and human ideas into a high-quality knowledge base. This realizes the conversion from space-time data to space-time knowledge and provides technical support for the systematic and structured management and application of space-time data.

(2) The invention provides a multi-source data cross-validation method. In view of the characteristics of space-time data, a cross-validation mode combining confidence and relative distance is adopted at the space-time level, so that entity space information is accurately located using multi-source location information. For multi-source time series data describing the same information, a joint cross-validation method is adopted: the multi-source time series information is collected uniformly, data fitting is performed with the data of each source in turn as the sample, and the multi-source time series data are verified and synthesized under the joint constraint of the minimum mean square error criterion and the relative error. Data that meet the conditions are fused, and data that do not meet the conditions are marked for comprehensive processing in subsequent applications.

In order to facilitate understanding, the embodiment of the invention also provides a specific knowledge-graph-based ubiquitous space-time data cross-validation method, which is shown in fig. 2, and mainly comprises the following steps:

Step 1: Acquire space-time data.

Step 2: unifying the coordinates and time system of the space-time data.

Step 3: constructing a time entity and a space entity.

Step 4: Extract space-time knowledge and construct space-time data instance triples.

Specifically, the time information, space information and attribute information of the space-time data are mapped to the ontology of the space-time knowledge graph to form space-time data instance triples.

Step 5: entity linking and knowledge merging.

Specifically, knowledge combination is performed by calculating the point entity similarity, the line entity similarity and the plane entity similarity.

Step 6: entity location information cross-validation.

Step 7: multi-scale geocoding index implementation.

Step 8: multi-source timing information cross-validation under the same entity.

Step 9: high-quality space-time knowledge is generated and stored in a space-time database.

It should be noted that, the implementation principle and the technical effects of the method provided in this embodiment are the same as those of the foregoing method embodiment, and are not described herein again.

For the knowledge-graph-based ubiquitous space-time data cross verification method provided in the foregoing embodiment, the embodiment of the present invention further provides a knowledge-graph-based ubiquitous space-time data cross verification device, referring to a schematic structure diagram of the knowledge-graph-based ubiquitous space-time data cross verification device shown in fig. 3, which illustrates that the device mainly includes the following parts:

The data acquisition module 301 is configured to acquire multi-source space-time data and unify the coordinates and time of the multi-source space-time data.

The map construction module 302 is configured to construct a space-time knowledge graph based on the multi-source space-time data with unified coordinates and time.

The verification module 303 is configured to perform entity space information verification and entity time sequence information verification on the space-time knowledge graph to obtain high-quality space-time knowledge, and to store the space-time knowledge in the space-time database.

According to the ubiquitous space-time data cross-validation device based on the knowledge graph, provided by the embodiment of the invention, the space-time knowledge graph is established, and the space-time data from different knowledge sources are subjected to heterogeneous data integration under the same frame specification to form a high-quality space-time knowledge set; meanwhile, in the time and space layers, through cross verification of space position information and cross verification of multi-source time sequence information describing the same information under the same entity, the evaluation of heterogeneous information is realized, and therefore the problems of poor quality of multi-source data, repeated knowledge from different data sources and inaccurate association between the knowledge are solved.

In one embodiment, the map construction module 302 is further configured to: acquiring a space entity and a time entity based on multi-source space-time data with unified coordinates and time, and determining a space-time knowledge graph according to the relation between the time entity and the space entity; mapping time information, space information and attribute information in the multi-source space-time data with the space-time knowledge graph to obtain a space-time data triplet of the space-time knowledge graph; and carrying out knowledge merging on the space-time data triples of the space-time knowledge graph.

In one embodiment, the map construction module 302 is further configured to: calculating the spatial similarity of the spatial entity based on the spatial information in the spatial-temporal data triplet of the spatial-temporal knowledge graph; and carrying out knowledge combination on the spatial entities with spatial similarity exceeding the similarity threshold.

In one embodiment, the map construction module 302 is further configured to: when the space entity is a point entity, calculate the spatial similarity of the point entity using the Euclidean distance; when the space entity is a line entity, decompose the line entity into a discrete point set formed by its vertices (turning points), and calculate the spatial similarity of the line entity using the Hausdorff distance; when the space entity is a face entity, calculate the feature similarities of the face entity and perform a weighted summation of the feature similarities to obtain the spatial similarity of the face entity; wherein the feature similarities include: distance similarity, shape similarity, and size similarity.
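The three cases above can be sketched as follows. This is a minimal illustration, not the module's implementation: the functions for points and lines return distances, and how a distance is converted into a similarity score is not specified in this passage, so that conversion is left out. The equal default weights for the face entity are an assumption, as are all function names.

```python
from math import hypot


def point_distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return hypot(p[0] - q[0], p[1] - q[1])


def hausdorff_distance(line_a, line_b):
    """Symmetric Hausdorff distance between two vertex lists."""
    def directed(src, dst):
        # Largest distance from a vertex of src to its nearest vertex of dst.
        return max(min(point_distance(p, q) for q in dst) for p in src)
    return max(directed(line_a, line_b), directed(line_b, line_a))


def face_similarity(distance_sim, shape_sim, size_sim, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three feature similarities of a face entity."""
    w_d, w_s, w_z = weights
    return w_d * distance_sim + w_s * shape_sim + w_z * size_sim
```

Because the Hausdorff distance here is computed on vertices only, two lines with the same shape but very different vertex densities can appear farther apart than they are; resampling the lines at a common spacing first is one common remedy.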

In one embodiment, the verification module 303 is further configured to: for the same space entity, extracting multi-source position information of the space entity, and determining the position information as candidate address information; word segmentation is carried out on the candidate address information, and the confidence coefficient of the candidate address information is calculated based on a standard address database and word segmentation results; sequencing the candidate address information according to the order of the confidence level from high to low, and selecting a preset number of candidate address information as new candidate addresses based on the sequencing result; based on the geographic coordinate information of the new candidate address, calculating the relative distance between every two geographic coordinate information, and determining two points with the minimum relative distance as the final candidate address; determining address information with higher confidence in the two final candidate addresses as a unique position information identifier of the space entity; a multi-scale geocode index is constructed based on the unique location information identification.

In one embodiment, the verification module 303 is further configured to: for location information lacking address name information, location information corresponding to the geographic location coordinates is determined based on the geographic location coordinates and the open source map service.

In one embodiment, the verification module 303 is further configured to: acquire a heterogeneous time sequence data set describing the same entity information; construct a corresponding fitting model for each heterogeneous time sequence data in the heterogeneous time sequence data set; fit the heterogeneous time sequence data in the heterogeneous time sequence data set based on the fitting models to obtain a plurality of fitting value sets, and calculate the mean square error between each fitting value set and the heterogeneous time sequence data set; based on the minimum mean square error criterion, determine the fitting value set with the minimum mean square error as the best fitting value set, and calculate the relative error between the best fitting value set and the heterogeneous time sequence data set; and merge the data whose relative error is less than the error threshold.

The device provided by the embodiment of the present invention has the same implementation principle and technical effects as those of the foregoing method embodiment, and for the sake of brevity, reference may be made to the corresponding content in the foregoing method embodiment where the device embodiment is not mentioned.

The embodiment of the invention also provides electronic equipment, which comprises a processor and a storage device; the storage means has stored thereon a computer program which, when run by a processor, performs the method according to any of the above embodiments.

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device 100 includes: a processor 40, a memory 41, a bus 42 and a communication interface 43, the processor 40, the communication interface 43 and the memory 41 being connected by the bus 42; the processor 40 is arranged to execute executable modules, such as computer programs, stored in the memory 41.

The memory 41 may include a high-speed random access memory (RAM), and may further include a non-volatile memory, such as at least one magnetic disk memory. The communication connection between the system network element and at least one other network element is achieved via at least one communication interface 43 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, etc.

Bus 42 may be an ISA bus, a PCI bus, an EISA bus, or the like. The buses may be classified as address buses, data buses, control buses, etc. For ease of illustration, only one bi-directional arrow is shown in FIG. 4, but this does not mean that there is only one bus or only one type of bus.

The memory 41 is configured to store a program, and the processor 40 executes the program after receiving an execution instruction. The method disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 40 or implemented by the processor 40.

The processor 40 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 40 or by instructions in the form of software. The processor 40 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 41; the processor 40 reads the information in the memory 41 and, in combination with its hardware, performs the steps of the method described above.

The computer program product of the readable storage medium provided by the embodiment of the present invention includes a computer readable storage medium storing a program code, where the program code includes instructions for executing the method described in the foregoing method embodiment, and the specific implementation may refer to the foregoing method embodiment and will not be described herein.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.

Finally, it should be noted that the above examples are only specific embodiments of the present invention, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing examples, those skilled in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A knowledge-graph-based ubiquitous space-time data cross-validation method, characterized by comprising the following steps:

acquiring multi-source space-time data, and unifying coordinates and time of the multi-source space-time data;

constructing a space-time knowledge graph based on the multi-source space-time data with unified coordinates and time;

performing entity space information verification and entity time sequence information verification on the space-time knowledge graph to obtain high-quality space-time knowledge, and storing the space-time knowledge into a space-time database;

wherein performing entity time sequence information verification on the space-time knowledge graph comprises: acquiring a heterogeneous time sequence data set describing the same entity information; constructing a corresponding fitting model for each heterogeneous time sequence data in the heterogeneous time sequence data set; fitting the heterogeneous time sequence data in the heterogeneous time sequence data set based on the fitting model to obtain a plurality of fitting value sets, and calculating the mean square error of each fitting value set and the heterogeneous time sequence data set; based on a minimum mean square error criterion, determining a fitting value set with the minimum mean square error as a best fitting value set, and calculating a relative error between the best fitting value set and the heterogeneous time sequence data set; and merging the data with the relative error smaller than the error threshold value.

2. The method of claim 1, wherein constructing a spatiotemporal knowledge-graph based on the multi-source spatiotemporal data unified in coordinates and time comprises:

acquiring a space entity and a time entity based on the multi-source space-time data with unified coordinates and time, and determining a space-time knowledge graph according to the relation between the time entity and the space entity;

mapping time information, space information and attribute information in the multi-source space-time data with the space-time knowledge graph to obtain a space-time data triplet of the space-time knowledge graph;

and carrying out knowledge merging based on the space-time data triples of the space-time knowledge graph.

3. The method of claim 2, wherein knowledge-combining based on the spatiotemporal data triples of the spatiotemporal knowledge-graph comprises:

calculating the spatial similarity of the spatial entities;

and carrying out knowledge combination on the spatial entities with the spatial similarity exceeding the similarity threshold.

4. A method according to claim 3, wherein calculating the spatial similarity of the spatial entities comprises:

when the space entity is a point entity, calculating the space similarity of the point entity by adopting the Euclidean distance;

when the space entity is a line entity, decomposing the line entity into a discrete point set formed by its vertices (turning points), and calculating the space similarity of the line entity by utilizing a Hausdorff distance;

when the space entity is a surface entity, calculating the feature similarity of the surface entity, and carrying out weighted summation based on the feature similarity to obtain the space similarity of the surface entity; wherein the feature similarity includes: distance similarity, shape similarity, and size similarity.

5. The method of claim 1, wherein performing entity space information verification on the spatiotemporal knowledge-graph comprises:

for the same space entity, extracting multi-source position information of the space entity, and determining the position information as candidate address information;

performing word segmentation on the candidate address information, and calculating the confidence coefficient of the candidate address information based on a standard address database and a word segmentation result;

sorting the candidate address information according to the order of the confidence level from high to low, and selecting a preset number of candidate address information as new candidate addresses based on the sorting result;

based on the geographic coordinate information of the new candidate address, calculating the relative distance between every two geographic coordinate information, and determining two points with the minimum relative distance as the final candidate address;

determining address information with higher confidence in the two final candidate addresses as a unique position information identifier of the space entity;

constructing a multi-scale geocoding index based on the unique location information identification.

6. The method of claim 5, wherein extracting multi-source location information for the spatial entity for the same spatial entity further comprises:

for the position information lacking address name information, determining the position information corresponding to the geographic position coordinate based on the geographic position coordinate and the open source map service.

7. A knowledge-graph-based ubiquitous space-time data cross-validation device, characterized by comprising:

the data acquisition module is used for acquiring multi-source space-time data and unifying the coordinates and time of the multi-source space-time data;

the map construction module is used for constructing a space-time knowledge map based on the multi-source space-time data with unified coordinates and time;

the verification module is used for carrying out entity space information verification and entity time sequence information verification on the space-time knowledge graph to obtain high-quality space-time knowledge, and storing the space-time knowledge into a space-time database;

the verification module is further configured to: acquire a heterogeneous time sequence data set describing the same entity information; construct a corresponding fitting model for each heterogeneous time sequence data in the heterogeneous time sequence data set; fit the heterogeneous time sequence data in the heterogeneous time sequence data set based on the fitting model to obtain a plurality of fitting value sets, and calculate the mean square error of each fitting value set and the heterogeneous time sequence data set; based on a minimum mean square error criterion, determine a fitting value set with the minimum mean square error as a best fitting value set, and calculate a relative error between the best fitting value set and the heterogeneous time sequence data set; and merge the data with the relative error smaller than the error threshold value.

8. An electronic device comprising a processor and a memory, the memory storing computer executable instructions executable by the processor, the processor executing the computer executable instructions to implement the steps of the method of any one of claims 1 to 6.

9. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor performs the steps of the method of any of the preceding claims 1 to 6.