CN111581334A - Ocean data service publishing method based on data ontology and list - Google Patents

Info

Publication number
CN111581334A
Authority
CN
China
Prior art keywords
data
service
marine
class
oedo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010396453.6A
Other languages
Chinese (zh)
Inventor
任小丽
宋君强
任开军
李小勇
邓科峰
汪祥
朱俊星
周翱隆
杨云天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202010396453.6A
Publication of CN111581334A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a marine data service publishing method based on a data ontology and a list, which comprises the following steps: modeling concepts in the ocean environment data ontology model OEDO as the inputs and outputs of a data service interface; publishing the service to an optimized quick service query list QSQL using expansion rules; and generating a data service index list for data access and improved data discovery. The method provides a unified semantic representation model, OEDO, for heterogeneous ocean data resources to improve data interoperation services, expands the domain concepts through WordNet, and optimizes the recently proposed QSQL data structure to improve data discovery services. Based on the OEDO model and the optimized QSQL, data discovery and data access services can be improved.

Description

Ocean data service publishing method based on data ontology and list
Technical Field
The invention relates to the technical field of data representation and release of ocean data, in particular to an ocean data service release method based on a data ontology and a list.
Background
With the rapid development of information technology and marine observation, marine science is entering the big data era. Oceanography has become one of the typical data-intensive sciences, evolving from a ship-based expeditionary science to distributed, observatory-based approaches. As marine observation systems develop and more autonomous platforms and sensors come into use, more and more important marine variables are observed, data formats diversify, and data volume grows exponentially, which poses new challenges for scientific data management. One of the great challenges of data-intensive science is to improve knowledge discovery by helping humans and machines discover, access, integrate, and analyze task-appropriate scientific data and other scholarly digital objects.
To address this challenge, scientists have proposed guidelines for publishing digital assets, such as data sets, code, and workflows, so that they are Findable, Accessible, Interoperable, and Reusable (FAIR). With the development of the FAIR principles, interoperability between data services has become an urgent priority. Although marine data management aims to move toward the FAIR principles, there is still no unified or standard data service model that makes marine data FAIR. The challenges are mainly reflected in the following aspects.
Data search: the lack of efficient indexing makes it difficult to find valuable information in large amounts of data. According to statistics, the marine observation platforms deployed in the past 10 years transmit, within a single year, as much data as was acquired during the entire previous century.
Data access: the lack of standard and rich oceanographic metadata makes it impossible to identify and access data sets uniformly.
Data interoperation: the diversity and heterogeneity of marine data types and formats from different platforms and observation systems, together with possible ambiguity in terminology, present significant challenges for data interoperability.
Data reuse: in general, resources may be described in machine-readable formats such as the Resource Description Framework (RDF), XML, or JSON, or in human-readable HTML. At present, there are few standard representations for domain data sets, and a significant portion of marine data sets lack quality indicators or provenance information, which makes it difficult for data users to understand, analyze, or reuse the data and leaves user requirements unmet.
With the development of information technology, various types of data sets having various heterogeneous formats can be described by emerging technologies such as cloud computing and service computing, and data services in the marine field can be improved.
Service-oriented computing (SOC) has developed rapidly over the last decade as a new computing model for distributed computing, cross-organizational resource sharing, and application integration. The ontology-based semantic web is one of the key technologies of service computing: it can be used to construct a unified description of heterogeneous resources and to improve interoperability between services. One of the strongest features of an ontology is that it expresses explicit knowledge of a concept domain, from which implicit new knowledge can be derived by logical reasoners; for example, ontologies are widely used to describe sensor and observation data. Notably, in recent years, ontologies have also been widely used to describe heterogeneous resources in high performance computing and cloud service environments.
Cloud computing environments provide a wide range of services to customers through loosely coupled instances and storage systems, thereby ensuring a certain level of service. It is estimated that by 2021, 94% of workloads and compute instances will be processed by cloud data centers. In general, a conventional cloud provides three service models, namely infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS). With the development of data-intensive science and big data technology, a data-as-a-service (DaaS) model has been proposed in recent years to facilitate intelligent sharing and processing of large-scale data sets. However, there is still no unified data service model available to describe marine data resources and support FAIR data services.
Disclosure of Invention
In view of the above, the present invention aims to provide a method for publishing marine data services based on a data ontology and a list. Specifically, a unified semantic model, the Ocean Environmental Data Ontology (OEDO), is first proposed to represent heterogeneous ocean data resources and provide interoperable data services. Then, building on the recently proposed Quick Service Query List (QSQL) data structure, the domain concepts are expanded through the lexical database WordNet and QSQL is optimized. Finally, based on the OEDO model and the optimized QSQL, a Data Ontology and List based marine data service Publishing method (DOLP) is proposed to improve data discovery and data access services.
In order to achieve this purpose, the invention adopts the following technical scheme. The marine data service publishing method based on the data ontology and the list comprises the following steps:
step 1, modeling concepts in the ocean environment data ontology model OEDO as the inputs and outputs of a data service interface;
step 2, publishing the service to an optimized quick service query list QSQL by using expansion rules;
step 3, generating a data service index list for data access and improved data discovery;
the service publishing process described in step 2 includes the following steps:
step 201, acquiring the specific concepts from the ocean environment data ontology model OEDO, acquiring the synonyms of each parameter from the lexical database WordNet, and expanding the equivalence classes with the synonyms according to rule 1;
step 202, for each element in the equivalence class, checking whether the element has already been added to the quick service query list QSQL, constructing a concept node, appending the service identifier of the current data service to the exact match vector ExactVector in the data domain of the concept node, and constructing the equivalence link EqualLink in the link domain of the node;
step 203, deducing the parent class of each element in the equivalence class through a reasoner, expanding the parent class with the hypernyms in the lexical database WordNet according to rule 2, and setting the parent class vector PluginVector in the data domain of the element and the parent class link SuperLink in the link domain;
step 204, expanding the grandparent class and the descendant class according to rule 2 and rule 3, respectively;
step 205, returning the data service quick retrieval list OQSQL generated from the published model;
the rule 1: since there may not be a specific concept in the model that exactly matches the input or output parameters of the data service, the equivalent classes are extended by WordNet synonym relationships, namely:
Figure BDA0002487761030000041
rule 2: the parent classes related by the is-a relationship are extended through hypernyms:
Figure BDA0002487761030000042
$Grdp_i = Grdp_i \cup Hype_w(Sup_i)$,
$Sup_i = Sup_i \cup Hype_w(C_i)$.
rule 3: the subclasses related by the part-of relationship are extended through hyponyms:
Figure BDA0002487761030000043
$Grdc_i = Grdc_i \cup Hypo_w(Sub_i)$,
$Sub_i = Sub_i \cup Hypo_w(C_i)$.
wherein $C$ denotes the set of concepts of the ocean environment data ontology model OEDO, $C_i$ denotes the i-th class in OEDO, $E_i$ denotes the equivalence class of $C_i$, $Syn_w(C_i)$ denotes the synonyms of $C_i$, $Sup_i$ denotes the parent class of $C_i$, $Hype_w(C_i)$ denotes the hypernyms of $C_i$, $Sub_i$ denotes the subclass of $C_i$, $Hypo_w(C_i)$ denotes the hyponyms of $C_i$, $Grdp_i$ denotes the grandparent class of $C_i$, and $Grdc_i$ denotes the descendant class of $C_i$.
The top-level concepts of the ocean environment data ontology model OEDO comprise observation data, sensor, observation system and observation platform, and the relationships among the concepts are represented by object attributes. Observation platforms are uniformly represented through hierarchical classification and attribute description: the observation platform has 4 subclasses, namely land-based platform, sea-based platform, space-based platform and air-based platform, and 10 basic attributes, namely platform identifier, URL address, platform type, platform characteristic, geographic position, sensor, effective time, transmission time, data format and the organization to which the platform belongs;
the observation data are also represented by hierarchical classification: the observation data have 7 subclasses, namely marine biology, marine hydrology, marine chemistry, submarine topography, marine substrate, marine meteorology and marine geophysics. The data sets of the observation data are described by metadata, and the metadata are modeled as data type attributes and object attributes of a unified ontology, so that different data sets are interoperable.
Each element of the quick service query list QSQL represents an ontology concept and consists of a link domain and a data domain. The link domain stores the relationships deduced from the service model by a semantic reasoning tool, including links to the equivalence class EqualClass, the parent class SuperClass, the subclass SubClass, the sibling class SibClass, the grandparent class GrdparClass and the descendant class GrdcClass, so that service queries are accelerated by avoiding repeated reasoning. The data domain stores the services that use the related concept as input or output, grouped by degree of matching.
The method provides a unified semantic representation model OEDO for heterogeneous ocean data resources so as to improve data interoperation service. The domain concept is expanded through WordNet, and the latest QSQL data structure is further optimized to improve the data discovery service. Based on an OEDO model and optimized QSQL, a marine data service publishing method DOLP is provided to improve data discovery and data access services.
Drawings
FIG. 1 is a core concept and relationship diagram of the OEDO model of the present invention;
FIG. 2 is an observation platform class diagram of the present invention;
FIG. 3 is an observation classification diagram of the present invention;
FIG. 4 is a simplified representation of a data set;
FIG. 5 is a diagram of the QSQL data structure;
FIG. 6 is a schematic flow diagram of the method of the present invention;
FIG. 7 is a schematic diagram of data service query response times;
FIG. 8 is a graph of average response time trend;
FIG. 9 is a graph of data discovery accuracy versus recall results.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The top-level concepts of the OEDO model proposed by the present invention are shown in FIG. 1; the core concepts are "observation data", "sensor", "observation system" and "observation platform". The relationships between these concepts are represented by object attributes: "observation data" is observed by (isObservedBy) certain sensors, the sensors are deployed on (isDeployedOn) an "observation platform", and the observation system comprises (hasPlatform) different platforms.
As shown in FIG. 1, OEDO includes inverse object attributes; for example, the object attributes isObservedBy and hasObservation are inverses of each other (owl:inverseOf). Consequently, when a certain observation is retrieved, the sensor that sensed it, the platform on which that sensor is deployed, and the related information can also be obtained. These attribute relationships enhance data linking and the efficiency of data discovery.
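The effect of an inverse object attribute can be illustrated with a small sketch. The triple store below is only a toy stand-in for an OWL reasoner, and the individuals (`sst_obs_001`, `thermistor_A`, `mooring_buoy_7`) are hypothetical; only the attribute names isObservedBy, hasObservation and isDeployedOn come from the description.

```python
from collections import defaultdict

# Inverse pair named in the description (owl:inverseOf).
INVERSE_OF = {"isObservedBy": "hasObservation", "hasObservation": "isObservedBy"}


class TripleStore:
    """Toy store that materializes the inverse triple on every insert."""

    def __init__(self):
        self._out = defaultdict(set)  # (subject, attribute) -> set of objects

    def add(self, subject, attribute, obj):
        self._out[(subject, attribute)].add(obj)
        inverse = INVERSE_OF.get(attribute)
        if inverse is not None:
            self._out[(obj, inverse)].add(subject)

    def objects(self, subject, attribute):
        return set(self._out.get((subject, attribute), set()))


def trace_platforms(store, observation):
    """Follow observation -> sensor -> platform."""
    platforms = set()
    for sensor in store.objects(observation, "isObservedBy"):
        platforms |= store.objects(sensor, "isDeployedOn")
    return platforms


store = TripleStore()
store.add("sst_obs_001", "isObservedBy", "thermistor_A")      # hypothetical individuals
store.add("thermistor_A", "isDeployedOn", "mooring_buoy_7")
```

Here `store.objects("thermistor_A", "hasObservation")` is answered without that triple ever being asserted, and `trace_platforms` walks from an observation to the platform that carried the sensor.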
In the present invention, attention is mainly paid to "observation platforms" and "observation data" closely related to data collection and services. The representation of these two classes in the OEDO model will be described in detail below.
Observation platforms are the infrastructure of marine observation activities. However, platforms come in many varieties, are produced by different manufacturers, and are deployed and managed by different organizations that adopt different standards. As a result, the descriptions of the data sets generated by these platforms are unclear, and some data sets cannot be reused at all. In the invention, different observation platforms are uniformly represented through hierarchical classification and attribute description.
(1) Hierarchical classification: to describe the observation platform hierarchically, it is first divided into 4 subclasses connected by the is-a relationship, which is transitive (owl:TransitiveProperty). As shown in FIG. 2, the platform hierarchy has 3 layers. In layer 2, the ocean observation platforms are divided into 4 types according to where they are deployed: (i) "land-based platforms", deployed on the ground; (ii) "sea-based platforms", including sea-surface and underwater platforms; (iii) "space-based platforms", which are mainly satellites; and (iv) "air-based platforms", which operate above the ground but within the atmosphere, such as sounding balloons and airplanes.
To illustrate the hierarchical classification of the OEDO model more clearly, take the "sea-surface platform" as an example. Observation platforms deployed at the sea surface include fixed platforms (e.g., mooring buoys) and mobile platforms (e.g., ships, boats, research vessels and drifting buoys), which are typically used to acquire sea-surface data such as temperature, salinity and wave height. These physical fields are the keywords users commonly search for when looking for a data set, so they are used as instances in the OEDO ontology. Extending the semantics of OEDO in this way further improves data discovery.
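A minimal sketch of how a transitive is-a hierarchy can be queried is given below. The class names are hypothetical English renderings of the platform types described above, not identifiers taken from the OEDO model, and a real deployment would rely on an OWL reasoner rather than a dictionary.

```python
# child -> parent, one is-a edge per class (hypothetical class names).
IS_A = {
    "LandBasedPlatform": "ObservationPlatform",
    "SeaBasedPlatform": "ObservationPlatform",
    "SpaceBasedPlatform": "ObservationPlatform",
    "AirBasedPlatform": "ObservationPlatform",
    "SeaSurfacePlatform": "SeaBasedPlatform",
    "UnderwaterPlatform": "SeaBasedPlatform",
    "MooringBuoy": "SeaSurfacePlatform",
}


def ancestors(cls):
    """Return all superclasses of cls, nearest first, by following is-a edges."""
    chain = []
    while cls in IS_A:
        cls = IS_A[cls]
        chain.append(cls)
    return chain


def is_a(cls, candidate):
    """True if candidate is a direct or indirect superclass of cls."""
    return candidate in ancestors(cls)
```

Because the relationship is transitive, `is_a("MooringBuoy", "ObservationPlatform")` holds even though only the edge to `SeaSurfacePlatform` is stored directly.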
(2) Attribute description: different observation platforms can be represented in an interoperable manner by means of ODAS metadata and the latest version of the Global Change Master Directory (GCMD) keywords (v9.1). As shown in Table 1, 10 basic classes are used to describe the observation platform in the OEDO model, namely "platform ID", "URL", "platform type", "platform feature", "geographical location", "sensor", "effective time", "transmission time", "data format" and "belonging organization". The corresponding object attributes are detailed in Table 1.
TABLE 1 Observation platform Attribute description
Figure BDA0002487761030000081
In OEDO, to stay consistent with the Semantic Sensor Ontology (SSO) design pattern, the observation data is modeled to describe the content of the incoming stimuli, i.e., changes in the physical world.
(1) Hierarchical classification: with the development of marine observation systems, the observation data includes a variety of physical parameters from different disciplines. The observations are represented hierarchically by category in the OEDO model, which is intended to make the data easier to access for users from different disciplines.
As shown in FIG. 3, ocean observations are divided into 7 subclasses. Taking marine meteorological data as an example, sensors deployed on a buoy platform can detect external stimuli such as wind (speed and direction), visibility, humidity, and barometric pressure at 10 meters above sea level.
(2) Data set metadata description: typically, an observation is a collection of data sets, received by a sensor, with different spatial resolutions and time grids, and each data set can be described by its metadata. In the OEDO model, the main metadata items that improve FAIR services for ocean data are emphasized and modeled as data type attributes and object attributes of a unified ontology, so that different data sets are interoperable. A simplified representation of a data set in the OEDO model is shown in FIG. 4.
In the model, every data set must be identified by a name (hasName) and a unique identifier (hasID), which are represented in FIG. 4 by data type attributes. The name of a data set is crucial for data discovery: it is an important factor in making the data set findable. Data users learn how to access a data set through the basic classes "access restrictions" and "URL" and the corresponding object attributes, which meets the accessibility requirement of the FAIR principles. In addition, users can cite data sets through the "reference information", which encourages data producers and organizations to share their data and further facilitates data access. The stimuli observed at a certain "spatial resolution" and "time step" are interpreted as "physical parameters", which are located in four dimensions by a "spatial range" and a "time range", i.e., longitude, latitude, vertical depth and time. With the "belonging organization" of the data set and detailed quality control and quality assurance information ("quality"), the data set can be reused without additional provenance tracing.
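As an illustration only, the metadata items listed above could be carried by a record such as the following. The field names, the example values and the `covers` helper are assumptions made for this sketch; the patent models these items as ontology attributes, not as a Python class.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class DatasetMetadata:
    name: str                               # hasName
    dataset_id: str                         # hasID
    url: str                                # where the data can be accessed
    access_restriction: str
    reference_info: str                     # how to cite the data set
    organization: str                       # belonging organization
    quality: str                            # quality control / assurance notes
    physical_parameters: List[str]
    lon_range: Tuple[float, float]          # degrees east, (min, max)
    lat_range: Tuple[float, float]          # degrees north, (min, max)
    depth_range_m: Tuple[float, float]      # metres below surface, (min, max)
    time_range: Tuple[datetime, datetime]   # (start, end)

    def covers(self, lon: float, lat: float, depth_m: float, when: datetime) -> bool:
        """True if the 4-D point lies inside the spatial and time ranges."""
        return (self.lon_range[0] <= lon <= self.lon_range[1]
                and self.lat_range[0] <= lat <= self.lat_range[1]
                and self.depth_range_m[0] <= depth_m <= self.depth_range_m[1]
                and self.time_range[0] <= when <= self.time_range[1])


# Entirely hypothetical example record.
example = DatasetMetadata(
    name="Example sea temperature profiles",
    dataset_id="ds-0001",
    url="https://example.org/data/ds-0001",
    access_restriction="open",
    reference_info="Example Organization (2020), ds-0001",
    organization="Example Organization",
    quality="quality controlled",
    physical_parameters=["temperature", "salinity"],
    lon_range=(110.0, 125.0),
    lat_range=(10.0, 25.0),
    depth_range_m=(0.0, 200.0),
    time_range=(datetime(2019, 1, 1), datetime(2019, 12, 31)),
)
```

A check such as `example.covers(120.0, 20.0, 5.0, datetime(2019, 6, 1))` is the kind of test a service could run to decide whether a data set falls within a requested spatio-temporal range.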
The OEDO model represents heterogeneous ocean data in a unified form through ocean metadata and hierarchical classification, and aims to improve the interoperability of ocean data. The present invention focuses on how to publish data in the cloud, implement data as a service (DaaS), and improve data discovery and access performance. For service publishing, the quick service query list (QSQL) has proven in the prior art to be a very efficient data structure. Therefore, the basic structure of QSQL is introduced first; then the optimized QSQL and the corresponding domain concept expansion are described; finally, the service publishing method DOLP proposed by the present invention is described in detail.
Semantic information in an ontology model can be represented by a directed graph, in which each vertex represents a basic ontology class and each arc represents a relationship between the corresponding concepts. Traditional semantic service discovery algorithms based on direct reasoning are inefficient, and the QSQL data structure was designed specifically to overcome this problem. QSQL stores the semantic network graph in an adjacency list, so service queries can be completed efficiently.
The basic structure of QSQL, shown in FIG. 5, is created during the service publishing phase. Each QSQL element represents an ontology concept and consists of a link domain (upper half of FIG. 5) and a data domain (lower half of FIG. 5). The link domain stores the relationships inferred from the service model by the semantic reasoning tool, including links to the concept's equivalence classes (EqualClass), parent classes (SuperClass), subclasses (SubClass), sibling classes (SibClass), grandparent classes (GrdparClass), and descendant classes (GrdcClass), which speeds up service queries by avoiding repeated reasoning. The data domain stores the services that use this concept as input or output, grouped by degree of matching.
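The element layout described above might be sketched as follows. This is an assumed in-memory representation, not the structure of FIG. 5 itself: the field names mirror the link types named in the text, only two data-domain vectors (ExactVector and PluginVector, the two named in the publishing steps) are shown, and the concept names and service identifier are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class QSQLElement:
    concept: str
    # Link domain: names of related concepts, filled in once at publishing time.
    equal: Set[str] = field(default_factory=set)
    super_: Set[str] = field(default_factory=set)
    sub: Set[str] = field(default_factory=set)
    sib: Set[str] = field(default_factory=set)
    grdpar: Set[str] = field(default_factory=set)
    grdc: Set[str] = field(default_factory=set)
    # Data domain: service identifiers grouped by matching degree.
    exact_vector: List[str] = field(default_factory=list)
    plugin_vector: List[str] = field(default_factory=list)


def get_or_create(qsql: Dict[str, QSQLElement], concept: str) -> QSQLElement:
    """Adjacency-list lookup: one element per concept, created on first use."""
    if concept not in qsql:
        qsql[concept] = QSQLElement(concept)
    return qsql[concept]


qsql: Dict[str, QSQLElement] = {}
temp = get_or_create(qsql, "SeaTemperature")            # hypothetical concepts
temp.super_.add("MarineHydrology")
get_or_create(qsql, "MarineHydrology").sub.add("SeaTemperature")
temp.exact_vector.append("svc-001")                     # hypothetical service id
```

Keying the list by concept name means a query reaches the stored relationships with a single dictionary lookup instead of a call to the reasoner.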
QSQL aims to reduce service query time, and it simplifies the service model by assuming that QSQL elements are exactly the ontology concepts used when the service models are published. Under this assumption, service publishers must annotate their service interfaces with domain concepts, and service users must query for the required services with specific semantic concepts. However, a recognized and interoperable domain ontology is usually lacking. Furthermore, most cloud service publishers and application users lack the relevant domain knowledge, especially in data-intensive marine science, which severely limits data service discovery and access.
In order to solve the above problems, the domain concept is expanded in OEDO, and synonyms (Synonym), hypernyms (Hypernym) and hyponyms (Hyponym) in WordNet of the corresponding concept are added to QSQL. The extension rule will be described in detail below, and the symbol definition related to the extension rule is as described in table 2.
Table 2 symbol definitions in extended rules
Figure BDA0002487761030000101
Rule 1: since there may not be a specific concept in the model that exactly matches the input or output parameters of the data service, the equivalent classes are extended by WordNet Synonym (Synonym) relationships, namely:
Figure BDA0002487761030000102
Rule 2: the parent classes related by the is-a relationship are extended through hypernyms (Hypernym):
Figure BDA0002487761030000103
$Grdp_i = Grdp_i \cup Hype_w(Sup_i)$,
$Sup_i = Sup_i \cup Hype_w(C_i)$.
Rule 3: the subclasses related by the part-of relationship are extended through hyponyms (Hyponym):
Figure BDA0002487761030000111
$Grdc_i = Grdc_i \cup Hypo_w(Sub_i)$,
$Sub_i = Sub_i \cup Hypo_w(C_i)$.
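The three rules can be sketched as plain set operations. Two assumptions are made here and should be read as such: the formula of Rule 1 appears only as a figure, so it is taken to be $E_i = E_i \cup Syn_w(C_i)$ by analogy with Rules 2 and 3, and the updates are applied in the order in which the equations are listed. The lexicon is a hand-written stand-in for WordNet with hypothetical entries.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set

# Toy lexicon standing in for WordNet; every entry is hypothetical.
SYNONYMS: Dict[str, Set[str]] = {"SeaTemperature": {"OceanTemperature"}}
HYPERNYMS: Dict[str, Set[str]] = {
    "SeaTemperature": {"Temperature"},
    "MarineHydrology": {"Hydrology"},
}
HYPONYMS: Dict[str, Set[str]] = {
    "SeaTemperature": {"BottomTemperature"},
    "SeaSurfaceTemperature": {"SkinTemperature"},
}


def lookup(table: Dict[str, Set[str]], concepts: Iterable[str]) -> Set[str]:
    """Union of the lexicon entries of all given concepts."""
    found: Set[str] = set()
    for concept in concepts:
        found |= table.get(concept, set())
    return found


@dataclass
class Relations:
    equal: Set[str] = field(default_factory=set)  # E_i
    sup: Set[str] = field(default_factory=set)    # Sup_i
    sub: Set[str] = field(default_factory=set)    # Sub_i
    grdp: Set[str] = field(default_factory=set)   # Grdp_i
    grdc: Set[str] = field(default_factory=set)   # Grdc_i


def expand(concept: str, rel: Relations) -> Relations:
    rel.equal |= lookup(SYNONYMS, [concept])   # Rule 1 (assumed form)
    rel.grdp |= lookup(HYPERNYMS, rel.sup)     # Rule 2, first equation
    rel.sup |= lookup(HYPERNYMS, [concept])    # Rule 2, second equation
    rel.grdc |= lookup(HYPONYMS, rel.sub)      # Rule 3, first equation
    rel.sub |= lookup(HYPONYMS, [concept])     # Rule 3, second equation
    return rel


# Sup_i and Sub_i start from what the ontology reasoner would have inferred.
rel = expand("SeaTemperature",
             Relations(sup={"MarineHydrology"}, sub={"SeaSurfaceTemperature"}))
```

Each rule only ever adds members to a set, so applying the expansion cannot remove a relationship that the reasoner inferred from the ontology.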
In addition to extending the basic classes in FIG. 5, the present invention adds a new subclass relationship to represent the part-of relationship in OEDO, which is implemented by rule 3. This is motivated by the characteristics of ocean data. For example, ocean data usually has spatio-temporal characteristics: the time span and spatial coverage of a data set may not completely meet the user's requirements, but the existing data that falls within the requested spatio-temporal range can still be recommended or returned to the user. Similarly, a marine data application user may require a data set containing multiple marine physical fields (e.g., sea temperature and salinity). If no single data set contains all of the required physical fields, the user's requirements can be met by providing multiple subsets, each of which contains one or more of the required physical fields.
The interoperable ocean data representation model OEDO and the optimized QSQL data structure have been described above. Based on OEDO and the optimized QSQL, the DOLP method of the present invention is proposed to publish marine data services. As shown in FIG. 6, the method specifically includes the following steps:
step 1, modeling concepts in the ocean environment data ontology model OEDO as the inputs and outputs of a data service interface;
step 2, publishing the service to an optimized quick service query list QSQL by using expansion rules;
step 3, generating a data service index list for data access and improved data discovery;
the service publishing process described in step 2 includes the following steps:
step 201, acquiring the specific concepts from the ocean environment data ontology model OEDO, acquiring the synonyms of each parameter from the lexical database WordNet, and expanding the equivalence classes according to rule 1;
step 202, for each element in the equivalence class, checking whether the element has already been added to the quick service query list QSQL, constructing a concept node, appending the service identifier of the current data service to the exact match vector ExactVector in the data domain of the node, and constructing the equivalence link EqualLink in the link domain of the node;
step 203, deducing the parent class of each element in the equivalence class through a reasoner, expanding the parent class with the hypernyms in the lexical database WordNet according to rule 2, and setting the parent class vector PluginVector in the data domain of the element and the parent class link SuperLink in the link domain;
step 204, expanding the grandparent class and the descendant class according to rule 2 and rule 3, respectively;
step 205, returning the data service quick retrieval list OQSQL generated from the published model.
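Steps 201 to 203 and 205 might be sketched as below; step 204 (grandparent and descendant expansion) is omitted for brevity. Several things are assumptions of this sketch rather than statements of the patent: the dictionary layout of a node, the `synonyms`, `parents` and `hypernyms` mappings that stand in for WordNet and the reasoner, the choice to record the service in the PluginVector of each parent node, and the example concept and service names.

```python
from typing import Dict, Iterable, Set


def new_node() -> dict:
    # Link domain (EqualLink, SuperLink) and data domain (ExactVector, PluginVector).
    return {"EqualLink": set(), "SuperLink": set(), "ExactVector": [], "PluginVector": []}


def add_once(vector: list, service_id: str) -> None:
    if service_id not in vector:
        vector.append(service_id)


def publish(service_id: str,
            concepts: Iterable[str],
            qsql: Dict[str, dict],
            synonyms: Dict[str, Set[str]],
            parents: Dict[str, Set[str]],
            hypernyms: Dict[str, Set[str]]) -> Dict[str, dict]:
    for concept in concepts:
        # Step 201: expand the equivalence class with synonyms (rule 1).
        equivalents = {concept} | synonyms.get(concept, set())
        # Step 203: inferred parents plus hypernyms (rule 2).
        supers = parents.get(concept, set()) | hypernyms.get(concept, set())
        for element in equivalents:
            # Step 202: reuse the node if it exists, otherwise create it.
            node = qsql.setdefault(element, new_node())
            add_once(node["ExactVector"], service_id)
            node["EqualLink"] |= equivalents - {element}
            node["SuperLink"] |= supers
        for parent in supers:
            # Assumption: a request for the parent concept is a plug-in match.
            add_once(qsql.setdefault(parent, new_node())["PluginVector"], service_id)
    return qsql  # Step 205: the list doubles as the service index.


index = publish(
    "svc-001", ["SeaTemperature"], {},
    synonyms={"SeaTemperature": {"OceanTemperature"}},
    parents={"SeaTemperature": {"MarineHydrology"}},
    hypernyms={"SeaTemperature": {"Temperature"}},
)
```

All reasoning and lexicon lookups happen inside `publish`, which reflects the design choice described in the text: the cost of inference is paid once when a service is published, not on every query.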
In the experiment of this embodiment, the environment is configured as follows: the server operating system is Ubuntu 16.04 LTS; the ontology model is constructed with Protégé 5.5, a free ontology editor and framework that is used in numerous scientific and industrial fields to build knowledge-based solutions; the semantic reasoner is Racer 2.0; and the published data services are stored in MySQL 8.0.18.
The experiment evaluates the effectiveness and efficiency of the proposed scheme in terms of data interoperability, data discovery, and data access. Because actual data service instances for ocean data are lacking, concepts of the OEDO ontology are used as service outputs in the experiment: 500 data service models are published at random, and request models are generated to simulate the service requests of data users.
The invention establishes the OEDO model and improves the interoperability of heterogeneous ocean data through a unified representation. To verify the soundness of the model, Protégé is first used to detect conflicts among the concepts and relationships, the class hierarchy, the object attributes represented by metadata, and the data type attributes. The extended semantic relationships between classes are then checked with Racer. Through the unified model, data resources can interoperate in the form of data services.
To evaluate the efficiency of the DOLP method of the present invention, DOLP-based data search performance is compared with the following 3 methods: the traditional semantic query method based on direct reasoning (direct query-based), the keyword query method (Keyword-based), and the semantic query method based on the original QSQL structure (QSQL-based).
FIG. 7 shows the response times of the 4 methods when processing different numbers of service queries. The response time of the direct inference-based method grows as the number of queried services increases, and it is the longest of the 4 methods for the same number of services. The QSQL-based and DOLP-based methods take less time because the QSQL data structure stores the semantic relationships of the OEDO model and generates a service index list, so that only the relevant services need to be fetched from the list when a query is processed. The direct inference-based method must perform semantic reasoning during the query, and the reasoning time accounts for a significant fraction of the response time. The response time of the keyword-based method lies between those of the other 3 methods, but that method does not support semantic information.
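Why a list-based query needs no reasoning at query time can be seen from a minimal lookup sketch. The index contents, the service identifiers and the ranking of exact matches before plug-in matches are assumptions of this example, not results or structures taken from the experiment.

```python
from typing import Dict, List, Tuple

# Hypothetical index as it might look after the publishing phase.
INDEX: Dict[str, dict] = {
    "SeaTemperature": {"ExactVector": ["svc-001"], "PluginVector": ["svc-007"]},
    "Salinity": {"ExactVector": ["svc-002", "svc-001"], "PluginVector": []},
}


def query(index: Dict[str, dict], concept: str) -> List[Tuple[str, str]]:
    """Return (service id, match degree) pairs using dictionary lookups only."""
    node = index.get(concept)
    if node is None:
        return []
    ranked = [(sid, "exact") for sid in node["ExactVector"]]
    seen = set(node["ExactVector"])
    for sid in node["PluginVector"]:
        if sid not in seen:  # do not list a service twice
            ranked.append((sid, "plugin"))
            seen.add(sid)
    return ranked
```

The cost of `query` depends on the number of services stored under one concept, not on the size of the ontology, which is consistent with the stable response times reported for the list-based methods.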
Fig. 8 shows the response time trend of each query, which illustrates the stability of each method. The response time of the direct reasoning-based method changes dramatically with the number of queries, whereas the trends of the other 3 methods are relatively stable.
TABLE 3 mean query time
Figure BDA0002487761030000131
The average query times of the different methods are summarized in Table 3, which shows that the QSQL-based and DOLP-based methods perform significantly better than the other two methods. In addition, although the QSQL structure is extended, the extended nodes are processed in the publishing phase and their processing time is very short compared with the reasoning time, so the average query time of the DOLP-based method does not increase compared with the original QSQL.
In terms of data access improvement, the keyword-based method and the direct reasoning-based semantic query method are traditional service query methods; the advantage of semantic query is that it supports reasoning based on semantic relationships. Besides semantic reasoning, the DOLP-based method also extends the semantic relationships of services through WordNet. Table 4 summarizes the semantic capabilities of the 4 methods.
TABLE 4 semantic capability comparison
Figure BDA0002487761030000132
Figure BDA0002487761030000141
As shown in Table 4, the keyword-based method only supports querying services that exactly match the user's request; if the keyword entered by the data user does not exactly match a published data service, the user gets no result. The direct reasoning-based method not only returns services that exactly match the requested service, but also recommends related services according to the semantic relationships between the requested service and the published services. In addition to the first 5 semantic relationships supported by the direct reasoning-based and QSQL-based methods, the DOLP-based method of the present invention adds the sub-services of the user's request, which corresponds to the optimized data domain of the QSQL structure in Fig. 5.
The effectiveness of the method is verified in the experiment in terms of the precision and recall of data discovery; the experimental results are shown in Fig. 9.
The precision of the DOLP-based method is slightly lower than that of the other two semantic query methods, namely the direct reasoning-based and original QSQL-based methods, because it depends on the semantic accuracy of the expanded concepts. However, the precision of the DOLP-based method of the present invention is much higher than that of the keyword-based method, since the keyword-based method does not support fuzzy matching. In terms of recall, after semantic expansion through WordNet the DOLP-based method can find the synonyms, hypernyms and hyponyms of the requested data service, and it therefore has the highest recall (up to 90.1%), which means that the method can effectively improve data discovery and data access.
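For reference, precision and recall as used above can be computed from the set of services returned by a query and the set of services that are actually relevant. The following Python sketch uses made-up service identifiers; it does not reproduce the experimental data of Fig. 9.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query.

    precision = |returned ∩ relevant| / |returned|
    recall    = |returned ∩ relevant| / |relevant|
    """
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 relevant services, 5 returned, 3 of them relevant.
relevant = {"s1", "s2", "s3", "s4"}
returned = {"s1", "s2", "s3", "s8", "s9"}
p, r = precision_recall(returned, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.60 recall=0.75
```

A query that returns more expanded matches tends to raise recall while admitting some false positives, which is consistent with the trade-off described for the DOLP-based method.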
According to the summary of the invention and the embodiments, the invention first provides a unified semantic representation model, OEDO, for heterogeneous ocean data resources so as to improve data interoperation services. Domain concepts are then expanded through WordNet, and the latest QSQL data structure is further optimized to improve data discovery services. Based on the OEDO model and the optimized QSQL, a marine data service publishing method based on data ontology and list, DOLP, is provided to improve data discovery and data access services. Finally, extensive experiments in the last embodiment verify the effectiveness and efficiency of the method in terms of data interoperation, data lookup and data access.

Claims (3)

1. A marine data service publishing method based on data ontology and list is characterized by comprising the following steps:
step 1, modeling concepts in an ocean environment data ontology model OEDO as input and output of a data service interface;
step 2, publishing the service to an optimized quick service query list QSQL by using expansion rules;
step 3, generating a data service index list for data access and data discovery improvement;
the service publishing process described in step 2 includes the following steps:
step 201, acquiring specific concepts from the marine environment data ontology model OEDO, acquiring synonyms of each parameter from the vocabulary database WordNet, and expanding the equivalence classes with the synonyms according to rule 1;
step 202, for each element in the equivalence class, checking whether the element has been added to the quick service query list QSQL, constructing a concept node, attaching the service identifier of the current data service to the exact match vector ExactVector of the data domain of the concept node, and constructing the equivalence chain EqualLink of the link domain of the node;
step 203, deducing the parent class of each element in the equivalence class through a reasoner, expanding the parent class with hypernyms from the vocabulary database WordNet according to rule 2, and setting the parent class vector PluginVector of the data domain of the element and the SuperLink of its link domain;
step 204, expanding the grandparent class and the descendant class according to rule 2 and rule 3, respectively;
step 205, returning a data service quick retrieval list OQSQL generated by the published model;
the rule 1: since the model may not contain a specific concept that exactly matches the input or output parameters of a data service, the equivalence classes are extended through WordNet synonym relationships, namely:
Figure FDA0002487761020000011
$E_i = E_i \cup Syn_w(c_i)$.
the rule 2: the parent classes related by the is-a relationship are extended through hypernyms:
Figure FDA0002487761020000012
$Grdp_i = Grdp_i \cup Hype_w(Sup_i)$,
$Sup_i = Sup_i \cup Hype_w(c_i)$.
the rule 3: the subclasses related by the part-of relationship are extended through hyponyms:
Figure FDA0002487761020000021
$Grdc_i = Grdc_i \cup Hypo_w(Sub_i)$,
$Sub_i = Sub_i \cup Hypo_w(c_i)$.
wherein $C$ represents the concept set of the ocean environment data ontology model OEDO, $c_i$ represents the i-th class in OEDO, $E_i$ denotes the equivalence class of $c_i$, $Syn_w(c_i)$ denotes the synonyms of $c_i$, $Sup_i$ denotes the parent class of $c_i$, $Hype_w(c_i)$ denotes the hypernyms of $c_i$, $Sub_i$ denotes the subclass of $c_i$, $Hypo_w(c_i)$ denotes the hyponyms of $c_i$, $Grdp_i$ denotes the grandparent class of $c_i$, and $Grdc_i$ denotes the grandchild (descendant) class of $c_i$.
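Rules 1 to 3 can be illustrated by the following Python sketch. The small dictionaries stand in for WordNet lookups and all concept names are hypothetical; a real implementation would query WordNet and the OEDO model instead. Two assumptions are made: applying $Hype_w$ or $Hypo_w$ to a set means the union over its members, and the two assignments of rule 2 and of rule 3 are evaluated in the order in which they are written.

```python
# Hypothetical stand-ins for WordNet relations (word -> related words).
SYN = {"ocean": {"sea"}}
HYPE = {"ocean": {"water_body"}, "hydrosphere": {"sphere"}}
HYPO = {"ocean": {"deep"}, "gulf": {"bay"}}


def lookup(table, words):
    """Union of the related words of every word in `words`."""
    out = set()
    for w in words:
        out |= table.get(w, set())
    return out


def expand(c, E, Sup, Sub, Grdp, Grdc):
    """Apply rules 1-3 to concept c and return the extended sets."""
    # Rule 1: E_i = E_i ∪ Syn_w(c_i)
    E = E | lookup(SYN, {c})
    # Rule 2: Grdp_i = Grdp_i ∪ Hype_w(Sup_i), then Sup_i = Sup_i ∪ Hype_w(c_i)
    Grdp = Grdp | lookup(HYPE, Sup)
    Sup = Sup | lookup(HYPE, {c})
    # Rule 3: Grdc_i = Grdc_i ∪ Hypo_w(Sub_i), then Sub_i = Sub_i ∪ Hypo_w(c_i)
    Grdc = Grdc | lookup(HYPO, Sub)
    Sub = Sub | lookup(HYPO, {c})
    return E, Sup, Sub, Grdp, Grdc


# Hypothetical sets taken from the ontology before expansion.
E, Sup, Sub, Grdp, Grdc = expand(
    "ocean", E={"ocean"}, Sup={"hydrosphere"}, Sub={"gulf"}, Grdp=set(), Grdc=set()
)
print(sorted(E), sorted(Sup), sorted(Sub), sorted(Grdp), sorted(Grdc))
```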
2. The marine data service publishing method as claimed in claim 1, wherein the top-level concepts of the marine environment data ontology model OEDO include observation data, sensors, observation systems and observation platforms, and the relationships between the concepts are represented by object attributes; the observation platforms are represented uniformly by hierarchical classification and attribute description, have 4 subclasses, namely land-based platforms, sea-based platforms, air-based platforms and space-based platforms, and have 10 basic attributes, namely platform identification, URL address, platform type, platform characteristics, geographical location, sensors, validity time, transmission time, data format and affiliated organization;
the observation data are also represented by hierarchical classification and have 7 subclasses, namely marine biology, marine hydrology, marine chemistry, submarine topography, marine substrate, marine meteorology and marine geophysics; a data set of the observation data is described using metadata, and the metadata are modeled as the data type attributes and object attributes of the unified ontology, so that different data sets can interoperate.
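The classification and attributes listed in claim 2 can be pictured with a small Python data model. This is only an illustrative sketch: an ontology of this kind would normally be expressed in an ontology language rather than in Python classes, and the field names, the field types and the example values are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class PlatformType(Enum):
    """The 4 observation platform subclasses."""
    LAND_BASED = "land-based"
    SEA_BASED = "sea-based"
    AIR_BASED = "air-based"
    SPACE_BASED = "space-based"


class ObservationDataType(Enum):
    """The 7 observation data subclasses."""
    MARINE_BIOLOGY = "marine biology"
    MARINE_HYDROLOGY = "marine hydrology"
    MARINE_CHEMISTRY = "marine chemistry"
    SUBMARINE_TOPOGRAPHY = "submarine topography"
    MARINE_SUBSTRATE = "marine substrate"
    MARINE_METEOROLOGY = "marine meteorology"
    MARINE_GEOPHYSICS = "marine geophysics"


@dataclass
class ObservationPlatform:
    """The 10 basic attributes of an observation platform (types are assumed)."""
    platform_id: str
    url: str
    platform_type: PlatformType
    characteristics: str
    location: tuple          # assumed encoding: (latitude, longitude)
    sensors: list
    validity_time: str
    transmission_time: str
    data_format: str
    organization: str


# Hypothetical instance; every value is made up for illustration.
buoy = ObservationPlatform(
    platform_id="buoy-001",
    url="http://example.org/platforms/buoy-001",
    platform_type=PlatformType.SEA_BASED,
    characteristics="moored buoy",
    location=(20.0, 115.0),
    sensors=["thermometer"],
    validity_time="2020-01-01/2020-12-31",
    transmission_time="hourly",
    data_format="NetCDF",
    organization="example institute",
)
print(buoy.platform_type.value)  # sea-based
```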
3. The marine data service publishing method of claim 1, wherein each element of the quick service query list QSQL represents an ontology concept and consists of a link field and a data field; the link field stores the relationships inferred from the service model by a semantic reasoning tool, including links to the equivalence class EqualClass, parent class SuperClass, subclass SubClass, sibling class SibClass, grandparent class and descendant class grdchlast of the concept, which speeds up service queries by avoiding repeated reasoning; and the data field stores the services that use the related concepts as inputs or outputs, grouped by degree of match.
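The element structure of claim 3 and the publishing steps of claim 1 might be sketched as follows in Python. Only ExactVector, PluginVector, EqualLink and the super link are named in the text; the class name ConceptNode, the attribute and function names, the toy concepts, and in particular the way PluginVector is filled (on the nodes of the parent concepts) are assumptions made for illustration. The reasoner and WordNet are replaced by plain arguments.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptNode:
    """One list element: an ontology concept with a link field and a data field."""
    name: str
    # Link field: names of related concepts, stored once at publishing time.
    equal_link: set = field(default_factory=set)
    super_link: set = field(default_factory=set)
    # Data field: service identifiers grouped by degree of match.
    exact_vector: list = field(default_factory=list)
    plugin_vector: list = field(default_factory=list)


def get_node(qsql, name):
    """Return the node for `name`, creating it if it is not yet in the list."""
    return qsql.setdefault(name, ConceptNode(name))


def publish(qsql, service_id, equivalents, parents):
    """Publish one service whose output concept has the given equivalents and parents.

    `equivalents` and `parents` are assumed to be already expanded
    (by a reasoner and by WordNet) before this function is called.
    """
    for name in equivalents:
        node = get_node(qsql, name)
        node.exact_vector.append(service_id)          # exact match
        node.equal_link |= set(equivalents) - {name}
        node.super_link |= set(parents)
    for name in parents:
        # Assumption: a request for a parent concept can be served by this
        # more specific service, so it is recorded as a plug-in match there.
        get_node(qsql, name).plugin_vector.append(service_id)


def query(qsql, concept):
    """Look up services without reasoning: exact matches first, then plug-in matches."""
    node = qsql.get(concept)
    if node is None:
        return []
    return node.exact_vector + node.plugin_vector


qsql = {}
publish(qsql, "s1", ["ocean", "sea"], ["water_body"])
publish(qsql, "s2", ["water_body"], [])
print(query(qsql, "sea"))         # ['s1']
print(query(qsql, "water_body"))  # ['s2', 's1']
```

Because all links and vectors are filled during publishing, a query touches a single node and never calls the reasoner.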

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010396453.6A CN111581334A (en) 2020-05-12 2020-05-12 Ocean data service publishing method based on data ontology and list

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010396453.6A CN111581334A (en) 2020-05-12 2020-05-12 Ocean data service publishing method based on data ontology and list

Publications (1)

Publication Number Publication Date
CN111581334A true CN111581334A (en) 2020-08-25

Family

ID=72113394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010396453.6A Pending CN111581334A (en) 2020-05-12 2020-05-12 Ocean data service publishing method based on data ontology and list

Country Status (1)

Country Link
CN (1) CN111581334A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7890484B1 (en) * 2004-11-10 2011-02-15 At&T Intellectual Property Ii, L.P. Method and apparatus for selecting services based on behavior models
CN103064673A (en) * 2012-12-21 2013-04-24 武汉大学 Mapping method and system for supporting direct registration of sensor
CN204557102U (en) * 2015-01-26 2015-08-12 中国海洋大学 There is cable online observation system in a kind of ocean dynamical environment seabed
CN108628959A (en) * 2018-04-13 2018-10-09 长安大学 A kind of body constructing method based on traffic big data
CN109948150A (en) * 2019-03-01 2019-06-28 北京航空航天大学 The high performance service context of knowledge based map finds method in a kind of multi-domain environment
CN110633348A (en) * 2019-07-30 2019-12-31 中国人民解放军国防科技大学 Ontology-based high-performance computing resource pooling index query method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
AOLONG ZHOU et al.: "Building Quick Resource Index List Using WordNet and High-Performance Computing Resource Ontology towards Efficient Resource Discovery", 《2019 IEEE 21ST INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS; IEEE 17TH INTERNATIONAL CONFERENCE ON SMART CITY; IEEE 5TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND SYSTEMS》 *
KAIJUN REN et al.: "Building Quick Service Query List Using WordNet and Multiple Heterogeneous Ontologies toward More Realistic Service Composition", 《IEEE TRANSACTIONS ON SERVICES COMPUTING》 *
刘婧: "基于元数据的多源异构海洋情报数据交互共享研究", 《情报杂志》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115543960A (en) * 2022-09-16 2022-12-30 北京神舟航天软件技术股份有限公司 Dynamic modeling method and system for business object
CN115543960B (en) * 2022-09-16 2024-01-05 北京神舟航天软件技术股份有限公司 Dynamic modeling method and system for business object

Similar Documents

Publication Publication Date Title
Bhargava et al. Who, what, when, and where: Multi-dimensional collaborative recommendations using tensor factorization on sparse user-generated data
Wei et al. A survey of faceted search
US20090037403A1 (en) Generalized location identification
CN106528648B (en) In conjunction with the distributed RDF keyword proximity search method of Redis memory database
CN101901247A (en) Vertical engine searching method and system for domain body restraint
Regueiro et al. Semantic mediation of observation datasets through sensor observation services
CN110569367A (en) Knowledge graph-based space keyword query method, device and equipment
Matono et al. An Indexing Scheme for RDF and RDF Schema based on Suffix Arrays.
Jin et al. Collective keyword query on a spatial knowledge base
CN111581334A (en) Ocean data service publishing method based on data ontology and list
Yang et al. Ontology based service discovery method for internet of things
Butt et al. A taxonomy of semantic web data retrieval techniques
Wen et al. Heterogeneous information network‐based scientific workflow recommendation for complex applications
Farshidi et al. An adaptable indexing pipeline for enriching meta information of datasets from heterogeneous repositories
CN107436919B (en) Cloud manufacturing standard service modeling method based on ontology and BOSS
Samson et al. Spatial databases: An overview
Ren et al. Bringing semantics to support ocean FAIR data services with ontologies
Noor et al. Latent dirichlet allocation based semantic clustering of heterogeneous deep web sources
Sun et al. An efficient algorithm of star subgraph queries on urban traffic knowledge graph
El Midaoui et al. Geographical queries reformulation using a parallel association rules generator to build spatial taxonomies
Zhan et al. Ontology-based semantic description model for discovery and retrieval of geospatial information
Paul et al. A framework for semantic interoperability for distributed geospatial repositories
Lin et al. An automatic approach for tagging web services using machine learning techniques
Li Geographic Ontology
Ladra et al. A toponym resolution service following the OGC WPS standard

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200825