WO2023010427A1

WO2023010427A1 - Systems and methods generating internet-of-things-specific knowledge graphs, and search systems and methods using such graphs

Info

Publication number: WO2023010427A1
Application number: PCT/CN2021/110940
Authority: WO
Inventors: Wu SI; Chuantao YIN; Xi Wang
Original assignee: Orange
Priority date: 2021-08-05
Filing date: 2021-08-05
Publication date: 2023-02-09

Abstract

Internet-of-Things-specific knowledge graphs are generated by a method comprising (A) filtering of a general knowledge graph using words in one or more target fields of an Internet-of-Things dataset and/or (B) a sequence of processes including extracting text from a corpus of documents relevant to the Internet-of-Things, forming chunks from the extracted text, and generating triples consisting of first and second noun phrases and a verb phrase that associates them in the extracted text. Search systems and methods (200) process user queries by exploiting such Internet-of-Things-specific knowledge graphs to identify (S203), in an Internet-of-Things dataset, entities having a target search field that contains a word which, although not semantically similar to input words in the user query, is shown, by the Internet-of-Things-specific knowledge graph, to have an association with one or more of the input words. Hybrid search methods and systems comprise evaluation of semantic similarity (S204) as well as the searching (S203) based on the Internet-of-Things-specific knowledge graph.

Description

SYSTEMS AND METHODS GENERATING INTERNET-OF-THINGS-SPECIFIC KNOWLEDGE GRAPHS, AND SEARCH SYSTEMS AND METHODS USING SUCH GRAPHS

Field of the Invention

The present invention relates to the Internet of Things (IoT) and, more particularly, to IoT-oriented search and/or recommendation systems and computer-implemented methods that make use of IoT-specific knowledge graphs, as well as to techniques generating IoT-specific knowledge graphs.

Technical Background

The Internet of Things (IoT) includes vast numbers of network-enabled devices ranging, for example, from tractors and soil sensors in the field of agriculture, to trucks, port utility equipment and shipment-tracking digital ledgers in the field of logistics and supply-chain control, to industrial machines and sensors in factories, to smart devices in homes and wearable health tracker devices, and many more. In this document, the expression “IoT devices” denotes network-enabled devices (notably, devices having the capability to connect to the Internet) that are able to interact with other devices over the network; the IoT devices may have one or more sensors to produce data that can be transmitted over the network connection and may have one or more actuators that may be controlled by data received over the network connection. As is conventional, the expression “IoT device” as used here is not intended to denote standard computing devices, such as computers and smartphones but, rather, any range of physical objects that, traditionally, were non-Internet-enabled but that have been provided with network connectivity for communication and interaction with remote devices/users.

Engineers and application developers often wish to exploit existing IoT devices, and/or the data they produce, for instance in order to be able to implement a new technical system. As an example, consider a set of network-enabled grow lights in a greenhouse. Although the lights were installed in view of illuminating plants in the greenhouse to stimulate their growth, an engineer may design a new system which, when a horticulturalist needs to inspect a specific plant within the greenhouse, uses a selection of the network-enabled grow lights to light up a path leading in the direction towards the target plant. As another example, consider an engineer seeking to design a non-audio notification system for deaf people, for instance a system that notifies the deaf person when a fire alarm has gone off in the person’s home. The engineer may seek information regarding the types of IoT devices that may be present in a so-called smart home, and their technical capabilities, in view of designing a system which can use such a device to provide the deaf person with a non-audio alert (e.g., by making an IoT light flash).

When designing the technical implementation of a new service or application that will exploit one or more existing IoT devices, and/or the data they produce, an engineer or application developer may require technical information about the IoT devices (e.g., types of device, their capabilities, location, environmental constraints, and so on) or the produced data (e.g., sampling frequency, range, etc.). The required information may relate to specific instances of IoT device (e.g., the attributes of a particular smart refrigerator that is located at GPS coordinates X in restaurant Y) or, more generally, the required information may relate to one or more class properties of IoT devices/produced data as defined according to an ontology, for instance, classes/categories of IoT-device properties such as name of device (light, temperature sensor, etc.), main functional class of device (sensor, actuator, etc.), and/or classes/categories of IoT-data such as type of generated data (device status, measured parameter, sampling frequency, etc.) and so on.

There are numerous other applications in which it is desired to obtain details of IoT devices and/or details of IoT object ontologies. As an example, consider a maintenance engineer involved in an operation of repairing or managing a device. In order to be able to analyse the functioning of a target device it may be necessary or beneficial for the engineer to obtain additional information regarding the functioning of other, related devices so as to be able to put the target device’s behavior into proper context. In the event that the devices in question include one or more IoT devices, the engineer may need to perform a search to obtain the desired additional information.

As another example, consider a user who is located in a determined geographical region when the battery of their telephone (electric car, etc.) becomes low. It may be that in this region there are one or more IoT devices that would be able to charge the battery of the user’s device, including battery charging stations but, potentially, also other IoT devices which can provide charging as an ancillary function. The user has a need for a system enabling them to search for available devices capable of delivering the desired charging function.

Although some platforms exist which enable users to discover IoT devices and/or to explore environments in which IoT devices are deployed, searches for IoT devices, or IoT object ontologies, using known methods and systems may fail to produce adequate results, notably in terms of the number and/or variety of results produced in response to the user query.

For example, some search/recommendation systems use a collaborative filtering process to identify the results to return to the user query. The collaborative filtering process determines other users having behavior or characteristics that are similar to those of the current user generating the user query, and returns results that have been of interest to the “similar” users. However, collaborative filtering processes tend to fail in the event that the person generating a query is a new user for whom little data is available regarding their behavior and/or characteristics.

As another example, some search/recommendation systems identify results to return to the user query by searching for items having semantic similarity to expressions included in the user query. However, searching by semantic similarity may provide an insufficient number of results, or results that do not match the user’s request, in cases where the user is a non-expert unfamiliar with the terminology used in the relevant dataset.

Accordingly, a need exists for improved IoT-oriented search and/or recommendation systems and methods to improve the probability of obtaining a satisfactory number and/or variety of search results.

The present invention has been made in the light of the above issues.

Summary of the Invention

The present invention provides a computer-implemented method of generating an Internet-of-Things-specific knowledge graph, the method comprising:

obtaining an Internet-of-Things dataset describing Internet-of-Things entities by information in one or more fields;

obtaining a general knowledge graph comprising triples each consisting of a first item, a second item and a relationship associating the first item with the second item; and

processing the general knowledge graph to remove a subset of said triples, said sub-set comprising triples both of whose first and second items fail to include any word in a target field, or group of target fields, in the Internet-of-Things dataset.

The obtaining of the Internet-of-Things dataset may consist in generating the dataset, and/or receiving or otherwise accessing the dataset (e.g., from another server, from a remote device or user, etc.). Likewise, the obtaining of the general knowledge graph may consist in generating the general knowledge graph, and/or receiving or accessing it. There can be gains in efficiency from reusing a pre-existing general knowledge graph.
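By way of illustration only, the filtering step of the above method might be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the `Triple` type, the `words_of` tokenisation and the sample data are hypothetical, and a triple is treated as "including" a word when one of its items shares at least one lower-case word with the target field values.

```python
import re
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    item1: str
    relationship: str
    item2: str


def words_of(text: str) -> set[str]:
    """Lower-case alphabetic words of a string (hypothetical tokenisation)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def filter_graph(triples: Iterable[Triple],
                 target_field_values: Iterable[str]) -> list[Triple]:
    """Remove triples in which neither item shares a word with the target field(s)."""
    vocabulary: set[str] = set()
    for value in target_field_values:
        vocabulary |= words_of(value)
    return [
        t for t in triples
        if words_of(t.item1) & vocabulary or words_of(t.item2) & vocabulary
    ]


# Made-up sample data: two general triples and two "name" field values.
general_kg = [
    Triple("greenhouse", "contains", "grow-lights"),
    Triple("poet", "writes", "sonnet"),
]
iot_names = ["greenhouse sensor", "parking site"]
iot_kg = filter_graph(general_kg, iot_names)
```

Here only the first triple survives, because "greenhouse" occurs in the target field, whereas neither "poet" nor "sonnet" does.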

The present invention further provides a computer-implemented method of generating an Internet-of-Things-specific knowledge graph, the method comprising:

extracting text from a corpus of documents describing the Internet-of-Things or describing Internet-of-Things entities;

forming chunks of the extracted text, said chunks being tagged to indicate parts of speech of words in the chunks;

processing the chunks to generate triples each consisting of a first noun phrase, a second noun phrase and a verb phrase which, in the extracted text, associates the first item with the second item; and

aggregating said triples to form said Internet-of-Things-specific knowledge graph.

The latter method may further comprise:

generating or receiving an Internet-of-Things dataset describing Internet-of-Things entities by information in one or more fields; and

processing the Internet-of-Things-specific knowledge graph to remove a subset of said triples, said sub-set comprising triples both of whose first and second items fail to include any word in a target field, or group of target fields, in the Internet-of-Things dataset.

The step of extracting text from a corpus of documents may comprise using a web crawling or web scraping tool to extract, from websites, text corresponding to one or more selection criteria defined in the web crawling or web scraping tool.

The Internet-of-Things-specific knowledge graph may aggregate triples of an Internet-of-Things-specific knowledge graph produced by removing a sub-set of triples from a general knowledge graph with triples of an Internet-of-Things-specific knowledge graph produced by extracting and chunking text from a corpus of documents.
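The chunking, triple-generation and aggregation steps can be pictured with the following simplified sketch. It assumes the text has already been tokenised and tagged with Penn Treebank part-of-speech tags; the grouping rule (nouns and adjectives form noun phrases, verbs form verb phrases) and all function names are illustrative assumptions rather than the specific chunking process of the embodiments.

```python
from itertools import groupby

# Hypothetical input: one sentence, already tokenised and POS-tagged.
TAGGED = [("The", "DT"), ("greenhouse", "NN"), ("contains", "VBZ"),
          ("grow", "NN"), ("lights", "NNS"), (".", ".")]


def phrase_kind(tag):
    """Map a POS tag to a coarse phrase type (simplifying assumption)."""
    if tag.startswith(("NN", "JJ")):
        return "NP"   # nouns and adjectives build noun phrases
    if tag.startswith("VB"):
        return "VP"   # verbs build verb phrases
    return "O"        # all other tokens are dropped


def chunk(tagged):
    """Group consecutive tokens of the same phrase type; dropped tokens are skipped."""
    chunks = []
    for kind, group in groupby(tagged, key=lambda pair: phrase_kind(pair[1])):
        if kind != "O":
            chunks.append((kind, " ".join(word.lower() for word, _ in group)))
    return chunks


def triples_from(tagged):
    """Emit (noun phrase, verb phrase, noun phrase) for each NP-VP-NP run of chunks."""
    chunks = chunk(tagged)
    return [
        (a[1], b[1], c[1])
        for a, b, c in zip(chunks, chunks[1:], chunks[2:])
        if (a[0], b[0], c[0]) == ("NP", "VP", "NP")
    ]


def aggregate(*triple_sets):
    """Merge triples from several sources, dropping exact duplicates."""
    graph = set()
    for triples in triple_sets:
        graph.update(triples)
    return graph
```

The same `aggregate` helper could equally merge triples obtained by filtering a general knowledge graph with triples obtained from text, which is the combination described in the preceding paragraph.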

The present invention still further provides a system to generate an Internet-of-Things-specific knowledge graph, the system comprising a computing apparatus programmed to execute instructions to perform any of the above methods to generate an Internet-of-Things-specific knowledge graph.

The present invention yet further provides a computer program comprising instructions which, when the program is executed by a processing unit of a computing apparatus, cause the processing unit to perform any of the above methods to generate an Internet-of-Things-specific knowledge graph.

The present invention still further provides a computer-readable medium comprising instructions which, when executed by a processor of a computing apparatus, cause the processor to perform any of the above methods to generate an Internet-of-Things-specific knowledge graph.

The present invention yet further provides a computer-implemented method of searching in an Internet-of-Things dataset comprising descriptions of Internet-of-Things entities, said descriptions comprising information in one or more fields, the method comprising:

receiving a query comprising one or more input words;

determining whether input words in the query correspond to a first item or a second item in a triple of an Internet-of-Things-specific knowledge graph generated by any of the above methods;

in the event of determining that an input word in the query corresponds to a first item in a triple of said Internet-of-Things-specific knowledge graph, adding the second item of the triple to a keyword list;

in the event of determining that an input word in the query corresponds to a second item in a triple of said Internet-of-Things-specific knowledge graph, adding the first item of the triple to a keyword list;

searching in the Internet-of-Things dataset using keywords in the keyword list as search terms, to generate a first set of search results; and

outputting search results based on said first set of search results.

The latter search/recommendation method enables a set of numerous and varied search results to be provided in response to a user query even in a case where (a) little data is available regarding user behavior and characteristics, thus overcoming the above-described disadvantage of search/recommendation methods based on collaborative filtering, or (b) the expressions used in the user query are badly matched to the terminology used in the IoT dataset, thus overcoming the above-described disadvantage of search/recommendation methods based on semantic similarity.

In the above-mentioned computer-implemented searching method, the words in the keyword list may be sorted by frequency of word occurrence, and one or more keywords occurring the least frequently in the keyword list may be removed before searching in the Internet-of-Things dataset.

In the above-mentioned computer-implemented searching method, the first set of search results may be filtered according to the degree of correspondence between the searched field and the respective keyword, to generate an adjusted first set of search results, and the outputting step may then output search results based on said adjusted first set of search results.

In the above-mentioned computer-implemented searching method, the filtering of the first set of search results according to the degree of correspondence between the searched field and the respective keyword may comprise:

filtering out search results consisting of only one word, unless said word is the same as one of the keywords that has been used as a search term generating the first set of search results, and

filtering out search results consisting of n words, where n is an integer equal to or greater than 2, unless two or more of the words in said search result are keywords that have been used as search terms generating the first set of search results.

The above-mentioned computer-implemented searching method may further comprise:

searching in the Internet-of-Things dataset for objects whose searched fields contain content having semantic similarity with said input words, to generate a second set of search results; and

combining the first and second sets of search results to produce the results outputted in the outputting step.

The present invention yet further provides a search system configured to search in an Internet-of-Things dataset comprising descriptions of Internet-of-Things entities, said descriptions comprising information in one or more fields, said system comprising a computing apparatus programmed to execute instructions to perform any of the above-described searching methods.

The present invention still further provides a computer program comprising instructions which, when the program is executed by a processing unit of a computing apparatus, cause said processing unit to perform any of the above-described searching methods.

The present invention yet further provides a computer-readable medium comprising instructions which, when executed by a processor of a computing apparatus, cause the processor to perform any of the above-described searching methods.

Brief Description of the Drawings

Further features and advantages of the present invention will become apparent from the following description of certain embodiments thereof, given by way of illustration only, not limitation, with reference to the accompanying drawings in which:

FIG. 1 is a flow diagram illustrating a computer-implemented search method according to a first embodiment of the invention, employing a new IoT-specific knowledge graph;

FIG. 2 is a flow diagram illustrating a computer-implemented hybrid search method according to a second embodiment of the invention, combining searching by the method according to FIG. 1 and searching based on semantic similarity;

FIG. 3 is a flow diagram illustrating a computer-implemented search process, based on semantic similarity, that may be employed in the method according to FIG. 2;

FIG. 4 is a flow diagram illustrating an example of a computer-implemented text-processing process that may be used in a method, according to an embodiment of the invention, of generating an IoT-specific knowledge graph;

FIG. 5 is a diagram illustrating an example of a computer-implemented method, according to an embodiment of the invention, of generating an IoT-specific knowledge graph;

FIG. 6 is a simplified representation of an example of a region in an IoT-specific knowledge graph; and

FIG. 7 illustrates deletion of a triple from a knowledge graph.

Detailed Description of Example Embodiments

The present invention provides embodiments of IoT-oriented search and/or recommendation systems and computer-implemented methods that make use of IoT-specific knowledge graphs, as well as embodiments of methods and systems to generate IoT-specific knowledge graphs.

A computer-implemented search method 100 according to a first embodiment of the invention will now be described with reference to FIG. 1. This method 100 employs a new IoT-specific knowledge graph that will be described in greater detail below.

The method 100 illustrated by FIG. 1 searches an Internet-of-Things dataset so as to provide search results/recommendations in response to a user query. The user generating the query to the search system may be a human (e.g., an engineer or application-developer), but the invention is not limited to this example and is applicable to queries generated by non-human agents, for example, bots and software applications. The entities whose details are listed in the IoT dataset may correspond to “things” in the IoT (e.g., specific instances of network-enabled devices, including, for instance, specific smart doorbells, smart vehicles, voice-control communication devices, etc.), but the invention is not limited to this example and is applicable to IoT datasets that describe other entities, for example, IoT datasets which list details of IoT ontologies (e.g., object classes).

Typically, each entity in the IoT dataset is described by data in a plurality of fields. The example of method 100 that will be described in relation to FIG. 1 searches in a field of the IoT dataset that lists the name of the entity, but the embodiment is not limited to this example and the searched field may be a field other than the name of the entities listed in the IoT dataset, or a plurality of fields including or excluding the “name” field.

The search results/recommendations generated by the method 100 illustrated by FIG. 1 identify entities in the IoT dataset whose searched field (name) does not necessarily have semantic similarity to the search terms included in the user query but, instead, the searched field (name) includes at least one expression which, according to an IoT-specific knowledge graph, is associated with a search term in the user query. The IoT-specific knowledge graph used in method 100 contains a plurality of triples embodying information regarding associations that occur in the field of the Internet-of-Things, each triple having the general form:

Item 1 – Relationship – Item 2,

for example, a triple T may have the content indicated in Table 1 below.

TABLE 1

Item 1	Relationship	Item 2
greenhouse	contains	grow-lights

Typically, the IoT-specific knowledge graphs used in embodiments of the present invention include a very large number of triples. For the purposes of illustration, FIG. 6 represents a highly-simplified example of a region in an IoT-specific knowledge graph. In the representation illustrated in FIG. 6, the relationship in each triple is represented by an arrow extending from a first oval (representing Item 1 of the triple) to a second oval (representing Item 2 of the triple). In this example representation, a node of the knowledge graph represented by a rectangle corresponds to an object ontology. Of course, a given object ontology (e.g., “robot”) may be involved in a plurality of triples (as Item 1 and/or as Item 2).

In the discussion below it is considered that the IoT-specific knowledge graph used in embodiments of the invention includes triples t_j, where j is an integer from 1 to j_max.

More information will be provided below regarding the knowledge graphs used in the invention, and techniques for generating them.

The method 100 illustrated by FIG. 1 begins with reception of a query (S101). Typically, the query is received from a remote user over a wired or wireless network connection, or from a local user using a local user interface, but the invention is not limited in regard to the manner by which the query is formulated/arrives at the search system.

The received query is pre-processed (S102) to determine which expression(s) therein to use as “input words” for the purposes of the search process. The pre-processing may include various different operations. Thus, for example, user search queries may be expressed in sentences or phrases that include expressions that are not helpful for search purposes (e.g., stopwords such as conjunctions, definite and indefinite articles, etc.). The pre-processing may include an operation of removal of stopwords, for example by exploiting a list of stopwords provided in the Natural Language Toolkit (NLTK), a Python library commonly used in the field of natural language processing. The NLTK provides easy-to-use interfaces through which more than 50 corpora and vocabulary resources (such as WordNet) may be accessed. Removal of stopwords might, for example, take an input query “I am looking for smart grow-lights” and extract as input words “smart”, “grow”, and “lights”.
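A minimal sketch of such pre-processing is given below. To keep the example self-contained it uses a small hand-written stopword set as a stand-in for a full stopword list such as the one available through NLTK; the set contents (including "looking", added so that the example query above yields the stated input words), the tokenisation rule and the function name are assumptions made for illustration.

```python
import re

# Hypothetical stand-in for a full stopword list (e.g. the NLTK list).
STOPWORDS = {"i", "am", "looking", "for", "a", "an", "the", "and", "or"}


def input_words(query):
    """Lower-case the query, split it into alphanumeric tokens, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return [token for token in tokens if token not in STOPWORDS]


words = input_words("I am looking for smart grow-lights")
```

Splitting on every non-alphanumeric character is what separates "grow-lights" into "grow" and "lights" in this sketch.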

A set of input words d_i is generated by the pre-processing (S102). Let us consider that i takes integer values from 1 to i_max, where i_max represents the number of words in the set of input words.

Method 100 proceeds to identify expressions which, according to an IoT-specific knowledge graph (IoT KG), are associated with the input words d_i determined from the user query. The identified expressions are used to form a keyword list (KWL). This process may be implemented according to the loops S103-S107 illustrated in FIG. 1.

More specifically, for a first one of the input words, d₁, an assessment is made (S103) as to whether or not this input word is included in one of the items in a first triple t₁ of the IoT KG. If input word d₁ is included in an item in the first triple t₁, the method determines (S104) whether d₁ is included in Item 1 or in Item 2 of triple t₁. If d₁ is included in Item 1 of triple t₁, Item 2 of triple t₁ is added to the keyword list, KWL. For example, if d₁ is “greenhouse” and the triple t₁ under consideration corresponds to triple T in Table 1 above, then “grow-lights” is added to the keyword list. On the other hand, if d₁ is included in Item 2 of triple t₁, Item 1 of triple t₁ is added to the keyword list, KWL. For example, if d₁ is “light” and the triple t₁ under consideration corresponds to triple T in Table 1 above, then “greenhouse” is added to the keyword list.

If the assessment made in step S103 indicates that d₁ is not included in t₁, or after addition of words to the keyword list following S104, the flow continues to assessment of whether or not d₁ is included in the second triple t₂ of the IoT KG (see the loop from the “NO” output of S103, through S106 to the input to step S103 in FIG. 1). The method systematically evaluates whether or not d₁ is included in the different triples of the IoT KG until all triples have been considered, i.e., until a stop condition, j = j_max, of step S106 is met.

The method then moves on to assessing whether the next of the input words, d₂, (if there is one) is or is not included in the triples of the IoT KG. More specifically, when the stop condition of step S106 is met, provided that there is still an input word d_i that has not yet been checked, the process increases the value of i by one, resets the value of j to 1, and loops back to the input of S103. The loop from the “NO” output of S107 through steps S103-S106 is repeated until all of the input words have been checked, i.e., until a stop condition, i = i_max, of S107 has been met.
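The nested loops S103-S107 can be sketched as follows. This is an illustrative reading rather than a definitive implementation: "included in" is interpreted here as a case-insensitive substring test (consistent with the “light” example above), triples are plain 3-tuples, and the function name is hypothetical.

```python
def build_keyword_list(input_words, triples):
    """Collect, for every input word, the opposite item of each triple that mentions it."""
    keyword_list = []
    for word in input_words:                   # outer loop over d_i (stop condition of S107)
        needle = word.lower()
        for item1, _relationship, item2 in triples:   # inner loop over t_j (stop condition of S106)
            if needle in item1.lower():        # word found in Item 1: add Item 2
                keyword_list.append(item2)
            if needle in item2.lower():        # word found in Item 2: add Item 1
                keyword_list.append(item1)
    return keyword_list


# Made-up miniature knowledge graph for illustration.
KG = [("greenhouse", "contains", "grow-lights"), ("robot", "has", "camera")]
```

With this toy graph, the input words “smart”, “grow” and “lights” yield “greenhouse” twice; the repetition is kept deliberately, since the subsequent frequency-based step relies on how often each keyword occurs.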

When the stop condition of S107 has been met, i.e., it has been checked, for each of the input words d_i, whether or not this input word is included in an item of a triple of the IoT KG, the full keyword list has been established. It may be advantageous to filter and/or sort the full keyword list based on word frequency (see S108). Thus, for example, words that occur in the keyword list only once or twice might be considered to have a very weak association to the input words and, thus, might be considered as unlikely to produce useful or relevant search results. If desired, the keyword list may be cut down so as to include only a single instance of each of the X keywords that occurred most frequently in the full keyword list, with X being a number that may be set in any desired manner (e.g., it may be predetermined, it may be set by user choice, etc.).
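The frequency-based reduction of S108 might be sketched as below, using `collections.Counter` from the Python standard library. The function name and the convention that `x=None` means "no cut" are assumptions made for this example.

```python
from collections import Counter


def effective_keyword_list(full_keyword_list, x=None):
    """Return one instance of each keyword, most frequent first, cut to the top x if given."""
    counts = Counter(full_keyword_list)
    ranked = [keyword for keyword, _count in counts.most_common()]
    return ranked if x is None else ranked[:x]


full = ["greenhouse", "lamp", "greenhouse", "sensor", "lamp", "greenhouse"]
top_two = effective_keyword_list(full, x=2)
```

In this example “sensor”, which occurs only once, is dropped when X is 2.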

The effective keyword list (which either includes a single instance of each of the words in the full keyword list at the output of S107, or is the cut-down keyword list at the output of S108) is then used to search in the IoT dataset (S109). More particularly, in step S109 each keyword in the effective keyword list is used as a search term to query the target field (e.g., the “name” field) in the IoT dataset.
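As a rough sketch of step S109, the dataset can be pictured as a list of records and the query as a scan of the target field. The record layout, the case-insensitive substring matching rule and the function name are all assumptions; the text above does not prescribe how the dataset is stored or matched.

```python
def search_dataset(dataset, keywords, field="name"):
    """Return the entities whose target field contains at least one keyword."""
    results = []
    for entity in dataset:
        value = str(entity.get(field, "")).lower()
        if any(keyword.lower() in value for keyword in keywords):
            results.append(entity)
    return results


# Made-up miniature IoT dataset.
DATASET = [{"name": "greenhouse heater"}, {"name": "ParkingSite"}]
hits = search_dataset(DATASET, ["greenhouse", "lamp"])
```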

In some cases, it may be appropriate to apply pre-processing to the contents of the searched fields of the IoT dataset before they are queried using the keywords of the effective keyword list. For example, the IoT dataset being searched may employ compound names for object ontologies, for example “ParkingEquipmentOrServiceFacilityStatus”, “ParkingSite”, or “Parking_lot”. It may be desirable to pre-process such compound names to split the single object ontology name into several component words, for example by splitting at each underscore “_” or capital letter. Additionally, it may be beneficial to apply pre-processing to remove stopwords (e.g., as discussed above in relation to step S102 of FIG. 1). Pre-processing of the aforementioned type might produce from an object ontology name “ParkingSite” the component words “parking” and “site”, from an ontology name “Parking_lot” the component words “parking” and “lot”, and from the object ontology name “ParkingEquipmentOrServiceFacilityStatus” the component words “parking”, “equipment”, “service”, “facility”, “status”.
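The splitting of compound names described above could be sketched with a regular expression as follows. The pattern, the small stopword set (a stand-in for a full list) and the function name are assumptions chosen to reproduce the three examples given in the text.

```python
import re

# Hypothetical stand-in for a full stopword list.
STOPWORDS = {"or", "and", "of", "the"}


def split_compound_name(name):
    """Split at underscores and capital letters, lower-case, then drop stopwords."""
    # A word is an optional capital followed by lower-case letters,
    # a run of capitals (acronym), or a run of digits; "_" is never matched.
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [part.lower() for part in parts if part.lower() not in STOPWORDS]
```

Applied to “ParkingEquipmentOrServiceFacilityStatus”, this sketch first separates six capitalised components and then discards “or” as a stopword.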

In order to assure a suitable degree of relevance of the search results produced by the method, the search results obtained by step S109 are then filtered (S110) to exclude entities for which the contents of the searched field only had a low degree of correspondence to a keyword in the effective keyword list. Thus, in this example where the searched field corresponds to the name of an IoT thing or a name of an IoT ontology, the following filtering rules are applied:

Rule 1: for IoT dataset entities having a one-word name:

delete the entity from the search results unless the entity name = keyword in the effective keyword list.

Rule 2: for IoT dataset entities having a name consisting of 2 or more words (or components):

delete the entity from the search results unless the entity name contains 2 or more keywords in the effective keyword list.
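Rules 1 and 2 can be expressed compactly, as in the following sketch. It assumes that each entity name has already been split into component words and that keywords are single words; both assumptions, like the function name, are made only for this illustration.

```python
def passes_filter(name_words, keywords):
    """Apply Rule 1 (one-word names) or Rule 2 (names of two or more words)."""
    keyword_set = {keyword.lower() for keyword in keywords}
    words = [word.lower() for word in name_words]
    if len(words) == 1:
        # Rule 1: keep only if the name itself is a keyword.
        return words[0] in keyword_set
    # Rule 2: keep only if at least two distinct name words are keywords.
    return len(set(words) & keyword_set) >= 2


keywords = ["parking", "site"]
kept = [name for name in (["parking"], ["lot"], ["parking", "site"], ["parking", "lot"])
        if passes_filter(name, keywords)]
```

Counting distinct words in Rule 2 is a design choice of this sketch: a name that merely repeats one keyword is not treated as containing two.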

The results output from S110 may be output in any desired manner; notably, these search results may be returned to a remote user by sending a message over the relevant wired or wireless connection, they may be provided to a local user by any suitable means (e.g., printing out, display on a screen), etc. The results output from S110 may be output in any desired form but, typically, they are presented to the user in the form of an ordered list of the entities found by the search. Thus, before outputting the search results, method 100 may include a step (S111) of sorting/ordering the search results based on a sorting criterion. In preferred implementations of step S111, the sorting criterion is semantic similarity between the search result (e.g., entity name) and the input query. If desired, only the Q top results may be output, with the value of Q being set in any desired way (e.g., it may be predetermined, set by the user, etc.). If desired, entities may be discarded from the search results if the degree of semantic similarity between the entity and the input query is below a determined threshold value.
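The ordering step S111 might be sketched as follows, with the similarity measure supplied as a parameter. The `word_overlap` function used in the example is only a toy stand-in (a word-set overlap ratio), not a semantic similarity measure, and all names are hypothetical.

```python
def rank_results(names, query, similarity, q=None, threshold=None):
    """Order results by similarity to the query, optionally thresholded and cut to the top q."""
    scored = [(similarity(name, query), name) for name in names]
    if threshold is not None:
        # Discard results whose similarity is below the threshold.
        scored = [(score, name) for score, name in scored if score >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    ranked = [name for _score, name in scored]
    return ranked if q is None else ranked[:q]


def word_overlap(a, b):
    """Toy stand-in for a semantic similarity measure: shared words / all words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


names = ["parking site", "smart light", "smart grow light"]
ranked = rank_results(names, "smart grow light", word_overlap)
```

Passing the similarity function as an argument leaves the choice of measure open, so an embedding-based or lexical-database-based measure could be substituted without changing the ranking logic.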

The search results output from S110 (or S111) may then be used (S112) by the originator of the search query. The nature of the use of the search results varies depending on the application targeted by the person (machine, software, etc.) that generated the search query. The scope of use cases is very large but some examples are provided below for the purposes of illustration, not limitation:

· An application developer wishing to exploit existing IoT devices, and/or the data they produce, in order to be able to implement a new technical system generates a search query to find IoT devices and/or IoT data having defined properties/functionality and, upon receiving search results identifying the IoT devices and/or IoT data having the desired properties/functionality, generates or adapts an application (software code, API, etc.) to include one or more modules or functions employing the identified IoT devices and/or IoT data;

· A maintenance engineer involved in an operation of repairing or managing a device generates a search query to find comparable devices operating in a defined local geographical region and, upon obtaining search results identifying a group of local devices, uses the results to formulate a request for data output from the identified local devices;

· A user device having a low battery level generates a search query for available devices capable of delivering the desired charging function and, upon obtaining search results identifying a group of suitable charging devices, activates a map function to determine (and, optionally, to display) distances/routes to one or more of the identified devices.

The search/recommendation method according to the first embodiment of the invention, illustrated by the example in FIG. 1, enables a set of numerous and varied search results to be provided in response to a user query even in a case where little data is available regarding user behavior and characteristics, thus overcoming the above-described disadvantage of search/recommendation methods based on collaborative filtering.

The search/recommendation method according to the first embodiment of the invention, illustrated by the example in FIG. 1, enables a set of numerous and varied search results to be provided in response to a user query even in a case where the expressions used in the user query are badly matched to the terminology used in the IoT dataset (e.g., because of a user’s lack of experience in regard to the relevant IoT dataset), thus overcoming the above-described disadvantage of search/recommendation methods based on semantic similarity.

A preferred embodiment of the invention will now be described, as a second embodiment of the invention, with reference to FIGs. 2 and 3. FIG. 2 is a flow diagram illustrating a computer-implemented hybrid search method 200 according to the second embodiment of the invention, combining searching by the method according to FIG. 1 and searching based on semantic similarity so as to yet further enrich the search results. FIG. 3 is a flow diagram illustrating a computer-implemented search process, based on semantic similarity, that may be employed in the method according to FIG. 2.

The method 200 illustrated by FIG. 2 begins with reception of a query (S201) and determination of input words d_i (S202). Steps S201 and S202 may be performed in the same manner as steps S101 and S102 of the method 100 described above with reference to FIG. 1 and so, for conciseness, no further details are given here.

In the hybrid method 200 according to the second embodiment, the input words d_i are used (S203), according to steps S103-S100 of the method 100 of the first embodiment, in a process to search the IoT dataset based on keywords identified using an IoT-specific knowledge graph. Further, the input words d_i are used (S204) to perform a search in the IoT dataset based on semantic similarity between the input words and the content of the searched field(s) in the dataset. A suitable process for performing the search based on semantic similarity will be described below with reference to FIG. 3. The results of the searches performed in S203 and S204 are combined (S205), and the combined results are output (S206) in any desired form (e.g., returned to a remote user over a wired or wireless connection, printed out, displayed, etc.).
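As a rough illustration of the knowledge-graph-based step S203, the sketch below expands the input words into a keyword list by looking up triples in which an input word appears as either item, and then searches a name field for those keywords. The triples, the entity names and the function names `expand_keywords` and `kg_search` are hypothetical; matching is reduced to exact lowercase comparison of single-word keywords, and the keyword pruning and result filtering of method 100 are omitted.

```python
def expand_keywords(input_words, triples):
    """Collect the item at the opposite end of every triple an input word matches."""
    words = {w.lower() for w in input_words}
    keywords = []
    for item1, _relation, item2 in triples:
        if item1.lower() in words:
            keywords.append(item2)
        if item2.lower() in words:
            keywords.append(item1)
    return keywords


def kg_search(keywords, entity_names):
    """Return entity names containing at least one keyword as a whole word."""
    wanted = {k.lower() for k in keywords}
    return [name for name in entity_names
            if wanted & set(name.lower().split())]


# Hypothetical triples and dataset entries, for illustration only.
triples = [("gas", "detected by", "sensor"), ("boiler", "burns", "gas")]
names = ["smoke sensor", "street light", "boiler"]

keywords = expand_keywords(["gas"], triples)
results = kg_search(keywords, names)
```

Here the query word "gas" matches neither "smoke sensor" nor "boiler" literally, yet both are returned because the graph associates them with "gas".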

The two types of search results may be integrated with one another in step S205 in various different ways. According to preferred implementations of the second embodiment, a selected number Y of results from the knowledge-graph-based search is combined with a certain number Z of results of the semantic-similarity-based search to produce the hybrid search results. The numbers Y and Z may be set in any desired manner, for example, they may be pre-determined, specified by the user who inputs the query, and so on. In general, the selected Y, Z results are the “best” results of each search methodology, i.e., the top Y, Z results when the search results are ordered according to a sorting criterion (which may be, for example, degree of semantic similarity between the relevant search result and the input query) .

According to preferred implementations of the second embodiment, the KG-based search results are interleaved in a regular pattern among the semantic-similarity-based search results. In a preferred implementation, Y and Z have the same value and the KG-based search results are caused to alternate with the semantic-similarity-based search results in a hybrid results list that is then output to the user.
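One possible way to realise the alternating combination of step S205 is sketched below. The function name `interleave`, the choice to place the semantic-similarity result first in each pair, and the removal of duplicates are assumptions made for illustration; the text above does not prescribe them.

```python
from itertools import chain, zip_longest


def interleave(semantic_results, kg_results, y, z):
    """Alternate the top z semantic results with the top y KG-based results."""
    top_semantic = semantic_results[:z]
    top_kg = kg_results[:y]
    merged, seen = [], set()
    # zip_longest pads the shorter list with None so no result is dropped.
    for item in chain.from_iterable(zip_longest(top_semantic, top_kg)):
        if item is not None and item not in seen:
            seen.add(item)
            merged.append(item)
    return merged


# Hypothetical ranked result lists, best result first.
hybrid = interleave(["a", "b", "c"], ["x", "a", "y"], y=2, z=2)
```

With Y = Z = 2 the lists `["a", "b"]` and `["x", "a"]` are merged into `["a", "x", "b"]`, the repeated entry `"a"` being kept only once.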

The search results output from S206 may then be used (S207) by the originator of the search query, for example as described above in relation to step S112 of FIG. 1.

The hybrid search/recommendation method according to the second embodiment of the invention, illustrated by the example in FIG. 2, provides advantages corresponding to those of the first embodiment and enables a yet richer set of search results to be provided in response to a user query.

FIG. 3 will now be described as an example of a computer-implemented search method 300, based on semantic similarity, that may be used to implement step S204 in FIG. 2. It should be understood that the second embodiment is not limited to use of the method of FIG. 3 for performance of the search based on semantic similarity; to the contrary, various known computer-implemented, semantic-similarity-based search techniques may be used instead.

In the method 300 according to the example of FIG. 3, a representation of the semantic content of the search query is generated in a process S301. In preferred implementations of the method 300, process S301 involves generating a vector representation of the meaning of the search query. A technique for generating such a vector representation may comprise:

Step S301a which generates a word vector for each of the input words d_i identified for this search query, each word vector being a vector representation of the meaning of the word in question, and

Step S301b which generates an average word vector for the overall set of input words d_i. Step S301b is unnecessary if there is only one input word for the query being processed.
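Steps S301a and S301b can be sketched as follows. The three-dimensional toy embedding table stands in for a real trained model, and skipping words that are missing from the table is an assumption of this sketch rather than something the method specifies.

```python
def average_word_vector(words, embeddings):
    """S301a: look up one vector per word; S301b: average them component-wise."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None  # no input word is known to the embedding table
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


# Toy embedding table (hypothetical values); a real table would hold
# vectors produced by a trained model.
toy_embeddings = {"smart": [1.0, 0.0, 2.0], "lamp": [3.0, 2.0, 0.0]}

query_vector = average_word_vector(["smart", "lamp", "unknownword"], toy_embeddings)
```

For the two known words the result is `[2.0, 1.0, 1.0]`; with a single input word the average is simply that word's own vector, which is why step S301b can be skipped in that case.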

Various techniques may be used in step S301a to generate the desired word vectors. For example, a word2vector model, trained on a corpus of data to generate a vector representation in a number of dimensions M encoding the semantic meaning of input words, may be used to generate a word vector for each input word d_i. In preferred implementations of step S301a, in order to avoid bias or artefacts that might be caused by the semantic context of the dataset used in training the neural network (or other machine learning architecture) embodying the word2vector model, the word vectors are generated using the word2vector model offered by Stanford University (this architecture having been trained using general knowledge from Wikipedia). Currently, the dataset has a total of 400,000 words and their corresponding vectors, and the vectors are 50-dimensional. Using the Stanford model/dataset, a respective word vector W_i is generated for each input word d_i.

Various techniques may be used in step S301b to generate the desired overall vector representing the search query/set of input words d_i. In preferred implementations of step S301b, in order to express the semantic content of a search query/set of input words d_i containing multiple words, the average word vector (AWV) of the phrase is calculated and used as its semantic vector. Assuming that there are N input words d_i for the search query, and that the word vector of each word is W_i, then the semantic vector W_ave of the phrase is:

$$W_{ave} = \frac{1}{N} \sum_{i=1}^{N} W_i$$

The method 300 is not limited to implementing process S301 according to steps S301a and S301b described above. Thus, for example, another method for implementing S301 is to use a sentence2vector model, such as Google's BERT (Bidirectional Encoder Representations from Transformers), which can directly generate a vector of the desired dimensionality (from 128 to 768 dimensions) for an input query sentence.

In the method 300 according to the example of FIG. 3, after process S301 has generated the representation of the semantic content of the search query (e.g., the average word vector W_ave), this representation is then compared with representations of the same type (e.g., average word vectors W_ave produced by the techniques described above in connection with S301a and S301b) generated in respect of the entities in the IoT dataset being searched, in order to filter the IoT dataset based on semantic similarity (S302). The representation of the semantic content of the entities in the IoT dataset may be generated after the IoT entities have been pre-processed. For example, a compound object ontology name may be converted into a group of component words (e.g., in the manner described above in relation to step S109 of FIG. 1), and then word vectors may be generated for each component word and an average word vector produced for the overall ontology name.

Various different techniques may be used in step S302 to evaluate the similarity between representations of the semantic content of the search query and the semantic content of the searched field (s) of the entities in the IoT dataset. In the case where the search query and the content of the searched field (s) are represented by vectors encoding semantic meaning, known methods for evaluating the similarity of two vectors may be employed, such as determining the Euclidean distance or Manhattan distance between the vectors. However, when using Euclidean distance or Manhattan distance to calculate similarity of vectors representing semantic meaning of text, the determined value of similarity is easily affected by individual abnormal data, leading to instability. Accordingly, in preferred implementations of step S302, cosine similarity (spatially representing the angle between two vectors) is used to quantify the similarity between the vector representing the semantic meaning of the search query and the vectors representing the content of the searched field (s) for the entities in the IoT dataset.

For two N-dimensional semantic vectors W_1 and W_2, the similarity calculation formula is as follows:

$$\mathrm{sim}(W_1, W_2) = \cos\theta = \frac{\sum_{i=1}^{N} W_{1i} W_{2i}}{\sqrt{\sum_{i=1}^{N} W_{1i}^{2}}\,\sqrt{\sum_{i=1}^{N} W_{2i}^{2}}}$$

where W_1 is the vector representing the search query, W_2 is the vector representing the content of a specific instance of the searched field, and N is the number of dimensions of these vectors. W_1i and W_2i, i=1, …, N, denote respectively the components of the vector W_1 and of the vector W_2.
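The cosine similarity used in step S302 can be sketched as below, alongside a Euclidean distance for contrast. Returning 0.0 for a zero-length vector is an assumption of this sketch, and the example vectors are invented.

```python
import math


def cosine_similarity(w1, w2):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(w1, w2))
    norm1 = math.sqrt(sum(a * a for a in w1))
    norm2 = math.sqrt(sum(b * b for b in w2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # assumption: treat a zero vector as dissimilar to everything
    return dot / (norm1 * norm2)


def euclidean_distance(w1, w2):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w1, w2)))


query = [1.0, 2.0, 2.0]
scaled = [10.0, 20.0, 20.0]  # same direction, ten times the magnitude

similarity = cosine_similarity(query, scaled)
distance = euclidean_distance(query, scaled)
```

The two vectors point in the same direction, so their cosine similarity is 1.0 even though their Euclidean distance is 27.0. This illustrates why an angle-based measure is less sensitive to individual large component values than a distance-based one.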

Before outputting the results of the semantic-similarity-based search process, method 300 may include a step (S303) of sorting/ordering the search results based on a sorting criterion. In preferred implementations of step S303, the sorting criterion is the already-determined degree of semantic similarity. If desired, the output search results may include only the R top results, with the value of R being set in any desired way (e.g., it may be predetermined, set by the user, etc.). If desired, entities may be discarded from the search results if the degree of semantic similarity between the entity (e.g., entity name) and the input query is below a determined threshold value, with the determined threshold being set in any desired way (e.g., it may be predetermined, set by the user, etc.).
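A minimal sketch of the sorting and pruning of step S303 follows. The function name `rank_results`, the similarity scores and the particular threshold and R values are hypothetical.

```python
def rank_results(scored, threshold, top_r):
    """Drop entries below the threshold, sort by similarity, keep the top R.

    scored is a list of (entity_name, similarity) pairs.
    """
    kept = [(name, score) for name, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_r]


# Hypothetical similarity scores for four entity names.
scored = [("street light", 0.91), ("fountain", 0.12),
          ("car park", 0.47), ("lamp post", 0.88)]

top = rank_results(scored, threshold=0.4, top_r=2)
```

With a threshold of 0.4 and R = 2, "fountain" is discarded by the threshold and "car park" by the top-R cut, leaving "street light" followed by "lamp post".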

As mentioned above, the preferred embodiments of the invention make use of a new IoT-specific knowledge graph that has been developed by the present inventors. Methods to construct the new IoT-specific knowledge graph are described below, with reference to FIGs. 4 and 5.

Two methods are proposed for use in establishing the new IoT-specific knowledge graph:

Method A: Derive the new IoT-specific knowledge graph using, as the initial data source, an existing general knowledge graph or IoT knowledge graph (e.g., one of the known knowledge graphs of these kinds accessible via internet) and applying adaptation processing thereto.

In a preferred implementation of Method A, the adaptation processing applied to a general knowledge graph comprises filtering the general knowledge graph using IoT entity data in an IoT dataset (e.g., the IoT dataset in which the searching is to be performed) . More specifically, for each triple in the general knowledge graph, if the two items and the relationship between them have no words that appear in the selected entity data of the IoT dataset this triple is deleted from the knowledge graph, to adapt it more closely to the targeted IoT domain. Thus, for example, for each triple in the general knowledge graph, if the two items and the relationship between them have no words that appear in the “name” field of entities in the IoT dataset, this triple is deleted from the general knowledge graph.

FIG. 7 illustrates a simplified example of deletion of a triple. In the example illustrated in FIG. 7, a region in the general knowledge graph includes, in addition to IoT-specific triples as illustrated in FIG. 6, an additional triple tx consisting of the knowledge “gas - can cause - laughing”, where “gas” is Item 1, “can cause” is the relation, and “laughing” is Item 2. In FIG. 7, triple tx is indicated using a dashed outline. In this example, triple tx is the only triple in the knowledge graph that includes the item “laughing”. Accordingly, when triple tx is deleted from the general knowledge graph, not only is the connection from node “gas” to node “laughing” deleted but the node corresponding to “laughing” is also deleted from the knowledge graph.
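The filtering of Method A can be sketched as below, assuming that the triples are held as plain tuples and that a "word" is any run of letters or digits. The triples and entity names are invented for illustration, and the helper names `words_of` and `filter_triples` are hypothetical.

```python
import re


def words_of(text):
    """Lowercased alphanumeric words of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def filter_triples(triples, entity_names):
    """Keep a triple only if some word of it occurs in the entity names."""
    vocabulary = set()
    for name in entity_names:
        vocabulary |= words_of(name)
    kept = []
    for item1, relation, item2 in triples:
        triple_words = words_of(item1) | words_of(relation) | words_of(item2)
        if triple_words & vocabulary:
            kept.append((item1, relation, item2))
    return kept


# Hypothetical "name" field values and general-knowledge triples.
names = ["temperature sensor", "street light"]
triples = [("sensor", "measures", "temperature"),
           ("gas", "can cause", "laughing"),
           ("light", "illuminates", "street")]

kept = filter_triples(triples, names)
nodes = {item for item1, _relation, item2 in kept for item in (item1, item2)}
```

Because the node set is derived from the surviving triples, a node that appeared only in deleted triples (here "laughing") disappears from the graph without a separate deletion step.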

Method B: Generate the new IoT-specific knowledge graph from scratch using, as the initial data source, a corpus of texts relating to the Internet-of-things.

In a preferred implementation of Method B, a web crawler/scraping software tool is used to extract target data from webpages. The invention is not particularly limited having regard to the specific web crawling/scraping tool used in this implementation but, for example, SCRAPY may be used, as it provides a fast web crawling framework in which the data to be extracted can be specified by selectors expressed in standards defined by the World Wide Web Consortium (e.g., CSS (Cascading Style Sheets) selectors or XPath selectors). Thus, for example, SCRAPY may be used to crawl the news titles, tags, abstracts and article bodies of articles and scientific papers in journals relating to the Internet of Things (e.g., the IOT module of IEEE Xplore). The extracted data is stored in a specified format, for example JSON (JavaScript Object Notation) format. The extracted data is then processed to generate triples to constitute the IoT-specific knowledge graph.
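The extract-then-store-as-JSON idea can be illustrated without any crawling framework. The sketch below deliberately uses Python's standard-library `html.parser` as a dependency-free stand-in for SCRAPY, selecting elements by their `class` attribute where a SCRAPY spider would use CSS or XPath selectors. The class names `title` and `abstract`, the inline HTML page and the class `ArticleExtractor` are all hypothetical.

```python
import json
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Collect the text of elements whose class attribute is a target field.

    Simplification: target elements are assumed to contain plain text only
    (no nested tags).
    """
    TARGETS = {"title", "abstract"}  # hypothetical class names

    def __init__(self):
        super().__init__()
        self.record = {}
        self._field = None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in self.TARGETS:
            self._field = css_class
            self.record[css_class] = ""

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field:
            self.record[self._field] += data.strip()


# Hypothetical page content standing in for a crawled article page.
page = ('<html><body><h1 class="title">LoRaWAN in airports</h1>'
        '<p class="abstract">Sensors track assets.</p>'
        '<p class="footer">ignored</p></body></html>')

extractor = ArticleExtractor()
extractor.feed(page)
serialized = json.dumps(extractor.record)
```

The resulting JSON string holds one record with the two selected fields, ready to be passed on to the triple-generation stage.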

According to one example, the triples constituting the IoT-specific knowledge graph have the general form indicated in Table 2 below:

TABLE 2

Item 1	Relationship	Item 2
Noun Phrase 1	Verb Phrase	Noun Phrase 2

The processing of the texts to generate triples of the above form may be implemented according to the example described below.

According to the present example, the texts (e.g., article bodies, and so on) are processed to form blocks, or chunks, from which the triples are then generated. The present invention is not particularly limited having regard to the steps by which the texts are processed to form the chunks/blocks. However, according to a preferred example, the processing of the texts to produce the chunks/blocks employs a chunk-generation method comprising sentence segmentation, word segmentation, part-of-speech tagging, and block partition. A convenient approach for performing the chunk-generation process is to make use of tools from the NLTK, as the NLTK includes a set of text processing libraries for classification, tokenization, stemming, parsing and semantic inference.

FIG. 4 illustrates an example of a chunk-generation method 400 that may be used in the present invention. The example in FIG. 4 applies the following sequence of processes, namely sentence tokenizing (S401), word tokenizing (S402), removal of stop words (S403), tagging of parts-of-speech (S404) and chunking (S405), to input text consisting of the following words extracted from an article:

“A LoRaWAN-based solution is bringing visibility to assets, equipment and cargo, thereby decreasing energy consumption, improving passenger comfort, making operations more efficient and reducing flight delays.”

Many corpora in NLTK have parts of speech already marked for specified words, so the method 400 can exploit this marking rather than making its own determination as to the relevant part of speech for these words. The aim of chunking step S405 is to divide the vocabulary into meaningful blocks. In this example, one of the main goals of chunking is to group so-called "noun phrases" (NP) and “verb phrases” (VP) . More specifically, in a preferred implementation of chunking step S405, the part-of-speech tags are combined with regular expressions to separate the sentence into ‘VP’ blocks and ‘NP’ blocks. The ‘NP’ block contains various possible combinations of nouns, and the ‘VP’ block contains various possible combinations of verbs.

The final results of the application of chunking method 400 to this input text consist of the following 12 chunks including words from the input text and their assigned part-of-speech tags:

LoRaWAN-based solution : [ ‘NP’ , ‘JJ NN’ ]

bringing : [ ‘VP’ , ‘VBG’ ]

visibility assets : [ ‘NP’ , ‘NN NNS’ ]

equipment cargo : [ ‘NP’ , ‘NN NN’ ]

decreasing : [ ‘VP’ , ‘VBG’ ]

energy consumption : [ ‘NP’ , ‘NN NN’ ]

improving : [ ‘VP’ , ‘VBG’ ]

passenger comfort : [ ‘NP’ , ‘NN NN’ ]

making : [ ‘VP’ , ‘VBG’ ]

operations efficient : [ ‘NP’ , ‘NNS JJ’ ]

reducing : [ ‘VP’ , ‘VBG’ ]

flight delays : [ ‘NP’ , ‘NN NNS’ ]

where, according to the NLTK, the assigned part-of-speech tag ‘NP’ signifies a noun phrase, ‘JJ NN’ signifies that the noun phrase in question consists of an adjective followed by a noun in the singular, ‘NN NNS’ signifies that the noun phrase in question consists of a noun in the singular followed by a noun in the plural, ‘NN NN’ signifies that the noun phrase in question consists of a noun in the singular followed by a noun in the singular, ‘NNS JJ’ signifies that the noun phrase in question consists of a noun in the plural followed by an adjective, ‘VP’ signifies a verb phrase, and ‘VBG’ signifies that the verb phrase in question consists of a present participle or gerund.
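The chunking step S405 can be approximated in plain Python as below. In this sketch the part-of-speech tags are hand-assigned to match the listing above rather than produced by the NLTK tagger (the tags given to commas, the full stop and “thereby” are assumptions), and the grouping rule, which merges consecutive adjective/noun tags into an ‘NP’ block and consecutive verb tags into a ‘VP’ block, is a simplified stand-in for a regular-expression chunk grammar.

```python
from itertools import groupby

NOUN_TAGS = {"JJ", "NN", "NNS", "NNP", "NNPS"}


def chunk_label(tag):
    """Map a part-of-speech tag to 'NP', 'VP' or None (outside any chunk)."""
    if tag in NOUN_TAGS:
        return "NP"
    if tag.startswith("VB"):
        return "VP"
    return None


def chunk(tagged):
    """Group consecutive tokens sharing a chunk label into (words, label, tags)."""
    chunks = []
    for label, group in groupby(tagged, key=lambda pair: chunk_label(pair[1])):
        if label is None:
            continue  # punctuation, adverbs, etc. only separate chunks
        group = list(group)
        words = " ".join(word for word, _ in group)
        tags = " ".join(tag for _, tag in group)
        chunks.append((words, label, tags))
    return chunks


# Tokens after stop-word removal, with hand-assigned tags (assumption).
tagged = [
    ("LoRaWAN-based", "JJ"), ("solution", "NN"), ("bringing", "VBG"),
    ("visibility", "NN"), ("assets", "NNS"), (",", ","),
    ("equipment", "NN"), ("cargo", "NN"), (",", ","), ("thereby", "RB"),
    ("decreasing", "VBG"), ("energy", "NN"), ("consumption", "NN"), (",", ","),
    ("improving", "VBG"), ("passenger", "NN"), ("comfort", "NN"), (",", ","),
    ("making", "VBG"), ("operations", "NNS"), ("efficient", "JJ"),
    ("reducing", "VBG"), ("flight", "NN"), ("delays", "NNS"), (".", "."),
]

chunks = chunk(tagged)
```

Under these assumptions the sketch yields the same 12 chunks as the listing, beginning with `("LoRaWAN-based solution", "NP", "JJ NN")` and ending with `("flight delays", "NP", "NN NNS")`.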

When extracting a relationship from a sentence, a first 'NP' block appearing in the sentence is designated as Noun Phrase 1 for the knowledge graph triple under construction, the subsequent 'NP' block is designated as Noun Phrase 2 for the knowledge graph triple under construction, and the 'VP' block between these two NP blocks is designated as the relationship between the entities. So, for example, the triple shown in Table 3 below may be extracted from an input text segment “vehicles are sent to dealerships” :

TABLE 3

Noun Phrase 1 (Item 1)	Verb Phrase (Relationship)	Noun Phrase 2 (Item 2)
vehicles	sent	dealerships
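The extraction rule just described can be sketched as a scan for an ‘NP’, ‘VP’, ‘NP’ pattern over the ordered chunks of a sentence. Sliding the pattern along the whole sentence, so that one noun phrase may end one triple and begin the next, is an interpretation made for this sketch; the function name `extract_triples` is hypothetical.

```python
def extract_triples(chunks):
    """Return (noun phrase 1, verb phrase, noun phrase 2) for each NP-VP-NP run.

    chunks is a list of (text, label) pairs in sentence order, where label
    is 'NP' or 'VP'.
    """
    triples = []
    for (t1, l1), (t2, l2), (t3, l3) in zip(chunks, chunks[1:], chunks[2:]):
        if (l1, l2, l3) == ("NP", "VP", "NP"):
            triples.append((t1, t2, t3))
    return triples


# Chunks for "vehicles are sent to dealerships" after stop-word removal.
chunks = [("vehicles", "NP"), ("sent", "VP"), ("dealerships", "NP")]

triples = extract_triples(chunks)
```

For this input the single triple `("vehicles", "sent", "dealerships")` is produced, matching Table 3.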

Optionally, the set of triples generated by Method B may be filtered, for example using the content of the field (or fields) that is (are) to be searched in the targeted IoT dataset, so as to ensure that only relevant triples are retained in the IoT-specific knowledge graph.

In certain preferred embodiments of the invention a new IoT-specific knowledge graph is generated by combining Method A and Method B. An example of such an approach will be described below, with reference to FIG. 5, in the context of a particular application wherein the IoT-specific knowledge graph was designed in view of use in searching within an IoT dataset that corresponds to content of the “Thing in the future” platform (hereafter Thing’in) established by Orange.

Thing’in establishes and maintains a graph of things which, like a social network providing similar functionality for people, is composed of physical things and entities (the nodes) and logs the relationships between them. Avatars, or “digital twins”, on this platform are digital representations of things. A core aspect of the platform is the indexation of real-world devices and objects, along with their descriptions, according to a defined ontology. Thing’in maintains a structural and semantic definition not only of the IoT devices (connected objects) themselves and their relationships, but also of the environments in which the connected objects are located. The environment may include objects that may not, themselves, be IoT devices but which are described by ontologies included in the Thing’in dataset (e.g., streetlight, fountain, car park). The techniques of the present invention may be applied to search for such objects and/or ontologies.

The Thing’in platform is equipped to support expansion by addition of new objects/devices. Thing’in is a multi-interface platform which engineers and IoT app developers can connect to and interact with at its front end, and which data providers can connect to and interact with at its back end. Thus, the number of objects logged on the platform evolves over time. The platform has respective APIs that allow data providers to insert objects, and that allow engineers and IoT app developers to connect to and interact with logged objects. Thing’in may be considered to embody a knowledge graph of its own. However, preferred embodiments of the present invention make use of above-described Method A and/or Method B in order to generate IoT-specific knowledge graphs to use in the searching methods according to the invention, so as to enable exploration of more related entities.

More specifically, the knowledge graph developed in this example was designed in view of searching in the names of IoT object ontologies which are defined, in dictionary format, on the Thing’in platform, taking 40371 Thing’in object ontologies into account. (As mentioned above, the number of ontologies and objects logged on the Thing’in platform is evolving and, indeed, it has now exceeded 121,537 object ontologies and 54,949,802 objects.)

In the present example a hybrid, IoT-specific knowledge graph was generated consisting of two parts, a first part produced using above-described Method A and a second part generated according to above-described Method B.

Method A was applied taking, as the initial data source, the YAGO general knowledge graph. YAGO is a project started by the Max Planck Institute in Germany in 2007 and it combines the knowledge of WordNet and Wikipedia. That is, YAGO uses the ontology knowledge of WordNet to supplement the hypernym knowledge of entities in Wikipedia to obtain a large-scale high-quality, high-coverage database. YAGO comprises knowledge of 120 million facts relating to more than 10 million entities. In this example, the YAGO general knowledge graph was filtered using the names of the 40371 Thing’in ontologies (represented by S501 in FIG. 5) .

Method B was applied to generate triples from text extracted from articles and scientific literature crawled from RFID Journal, IEEE Xplore and ScienceDirect. RFID Journal is currently the world's largest and most important website focusing on RFID. The abstracts of all 44 articles in ScienceDirect’s IOT magazine at the time were crawled, as well as the abstracts of 5000 articles in the IOT module of IEEE Xplore. Tools in the NLTK library were employed for sentence segmentation, word segmentation, part-of-speech tagging, and block partition. This process yielded 97,662 triples from the IEEE Xplore and ScienceDirect sources and a total of 80107 triples from the RFID Journal source. Finally, the list containing all Thing’in object ontology names was used to filter the triples generated from the articles (see S502, S503 and S504 in FIG. 5).

The triples generated in steps S501 to S504 were aggregated to form the IoT-specific knowledge graph for use in knowledge-graph-based searching, notably searching according to the above-described first and second embodiments of the invention in an IoT dataset.

The IoT-specific knowledge graph produced according to the example illustrated in FIG. 5 was used in a search system implementing a method according to the second embodiment of the invention to perform searches in the names of ontologies in the Orange Thing’in platform. In this example application, the semantic-similarity-based searching (S204 in FIG. 2) made use of the word vector dataset offered by Stanford University to generate average word vectors representing the search queries and representing the names of the Thing’in ontologies, respectively. Then the semantic similarity between the search query and the Thing’in ontology names was assessed using cosine similarity. The top N results were retained. The knowledge-graph-based searching (S203 in FIG. 2) was performed using the knowledge graph produced by the method described in FIG. 5 and the top N results were retained. The results were integrated (step S205 in FIG. 2) to form a results list in which the entity names listed in the odd positions of the list were results from the semantic-similarity-based searching (ordered from the most relevant to the least relevant of the retained N results) and the entity names listed in the even positions of the list were results from the knowledge-graph-based searching (likewise ordered from the most relevant to the least relevant of the retained N results) .

The methods described above are conveniently put into practice as computer-implemented methods. Thus, systems according to the present invention may be implemented on a general-purpose computer or device having computing capabilities, by suitable programming of the computer. In several applications the search systems according to the invention may comprise servers of IoT platforms, such as one or more servers supporting Orange’s Thing’in platform.

Accordingly, the present invention provides a computer program containing instructions which, when executed on computing apparatus, cause the apparatus to perform the method steps of one or more of the methods described above.

The present invention further provides a non-transitory computer-readable medium storing instructions that, when executed by a computer, cause the computer to perform the method steps of one or more of the methods described above.

Additional Variants

Although the present invention has been described above with reference to certain specific embodiments, it will be understood that the invention is not limited by the particularities of the specific embodiments but, to the contrary, that numerous variations, modifications and developments may be made in the above-described embodiments within the scope of the appended claims.

For example, although the specific embodiments described above relate to IoT-oriented searching, corresponding techniques may be applied to search in other domains. For example, in the domain of e-commerce, little behavioral or characteristic data may be available in respect of new users, and such users may have little knowledge of the terminology used by the e-commerce platform, and yet it may be desired to enable them to benefit from searching/recommendation of products. In such a case, in order to be able to recommend products related to a user search, a knowledge graph specific to products (notably the products available on the e-commerce platform) may be created using techniques based on those described above with reference to FIGs. 4-6, and search techniques may be employed, using the product-specific knowledge graph, based on the methods described above with reference to FIGs. 1-3.

Incidentally, the specific embodiments described above focus on the processing of search queries generated by a human user. However, as mentioned above, the query could be generated by a non-human agent: for example, upon detection that a trigger condition has been met (e.g., low battery level) , software in a user’s chargeable device (smartphone, electric car) could launch a query (e.g., to find a nearby IoT device available to charge the battery) .

Claims

A computer-implemented method of generating an Internet-of-Things-specific knowledge graph, the method comprising:

obtaining an Internet-of-Things dataset describing Internet-of-Things entities by information in one or more fields;

obtaining a general knowledge graph comprising triples each consisting of a first item, a second item and a relationship associating the first item with the second item; and

processing the general knowledge graph to remove a subset of said triples, said sub-set comprising triples both of whose first and second items fail to include any word in a target field, or group of target fields, in the Internet-of-Things dataset.
A computer-implemented method of generating an Internet-of-Things-specific knowledge graph, the method comprising:

extracting text from a corpus of documents describing the Internet-of-Things or describing Internet-of-Things entities;

forming chunks of the extracted text, said chunks being tagged to indicate parts of speech of words in the chunks;

processing the chunks to generate triples each consisting of a first noun phrase, a second noun phrase and a verb phrase which, in the extracted text, associates the first item with the second item; and

aggregating said triples to form said Internet-of-Things-specific knowledge graph.
A computer-implemented method of generating an Internet-of-Things-specific knowledge graph according to claim 2, and comprising:

obtaining an Internet-of-Things dataset describing Internet-of-Things entities by information in one or more fields; and

processing the Internet-of-Things-specific knowledge graph to remove a subset of said triples, said sub-set comprising triples both of whose first and second items fail to include any word in a target field, or group of target fields, in the Internet-of-Things dataset.
A computer-implemented method of generating an Internet-of-Things-specific knowledge graph according to claim 2 or 3, wherein the step of extracting text from a corpus of documents comprises using a web crawling or web scraping tool to extract, from websites, text corresponding to one or more selection criterion defined in the web crawling or web scraping tool.
A computer-implemented method of generating an Internet-of-Things-specific knowledge graph according to any one of claims 2 to 4, and comprising aggregating the triples in said Internet-of-Things-specific knowledge graph with triples produced by the method of claim 1.
A system to generate an Internet-of-Things-specific knowledge graph, said system comprising a computing apparatus programmed to execute instructions to perform a method according to any one of claims 1 to 5.
A computer-implemented method of searching in an Internet-of-Things dataset comprising descriptions of Internet-of-Things entities, said descriptions comprising information in one or more fields, the method comprising:

receiving a query comprising one or more input words;

determining whether input words in the query correspond to a first item or a second item in a triple of an Internet-of-Things-specific knowledge graph generated by a method according to any one of claims 1-5;

in the event of determining that an input word in the query corresponds to a first item in a triple of said Internet-of-Things-specific knowledge graph, adding the second item of the triple to a keyword list;

in the event of determining that an input word in the query corresponds to a second item in a triple of said Internet-of-Things-specific knowledge graph, adding the first item of the triple to a keyword list;

searching in the Internet-of-Things dataset using keywords in the keyword list as search terms, to generate a first set of search results; and

outputting search results based on said first set of search results.
The computer-implemented search method according to claim 7, and comprising:

sorting the keyword list by frequency of word occurrence, and removing one or more keywords occurring the least frequently in the keyword list, before searching in the Internet-of-Things dataset.
The computer-implemented search method according to claim 7 or 8, and comprising:

filtering the first set of search results according to the degree of correspondence between the searched field and the respective keyword, to generate an adjusted first set of search results;

wherein the outputting step outputs search results based on said adjusted first set of search results.
The computer-implemented search method according to claim 9, wherein:

the filtering of the first set of search results according to the degree of correspondence between the searched field and the respective keyword comprises:

filtering out search results consisting of only one word, unless said word is the same as one of the keywords that has been used as a search term generating the first set of search results, and

filtering out search results consisting of n words, where n is an integer equal to or greater than 2, unless two or more of the words in said search result are keywords that have been used as search terms generating the first set of search results.
The computer-implemented search method according to any one of claims 7 to 10, and comprising:

searching in the Internet-of-Things dataset for objects whose searched fields contain content having semantic similarity with said input words, to generate a second set of search results; and

combining the first and second sets of search results to produce the results outputted in the outputting step.
A search system configured to search in an Internet-of-Things dataset comprising descriptions of Internet-of-Things entities, said descriptions comprising information in one or more fields, said system comprising a computing apparatus programmed to execute instructions to perform a method according to any one of claims 7 to 11.
A computer program comprising instructions which, when the program is executed by a processing unit of a computing apparatus, cause said processing unit to perform a method according to any one of claims 1 to 5 to generate an Internet-of-Things-specific knowledge graph, or a search method according to any one of claims 7 to 11.
A computer-readable medium comprising instructions which, when executed by a processor of a computing apparatus, cause the processor to perform a method according to any one of claims 1 to 5 to generate an Internet-of-Things-specific knowledge graph, or a search method according to any one of claims 7 to 11.