NL2031551B1

NL2031551B1 - Computer-implemented method of extracting knowledge

Info

Publication number: NL2031551B1
Application number: NL2031551A
Authority: NL
Inventors: He Gaole; Yang Jie; Marie Anne Balayn Agathe; Hu Andrea; Kumar Gadiraju Ujwal
Original assignee: Univ Delft Tech
Priority date: 2022-04-12
Filing date: 2022-04-12
Publication date: 2023-11-03

Abstract

A computer-implemented method of extracting knowledge; the method comprising at least one iteration, wherein the or each iteration comprises: - presenting a board to at least two users, the board comprising a plurality of items, wherein at least one, preferably exactly one, item of the plurality of items is selected as a target item; - receiving a question from a user of the at least two users, wherein the question is intended to identify the target item; - receiving from another user of the at least two users an answer in reaction to the question; - receiving from the user of the at least two users a selection of items of the board, in reaction to the answer, wherein the selection of items comprises at most all items except one item; and - processing the received question, the received answer and the received selection of items, in order to determine at least one knowledge tuple configured for linking at least one item of the plurality of items to at least one semantic feature of the question.

Description

Computer-implemented method of extracting knowledge

TECHNICAL FIELD

The present disclosure relates to knowledge extraction. Particular embodiments relate to a computer-implemented method of extracting knowledge, to a computer program, to a computer-readable storage medium, and to a data processing apparatus.

BACKGROUND

Within the field of knowledge engineering, research is done on developing methods to gather knowledge. This is conventionally done by interrogating humans through simple interfaces or through more complicated interactions, such as games with a purpose (GWAP), by mining textual resources, or by logically reasoning about known facts to infer new facts.

Knowledge can be categorised according to its intended use, varying from explicit knowledge to tacit knowledge (i.e. common sense) and from discriminative knowledge (i.e. how concepts differ) to generative knowledge (i.e. what attributes belong to concepts).

While generative knowledge broadly corresponds to information about different entities, discriminative knowledge allows differences between these entities to be identified, which helps to grasp subtle aspects of meaning and contributes to progress in the computational modelling of meaning. The importance of negative knowledge has recently also been highlighted, as this may enhance knowledge bases for knowledge exploration and question answering. It has been proposed to leverage negative statements as clues to help players find answers to specific questions.

SUMMARY

It is a shortcoming of conventional approaches to gather knowledge that they fail to extract tacit knowledge. It is believed that this is due to those approaches relying on structured, factual knowledge (e.g. Wikipedia). Common sense knowledge is however not easily available in a structured format.

It is therefore an aim of at least some embodiments according to the present disclosure to extract discriminative and/or tacit knowledge from common sense of humans. It is further noted that discriminative and negative knowledge can be transformed into generative knowledge via post-processing.

Accordingly, there is provided in a first aspect of the present disclosure a computer-implemented method of extracting knowledge. The method comprises at least one iteration. The or each iteration comprises the following steps. Presenting a board to at least two users, the board comprising a plurality of items, wherein at least one, preferably exactly one, item of the plurality of items is selected as a target item.

Receiving a question from a user of the at least two users, wherein the question is intended to identify the target item. Receiving from another user of the at least two users an answer in reaction to the question. Receiving from the user of the at least two users a selection of items of the board, in reaction to the answer, wherein the selection of items comprises at most all items except one item. Processing the received question, the received answer and the received selection of items, in order to determine at least one knowledge tuple configured for linking at least one item of the plurality of items to at least one semantic feature of the question.

It is an insight of the inventors that iterations played between two or more users, wherein questions are asked and answers to those questions are given, based on items of a board, allow useful, discriminative and/or tacit knowledge to be defined.

Moreover, by receiving a selection of items of the board, the defined knowledge can be applied, either positively or negatively, to the items.

Preferably, the question is also intended to discriminate amongst the plurality of items.

In at least some preferred embodiments, the method comprises, based on the at least one determined knowledge tuple, at least one of the following: performing an inference task; controlling a computer-based agent to interact with its physical environment; and controlling a machine element to operate within its physical environment.

In at least some embodiments, the method comprises training a machine learning model based on the at least one determined knowledge tuple, and using the model for at least one of the following: performing an inference task; controlling a computer-based agent to interact with its physical environment; and controlling a machine element to operate within its physical environment.

In at least some embodiments, the method is performed respectively for each user of the at least two users, preferably in an adversarial game format.

In at least some embodiments, if the method comprises two or more iterations, the method comprises iterating to a next iteration, wherein the selection of items or an inverse of the selection of items is removed from the board, depending on whether the received answer of the last iteration was negative or affirmative, and wherein the target item is preferably maintained.

In at least some embodiments, the method is performed respectively for each user of the at least two users, preferably in an adversarial game format, and iterating to the next iteration comprises alternating the users of the at least two users from whom to receive the question, the answer and the selection of items.

In at least some embodiments, the answer is chosen from the following, or is logically equivalent to one of the following: “yes”; and “no”; and the step of processing the received question, the received answer and the received selection of items, in order to determine the at least one knowledge tuple configured for linking at least one item of the plurality of items to at least one semantic feature of the question, comprises: if the answer is or is logically equivalent to “yes”: constructing at least one knowledge tuple comprising a positive relation based on semantic features of the question; an input value based on semantic features of the question; a first object from amongst an inverse of the selection of items; and a second object from among the selection of items; and constructing at least one knowledge tuple comprising a negative relation based on semantic features of the question; an input value based on semantic features of the question; a first object from amongst the selection of items; and a second object from among an inverse of the selection of items; and if the answer is or is logically equivalent to “no”: constructing at least one knowledge tuple comprising a negative relation based on semantic features of the question; an input value based on semantic features of the question; a first object from amongst an inverse of the selection of items; and a second object from among the selection of items; and constructing at least one knowledge tuple comprising a positive relation based on semantic features of the question; an input value based on semantic features of the question; a first object from amongst the selection of items; and a second object from among an inverse of the selection of items.

In at least some embodiments, the answer is or is logically equivalent to “maybe”, and the method comprises suppressing the step of determining the at least one knowledge tuple.

In at least some embodiments, the answer is or is equivalent to “unclear”, and the method comprises: sending a notification to the user from whom the question was received, the notification notifying the user to rephrase the user's question; and receiving another question from the user.

In at least some embodiments, the method comprises, prior to the step of presenting the board, generating the board.

In at least some embodiments, the question is a natural-language question.

In at least some embodiments, an initial state of the board is randomized.

In at least some embodiments, the method is executed multiple times, and an initial state of the board of a first execution of the method is configured to partially overlap with an initial state of the board of a second execution of the method.

In a second aspect of the present disclosure, there is provided a computer program comprising instructions configured for, when executed by at least one processor, causing the at least one processor to execute the steps of the method of any previously described embodiment.

In a third aspect of the present disclosure, there is provided a computer-readable storage medium storing the above-described computer program.

In a fourth aspect of the present disclosure, there is provided a data processing apparatus comprising at least one processor and at least one memory, the at least one memory storing instructions configured for, when executed by the at least one processor, causing the data processing apparatus to execute the steps of the method of any previously described embodiment.

BRIEF DESCRIPTION OF THE DRAWINGS

The following description is intended to be illustrative and not limiting, and is intended to help the skilled person achieve a more complete understanding of the invention, by means of the appended drawings, in which:

Figure 1 schematically represents a flowchart of an embodiment 100 of a method according to the present disclosure;

Figure 2 schematically represents a setup 200 with a data processing apparatus 201 according to the present disclosure and with two user terminals 204A-B; and

Figure 3 schematically represents an example board 300 for use in at least some embodiments according to the present disclosure.

DETAILED DESCRIPTION

Knowledge can be categorized with different typologies of qualities depending on its envisioned use. It can vary from explicit to tacit, situational to conceptual, discriminative to generative, general to specific, commonsense to expertise, etc.

Although GWAPs have been shown to be promising to efficiently collect knowledge, the types of knowledge they can support have not been studied extensively, and seem limited, e.g. not discriminative and possibly not tacit.

It is an aim for at least some embodiments according to the present disclosure to collect more diverse knowledge, which may help AI practitioners to perform AI tasks more effectively, and which may also help AI researchers to better characterize the different types of knowledge one can set out to collect through GWAPs.

Figure 1 schematically represents a flowchart of an embodiment 100 of a method according to the present disclosure.

The method is computer-implemented, and may be implemented on any suitable apparatus, for example the data processing apparatus of Figure 2 described below.

The method comprises at least one iteration, wherein the or each iteration comprises steps 101, 102, 103, 104, and 105.

Step 101 represents presenting a board to at least two users, the board comprising a plurality of items, wherein at least one, preferably exactly one, item of the plurality of items is selected as a target item.

Step 102 represents receiving a question from a user of the at least two users, wherein the question is intended to identify the target item.

Step 103 represents receiving from another user of the at least two users an answer in reaction to the question.

Step 104 represents receiving from the user of the at least two users a selection of items of the board, in reaction to the answer, wherein the selection of items comprises at most all items except one item.

Step 105 represents processing the received question, the received answer and the received selection of items, in order to determine at least one knowledge tuple configured for linking at least one item of the plurality of items to at least one semantic feature of the question.
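The data received in one iteration can be pictured as a small record. The following is a minimal sketch, not the disclosed implementation: the class name `Iteration`, its field names and the helper `remaining_items` are assumptions introduced for illustration. It only enforces the stated constraint that the selection comprises at most all items except one item.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Iteration:
    """One iteration of the method; all names here are illustrative."""
    board: FrozenSet[str]      # items presented to the users (step 101)
    question: str              # received from the asking user (step 102)
    answer: str                # received from the other user (step 103)
    selection: FrozenSet[str]  # items selected by the asking user (step 104)

    def __post_init__(self) -> None:
        if not self.selection <= self.board:
            raise ValueError("selection must consist of items of the board")
        # "at most all items except one item": one item always stays unselected
        if len(self.selection) > len(self.board) - 1:
            raise ValueError("selection may comprise at most all items except one")


def remaining_items(iteration: Iteration) -> FrozenSet[str]:
    """Inverse of the selection: board items that were not selected."""
    return iteration.board - iteration.selection
```

The record holds exactly the three inputs that step 105 processes, together with the board they refer to.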

It will be appreciated by the skilled person how various further developments, for example various more specific embodiments described herein, may be derived from the method represented in Figure 1. For example, to iterate, operation may return to step 101 after step 105 (or another end step if appropriate). Likewise, additional steps may be envisioned with the method, for example in order to perform any of the following additional steps: training a machine learning model based on the at least one determined knowledge tuple (which may follow step 105), and using the model (which may follow the step of training the model) for at least one of the following: performing an inference task; controlling a computer-based agent to interact with its physical environment; and controlling a machine element to operate within its physical environment. In this context, it is noted that these tasks are not necessarily mutually exclusive; for example, for a computer-based agent to interact with its physical environment, the agent may perform an inference task and/or the agent may comprise or may be operatively connected to a machine element that is controlled to operate within the physical environment of the machine element.

In a first preferred embodiment, the method may involve two users who may take turns being the Asker (i.e. the user who poses a question) and the Replier (i.e. the user who answers the posed question). At the start of the method, both users may be presented with an identical board of multiple items, e.g. cards, where each item may correspond to a concept with its name, picture (e.g. generated from Google Image Search), and definitions (e.g. taken from WordNet). One item may be selected as a target item, for each user. It is possible to select multiple items as target item, but it is preferred to select only one item as target item. The target item or target items may be selected randomly, or may be selected based on some heuristic.

The goal for each user may be to guess the other user's target item, preferably before the other user or users guess(es) their own target item. This may be achieved by asking questions and reducing possible candidates among the items on the board in a number of iterations. The items on the board can e.g. be flipped, or shaded, or darkened, or reduced in size, or struck through, or any combination thereof, in order to help the users keep track of the possible choices. In order to improve fairness, the option to guess a target item may only be activated after a number of iterations, e.g. after two or three questions, preferably depending on an optional difficulty level, as this may dissuade the users from making early random guesses.

The method may involve multiple iterations, which may be considered to correspond with turns or game rounds. One such iteration will now be explained. At the beginning of the iteration, the Asker may choose an action between asking a question to the Replier, and guessing their target card. The guessing action may reward the user if the guess was correct, or may penalize the user if the guess was incorrect. The asking action may involve the Asker formulating a question intended to identify the target item. The question may be received by a knowledge extraction system. The Replier may then answer the question in response. The answer may also be received by the knowledge extraction system. In reaction to the answer, the Asker may indicate a selection of items of the board, wherein the selection of items comprises at most all items except one item, e.g. by flipping relevant cards on the board. The selection of items may also be received by the knowledge extraction system. Preferably, afterwards, the users may switch roles, i.e. the Replier may become the next Asker, and the Asker may become the next Replier.

The questions may follow a template of <relation, input>. The relation may be selected among a pre-defined set of relations, preferably based on their commonality, their applicability to nouns, and suitability for the concepts on the board, and may e.g. be extracted from ConceptNet. An example set of relations is the following: IsA, HasA, HasProperty, UsedFor, CapableOf, MadeOf, PartOf, AtLocation. The input may be a natural language proposition, preferably limited to at most five words in order to ease post-processing and to help avoid cheating. The input may be manually entered by the Asker. The Replier may select the answer among four choices: “yes”, “no”, “maybe”, and “unclear question”. If the answer is “unclear question”, the Asker may be invited to pose a new question.
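The <relation, input> template lends itself to a simple validity check. The sketch below is illustrative only: the function name `make_question` and the constant names are assumptions, the relation names are spelled as in ConceptNet (e.g. IsA), and whitespace splitting is assumed to be an adequate way of counting words.

```python
from typing import Tuple

# Example relation set and answer choices taken from the description.
RELATIONS = {"IsA", "HasA", "HasProperty", "UsedFor",
             "CapableOf", "MadeOf", "PartOf", "AtLocation"}
ANSWERS = {"yes", "no", "maybe", "unclear question"}
MAX_INPUT_WORDS = 5  # preferred limit on the natural-language input


def make_question(relation: str, text: str) -> Tuple[str, str]:
    """Validate a <relation, input> question and normalise its whitespace."""
    if relation not in RELATIONS:
        raise ValueError(f"unknown relation: {relation!r}")
    words = text.split()  # assumption: words are separated by whitespace
    if not 1 <= len(words) <= MAX_INPUT_WORDS:
        raise ValueError("input must contain between one and five words")
    return relation, " ".join(words)
```

For example, `make_question("UsedFor", "making tea")` yields the pair `("UsedFor", "making tea")`, whereas a six-word input is rejected.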

An advantage of using templates of <relation, input> is the potential efficiency of that type of template. It has for example been found that structured, relation-based templates are more efficient at collecting rule-type knowledge and the results are more usable than relying entirely on natural language. Thus, by using a combination of template-based and natural language question formulation, the potential target knowledge can be tuned with improved configurability.

In a further developed embodiment, several measures may be taken to improve user retention and/or to prevent cheating. For example, multiple levels of difficulty may be proposed, e.g. three levels, wherein the number of initial items and the similarity of the concepts on the items may vary, e.g. 8, 16, or 24 initial items. As another example, the other user's target item and a history of relevant questions and answers may be stored and displayed, during and/or after the method, in order to encourage users to answer truthfully. Moreover, the possibility may be provided to report incorrect answers and/or other types of cheating. A further example to prevent cheating and to help better achieve the aim, is to disallow that the item’s concept’s name occurs within the natural language input, because this could otherwise hinder extracting relevant knowledge. In a further developed embodiment, there may be a step of natural language processing to determine a root or a lexeme of the concept, in order to disallow any natural language input featuring that root or lexeme. Moreover, other words may be disallowed from occurring, e.g. other taboo words. Additionally, a leaderboard may be provided to track scores of users. Additionally, a time limit may be set for users to finish their iterations.
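The restriction on concept names and other taboo words could be checked along the following lines. This is a rough sketch under stated assumptions: the helper names are hypothetical, all concept names on the board are treated as disallowed, concept names are single words, and a crude rule (lowercasing, stripping a trailing "s", prefix matching) stands in for the natural language processing step that determines a root or lexeme. A real system would use a proper lemmatiser, since prefix matching over-blocks (for instance "wallet" for the concept "walls").

```python
import re
from typing import Iterable


def _crude_root(word: str) -> str:
    """Rough stand-in for lemmatisation: lowercase, drop a trailing 's'."""
    w = word.lower()
    return w[:-1] if len(w) > 3 and w.endswith("s") else w


def violates_taboo(text: str, board_concepts: Iterable[str],
                   extra_taboo: Iterable[str] = ()) -> bool:
    """Return True if any word of the input starts with a disallowed root."""
    tokens = re.findall(r"[a-z]+", text.lower())
    roots = [_crude_root(c) for c in list(board_concepts) + list(extra_taboo)]
    return any(tok.startswith(root)
               for tok in tokens for root in roots if root)
```

With the board concepts "teapot" and "shoe", the input "making teapots" is rejected while "making tea" is accepted.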

Knowledge collected via the method depends on the concepts on the initial items on the board. It may be considered to use very different initial items on the board, in order to extract more general knowledge. Alternatively, it may be considered to use more similar initial items on the board, in order to extract more specific knowledge. In at least some embodiments, a method may be used to automatically populate the board with items by collecting concepts related to a chosen seed concept, e.g. based on concept-similarity measures computed using ConceptNet. For example, the board may be generated with a greedy approach: once a few initial concepts are retrieved for one board, other related ones are appended to the board, either by searching within the WordNet taxonomy, or by adapting to the task at hand (when one wants to understand the difference between two pre-defined concepts, these two concepts can be added simultaneously).

An example algorithm to generate boards is as follows:

Require: Triple set T, concept set C, board size n.
Input: seed concept c0.
Output: board g.
initialize board g = {c0}
for i = 1..n-1 do
    ci = MaximizeTripleCover(g, C \ g, T)
    g = g ∪ {ci}
end for
return g
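The greedy procedure above can be sketched in Python as follows. The source does not define MaximizeTripleCover, so this sketch assumes one plausible reading: each triple is (concept, relation, input), and the next concept is the candidate that shares the most <relation, input> pairs with the concepts already on the board. The function name `generate_board` and the tie-breaking by alphabetical order are likewise assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (concept, relation, input)


def generate_board(triples: Iterable[Triple], concepts: Iterable[str],
                   seed: str, n: int) -> List[str]:
    """Greedily grow a board of up to n concepts around a seed concept."""
    features: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for concept, relation, value in triples:
        features[concept].add((relation, value))

    board = [seed]
    covered = set(features[seed])                # features seen on the board
    candidates = sorted(set(concepts) - {seed})  # sorted: deterministic ties
    while len(board) < n and candidates:
        # Assumed MaximizeTripleCover: most features shared with the board.
        best = max(candidates, key=lambda c: len(features[c] & covered))
        board.append(best)
        covered |= features[best]
        candidates.remove(best)
    return board
```

Choosing concepts that share features with the board tends to produce similar items, which the description associates with more specific knowledge; a different scoring function would be needed to favour dissimilar items.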

Another example algorithm to generate boards is as follows:

Require: Question-concept connection set T, question set Q, board size n.
Input: seed question q0.
Output: board g.
initialize board g = ObtainQuestionConcepts(T, q0)
initialize covered question set Qc = {q0}
while Size(g) < n do
    qi = MaximizeConceptOverlap(g, Q \ Qc, T)
    g = g ∪ ObtainQuestionConcepts(T, qi)
    Qc = FindQuestionCovered(g, Q, T)
end while
g = FilterSize(g, n)
return g
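A possible Python rendering of this question-driven procedure is given below. The helper functions are not defined in the source, so the sketch fixes them by assumption: the connection set T is a mapping from each question to the set of concepts it connects to, MaximizeConceptOverlap picks the uncovered question sharing the most concepts with the board, a question counts as covered once all its concepts are on the board, and FilterSize keeps the first n concepts in insertion order. The loop also stops when no uncovered question remains, a guard that the pseudocode leaves implicit.

```python
from typing import Dict, List, Set


def generate_board_from_questions(links: Dict[str, Set[str]],
                                  seed_question: str, n: int) -> List[str]:
    """Grow a board from the concepts connected to a seed question."""
    board: List[str] = sorted(links[seed_question])  # ObtainQuestionConcepts
    covered = {seed_question}
    while len(board) < n:
        open_questions = sorted(q for q in links if q not in covered)
        if not open_questions:
            break  # nothing left to add; board stays smaller than n
        on_board = set(board)
        # Assumed MaximizeConceptOverlap: most concepts shared with the board.
        best = max(open_questions, key=lambda q: len(links[q] & on_board))
        board += sorted(links[best] - on_board)
        on_board = set(board)
        # Assumed FindQuestionCovered: all concepts of the question on board.
        covered = {q for q in links if links[q] <= on_board}
    return board[:n]  # assumed FilterSize: keep the first n concepts
```

Because the chosen question is always covered after its concepts are added, every pass through the loop shrinks the set of open questions, so the procedure terminates.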

In at least some embodiments, the received question, the received answer and the received selection of items may be processed, e.g. by the knowledge extraction system, in order to determine at least one knowledge tuple configured for linking at least one item of the plurality of items to at least one semantic feature of the question.

Such processing may involve constructing a tuple, preferably along two dimensions. In a first dimension, each piece of knowledge is either generative or discriminative. Generative knowledge may be represented using a tuple of the form <concept, relation, input>, whereas discriminative knowledge may be represented using a tuple of the form <concept1, concept2, relation, input>, wherein the <relation, input> applies to concept1 but does not apply to concept2. Thus, a discriminating feature between concept1 and concept2 may be encoded. In a second dimension, each piece of knowledge is either positive (the <relation, input> applies to the concept or discriminates between concept1 and concept2) or negative (the <relation, input> does not apply to the concept or does not discriminate between concept1 and concept2).

As soon as the relevant information is available, that is, as soon as the received question, the received answer and the received selection of items are available, these data may be processed. Alternatively, these data may be collected over multiple iterations and/or stored for some time, and may be processed afterwards.

Processing the data may involve simple heuristics. For example, if the answer is “yes”, the <relation, input> may apply positively to all items that were not among the selection of items, i.e. to all cards that were uncovered before receiving the selection of items and that still remained uncovered after receiving the selection of items. In this case, a positive tuple may be constructed, e.g. +<concept, relation, input> or +<concept1, concept2, relation, input>. Moreover, the <relation, input> may also apply negatively to all items that were among the selection of items. In this case, a negative tuple may be constructed, e.g. -<concept, relation, input> or -<concept1, concept2, relation, input>.

In this context, the notation +<concept, relation, input> means that <relation, input> is positive generative knowledge about the concept, i.e. the concept is thus positively defined, whereas the notation -<concept, relation, input> means that <relation, input> is negative generative knowledge about the concept, i.e. the concept is thus negatively defined.

In contrast, for example, if the answer is “no”, the <relation, input> may apply negatively to all items that were not among the selection of items, i.e. to all cards that were uncovered before receiving the selection of items and that still remained uncovered after receiving the selection of items. In this case, a negative tuple may be constructed, e.g. -<concept, relation, input> or -<concept1, concept2, relation, input>. Moreover, the <relation, input> may also apply positively to all items that were among the selection of items. In this case, a positive tuple may be constructed, e.g. +<concept, relation, input> or +<concept1, concept2, relation, input>.
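The “yes”/“no” heuristics for generative knowledge can be sketched as below. The function name `generative_tuples`, its parameters and the representation of a signed triple as a plain Python tuple are assumptions for illustration. Answers other than “yes” or “no” produce no tuples here, in line with the embodiment in which determining tuples is suppressed for “maybe”.

```python
from typing import Iterable, List, Tuple

SignedTriple = Tuple[str, str, str, str]  # (sign, concept, relation, input)


def generative_tuples(uncovered_before: Iterable[str],
                      selection: Iterable[str],
                      relation: str, value: str,
                      answer: str) -> List[SignedTriple]:
    """Build signed triples for all items that were still uncovered."""
    answer = answer.strip().lower()
    if answer not in ("yes", "no"):
        return []  # e.g. "maybe": no knowledge tuple is determined
    selected = set(selection)
    # Unselected items take the sign of the answer, selected items its inverse.
    kept_sign, selected_sign = ("+", "-") if answer == "yes" else ("-", "+")
    return [(selected_sign if concept in selected else kept_sign,
             concept, relation, value)
            for concept in uncovered_before]
```

For instance, with uncovered items teapot and shoe, the question <UsedFor, making tea>, the answer “yes” and the selection {shoe}, the sketch returns +<teapot, UsedFor, making tea> and -<shoe, UsedFor, making tea>.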

A tuple of generative knowledge may correspond to a concept, a relation and a characterizing input (and may thus also be termed a triple, i.e. a 3-tuple), and may for instance take one of the following two possible formats. It may be a positive triple +<concept, relation, input> where the input is text entered by users. For instance, +<teapot, UsedFor, making tea> indicates that the concept teapot can be used for making tea. Negative knowledge may also be collected, as negative triples -<concept, relation, input>, that indicate that the relation and input do not apply to the concept. For instance, -<teapot, UsedFor, running> indicates that the concept teapot cannot be used for running. Discriminative knowledge may also be collected. This knowledge may for example be represented by positive tuples (in this case, 4-tuples, i.e. quadruples), taking the form +<concept1, concept2, relation, input>, where the relation and its associated input apply to concept1 but not to concept2, which makes it possible to discriminate between the two. For instance, +<teapot, shoe, UsedFor, making tea> indicates that the concept teapot is different from the concept shoe in that only the teapot can be used for making tea. In contrast, negative quadruples mean that the relation and input do not make it possible to discriminate between the two concepts.

If the answer is “maybe”, this may be taken to mean that the answer is not a clear “yes” or a clear “no”. In this case, further processing may be required to assimilate this answer into the extracted knowledge.

Figure 2 schematically represents a setup 200 with a data processing apparatus 201 according to the present disclosure and with two user terminals 204A-B. The figure shows a data processing apparatus 201 comprising at least one processor 202 and at least one memory 203, the at least one memory 203 storing instructions configured for, when executed by the at least one processor 202, causing the data processing apparatus 201 to execute the steps of the method of any above-described embodiment, for example the method of Figure 1. The figure further shows that data processing apparatus 201 may comprise one or more (here: two) interfaces 208A-B to communicate with user terminals 204A-B. Each user terminal may be operated by a respective user 205A-B, although variant setups may involve a single user operating multiple user terminals. It is further noted that one or more users may be constituted by artificial intelligences, for example if the data processing apparatus 201 according to the present disclosure is used in a context of automated generative adversarial machine learning.

Figure 3 schematically represents an example board 300 for use in at least some embodiments according to the present disclosure. The board 300 comprises a plurality of items 301, 302, 303, 304, and 305. In this example, there are five items, but there may be any suitable number of items, normally a plurality of items.

The items 301-305 may be represented on the board 300 in the form of cards, preferably virtual cards, which makes the items easy to discern and manipulate. An example of manipulation is indicated for items 303 and 304, which have been flipped, which is shown by their skewed line fill pattern. This means that the user (not shown) has, in response to an answer of another user, selected these items for flipping, i.e. excluding from further questions and guesses.

Figure 3 further shows optional interface elements 306, 307, 308 and 309. These may for example be buttons providing shortcuts to useful functions, such as responding “yes”, “no”, “maybe”, or “unclear question”.

Below, two example boards will be given, with corresponding types of questions and resulting knowledge tuples.

For a first board, comprising items with the concepts of: floor, window, bathroom, walls, ceiling, chandelier, mirror, bedroom, an explicit question may for example be “Can your card be found inside an apartment?”. This may for example lead to the tuple <bathroom, AtLocation, inside apartment>. A tacit question may for example be “Can your card be used for decoration?”. This may for example lead to the tuple <chandelier, UsedFor, decoration>.

For a second board, comprising items with the concepts of: necklace, dress, boots, shoes, pants, trousers, jeans, skirt, an explicit question may for example be “Can your card be found in your wardrobe?”. This may for example lead to the tuple <dress, AtLocation, wardrobe>. A tacit question may for example be “Is your card typically worn by cowboys?”. This may for example lead to the tuple <boots, HasProperty, worn by cowboys>.

In some further developed embodiments, additional steps may be taken to process the collected data, in order to ensure the validity of the extracted knowledge. For example, depending on the available budget, sessions may be started with the same initial board in order to aggregate and corroborate answers.

In another embodiment, generative and discriminative knowledge may be extracted based on heuristics as follows. After receiving a response from the Replier, the Asker’s flipping card actions provide all information needed to gather new tuples in the form of (+/-)<card?, relation, input>, where “card?” and sign (+/-) are inferred based on whether the card is flipped. Specifically, when the answer to a question is received, the relation and input in the question directly apply to batch A: reserved cards, i.e. the batch of cards that were previously unflipped and that remain unflipped, with the sign corresponding to the answer (yes is +, and no is -). Conversely, the batch of cards that were previously unflipped and are flipped during the turn (batch B: flipped cards) receives the inverse of the sign of the answer. For example, consider the sequence where the question is “does my card have wings”, the answer is “no”, and then the Asker flips the “bird” card: in this case, the knowledge triple +<bird, has, wings> is built.

Discriminative knowledge may be generated in a similar fashion, using the quadruple template. The concepts in batch A (or batch B) can be combined to form tuples of negative discriminative knowledge (if there are n concepts in a batch, C(n, 2) = n(n-1)/2 tuples of knowledge are created), while one concept from each batch results in positive, discriminative knowledge (if batch A has k concepts and batch B has m concepts, k * m tuples of knowledge are created). In contrast to generative knowledge, discriminative knowledge is extracted with two concepts taken from the batches (both A and B) and with a quadruple template. Any concept pair can be gathered to generate discriminative knowledge, which results in C(n, 2) tuples of knowledge. Considering one concept from each batch then allows positive discriminative knowledge to be created, while both concepts from the same batch result in negative discriminative knowledge.
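The pairing rules for discriminative knowledge can be illustrated with a short sketch. The function name and the plain-tuple representation are assumptions; the ordering of positive quadruples follows the convention stated earlier in the description that the <relation, input> applies to concept1 but not to concept2, so concept1 is taken from whichever batch the feature applies to.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple

SignedQuad = Tuple[str, str, str, str, str]  # (sign, c1, c2, relation, input)


def discriminative_tuples(batch_a: Sequence[str], batch_b: Sequence[str],
                          relation: str, value: str,
                          answer: str) -> List[SignedQuad]:
    """batch_a: cards kept unflipped; batch_b: cards flipped this turn."""
    if answer not in ("yes", "no"):
        raise ValueError("only 'yes' and 'no' answers are handled here")
    # The feature applies to batch A on "yes" and to batch B on "no".
    has, lacks = (batch_a, batch_b) if answer == "yes" else (batch_b, batch_a)
    # One concept from each batch: k * m positive quadruples.
    positive = [("+", c1, c2, relation, value)
                for c1, c2 in product(has, lacks)]
    # Two concepts from the same batch: C(k, 2) + C(m, 2) negative quadruples.
    negative = [("-", c1, c2, relation, value)
                for batch in (batch_a, batch_b)
                for c1, c2 in combinations(batch, 2)]
    return positive + negative
```

With k = 2 concepts in batch A and m = 3 in batch B, this gives 2 * 3 = 6 positive and 1 + 3 = 4 negative quadruples, so a single answer yields ten discriminative tuples.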

Initial experiments show that the type of knowledge extracted with the method may vary. Initial items with similar concepts may nudge users to think of tacit knowledge to ask a question about, in order to more efficiently discriminate across multiple concepts.

If there are more items available, this may force users to think of specific pieces of knowledge, especially in later iterations since items that have not yet been selected (i.e. cards that have not yet been flipped on the board) may likely be more similar.

Compared to conventional approaches, at least some embodiments according to the present disclosure may allow for a higher throughput, by making both questions and answers relevant to the extraction of knowledge. As users’ input may contribute to the construction of distinct tuples of knowledge simultaneously, throughput may be improved over conventional approaches wherein interactions allow for the creation of only a single knowledge tuple. To determine the throughput or efficiency, the number of tuples (positive and negative triples and quadruples) resulting from the method may be measured, as well as the fraction of overlapping knowledge tuples generated by the two users across iterations within the method and repeated executions of the entire method. By also considering the average time and number of rounds that an execution of the method lasts as well as its cost, one can measure the throughput and utility of knowledge generation.
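The measurements mentioned above could be computed along these lines. This is a sketch only: the source does not define how the overlap fraction is calculated, so the Jaccard ratio (shared tuples divided by all distinct tuples) is an assumption, as are the function names and the choice of tuples per minute as the throughput unit.

```python
from typing import Hashable, Iterable


def overlap_fraction(tuples_a: Iterable[Hashable],
                     tuples_b: Iterable[Hashable]) -> float:
    """Assumed overlap measure: shared tuples over all distinct tuples."""
    a, b = set(tuples_a), set(tuples_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tuples_per_minute(tuples: Iterable[Hashable],
                      duration_seconds: float) -> float:
    """Distinct knowledge tuples produced per minute of one execution."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return len(set(tuples)) * 60.0 / duration_seconds
```

A high overlap between the tuples of two users or two executions may serve to corroborate knowledge, while a low overlap indicates that additional executions still contribute new tuples.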

Moreover, compared to conventional approaches, at least some embodiments according to the present disclosure may allow discriminative and negative knowledge to be extracted directly. Conventional approaches require users either to directly input concepts in relation to a pre-existing characteristic, or to fill in a template. They leave no room for negative inputs, which also removes the opportunity to indirectly elicit discriminative knowledge.

Moreover, compared to conventional approaches, the knowledge elicited by at least some embodiments according to the present disclosure may be, by design, more diverse: the rules within the templates may be human-generated, so the resulting knowledge may be more varied, and it may be richer than single words (e.g. an association of a relation with up to 5 words). The correctness and diversity of knowledge may for example be scored manually on the following dimensions: correctness, truth, bias, tacitness, typicality, specificity.

Table 1 shows dimensions on which knowledge tuples may be analysed. Labels correspond to the scales used to gather annotations. In the table, (C) refers to correctness, and (D) refers to diversity.

The above-described approach may help to understand and quantify the characteristics of the generated knowledge. In order to more directly highlight the usefulness for concrete AI tasks of generative and discriminative knowledge extracted by the method according to the present disclosure, the following types of inference tasks benefitting from the above-described knowledge extraction may be considered:

Applications in search and ranking:

* Queries and query analysis (e.g., query intent, query understanding, query suggestion and prediction, query representation and reformulation, spoken queries).
* Web search (e.g., ranking at web-scale, link analysis, sponsored search, search advertising, adversarial search and spam, vertical search).
* Retrieval models and ranking (e.g., ranking algorithms, learning to rank, language models, retrieval models, combining searches, diversity, aggregated search, dealing with bias).
* Efficiency and scalability (e.g., indexing, crawling, compression, search engine architecture, distributed search, metasearch, peer-to-peer search, search in the cloud).
* Theoretical models and foundations of information retrieval and access (e.g., new theory, fundamental concepts, theoretical analysis).

Applications in content recommendation, analysis and classification:

* Filtering and recommendation (e.g., content-based filtering, collaborative filtering, recommender systems, recommendation algorithms, personalized recommendation).
* Document representation and content analysis (e.g., summarization, text representation, linguistic analysis, readability, NLP for search, cross-lingual and multilingual search, information extraction, opinion mining and sentiment analysis, clustering, classification, topic models).

Applications in improving Machine Learning (ML) models and Natural Language Processing (NLP) for search and recommendation:

* Intelligent personal assistants and agents.
* Question answering (e.g., factoid and non-factoid question answering, interactive question answering, community-based question answering, question answering systems).
* Conversational systems (e.g., conversational search interaction, dialog systems, spoken language interfaces, intelligent chat systems).
* Explicit semantics (e.g., semantic search, named-entities, relation and event extraction).
* Knowledge representation and reasoning (e.g., link prediction, knowledge graph completion, query understanding, knowledge-guided query and document representation, ontology modeling).

Applications to improve human behavior modeling:

* Mining and modeling users (e.g., user and task models, click models, log analysis, behavioral analysis, modeling and simulation of information interaction, attention modeling).
* Interactive search (e.g., search interfaces, information access, exploratory search, search context, whole-session support, proactive search, personalized search).
* Social search (e.g., social media search, social tagging, crowdsourcing).
* Collaborative search (e.g., human-in-the-loop, knowledge acquisition).
* Information security (e.g., privacy, surveillance, censorship, encryption, security).

Domain-specific applications:

* Local and mobile search (e.g., location-based search, mobile usage understanding, mobile result presentation, audio and touch interfaces, geographic search, location context in search).
* Social search (e.g., social networks in search, social media in search, blog and microblog search, forum search).
* Search in structured data (e.g., XML search, graph search, ranking in databases, desktop search, email search, entity-oriented search).
* Multimedia search (e.g., image search, video search, speech and audio search, music search).
* Education (e.g., search for educational support, peer matching, info seeking in online courses).
* Legal (e.g., e-discovery, patents, other applications in law).
* Health (e.g., medical, genomics, bioinformatics, other applications in health).
* Knowledge graph applications (e.g., conversational search, semantic search, entity search, KB question answering, knowledge-guided NLP, search and recommendation).
* Other applications and domains (e.g., digital libraries, enterprise, expert search, news search, app search, archival search, new retrieval problems including applications of search technology for social good).

Although the skilled person will appreciate that many programming languages, frameworks and libraries may be suitable for developing working implementations of the method according to the present disclosure (and the related computer program), and that these may be implemented on any suitable hardware, e.g. personal computers or smartphones, it is currently preferred to implement the overall computer program as a web application. The backend API, which manages the game logic, is preferably written in Python with Flask (for simplicity and fast setup). The frontend, which renders the game screens, is preferably written with the React JavaScript library in conjunction with the Redux state library, which allows unidirectional data flow, making it predictable, easy to test and flexible. The Python programming language is well-known, and Flask, React and Redux are well-known frameworks/libraries. Real-time communication may preferably be achieved using Socket.IO, which is a library for low-latency, bidirectional and event-based communication between a client and a server and which is available with implementations for many different programming languages. The communication between the backend and frontend preferably uses a classic HTTP REST API for user information, JWT authentication, and WebSocket for game lobbying and gameplay, allowing for continuous and bidirectional data flow between the server and client. All game data are preferably stored in a PostgreSQL database.

Advantageously, this allows for a large number of simultaneous games, which speeds up the process of extracting knowledge.


Claims


1. Computer-implemented method of extracting knowledge; the method comprising at least one iteration, wherein the or each iteration comprises: - presenting a board to at least two users, the board comprising a plurality of items, wherein at least one, preferably exactly one, item of the plurality of items is selected as a target item; - receiving a question from a user of the at least two users, wherein the question is intended to identify the target item; - receiving from another user of the at least two users an answer in reaction to the question; - receiving from the user of the at least two users a selection of items of the board, in reaction to the answer, wherein the selection of items comprises at most all items except one item; and - processing the received question, the received answer and the received selection of items, in order to determine at least one knowledge tuple configured for linking at least one item of the plurality of items to at least one semantic feature of the question.

2. Method according to claim 1, comprising training a machine learning model based on the at least one determined knowledge tuple, and using the model for at least one of the following: - performing an inference task; - controlling a computer-based agent to interact with its physical environment; and - controlling a machine element to operate within its physical environment.

3. Method according to any one of the preceding claims, wherein the method is respectively executed for each user of the at least two users, preferably in an adversarial game format.

4. Method according to any one of the preceding claims, wherein, if the method comprises two or more iterations, the method comprises iterating to a next iteration, wherein the selection of items or an inverse of the selection of items is removed from the board, depending on whether the answer received in the last iteration was negative or affirmative, preferably retaining the target item.

5. Method according to the preceding claim, wherein the method is respectively executed for each user of the at least two users, preferably in an adversarial game format, and wherein iterating to the next iteration comprises alternating which users of the at least two users the question, the answer and the selection of items are received from.

6. Method according to any one of the preceding claims, wherein the answer is selected from, or is logically equivalent to, one of the following: “yes”; and “no”; and wherein the step of processing the received question, the received answer and the received selection of items, in order to determine the at least one knowledge tuple configured for linking at least one item of the plurality of items to at least one semantic feature of the question, comprises: if the answer is “yes” or is logically equivalent thereto: - constructing at least one knowledge tuple comprising a positive relation based on semantic features of the question; an input value based on semantic features of the question; a first object from an inverse of the selection of items; and a second object from the selection of items; and - constructing at least one knowledge tuple comprising a negative relation based on semantic features of the question; an input value based on semantic features of the question; a first object from the selection of items; and a second object from an inverse of the selection of items; and if the answer is “no” or is logically equivalent thereto: - constructing at least one knowledge tuple comprising a negative relation based on semantic features of the question; an input value based on semantic features of the question; a first object from an inverse of the selection of items; and a second object from the selection of items; and - constructing at least one knowledge tuple comprising a positive relation based on semantic features of the question; an input value based on semantic features of the question; a first object from the selection of items; and a second object from an inverse of the selection of items.

7. Method according to any one of the preceding claims, wherein the answer is "maybe" or is logically equivalent thereto, and wherein the method comprises suppressing the step of determining the at least one knowledge tuple.

8. Method according to any one of the preceding claims, wherein the answer is "unclear" or is logically equivalent thereto, and wherein the method comprises: - sending a notification to the user from whom the question has been received, which notification prompts the user to reformulate the question; and - receiving another question from the user.

9. Method according to any one of the preceding claims, comprising generating the board prior to the step of presenting the board.

10. Method according to any one of the preceding claims, wherein the question is a question in natural language.

11. Method according to any one of the preceding claims, wherein an initial state of the board is randomized.

12. Method according to any one of the preceding claims, executed multiple times, wherein an initial state of a first execution of the method is configured to partially overlap with an initial state of a second execution of the method.

13. Computer program comprising instructions configured to, when executed by at least one processor, cause the at least one processor to perform the steps of the method according to any one of the preceding claims.

14. Computer-readable storage medium with the computer program according to claim 13 stored thereon.

15. Data processing apparatus comprising at least one processor and at least one memory, on which at least one memory instructions are stored which are configured to, when executed by the at least one processor, cause the data processing apparatus to perform the steps of the method according to any one of the preceding claims.