WO2022234273A1 - Method and apparatus for processing project data - Google Patents

Method and apparatus for processing project data

Info

Publication number
WO2022234273A1
WO2022234273A1 (application PCT/GB2022/051134)
Authority
WO
WIPO (PCT)
Prior art keywords
project
items
knowledge graph
data
user
Prior art date
Application number
PCT/GB2022/051134
Other languages
English (en)
Inventor
Matthew James HOBBY
David Alexander BETTERIDGE
Torsten WOLTER
Original Assignee
Hivemap Limited
Priority date
Filing date
Publication date
Application filed by Hivemap Limited filed Critical Hivemap Limited
Publication of WO2022234273A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 - Administration; Management
    • G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q 10/06 - Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063 - Operations research, analysis or management
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 - Computing arrangements using knowledge-based models
    • G06N 5/02 - Knowledge representation; Symbolic representation
    • G06N 5/022 - Knowledge engineering; Knowledge acquisition
    • G06N 7/00 - Computing arrangements based on specific mathematical models
    • G06N 7/01 - Probabilistic graphical models, e.g. probabilistic networks
    • G06N 20/00 - Machine learning

Definitions

  • the present invention relates to a project data processing method and apparatus.
  • the present invention relates to a computer implemented project data processing method for making predictions based on project data using a supervised machine learning algorithm or a Hidden Markov Model (HMM) algorithm.
  • the present invention further relates to a computer implemented method of training a supervised machine learning algorithm or a HMM algorithm for making such predictions.
  • the present invention further relates to an apparatus for implementing the prediction method or the training method.
  • the predictions may relate to one or more of a project impediment such as project delay, a project quality issue or a project safety issue.
  • the project data relates to a construction project.
  • a construction project, for example the construction of a new building, requires a project manager and others responsible for implementing the project to review, process and act on a very large amount of data.
  • This data may arrive in a range of different formats, for instance emails, documents, messaging or proprietary project management system data.
  • On a project of even modest scale there is a risk that the volume of data generated may become overwhelming and key project data may be overlooked.
  • project delays, project quality issues and project safety issues that were predictable based on the project data are only identified as they occur (if at all). That is, it may be that data indicating one or more of these types of issues was included within received project data, perhaps even explicitly (such as a received email explicitly identifying an issue), yet that data was overlooked, not fully appreciated or not acted upon due to information overload.
  • a computer implemented project data processing method comprising: receiving project data; performing natural language processing to extract canonical items from the project data; processing the canonical items to form a knowledge graph indicative of relationships between items; determining at least one metric parameterising the knowledge graph or items within the knowledge graph and a change to the at least one metric over time; and predicting a project impediment using a trained supervised machine learning algorithm or a HMM algorithm based on changes to the at least one metric over time.
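The pipeline in this aspect (extract items, build a knowledge graph, parameterise it per epoch, feed metric changes to a predictor) can be sketched in minimal Python. The graph representation and metric names below are illustrative assumptions, not taken from the patent; the trained classifier or HMM that would consume the deltas is omitted.

```python
from dataclasses import dataclass

# A minimal stand-in for the knowledge graph: nodes (canonical items)
# and undirected edges (relationships between items). Illustrative only.
@dataclass
class KnowledgeGraph:
    nodes: set
    edges: set  # set of frozenset({item_a, item_b}) pairs

def graph_metrics(kg: KnowledgeGraph) -> dict:
    """Simple metrics parameterising the graph at one epoch."""
    degree = {n: 0 for n in kg.nodes}
    for edge in kg.edges:
        for n in edge:
            degree[n] += 1
    return {
        "num_items": len(kg.nodes),
        "num_relations": len(kg.edges),
        "max_degree": max(degree.values(), default=0),
    }

def metric_deltas(epochs: list) -> list:
    """Changes to each metric between consecutive epochs; these delta
    vectors are what a trained predictor would consume."""
    series = [graph_metrics(kg) for kg in epochs]
    return [
        {k: curr[k] - prev[k] for k in curr}
        for prev, curr in zip(series, series[1:])
    ]
```

A supervised learner or HMM would then be trained on sequences of these delta vectors labelled with known impediments.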
  • a computer implemented method of training a supervised machine learning algorithm or a HMM algorithm for project data to predict a project impediment comprising: receiving project data for at least one project including at least one identified project impediment that occurred during that project; performing natural language processing to extract canonical items from the project data; processing the canonical items to form a knowledge graph indicative of relationships between items; determining as a training set at least one metric parameterising the knowledge graph or items within the knowledge graph and a change to the at least one metric over time together with a data label comprising the at least one identified project impediment that occurred during the project; and training the supervised machine learning algorithm or HMM algorithm to identify changes in the at least one metric over time indicative of at least one identified project impediment that occurred during that project.
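The training aspect above pairs per-epoch metric changes with a label recording the impediment that actually occurred during each project. A hypothetical sketch of assembling such a training set follows; the function and field names are illustrative, not from the patent.

```python
def build_training_set(projects):
    """Assemble (features, label) pairs for supervised training.

    `projects` is a list of (metric_delta_sequence, impediment_label)
    tuples, where each delta sequence is a list of per-epoch metric
    dicts and the label records whether the identified impediment
    (e.g. a delay) occurred. Names are illustrative assumptions.
    """
    X, y = [], []
    for deltas, label in projects:
        # Flatten each epoch's metric deltas into a fixed-order vector.
        keys = sorted(deltas[0]) if deltas else []
        X.append([d[k] for d in deltas for k in keys])
        y.append(label)
    return X, y
```

The resulting `X`, `y` pair is in the shape expected by most supervised learning libraries.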
  • a computer-readable storage medium having computer-readable program code stored therein that, in response to execution by a processor, causes the processor to perform the method of either of the above aspects of the present invention.
  • an apparatus comprising a processor and a memory storing executable instructions that, in response to execution by the processor, cause the apparatus to perform the method of either of the above aspects of the present invention.
  • examples of the present invention make use of machine learning or Artificial Intelligence (AI) to cut through the noise of project data to form predictions of potential impediments (also referred to as project issues) within a project while it remains possible to take preventative action.
  • Examples of the present invention also present project data to users in a structured format that allows both a simplified overview of current status and a deeper dive into possible issues. Examples of the present invention allow the user to drill down into the project data through specific queries such as “what’s the delivery date for the concrete?” or “what impact will the 4th floor reworks have on the schedule?”, and locate the answer no matter what data format the answer was present in, alongside related data irrespective of their original source.
  • Figure 1 is a schematic overview of a project data processing system in accordance with an example of the present invention.
  • Figure 2 is a first example of a user interface presenting a navigable knowledge map according to an example of the present invention.
  • Figure 3 is a second example of a user interface presenting a navigable knowledge map according to an example of the present invention.
  • Figure 4 is an example of a user interface presenting a settings menu according to an example of the present invention.
  • Figures 5A and 5B show an example of a user interface presenting search results according to an example of the present invention.
  • Figure 6 is an example of a user interface presenting a status feed according to an example of the present invention.
  • Figure 7 is an example of a user interface presenting an action list according to an example of the present invention.
  • Figure 8 is an example of a user interface presenting an action item dialog box according to an example of the present invention.
  • Figure 9 is an example of a user interface presenting a content manager according to an example of the present invention.
  • Figure 10 is an example of a user interface presenting a project team editing interface according to an example of the present invention.
  • Figure 11 is a flow chart illustrating a computer implemented project data processing method using a supervised machine learning algorithm according to an example of the present invention.
  • Figure 12 is a flow chart illustrating a first part of a machine learning pipeline according to an example of the present invention.
  • Figure 13 is a flow chart illustrating a second part of a machine learning pipeline according to an example of the present invention.
  • Figure 14 is a flow chart illustrating a third part of a machine learning pipeline according to an example of the present invention.
  • Figure 15 is a flow chart illustrating a fourth part of a machine learning pipeline according to an example of the present invention.
  • Figure 16 is a graphical representation of project delay prediction according to an example of the present invention.
  • Figure 17 is a graphical representation of a first example use of a Hidden Markov Model (HMM) to predict project delays according to an example of the present invention.
  • Figure 18 is a flow chart illustrating a computer implemented method of training a supervised machine learning algorithm for project data according to an example of the present invention.
  • Figure 19 is a schematic overview of a network according to an example of the present invention.
  • Figure 20 illustrates a computing apparatus according to an example of the present invention.
  • Figure 21 illustrates sentence dependency between tokens in an example sentence according to an example of the present invention.
  • Figure 22 is a graph showing correlations between parameters extracted according to an example of the present invention and delay predictions.
  • Figure 23 is a graph showing a precision-recall curve indicating the efficacy of an example of the present invention.
  • Figure 24 is a graphical representation of a second example use of a HMM to predict project delays according to an example of the present invention.
  • a project data processing method acts to extract insights from project data through the use of AI to review and process unsorted or unstructured project data in the format, for instance, of reports, documents and emails.
  • the volume of data is distilled down to the core pieces of information to allow users to make informed decisions.
  • Examples of the present invention allow the user to form queries to answer complex questions.
  • a project data processing method acts to bring together related information - no matter the filetype, storage location or tool used to generate, transfer or store that information.
  • examples of the present invention operate to build a knowledge graph of a project’s people, places and activities, automatically group connected items into topics, summarise topics by key sentences and the people involved, with links to the original source, automatically keep the user up to date with changes to the key project aspects.
  • the knowledge graph may be visually represented in the format of a navigable knowledge map allowing the user to drill down into the detail of selected topics.
  • the knowledge graph may further be used to predict future project impediments before they happen.
  • predictions of a project impediment for instance a project delay, a project quality issue or a project safety issue
  • suitable predictions that may be made will be particular to the type of project data being processed.
  • predictions that can be made may take the form of any quantitative prediction (such as the length of a project delay) or a binary classification (the presence or absence of a project delay).
  • the ability to form quantitative predictions or binary classifications from project data is limited only by the availability of a quantitative or binary classification dataset with which to train an AI algorithm to recognise salient features in the unstructured project data and resulting knowledge graph.
  • In figure 1, a system 100 for processing project data according to an example of the present invention is schematically presented. It will be appreciated that the schematic overview of figure 1 does not necessarily imply any physical or network structure. Rather, figure 1 should be considered as a logical representation of the key aspects of a system according to an example of the present invention.
  • Figure 1 shows a data ingestion section 102 and a map section 104.
  • at the heart of examples of the present invention is a process of receiving (or ingesting) project data, processing that project data and forming a knowledge graph. That knowledge graph, or a portion of the knowledge graph, may be visually represented in the form of a knowledge map 104.
  • a search section 106 allows a user to search for information by interacting with the knowledge map 104.
  • An actions section 108 generates user actions from the network of information within the knowledge graph.
  • a status feed section 110 provides a user with updates based upon information within the knowledge graph.
  • a predictions section 112 allows for predictions to be made based upon changes in information within the knowledge graph. The process of making predictions is described in greater detail in the description below in connection with figure 15. The predictions section 112 also feeds into the status feed section 110 to present those predictions to a user.
  • a user such as a project manager may interact with the system 100 of figure 1 through a graphical user interface on a suitable computing device.
  • a suitable computing device may be, for instance, a PC, a laptop or a mobile computing device such as a smartphone or tablet.
  • the graphical interface may be presented to the user through a conventional browser, for instance by the user navigating to and logging into a system website.
  • access to the system may be through a bespoke application.
  • Suitable techniques for permitting system access will be well known to the skilled person and the present invention is not limited to any particular technique, nor indeed to the following examples of suitable graphical user interfaces which may be presented to a user.
  • a system 100 ingests disparate sources of project data, processes that data to identify particular topics (for instance, based on people, places, activities, materials etc.) and forms connections between those topics to form a knowledge graph that may be schematically presented to the user as a knowledge map, at section 104 of figure 1.
  • the system 100 may be considered to begin with the data ingestion section 102.
  • the data ingestion section 102 operates continuously such that as new project data is identified it is drawn into the system and used to continuously update the knowledge graph.
  • the terms knowledge graph and knowledge map are often used loosely interchangeably elsewhere. However, in the context of examples of the present invention these terms are not interchangeable and should be precisely understood.
  • a knowledge graph comprises the underlying network of topics and relationships
  • a knowledge map is a visual representation of part or all of the knowledge graph.
  • Information in the map may be compressed, omitted or otherwise obfuscated in order to enhance clarity for the user relative to the underlying graph.
  • An analogy may be drawn with a tube/metro map whereby the knowledge graph represents the actual connections with all entities represented, akin to a civil engineering blueprint of the train tracks.
  • the knowledge map may filter out some entities considered irrelevant or noisy, potentially moving topics closer to each other based on some external knowledge, akin to the passenger tube map. The analogy holds in that the graph represents fact whereas the map enhances usability.
  • Data ingestion may comprise the appropriate identification and capture of a wide range of different sources of project data.
  • authorised system users may manually identify and provide project data to the system.
  • project data may be automatically captured and transferred to a system server.
  • this may be automatically transferred to the system without requiring user action.
  • this may be set up in advance through a user template identifying which emails or documents are pertinent to a project, for instance on the basis of key words contained within an email subject heading, sender and recipient information or content within the email.
  • a template or rule set may be established in advance so that only email data pertinent to a particular project is identified and ingested (as is expanded upon in the following description). This is also desirable to preserve the privacy of data unrelated to a project.
  • a further example of how email content may be provided to a project data processing system according to an example of the present invention comprises the use of a bcc (blind carbon copy) email address.
  • an instantiation of the present invention residing at https://app.hivemap.io may provide a project workspace email address such as myproject@app.hivemap.io.
  • a user can include this email address in the to, cc or bcc list of an email, in which case the project workspace will receive the email, along with any attachments, as a virtual user and process the data as it would a manual upload.
  • the data that is ingested may comprise substantially any form of text data, for instance any data that may be provided as an attachment to an email, such as stakeholder reports, daily site diaries, contractor and quality assurance (QA) reports, design outlines and daily communications (along with the content of the email itself).
  • data may be ingested in any known file format, for instance PDF, DOCX, XLSX, PPTX and MPP documents.
  • image data included within documents may also be processed.
  • messaging data may be captured, for instance WhatsApp (via Twilio) for text messages.
  • Project data may be obtained by taking advantage of readily available Application Programming Interface (API) tools for suitable sources of data.
  • API integration may allow access to emails (for instance through Microsoft Outlook or Google Mail) and cloud storage providers (for instance Microsoft OneDrive, Google Drive or Apple iCloud).
  • the user may create a rule set (also referred to as a template) whereby all emails with a.n.other@example.com in the to, cc or bcc list and the words “Project X” in the email subject heading are routed to Project X within the present invention.
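A rule set of the kind described above could, hypothetically, be evaluated as follows; the field names (`recipient`, `subject_keyword`) are illustrative assumptions rather than part of the disclosed system.

```python
import re

def matches_project_rule(email: dict, rule: dict) -> bool:
    """Return True if an incoming email should be routed to a project.

    `email` carries 'to', 'cc', 'bcc' lists and a 'subject' string;
    `rule` names a required recipient and a subject keyword, mirroring
    the "a.n.other@example.com + 'Project X'" rule described above.
    """
    recipients = (email.get("to", []) + email.get("cc", [])
                  + email.get("bcc", []))
    has_recipient = rule["recipient"].lower() in (r.lower() for r in recipients)
    has_keyword = re.search(re.escape(rule["subject_keyword"]),
                            email.get("subject", ""), re.IGNORECASE) is not None
    return has_recipient and has_keyword
```

Matching is case-insensitive so that minor variations in address or subject casing do not defeat the rule.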
  • API integration may allow access to Aconex (for site management reports), Primavera (for project plans), FieldView (for site reports) or a Common Data Environment (CDE, for construction project information).
  • speech obtained from an audio file or a video file may also be processed.
  • user phone calls may be received and processed by converting them to text, which is then processed in an analogous fashion to email content.
  • automatically generated data may also be captured and processed.
  • sensor data from a project site, for instance weather data, equipment tracker, maintenance tracker, concrete sensor, motion detector, biometric site access and ground settlement data.
  • Project data uploaded to or captured by the system is stored (as well as being processed) and may be accessed by authorised team members via a standard web browser.
  • a native mobile application may be used, applying an instant messaging style interface to encourage near real-time exchange of data between project members, while retaining records of who did, said or accessed which piece of data and when.
  • Figure 2 illustrates a first example of a navigable knowledge map 104 according to an example of the present invention.
  • the navigable map decreases the time required to get up to speed across multiple documents simultaneously.
  • the user interface is centred around summarising content in a visual manner via a navigable map.
  • nodes within the map 104 are either individual items being discussed in the project content (particularly either a subject or an object referenced in the project data, for which see the discussion below concerning subject-object-predicate triples) or topics: groups of strongly related items and/or sub-topics (a hierarchically lower topic contained within a higher level topic).
  • the description below beginning at figure 11 provides an explanation of how entities may be extracted from project data, subjected to natural language processing to form items, and grouped to form topics, but in brief items and topics are automatically identified and labelled using an AI or machine learning algorithm.
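To make the subject-object-predicate structure concrete, here is a deliberately toy extractor. A real implementation would derive triples from a dependency parse (figure 21 shows token dependencies); this sketch merely illustrates the triple data structure from which the knowledge graph is built, under the simplifying assumption that the entities are already known.

```python
def extract_triples(sentence: str, known_entities: set):
    """Toy subject-object-predicate extraction.

    Pairs the first two known entities in a sentence, taking the words
    between them as the predicate. Purely illustrative: real systems
    use dependency parsing, coreference resolution, etc.
    """
    words = sentence.rstrip(".").split()
    entity_positions = [i for i, w in enumerate(words) if w in known_entities]
    if len(entity_positions) < 2:
        return []
    s_idx, o_idx = entity_positions[0], entity_positions[1]
    predicate = " ".join(words[s_idx + 1:o_idx])
    return [(words[s_idx], predicate, words[o_idx])]
```

Each triple becomes an edge in the knowledge graph, with the subject and object as nodes and the predicate labelling their relationship.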
  • the items and topics are relatively sparse. It has been shown that humans can only manage approximately seven items (in this instance, to include topics) in memory simultaneously. As a consequence, the number of items or topics shown is purposefully limited, aggregating items and subtopics together so as to maximise the modularity clustering metric of the underlying knowledge graph.
  • a further advantage of this clustering is that, at the lowest level of the knowledge map and dependent on the underlying project data, some of the entities and their relationships may be noisy, so presenting the lowest level of the knowledge map directly may not be insightful for the user.
  • The manner in which this aggregation may be performed is presented in the Newman 2006 paper referenced and described later in the present disclosure.
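The modularity metric referred to above (Newman 2006) can be computed directly for a candidate grouping of items into topics; aggregation then amounts to searching for the grouping that maximises Q. This is a minimal sketch of the metric itself, not of the patent's aggregation procedure:

```python
def modularity(nodes, edges, community_of):
    """Newman modularity Q = sum over communities of (e_ii - a_i^2),
    where e_ii is the fraction of edges inside community i and a_i is
    the fraction of edge endpoints attached to community i. Grouping
    items into topics so as to maximise Q yields tight clusters."""
    m = len(edges)
    if m == 0:
        return 0.0
    communities = set(community_of[n] for n in nodes)
    q = 0.0
    for c in communities:
        internal = sum(1 for a, b in edges
                       if community_of[a] == c and community_of[b] == c)
        endpoints = sum((community_of[a] == c) + (community_of[b] == c)
                        for a, b in edges)
        q += internal / m - (endpoints / (2 * m)) ** 2
    return q
```

For two disconnected pairs, keeping each pair in its own community scores Q = 0.5, whereas merging everything into one community scores 0, which is why modularity maximisation favours the natural clusters.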
  • Items and topics extracted from the raw project data are provided with a hierarchical structure. Hence, as the user zooms in further, they discover more and more granular detail. That is, a user is able to double-click upon, or otherwise select, a node causing an expansion of that node opening up the hierarchical structure to reveal further items or sub-topics within that node.
  • FIG. 2 shows two types of node within the knowledge map 104, as explained by the key in the upper left part.
  • a high level cluster 200 is indicated by a solid circle.
  • a high level cluster 200 comprises a topic containing at least one hierarchically lower level of items or sub-topics within it, though of course it will be understood that the precise way in which nodes are presented is not germane to the present invention. For instance, it may be that all nodes are solid regardless of whether they possess children.
  • the size of a node may indicate or be dependent upon the number of items or subtopics within it.
  • a low level cluster 202 is indicated by a hashed circle in figure 2, though again this is only one example of its visual representation.
  • a low level cluster 202 comprises only a single item and it is not possible to drill down further. That is, a node which is a low level cluster 202 is a single item within the knowledge map.
  • a third visual element illustrated within the knowledge map 104 of figure 2 is a note 204 which a user is able to click upon to read.
  • a note relates to an action item as described below in connection with figures 7 and 8.
  • a user is able to interact with a node (whether a topic or an item) by adding that node to an action item, similar to a “to do” list entry.
  • notes are not shown but are accessible through the action item list.
  • the entities and relations depicted in the knowledge map may be limited to those defined by a schema or ontology which is predefined or developed for the underlying knowledge graph. Such a schema or ontology describes the sorts of data and relationships expected within each knowledge graph, and hence to be depicted in the knowledge map.
  • An example process for defining a schema is given in the disclosure below in connection with figure 13.
  • the entities depicted in the knowledge map may include people and/or documents associated with a given topic.
  • Figure 2 further shows that the clusters may include labels 206 that the user is able to selectively display and hide by toggling “labels” switch 208.
  • the labels 206 comprise the item or topic indicated by each cluster, or optionally for a high level cluster 200 the items contained within the cluster.
  • the labels may comprise the actual text of the underlying subject or object item represented by a cluster.
  • the label may be automatically generated using an AI or machine learning algorithm, for instance using the GPT2/3 or Google T5 algorithms which will be familiar to the skilled person.
  • Figure 2 further shows the system interface including a free text search box 210.
  • the user is able to enter a text query, such as “What caused the staircase glazing issue?” which causes the relevant portion of the knowledge map to be displayed.
  • the search function is further described in connection with figure 5.
  • At the top right of the knowledge map 104 is a calendar interface 212.
  • the calendar interface 212 allows the user to display the up to date knowledge map or to scroll back through the knowledge map as it would have appeared on any previous date.
  • a new knowledge graph and resulting knowledge map may be generated for each new project epoch (which may be hourly, daily, weekly or any other arbitrary period of time) such that the user is able to scroll back through the epochs and view the knowledge map as it would have been then.
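Epoch-based snapshots of this kind could hypothetically be stored and queried as below, so the calendar interface can show the map as it would have appeared on any past date; the class and method names are illustrative, not part of the disclosure.

```python
import bisect
from datetime import date

class EpochStore:
    """Keep one knowledge-graph snapshot per epoch. Snapshots are
    assumed to be recorded in chronological order."""
    def __init__(self):
        self._dates = []      # sorted epoch dates
        self._snapshots = []  # graph snapshot per epoch

    def record(self, epoch_date: date, snapshot):
        self._dates.append(epoch_date)
        self._snapshots.append(snapshot)

    def as_of(self, query_date: date):
        """Most recent snapshot on or before `query_date`
        (None if the project had not yet started)."""
        i = bisect.bisect_right(self._dates, query_date)
        return self._snapshots[i - 1] if i else None
```

Scrolling back through the calendar then reduces to calling `as_of` with successively earlier dates.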
  • project data ingested in between epochal knowledge map updates may be searchable but not represented in the map.
  • In figure 3 an alternative form of knowledge map is shown, offering similar functionality to that of figure 2. Where features corresponding to those of figure 2 are shown, the same reference numerals are used.
  • In the navigable knowledge map 104 of figure 3 there is no differentiation between high level clusters and low level clusters. Instead, three nodes 300 are shown, each of which represents a topic which has been automatically generated from the project data by the AI algorithm.
  • three topics or items are initially presented, though it will be appreciated that it could be more or fewer (and optionally the number restricted by grouping topics as sub-topics within a broader topic to keep the number manageable to the user, for instance 7 topics).
  • Figure 3 shows the topics “Approvals” (highlighted), “Permits” and “Pile Arrival”.
  • each node 300 may indicate the amount of information contained therein, which may be accessed by the user double-clicking on a node 300 to view hierarchically dependent items and sub-topics.
  • For a node 300 selected by the user, or preselected (for instance by preselecting the largest node 300), the right hand pane provides a navigable interface allowing the user to access key items of information within that node's topic.
  • the information displayed in the right hand pane is generated from the knowledge graph as is described below in connection with figure 14.
  • Box 302 indicates the selected node, in this case “Approvals”.
  • Option 304 “Inside this topic” if selected presents a list of the items and sub-topics contained within that node.
  • Option 306 “Key sentences” if selected presents a list of the sentences from the underlying project data relating to the items and sub-topics contained within that node.
  • Option 308 “People” if selected presents a list of the key people related to items and sub-topics contained within that node, for instance based on the senders or recipients of emails or other messages referring to those items and sub-topics.
  • Option 310 “Links” if selected presents a list of links relating to the items and sub-topics contained within that node as contained within the original project data, for instance a link to a copy of an original document or email.
  • Figure 3 differs from figure 2 in that the notes field and calendar interface are omitted, though it will be appreciated that in other examples features of the user interface from figures 2 and 3 may be present in any combination. It will be appreciated that the various examples of the knowledge map comprise essentially an automatically generated mind-map. The example presentations are indicative only.
  • Figure 4 illustrates a settings menu for controlling user access to a system according to an example of the present invention. It is expected that typically system users all work for a single company, and for that company there would be a single system administrator (or a small number of users with administrator rights). The administrator is able to apply company settings globally for selected users.
  • Figure 4 shows a list of names 400, email addresses 402 and an indication whether they are already an existing user 404 or a tick box 406 which if selected invites the user to the system. By selecting each user, the administrator enables that user to engage with a project data processing system in accordance with an example of the present invention.
  • FIG. 4 further shows a search box 412 allowing the administrator to search for and adjust the settings for a particular user.
  • Figure 5A shows the same portion of the navigable map 104 presented in figure 3, with the same nodes identified. It can be seen that a user has entered a search request “architectural decisions” through search box 210. For this example, it may be expected that each search result 500 comprises a topic for which an architectural decision has had an impact.
  • For each search result 500, by selecting the “view document” button 502 the user is able to view the document in the project data in which that search result arises.
  • each search result refers to a specific sentence in the original source project data that relates to the details in the search query.
  • In figure 5A the displayed knowledge map has no specific focus.
  • Figure 5B shows how the knowledge map changes when a specific search result is selected. Particularly, the result “74RS Proposal” is highlighted and the portion of the knowledge map 104 with items corresponding to that search result is displayed. In the example of figure 5B this comprises expanding the “Permits” topic node 300 of figure 5A to reveal the item “Council” 504 contained within it, and expanding the “Pile Arrival” topic node 300 of figure 5A to reveal the item “Steel UK” 506 within it. The relation between “Council” and “Steel UK” is indicated by line 508.
  • By selecting the search result, the user is shown the items in the map associated with the original search query, connected via lines alongside related topics and items, thus enabling the user to view the search result in context. That is, the portion of the knowledge graph represented by the displayed knowledge map changes according to the selected search result. Selecting such a result will navigate the user around the map accordingly. If the zoom level of the map is such that lower level detail of the search result is obfuscated, the parent node is marked with a dotted line and a number used to indicate how many relevant items or sub-topics are present in the layers below.
  • a user can search across all of these project data sources gathered by the system and presented in the format of the knowledge map 104 simultaneously with the details associated with their search results presented both in list form (search results 500) and in the navigable map 104.
  • a user can readily navigate from the search results back to the original project data source, such as a document by clicking on a link in the search results list.
  • the system is able to locate the nearest matches irrespective of whether the original project data or indeed the underlying knowledge graph contains identical terminology. For example, if a user searches for “delay”, the system will also return information containing words like “postpone”, “hold up” and “wait”, in a manner which will be familiar to the skilled person.
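The passage above leaves the matching technique open. One common approach, sketched below under assumed toy data, is to compare embedding vectors by cosine similarity; the vocabulary, vectors and threshold here are illustrative assumptions (in practice the vectors would come from a language model such as BERT), not taken from the original.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: semantically close terms get close vectors.
embeddings = {
    "delay":    [0.9, 0.8, 0.1],
    "postpone": [0.85, 0.75, 0.15],
    "wait":     [0.7, 0.9, 0.2],
    "concrete": [0.1, 0.2, 0.95],
}

def search(query, top_k=3, threshold=0.9):
    """Return near matches even where terminology is not identical."""
    q = embeddings[query]
    scored = [(term, cosine_similarity(q, v))
              for term, v in embeddings.items() if term != query]
    scored.sort(key=lambda t: t[1], reverse=True)
    return [term for term, score in scored[:top_k] if score > threshold]

print(search("delay"))  # ['postpone', 'wait']
```

A query for “delay” thus surfaces “postpone” and “wait” while excluding unrelated terms, matching the behaviour described above.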
  • a user can enter a search query to generate information by day, week, month or for a custom time frame.
  • an advanced search interface (not illustrated) may provide options for the user to scroll back through different knowledge graph epochs. This permits a user to scroll back through the project history to identify how an issue has evolved.
  • a user is further able to interact with the project data management system through a status feed.
  • An example status feed is shown in figure 6. Access to the status feed may be through a dashboard user interface for a particular project (not illustrated). The project dashboard may also allow the user to navigate instead to the knowledge map, a search interface or an administration or configuration interface such as that of figure 4.
  • the system is configured to analyse the knowledge graph to identify key changes arising from the underlying project data. For instance, by analysing the growth of topics through a series of metrics parameterising the knowledge graph at each epoch (described below at figure 15), highlighting recent relationships that have occurred and are known to cause specific problems, from a predetermined ruleset library generated by human experts, or based on prior user specific status feed interactions indicating a topic space of interest within the map or referring to prediction metrics.
  • This analysis may be performed substantially continuously, on demand as project data changes (for instance when a particular type of piece of project data is received) or periodically (for instance, once per day). Updating the status feed daily may be beneficial in that it could encourage users to build checking the status feed into their daily routine. Examples of the present invention analyse the knowledge graph on a daily basis to surface the most important changes in the project documentation.
  • Figure 6 shows a status feed including five status updates 600.
  • the status feed is not an essentially infinite scrollable list of status updates, as will be familiar to social media users. Rather it may be that the number of status updates is restricted to a number for which it is reasonable for the user to review, appreciate and take action. For instance, this may be restricted to no more than seven, as for the number of highest level nodes presented in the knowledge map, as described above in connection with figures 2 and 3.
  • the restricted number of status updates may be selected automatically. As one example, this may be by ranking a series of possible status updates based on the factors noted above and selecting a cut-off quantity of updates that may be set through user testing or be customisable.
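The automatic selection described above can be sketched as a rank-and-truncate step; the candidate updates, their scores and the cut-off value below are illustrative assumptions only.

```python
# Hypothetical candidate updates scored on the factors described above
# (topic growth, known-problem rules, user interest); scores are illustrative.
candidates = [
    {"text": "Riser line - at risk of delay", "score": 0.92},
    {"text": "Permits - hot topic",           "score": 0.81},
    {"text": "Steel delivery - on track",     "score": 0.12},
    {"text": "Council meeting scheduled",     "score": 0.55},
]

MAX_UPDATES = 7  # cut-off; may be set through user testing or be customisable

def select_status_updates(candidates, limit=MAX_UPDATES):
    """Rank candidates by score and keep a reviewable number."""
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    return ranked[:limit]

feed = select_status_updates(candidates, limit=3)
print([u["text"] for u in feed])
```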
  • the status updates may present the user with “must know” updates for a project or the projects for which a user is responsible and is making use of the system. Reviewing the status feed may therefore provide the user with a sense of project control - safe in the knowledge that they are aware of all the things that they need to be made aware of.
  • Figure 6 shows that for each status update there is a text explanation 602 automatically generated based on the underlying items and topics, for instance “Riser line - at risk of delay”. For each status update, if the user is responsible for more than one project, there may also be displayed the name of the project 604, for instance “Rougier St”. Each status update is also categorised according to its type, for instance on the basis of the factors used to generate status updates, as described above. Four examples are shown in figure 6, though of course for certain examples of the present invention there may be more or fewer. Each status update 600 is shown with an icon indicating the corresponding type.
  • a recent topic of discussion that has rapidly increased in volume (a “hot topic”), for instance changes on a specific topic from an email or messaging conversation, such as a WhatsApp message thread linked to a task description in the project schedule or Gantt chart, is indicated by a cloud icon 606.
  • a status update for a change in an area of interest (for instance an item or topic) previously selected by the user through the knowledge map and added to an action item as described below in connection with figures 7 and 8 is indicated by an eye icon 608.
  • a status update for a typical problem known to the system, for example loose joints caused by damp conditions, produces a “known problem” update indicated by an arrows icon 610, for instance if the system spots the item “loose joints” within project data.
  • One way in which “known problems” may be implemented is through expert user contributions via interviews.
  • a user feedback widget may be built in whereby possible rules for known problems may be presented to the user through a pop up window when the user logs in for the user to confirm, thereby expanding the known problem database.
  • Those possible rules may be proposed by analysing repeatedly found item/topic relationships and triples associated with project impediments such as delays.
  • figure 6 shows an example of a delay risk prediction status update, indicated by a warning triangle icon 612. The generation of delay risk predictions is described in greater detail below beginning at figure 11.
  • For each status update 600, a user can perform a number of actions using buttons associated with the update. Selecting “Go to map” button 614 presents the user with the navigable map 104, or a portion thereof, centred on the node relevant to the status update 600. Selecting “Send to” button 616 transfers the status update to the status feed for a colleague for them to take action as required and additionally or alternatively via email containing full details, and optionally a link to the knowledge map to enable the colleague to explore the issue for themselves. This mechanism serves to ensure that someone else is tasked with investigating the update.
  • the “Add to” button 618 adds the status update to the user’s action list, discussed below in connection with figure 7.
  • the “Dismiss” button 620 dismisses the status update and removes it from the feed.
  • an actions list an example of which is illustrated in figure 7.
  • the actions list 700 can serve to maximise the value of the insights generated automatically by the project data processing system and discovered by the user through exploring the knowledge map, search or the status feed.
  • the actions interface of figure 7 serves as a tool to manage downstream actions. Essentially, the actions interface acts as a to-do list.
  • Each action item 702 includes a name field 704, an indication 706 whether the action is watched (and should be used to assist populating the status feed), an indication 708 whether the action is a priority for that user, an indication 710 of the status, for instance whether the action is started, a deadline 712 set by the user, whether there are associated user notes 714 and a link 716 to the relevant portion of the navigable map 104.
  • Selecting an action item 702 opens an action item dialog box 800 as illustrated in figure 8.
  • the user is able to edit the name 704, whether the action is watched 706 (through a button that may be toggled), the priority 708 (from a drop down menu), the status 710 (through a drop down menu), the deadline 712 (through a calendar menu) and edit or add notes 714.
  • the action item dialog box 800 further allows users to add comments through comment box 802. If a comment is added then other users associated with that action item or topic may be alerted through a message. It will be appreciated that this may be through a messaging system of the type familiar to users of WhatsApp, Slack, Microsoft Teams and other similar messaging systems.
  • the messaging functionality may also add the action item to the recipient’s action item list. New messages may be marked next to the action item in the action item table, for instance with a solid bullet.
  • the action item dialog box 800 may also include a link 804 to the map, which navigates to the knowledge map with a view of all the related items and/or topics. Multiple map links may be provided, as indicated by the example of there being 20 available map links for that action item.
  • the action list of figure 7 may integrate with one or more Excel action lists to synchronise actions across both tools and augment the list or lists in Excel with the reminders and associated navigable map details.
  • the status feed of figure 6 may be provided in the right hand side of figure 7, facilitating adding of status updates to action items.
  • Figure 7 further illustrates a drop down menu 718 indicating the current workspace.
  • a workspace corresponds to a single project. That is, for a project data processing system according to an example of the present invention one workspace is assigned to each project. Accordingly, through selection of a workspace through menus 718, the user is able to view an action list specific to that project.
  • referring now to FIG. 9, there is illustrated a content manager user interface 900 for a given project 902, for instance a “Rougier St” project.
  • a shared workspace 904 (highlighted left-hand tab)
  • one or more team workspaces 906 (centre tab)
  • a private workspace 908 (right-hand tab)
  • the shared workspace is accessible to everyone within a defined team (editable for instance by the user interface of figure 4).
  • the shared workspace 904 will include all project data which is accessible to all users for that project.
  • Figure 9 shows for the shared workspace 904 a link 910 to the knowledge graph for that project and one or more folders 912 containing documents which are accessible to all project team members.
  • Folder structure is synched with the underlying cloud storage if used, and new folders can be created which are then reflected in the cloud based storage.
  • the documents that are accessible through the content manager are those that have been manually or automatically transferred to the project data processing system, as described before.
  • Each project may optionally have different teams, potentially of overlapping groups of users.
  • an “interiors team” has been selected.
  • a user is able to create or manage teams for a given project: particularly, a pop up menu 1000 is illustrated for managing the members of a team through the selection of tick boxes 1002 for available team members, and to rename the team 1004.
  • when the team workspace 906 is selected, only those documents that are accessible to the users within the defined team may be displayed.
  • each workspace - that is, each project - may have only one knowledge graph. That is, irrespective of whether a document is indicated as being shared publicly, available to a restricted team or private, it may be processed according to the following description and built in to a single knowledge graph which may be displayed as a single navigable knowledge map.
  • where a document, topic or item is known to exist within the system, but a current user does not have access to that information (for instance because the underlying document is private to another user or to a team that the current user is not part of), the document, topic or item may be obfuscated. For instance, it may be that a document will be hidden from the current user, or it may be visible but the current user is unable to access it. There may be an option for the user to request access to that information.
  • multiple knowledge graphs may be generated based only upon information available to a current user (that is, shared project data, data for teams of which the user is a member and information private to that user).
  • examples of the present invention provide a number of different optional ways for the user to interact with the system. For instance, it may be expected that a user will review a dashboard daily for each project in which they are involved. Through the status feed of figure 6 presented within the dashboard, the user may gain insight into the latest project activity, as reported by colleagues through the project data ingested into the system, or otherwise generated through the system. A user may then act upon the insights provided through the status feed, for instance by adding an action item directly from the status update, exploring a status update further within the knowledge map before manually creating an action item, assigning an action item to a colleague, or searching for related details based on their personal understanding before creating an action.
  • the dashboard may comprise the first point a user goes to for finding related details around a given issue, using the search feature or navigating the knowledge map to add clarification, or by referencing underlying project data and, if necessary, adding action items to follow up.
  • Certain examples of the present invention are intended to encourage a usage model following the Hooked model by Nir Eyal, described in the book “Hooked: How to Build Habit-Forming Products”, November 2014. Accordingly, certain examples of the present invention work through four key steps:
  • Trigger (based on a basic human need): the need for a user to know what’s happening in a project so as to manage stress levels and maintain control over project data.
  • Action in response to the trigger: the user is able to review the dashboard, explore the map or write search terms.
  • Reward (where the product serves the need that triggered the action): providing answers in context relevant to the user’s area of interest or aspects they control.
  • Investment (where the user puts something into the product, increasing the likelihood of returning): for instance, adding action items, notes or comments based on the insights provided.
  • Referring to FIG. 11, there will now be described in greater detail an underlying machine learning platform through which project data is received, processed and used to form a knowledge graph, the knowledge graph processed to discover or identify latent structures within the knowledge graph in the form of hierarchical topics, and used to make predictions.
  • a method of training the machine learning platform is also described.
  • the formation of the knowledge graph and the making of predictions underpin the previously described project insights available to the user through the knowledge map (which is a visual representation of the knowledge graph), the search facility and the status feed.
  • Figure 11 is a simplified form of the machine learning pipeline, which is expanded upon in the following figures.
  • a computer implemented project data processing method comprises the following steps. Firstly, at step 1100, project data is received. As has already been described, project data may be received in a wide range of different formats and from a range of different sources. But as one example, project data may be received in the format of an email sent to or received by a user who is associated with a certain project.
  • natural language processing is performed upon the received project data. The natural language processing serves firstly to homogenise the received project data, and secondly to extract subject-object-predicate triples from the project data in a format suitable for step 1104 at which the subject-object-predicate triples are processed to form a knowledge graph indicative of relationships between subject-object-predicate triples.
  • following step 1104, a great deal of insight and information is available to a user in accordance with certain examples of the invention, for instance through the generation of a navigable knowledge map presenting a visualisation of the knowledge graph, and the ability to construct searches and generate certain status updates based on the knowledge graph.
  • the method of figure 11 continues with the determination of at least one metric that serves to parameterise the knowledge graph. Those metrics may then be observed as the knowledge graph evolves over time. Particularly, those metrics may be computed for each knowledge graph epoch.
  • predictions may be made, using a trained machine learning algorithm and based on changes for at least one metric over time, for instance from one epoch to the next. Those predictions may relate to a project impediment, for instance one or more of a project delay, a project quality issue or a project safety issue. As previously described, predictions made from parameterised changes to the knowledge graph over time may be used to provide further insights to the user through status updates.
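The metric-and-delta steps above can be sketched as follows. The graph representation (adjacency sets per node), the choice of metrics and the epoch data are illustrative assumptions; the resulting deltas are the kind of feature vector a trained predictor would consume.

```python
# Two illustrative knowledge graph epochs, each as an adjacency set per node.
epochs = [
    {"Permits": {"Council"}, "Pile Arrival": {"Steel UK"}},
    {"Permits": {"Council", "Architect"},
     "Pile Arrival": {"Steel UK", "Haulage Co"},
     "Riser Line": {"M&E Team"}},
]

def graph_metrics(graph):
    """Simple metrics parameterising one knowledge graph epoch."""
    n_nodes = len(graph)
    n_edges = sum(len(neighbours) for neighbours in graph.values())
    return {"nodes": n_nodes, "edges": n_edges,
            "avg_degree": n_edges / n_nodes if n_nodes else 0.0}

def metric_deltas(prev, curr):
    """Change in each metric from one epoch to the next."""
    return {k: curr[k] - prev[k] for k in curr}

m0, m1 = graph_metrics(epochs[0]), graph_metrics(epochs[1])
deltas = metric_deltas(m0, m1)
print(deltas)  # the feature vector a trained model would consume
```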
  • a project data processing system may obtain project data from a variety of sources and in a variety of formats.
  • the system may integrate with one or more third party sources of data 1200 through associated third party APIs to ingest project data.
  • one such data source 1200 is exemplified as Microsoft Office 365.
  • the skilled person will of course be well versed in API integration. Examples of suitable APIs are described at the following URLs, though of course this should not be considered to be a definitive list:
  • Data sources 1200 feed into a step of auto ingestion 1202. Specifically, for each required source of project data, a user (or an administrator on behalf of a group of users associated with a project) may first configure access for the system to the data source, for instance through the interface of figure 4. For the example of Microsoft Office 365 integration, this may comprise a user connecting the system to their Office 365 account and pointing the system to one or more folders associated with the project at hand.
  • a user may suitably configure which emails are to be transferred to the system, for instance through a configuration template or set of rules identifying one or more of a sender or recipient address for emails, one or more keywords from a subject heading or other information pertinent to filtering out emails relating to the project at hand from unrelated emails.
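A rule set of the kind just described might be sketched as follows; the field names, addresses and keywords are hypothetical examples, not part of the original disclosure.

```python
# A minimal rule-based email filter; all rule values are illustrative.
PROJECT_RULES = {
    "senders": {"site-manager@example.com"},
    "recipients": {"rougier-st@example.com"},
    "subject_keywords": {"rougier", "riser", "permit"},
}

def matches_project(email, rules=PROJECT_RULES):
    """Return True if an email should be transferred to the system."""
    if email["sender"] in rules["senders"]:
        return True
    if rules["recipients"] & set(email["recipients"]):
        return True
    subject = email["subject"].lower()
    return any(kw in subject for kw in rules["subject_keywords"])

email = {
    "sender": "architect@example.com",
    "recipients": ["someone@example.com"],
    "subject": "Updated riser line drawings",
}
print(matches_project(email))  # True: subject keyword "riser"
```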
  • a user may manually upload project data to the system at step 1204. This may be a one-time operation, for instance at the start of a project (which may be considered to initialise the project data processing system for that project) or may occur periodically or on demand.
  • each uploaded piece of project data is assigned to a separate file.
  • for email or other messaging information, this may include a sub-step of separating out each attachment into a separate system file.
  • attachments separate from an email may be appended with information taken from the email, including one or more pieces of metadata associated with that email, for instance sender, recipient (which may implicitly be considered to be a reader of the attachment), time and date information, email subject or any other extractable piece of metadata.
  • a link between a file for the email and a file for an attachment may be preserved.
  • email content itself may be stored as a separate file at step 1206.
  • the email message content may be cleansed for instance by removing or simplifying signature blocks and salutations.
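The cleansing step above could be approximated with simple pattern matching; the patterns below are a minimal sketch (real signature detection is considerably more involved) and the example email body is invented for illustration.

```python
import re

# Illustrative patterns for salutations and sign-off blocks.
SALUTATION_RE = re.compile(r"^(hi|hello|dear)\b.*$", re.IGNORECASE | re.MULTILINE)
SIGNOFF_RE = re.compile(r"^(kind regards|regards|best wishes|thanks)[,.]?\s*$.*",
                        re.IGNORECASE | re.MULTILINE | re.DOTALL)

def cleanse_email_body(body):
    """Remove salutations and everything from the sign-off down."""
    body = SALUTATION_RE.sub("", body)
    body = SIGNOFF_RE.sub("", body)
    return body.strip()

raw = """Dear Joe,

The piles arrive on site Tuesday.

Kind regards,
Sam Smith
Site Manager"""
print(cleanse_email_body(raw))  # The piles arrive on site Tuesday.
```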
  • the original file content for the project data may be stored at this point for later access and retrieval by the user, for instance through the content manager of figure 9.
  • step 1102 of natural language processing will now be described in greater detail. This may be considered to comprise three sub-steps of parsing, natural language processing (NLP) modelling and natural language understanding (NLU).
  • Figure 13 begins with step 1300 which takes as its input the output from the new file step 1206 of figure 12.
  • the processing of figure 13 may be applied recursively for each new file.
  • Three alternative parsing steps are shown: parse email (1302, for an email .msg file), parse pdf (1304, for a portable document format .pdf file) and parse docx (1306, for a Microsoft Word .docx file). It will of course be appreciated that this is extensible to any number of different parsing steps according to the data formats of received project data.
  • Each parsing step comprises a process of analysing the text and layout within a document.
  • Blocks of text can be marked as a heading. It will be appreciated that particularly for a .docx file the headings are identified by document metadata.
  • Non-heading text may be linked to heading text so as to provide context to the entities extracted as part of the NLU process described below. For instance, regular reporting might identify project risks within a report under a section entitled “Project Risks”. The following entities and relations extracted from the non-heading text should therefore be “tagged” as “project risk” to assist in downstream machine learning processes.
  • These context links are considered as relationships for resulting items, alongside those extracted via the NLU process.
  • document metadata embedded within the document file format, is also parsed to extract the author data and, in the case of an email, the implied readers as indicated by the recipient list. Attachments sent via email have metadata implied via the email that sent the attachment. In some examples it may be that the split of email attachments into separate files occurs as part of parse email step 1302 and triggers branches to other new files, which then feed into file type decision step 1300.
  • image content contained within a document file may be subject to image recognition processing to identify the image content and convert to text. Recognised image content may also be linked to (or partially identified through) surrounding text content, for instance through a process of transfer learning as will be well understood by the skilled person.
  • each parsing step serves to homogenise the received file to a text file along with associated document metadata in a format suitable for processing at the next step.
  • Each parsed file is then processed at step 1308 by conversion to an NLP object. Specifically, the text content of each parsed file is modelled using a language model to identify and extract the following components:
  • a token may also be referred to as a 1-gram word, which may comprise either a single word, a punctuation mark or a number.
  • a token lemma may be found for each token that comprises a word.
  • the process of forming a token lemma is described in detail at the following URL: https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html
  • the process of forming a token lemma reduces the various inflectional forms of a word, or derivationally related forms of a word to a common base form. For instance, each of the tokens “am”, “are”, “is” are converted to the same token lemma “be” and each of the tokens “car”, “cars”, “car's”, “cars'” are converted to the same token lemma “car”.
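The reduction to a common base form can be illustrated with a toy lookup table built from the examples above; production systems would use a trained language model (such as the spaCy libraries referenced below) rather than a hand-built table.

```python
# Toy lookup-based lemmatiser; entries follow the examples in the text.
LEMMA_TABLE = {
    "am": "be", "are": "be", "is": "be",
    "cars": "car", "car's": "car", "cars'": "car",
}

def lemmatise(token):
    """Map inflectional forms to a common base form where known."""
    return LEMMA_TABLE.get(token.lower(), token.lower())

tokens = ["The", "cars", "are", "here"]
print([lemmatise(t) for t in tokens])  # ['the', 'car', 'be', 'here']
```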
  • word stemming may be applied instead of the formation of a lemma.
  • the determination of a part of speech (POS) comprises a process of grammatical tagging, which at a basic level may comprise the identification of words as nouns, verbs, adjectives, adverbs, etc.
  • a token sentence dependency is determined.
  • the process of determining a token sentence dependency is described in detail at the following URL: https://spacy.io/usage/linguistic-features#dependency-parse In brief, this comprises the identification of a type of clause or part of a sentence including the token lemma, or alternatively it may be considered to be how a word depends on other surrounding words.
  • the language modelling of a file content of step 1308 may be implemented using the Spacy software libraries described in detail at the following URL: https://spacy.io/ Additionally, tokens may also be modelled using Bidirectional Encoder Representations from Transformers (BERT) word embeddings, described in detail at the following URL: https://mccormickml.com/2019/05/14/BERT-word-embeddings-tutorial/
  • Table 1 shown below gives an example of a short sentence (“Joe Bloggs visited the project site today to check on progress.”) and how this is broken down into tokens, lemmas, POS tag, sentence dependency tag and dependency head according to the application of the Spacy libraries. Part-of-speech (POS) tags are described in full at the following URL: https://universaldependencies.org/docs/u/pos/ and provide a grammatical breakdown of the terms in a sentence.
  • the example shown in Table 1 makes use of PROPN: a proper noun, VERB: a verb, DET: a determiner, NOUN: a noun, PART: a particle, ADP: an adposition and PUNCT: punctuation.
  • Sentence dependency tags indicate the syntactic dependency between the tokens in the sentence. These are described in full at the following URL: https://spacy.io/models/en.
  • Table 1 makes use of compound: a multi word expression, nsubj: nominal subject, ROOT: sentence root, det: determiner, dobj: direct object, npadvmod: noun phrase as adverbial modifier, aux: auxiliary, advcl: adverbial clause modifier, prep: prepositional modifier, pobj: object of preposition, punct: punctuation.
  • Figure 21 also graphically illustrates the dependency head for each token in the sentence.
  • Each NLP object generated at step 1308 is then passed to step 1310 where a NLU process is applied.
  • the NLU process extracts entities discussed within the NLP object and the relationships between them in the form of subject-object-predicate triples. For instance, in the example sentence: “Joe plays football” the subject would be “Joe”, the object would be “football” and the predicate linking the subject to the object, typically a verb, in this case would be “play”.
  • the processing of step 1310 may be applied on a per sentence basis to identify subject-object-predicate triples. In further examples, this process may be applied across a whole paragraph or even a whole file.
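Per-sentence triple extraction of the kind described can be sketched from dependency annotations. The hand-annotated parse of the example sentence “Joe plays football” below stands in for the output of the language modelling step; the extraction rule (nominal subject and direct object sharing a head verb) is a deliberately naive illustration, not the HMTL process itself.

```python
# Hand-annotated dependency parse for "Joe plays football"; in the pipeline
# these annotations would come from the language model of step 1308.
parsed = [
    {"text": "Joe",      "dep": "nsubj", "head": "plays"},
    {"text": "plays",    "dep": "ROOT",  "head": "plays"},
    {"text": "football", "dep": "dobj",  "head": "plays"},
]

def extract_triples(tokens):
    """Naive subject-object-predicate extraction from one sentence."""
    triples = []
    for subj in (t for t in tokens if t["dep"] == "nsubj"):
        predicate = subj["head"]
        for obj in (t for t in tokens if t["dep"] == "dobj"
                    and t["head"] == predicate):
            triples.append((subj["text"], obj["text"], predicate))
    return triples

print(extract_triples(parsed))  # [('Joe', 'football', 'plays')]
```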
  • this process of subject-object-predicate triple extraction comprises two successive extraction processes. Firstly, sentences (particularly, sentence information extracted at step 1308) are fed through a Hierarchical Multitask Learning (HMTL) process that has been trained to extract Named Entities (NE), Entity Mention Detections (EMD), coreferences and relations.
  • the processing of step 1308 may be omitted and raw project data text can be fed through the HMTL process.
  • the optional processing of step 1308 provides additional sentence dependency details that enable relations between entities extracted via HMTL to be found should the HMTL process not identify a specific relationship (as is further described below).
  • a Named Entity may be broadly considered to be a proper noun, such as a place name.
  • An Entity Mention Detection may be broadly considered to be a sentence object referenced to a Named Entity.
  • a coreference may for instance comprise the use of the pronoun “it” in a sentence where “it” is separately defined.
  • a relation may belong to a predefined set, as described in the Automatic Content Extraction 05 (ACE05) dataset described in detail at the following URL: https://www.ldc.upenn.edu/collaborations/past-projects/ace
  • this comprises the identification of types of relationships between tokens, for instance social relationships including business, family, other. The process used is described by Sanh et al.
  • Tokens in a sentence, not extracted by the HMTL process, are analysed for their POS tags. Any noun terms are treated as potential entities and added to the list of entities extracted by the HMTL process.
  • the sentence dependency may also be analysed to extract tokens between any entities that have been extracted via either of the processes mentioned. These relationship tokens are used to describe the relationship between the entities and augment the more specific relations extracted by the HMTL process.
  • the entity and relation extraction process is improved by using existing knowledge bases (for instance, DBpedia available at the following URL: https://www.dbpedia.org/) to match extracted entities against.
  • project content is project specific and not all entities exist within publicly available knowledge bases.
  • systems described herein build knowledge bases to compare against, which will provide an improved match probability. Nevertheless, advantageously a schema or ontology may be developed describing the sorts of data and relationships expected within each knowledge graph.
  • a predefined schema defining an object with specific properties or relations would ensure that instances of that object inherit these properties or relations.
  • an object might be a “contractor”, which has the relation “receives payment”.
  • a subsequent object might be “groundworks contractor” which inherits from “contractor”. “Groundworks contractor” can therefore also “receive payment”.
  • the “groundworks contractor object” may also have the relationship “has digger”.
  • Knowledge graph updates centred around a “digger” entity might therefore be associated with this specific instance of the “groundworks contractor” despite no explicit reference being made, due to the implied association gained from the predefined schema.
  • the development of a project schema can improve the making of predictions from the project knowledge graph.
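The inheritance behaviour in the contractor example above can be sketched directly; the class names and relation strings follow the example, while the representation (relations as string sets on classes) is an illustrative assumption rather than a prescribed schema format.

```python
# Illustrative schema fragment following the contractor example above.
class Contractor:
    relations = {"receives payment"}

class GroundworksContractor(Contractor):
    # Inherits "receives payment" and adds its own relation.
    relations = Contractor.relations | {"has digger"}

def can_relate(obj_type, relation):
    """Check whether a schema object supports a given relation."""
    return relation in obj_type.relations

print(can_relate(GroundworksContractor, "receives payment"))  # True (inherited)
print(can_relate(GroundworksContractor, "has digger"))        # True
print(can_relate(Contractor, "has digger"))                   # False
```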
  • Table 2 shown below provides an explanation in the form of a table of relationships extracted from an input short piece of text following the processing at steps 1308 and 1310. For the sentence, “Joe Bloggs is the project manager on the tower”:
  • NLP conversion process of step 1308 and the NLU entity extraction process of step 1310 are only one example.
  • the present invention is not limited to any particular NLP or NLU processing techniques. Rather, in its broadest sense, examples of the present invention serve to take in as their input text extracted from a project data source and process that text to extract entity information, for instance in the form of subject-object-predicate triples.
  • the description given above uses a methodology based on HMTL and POS.
  • In parallel to step 1310, step 1312 also receives each sentence generated at step 1308 and uses this to generate an index for the purposes of project data searching (using ElasticSearch: an open source search engine and supporting software libraries).
  • the term “entity” is used generally to refer to the raw material (words, punctuation, numbers etc.) obtained from raw project data. Those entities are then processed as described above. There now follows an explanation of how those entities are converted to items which comprise canonical entities where plurality, spelling mistakes and punctuation marks are resolved to a single object. Those items are then processed to identify relationships between items and grouped to form topics to generate the knowledge graph.
  • entity may instead be referred to as a “mention” in originating data and an “item” may map to the term “entity”. This different terminology is noted but not followed herein.
  • step 1104 of forming a knowledge graph indicative of relationships between items, that is between subjects and objects from subject-object-predicate triples, will now be described in greater detail.
  • the output of step 1310 comprises the input to steps 1400 and 1402 of figure 14.
  • the process of entity extraction at step 1310 may result in the extraction of a number of similar entities that appear initially different due to punctuation, plurality and/or spelling mistakes.
  • the output of step 1310 comprises a large amount of entity information and links between entities. This output may be considered to be noisy: important information may be missed within that noise.
  • Step 1402 comprises a process of filtering and grouping entities. Entities are merged to form canonical items, used in downstream processes.
  • a Minhash/LSH method is used to find similar entities, as described by Leskovec, Rajaraman and Ullman, “Mining of Massive Datasets”, 2020. This method provides small collections of entities that have a high probability of matching. Further analysis is carried out to extract exact matches using cosine similarity of the BERT embedding vector extracted in the NLP language modelling process described in connection with step 1308 of figure 13, and Jaccard similarity of shingled sets (using 3 characters) as described at the following URL: https://www.cs.utah.edu/~jeffp/teaching/cs5955/L4-Jaccard+Shingle.pdf
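The two similarity measures mentioned above can be sketched as follows. This is a minimal pure-Python illustration with hypothetical entity strings and embedding vectors, not the system's actual implementation:

```python
import math

def shingles(text, k=3):
    """Break a string into the set of its k-character shingles."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def jaccard(a, b, k=3):
    """Jaccard similarity of the k-shingle sets of two strings."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def cosine(u, v):
    """Cosine similarity of two embedding vectors (e.g. BERT embeddings)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)

# Entities that differ only in plurality score highly and can be merged:
print(jaccard("steel pile", "steel piles"))   # high, about 0.89
print(jaccard("steel pile", "concrete bed"))  # no shared shingles: 0.0
```

In a full pipeline, MinHash/LSH would first narrow the candidate pairs, with functions like these applied only within each candidate bucket.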
  • a graph or network of related canonical items can be formed at step 1403 with edges weighted by the number of times a relation between canonical items occurs.
  • the graph can be made more specific by choosing only relations from specific relation groups. It will be appreciated that subject-object-predicate triples are determined between canonical items in the same way as triples are derivable from the raw entities. These canonical triples form complex networks of interaction with relations between canonical items (canonical relations).
  • the resulting graph or network is referred to herein as a knowledge graph, which can be analysed for insights to help project managers assess risk and complex site activity.
  • a PageRank score may be calculated, which provides an indication of how important/centrally connected the entity is.
  • the PageRank score uses the algorithm developed by Google for ranking web pages within search results, and is described in more detail at the following URL: https://en.wikipedia.org/wiki/PageRank
  • In-degree is a metric indicating how many incoming connections an entity has.
  • Out-degree is a metric indicating how many outgoing connections an entity has.
  • a TFIDF (term frequency-inverse document frequency) score is a metric indicating how much information is represented by an entity.
  • These metrics can be used 1402 to filter out items that do not represent enough importance, connectivity or information for downstream analysis. In filtering out items, potentially noisy data is removed thus enabling downstream algorithms to more readily observe signals associated with a particular prediction.
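The degree and PageRank metrics used for this filtering can be illustrated as follows. The triples, function names and filtering threshold are hypothetical, and a production system would more likely use a graph library:

```python
from collections import defaultdict

# Hypothetical subject-object-predicate triples between canonical items.
triples = [
    ("digger", "operated by", "groundwork contractor"),
    ("steel piles", "manufactured by", "Pile Company X"),
    ("steel piles", "unloaded onto", "concrete bed"),
    ("groundwork contractor", "installs", "steel piles"),
]

def degree_metrics(triples):
    """In-degree counts appearances as object, out-degree as subject."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for subj, _pred, obj in triples:
        out_deg[subj] += 1
        in_deg[obj] += 1
    return in_deg, out_deg

def pagerank(triples, damping=0.85, iterations=50):
    """Simple power-iteration PageRank over the item graph."""
    nodes = {n for s, _p, o in triples for n in (s, o)}
    out_links = defaultdict(list)
    for s, _p, o in triples:
        out_links[s].append(o)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for s, targets in out_links.items():
            share = damping * rank[s] / len(targets)
            for t in targets:
                new[t] += share
        # Redistribute rank of dangling nodes (no outgoing links) uniformly.
        dangling = sum(rank[n] for n in nodes if n not in out_links)
        for n in nodes:
            new[n] += damping * dangling / len(nodes)
        rank = new
    return rank

in_deg, out_deg = degree_metrics(triples)
ranks = pagerank(triples)
# Filter out items below an importance threshold before downstream analysis.
important = {n for n in ranks if ranks[n] > 1.0 / (2 * len(ranks))}
```

The threshold here is illustrative only; as described above, the strength of the filter would be adapted to the level of detail the user is exploring.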
  • Adaptive filtering of the knowledge graph is an important tool for maximizing utility for the user. For example, when exploring the map at a top level, a user probably doesn’t want to be overloaded with all the details and a heavy filter would be useful. Conversely, when reviewing search results at a low level of detail, a weak filter would be useful so as to ensure no specific detail related to the search is missed. As such, the filtering process forms an important part of ensuring downstream capability making meaningful predictions, in addition to helping the user manage information overload through the use of the knowledge map.
  • the knowledge map might represent “steel piles” as an item with the relation “manufactured by” the item “Pile Company X”. This relation is referred to repeatedly and so is not removed using a weak filter.
  • the piles arrive they are unloaded onto a concrete bed, which is referred to once in the delivery report only.
  • the relation to the concrete bed is seemingly inconsequential and removed from the map. Days later the concrete bed shows signs of cracking caused by the steel piles being unloaded before the concrete had effectively cured.
  • a search for concrete cracking shows all the items related to the concrete bed, including the steel piles, helping the user find specific answers and address the cause for the cracked concrete.
  • the layout of a knowledge map representing the knowledge graph may be established.
  • this comprises identifying those entities that interact frequently in order to form topics.
  • the number of topics to be presented to the user as nodes within the knowledge graph may be restricted to a practicable level, for instance seven as noted above, by merging topics together.
  • a decision as to which topics to merge together is based on minimising the number of merges whilst maximising the split modularity metric 1404.
  • the merging of topics may be performed by maximising cosine similarity between topic BERT embeddings using the techniques described above.
  • the knowledge map itself comprises a fractal topic structure that the user is able to navigate by zooming into information of interest.
  • the knowledge graph of entities can be structured into groups of entities that are strongly related. This process is very similar to the process of finding communities in social network analysis.
  • a system according to an example of the present invention analyses the knowledge graph to find cuts of the graph that optimise the modularity metric, as described in Newman, “Modularity and community structure in networks”, PNAS, 2006, as available at the following URL: https://www.pnas.org/content/pnas/103/23/8577.full.pdf. This metric has been shown to coincide well with human perception of communities in a graph.
  • a system may use the Louvain method described by Blondel et al., “Fast unfolding of communities in large networks”, 2008, available at the following URL: https://arxiv.org/pdf/0803.0476.pdf to find communities, since it has been found to be a fast method and provides a hierarchical structure of the graph (with communities within communities).
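The modularity metric optimised by such community-finding methods can be sketched as follows; the example graph and partition are hypothetical:

```python
def modularity(edges, communities):
    """Newman modularity Q for an undirected graph.

    edges: list of (u, v) pairs (an edge may repeat to encode weight).
    communities: dict mapping node -> community label.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Fraction of edges inside each community...
    q = 0.0
    for u, v in edges:
        if communities[u] == communities[v]:
            q += 1.0 / m
    # ...minus the fraction expected if edges were placed at random.
    for c in set(communities.values()):
        deg_sum = sum(d for n, d in degree.items() if communities[n] == c)
        q -= (deg_sum / (2.0 * m)) ** 2
    return q

# Two tightly knit clusters joined by a single edge score a clearly positive Q:
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"), ("c", "x")]
parts = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
print(round(modularity(edges, parts), 3))  # prints 0.357
```

Methods such as Louvain search over candidate partitions to maximise this Q value rather than evaluating it for a single fixed partition.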
  • a knowledge graph of items does not present communities. Rather, groups of strongly related items form a topic and a hierarchical set of sub topics.
  • the process of building topics presented in the description above of figure 14 is based upon extracting communities of densely related entities (in a process analogous to how social networks build communities of related people).
  • this is based on a presumption that each entity exists only in a single topic. In some situations this may be unrealistic or unduly limiting.
  • the topic modelling process can operate at the sentence level whereby a whole sentence may only be associated with a single topic but entities within a sentence can exist in multiple topics.
  • the metrics used to filter items described above in connection with steps 1400 to 1408 may also be used to rank items in topics. Furthermore, by summing these metrics across entities in a sentence, it is also possible to easily rank sentences by their importance in a given topic. These ranked items and sentences may be used to describe topics and appear in a popup window at the side of the knowledge map, as shown for instance in the right hand side of figure 3 for a selected node “Approvals”, particularly at 304 and 306.
  • at step 1410, topics in the knowledge map are labelled automatically using the Google T5 algorithm as described at the following URL: https://arxiv.org/abs/1910.10683
  • the automatically generated label is used to provide an indication to the user of the content of a node in the knowledge map.
  • Step 1412 provides a list of items in the topic, ranked by a weighted average of TFIDF and PageRank scores.
  • Step 1414 provides a list of sentences that best summarise the topic content as defined by a weighted average of the sum TFIDF and PageRank scores of the items making up the sentence.
  • Step 1416 provides a list of people associated with the topic, based on the number of sentences they have contributed to the topic divided by the number of sentences they have contributed to all topics.
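The ranking at steps 1412 and 1414 can be illustrated with the following sketch; the item scores, weights and sentences are hypothetical placeholders:

```python
# Hypothetical per-item metric scores for a single topic.
tfidf = {"approvals": 0.9, "drawings": 0.7, "client": 0.3}
pagerank = {"approvals": 0.5, "drawings": 0.2, "client": 0.3}

def item_score(item, w_tfidf=0.5, w_pagerank=0.5):
    """Weighted average of TFIDF and PageRank, as used at step 1412."""
    return w_tfidf * tfidf[item] + w_pagerank * pagerank[item]

def sentence_score(items, **weights):
    """A sentence is scored by summing the scores of its items (step 1414)."""
    return sum(item_score(i, **weights) for i in items)

# Items ranked for display alongside the knowledge map.
ranked_items = sorted(tfidf, key=item_score, reverse=True)

# Sentences mapped to the items they mention, then ranked by summed score.
sentences = {
    "Drawings sent to the client for approval.": ["drawings", "client"],
    "Approvals remain outstanding.": ["approvals"],
}
ranked_sentences = sorted(sentences,
                          key=lambda s: sentence_score(sentences[s]),
                          reverse=True)
```

The equal weighting shown is an assumption; in practice the weights would be tuned for the project corpus at hand.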
  • Steps 1500 and 1504 of figure 15 take as their inputs the outputs of steps 1400 and 1404 of figure 14.
  • An example of a prediction may be understood as follows: for a given project, a weekly report document may be prepared and may be ingested into the project data system and processed as described above.
  • the weekly report may include a text field headed “areas of concern”. This comprises unstructured data consisting of people’s concerns. A skilled project manager may be able to pick from this those areas of concern likely to give rise to a project delay.
  • experiments have shown that examples of the present invention are able to consistently outperform project managers in predicting delays to a construction project based on this information, as will now be explained.
  • Content on a project typically provides regular reporting and communication that provides: details of concerns on a project, for instance leading indicators for issues, and post mortem analyses, as trailing indicators of issues. Items and relations extracted from these documents form a knowledge graph that evolves over time; particularly in some examples from one epochal knowledge graph update to the next. For instance, new topics evolve, old topics can shrink and grow, items in the map change their importance/centrality (as measured using the PageRank algorithm), and the number of cross topic relationships can change.
  • the graph may be parameterised and changes to those metrics observed over time.
  • One or more of the following graph metrics may be used, or others as will be apparent to the skilled person:
  • Modularity metric for topics in the knowledge graph which indicates a measure of the likelihood of the items existing within a cut of the graph that leads to this specific topic (i.e.: how good is this cluster from an expectation perspective). For instance, the median or maximum modularity metric may be calculated.
  • in-degree is defined as the number of incoming relations connected to an item, i.e. the number of relations that refer to this item as an ‘object’ in subject-object-predicate triples.
  • out-degree is defined as the number of outgoing relations connected to an item, i.e. the number of relations that refer to an item as a ‘subject’ in subject-object-predicate triples.
  • TFIDF score of an item where the score indicates how much information is represented by this item by virtue of how many times this term appears in a given topic or project against how many times this item appears across all topics, documents or projects.
  • Topic identifier or the topic that the item resides within.
  • Topics can be parameterised using:
  • Items known to be associated with a delay can also be parameterised, for example with the number of times this item has been found to be associated with a delay.
  • evolution of the knowledge graph over time, and particularly changes to knowledge graph metrics (including item and topic metrics) over time may permit a prediction to be made.
  • a time window can be defined over which to analyse changes, for example, one week, which may be termed the analysis epoch.
  • One or more of the metrics outlined above may be computed for all epochs up to the current epoch, and the current epoch. These metrics thus quantify the details in the documents.
  • a project manager tends to focus predominantly on:
  • one or more target variables quantifying one or more of the above areas of focus for a project manager may be defined.
  • Target variables may be associated with projects, topics and/or items. For example, for a project delay the target variable might be the number of tasks experiencing a new delay.
  • Each target variable may be predicted using one or more of the above knowledge graph metrics, using a supervised machine learning algorithm.
  • a supervised machine learning algorithm operates to map an input (or a series of inputs, for instance one or more of the above described knowledge graph metrics) to an output (for instance, an indication of possible project delay) based on example input-output pairs.
  • a supervised machine learning algorithm infers a function from labelled training data consisting of a set of training examples, as will be explained in greater detail below in connection with figure 18.
  • each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal).
  • a supervised learning algorithm analyses the training data and produces an inferred function, which can be used for mapping new examples.
  • the simple predictions step 1502 operate based upon a previously trained machine learning algorithm to map new examples from new project data, particularly to map new examples corresponding to a project impediment, for instance to one or more of a project delay, a project quality issue or a project safety issue.
  • each new project processed by the system further trains the machine learning algorithm through feedback indicating whether predicted project impediments are correctly identified or have been missed by the system.
  • Such a feedback loop may be implemented by users integrating the present invention with their scheduling software whereby changes to the schedule as a consequence of project impediments can be monitored.
  • a user creating an action after reading a predicted issue, for example via the status feed, can be interpreted as soft confirmation of a correct prediction.
  • a harder confirmation can be achieved by presenting a dialog box in the user interface asking the user to confirm the prediction.
  • One suitable supervised algorithm that has produced good results for predicting delays for certain examples of the present invention is the Support Vector Machine (SVM), which will be familiar to the skilled person and is described in greater detail at the following URL: https://link.springer.com/article/10.1007/s10115-007-0114-2
  • a second suitable supervised algorithm for some examples of the present invention is an XGBoost algorithm as described at the following URL: https://arxiv.org/abs/1603.02754.
  • Particular examples of the present invention make use of an SVM using a tuned radial basis function (RBF)/Gaussian kernel.
  • Such a kernel enables the data to be mapped such that it can be linearly separated into two categories (e.g. delay vs. no delay).
  • Such a mapping is tuned by adjusting the range of influence each data point has in the mapping (γ).
  • a regularisation parameter (C) can be used to decrease the potential for overfitting to the training data. The skilled person will be familiar with tuning such an algorithm using these, or similar, parameters.
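The RBF/Gaussian kernel and the role of γ can be illustrated as follows. This is a minimal sketch of the kernel function itself, not of the full SVM training described above:

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian/RBF kernel: K(x, z) = exp(-gamma * ||x - z||^2).

    gamma controls the range of influence of each data point: a
    larger gamma makes influence fall off more quickly with distance.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Identical points always have similarity 1, regardless of gamma:
print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # 1.0
# Decreasing gamma widens each point's range of influence:
print(rbf_kernel([0.0], [2.0], gamma=0.1) > rbf_kernel([0.0], [2.0], gamma=1.0))  # True
```

In an SVM this kernel replaces the inner product between feature vectors, and γ is tuned alongside the regularisation parameter C, typically by cross-validation.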
  • each frame represents a new knowledge graph epoch, for instance a week in the project (with the earliest frame at the top left and advancing forward week by week left to right and then down the rows).
  • Each frame has two parts: in the upper portion a part of the knowledge graph for that project is shown.
  • Circular nodes show items associated with regular weekly reporting of concerns on the project.
  • Cross nodes are used to indicate items associated with delays explicitly reported within project data. Old nodes fade out progressively for each subsequent epoch (indicated by the use of grey tones).
  • each frame comprises a line plot where the X axis is time in project epochs (weeks) thus showing a section of a full time line for the project at hand.
  • the line plot indicates a binary project delay prediction generated by a system according to an example of the present invention. Where the line goes high this indicates that a delay has been predicted. Where the line goes low this indicates that no delay is predicted. The right hand end of the line (whether it is currently high or low) thus indicates whether at the current time (the current epoch) the system predicts project delay. According to certain examples of the present invention, the line always goes high when new cross nodes appear (100% recall).
  • an example of the present invention has been found to incorrectly predict delays (false positives for delay prediction), that is the line goes high incorrectly when no new cross nodes appear, approximately 66% of the time (33% precision). It may be expected that the precision will improve further as the number of projects analysed, and thus the size of the training corpus for the machine learning algorithm, expands.
  • the simple predictions previously described at step 1502 may in some examples predict only that a delay may occur. While predicting a delay is useful, being able to pinpoint what topic is the cause of a chain reaction can provide added value and help users to implement more effective mitigating actions to avoid the delay entirely.
  • Step 1504 comprises a step of schema alignment, and takes as its input the knowledge graph data output from steps 1400 and 1404 (as does step 1500). Aligning a project’s knowledge graph with a predefined schema/ontology enables the production of a set of Hidden Markov Models (HMM) which may be used at step 1506 to help predict project impediments such as delays more accurately.
  • HMM model predictions output by step 1506 may, in accordance with certain example of the present invention, be combined with the simple predictions described above in connection with step 1502 so as to minimise processing overhead (for instance, a HMM model prediction at step 1506 may only be run when a project impediment has been predicted at step 1502).
  • an alternative ontology might include an object of type “drawings”. “Drawings” have a known relation “requires approval”. Another object might be “client” with known relation “provides approval”.
  • a project document may include a sentence such as “drawings have been sent to the client for approval”, which fits with the predefined ontology (our expectation) and therefore has a high confidence of having been observed.
  • a supervised machine learning algorithm may be limited by the causal nature of the observations. As an example, for the case of predicting project delays, an action is more likely to occur as a consequence of a reported problem and thus avert a delay. However, the only indication of a potential delay is that a problem has been reported. This causality limits the performance of a supervised algorithm. In contrast, a HMM partly resolves this causality problem.
  • a HMM assumes a limited number of hidden states for which there may be some dependent observations. For instance, in a different context, one might consider the weather outside to be one of a series of hidden states and the clothes worn by people entering a building may comprise observations.
  • a HMM consists of a series of known states, a series of observations, transition probabilities between the states and emission probabilities indicating the observation probability given the current state.
  • known states could be related to a specific project task.
  • the task state might for instance be one of:
  • Observations may be a series of events previously defined and specific to that type of project, according to the schema defined at step 1504, for instance:
  • Such observation events may be recorded in the knowledge graph and prior knowledge of the project schema enables the calculation of the most likely sequence of events for the task state using the Viterbi algorithm, which is a dynamic programming algorithm for obtaining a maximum a posteriori probability estimate of the most likely sequence of hidden states - called the Viterbi path - that results in a sequence of observed events.
  • the Viterbi algorithm will be well known to the skilled person, and is described in further detail by Rabiner 1989 as available at the following URL: https://ieeexplore.ieee.org/document/18626
  • A first example of a HMM is shown in figure 17, demonstrating the use of events recorded in a knowledge graph to predict delays on a task. Three events are defined, corresponding to the example given in the paragraphs above.
  • Event A comprises a delay in obtaining site permits.
  • Event B comprises client approval of a set of drawings.
  • Event C comprises steel piles being unloaded early.
  • Three task states are shown: trail, stable and advance.
  • a HMM allows the calculation of a probability for the state of each event changing. For instance, it might be known from previous data, that if drawings are sent to the client for approval and not returned within one week, then the probability of tasks associated with these drawings experiencing a delay increases. Hence, if Event B is observed in the absence of an event indicating the client approval return, then such an algorithm would increase delay probability.
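The Viterbi decoding of such a HMM can be sketched as follows. The states mirror those of figure 17, but the start, transition and emission probabilities are hypothetical placeholders rather than values learned from project data:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a sequence of observations."""
    # paths[s] = (probability, best state sequence ending in s)
    paths = {s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}
    for obs in observations[1:]:
        new_paths = {}
        for s in states:
            # Best predecessor state for reaching s while emitting obs.
            prob, prev = max(
                (paths[p][0] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
            new_paths[s] = (prob, paths[prev][1] + [s])
        paths = new_paths
    return max(paths.values())[1]

# Hidden task states, with events A (permit delay), B (drawings sent for
# approval) and C (piles unloaded early) as the observable events.
states = ["trail", "stable", "advance"]
start_p = {"trail": 0.3, "stable": 0.5, "advance": 0.2}
trans_p = {
    "trail":   {"trail": 0.6, "stable": 0.3, "advance": 0.1},
    "stable":  {"trail": 0.25, "stable": 0.6, "advance": 0.15},
    "advance": {"trail": 0.1, "stable": 0.3, "advance": 0.6},
}
emit_p = {
    "trail":   {"A": 0.6, "B": 0.3, "C": 0.1},
    "stable":  {"A": 0.2, "B": 0.6, "C": 0.2},
    "advance": {"A": 0.1, "B": 0.3, "C": 0.6},
}

# Two approval events followed by a permit delay flips the task to "trail".
path = viterbi(["B", "B", "A"], states, start_p, trans_p, emit_p)
```

With the probabilities above, the decoded path is `["stable", "stable", "trail"]`: the permit-delay observation makes the trailing state the most likely explanation of the final epoch.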
  • each task in a project plan may be in one of four states: No problem, Problem, Action or Delayed.
  • Each project plan may typically comprise a large number of discrete tasks, for instance 10,000 or more for a large construction project.
  • the observations used by the HMM algorithm to determine a task state are built from the knowledge graph as previously described, and the generation of the HMM algorithm follows the same learning process, such as the use of the Viterbi algorithm, as previously described.
  • HMM states presented in figures 17 and 24 are merely examples and that for any given implementation of the present invention, if a HMM algorithm is used, the states selected will be those required in order to present the user with the desired insights formed from the underlying project data.
  • a blocking algorithm may be defined so as to limit the area of the knowledge graph over which to run the Viterbi algorithm.
  • the concept of a blocking algorithm may be known to the skilled person from its use in record linkage problems and is described by Bilenko et al., “Adaptive Blocking: Learning to Scale Up Record Linkage” in Proceedings of the Sixth IEEE International Conference on Data Mining (ICDM-06), pp. 87-96, Hong Kong, December 2006.
  • Such a blocking algorithm may in some examples of the present invention be used to limit the task state prediction to events related only to entities in the task description.
  • a task description for the example of a construction project, is typically provided in work breakdown structure schedule management documentation.
  • a task description may be ingested into a system according to an example of the present invention through integration with a third party application, for instance Microsoft Onedrive, Microsoft Project (MPP files) and Primavera software.
  • Items within a task description indicate those items that are closely related to the task and its successful completion.
  • figure 18 describes a process of training such a machine learning algorithm, for instance an SVM, based upon a set of training data.
  • the method illustrated in figure 18 should be viewed as the counterpart to the method of using a supervised machine learning algorithm to make predictions from project data described above beginning at figure 11.
  • the training method may be recursive in the sense that use of a (partially) trained algorithm for live project data helps to further train and build the model in order to improve the ability of the system to make predictions for future project data.
  • a computer-implemented project data processing training method comprises the following steps. Firstly, at step 1800, project data for at least one project is received. Step 1800 corresponds generally to step 1100 of figure 11, with the exception that the received data further includes at least one identified project impediment, for instance a project delay, project quality issue or project safety issue that occurred during that project. At step 1802 natural language processing is performed upon the received project data. This may correspond to the processing of step 1102 of figure 11, previously described in detail, and comprises the extraction of subject-object-predicate triples. As for step 1104 of figure 11, at step 1804 the subject-object-predicate triples are processed to form a knowledge graph.
  • At step 1806 at least one metric parameterising the knowledge graph is determined for each graph epoch such that changes to the metric over time are determined.
  • These graph metrics, their time based changes and the identified impediment comprise the training data set.
  • the training data set comprises one metric that parameterises or quantifies the knowledge graph, with a value for that metric recorded for at least two different time points (for instance a first project epoch and a second project epoch) and the at least one identified project impediment (which may be a binary classification, for instance that a delay occurred, or quantitative, for instance the length of a project delay).
  • predictions may be made, as for step 1108 of figure 11 , however those predictions may then be compared to the at least one known project impediment from the training data set and used to train the machine learning algorithm.
  • FIG 19 an example of a network implementation of the present invention is illustrated.
  • the computer-implemented project data processing method and method of training a supervised machine learning algorithm described above may suitably be provided through cloud computing. Specifically, they may be provided following a software as a service (SAAS) paradigm. As examples of the present invention provide data science on demand, this may be termed data science as a service.
  • a user may access a system in accordance with the present invention through a conventional computing device 1900. It has already been described how they may access the system through a web browser or alternatively through a bespoke application (particularly where a mobile computer device is used). That user may already access one or more project data source servers 1902 in a conventional fashion across a network 1904.
  • the network may comprise any suitable wired or wireless network including the Internet. The details of the network structure are not germane to the present invention.
  • a user may already have configured access to a Microsoft Office 365 server for accessing project documentation, and so the Microsoft Office 365 server comprises a project data source server 1902.
  • a project data processing system may be provided by a server 1906, again accessed across the network 1904.
  • the user may interact with the system by configuring access for the project data processing server 1906 to particular sources of project data, for instance particular folders hosted by Microsoft Office 365 server 1902, as has already been described above in connection with figure 4. It will be appreciated that access to further project data sources, such as an email account may be similarly configured.
  • Project data processing server 1906 is thus able to automatically receive or retrieve project data from one or more source servers, store that data and perform data processing as described above. Through their computer 1900 a user is able to interact with the system, for instance by receiving knowledge map data and status updates from project data processing server 1906.
  • Project data processing server 1906 is further illustrated in figure 20, and may suitably comprise a processor 2000 and a memory 2002 storing executable instructions that, in response to execution by the processor 2000, cause the server 1906 to perform above described computer-implemented methods.
  • examples of the present invention may be implemented through a third party cloud computing service, for instance Amazon Web Services (AWS) such that the above described computer-implemented methods are implemented by calling cloud computing functions on demand to operate upon project data and processed data also stored in the cloud.
  • the skilled person will appreciate that where a third party cloud computing service is used then reference to a single project data processing server 1906 should be understood broadly to mean one or more third party servers hosting the system.
  • the scope of the present invention as defined by the appended claims should not be considered to be limited to any specific physical implementation or network topology.
  • Figure 22 presents the correlation coefficient between the parameters described in the description above for a given item and the shortest distance in the knowledge graph to an item associated with a delay in any project.
  • Figure 22 is generated using an example of the present invention processing project data relating to projects with known impediments, particularly delays. Accordingly, figure 22 demonstrates the efficacy of the present invention in identifying impediments from project data using the knowledge graph metrics previously described.
  • Distance is defined as the number of steps required to traverse the knowledge graph between two items. That is to say that a distance of zero here implies that no steps are required and the item is indeed directly associated with a delay.
  • the correlation coefficient will be familiar to the skilled person and indicates how trends in a target variable, in this instance the distance to an associated delay, correlate with trends in that parameter.
  • the correlation coefficient ranges from -1, indicating a perfect anticorrelation, and +1 indicating a perfect correlation.
  • Figure 22 shows the absolute value of the correlation coefficient since the sign is trivial.
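The correlation coefficient plotted in figure 22 (Pearson's r) can be computed as in the following sketch; the metric values and variable names are hypothetical:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: PageRank scores of items against their distance to a
# delay-associated item. More central items sit closer to delays, so the
# correlation is strongly negative and its absolute value is near 1.
pagerank_scores = [0.9, 0.7, 0.5, 0.3, 0.1]
delay_distances = [0, 1, 2, 3, 4]
r = pearson(pagerank_scores, delay_distances)
print(abs(r))
```

Taking the absolute value matches figure 22, where the sign of the correlation is discarded.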
  • the plot in figure 22 is typically referred to as a box-and-whisker plot and will be familiar to the skilled person.
  • the box indicates the first and third quartile range, with the central bar within the box indicating the median value.
  • the whiskers extend to 1.5 times the first to third interquartile range, with outliers marked as circles.
  • a box-and-whisker plot is useful for demonstrating the variance in data across multiple sources. In the case of the figure presented, the sources are a range of different projects used to calculate the correlation coefficient.
  • the figure clearly shows good correlation for three of the parameters described (correlation coefficient > 0.5), moderate correlation for nine of the parameters described (correlation coefficient > 0.25) and weak correlation for a further five parameters.
  • minpathlen: the minimum path length, or distance, between this item and an item in the knowledge graph associated with a delay, i.e. how close is this item to a known impediment?
  • PageRank_score: the PageRank metric for this item across a knowledge graph describing all projects, i.e. how central is this item across all known activities?
  • topic_pagerank_sum: the sum of the PageRank metric for all items in the same topic as this item, i.e. how ‘central’ is the topic to the activity described by the knowledge graph?
  • topic_tfidf_sum: the sum of the TFIDF metric for all items in the same topic as this item, i.e. how ‘informative’ or ‘specific’ is this topic?
  • topic_Q: the modularity metric of the topic that this item resides within, i.e. how ‘compartmentalised’ is this topic?
  • topic_n_items: the number of items in the same topic as this item
  • n_delaypaths: the number of paths from the same topic as this item to items associated with a delay.
  • examples of the present invention can be realized in the form of hardware, software or a combination of hardware and software. Any such software may be stored in the form of volatile or non-volatile storage, for example a storage device like a ROM, whether erasable or rewritable or not, or in the form of memory, for example RAM, memory chips, device or integrated circuits or on an optically or magnetically readable medium, for example a CD, DVD, magnetic disk or magnetic tape or the like. It will be appreciated that the storage devices and storage media are examples of machine-readable storage that are suitable for storing a program or programs comprising instructions that, when executed, implement examples of the present invention.
  • examples provide a program comprising code for implementing apparatus or a method as claimed in any one of the claims of this specification and a machine-readable storage storing such a program. Still further, such programs may be conveyed electronically via any medium, for example a communication signal carried over a wired or wireless connection and examples suitably encompass the same.
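The per-project correlation analysis summarised in figure 22 can be illustrated with a short sketch. This is a hypothetical example, not the application's implementation: the per-project data series are invented, and the quartile helper reproduces only the box portion of a box-and-whisker plot.

```python
# Sketch: Pearson's r between a candidate parameter and the distance to the
# nearest delay, computed per project, with the absolute values reduced to
# the quartile statistics a box-and-whisker plot displays. Data is invented.
from statistics import median

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def box_stats(values):
    """First quartile, median and third quartile (median-of-halves method)."""
    vs = sorted(values)
    lower = vs[: len(vs) // 2]
    upper = vs[(len(vs) + 1) // 2:]
    return median(lower), median(vs), median(upper)

# One (parameter, distance-to-delay) series per project -- illustrative only.
projects = [
    ([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]),   # perfect anticorrelation
    ([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]),   # moderate correlation
    ([1, 2, 3, 4, 5], [1, 3, 2, 5, 4]),   # moderate correlation
]
abs_r = [abs(pearson_r(xs, ys)) for xs, ys in projects]
q1, med, q3 = box_stats(abs_r)
```

Taking the absolute value before aggregating mirrors the figure: only the strength of each per-project correlation contributes to the box.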
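Two of the features listed above, minpathlen and PageRank_score, can be sketched over a toy knowledge graph. The graph, the item names and the delay item are invented for illustration; the application computes these features over knowledge graphs built from real project data.

```python
# Sketch of minpathlen (breadth-first distance to the nearest delay item)
# and a plain power-iteration PageRank over a hypothetical toy graph.
from collections import deque

# Adjacency list for a small, invented knowledge graph of project items.
graph = {
    "pump": ["valve", "inspection"],
    "valve": ["pump", "delivery_delay"],
    "inspection": ["pump"],
    "delivery_delay": ["valve"],
}
delay_items = {"delivery_delay"}

def minpathlen(graph, start, targets):
    """Steps from `start` to the nearest delay item (0 if start is delayed)."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist
        for nbr in graph[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None  # no path to any delay item

def pagerank(graph, damping=0.85, iters=50):
    """PageRank by power iteration: each node shares its rank with neighbours."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iters):
        new = {node: (1 - damping) / n for node in graph}
        for node, nbrs in graph.items():
            share = damping * rank[node] / len(nbrs)
            for nbr in nbrs:
                new[nbr] += share
        rank = new
    return rank
```

In this toy graph, "pump" is two steps from the known impediment, while "valve", which sits between the impediment and the rest of the graph, receives a higher PageRank than the peripheral "inspection" item.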

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Tourism & Hospitality (AREA)
  • Operations Research (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Educational Administration (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed is a computer-implemented method of processing project data. The method comprises the steps of: receiving project data; performing natural language processing to extract canonical items from the project data; processing the canonical items to form a knowledge graph indicating relationships between items; determining at least one parameterisation metric of the knowledge graph or of items in the knowledge graph, and a change in the at least one metric over time; and predicting an impediment to the project using a trained supervised machine learning algorithm or a hidden Markov model, HMM, based on changes in the at least one metric over time. Further disclosed is a computer-implemented method of training a supervised machine learning algorithm or an HMM algorithm on project data to predict an impediment to a project.
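The pipeline the abstract describes can be sketched end to end. This is a deliberately minimal, hypothetical illustration: every function body is a placeholder (the claimed NLP extraction, graph construction, metrics and trained predictor are far richer), and the snapshot strings are invented.

```python
# Hypothetical skeleton of the abstract's pipeline: project data ->
# canonical items (NLP) -> knowledge graph -> per-snapshot metric ->
# impediment prediction from the metric's change over time.

def extract_canonical_items(project_data):
    # Placeholder for NLP extraction: one canonical item per distinct word.
    return sorted({word.strip(".,").lower() for word in project_data.split()})

def build_knowledge_graph(items):
    # Placeholder: link consecutive items to stand in for real relationships.
    return {(a, b) for a, b in zip(items, items[1:])}

def graph_metric(graph):
    # Placeholder parameterisation metric: simply the number of edges.
    return len(graph)

def predict_impediment(metric_history):
    # Placeholder for the trained classifier / HMM: flag a rising metric.
    return metric_history[-1] > metric_history[0]

# Two invented snapshots of the same project at different times.
snapshots = ["pump valve order", "pump valve order inspection delay"]
history = [graph_metric(build_knowledge_graph(extract_canonical_items(s)))
           for s in snapshots]
impediment_predicted = predict_impediment(history)
```

The essential shape is that prediction consumes the *change* in metrics across snapshots, not a single static value.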
PCT/GB2022/051134 2021-05-06 2022-05-04 Method and apparatus for processing project data WO2022234273A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB2106457.1 2021-05-06
GB202106457 2021-05-06

Publications (1)

Publication Number Publication Date
WO2022234273A1 true WO2022234273A1 (fr) 2022-11-10

Family

ID=81927535

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2022/051134 WO2022234273A1 (fr) 2021-05-06 2022-05-04 Method and apparatus for processing project data

Country Status (1)

Country Link
WO (1) WO2022234273A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220318712A1 (en) * 2021-04-05 2022-10-06 Jpmorgan Chase Bank, N.A. Method and system for optimization of task management issue planning

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
BILENKO ET AL.: "Adaptive Blocking: Learning to Scale Up Record Linkage", PROCEEDINGS OF THE SIXTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM-06, December 2006 (2006-12-01), pages 87 - 96, XP031003020
BLEI ET AL.: "Latent Dirichlet Allocation", JOURNAL OF MACHINE LEARNING RESEARCH, vol. 3, 2003, pages 993 - 1022, XP002427366, Retrieved from the Internet <URL:https://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf> DOI: 10.1162/jmlr.2003.3.4-5.993
BLONDEL ET AL., FAST UNFOLDING OF COMMUNITIES IN LARGE NETWORKS, 2008, Retrieved from the Internet <URL:https://arxiv.org/pdf/0803.0476.pdf>
GONDIA AHMED ET AL: "Machine Learning Algorithms for Construction Projects Delay Risk Prediction", JOURNAL OF CONSTRUCTION ENGINEERING AND MANAGEMENT, vol. 146, no. 1, 1 January 2020 (2020-01-01), US, XP055943019, ISSN: 0733-9364, Retrieved from the Internet <URL:http://dx.doi.org/10.1061/(ASCE)CO.1943-7862.0001736> [retrieved on 20220715], DOI: 10.1061/(ASCE)CO.1943-7862.0001736 *
SANAGAPATI PAVAN: "Knowledge Graph & NLP Tutorial-(BERT,spaCy,NLTK) | Kaggle", 8 September 2020 (2020-09-08), pages 1 - 1, XP055947039, Retrieved from the Internet <URL:https://www.kaggle.com/code/pavansanagapati/knowledge-graph-nlp-tutorial-bert-spacy-nltk/notebook?scriptVersionId=42249493> [retrieved on 20220728] *
SANAGAPATI PAVAN: "Knowledge Graph & NLP Tutorial-(BERT,spaCy,NLTK) | Kaggle", 8 September 2020 (2020-09-08), pages 1 - 133, XP055947043, Retrieved from the Internet <URL:https://www.kaggleusercontent.com/kf/42249493/eyJhbGciOiJkaXIiLCJlbmMiOiJBMTI4Q0JDLUhTMjU2In0..iqfZlRHAZ29X1zv9oHEVRg.ghHovQ0lzyFzXwewvYAsM6EOb4a8Du0fUXKS9PKImBwAHRABa_05L2agPPiASUxf-LEEAbEHdG6HY7kaEK-M_Fh8zgIAZ6jcBUfadX0Q5MQfKnn8IIGlgOm_mFYKqTPB98S-eyBH6Co1ayeOXkSIrsnuaqQqfzFFnUDudRNjcdWiln-hag4znQ> [retrieved on 20220728] *
SANH ET AL.: "A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks", 2018, CORNELL UNIVERSITY

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220318712A1 (en) * 2021-04-05 2022-10-06 Jpmorgan Chase Bank, N.A. Method and system for optimization of task management issue planning
US11681963B2 (en) * 2021-04-05 2023-06-20 Jpmorgan Chase Bank, N.A. Method and system for optimization of task management issue planning

Similar Documents

Publication Publication Date Title
CN113377850B (zh) Cognitive Internet-of-Things big data technology platform
US11019107B1 (en) Systems and methods for identifying violation conditions from electronic communications
JP6971853B2 (ja) Automatic extraction of commitments and requests from communications and content
US9904669B2 (en) Adaptive learning of actionable statements in natural language conversation
KR101972179B1 (ko) Automatic task extraction and calendar entry
JP5021640B2 (ja) Detecting, storing, indexing and searching means for leveraging data on user activity, attention and interests
EP2478431B1 (fr) Automatic discovery of contextually related task items
US7421660B2 (en) Method and apparatus to visually present discussions for data mining purposes
EP1481346B1 (fr) Method and apparatus for visually presenting discussions for data mining purposes
US20180196579A1 (en) Master View of Tasks
US20170200093A1 (en) Adaptive, personalized action-aware communication and conversation prioritization
US20160196336A1 (en) Cognitive Interactive Search Based on Personalized User Model and Context
CN112840335A (zh) User-centric contextual information for browsers
US20130060772A1 (en) Predictive analytic method and apparatus
CN106021387A (zh) Summarisation of conversation threads
US20180123997A1 (en) Message management in a social networking environment
US10366359B2 (en) Automatic extraction and completion of tasks associated with communications
US20160196313A1 (en) Personalized Question and Answer System Output Based on Personality Traits
Garcia-Lopez et al. Analysis of relationships between tweets and stock market trends
US20210081459A1 (en) Notification system for a collaboration tool configured to generate user-specific natural language relevancy ranking and urgency ranking of notification content
US20240028997A1 (en) Method and System for Automatically Managing and Displaying a Visual Representation of Workflow Information
Nezhad et al. eAssistant: cognitive assistance for identification and auto-triage of actionable conversations
WO2022234273A1 (fr) Method and apparatus for processing project data
US11106662B2 (en) Session-aware related search generation
Rashid Access methods for Big Data: current status and future directions

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22727391

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.02.2024)