WO2023097353A1 - A method for curating information

Info

Publication number: WO2023097353A1
Application number: PCT/AU2021/051448
Authority: WO
Inventors: David West
Original assignee: Batnav Pty Ltd
Priority date: 2021-12-03
Filing date: 2021-12-03
Publication date: 2023-06-08

Abstract

A method for curating information by using machine learning, the method including the steps of: defining a system/process information domain; aggregating an information database corresponding to the system/process information domain; identifying one or more risks associated with a system/process in the information database; assigning one or more risk levels to the one or more risks; preparing training data for training at least one risk classification machine learning algorithm; labelling the training data with the one or more risks and the one or more assigned risk levels; and training the at least one risk classification machine learning algorithm with the training data until a risk classification machine learning model is generated.

Description

A METHOD FOR CURATING INFORMATION

FIELD OF THE INVENTION

[001] The present invention relates to curating information and in particular to curating information by using Artificial Intelligence.

[002] The invention has been developed primarily as a way of using Artificial Intelligence in the form of Machine Learning to curate engineering risk-management information. However, it will be appreciated that the invention is not limited to this particular field of use.

BACKGROUND OF THE INVENTION

[003] Engineers (including Scientists, Specialists, Technicians, Managers, and others) acquire and apply technical knowledge to solve problems in various technical domains, such as mechanics, chemistry, electricity, and others. The competency and effectiveness of an Engineer are dependent on their ability to acquire and apply technical knowledge within a technical domain (i.e., information domain).

[004] Technology changes rapidly and constantly. An Engineer executing a task requiring the application of new technology unknown to the Engineer must acquire new technical knowledge or engage an Engineering Consultant possessing the knowledge and delegate that task.

[005] Depending on jurisdiction, an Engineer is often held to a standard of care requiring a certain level of competency. Therefore, it can be paramount for an Engineer to possess certain technical knowledge.

[006] Knowledge scarcity occurs when the rate of change in technology (i.e., a technical domain) causes a high demand for knowledge but its supply is constrained, such as by the amount of time it takes for the knowledge to be acquired.

[007] Presently, Engineers acquire technical knowledge by traditional study and/or training. Engineers can also acquire technical knowledge by engaging Engineering or other consultants possessing the required knowledge. The availability of study and/or training, or knowledgeable Engineering or other consultants is a limiting factor to accessing the required knowledge.

[008] Engineers apply technical knowledge to deliver packages of Engineering or information deliverables (i.e., various types of documents, including video, audio, and others). Technical knowledge and information can be contained across one or more Engineering (or i.e., information) deliverables. Achieving consistency across Engineering deliverables is critical to delivering a solution that performs as intended.

[009] Presently, Engineers prepare Engineering deliverables manually by researching, referencing, and compiling information, often under time pressure. This process is complex and errors can arise with potentially grave consequences. The time required to research, reference, correct, and produce Engineering deliverables is substantial and costly.

[010] It is an object of the present invention to overcome or ameliorate at least one of the disadvantages of the prior art, or to provide a useful alternative.

[011] It is an object of the invention in its preferred form to provide less expensive, easy and fast access to technical knowledge, scale knowledge to make it available as demand grows, and provide consistency across technical knowledge.

SUMMARY OF THE INVENTION

[012] According to an aspect of the invention, there is provided a method for curating information by using machine learning, the method including the steps of: defining a system/process information domain; aggregating an information database corresponding to the system/process information domain; identifying one or more risks associated with a system/process in the information database; assigning one or more risk levels to the one or more risks; preparing training data for training at least one risk classification machine learning algorithm; labelling the training data with the one or more risks and the one or more assigned risk levels; and training the at least one risk classification machine learning algorithm with the training data until a risk classification machine learning model is generated.

[013] Preferably, the step of defining the system/process information domain further includes identifying the cause and effect of independent variables on dependent variables in scientific/mathematical models of the information domain, wherein one or more of the independent variables relate to the one or more risks and one or more risk levels.

[014] Preferably, the training data includes the independent variables.

[015] According to an aspect of the invention, there is provided a method for generating machine learning algorithm training data including the steps of: generating a data frame having one or more columns, each of the one or more columns corresponding to an independent variable in a scientific/mathematical model of a system/process; generating continuous or discrete distribution values of the independent variable of each of the one or more columns; generating one or more random values from the continuous or discrete distribution of the independent variable of each of the one or more columns; generating one or more rows in the data frame; assigning the one or more generated random values to a location in the one or more rows corresponding to the column of the independent variable from which the one or more random values was generated; and generating one or more columns corresponding to one or more risks labelled with one or more corresponding risk levels.

[016] Preferably, the step of preparing training data is implemented according to an aspect of the present invention.

[017] Preferably, machine learning algorithm training data is in the form of a data frame generated according to an aspect of the present invention.

[018] According to another aspect of the present invention, there is provided a method for training a risk classification machine learning algorithm including: using training data to train a risk classification machine learning algorithm until a risk classification machine learning model is generated, the model being able to predict one or more system/process risks and/or risk levels according to one or more predetermined performance metrics.

[019] Preferably, the one or more performance metrics is selected based on the degree of imbalance of one or more risks and/or risk levels.

[020] Preferably, the performance metric is a Kappa statistic.

[021] Preferably, the one or more risk levels is assigned to one or more risks in the training data on the basis of any one or more of: the type of performance metric used; the observed accuracy of the algorithm compared to the error rate of a human expert; and the potential consequence of a risk event occurring.

[022] According to another aspect of the present invention, there is provided a risk classification machine learning model including: a machine learning model trained by a machine learning classification algorithm to classify one or more system/process risks and/or risk levels, wherein the machine learning model has been trained according to an aspect of the present invention.

[023] Preferably, the method further includes the steps of: classifying one or more system/process risk and/or risk levels in a system/process, the classification being made by the risk classification machine learning model; identifying one or more risk-reduction controls in the information database, the one or more risk-reduction controls corresponding to the one or more classified risks and/or risk levels; selecting one or more risk-reduction controls and/or relevant knowledge elements from the information domain; using a text assembly algorithm to generate at least one text element from the one or more risk-reduction controls and/or relevant knowledge elements; using a document compiling algorithm to generate at least one document from the at least one text element; and using a document package algorithm to package the at least one document.

[024] Preferably, the one or more risk-reduction controls and/or relevant knowledge elements includes any one or more of: one or more risk control actions performable on the system/process to control a physical characteristic of the system/process; one or more inspections and/or tests of the system/process, wherein the frequency of application of any one or more of the aforementioned risk-reduction controls and/or relevant knowledge elements is based on any one or more of: the expected variability in one or more characteristics of the system/process; and the degree of sensitivity of one or more risks to the risk-reduction controls and/or knowledge elements.

[025] Preferably, the method further includes the step of selecting and adjusting the frequency of the one or more risk-reduction controls depending on how the one or more risks and/or risk levels have been classified by one or more risk classification machine learning models.

[026] Preferably, the text element is a concatenable text string.

[027] Preferably, a concatenable text string corresponding to the one or more risk controls is concatenated into at least one sentence.

[028] Preferably, the at least one sentence is assigned to a variable in memory, the variable being retrievable by the compiling algorithm into a relevant location of the at least one document.

[029] According to another aspect of the present invention, there is provided machine-readable code containing a set of instructions for implementing a method according to an aspect of the present invention.

[030] According to another aspect of the present invention, there is provided machine-readable code containing a set of instructions for implementing the training data according to an aspect of the present invention.

[031] According to an aspect of the present invention, there is provided a machine-readable code containing a set of instructions for implementing the machine learning model according to an aspect of the present invention.

[032] According to another aspect of the present invention, there is provided a system for implementing the machine learning model according to an aspect of the present invention, wherein the system includes any one or more of: a cloud-based system; a distributed system; and a client/server system.

[033] According to another aspect of the present invention, there is provided a system for implementing a method according to an aspect of the present invention, wherein the system includes any one or more of: a cloud-based system; a distributed system; and a client/server system.

[034] According to another aspect of the present invention there is provided one or more GUIs for interacting with a system/method/algorithm/model according to an aspect of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

[035] A preferred embodiment of the invention will now be described, by way of example only, with reference to the accompanying drawing in which:

[036] Figure 1 is a flowchart of a method for curating information by using machine learning.

PREFERRED EMBODIMENT OF THE INVENTION

[037] Referring to the drawing, in one aspect of the present invention there is provided a method 1 for curating information by using machine learning. Firstly, a system/process information domain is defined 4. A system/process can be any physical system/process, or a model of physical system/process. For example, a system may be a building, a vehicle, a mobile phone device, etc. A process may be the way the building degrades over time, the way the vehicle accelerates, or the way the mobile phone heats up during use. Defining the system/process includes identifying the cause and effect of independent variables on dependent variables in scientific/mathematical models corresponding to a system/process.

[038] A simple example of a scientific/mathematical model could be the formula Force = mass * acceleration, wherein Force is the dependent variable while mass and acceleration are the independent variables. Other types of independent variables can be those related to measurable physical operating qualities/conditions of a system/process such as dimensions, material, mechanical, thermal, and electrical, but also subjective inputs/preferences of a system/process operator (i.e., a user, technician, scientist, engineer, manager etc.).
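As a minimal sketch of the distinction drawn above, the Force model can be written as a function whose parameters are the independent variables and whose return value is the dependent variable. The function name, parameter names, and example values are illustrative assumptions and do not appear in the specification.

```python
def force(mass_kg: float, acceleration_m_s2: float) -> float:
    """Dependent variable (Force, in newtons) from two independent variables."""
    return mass_kg * acceleration_m_s2


# Varying one independent variable while holding the other fixed shows
# its effect on the dependent variable.
forces = [force(mass_kg=2.0, acceleration_m_s2=a) for a in (1.0, 2.0, 3.0)]
```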

[039] For example, in the context of a battery management system, the identified operating conditions might be those that affect performance and reliability such as system/process life, maximum/minimum operating temperatures, maximum charge/discharge rates, the number of charge/discharge cycles per day/week/month/year, and the depth of discharge. Some operating conditions might correspond to continuous variables (e.g., temperature) and some to discrete variables (e.g., marine environment).

[040] Once defined 4, the information domain as well as any other required information is aggregated into a system/process information database 6 (or i.e., knowledge database), whereby the information/knowledge database is created.

[041] The behaviour and other characteristics of a system/process can pose risks to the system/process, to the environment of the system/process, as well as human and other life. For example, a building may undergo a process of degeneration whereby its structural integrity progressively fails, causing the building structure to deform. Or, a building may be structurally weak and could fail catastrophically. It is ostensibly beneficial and desirable to control system/process risks.

[042] The method further includes identifying one or more risks associated with a system/process in the information database 6A and then assigning a risk level 10 to that risk. For example, a risk might be the building collapsing, and the level of risk might be the likelihood that the building could collapse within a certain period of time.

[043] System/process information domain experts (i.e., technician, scientist, engineers, managers, someone informed in the system/process information domain, etc., from now on "informed administrator") are able to identify the independent variables of a system/process and predict the effect of the independent variables on dependent variables. One way to classify a system is whether it is permanent, such as a fixed asset that does not undergo significant change, or, transient, such as a process that converts materials into a new system.

[044] The number of risks identifiable by an informed administrator can depend on the consequences of the risk eventuating (i.e., the failure mode) as caused by a potential combination of operating conditions. The number of levels of a risk (which could be either binary or multi-level) can depend on the ability to measure and control the independent variables known to affect the dependent variables. For example, in the context of a battery management system, three levels of risk might be determined for a risk related to battery degradation, the levels of risk corresponding to high, medium, and low risk. Two levels of risk might be identified in respect of battery corrosion, the levels corresponding to a yes and no (i.e., binary). Similarly, two levels of risk might be identified in respect of mechanical damage to the battery, also corresponding to a yes and no (i.e., binary).
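The battery example above could be recorded, for instance, as a small mapping from each risk to its levels. This is only a sketch: the dictionary name, the keys, and the helper function are hypothetical, and the ordering of the levels is an assumption.

```python
# Hypothetical risk register for the battery example: each risk maps to
# its levels. All names here are illustrative only.
RISK_LEVELS = {
    "degradation": ("low", "medium", "high"),  # multi-level risk
    "corrosion": ("no", "yes"),                # binary risk
    "mechanical_damage": ("no", "yes"),        # binary risk
}


def is_binary(risk: str) -> bool:
    """A risk is binary when exactly two levels are defined for it."""
    return len(RISK_LEVELS[risk]) == 2
```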

[045] Next, training data is prepared 8 (i.e., created) for training one or more machine learning algorithms 12 until a risk classification machine learning model is generated 14, such that the model is able to classify a risk and/or a corresponding risk level associated with the system/process in the information database 6A. The training data includes one or more independent variables that relate to the one or more risks and assigned risk levels. The training data is labelled with one or more risks and corresponding assigned risk levels 10. The training data can include other information as deemed necessary or relevant to training a machine learning algorithm (not shown), such as dependent variables in scientific/mathematical models, complete scientific/mathematical models, and other similar information.

[046] According to another aspect of the present invention, there is provided a method for generating (i.e., creating) machine learning algorithm training data 8. Firstly, a data frame (not shown) is generated with one or more columns. Each of the one or more columns corresponds to an independent variable (not shown) in a scientific/mathematical model of a system/process. Continuous or discrete distribution values are generated for each of the independent variable(s), and random values are generated from the continuous or discrete distribution values. One or more data rows are then generated in the data frame and the random values are assigned to a location in one or more data frame rows intersecting with the column (i.e., a cell of the data frame) corresponding to the independent variable from which the random values were originally generated. A further column is generated corresponding to one or more risks related to the independent variables, and the risks are labelled 10 (i.e., distributed/assigned) with corresponding risk levels.
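The data frame generation described above might be sketched as follows, using only the Python standard library and representing the data frame as a list of rows. The variable names, the distributions, and especially the labelling thresholds are assumptions made for illustration; in practice the labels would come from an informed administrator.

```python
import random

# Hypothetical independent variables for a battery system. Each entry is a
# sampler that draws one random value from an assumed distribution.
SAMPLERS = {
    "max_temperature_c": lambda rng: rng.uniform(10.0, 60.0),  # continuous
    "depth_of_discharge": lambda rng: rng.uniform(0.1, 1.0),   # continuous
    "marine_environment": lambda rng: rng.choice([0, 1]),      # discrete
}


def label_degradation(row: dict) -> str:
    """Illustrative labelling rule only; the thresholds are assumptions."""
    score = (row["max_temperature_c"] > 45.0) + (row["depth_of_discharge"] > 0.8)
    return ("low", "medium", "high")[score]


def generate_training_data(n_rows: int, seed: int = 0) -> list[dict]:
    """Build a data frame as a list of rows, one key per column."""
    rng = random.Random(seed)
    frame = []
    for _ in range(n_rows):
        # One random value per independent-variable column.
        row = {column: sample(rng) for column, sample in SAMPLERS.items()}
        # Further columns: one per risk, holding the assigned risk level.
        row["degradation_risk"] = label_degradation(row)
        row["corrosion_risk"] = "yes" if row["marine_environment"] else "no"
        frame.append(row)
    return frame


training_data = generate_training_data(n_rows=200)
```

Passing a fixed seed keeps the generated rows reproducible, which can help when comparing models across iterations.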

[047] The way the risk levels have been labelled 10 can be evaluated to ensure each level of a risk is sufficiently represented/balanced. Additional rows can be generated and populated with values that correspond to the level of risk that is insufficiently represented/imbalanced. Sufficiency of representation/balance is determined by considering performance metrics and the machine learning algorithm used.
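One possible way to check and correct the balance of risk levels is sketched below. It uses simple random oversampling (duplicating existing rows of an under-represented level), which is a substituted technique chosen for brevity; the specification only states that additional rows are generated. The function name, the `min_count` threshold, and the toy data are assumptions.

```python
import random
from collections import Counter


def rebalance(rows, label_key, min_count, seed=0):
    """Duplicate rows of under-represented levels until every level that
    occurs in `rows` has at least `min_count` rows."""
    rng = random.Random(seed)
    counts = Counter(row[label_key] for row in rows)
    balanced = list(rows)
    for level, count in counts.items():
        pool = [row for row in rows if row[label_key] == level]
        for _ in range(max(0, min_count - count)):
            balanced.append(dict(rng.choice(pool)))
    return balanced


# Toy data: 50 rows, of which only 4 carry the "high" risk level.
rows = [{"temp": t, "risk": "high" if t > 55 else "low"} for t in range(10, 60)]
before = Counter(row["risk"] for row in rows)
after = Counter(row["risk"] for row in rebalance(rows, "risk", min_count=10))
```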

[048] The method for generating machine learning algorithm training data can be used to prepare training data for training 12 the at least one risk classification machine learning algorithm according to an aspect of the present invention.

[049] According to another aspect of the present invention, there is provided machine learning algorithm training data in the form of the data frame (not shown) generated (i.e., created) according to a method for generating machine learning algorithm training data.

[050] According to yet another aspect of the present invention, there is provided a method for training 12 a risk classification machine learning algorithm. Training data is used to train 12 at least one risk classification machine learning algorithm until a risk classification machine learning model 14 is generated, the model 14 being able to predict at least one system/process risk and/or risk level according to one or more performance metrics selectable based on the degree of sufficiency of representation/imbalance of one or more system/process risk levels (not shown). In an alternative preferred embodiment, the performance metric is a Kappa statistic (not shown).

[051] The one or more risk levels can be labelled 10 (i.e., assigned/distributed) to the one or more risks on the basis of the type of performance metric used, the observed accuracy of the algorithm compared to the error rate of a system/process information domain expert 2, and/or the potential consequence of a risk event occurring. A variety of machine learning algorithms can be trained to generate a model for classifying (i.e., predicting) the risk and/or risk class. A list of preferred machine learning algorithm types includes: Regression, Decision-Tree, Neural Network, Ensemble (not shown). It will be appreciated that the present invention can take advantage of any suitable machine learning model, depending on the required/available classification performance of the model generated. The performance metric of a model generated from training the random forest algorithm is considered to be highly effective.
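To illustrate why a Kappa statistic can suit imbalanced risk levels, the sketch below computes Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The specification does not say which Kappa variant is intended, so the choice of Cohen's kappa, the function name, and the toy labels are assumptions.

```python
from collections import Counter


def cohen_kappa(actual, predicted):
    """Cohen's kappa between actual and predicted risk levels."""
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: product of the marginal frequencies, summed per label.
    expected = sum(
        actual_counts[label] * predicted_counts[label] for label in actual_counts
    ) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Imbalanced toy labels: nine "low" rows and one "high" row.
actual = ["low"] * 9 + ["high"]
always_low = ["low"] * 10  # a model that never predicts the rare level
```

Here `always_low` matches nine of ten labels, yet its kappa against `actual` is zero because that agreement is no better than chance, whereas a perfect prediction scores one.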

[052] According to yet another aspect of the invention, there is provided a risk classification machine learning model 14 trained 12 by a machine learning classification algorithm to classify one or more system/process risks and/or risk levels, the model 14 having been trained 12 using the method for training 12 a risk classification machine learning algorithm according to an aspect of the present invention.

[053] The method 1 for curating information by using machine learning further includes classifying one or more system/process risks and/or risk levels corresponding to risks in a system/process (i.e., machine learning model inputs) by using the risk classification machine learning model 14 according to an aspect of the present invention. One or more risk-reduction controls (not shown) corresponding to the one or more classified risks are then identified and selected along with relevant knowledge elements from the information database.

[054] The one or more risk-reduction controls can be risk control actions performable on the system/process to control a physical characteristic of the system/process. For example, if a risk corresponding to a degenerating building has been classified (i.e., predicted), then one risk control action may be to bolster the building's foundations based on how fast the building is sinking into the ground. The formula acceleration = Force/mass could be used to determine the Force required to balance the sinking of the building. Other preferred control actions are making inspections and/or tests of the system/process (not shown).

[055] The frequency of application of a risk-reduction control is based on the expected variability of one or more characteristics and/or corresponding variables of the system/process, the degree of sensitivity of one or more risks and/or risk levels to one or more risk-reduction controls and/or knowledge elements, and/or how the one or more risks and/or risk levels have been classified by one or more risk classification machine learning models 14.

[056] Next, a text assembly algorithm 20 is used to generate at least one text element 22 from the risk-reduction controls and/or relevant knowledge elements and/or classified risks 24, the text element 22 being in the form of at least one concatenable text string concatenable into at least one sentence (not shown) containing risk-relevant information. The at least one sentence is assigned to a variable in memory and is retrievable by a document compiling algorithm 26 to thus generate one or more documents. The one or more documents is then packaged by a package algorithm 28 into a curated information deliverable 30 to a user 2A.
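The text assembly step described above can be pictured as plain string concatenation. The following is a rough sketch only: the text fragments, the mapping from risk level to frequency wording, and all names are hypothetical and are not taken from the specification.

```python
# Hypothetical concatenable text strings keyed by risk-reduction control.
CONTROL_TEXT = {
    "inspect": "inspect the battery enclosure",
    "limit_charge": "limit the charge rate",
}
# Assumed wording for how often a control is applied at each risk level.
FREQUENCY_TEXT = {"high": "weekly", "medium": "monthly", "low": "annually"}


def assemble_sentence(risk: str, level: str, controls: list[str]) -> str:
    """Concatenate text strings into one sentence of risk-relevant information."""
    actions = " and ".join(CONTROL_TEXT[c] for c in controls)
    return (
        "The " + risk + " risk is classified as " + level + "; "
        + actions + " " + FREQUENCY_TEXT[level] + "."
    )


# The sentence is held in a variable so that a compiling step can retrieve it.
degradation_sentence = assemble_sentence(
    "degradation", "high", ["inspect", "limit_charge"]
)
```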

[057] In a preferred embodiment of the present invention, the method is partly or wholly performed with the use of at least one GUI 32 for one or more user 2A, the method being implemented on a cloud-based system 32 (i.e., cloud infrastructure).

[058] In a preferred embodiment of the present invention, the method is partly or wholly performed with the use of at least one GUI (not shown) for one or more informed administrator 2, the method being implemented on a cloud-based system 32.

[059] In a preferred embodiment, the text assembly/compiling/packaging algorithms are accessed from the LaTeX engine and suite of tools (i.e., software/programs, etc.). However, it will be appreciated other types of assembly/compiling/packaging algorithms may be suitable.

[060] According to yet another aspect of the present invention, there is provided machine readable code containing a set of instructions for implementing the various aspects of the invention including any one or more the method(s), machine learning algorithm(s), training data, or machine learning model(s).

[061] According to yet another aspect of the present invention, there is provided a cloud-based system implementing one or more machine learning models according to an aspect of the present invention. One or more suitable GUIs are provided to enable an informed administrator 2 to more easily interact with, manage, and/or operate the system. In an alternative embodiment, the one or more machine learning models can be implemented on a distributed system or a client/server system.

[062] According to yet another aspect of the present invention, there is provided a cloud-based system implementing the one or more methods according to an aspect of the present invention. One or more suitable GUIs are provided to enable one or more informed administrators 2 to more easily interact with, manage, and/or operate the system. In an alternative embodiment, the one or more methods are implemented on a distributed system or a client/server system.

[063] By way of summary, the present invention can be viewed as one or more methods (or algorithms)/systems applying Machine Learning, and other algorithms, to deconstruct human knowledge into a knowledge database of knowledge elements; reconstruct the knowledge elements from the knowledge database into human-readable text; assemble the text into formatted and human-readable documents suitable for use in contracts and other legal documents; and propagate the knowledge elements and text across packages of Engineering deliverables (i.e., documents containing technical information).

[064] To curate information by using machine learning 1, a system/process informed administrator 2 firstly defines a system/process information domain 4. This includes identifying the cause and effect of independent variables on dependent variables in scientific/mathematical models corresponding to the system/process. Once the information is defined it is aggregated (i.e., created) 6 into a system/process information (i.e., knowledge) database 6A. The informed administrator 2 then identifies one or more risks in the information database 6A.

[065] Next, the informed administrator generates (i.e., prepares) machine algorithm training data 8 by generating a data frame (not shown) with one or more columns, each of the columns corresponding to an independent variable (not shown) in a scientific/mathematical model of the system/process identified in the domain 4. The informed administrator 2 generates (or prepares) continuous and/or discrete distribution values (not shown) for each of the independent variables (not shown), and then generates random values (not shown) from the continuous and/or discrete values. Continuous values can include floating point numbers. Discrete values can include integers or binary values. The informed administrator 2 then generates one or more data frame rows and the random values are each assigned to a location in one or more data frame rows intersecting with the column (i.e., a data frame cell) corresponding to the independent variable from which the random values were originally generated. The informed administrator 2 generates a further column corresponding to one or more risks related to the independent variables, and labels (or i.e., assigns/distributes) the one or more risks with one or more risk levels defined in the information database 6A.

[066] The informed administrator 2 can evaluate the distribution of the one or more labels to ensure that each risk and/or risk level is sufficiently represented, and can add extra data frame rows corresponding to a risk and/or risk level that is not sufficiently represented (i.e., imbalanced).

[067] Using the prepared training data 8, the informed administrator 2 then trains 12 one or more machine learning algorithms until a risk identification/classification model 14 is generated such that the model 14 is able to classify a risk and/or a corresponding risk level associated with the independent variables of the system/process and/or the system/process in the information database 6A according to one or more performance metrics (not shown) selected by the informed administrator 2.

[068] The informed administrator 2 can select a performance metric, such as a Kappa statistic, based on the degree of imbalance of one or more risk and/or risk level classified by the machine learning model 14. The informed administrator 2 can select a Kappa statistic to be a performance metric for other reasons.

[069] It will be appreciated that the steps of the method 1 including defining a system/process information domain 4; aggregating the information into a system/process information database 6A; identifying one or more risks in the information database (not shown); preparing (or generating) machine algorithm training data 8; training one or more machine learning algorithms 12 until a machine learning model 14 is generated; and selecting a performance metric (not shown), as well as other steps in the method 1, can take the form of an iterative process starting from any one step, and then stepping through other steps in any order as required 1A. Accordingly, in any one or more iteration of the method, or part thereof, one or more risk levels can be assigned to one or more risks on the basis of the type of performance metric used, the observed accuracy of the algorithm compared to the error rate of the informed administrator 2, and/or the potential consequence of a risk event occurring, as well as on the basis of other reasons mentioned herein.

[070] The machine learning model is used to classify (i.e., predict) one or more system/process risks and/or risk levels. In a preferred form of the invention, the risks and/or risk levels are inputs 16 provided by a user 2A (i.e., a person seeking an information deliverable relating to risk-management) via a GUI 18. The inputs may need to be validated 16A prior to being input into the machine learning model 14 so that inputs meet predetermined requirements.

[071] Having classified the one or more risks and/or risk levels (which can preferably be entered into the information database 6A), the informed administrator 2 then identifies in the information database 6A one or more suitable risk-reduction controls and/or relevant knowledge elements (not shown) and selects these along with any other information in the information database 6A (or elsewhere) considered relevant by the informed administrator 2. The informed administrator 2 can decide which risk-reduction controls to select depending on the expected variability of one or more characteristics of the system/process (including i.e., dependent or independent variables, knowledge elements and/or the other information considered relevant), the degree of sensitivity of one or more risks and/or risk levels to the risk-reduction controls and/or knowledge elements, and/or how the risks and/or risk levels have been classified by one or more machine learning models 14.
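The validation 16A of user inputs before they reach the model 14 might look something like the sketch below. The specification does not state what the predetermined requirements are, so the input names, ranges, and message wording here are all assumptions.

```python
# Hypothetical predetermined requirements: one check per expected input.
REQUIREMENTS = {
    "max_temperature_c": lambda v: isinstance(v, (int, float)) and -40 <= v <= 85,
    "depth_of_discharge": lambda v: isinstance(v, (int, float)) and 0 < v <= 1,
    "marine_environment": lambda v: v in (0, 1),
}


def validate_inputs(inputs: dict) -> list[str]:
    """Return a list of problems; an empty list means the inputs are valid."""
    problems = [
        f"missing input: {name}" for name in REQUIREMENTS if name not in inputs
    ]
    problems += [
        f"invalid value for {name}: {inputs[name]!r}"
        for name, check in REQUIREMENTS.items()
        if name in inputs and not check(inputs[name])
    ]
    return problems
```

Returning every problem at once, rather than stopping at the first, would let a GUI show the user all corrections needed in a single pass.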

[072] A text assembly algorithm 20 is used to generate one or more sentences 22 (or i.e., text elements) from the classified risks and/or risk reduction controls and/or relevant knowledge elements 24, and then one or more templates 22A are used to format the one or more sentences into formatted documents 26. At least one document is compiled by a document compiling algorithm 26A from the one or more formatted documents 26, and then packaged by means of a document package algorithm 28 into a curated information deliverable 30.
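The template, compiling, and packaging steps above could be approximated with the Python standard library as follows. This is a sketch under stated assumptions: the preferred embodiment refers to LaTeX tooling, whereas this example substitutes a plain-text template and a zip archive, and the template text, function names, and file name are hypothetical.

```python
import io
import zipfile
from string import Template

# Hypothetical plain-text template standing in for a document template.
TEMPLATE = Template("RISK MANAGEMENT PLAN: $title\n\n$body\n")


def compile_document(title: str, sentences: list[str]) -> str:
    """Format the assembled sentences into one document using the template."""
    return TEMPLATE.substitute(title=title, body=" ".join(sentences))


def package_documents(documents: dict[str, str]) -> bytes:
    """Bundle named documents into a single in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in documents.items():
            archive.writestr(name, text)
    return buffer.getvalue()


plan = compile_document("Battery", ["Degradation risk is high.", "Inspect weekly."])
deliverable = package_documents({"plan.txt": plan})
```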

[073] A user 2A receives the information deliverable 30, the user 2A having provided inputs 16 in the form of inputs related to risks and/or risk-management corresponding to a technical information domain via GUI 18, the GUI being associated with cloud-infrastructure 32 (i.e., cloud-based system).

[074] In a preferred embodiment, the model 14, information database 6A, text assembly algorithm 20, document compiling algorithm 26A, and the document package algorithm 28 are implemented on the cloud infrastructure 32 accessed, managed, and/or administered by one or more informed user 2 or other suitable person.

[075] It will be appreciated that the present invention provides less expensive, easier and faster access to technical knowledge, allows knowledge to be scaled to make it available as demand grows, and provides consistency across technical knowledge.

[076] The Person Skilled in the Art ("PSA") will appreciate that machine learning model 14 performance can be improved by iterative 1A training.

[077] The PSA will appreciate the various ways in which the systems according to the present invention can be implemented on various computer systems including one or more personal computers, mobile devices, client/server systems, distributed systems, cloud-based systems, databases, internet, web, frontend, backend, network, and similar and any combination of these as deemed necessary by the PSA.

[078] The PSA will appreciate the available existing software, programs, and platforms available to assist or enable the PSA to perform the invention.

[079] "Machine-readable code containing a set of instructions" means any suitable computer language or combination thereof which can be used by the PSA to implement the methods, algorithms, and systems of the invention. Suitable computer languages include machine language, assembly language, and high-level languages such as C, Python, R, etc.

[080] One or more GUIs ("Graphic User Interface"), including features to allow access to the invention by one or more experts 2 and/or users 2A, can be implemented on one or more computer systems as deemed necessary. A PSA will appreciate that a GUI also refers to a UI such as a command prompt.

Claims

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS:

1. A method for curating information by using machine learning, the method including the steps of: defining a system/process information domain; aggregating an information database corresponding to the system/process information domain; identifying one or more risks associated with a system/process in the information database; assigning one or more risk levels to the one or more risks; preparing training data for training at least one risk classification machine learning algorithm; labelling the training data with the one or more risks and the one or more assigned risk levels; and training the at least one risk classification machine learning algorithm with the training data until a risk classification machine learning model is generated.

2. A method for curating information according to claim 1 wherein the step of defining the system/process information domain further includes identifying the cause and effect of independent variables on dependent variables in scientific/mathematical models of the information domain, wherein the one or more of the independent variables relate to the one or more risks and one or more risk levels.

3. A method for curating information according to claim 2 wherein the training data includes the independent variables.

4. A method for generating machine learning algorithm training data including the steps of: generating a data frame having one or more columns, each of the one or more columns corresponding to an independent variable in a scientific/mathematical model of a system/process; generating continuous or discrete distribution values of the independent variable of each of the one or more columns; generating one or more random values from the continuous or discrete distribution of the independent variable of each of the one or more columns; generating one or more rows in the data frame; assigning the one or more generated random values to a location in the one or more rows corresponding to the column of the independent variable from which the one or more random values was generated; and generating one or more columns corresponding to one or more risks labelled with one or more corresponding risk levels.

5. A method for curating information according to any one of claims 1 to 3 wherein the step of preparing training data is implemented according to the method defined in claim 4.

6. Machine learning algorithm training data in the form of a data frame generated by the method according to claim 4.

7. A method for training a risk classification machine learning algorithm including: using training data to train a risk classification machine learning algorithm until a risk classification machine learning model is generated, the model being able to predict one or more system/process risks and/or risk levels according to one or more predetermined performance metrics.

8. A method for training a machine learning algorithm according to claim 7 wherein the one or more performance metrics is selected based on the degree of imbalance of one or more risks and/or risk levels.

9. A method for training a machine learning algorithm according to claim 7 wherein the performance metric is a Kappa statistic.

10. A method for training a machine learning algorithm according to any one of claims 7 to 9 wherein one or more risk levels is assigned to one or more risks in the training data on the basis of any one or more of: the type of performance metric used; the observed accuracy of the algorithm compared to the error rate of a human expert; and the potential consequence of a risk event occurring.

11. A risk classification machine learning model including: a machine learning model trained by a machine learning classification algorithm to classify one or more system/process risks and/or risk levels, wherein the machine learning model has been trained according to the method of any one or more of claims 4, and 7 to 10.

12. A method for curating information according to any one of claims 1 to 3, or 5, further including the steps of: classifying one or more system/process risk and/or risk levels in a system/process, the classification being made by the risk classification machine learning model; identifying one or more risk-reduction controls in the information database, the one or more risk-reduction controls corresponding to the one or more classified risks and/or risk levels; selecting one or more risk-reduction controls and/or relevant knowledge elements from the information domain; using a text assembly algorithm to generate at least one text element from the one or more risk-reduction controls and/or relevant knowledge elements; using a document compiling algorithm to generate at least one document from the at least one text element; and using a document package algorithm to package the at least one document.

13. A method for curating information according to claim 12 wherein the one or more risk-reduction controls and/or relevant knowledge elements includes any one or more of: one or more risk control actions performable on the system/process to control a physical characteristic of the system/process; one or more inspections and/or tests of the system/process, wherein the frequency of application of any one or more of the aforementioned risk-reduction controls and/or relevant knowledge elements is based on any one or more of: the expected variability in one or more characteristics of the system/process; and the degree of sensitivity of one or more risks to the risk-reduction controls and/or knowledge elements.

14. A method for curating information according to claim 13 further including the step of selecting and adjusting the frequency of the one or more risk-reduction controls depending on how the one or more risks and/or risk levels have been classified by the risk classification machine learning model.

15. A method for curating information according to claim 12 to 14 wherein the text element is a concatenable text string.

16. A method for curating information according to claim 15 wherein a concatenable text string corresponding to the one or more risk controls is concatenated into at least one sentence.

17. A method for curating information according to claim 16 wherein the at least one sentence is assigned to a variable in memory, the variable being retrievable by the compiling algorithm into a relevant location of the at least one document.

18. A machine-readable code containing a set of instructions for implementing a method according to any one of claims 1 to 5, 7 to 10, and 12 to 17.

19. A machine-readable code containing a set of instructions for implementing the training data according to claim 6.

20. A machine-readable code containing a set of instructions for implementing the machine learning model according to claim 11.

21. A system for implementing the machine learning model according to claim 11, wherein the system includes any one or more of: a cloud-based system; a distributed system; and a client/server system.

22. A system for implementing a method according to any one of claims 1 to 5, 7 to 10, and 12 to 17 wherein the system includes any one or more of: a cloud-based system; a distributed system; and a client/server system.

23. One or more GUIs for interacting with the system according to claim 21 or claim 22.