WO2024019617A2 - Flowsheet digitization with computer vision, automatic simulation, and flowsheet (auto)completion with machine learning - Google Patents

Flowsheet digitization with computer vision, automatic simulation, and flowsheet (auto)completion with machine learning

Info

Publication number
WO2024019617A2
Authority
WO
WIPO (PCT)
Prior art keywords
process set
chemical
digitized
directed graph
node
Prior art date
Application number
PCT/NL2023/050385
Other languages
French (fr)
Other versions
WO2024019617A3 (en
Inventor
Artur Maria SCHWEIDTMANN
Original Assignee
Technische Universiteit Delft
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Technische Universiteit Delft
Publication of WO2024019617A2
Publication of WO2024019617A3

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B17/00 Systems involving the use of models or simulators of said systems
    • G05B17/02 Systems involving the use of models or simulators of said systems electric
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/32 Operator till task planning
    • G05B2219/32104 Data extraction from geometric models for process planning
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/35 Nc in input of data, input till input file format
    • G05B2219/35203 Parametric modelling, variant programming, process planning

Definitions

  • the present invention is in the field of physical processes, chemical processes, biological processes, and microbiological processes in general, apparatuses for such processes, such as for boiling, for separation, for mixing, for dissolving, for reacting, for controlling, and in particular a process comprising a plurality of such apparatuses and processes or process steps, as well as the interaction between said apparatuses and processes or process steps, such as in terms of flows of chemicals between apparatuses.
  • a process flow diagram may be used.
  • the process flow diagram is aimed to visually display the relationship between major equipment of a plant facility and does not show minor details.
  • a first step leading to a construction of a process plant and its use in the manufacture of a product is typically the conception of a process, typically involving process steps.
  • the process concept may then be visualized by a process flow diagram, representing the process steps, and main details thereof, or likewise, a method of producing.
  • Process design can then proceed on the basis of the process flow diagram chosen.
  • Fig. 1 shows some typical elements and symbols used.
  • process flow diagrams may be used to perform steady-state and non-steady-state heat and mass balancing, sizing and costing calculations, such as for a chemical process. This is considered an essential and core part of process design.
  • a computer or the like is used, in particular for supporting the calculations, and hence process design.
  • Typical steps in process design are an initial step, which may be referred to as synthesis, a step for optimizing the process design, which may involve heat and material balance, sizing of process equipment, and cost calculations, and a control step for assessing topics as safety, operability, and a final step, wherein the process design or parts thereof are further optimized in view of a previous step.
  • process flow diagrams of a process may include various elements, such as operational parameter data (see above), references to a mass balance, major equipment items, connections with other systems, identifications, such as process stream names, process piping, and major bypass and recirculation (recycle) streams.
  • They typically do not include minor elements, such as minor bypass lines, instrumentation and details thereof, controllers like level or flow controllers, pipe classes or piping line numbers, isolation and shutoff valves, maintenance vents and drains, relief and safety valves, and flanges, though this is not a general rule.
  • Process flow diagrams of multiple process units within a large industrial plant usually contain less detail, as a consequence of their size and complexity.
  • a process flow diagram can be computer generated, such as from process simulators, using CAD packages, or using flow chart software with a library of chemical engineering symbols. Rules and symbols are available from standardization organizations such as DIN, ISO or ANSI as mentioned above.
  • process flow diagrams may be produced on large sheets of paper.
  • process flow diagrams of many commercial processes can be found in literature, specifically in encyclopedias of chemical technology, although some might be outdated. More recent ones can be found on-line.
  • these process flow diagrams relate to a pixel-oriented diagram, that is, wherein the diagram is present as an image as such, without the details of the image being incorporated as separate items or the like.
  • the meaning of or information relating to various elements in the image in the real world do not form part of the image; as mentioned, often the diagrams are not even digitized at all. Also digitization of small elements in such diagrams may form a problem.
  • ML machine learning models
  • WO 2021/145138 A1 recites a display device which acquires data on a plurality of devices installed in a facility, and stores, in an associated data memory unit, corresponding relationships between the devices installed in the facility and components in drawing data in which the devices installed in the facility are drawn as the components. Further, the display device, upon receiving a designation of a specific component among the plurality of components in the drawing data, selects a specific device corresponding to the specific component using the corresponding relationships stored in the associated data memory unit, and displays the data on the specific device and data on a device group having a causal relationship with the specific device in association with each other.
  • the document may be considered as an example of the prior art identified above, showing some of the basic concepts for digitization in a rather basic form.
  • the present invention relates to an improved system and method for analyzing a (chemical) process and providing a digitized set-up which overcomes one or more of the above disadvantages, without jeopardizing functionality and advantages.
  • the present invention relates in a first aspect to a system for analyzing a chemical process, which system in principle can be used for any process, comprising a computer memory provided with a digital representation of a directed graph representation of the chemical process, the graph representation comprising elements selected from apparatuses, flow modifiers, devices, process steps, flows, pipelines, signal lines, pressure regulators, temperature regulators, concentration regulators, chemical species regulators, controllers, and combinations thereof, and interactions between these elements, and a data processor provided with a computer program which, when running on the data processor, provides trained machine learning, which is trained using a selection of a training dataset comprising directed graph representations of chemical processes and/or string representations of the directed graphs (such as SFILES) and resulting directed graphs and nodes representing elements, and annotated versions thereof, and as this typically is the training data set of the object detection algorithm, it typically includes the location of the objects on the image, e.g., through a bounding box, or a pixel-based mask, and the type of equipment, provides the
  • a bounding box may be considered a box, the mask may be considered a flexible form based on pixels. So one can cut out objects accurately.
  • object detection architecture, object detection performance metrics, and skeletonization are used.
  • a system is provided which solves one or more of the above disadvantages.
  • the present system and likewise method, provide a system that detects unit operations and their connectivity in process flowsheets, such as chemical process flows.
  • a directed graph is made therefrom. Therewith a full digitization is provided.
  • the graph can be read automatically into a process simulation, such as process simulation software.
  • a model of the graph can be created automatically.
  • the graph may be considered as a knowledge graph.
  • certain elements may be cut out, such as by using a mask, in particular for cutting out unit operations.
  • a neural network or the like may be used, in particular for learning.
  • auto-completion of graphs that are still to be made, such as of chemical flowsheets, is provided.
  • reinforcement learning and graph representation may be used.
  • a suitable programming environment is Python. No graphical user interface is required. The graph results are found to be more accurate compared to prior art methods, and also more meaningful, that is representing the real environment better. It is also found to scale better.
  • the present invention relates to a method of providing a digitized process set-up, the digitized process set-up with a sequence of at least two process steps, which sequence may be a linear sequence or a circular sequence or multiple cycles or a combination thereof, wherein the at least two process steps are selected from a chemical process step, a physical process step, a biological process step, and a micro-biological process step, in particular wherein process steps are selected from heating, cooling, flowing, reacting, mixing, contacting, depositing, annealing, separating, adding, removing, filtering, crystallizing, phase-separating, distilling, oxidizing, reducing, hydrogenating, de-hydrogenating, polymerizing, poly-condensing, esterifying, alkylating, de-alkylating, aminating, halogenating, sulfonating, nitrifying, de-hydrating, hydrolysing, and melting, comprising optically reading an image of a process set-up, digitizing said optical
  • each node individually is selected from an end node, an intermediate node, and an intersection node, using artificial intelligence, identifying at least one physical object to each node in the directed graph, using artificial intelligence, identifying at least one process path, which may be referred to as interaction, or edge, or connection, between each first node and each second node of the plurality of nodes, and using rule-based ontology, in particular rule-based ontology obtained from a data model, such as ONTOCAPE, supplementing (also referred to as enriching) the directed graph of the digitized process set-up with the at least one process path and identified objects, or vice versa, in particular wherein the process is a chemical process.
  • the present invention relates to a use of the digitized process set-up for optimizing the process set-up, for forming a digital twin of the process set-up, for linking the process set-up to operational data, or for building a model of the process set-ups.
  • present system may comprise instructions for carrying out the present method.
  • the present invention is also a topic of to be published scientific papers, entitled “Digitization of chemical process flowsheets using computer vision on big data” and “LEARNING FROM FLOWSHEETS: A GENERATIVE TRANSFORMER MODEL FOR FLOWSHEET COMPLETION”, which references and their contents are incorporated by reference.
  • the present invention relates in a first aspect to a system for analyzing a chemical process.
  • artificial intelligence is based on a convolutional neural network or a neural network with a transformer architecture.
  • An example of such a process flow is given in fig. 3.
  • artificial intelligence is trained on labelled data.
  • each node is provided with supplementary actors, wherein the supplementary actors are selected from chemical species, pressure, temperature, flow, concentration, controls, reactant, catalyst, product, pH values, composition, physical or chemical states, enzyme, biological species, and nucleic acid sequence or part thereof.
  • An example is given in fig. 4.
  • a layout of the process set-up is made comprising physical objects, the physical objects selected from apparatuses, in particular wherein apparatuses are selected from a tank, a column, a reflux, a reboiler, a boiler, a controller, a valve, a cooler, a mixer, a heater, a heat exchanger, a furnace, a filter, a splitter, a phase separator, an absorber, a flash unit, a reactor, a pump, a flow controller, a compressor, and a vessel.
  • a layout of the process set-up is made comprising chemical objects, the chemical objects selected from chemical species, catalysts, solvents, inert species, reactants, carriers, stabilizers, buffers, intermediate products, non-reactants, oxidants, and reductants.
  • the directed graph is supplemented with a standard process model.
  • An example is given in fig. 7.
  • nodes or actors are auto-completed.
  • An example is given in fig. 8.
  • a novel method is provided to learn from (chemical) process flowsheets and provide flowsheet structure recommendations, such as for engineers performing process synthesis.
  • the method may recommend one or multiple process nodes, their connectivity, and attributes or alternative process topologies (referred to as “auto-completion” or “auto-correction”).
  • inventors created two data sets, the first one consisting of synthetically generated and the second one consisting of real flowsheets in graph format.
  • inventors automatically generated the corresponding text-based SFILES 2.0 data sets.
  • the present inventors pre-trained a generative Transformer language model on the data set of synthetically generated flowsheets and fine-tuned it on the data set of real flowsheets.
  • the trained generative Transformer model is capable of learning the grammatical structure of the SFILES 2.0 language and the patterns contained in the flowsheet topologies. Consequently, the results demonstrate that using the trained model for causal language modelling is a strategy to auto-complete flowsheet topologies.
  • the input of the machine learning model may be a graph or string representation of a flowsheet.
  • the output of the machine learning model may be a graph or string representation of a flowsheet or a part of a flowsheet.
  • Using beam search as the decoding strategy yields the highest probability flowsheet completion.
  • the top-p sampling decoding strategy is a promising addition to beam search.
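The two decoding strategies named above can be illustrated with a minimal sketch. The next-token table below is a hypothetical toy model with made-up probabilities and SFILES-like token names; it stands in for the trained Transformer and is not the inventors' model.

```python
import math

# Hypothetical toy model: next-token probabilities depend only on the last token.
# Token names and numbers are invented for illustration.
NEXT = {
    "(raw)":  {"(hex)": 0.6, "(pp)": 0.4},
    "(hex)":  {"(r)": 0.7, "(dist)": 0.3},
    "(pp)":   {"(r)": 0.9, "(dist)": 0.1},
    "(r)":    {"(dist)": 0.8, "(prod)": 0.2},
    "(dist)": {"(prod)": 1.0},
}

def beam_search(start, steps, beam_width=2):
    """Keep the beam_width most probable partial sequences at every step."""
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, logp in beams:
            options = NEXT.get(tokens[-1])
            if not options:  # finished sequence: carry it over unchanged
                candidates.append((tokens, logp))
                continue
            for tok, p in options.items():
                candidates.append((tokens + [tok], logp + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

def top_p_filter(probs, p=0.9):
    """Smallest set of tokens whose cumulative probability reaches p, renormalized."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        total += prob
        if total >= p:
            break
    return {tok: prob / total for tok, prob in kept}

best_tokens, best_logp = beam_search("(raw)", steps=4)[0]
nucleus = top_p_filter(NEXT["(r)"], p=0.5)  # candidates a sampler would draw from
```

Beam search returns one deterministic, high-probability completion, whereas sampling from the filtered distribution can offer a user several alternative completions.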
  • FIGS 1, 2a-d, and 3-22 show aspects of the present invention.
  • Figure 1 shows examples of symbols typically used in process flow diagrams.
  • Fig. 2a shows a non-limitative example of a process flow diagram, having to a certain extent arbitrary elements shown therein.
  • Figure 2b shows a graph representing the process flow diagram of fig. 2a.
  • Figure 2c shows a fully digitized process flow diagram, according to the graph of fig. 2b, and the process flow diagram of fig. 2a.
  • Figure 2d shows schematically the method of providing a digitized process set-up, the digitized process set-up with a sequence of at least two events, wherein the at least two events are selected from a chemical event, a physical event, a biological event, and a micro-biological event.
  • the process starts with the process flow diagram of fig. 2a, which is digitized.
  • a deep learning model which may be based on convolutional neural networks
  • a supervised learning approach such as wherein the model is trained on labeled data, such as that of figure 3.
  • Figure 4 shows a complex process flow diagram which is digitized through computer vision, according to the invention.
  • Figures 5-6 show a use of an advanced model with a mask, in addition to the present method or system.
  • the advanced model can identify a pixel-based mask for each object detected.
  • the advanced model may be based on a Mask R-CNN architecture. Therewith basically unit operations are cut out more accurately than in a bounding box approach.
  • Figure 7 shows automatic generation of UniSim models from process flow diagrams.
  • Figure 8 shows auto-completion of an exemplary process flow diagram.
  • For the chemical species H2 and CO2, which are mixed in a first step (dashed oval), the present system provides suggestions for addition of a next step, apparatus, parameters, etc. (dashed-dotted oval).
  • typically used elements are provided as optional selections at the right hand side of the screen (dashed dark oval). A user may select items from the pictograms on the right.
  • FIG. 2a shows an example flowsheet of a proposed Cumene production plant. The illustration was slightly altered, the flow structure however is kept. Via a procedure known as information extraction inventors automatically retrieved information of the chemical process representation in structured formats from unstructured data through several different methods.
  • Two-stage detectors contain a model that determines regions of interest with high probabilities of containing objects and a second model that classifies found regions of interest.
  • one-stage detectors consist of a single network model that simultaneously predicts bounding boxes and classifications.
  • Transfer learning refers to the improvement of model learning in one task by transferring knowledge from a related, previously learned task. With transfer learning, a model can initiate the training process on new data distributions with pre-trained weights, shortening training time and possibly leading to superior performance due to convergence to better optima.
  • Backbone models in detection models are usually pre-trained on large datasets such as the ImageNet classification challenge dataset or the Common Objects in Context (COCO) dataset. During transfer training, parts of the network are frozen, meaning their parameters are not updated during training.
  • Data augmentation methods are techniques used to increase the size of a limited dataset by adding modified copies of the data. Many augmentation techniques have been applied to image datasets in the literature, such as geometric transformations (e.g., stretching, skewing), flipping, color changes, cropping, rotation, translation, noise injection, random erasing, blurring, and more. Not all data augmentation techniques may apply to every dataset in every domain. Augmentations could reflect real varieties found in a data distribution.
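As a minimal sketch of two of the listed augmentations, the functions below flip a small binary image together with its bounding boxes and inject noise. The image, the box format (inclusive pixel coordinates) and the function names are assumptions made for illustration.

```python
import random

def horizontal_flip(image, boxes):
    """Mirror an image (list of pixel rows) and its boxes (x_min, y_min, x_max, y_max)."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    # Inclusive pixel coordinates: x maps to width - 1 - x, so min and max swap roles.
    new_boxes = [(width - 1 - x1, y0, width - 1 - x0, y1) for x0, y0, x1, y1 in boxes]
    return flipped, new_boxes

def add_noise(image, flip_probability, rng):
    """Salt-and-pepper noise: invert each binary pixel with the given probability."""
    return [[1 - px if rng.random() < flip_probability else px for px in row]
            for row in image]

image = [[1, 0, 0, 0],
         [1, 1, 0, 0]]
boxes = [(0, 0, 1, 1)]  # hypothetical unit operation in the left half

flipped_image, flipped_boxes = horizontal_flip(image, boxes)
noisy_image = add_noise(image, flip_probability=0.1, rng=random.Random(0))
```

The annotation has to be transformed together with the image, otherwise the augmented copy would carry a wrong label.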
  • Feature pyramid networks are a set of deep CNNs which construct features at different scales while keeping computation feasible. Feature pyramids are an important component in detection systems that facilitate the recognition of objects at different scales. The main objective of feature pyramids in a model is to allow a neural network to learn high to low-level features and independently make predictions at each level.
  • the objective of the object detection model is to localize and classify objects within images.
  • two performances are typically evaluated, the placement of the bounding box around the object, and the classification accuracy of said bounding box.
  • the most common performance evaluation metrics used herein are the Average Precision (AP) and Mean Average Precision (mAP), both of which consider correct, missed and false predictions in their respective calculation.
  • the mAP is the primary metric used to measure a detector’s accuracy over all the object categories in a dataset.
  • the mAP is found to depend on the Intersection over Union (IoU) threshold chosen, since it determines when a prediction is considered correct.
  • the Pascal VOC AP metric, also known as AP50, is the mAP calculated at an IoU threshold of 0.5.
  • the COCO mAP metric is the average of mAPs with IoU thresholds in the range of [0.5:0.05:0.95]. Comparing the AP50 to the COCO mAP provides valuable insights into the performances of the classification and bounding box placement tasks individually, as a high AP50 and a low mAP suggest that objects are correctly but imprecisely detected.
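The role of the IoU threshold can be made concrete with a short sketch. The box format and the two example boxes are hypothetical; only the IoU definition and the threshold range come from the text above.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# COCO-style thresholds 0.50, 0.55, ..., 0.95
COCO_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]

ground_truth = (10, 10, 50, 50)
prediction = (15, 10, 55, 50)  # hypothetical detection, shifted by 5 pixels

overlap = iou(ground_truth, prediction)
# Thresholds at which this prediction still counts as a correct detection
accepted = [t for t in COCO_THRESHOLDS if overlap >= t]
```

Here the shifted box counts as correct at the AP50 threshold but is rejected at the strictest thresholds, which is the "correct but imprecise" case that lowers the COCO mAP relative to AP50.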
  • Skeletonization produces a compact representation of objects in images by reducing them to their medial axis, effectively transforming shapes to curves of a 1-pixel thickness while preserving their connectivities.
  • Figure 9 presents an example of distillation column skeletonizations. Imperfections in skeletonization can be observed when applying it to unit operations.
  • skeletonization facilitates the application of a graph search algorithm through a rule-based approach.
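The text does not name a specific skeletonization algorithm. As an assumption, the sketch below uses the classical Zhang-Suen thinning scheme on a small binary image (1 = foreground) to show how a thick line is reduced to a thin curve.

```python
def zhang_suen(image):
    """Thin a binary image (list of rows, 1 = foreground) with Zhang-Suen thinning."""
    img = [row[:] for row in image]
    h, w = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: clockwise, starting from the pixel directly above
        coords = [(r - 1, c), (r - 1, c + 1), (r, c + 1), (r + 1, c + 1),
                  (r + 1, c), (r + 1, c - 1), (r, c - 1), (r - 1, c - 1)]
        return [img[y][x] if 0 <= y < h and 0 <= x < w else 0 for y, x in coords]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(h):
                for c in range(w):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of foreground neighbours
                    # number of 0 -> 1 transitions around the pixel
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((r, c))
            for r, c in to_clear:  # delete simultaneously after each sub-step
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img

# Hypothetical stream drawn as a 3-pixel-thick horizontal bar
bar = [[1 if 1 <= r <= 3 and 1 <= c <= 7 else 0 for c in range(9)] for r in range(5)]
thin = zhang_suen(bar)
```

The thinned bar is a one-pixel line that is slightly shorter than the original, which is one of the imperfections that make it useful to fill detected bounding boxes before thinning.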
  • large amounts of valuable and diverse data for training, testing, and validation were used.
  • As flowsheet digitization represents a gap in current literature, inventors further introduce a novel categorization based on visual and functional features with examples. Process flow diagrams were retrieved by applying the flowsheet recognition algorithm.
  • the algorithm downloads all full text papers from a given source and extracts all images from said source. Then, a CNN classifier decides whether each figure is a flowsheet or not. Inventors applied the algorithm to diverse sources, such as a number of journals and process engineering education books, and retrieved about one thousand flowsheets. Very few figures were wrongly classified as flowsheets, which is in accordance with the high accuracy of the algorithm. The diversity in data is found imperative, as ML models regularly fail to extrapolate outside their trained data distribution, meaning the object detection algorithm would fail to properly detect unseen ways of illustrating unit operations.
  • class decomposition within unit operation types was utilized to increase model performance and to create a more consistent dataset.
  • Class decomposition describes the method of splitting classes into different, more homogeneous sub-classes, decomposing the detection problem into a larger group of separate classes with similar topological characteristics. Such a technique can serve many benefits to supervised learning models by improving the class-to-instance association. Each sub-class exhibits more similar patterns within itself and more distinguishable patterns to other classes.
  • the class decomposition reasoning was based on two observations. Firstly, many classes contain clearly identifiable sub-classes of very different illustrations for the same equipment.
  • the category pump was sub-divided into different categories.
  • Another observation made on the flowsheets was that sub-classes could allow for more detailed information to be extracted from the data.
  • the unit operation categorization proposed in literature was a single valve, while inventors found a large variety of valves with different functionalities, such as control valves or check valves. Thus, further decomposing provided more information about used equipment.
  • the mined flowsheets, comprising actors, objects, nodes, and interactions, were labeled using domain expertise and contextual information.
  • the open-source graphical annotation tool LabelImg was utilized. The quality of data provided to the object detection model is found to directly impact the predicting performance of the model. Thus, correct and consistent annotation of objects in the data is found important.
  • the used digitization approach may involve several distinct steps from an image to a graph representation.
  • an object detection model is used to detect unit operations, such as those of figure 1. Text as well as arrowheads indicating stream directions may be detected by a second object detection model.
  • the found bounding boxes of arrowheads and unit operations are filled before skeletonization is applied, to facilitate skeletonization. With the skeletonized image and the locations of unit operations known, connectivity among unit operations is explored. In the following, inventors will discuss the steps of unit operation detection and stream recognition in more detail.
  • Repeat factor sampling allows images with underrepresented categories to be used more often during training, to account for slower learning effects. Repeat factor sampling is especially important for our dataset, as some unit operations are seldom found in literature, while others, such as heat exchangers or pumps, are naturally often present. Hence, without repeat factor sampling, an imbalance in performance can occur. Furthermore, to increase generalization, several augmentation techniques are applied during training. Thus, a set of applicable augmentation methods was identified, and the effect of data augmentation on the object detection model performance was investigated. Specifically, the techniques of flipping, adding noise, blurring, and repetition of rare objects were applied and studied.
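The text does not give a formula for the repeat factor. As an assumption, the sketch below uses the formulation introduced with the LVIS dataset, in which a category with image frequency f receives the factor max(1, sqrt(t / f)) and an image takes the largest factor of its categories; the threshold t and the example annotations are hypothetical.

```python
import math

def repeat_factors(image_categories, threshold=0.5):
    """Per-image repeat factor: max over its categories of max(1, sqrt(threshold / f))."""
    n_images = len(image_categories)
    counts = {}
    for cats in image_categories:
        for c in set(cats):  # count each category once per image
            counts[c] = counts.get(c, 0) + 1
    category_factor = {
        c: max(1.0, math.sqrt(threshold / (n / n_images))) for c, n in counts.items()
    }
    return [max((category_factor[c] for c in cats), default=1.0)
            for cats in image_categories]

# Hypothetical annotations: pumps are common, the centrifuge is rare
dataset = [
    ["pump", "heat exchanger"],
    ["pump"],
    ["heat exchanger", "pump"],
    ["pump", "centrifuge"],
]
factors = repeat_factors(dataset)
```

An image with a factor above 1 would be drawn correspondingly more often per epoch, so the flowsheet containing the rare centrifuge is seen more frequently than the others.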
  • the detection of unit operations is the first step in the digitization scheme. After unit operations have been successfully detected, their bounding boxes are processed. Bounding boxes with significant overlap, measured in intersection over union, are compared and the one with the lower confidence score is removed. This is necessary because, in rare cases, the object detection algorithm detects the same object twice with different categorizations. Afterwards, detected unit operations with a confidence score lower than a threshold are converted to a category X, indicating a low confidence of the model. The flowsheet image is binarized and then reduced to one-pixel thin layers of objects, allowing stream recognition. Once the PFD has gone through the first stage, the skeletonized flowsheet is prepared for the graph search algorithm. First, the skeletonized image is represented as a graph in which each pixel is a node.
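The post-processing of detections described above can be sketched as follows. The detection format and both threshold values are assumptions chosen for illustration; the text does not state the values used.

```python
def iou(a, b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def postprocess(detections, iou_threshold=0.8, min_confidence=0.5):
    """Drop the lower-scored box of strongly overlapping pairs, then relabel weak ones."""
    kept = []
    # Visit detections from most to least confident, regardless of category
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        if all(iou(det["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(dict(det))
    for det in kept:
        if det["score"] < min_confidence:
            det["category"] = "X"  # low-confidence category
    return kept

# Hypothetical detector output: the same symbol was found twice with different labels
detections = [
    {"box": (0, 0, 10, 10), "category": "pump", "score": 0.9},
    {"box": (0, 0, 10, 9), "category": "valve", "score": 0.6},
    {"box": (20, 0, 30, 40), "category": "column", "score": 0.3},
]
cleaned = postprocess(detections)
```

Unlike per-class non-maximum suppression, the overlap check here runs across categories, since the duplicates described in the text carry different labels.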
  • each node has a maximum of 8 edges corresponding to the 8 neighboring pixels. Additionally, each node in the graph contains information on its color and whether it is inside an object bounding box or not.
  • the program checks for white pixel neighbors along the bounding box border, identifying possible paths. For each path, the algorithm traverses the graph along neighboring white pixels and continues the search. A graphical representation of this procedure is shown in Figure 10. A connection between two objects is established when the algorithm reaches a pixel belonging to a new unit operation. If the exploration reaches a dead end, it creates an “In/Out” stream object, indicating an incoming or outgoing stream of the process.
  • the algorithm moves to the next unit and repeats the search, storing information about all detected connections.
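A minimal sketch of this search is given below, assuming the skeleton is a grid in which 1 marks a white stream pixel and each unit operation is given by an inclusive bounding box. The grid, the box names and the use of breadth-first search are illustrative assumptions, and stream direction from arrowheads is ignored here.

```python
from collections import deque

# The 8 neighbouring pixel offsets
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def owner(x, y, boxes):
    """Name of the unit whose bounding box contains pixel (x, y), else None."""
    for name, (x0, y0, x1, y1) in boxes.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return None

def find_connections(skeleton, boxes):
    height, width = len(skeleton), len(skeleton[0])

    def is_stream(x, y):
        return (0 <= x < width and 0 <= y < height
                and skeleton[y][x] == 1 and owner(x, y, boxes) is None)

    connections = set()
    for unit, (x0, y0, x1, y1) in boxes.items():
        # Stream pixels touching the box are the starting points of possible paths
        starts = {(x + dx, y + dy)
                  for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)
                  for dx, dy in NEIGHBOURS if is_stream(x + dx, y + dy)}
        seen = set()
        for start in sorted(starts):
            if start in seen:
                continue
            reached, queue = set(), deque([start])
            seen.add(start)
            while queue:
                x, y = queue.popleft()
                for dx, dy in NEIGHBOURS:
                    nx, ny = x + dx, y + dy
                    other = owner(nx, ny, boxes)
                    if other is not None and other != unit:
                        reached.add(other)  # path arrived at another unit
                    elif is_stream(nx, ny) and (nx, ny) not in seen:
                        seen.add((nx, ny))
                        queue.append((nx, ny))
            if reached:
                connections.update(frozenset((unit, o)) for o in reached)
            else:
                connections.add(frozenset((unit, "In/Out")))  # dead end
    return connections

skeleton = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
boxes = {"A": (0, 0, 1, 2), "B": (5, 0, 6, 2)}  # hypothetical unit operations
connections = find_connections(skeleton, boxes)
```

In this toy grid, the line between the boxes links A and B, and the line leaving B without reaching another unit becomes an "In/Out" stream.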
  • the graph search information is saved on the connections between unit operations.
  • the graph representation of the flowsheet is constructed using the NetworkX open-source Python package. A graph is created with each unit operation as a node and the streams between them as directed edges. Each edge and node in the graph allows for adding attributes, such as associated text and operating conditions, and can be handled for further processing.
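The data structure can be sketched with plain dictionaries, used here only to keep the example free of third-party dependencies; NetworkX's `DiGraph` provides the same operations through `add_node`, `add_edge` and `successors`. Node names and attribute values are hypothetical.

```python
# Plain-dict stand-in for a directed graph with node and edge attributes
nodes = {}  # node name -> attribute dict
edges = {}  # (source, target) -> attribute dict

def add_unit(name, **attributes):
    """Add a unit operation as a node."""
    nodes[name] = dict(attributes)

def add_stream(source, target, **attributes):
    """Add a stream as a directed edge from source to target."""
    edges[(source, target)] = dict(attributes)

def successors(name):
    """Units that directly receive a stream from the given unit."""
    return [target for (source, target) in edges if source == name]

add_unit("hex-1", category="heat exchanger")
add_unit("r-1", category="reactor", text="R-101")          # hypothetical label text
add_unit("dist-1", category="column")
add_stream("hex-1", "r-1", temperature="350 C (made up)")  # hypothetical condition
add_stream("r-1", "dist-1")

downstream_of_reactor = successors("r-1")
```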
  • the present inventors make use of a transformer-model architecture and decoding strategies used for text generation in natural language processing (NLP). Furthermore, the flowsheet representations used are recapped, namely flowsheet graphs and the SFILES 2.0 notation. The latter is used to represent the flowsheet data in a text-based manner in order to enable using NLP models. Transformer-based models increased the performance in several benchmark tasks and also show successful applications beyond the human language.
  • Text may be processed as a sequence of tokens, whereby the tokens are either words or other chunks of the input sequence.
  • Tokenization is typically the first text processing step in NLP and follows a tokenization strategy. After tokenizing the input sequence, each token is converted to a vector by using a learned numerical embedding.
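A minimal sketch of these two steps on an SFILES-like string follows. The token pattern, the example string and the randomly initialized embedding table are assumptions for illustration; in a trained model the embedding values are learned.

```python
import random
import re

# Assumed token types: unit operations "(name)", tags "{tag}", branch brackets
TOKEN_PATTERN = re.compile(r"\([a-z]+\)|\{[a-z0-9]+\}|\[|\]")

def tokenize(sfiles):
    return TOKEN_PATTERN.findall(sfiles)

example = "(raw)(hex)(r)(dist)[{tout}(prod)]{bout}(prod)"  # illustrative string
tokens = tokenize(example)

# Vocabulary: one integer id per distinct token
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = [vocab[t] for t in tokens]

# Embedding table: one vector per vocabulary entry (random stand-in for learned values)
EMBED_DIM = 4
rng = random.Random(0)
embedding = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in vocab]
vectors = [embedding[i] for i in ids]
```

Treating each unit operation as one token, rather than splitting it into characters, keeps sequences short and makes every token a meaningful flowsheet element.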
  • the decoder uses the encoder’s output and the previously generated outputs to compute the output probabilities for the next token.
  • Each encoder layer contains two sub-layers with subsequent layer normalization.
  • Each decoder layer contains three sub-layers with subsequent layer normalization. Since recurrent components are completely removed in the Transformer architecture, before input and output embeddings are passed to the encoder and decoder, respectively, positional encoding is applied.
  • Positional encoding ensures that the information of the order of tokens in the sequence is taken into account.
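The text does not state which positional encoding is used. As an assumption, the sketch below implements the sinusoidal encoding of the original Transformer architecture, which is added to the token embeddings.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sine on even dimensions, cosine on odd dimensions."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k + 1) shares one wavelength
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(embeddings):
    """Add the positional encoding element-wise to a list of embedding vectors."""
    pe = positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(vec, pos)] for vec, pos in zip(embeddings, pe)]

pe = positional_encoding(seq_len=3, d_model=4)
# The same token embedding at two positions becomes distinguishable
shifted = add_positions([[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]])
```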
  • the core components of the Transformer architecture are the attention sub-layers.
  • the calculation of attention takes a query vector q, key vector k, and value vector v for each input token and compares all queries against all keys resulting in scores for query-key compatibility.
  • the compatibility scores are then used as weights to calculate the attention output as a weighted sum of the values.
  • the attention is computed for all inputs of an input sequence in parallel, putting together all query, key, and value vectors in the query matrix Q, key matrix K, and value matrix V. This finally yields a matrix as attention output.
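The attention computation can be sketched in a few lines. The softmax normalization and the scaling by the square root of the key dimension follow the standard Transformer formulation and are assumed here, since the text only describes compatibility scores used as weights; the matrices are toy values.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention for lists of query, key and value vectors."""
    d_k = len(K[0])
    outputs, weights = [], []
    for q in Q:
        # Compatibility of this query with every key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors
        outputs.append([sum(wj * v[c] for wj, v in zip(w, V))
                        for c in range(len(V[0]))])
    return outputs, weights

Q = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical queries
K = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical keys
V = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical values
outputs, weights = attention(Q, K, V)
```

Each output row is a mixture of the value vectors, dominated by the value whose key best matches the query.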
  • multi-head attention is used as self-attention layers in the encoder, as masked self-attention in the decoder, and as encoder-decoder attention to combine the vector embedding of the encoder with the previous decoder outputs.
  • self-attention means that query, key, and value matrices are calculated from the same input sequence. Therefore, the computed attention represents each token and its meaning in the sequence.
  • Self-attention in the encoder considers both the left and right context of each token (bidirectional). In contrast, in the case of masked self-attention in the decoder, only the left context is used, meaning that subsequent positions of each token are masked out (unidirectional).
  • Each decoder layer consists of a masked multi-head self-attention sub-layer and a feed-forward sub-layer. Since the encoder is left out, the encoder-decoder attention sub-layer is left out, too.
  • Several decoding strategies may be used.
  • Figure 11 relates to a simple chemical process flowsheet with branchings, a recycle stream, and different mass trains.
  • Figure 12 is obtained, being a graph representation of the flowsheet in Figure 11.
  • Two consecutive unit operations in the string imply a normal stream connection.
  • all but the last branch are noted in brackets.
  • Recycles are noted by using numbers # to reference the recycle start node and <# to reference the recycle end node.
  • tags in braces are used to indicate whether the branch is a top or bottom product.
  • the second branch is inserted in the string, surrounded by ⁇ &
  • Multi-stream heat exchangers are separated in one node per stream compartment and marked with a number in braces, capturing which streams are heat integrated.
  • Initialization Feed(s); Reaction; Thermal separation (distillation, rectification); Countercurrent separation (absorption, extraction); Filtration (gas, liquid); Centrifugation; and End: Purification.
  • the last three blocks relate to a procedure for multiple branches.
  • the blocks represent, from left to right: Initialize graph with feed(s); First subprocess category + pattern in category; Next subprocess category for each stream + pattern in category; and Purification of stream. Optional: random heat integration or recycle.
  • the selection of the first sub-process, excluding purification is a Markov transition with fixed probabilities (transition probabilities do not depend on previous unit operations).
  • a set of patterns (not shown here) specifying how the inlet and outlet stream(s) are processed, e.g., with additional temperature or pressure change unit operations.
  • the sub-processes lead to several outlet streams, in the following referred to as branches. For each branch, we transition to the "Next sub-process" state followed by a Markov transition to the next sub-process. This selection differs from the first sub-process selection by the additional purification sub-process. Note that once a branch reaches the purification step, it is determined to end as a product. After each branch has ended in the purification step, the flowsheet graph generation is complete.
  • Figures 14-15 show a completed flowsheet using beam search.
  • Figure 16 schematically illustrates the auto-completion of flowsheets using the Generative Flowsheet Transformer. Inventors achieve this by specifying an input sequence in SFILES 2.0 that represents the incomplete flowsheet and pass it to the Generative Flowsheet Transformer which auto-completes the sequence in SFILES 2.0 language.
  • the completed flowsheets correspond to the completed SFILES 2.0 sequences with the Generative Flowsheet Transformer.
  • Figures 17-21 show completed flowsheets using top-p sampling.
  • Table 1/Fig. 22 shows exemplary Unit operations and abbreviations in SFILES 2.0.
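The attention computation summarized in the list above (query-key compatibility scores used as weights for a weighted sum of values, optionally masked to the left context) can be illustrated with a minimal sketch. The scaling by the square root of the key dimension follows the standard Transformer formulation and is an assumption here, as are the function names; the sketch uses plain Python lists rather than the matrix form.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V, causal=False):
    """Scaled dot-product attention; Q, K, V hold one row vector per token.

    With causal=True, token i only attends to positions <= i, as in the
    masked self-attention of the decoder.
    """
    d_k = len(K[0])
    output = []
    for i, q in enumerate(Q):
        # Compatibility of query i with every key.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if causal:
            # Mask out subsequent positions (the right context).
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Attention output: weighted sum of the value vectors.
        output.append([sum(w * v[c] for w, v in zip(weights, V))
                       for c in range(len(V[0]))])
    return output


# Toy two-token example (values chosen only for illustration).
Q = K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
masked = attention(Q, K, V, causal=True)
```

In the masked case the first token can only attend to itself, so its output equals its own value vector.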

Abstract

The present invention is in the field of physical processes, chemical processes, biological processes, and microbiological processes in general, apparatuses for such processes, such as for boiling, for separation, for mixing, for dissolving, for reacting, for controlling, and in particular a process comprising a plurality of such apparatuses and processes or process steps, as well as the interaction between said apparatuses and processes or process steps, such as in terms of flows of chemicals between apparatuses. To indicate such general flow aspects a process flow diagram may be used. The process flow diagram displays the relationship between major equipment of a plant facility and does not show minor details.

Description

Flowsheet digitization with computer vision, automatic simulation, and flowsheet (auto)completion with machine learning
FIELD OF THE INVENTION
The present invention is in the field of physical processes, chemical processes, biological processes, and microbiological processes in general, apparatuses for such processes, such as for boiling, for separation, for mixing, for dissolving, for reacting, for controlling, and in particular a process comprising a plurality of such apparatuses and processes or process steps, as well as the interaction between said apparatuses and processes or process steps, such as in terms of flows of chemicals between apparatuses. To indicate such general flow aspects a process flow diagram may be used. The process flow diagram is aimed to visually display the relationship between major equipment of a plant facility and does not show minor details.
BACKGROUND OF THE INVENTION
In the representation of physical processes, chemical processes, biological processes, and microbiological processes in general, apparatuses for such processes, and the interaction between said apparatuses and processes or process steps, process flow diagrams may be used. A first step leading to a construction of a process plant and its use in the manufacture of a product is typically the conception of a process, typically involving process steps. The process concept may then be visualized by a process flow diagram, representing the process steps, and main details thereof, or likewise, a method of producing. Process design can then proceed on the basis of the process flow diagram chosen. Therein also physical properties of the apparatuses are incorporated. Fig. 1 shows some typical elements and symbols used. The elements of such flow diagrams, as well as aspects thereof, such as implementation, typically comply with one or more of the following standards: ISO 15519-1:2010(en): Specification for diagrams for process industry — Part 1: General rules; ISO 15519-2:2015(en): Specifications for diagrams for process industry — Part 2: Measurement and control; ISO 10628-1:2014(en): Diagrams for the chemical and petrochemical industry — Part 1: Specification of diagrams; ISO 10628-2:2012(en): Diagrams for the chemical and petrochemical industry — Part 2: Graphical symbols; ANSI Y32.11: Graphical Symbols For Process Flow Diagrams (withdrawn 2003); and SAA AS 1109: Graphical Symbols For Process Flow Diagrams For The Food Industry. These process flow diagrams may be used to perform steady-state and non-steady-state heat and mass balancing, sizing and costing calculations, such as for a chemical process. This is considered an essential and core part of process design. Nowadays a computer or the like is used therein, in particular for supporting the calculations, and hence process design.
Typical steps in process design are an initial step, which may be referred to as synthesis, a step for optimizing the process design, which may involve heat and material balance, sizing of process equipment, and cost calculations, a control step for assessing topics such as safety and operability, and a final step, wherein the process design or parts thereof are further optimized in view of a previous step. In optimization structural [physical] elements of the process design can be optimized, as well as particular settings in the process, such as parameters, e.g. temperature, pressure, flow rate, density, etc., in particular in view of interaction between process steps and apparatuses involved. Initially one could change a selection of the apparatus(es) involved, and then one could change the values of parameters, such as temperature and pressure. Parameter optimization is considered to be a more advanced stage. As mentioned, process flow diagrams play an important role in process design.
Typically, process flow diagrams of a process may include various elements, such as operational parameter data (see above), references to a mass balance, major equipment items, connections with other systems, identifications, such as process stream names, process piping, and major bypass and recirculation (recycle) streams. They typically do not include minor elements, such as minor bypass lines, instrumentation and details thereof, controllers like level or flow controllers, pipe classes or piping line numbers, isolation and shutoff valves, maintenance vents and drains, relief and safety valves, and flanges, though this is not a general rule. Process flow diagrams of multiple process units, within a large industrial plant, may as a consequence of the size and complexity usually contain less detail.
Nowadays a process flow diagram can be computer generated, such as from process simulators, using CAD packages, or using flow chart software using a library of chemical engineering symbols. Rules and symbols are available from standardization organizations such as DIN, ISO or ANSI as mentioned above. In view of complexity of a typical process, process flow diagrams may be produced on large sheets of paper. However, many non-digitized versions of process flow diagrams still exist, and often these are used in valuable and critical processes. Process flow diagrams of many commercial processes can be found in literature, specifically in encyclopedias of chemical technology, although some might be outdated. More recent ones can be found on-line. Typically these process flow diagrams relate to a pixel-oriented diagram, that is, wherein the diagram is present as an image as such, without the details of the image being incorporated as separate items or the like. In other words, the meaning of or information relating to various elements in the image in the real world does not form part of the image; as mentioned, often the diagrams are not even digitized at all. Also digitization of small elements in such diagrams may form a problem. Although promising results have been reported from previous studies, some shortcomings of prior research also become apparent. Firstly, all machine learning (ML) models in literature are typically trained on data sets from a single source, mostly a company cooperating with researchers, or even on synthesized data sets. Unsurprisingly, the accuracy of such models is near perfection, as the data exhibits little variation. It needs to be acknowledged that retrieving piping and instrumentation diagrams (P&IDs) is not trivial, as companies naturally rarely publish their documentation.
It is however doubtful that such models would generalize well to other data distributions, for instance diagrams generated with other CAD editors, making developed digitization approaches very isolated niche solutions. Secondly, most symbol data sets only consist of few categories, not reflecting the variety of equipment used in process industries. As a consequence of single source data sets, few different symbols are categorized, leading to a lack of a complete symbol categorization. Thirdly, the amount of data used for training does not reflect the data driven nature of deep learning (DL) models. DL models are commonly trained on big data. Many DL approaches for P&IDs however rely on very little data with less than a hundred diagrams. Again, a possible explanation for this issue is the lack of publicly available data, combined with the time consuming nature of labeling such diagrams. Lastly, while considerable effort has been made towards the task of digitizing P&IDs, to the best of our knowledge DL powered digitization approaches have not been applied to process flow diagrams (PFDs).
WO 2021/145138 A1 recites a display device which acquires data on a plurality of devices installed in a facility, and stores, in an associated data memory unit, corresponding relationships between the devices installed in the facility and components in drawing data in which the devices installed in the facility are drawn as the components. Further, the display device, upon receiving a designation of a specific component among the plurality of components in the drawing data, selects a specific device corresponding to the specific component using the corresponding relationships stored in the associated data memory unit, and displays the data on the specific device and data on a device group having a causal relationship with the specific device in association with each other. The document may be considered as an example of the prior art identified above, showing some of the basic concepts for digitization in a rather basic form.
Analyzing process flow diagrams, for example in terms of functionality, digitally communicating process flow diagrams, and making flow diagrams thus all appear to be in a stage wherein room for improvement is present.
The present invention relates to an improved system and method for analyzing a (chemical) process and providing a digitized set-up which overcomes one or more of the above disadvantages, without jeopardizing functionality and advantages.
SUMMARY OF THE INVENTION
The present invention relates in a first aspect to a system for analyzing a chemical process, which system in principle can be used for any process, comprising a computer memory provided with a digital representation of a directed graph representation of the chemical process, the graph representation comprising elements selected from apparatuses, flow modifiers, devices, process steps, flows, pipelines, signal lines, pressure regulators, temperature regulators, concentration regulators, chemical species regulators, controllers, and combinations thereof, and interactions between these elements, and a data processor provided with a computer program which, when running on the data processor, provides trained machine learning, which is trained using a selection of a training dataset comprising directed graph representations of chemical processes and/or string representations of the directed graphs (such as SFILES) and resulting directed graphs and nodes representing elements, and annotated versions thereof, and as this typically is the training data set of the object detection algorithm, it typically includes the location of the objects on the image, e.g., through a bounding box, or a pixel-based mask, and the type of equipment, provides the digital representation, which may be regarded as an image, of the directed graph representation of the chemical process in the computer memory as input to the trained machine learning, and the trained machine learning providing in the computer memory the chemical process as directed graph with nodes and edges, which may be considered interconnections between nodes, defining the elements. Basically, a bounding box may be considered a box, whereas the mask may be considered a flexible form based on pixels, so that objects can be cut out accurately. In particular object detection architecture, object detection performance metrics, and skeletonization, are used. Therewith a system is provided which solves one or more of the above disadvantages.
The present system, and likewise method, provide a system that detects unit operations and their connectivity in process flowsheets, such as chemical process flows. A directed graph is made therefrom. Therewith a full digitization is provided. The graph can be read automatically into a process simulation, such as process simulation software. A model of the graph can be created automatically. The graph may be considered as a knowledge graph. In the process of making the graph certain elements may be cut out, such as by using a mask, in particular for cutting out unit operations. A neural network or the like may be used, in particular for learning. In addition auto-completion of to be made graphs, such as of chemical flowsheets, is provided. Therein reinforcement learning and graph representation may be used. A suitable programming environment is Python. No graphical user interface is required. The graph results are found to be more accurate compared to prior art methods, and also more meaningful, that is representing the real environment better. It is also found to scale better.
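By way of illustration only, a directed graph of detected unit operations and their connectivity could be held in memory as sketched below. The class name, method names, and node labels are hypothetical and are not taken from the present system; the sketch merely shows nodes as unit operations and directed edges as streams, including a recycle.

```python
from collections import defaultdict


class FlowsheetGraph:
    """Directed graph: nodes are unit operations, edges are streams."""

    def __init__(self):
        self.unit_type = {}                     # node name -> unit operation type
        self.successors = defaultdict(list)     # node name -> downstream nodes
        self.predecessors = defaultdict(list)   # node name -> upstream nodes

    def add_unit(self, name, unit_type):
        self.unit_type[name] = unit_type

    def add_stream(self, source, target):
        # Both ends must be known unit operations before they are connected.
        if source not in self.unit_type or target not in self.unit_type:
            raise KeyError("add both units before connecting them")
        self.successors[source].append(target)
        self.predecessors[target].append(source)

    def feeds(self):
        """Nodes without an incoming stream."""
        return [n for n in self.unit_type if not self.predecessors.get(n)]

    def products(self):
        """Nodes without an outgoing stream."""
        return [n for n in self.unit_type if not self.successors.get(n)]


# Hypothetical flowsheet: feed -> mixer -> reactor -> column -> two products,
# with a recycle stream from the column back to the mixer.
graph = FlowsheetGraph()
for name, kind in [("feed-1", "feed"), ("mixer-1", "mixer"),
                   ("reactor-1", "reactor"), ("column-1", "column"),
                   ("product-1", "product"), ("product-2", "product")]:
    graph.add_unit(name, kind)
for source, target in [("feed-1", "mixer-1"), ("mixer-1", "reactor-1"),
                       ("reactor-1", "column-1"), ("column-1", "product-1"),
                       ("column-1", "product-2"), ("column-1", "mixer-1")]:
    graph.add_stream(source, target)
```

A structure of this kind can be traversed from the feeds to the products, which is the sort of access a downstream export to a process simulator would need.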
The contribution of this invention is considered manifold. Firstly, inventors developed an extensive catalogue of unit operations in PFDs. As PFDs are only loosely based on a common illustration convention, inventors categorized symbols for unit operations based on their functionality as well as their appearance. Secondly, inventors collected and annotated a large PFD dataset. Inventors mined over 1,000 flowsheets from various sources including scientific literature. Thirdly, inventors developed object detection models that can identify unit operations in PFDs. The present system may be based on a state-of-the-art Faster R-CNN architecture, or a Mask R-CNN architecture. The present results show that the proposed system has competitive performance on the diverse data set. Lastly, inventors improved a pixel-based search algorithm to the specifics of PFD illustrations, such as different stream intersection illustrations and text in unit operations.
In a second aspect the present invention relates to a method of providing a digitized process set-up, the digitized process set-up with a sequence of at least two process steps, which sequence may be a linear sequence or a circular sequence or multiple cycles or a combination thereof, wherein the at least two process steps are selected from a chemical process step, a physical process step, a biological process step, and a micro-biological process step, in particular wherein process steps are selected from heating, cooling, flowing, reacting, mixing, contacting, depositing, annealing, separating, adding, removing, filtering, crystallizing, phase-separating, distilling, oxidizing, reducing, hydrogenating, de- hydrogenating, polymerizing, poly-condensing, esterifying, alkylating, de- alkylating, aminating, halogenating, sulfonating, nitrifying, de-hydrating, hydrolysing, and melting, comprising optically reading an image of a process set-up, digitizing said optically read process set-up forming a digitized image, which typically comprises pixels, using artificial intelligence, making a directed graph of the digitized image of the process set-up, the directed graph comprising a plurality of unique nodes and at least one [biological-]physical-chemical interaction between each first node and each second node of the plurality of nodes, and optionally at least one direction of said interaction, such as shown in figs. 
2a-2d, wherein each node individually is selected from an end node, an intermediate node, and an intersection node, using artificial intelligence, identifying at least one physical object to each node in the directed graph, using artificial intelligence, identifying at least one process path, which may be referred to as interaction, or edge, or connection, between each first node and each second node of the plurality of nodes, and using rule-based ontology, in particular rule-based ontology obtained from a data model, such as ONTOCAPE, supplementing (also referred to as enriching) the directed graph of the digitized process set-up with the at least one process path and identified objects, or vice versa, in particular wherein the process is a chemical process.
In a third aspect the present invention relates to a use of the digitized process set-up for optimizing the process set-up, for forming a digital twin of the process set-up, for linking the process set-up to operational data, or for building a model of the process set-ups.
In a further aspect the present system may comprise instructions for carrying out the present method.
Thereby the present invention provides a solution to one or more of the above mentioned problems.
The present invention is also the topic of scientific papers that are to be published, entitled “Digitization of chemical process flowsheets using computer vision on big data” and “LEARNING FROM FLOWSHEETS: A GENERATIVE TRANSFORMER MODEL FOR FLOWSHEET COMPLETION”, which references and their contents are incorporated by reference.
Advantages of the present description are detailed throughout the description. References to the figures are not limiting, and are only intended to guide the person skilled in the art through details of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates in a first aspect to a system for analyzing a chemical process.
In an exemplary embodiment of the present method in the process set-up objects are localized and classified, such as a unit operation, an arrow, an intersection, a control unit, and text. An example of such a process flow is given in fig. 3.
In an exemplary embodiment of the present method artificial intelligence is based on a convolutional neural network or a neural network with a transformer architecture. An example of such a process flow is given in fig. 3.
In an exemplary embodiment of the present method artificial intelligence is trained on labelled data. An example of such a process flow is given in fig. 3. In an exemplary embodiment of the present method each node is provided with supplementary actors, wherein the supplementary actors are selected from chemical species, pressure, temperature, flow, concentration, controls, reactant, catalyst, product, pH values, composition, physical or chemical states, enzyme, biological species, and nucleic acid sequence or part thereof. An example is given in fig. 4.
In an exemplary embodiment of the present method based on the obtained directed graph or supplemented directed graph a layout of the process set-up is made comprising physical objects, the physical objects selected from apparatuses, in particular wherein apparatuses are selected from a tank, a column, a reflux, a reboiler, a boiler, a controller, a valve, a cooler, a mixer, a heater, a heat exchanger, a furnace, a filter, a splitter, a phase separator, an absorber, a flash unit, a reactor, a pump, a flow controller, a compressor, and a vessel. An example is given in fig. 5.
In an exemplary embodiment of the present method based on the obtained directed graph or supplemented directed graph a layout of the process set-up is made comprising chemical objects, the chemical objects selected from chemical species, catalysts, solvents, inert species, reactants, carriers, stabilizers, buffers, intermediate products, non-reactants, oxidants, and reductants. An example is given in fig. 6.
In an exemplary embodiment of the present method the directed graph is supplemented with a standard process model. An example is given in fig. 7.
In an exemplary embodiment of the present method nodes or actors are auto-completed. An example is given in fig. 8.
A novel method is provided to learn from (chemical) process flowsheets and provide flowsheet structure recommendations, such as for engineers performing process synthesis. For example, the method may recommend one or multiple process nodes, their connectivity, and attributes or alternative process topologies (referred to as “auto-completion” or “auto-correction”). In this respect inventors created two data sets, the first one consisting of synthetically generated flowsheets and the second one consisting of real flowsheets in graph format. Using the conversion algorithm for the automated conversion between flowsheet graphs and SFILES 2.0 strings, inventors automatically generated the corresponding text-based SFILES 2.0 data sets. The present inventors pre-trained a generative Transformer language model on the data set of synthetically generated flowsheets and fine-tuned it on the data set of real flowsheets. The trained generative Transformer model is capable of learning the grammatical structure of the SFILES 2.0 language and the patterns contained in the flowsheet topologies. Consequently, the results demonstrate that using the trained model for causal language modelling is a strategy to auto-complete flowsheet topologies. The input of the machine learning model may be a graph or string representation of a flowsheet. The output of the machine learning model may be a graph or string representation of a flowsheet or a part of a flowsheet. Using beam search as the decoding strategy yields the highest probability flowsheet completion. On the other hand, if more diverse flowsheet recommendations are preferred, the top-p sampling decoding strategy is a promising addition to beam search.
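The difference between a highest-probability choice and top-p sampling for the next token can be sketched as follows. This is a simplified illustration, not the decoding code of the Generative Flowsheet Transformer: the token strings and probabilities are invented, the function names are hypothetical, and the greedy choice shown is only the special case of beam search with a beam width of one.

```python
import random


def top_p_filter(probs, p=0.9):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches p, then renormalize over that set."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


def sample_next(probs, p=0.9, rng=random):
    """Top-p sampling: draw the next token from the filtered distribution."""
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    return rng.choices(tokens, weights=[nucleus[t] for t in tokens], k=1)[0]


def greedy_next(probs):
    """Highest-probability next token (beam search with beam width 1)."""
    return max(probs, key=probs.get)


# Invented next-token distribution over SFILES-like tokens.
next_token_probs = {"(hex)": 0.5, "(r)": 0.3, "(dist)": 0.15, "(prod)": 0.05}
nucleus = top_p_filter(next_token_probs, p=0.75)
```

With p = 0.75 only the two most likely tokens remain candidates, so repeated sampling yields varied but still plausible continuations, whereas the greedy choice always returns the same token.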
The invention is further detailed by the accompanying figures and examples, which are exemplary and explanatory of nature and are not limiting the scope of the invention. To the person skilled in the art, it may be clear that many variants, being obvious or not, may be conceivable falling within the scope of protection, defined by the present claims.
SUMMARY OF FIGURES
Figures 1, 2a-d, and 3-22 show aspects of the present invention.
DETAILED DESCRIPTION OF FIGURES
Figure 1 shows examples of symbols typically used in process flow diagrams.
Fig. 2a shows a non-limitative example of a process flow diagram, having to a certain extent arbitrary elements shown therein. Figure 2b shows a graph representing the process flow diagram of fig. 2a. Figure 2c shows a fully digitized process flow diagram, according to the graph of fig. 2b, and the process flow diagram of fig. 2a. Figure 2d shows schematically the method of providing a digitized process set-up, the digitized process set-up with a sequence of at least two events, wherein the at least two events are selected from a chemical event, a physical event, a biological event, and a micro-biological event. The process starts with the process flow diagram of fig. 2a, which is digitized. In the process of digitization objects represented in the process flow diagram of fig. 2a are detected. Further a flow path is explored, such that a graph can be made, in particular the graph of fig. 2b. Then the graph of fig. 2b is supplemented or enriched with the elements of fig. 2a, and optional further elements, wherein the elements are selected from physical-chemical interaction between each first node and each second node of the plurality of nodes, from objects within the figure 2a, parameters, etc. as is explained throughout the description and claims. Figure 3 shows an exemplary further process flow diagram. It is an objective of the present invention to localize and classify objects in flow diagrams, such as unit operations, arrows, intersections, and text, to use a deep learning model, which may be based on convolutional neural networks, to further supplement the process flow diagram, and to use a supervised learning approach, such as wherein the model is trained on labeled data, such as that of figure 3.
Figure 4 shows a complex process flow diagram which is digitized through computer vision, according to the invention.
Figures 5-6 show a use of an advanced model with a mask, in addition to the present method or system. The advanced model can identify a pixel-based mask for each object detected. The advanced model may be based on a Mask R-CNN architecture. Therewith basically unit operations are cut out more accurately than in a bounding box approach.
Also, typically it learns better, such as with less data.
Figure 7 shows automatic generation of UniSim models from process flow diagrams.
Figure 8 shows auto-completion of an exemplary process flow diagram. Starting with chemical species H2 and CO2, which are in a first step mixed (dashed oval) the present system provides suggestions for addition of a next step, apparatus, parameters, etc. (dashed-dotted oval). In addition thereto, or as an alternative, typically used elements are provided as optional selections at the right hand side of the screen (dashed dark oval). A user may select items from the pictograms on the right.
The invention although described in detailed explanatory context may be best understood in conjunction with the accompanying figures.
Experiment
The below is an example of how the invention could be implemented in practice. Fig. 2a shows an example flowsheet of a proposed Cumene production plant. The illustration was slightly altered, the flow structure however is kept. Via a procedure known as information extraction inventors automatically retrieved information of the chemical process representation in structured formats from unstructured data through several different methods.
Specifically, an introduction is given to object detection architectures, object detection performance measurement and skeletonization.
For object detection a distinction can be made between one-stage and two-stage detectors. Two-stage detectors contain a model that determines regions of interest with high probabilities of containing objects and a second model that classifies found regions of interest. On the other hand, one-stage detectors consist of a single network model that simultaneously predicts bounding boxes and classifications. Transfer learning refers to the improvement of model learning in one task by transferring knowledge from a related, previously learned task. With transfer learning, a model can initiate the training process on new data distributions with pre-trained weights, shortening training time and possibly leading to superior performance due to convergence to better optima. Backbone models in detection models are usually pre-trained on large datasets such as the ImageNet classification challenge dataset or the Common Objects in Context (COCO) dataset and during transfer training, parts of the network are frozen, meaning their parameters are not updated during training. Data augmentation methods are techniques used to increase the size of a limited dataset by adding modified copies of the data. Many augmentation techniques have been applied to image datasets in the literature, such as geometric transformations (e.g., stretching, skewing), flipping, color changes, cropping, rotation, translation, noise injection, random erasing, blurring, and more. Not all data augmentation techniques may apply to every dataset in every domain. Augmentations could reflect real varieties found in a data distribution. Feature pyramid networks (FPN) are a set of deep CNNs which construct features at different scales while keeping computation feasible. Feature pyramids are an important component in detection systems that facilitate the recognition of objects at different scales.
The main objective of feature pyramids in a model is to allow a neural network to learn high to low-level features and independently make predictions at each level.
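As a small illustration of the data augmentation mentioned above, a horizontal flip has to be applied to the image and to its bounding box annotations together. The sketch below assumes an image stored as a list of pixel rows and boxes given as (x_min, y_min, x_max, y_max) in pixel coordinates with exclusive maxima; both conventions and the function name are assumptions made for this example. Whether flipping is a suitable augmentation for flowsheets is a separate question, since not every technique applies to every dataset.

```python
def horizontal_flip(image, boxes):
    """Mirror an image left-to-right and move its bounding boxes accordingly.

    image: list of rows, each row a list of pixel values.
    boxes: list of (x_min, y_min, x_max, y_max), maxima exclusive.
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # A box spanning [x_min, x_max) ends up spanning [width - x_max, width - x_min).
    flipped_boxes = [(width - x_max, y_min, width - x_min, y_max)
                     for (x_min, y_min, x_max, y_max) in boxes]
    return flipped_image, flipped_boxes


# Tiny 2 x 4 image with one annotated object in the left-most column.
image = [[1, 0, 0, 0],
         [1, 0, 0, 0]]
boxes = [(0, 0, 1, 2)]
flipped_image, flipped_boxes = horizontal_flip(image, boxes)
```

The flipped copy can be added to the training set as an additional, modified sample with consistent annotations.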
The objective of the object detection model is to localize and classify objects within images. Thus, two performances are typically evaluated, the placement of the bounding box around the object, and the classification accuracy of said bounding box. The most common performance evaluation metrics used herein are the Average Precision (AP) and Mean Average Precision (mAP), both of which consider correct, missed and false predictions in their respective calculation. The mAP is the primary metric used to measure a detector’s accuracy over all the object categories in a dataset. The mAP is found dependent on the Intersection over Union (IoU) threshold chosen since it determines when a prediction is considered correct. The Pascal VOC AP metric, also known as AP50, is the mAP calculated at an IoU threshold of 0.5. The COCO mAP metric, known simply as mAP, is the average of mAPs with IoU thresholds in the range of [0.5:0.05:0.95]. Comparing the AP50 to the COCO mAP provides valuable insights into the performances of the classification and bounding box placement tasks individually, as a high AP50 and a low mAP suggest that objects are correctly but imprecisely detected.
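The role of the IoU threshold can be made concrete with a short sketch. It computes the IoU of a predicted and a ground-truth box and lists the COCO-style thresholds at which the prediction would still count as correct; the box format (x_min, y_min, x_max, y_max) and the example coordinates are assumptions chosen for illustration, and the full AP computation over ranked predictions is not shown.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


# IoU thresholds 0.50, 0.55, ..., 0.95 as used for the COCO mAP.
COCO_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]

ground_truth = (0, 0, 10, 10)
prediction = (1, 1, 11, 11)    # right object, slightly shifted box

overlap = iou(ground_truth, prediction)
accepted_at = [t for t in COCO_THRESHOLDS if overlap >= t]
```

Here the shifted box counts as correct at the AP50 threshold but fails the stricter thresholds, which is the situation of a high AP50 combined with a lower COCO mAP.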
Skeletonization produces a compact representation of objects in images by reducing them to their medial axis, effectively transforming shapes to curves of a 1-pixel thickness while preserving their connectivities. Figure 9 presents an example of distillation column skeletonizations. Imperfections in skeletonization can be observed when applying it to unit operations. In the digitization of PFDs, skeletonization facilitates the application of a graph search algorithm through a rule-based approach. In the development of efficient ML algorithms through supervised learning methods large amounts of valuable and diverse data for training, testing, and validation were used. As flowsheet digitization represents a gap in current literature, inventors further introduce a novel categorization based on visual and functional features with examples. Process flow diagrams were retrieved by applying the flowsheet recognition algorithm. The algorithm downloads all full text papers from a given source and extracts all images from said source. Then, a CNN classifier decides whether each figure is a flowsheet, or not. Inventors applied the algorithm to diverse sources, such as a number of journals, process engineering education books, and retrieved about one thousand flowsheets. Very few figures were wrongly classified as flowsheets, which is in accordance with the high accuracy of the algorithm. The diversity in data is found imperative, as ML models regularly fail to extrapolate outside their trained data distribution, meaning the object detection algorithm would fail to properly detect unseen ways of illustrating unit operations.
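How a search over a skeletonized image can recover connectivity between detected unit operations may be sketched as follows. This is not the inventors' pixel-based search algorithm: it is a plain breadth-first search over 1-pixel-wide lines, it ignores stream direction (arrow heads), intersections, and text, and the grid, box format (exclusive maxima), and names are assumptions for illustration.

```python
from collections import deque


def connected_units(skeleton, boxes, start):
    """Return the names of boxes reachable from `start` along line pixels.

    skeleton: list of rows, 1 = line pixel, 0 = background.
    boxes: dict name -> (x_min, y_min, x_max, y_max), maxima exclusive.
    Note: a box that directly touches the start box also counts as reached.
    """
    height, width = len(skeleton), len(skeleton[0])

    def owner(x, y):
        for name, (x1, y1, x2, y2) in boxes.items():
            if x1 <= x < x2 and y1 <= y < y2:
                return name
        return None

    x1, y1, x2, y2 = boxes[start]
    # Seed the search with every pixel of the start box.
    queue = deque((x, y) for y in range(y1, y2) for x in range(x1, x2))
    seen = set(queue)
    reached = set()
    while queue:
        x, y = queue.popleft()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if (nx, ny) in seen or not (0 <= nx < width and 0 <= ny < height):
                    continue
                name = owner(nx, ny)
                if name is not None and name != start:
                    reached.add(name)      # stop at the other unit operation
                elif name is None and skeleton[ny][nx]:
                    seen.add((nx, ny))     # follow the line pixel
                    queue.append((nx, ny))
    return reached


# 5 x 9 toy image: a horizontal line joins boxes A and B; box C is isolated.
skeleton = [[0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 1, 1, 1, 1, 1, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0]]
boxes = {"A": (0, 0, 2, 3), "B": (7, 0, 9, 3), "C": (3, 4, 6, 5)}
neighbours_of_a = connected_units(skeleton, boxes, "A")
```

Each pair found in this way corresponds to an edge candidate in the graph; assigning the direction would require the detected arrows.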
Inventors defined the main unit operations in chemical processes, and extended these further to incorporate equipment types and different illustrations. Additionally, class decomposition within unit operation types was utilized to increase model performance and to create a more consistent dataset. Class decomposition describes the method of splitting classes into different, more homogeneous sub-classes, decomposing the detection problem into a larger group of separate classes with similar topological characteristics. Such a technique can benefit supervised learning models by improving the class-to-instance association. Each sub-class exhibits more similar patterns within itself and more distinguishable patterns relative to other classes. In the context of PFD digitization, the class decomposition reasoning was based on two observations. Firstly, many classes contain clearly identifiable sub-classes of very different illustrations for the same equipment. As an example, the category pump was sub-divided into different categories. Secondly, sub-classes could allow for more detailed information to be extracted from the data. For example, the unit operation categorization proposed in literature contained a single valve category, while inventors found a large variety of valves with different functionalities, such as control valves or check valves. Thus, further decomposition provided more information about the equipment used. The mined flowsheets, comprising actors, objects, nodes, and interactions, were labeled using domain expertise and contextual information. The open-source graphical annotation tool LabelImg was utilized. The quality of the data provided to the object detection model is found to directly impact the predictive performance of the model. Thus, correct and consistent annotation of objects in the data is found important. In order to accelerate the annotation process, semi-automation was employed.
With a first batch of data, a preliminary model was trained and used for inference on unannotated data to create annotations. These were then corrected and used for further training of the model. Inventors found that this approach greatly accelerates the process of annotation, as the model quickly learns to detect the most common unit operations, and human correction is then needed mainly for the more uncommon objects.
The digitization approach used may involve several distinct steps from an image to a graph representation. First, an object detection model is used to detect unit operations, such as those of figure 1. Text, as well as arrowheads indicating stream directions, may be detected by a second object detection model. To facilitate skeletonization, the bounding boxes found for arrowheads and unit operations are filled before skeletonization is applied. With the skeletonized image and the locations of unit operations known, the connectivity among unit operations is explored. In the following, inventors will discuss the steps of unit operation detection and stream recognition in more detail.
Various kinds of information are encoded in flowsheets. Apart from unit operations, there may be important information contained in text and arrows as well. In total, inventors trained two separate object detection models for different tasks: (1) detection of unit operations and unknown units, and (2) detection of arrows, path intersections, and text. For object detection, the Faster R-CNN architecture was used. The choice of a backbone model is one of the most crucial decisions for performance. Inventors used three different backbone models, which mostly differ in the depth of their architecture. Pretraining the backbone model, even on an unrelated dataset, typically increases model performance, as the backbone model will learn to extract distinct features. This will help convergence on a flowsheet dataset even with a limited number of flowsheets. To account for imbalance among categories in the dataset, repeat factor sampling is applied. Repeat factor sampling allows images containing underrepresented categories to be used for training more often, to account for slower learning effects. Repeat factor sampling is especially important for the present dataset, as some unit operations are seldom found in literature, while others, such as heat exchangers or pumps, are naturally often present. Hence, without repeat factor sampling, an imbalance in performance can occur. Furthermore, to increase generalization, several augmentation techniques are applied during training. Thus, a set of applicable augmentation methods was identified, and the effect of data augmentation on the object detection model performance was investigated. Specifically, the techniques of flipping, adding noise, blurring, and repetition of rare objects were applied and studied.
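Repeat factor sampling may be sketched as follows. The formula used here, a category-level factor max(1, sqrt(t / f(c))) with f(c) the fraction of images containing category c, and an image-level factor equal to the maximum over its categories, is the formulation commonly used with the LVIS dataset. It is an assumption that the same formulation applies here, and the threshold t, the function name, and the category names are hypothetical.

```python
import math
from collections import Counter


def repeat_factors(image_categories, threshold):
    """Return one repeat factor per image.

    image_categories: list of sets, each holding the categories present in one image.
    threshold: frequency below which a category is oversampled (assumed value).
    """
    n_images = len(image_categories)
    # Number of images in which each category appears at least once.
    counts = Counter(c for cats in image_categories for c in cats)
    freq = {c: n / n_images for c, n in counts.items()}
    # Category-level factor: rare categories receive a factor above 1.
    cat_rep = {c: max(1.0, math.sqrt(threshold / f)) for c, f in freq.items()}
    # Image-level factor: driven by the rarest category in the image.
    return [max((cat_rep[c] for c in cats), default=1.0) for cats in image_categories]


# Hypothetical dataset: pumps are common, the crystallizer is rare.
images = [{"pump", "hex"}, {"pump"}, {"hex", "pump"}, {"pump", "crystallizer"}]
factors = repeat_factors(images, threshold=0.5)
```

In this sketch, only the image containing the rare category receives a factor above one, so it would be shown to the model more often per epoch than the others.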
The detection of unit operations is the first step in the digitization scheme. After unit operations have been successfully detected, their bounding boxes are processed. Bounding boxes with significant overlap, measured as intersection over union, are compared, and the one with the lower confidence score is removed. This is necessary because the object detection algorithm occasionally detects the same object twice with different categorizations. Afterwards, detected unit operations with a confidence score lower than a threshold are converted to a category X, indicating a low confidence of the model. The flowsheet image is binarized and then reduced to one-pixel-thin representations of the objects, allowing stream recognition. Once the PFD has gone through this first stage, the skeletonized flowsheet is prepared for the graph search algorithm. First, the skeletonized image is represented as a graph in which each pixel is a node. In this graph, each node has a maximum of 8 edges, corresponding to the 8 neighboring pixels. Additionally, each node in the graph contains information on its color and on whether or not it is inside an object bounding box. Starting from a unit operation, the program checks for white pixel neighbors along the bounding box border, identifying possible paths. For each path, the algorithm traverses the graph along neighboring white pixels and continues the search. A graphical representation of this procedure is shown in Figure 10. A connection between two objects is established when the algorithm reaches a pixel belonging to a new unit operation. If the exploration reaches a dead end, it creates an "In/Out" stream object, indicating an incoming or outgoing stream of the process. Once all the outgoing paths from a unit operation are explored, the algorithm moves to the next unit and repeats the search. After the graph search, the information on all detected connections between unit operations is saved.
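The graph search described above can be sketched on a toy image. The sketch assumes that the skeleton is given as a grid in which 1 marks a white stream pixel, that bounding boxes are given as inclusive (row_min, col_min, row_max, col_max) tuples, and that touching a foreign bounding box counts as reaching that unit. All names and the example layout are hypothetical, stream direction (which follows from the detected arrowheads) is not handled, and a path is reported as an open stream only if it reaches no other unit at all.

```python
from collections import deque

# Offsets of the 8 neighbouring pixels.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def owner_of(pixel, boxes):
    """Return the unit whose bounding box contains the pixel, or None."""
    r, c = pixel
    for name, (r0, c0, r1, c1) in boxes.items():
        if r0 <= r <= r1 and c0 <= c <= c1:
            return name
    return None


def trace_connections(skeleton, boxes):
    """Return (set of connected unit pairs, list of units with an In/Out stream)."""
    rows, cols = len(skeleton), len(skeleton[0])

    def neighbours(pixel):
        r, c = pixel
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                yield (nr, nc)

    connections, open_streams = set(), []
    for unit, (r0, c0, r1, c1) in boxes.items():
        visited = set()
        border = [(r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)
                  if r in (r0, r1) or c in (c0, c1)]
        for border_pixel in border:
            for start in neighbours(border_pixel):
                if start in visited or skeleton[start[0]][start[1]] != 1:
                    continue
                if owner_of(start, boxes) is not None:
                    continue  # Only pixels outside all boxes can start a path.
                reached = set()
                visited.add(start)
                queue = deque([start])
                while queue:  # Breadth-first traversal along white pixels.
                    for nxt in neighbours(queue.popleft()):
                        who = owner_of(nxt, boxes)
                        if who is not None:
                            if who != unit:
                                reached.add(who)
                        elif skeleton[nxt[0]][nxt[1]] == 1 and nxt not in visited:
                            visited.add(nxt)
                            queue.append(nxt)
                if reached:
                    connections.update(tuple(sorted((unit, o))) for o in reached)
                else:
                    open_streams.append(unit)  # Dead end: "In/Out" stream.
    return connections, open_streams


# Toy skeleton: a line joins two units, and a second line leaves the reactor.
skeleton = [[0] * 12 for _ in range(5)]
for col in (3, 4, 5, 6, 10, 11):
    skeleton[2][col] = 1
boxes = {"pump": (1, 0, 3, 2), "reactor": (1, 7, 3, 9)}
connections, open_streams = trace_connections(skeleton, boxes)
```

Each connection is discovered from both of its ends, which is why the pairs are stored in sorted order in a set.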
Finally, the graph representation of the flowsheet is constructed using the NetworkX open-source Python package. A graph is created with each unit operation as a node and the streams between them as directed edges. Attributes, such as associated text and operating conditions, can be added to each edge and node in the graph, which can then be handled in further processing.
For auto-completion the following example is given. It is noted that the subject matter of the present system and method and the auto-completion may overlap, and therefore that elements of these embodiments may be combined.
The present inventors make use of a transformer-model architecture and decoding strategies used for text generation in natural language processing (NLP). Furthermore, the flowsheet representations used are recapped, namely flowsheet graphs and the SFILES 2.0 notation. The latter is used to represent the flowsheet data in a text-based manner in order to enable the use of NLP models. Transformer-based models have increased the performance in several benchmark tasks and also show successful applications beyond human language. Text may be processed as a sequence of tokens, whereby the tokens are either words or other chunks of the input sequence. Tokenization is typically the first text processing step in NLP and follows a tokenization strategy. After tokenizing the input sequence, each token is converted to a vector by using a learned numerical embedding. Putting together the vectors of all inputs yields a matrix, called the input embedding in the following, which can be processed by the NLP model. In a further example, the original Transformer architecture is a neural sequence translation model consisting of an encoder stack of N = 6 identical layers and a decoder stack of N = 6 identical layers in sequence. The decoder uses the encoder's output and the previously generated outputs to compute the output probabilities for the next token. Each encoder layer contains two sub-layers with subsequent layer normalization. Each decoder layer contains three sub-layers with subsequent layer normalization. Since recurrent components are completely removed in the Transformer architecture, positional encoding is applied before the input and output embeddings are passed to the encoder and decoder, respectively. Positional encoding ensures that the information on the order of tokens in the sequence is taken into account. The core components of the Transformer architecture are the attention sub-layers.
The calculation of attention takes a query vector q, a key vector k, and a value vector v for each input token and compares all queries against all keys, resulting in scores for query-key compatibility. The compatibility scores are then used as weights to calculate the attention output as a weighted sum of the values. In practice, the attention is computed for all inputs of an input sequence in parallel, putting together all query, key, and value vectors in the query matrix Q, key matrix K, and value matrix V. This finally yields a matrix as attention output. In the original architecture, multi-head attention is used as self-attention in the encoder, as masked self-attention in the decoder, and as encoder-decoder attention to combine the vector embedding of the encoder with the previous decoder outputs. Self-attention means that the query, key, and value matrices are calculated from the same input sequence. Therefore, the computed attention represents each token and its meaning in the sequence. Self-attention in the encoder considers both the left and the right context of each token (bidirectional). In contrast, in the case of masked self-attention in the decoder, only the left context is used, meaning that the positions subsequent to each token are masked out (unidirectional). For the decoder-only architecture for causal language modeling, a GPT-2-like model architecture containing only a decoder stack is used. Each decoder layer consists of a masked multi-head self-attention sub-layer and a feed-forward sub-layer. Since the encoder is left out, the encoder-decoder attention sub-layer is left out, too. Several decoding strategies may be used.
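The attention calculation described above may be illustrated for a single head. The sketch assumes the scaled dot-product form of the original Transformer, in which the query-key scores are divided by the square root of the key dimension and normalized with a softmax; the function names and the example matrices are hypothetical, and plain Python lists are used instead of a tensor library.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V, causal=False):
    """Single-head scaled dot-product attention.

    Q, K, V are lists of row vectors, one row per token. With causal=True,
    each token only attends to itself and to the tokens on its left.
    Returns (output rows, attention weight rows).
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for i, q in enumerate(Q):
        scores = []
        for j, k in enumerate(K):
            if causal and j > i:
                scores.append(float("-inf"))  # Mask out subsequent positions.
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k))
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of the value vectors.
        outputs.append([sum(w[j] * V[j][d] for j in range(len(V)))
                        for d in range(len(V[0]))])
    return outputs, weights


# Hypothetical 3-token sequence; in self-attention Q, K, V stem from the same input.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out_bidirectional, w_bidirectional = attention(X, X, X)
out_causal, w_causal = attention(X, X, X, causal=True)
```

With the causal mask, the first token can only attend to itself, so its output equals its own value vector, whereas in the bidirectional case every token receives a contribution from all three positions.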
For auto-completion, the following example is given in Figure 11, relating to a simple chemical process flowsheet with branchings, a recycle stream, and different mass trains. With the above method, Figure 12 is obtained, being a graph representation of the flowsheet in Figure 11. Two consecutive unit operations in the string imply a normal stream connection. In the case of a branching, such as after a distillation column, all but the last branch are noted in brackets. Recycles are noted by using numbers # to reference the recycle start node and <# to reference the recycle end node. Furthermore, tags in braces are used to indicate whether the branch is a top or bottom product. In the case of converging branches, the second branch is inserted in the string, surrounded by <&| and &|. Multi-stream heat exchangers are separated into one node per stream compartment and marked with a number in braces, capturing which streams are heat integrated. In an example, inventors subdivided flowsheets into the following subprocess categories: Initialization: Feed(s); Reaction; Thermal separation (distillation, rectification); Countercurrent separation (absorption, extraction); Filtration (gas, liquid); Centrifugation; and End: Purification.
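The notation rules listed above suggest how such a string may be split into tokens before it is passed to an NLP model. The following is a minimal sketch only: the regular expression covers just the elements named above (unit operations in parentheses, tags in braces, branch brackets, the converging-branch markers, and recycle numbers), the function name is hypothetical, and the example strings and unit abbreviations are illustrative assumptions rather than flowsheets from the present application.

```python
import re

# One alternative per element described in the text:
# (unit), {tag}, <&| and &|, <# and #, [ and ].
TOKEN_PATTERN = re.compile(r"\([^()]*\)|\{[^{}]*\}|<&\||&\||<\d+|\d+|\[|\]")


def tokenize(sfiles_string):
    """Split a string into tokens; raise if any character is left uncovered."""
    tokens = TOKEN_PATTERN.findall(sfiles_string)
    if "".join(tokens) != sfiles_string:
        raise ValueError("string contains elements not covered by this sketch")
    return tokens


# Illustrative string: a recycle from a splitter back to a reactor, and a
# column whose top product branch is noted in brackets.
branch_and_recycle = "(raw)(pp)(r)<1(dist)[{tout}(prod)]{bout}(splt)1(prod)"
tokens = tokenize(branch_and_recycle)

# Illustrative string with a converging second feed branch.
converging = tokenize("(raw)(r)<&|(raw)(pp)&|(mix)(prod)")
```

Each resulting token would then be mapped to a vector by a learned embedding, as for words in natural language.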
As illustrated in Figure 13, the last three blocks relate to a procedure for multiple branches. The blocks represent, from left to right: Initialize graph with feed(s); First subprocess category + pattern in category; Next subprocess category for each stream + pattern in category; and Purification of stream. Optional: random heat integration or recycle. After initializing the flowsheet graph with raw materials, including feed preprocessing, the selection of the first sub-process, excluding purification, is a Markov transition with fixed probabilities (transition probabilities do not depend on previous unit operations). Within each sub-process, inventors further sample from a set of patterns (not shown here) specifying how the inlet and outlet stream(s) are processed, e.g., with additional temperature or pressure change unit operations. Also, design heuristics are included, such as adding recycles, performing heat integration in the reaction sub-process, or adding reactants. In general, the sub-processes lead to several outlet streams, in the following referred to as branches. For each branch, a transition is made to the "Next sub-process" state, followed by a Markov transition to the next sub-process. This selection differs from the first sub-process selection by the additional purification sub-process. Note that once a branch reaches the purification step, it is determined to end as a product. After each branch has ended in the purification step, the flowsheet graph generation is complete.
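The Markov transitions described above can be sketched for a single branch. The uniform transition probabilities, the function names, and the step limit are assumptions made for illustration, since the actual probabilities are not stated; the sampling of patterns within a sub-process, the splitting into several branches, and the optional heat integration or recycle are omitted.

```python
import random

SUBPROCESSES = [
    "Reaction",
    "Thermal separation",
    "Countercurrent separation",
    "Filtration",
    "Centrifugation",
]

# Hypothetical fixed probabilities: they do not depend on earlier unit operations.
FIRST_PROBS = {name: 1.0 / len(SUBPROCESSES) for name in SUBPROCESSES}
# Later selections additionally allow the terminating purification sub-process.
NEXT_PROBS = {name: 1.0 / (len(SUBPROCESSES) + 1)
              for name in SUBPROCESSES + ["Purification"]}


def sample(probs, rng):
    """Draw one sub-process according to the given transition probabilities."""
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


def generate_branch(rng, max_steps=20):
    """Generate the sub-process sequence of one branch, ending in purification."""
    sequence = ["Initialization", sample(FIRST_PROBS, rng)]
    while sequence[-1] != "Purification" and len(sequence) < max_steps:
        sequence.append(sample(NEXT_PROBS, rng))
    if sequence[-1] != "Purification":
        sequence.append("Purification")  # Guard: force the branch to terminate.
    return sequence


branch = generate_branch(random.Random(0))
```

Passing a seeded random number generator makes the generated sequence reproducible, which is convenient when a synthetic training set has to be regenerated.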
Figures 14-15 show a completed flowsheet using beam search. Figure 16 schematically illustrates the auto-completion of flowsheets using the Generative Flowsheet Transformer. Inventors achieve this by specifying an input sequence in SFILES 2.0 that represents the incomplete flowsheet and passing it to the Generative Flowsheet Transformer, which auto-completes the sequence in the SFILES 2.0 language. The completed flowsheets correspond to the SFILES 2.0 sequences completed by the Generative Flowsheet Transformer. Figures 17-21 show completed flowsheets using top-p sampling.
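Top-p sampling can be illustrated with a minimal sketch of a single decoding step. The next-token probabilities, the value of p, and the function names are hypothetical; in practice the probabilities would be produced by the trained model for the sequence generated so far.

```python
import random


def top_p_filter(probs, p):
    """Keep the most probable tokens whose cumulative probability reaches p,
    then renormalize the kept probabilities so that they sum to one."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


def sample_top_p(probs, p, rng):
    """Sample the next token from the renormalized nucleus."""
    nucleus = top_p_filter(probs, p)
    tokens = list(nucleus)
    return rng.choices(tokens, weights=[nucleus[t] for t in tokens], k=1)[0]


# Hypothetical model output for the next token of an incomplete flowsheet.
next_token_probs = {"(hex)": 0.5, "(r)": 0.3, "(dist)": 0.15, "(prod)": 0.05}
nucleus = top_p_filter(next_token_probs, p=0.75)
next_token = sample_top_p(next_token_probs, p=0.75, rng=random.Random(0))
```

Beam search, by contrast, is deterministic: it keeps a fixed number of the highest-scoring partial sequences at each step, whereas top-p sampling can yield different completions for the same incomplete flowsheet.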
Table 1/Fig. 22 shows exemplary Unit operations and abbreviations in SFILES 2.0.
It should be appreciated that for commercial application it may be preferable to use one or more variations of the present system, which would be similar to the ones disclosed in the present application and are within the spirit of the invention.

Claims

1. A system for analyzing a chemical process, comprising: a computer memory provided with digital representation of a directed graph representation of the chemical process, the graph representation comprising elements selected from apparatuses, flow modifiers, devices, process steps, flows, pipelines, signal lines, pressure regulators, temperature regulators, concentration regulators, chemical species regulators, controllers, and elements thereof, and combinations thereof, and interactions between these elements, and a data processor provided with a computer program which, when running on the data processor,
-provides trained machine learning, which is trained using a selection of a training dataset comprising directed graph representations of chemical processes and/or string representations of the directed graphs and resulting directed graphs and nodes representing elements, and annotated versions;
-provides the digital representation of the directed graph representation of the chemical process in the computer memory as input to the trained machine learning, and
- the trained machine learning providing in the computer memory the chemical process as directed graph with nodes and edges defining the elements.
2. A method of providing a digitized process set-up, the digitized process set-up with a sequence of at least two process steps, wherein the at least two process steps are selected from a chemical process step, a physical process step, a biological process step, and a micro-biological process step, in particular wherein process steps are selected from heating, cooling, flowing, reacting, mixing, contacting, depositing, annealing, separating, adding, removing, filtering, crystallizing, phase-separating, distilling, oxidizing, reducing, hydrogenating, dehydrogenating, polymerizing, poly-condensing, esterifying, alkylating, de-alkylating, aminating, halogenating, sulfonating, nitrifying, de-hydrating, hydrolysing, and melting, comprising optically reading an image of a process set-up, digitizing said optically read process set-up forming a digitized image, using artificial intelligence, making a directed graph of the digitized image of the process set-up, the directed graph comprising a plurality of unique nodes and at least one biological-physical-chemical interaction between each first node and each second node of the plurality of nodes, and optionally at least one direction of said interaction, wherein each node individually is selected from an end node, an intermediate node, and an intersection node, using artificial intelligence, identifying at least one physical object to each node in the directed graph, using artificial intelligence, identifying at least one process path between each first node and each second node of the plurality of nodes, and using rule-based ontology, in particular rule-based ontology obtained from a data model, supplementing the directed graph of the digitized process set-up with the at least one process path and identified objects, or vice versa, in particular wherein the process is a chemical process.
3. The method of providing a digitized process set-up according to claim 2, wherein in the process set-up objects are localized and classified, such as a unit operation, an arrow, an intersection, a control unit, and text.
4. The method of providing a digitized process set-up according to any of claims 2-3, wherein artificial intelligence is based on a convolutional neural network or a neural network with a transformer architecture.
5. The method of providing a digitized process set-up according to any of claims 2-4, wherein artificial intelligence is trained on labelled data.
6. The method of providing a digitized process set-up according to any of claims 2-5, wherein each node is provided with supplementary actors, wherein the supplementary actors are selected from chemical species, pressure, temperature, flow, concentration, controls, reactant, catalyst, product, pH values, composition, physical or chemical states, enzyme, biological species, nucleic acid sequence or part thereof.
7. The method of providing a digitized process set-up according to any of claims 2-6, wherein based on the obtained directed graph or supplemented directed graph a layout of the process set-up is made comprising physical objects, the physical objects selected from apparatuses, in particular wherein apparatuses are selected from a tank, a column, a reflux, a reboiler, a boiler, a controller, a valve, a cooler, a mixer, a heater, a heat exchanger, a furnace, a filter, a mixer, a splitter, a phase separator, an absorber, a flash unit, a reactor, a pump, a flow controller, a compressor, a filter, a splitter, and a vessel.
8. The method of providing a digitized process set-up according to any of claims 2-7, wherein based on the obtained directed graph or supplemented directed graph a layout of the process set-up is made comprising chemical objects, the chemical objects selected from chemical species, catalysts, solvents, inert species, reactants, carriers, stabilizers, buffers, intermediate products, non-reactants, oxidants, and reductants.
9. The method of providing a digitized process set-up according to any of claims 2-8, wherein the directed graph is supplemented with a standard process model.
10. The method of providing a digitized process set-up according to any of claims 2-9, wherein nodes and/or actors are auto-completed.
11. Use of the digitized process set-up for optimizing the process set-up, for forming a digital twin of the process set-up, for linking the process set-up to operational data, or for building a model of the process set-ups, in particular the digitized process set-up obtained by the method according to any of claims 2-10.
12. The system according to claim 1, comprising instructions for carrying out the method of any of claims 2-10.
13. The system according to claim 1 and/or the method according to any of claims 2-10, further comprising one or more elements according to the description, in particular according to the examples, more in particular using one or more of object detection architecture, object detection performance metrics, skeletonization, processing a bounding box, processing a mask, using a diverse variety of data sources, using data categorization, using data annotation, using labeling of objects, using labeling of actors, repeating one or more steps, unit operation detection, stream recognition, factor sampling, augmentation of objects and/or actors, using pixels, using artificial intelligence-assisted process synthesis, using a transformer-model architecture, using natural language processing, using decoding, tokenization, and numerical embedding.
PCT/NL2023/050385 2022-07-18 2023-07-17 Flowsheet digitization with computer vision, automatic simulation, and flowsheet (auto)completion with machine learning WO2024019617A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
NL2032523 2022-07-18
NL2032523A NL2032523B1 (en) 2022-07-18 2022-07-18 Flowsheet digitization with computer vision, automatic simulation, and flowsheet (auto)completion with machine learning

Publications (2)

Publication Number Publication Date
WO2024019617A2 true WO2024019617A2 (en) 2024-01-25
WO2024019617A3 WO2024019617A3 (en) 2024-02-29

Family

ID=84330923

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NL2023/050385 WO2024019617A2 (en) 2022-07-18 2023-07-17 Flowsheet digitization with computer vision, automatic simulation, and flowsheet (auto)completion with machine learning

Country Status (2)

Country Link
NL (1) NL2032523B1 (en)
WO (1) WO2024019617A2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021145138A1 (en) 2020-01-14 2021-07-22 エヌ・ティ・ティ・コミュニケーションズ株式会社 Display device, display method, and display program

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9588514B2 (en) * 2015-01-26 2017-03-07 Fisher-Rosemount Systems, Inc. Commissioning field devices in a process control system supported by big data
US20220187818A1 (en) * 2019-03-25 2022-06-16 Schneider Electric Systems Usa, Inc. Automatic extraction of assets data from engineering data sources
US20200333772A1 (en) * 2019-04-18 2020-10-22 Siemens Industry Software Ltd. Semantic modeling and machine learning-based generation of conceptual plans for manufacturing assemblies

Also Published As

Publication number Publication date
WO2024019617A3 (en) 2024-02-29
NL2032523B1 (en) 2024-01-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23745664

Country of ref document: EP

Kind code of ref document: A2