WO2025214888A1 - Determining properties of materials - Google Patents

Determining properties of materials

Info

Publication number
WO2025214888A1
Authority
WO
WIPO (PCT)
Prior art keywords
building blocks
target
data
numerical
representation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
PCT/EP2025/059233
Other languages
French (fr)
Inventor
Soheila SAMIEE
Sebastian Hermann Martschat
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BASF SE
Original Assignee
BASF SE
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BASF SE

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C60/00 Computational materials science, i.e. ICT specially adapted for investigating the physical or chemical properties of materials or phenomena associated with their design, synthesis, processing, characterisation or utilisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/10 Analysis or design of chemical reactions, syntheses or processes
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/30 Prediction of properties of chemical compounds, compositions or mixtures
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/80 Data visualisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/70 Machine learning, data mining or chemometrics

Definitions

  • the disclosure relates to the resource- and time-efficient development of new materials by determining target properties of materials reliably and scalably. It provides methods for determining target properties of materials, apparatuses for determining target properties of materials, use of an indication of a target property of a target material, use of a numerical representation of a structure of the target material and the position of two or more building blocks associated with the structure of the target material, and a numerical representation of two or more building blocks associated with the target material and the position of two or more building blocks associated with the structure of the target material.
  • it relates to a, in particular computer-implemented, method for determining a target property of a target material, the method comprising:
  • it relates to a, in particular computer-implemented, method for obtaining one or more data-driven model(s) for determining a target property of a target material, the method comprising: providing and/or obtaining, in particular receiving, a plurality of historical numerical representations of a structure of one or more material(s), wherein at least one of the historical numerical representations is associated with two or more building blocks, providing and/or obtaining, in particular receiving, a plurality of indications of the structure of the material(s), in particular digital representations of the structure of the material(s), providing and/or obtaining, in particular receiving, at least one numerical representation of a position of the two or more building blocks within a structure of the materials, providing the indications of the structure of the material(s), the historical numerical representations of the structure of the material and the at least one historical numerical representation of the position to the one or more data-driven model(s) for training the one or more data-driven model(s) to relate the indications of the structure of the material(s) to the
  • it relates to a method for monitoring and/or controlling producing and/or processing of a target material, the method comprising:
  • Mapping, preferably via the processor, the numerical representation associated with the two or more building blocks to an indication of the target property of the target material at least partially by the one or more data-driven models,
  • the indication of the target property of the target material for monitoring and/or controlling production and/or processing of the target material.
  • it relates to a method for obtaining one or more data-driven model(s) for monitoring and/or controlling producing and/or processing of a target material, the method comprising: providing and/or obtaining, in particular receiving, a plurality of historical numerical representations of a structure of one or more material(s), wherein at least one of the historical numerical representations is associated with two or more building blocks, preferably via an interface, providing and/or obtaining, in particular receiving, a plurality of indications of the structure of the material(s), in particular digital representations of the structure of the material(s), preferably via the interface, providing and/or obtaining, in particular receiving, at least one numerical representation of a position of the two or more building blocks within a structure of the materials, preferably via the interface, providing the indications of the structure of the material(s), the historical numerical representations of the structure of the material and the at least one historical numerical representation of the position to the one or more data-driven model(s) for training the one or more data-driven model(s) to relate the indication
  • it relates to a, in particular computer-implemented, method for determining a target property of a target chemical compound, the method comprising: providing and/or obtaining, in particular receiving, an indication of a structure of the chemical compound, identifying two or more building blocks associated with the indication of the structure of the target chemical compound according to a vocabulary indicative of a plurality of building blocks associated with structures of chemical compounds, wherein at least one building block comprises at least two components, mapping the identified two or more building blocks to a numerical representation of the two or more building blocks by one or more data-driven model(s), wherein the one or more data-driven model(s) are trained based on historical indications of one or more structure(s) of one or more material(s) and corresponding indications of properties of the materials, mapping the numerical representation of the two or more building blocks to an indication of the target property by the one or more data-driven model(s), providing the indication of the target property of the target material.
  • it relates to a, in particular computer-implemented, method for obtaining one or more data-driven model(s) for determining a target property of a target material, the method comprising: providing and/or obtaining, in particular receiving, a plurality of historical numerical representations of a structure of one or more material(s), wherein at least one of the historical numerical representations is associated with two or more building blocks, providing and/or obtaining, in particular receiving, at least one historical numerical representation of a position of the two or more building blocks within a structure of the materials, providing the historical numerical representations of the structure of the material and the at least one historical numerical representation of the position to the one or more data-driven model(s) for training the one or more data-driven model(s), optionally providing the one or more data-driven model(s).
  • it relates to a, in particular computer-implemented, method for determining a target property of a target material, the method comprising: providing and/or obtaining, in particular receiving, a numerical representation of a structure of the target material associated with two or more building blocks, processing the numerical representation by one or more data-driven model(s), wherein the one or more data-driven model(s) are obtained by claim 1, providing a numerical representation of a position of the two or more building blocks within a structure of the target material to the one or more data-driven model(s), generating a numerical representation of the structure of the target material and the position of the two or more building blocks within the structure of the material by the one or more data-driven model(s), mapping the numerical representation associated with the two or more building blocks and the position of the two or more building blocks within the structure of the material to an indication of the target property of the target material at least partially by the one or more data-driven models, providing the indication of the target property of the target material.
  • an apparatus for determining a target property of a target material and/or for monitoring and/or controlling producing and/or processing of a target material comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the apparatus to perform any one of the methods as described herein.
  • it relates to use of an indication of a target property of a target material as obtained by any one of the methods as described herein for determining a target property of a target material and/or monitoring and/or controlling producing and/or processing of a target material.
  • it relates to use of a numerical representation of a position of two or more building blocks associated with the structure of one or more material(s) for training one or more data-driven model(s) for determining a target property of a target material and/or monitoring and/or controlling producing and/or processing of a target material as described herein.
  • it relates to a numerical representation of two or more building blocks associated with the one or more material(s) and a position of two or more building blocks associated with the structure of the one or more material(s) for training one or more data-driven model(s) for determining a target property of a target material and/or monitoring and/or controlling producing and/or processing of a target material as described herein.
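  • The training aspect summarised above can be illustrated with a minimal sketch. It assumes that each historical input is a structure representation concatenated with a position representation, and it uses a plain linear model fitted by stochastic gradient descent as a stand-in for the data-driven model(s). The data, the learning rate and all function names are hypothetical and do not come from the disclosure.

```python
# Hypothetical historical data: each row concatenates a structure representation
# (first two values) with a position representation (last value).
HISTORICAL_INPUTS = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Hypothetical property values, generated here from the weights [1, 2, 1].
HISTORICAL_PROPERTIES = [2.0, 2.0, 3.0, 1.0]


def predict(weights, x):
    """Linear stand-in for the data-driven model (an assumption for brevity)."""
    return sum(w * xi for w, xi in zip(weights, x))


def train(inputs, targets, lr=0.1, epochs=2000):
    """Fit the weights by stochastic gradient descent on the squared error."""
    weights = [0.0] * len(inputs[0])
    for _ in range(epochs):
        for x, y in zip(inputs, targets):
            error = predict(weights, x) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return weights


weights = train(HISTORICAL_INPUTS, HISTORICAL_PROPERTIES)
indication = predict(weights, [1.0, 1.0, 1.0])  # unseen combination
```

  • A real data-driven model would describe non-linear relations; the linear model is used only to keep the training loop short.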
  • Properties of materials need to be tailored to their intended application. Otherwise, resulting products might be of lower quality and/or unusable. Additionally or alternatively, materials whose properties deviate slightly from the target properties may not be processable by the envisaged processes, e.g. due to a differing viscosity. This would constitute a waste of material. Hence, controlling and tailoring properties of materials is of high technical relevance.
  • the properties of materials highly depend on the structure of the material. Even small modifications in the structure may lead to completely different properties of the material associated with the modified structure. Said dependency may be non-obvious, in particular where small modifications change properties significantly. In particular, a modification of a building block of the material may completely change the 3-dimensional structure of the material. Consequently, developing new materials requires considerable research effort, including a large number of experiments. This consumes a lot of material and time. Therefore, it is desired to reduce the time and resources spent on experimentation for determining properties of materials.
  • the structure of the material determines the properties of the material e.g. via determining the 3-dimensional structure and thus interaction within the material and towards other materials.
  • This structure is determined via the type of the building blocks and the orientation of the building blocks associated with the material.
  • the orientation and the type of building blocks associated with the material need to be taken into account for reliably determining properties of the material.
  • Processing a numerical representation of the structure of the target material, and having the one or more data-driven model(s) generate a numerical representation associated with the two or more building blocks and their position within the structure of the material from the numerical representation of the two or more building blocks and a numerical representation of their position, allows the position of the building blocks to be taken into account just before the output data is provided.
  • The attention of the data-driven model may focus on the type of the building blocks during processing of the received input data. Complex structures in particular are largely determined by the orientation of the building blocks.
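  • One way to read "taking the position into account just before providing the output data" is as late fusion: the building-block representations are processed first, and the position representation is joined only at the output layer. The sketch below assumes this reading; the pooling step, the concatenation and all values are hypothetical choices, not the method of the disclosure.

```python
import math


def encode_blocks(block_vectors):
    """Process building-block vectors without position information (mean of tanh)."""
    dim = len(block_vectors[0])
    return [
        sum(math.tanh(v[d]) for v in block_vectors) / len(block_vectors)
        for d in range(dim)
    ]


def predict_property(block_vectors, position_vector, head_weights):
    """Join the position representation just before the output layer."""
    fused = encode_blocks(block_vectors) + list(position_vector)  # concatenation
    return sum(w * f for w, f in zip(head_weights, fused))


blocks = [[0.2, -0.4], [0.6, 0.1]]  # hypothetical building-block representations
positions = [0.0, 1.0]              # hypothetical position representation
weights = [0.5, -1.0, 0.3, 0.8]     # hypothetical output-layer weights
indication = predict_property(blocks, positions, weights)
```

  • With the same building blocks but a different position representation, the indication changes, which is the effect the passage describes.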
  • data-driven model may refer to a model suitable for describing one or more non-linear relations between input data and output data.
  • Input data may refer to data to be provided to the data-driven model and/or to data being received by the data-driven model.
  • Output data may be data to be received from the data-driven model and/or to be provided by the data-driven model.
  • the data-driven model may determine the output data based on transforming the input data via one or more non-linear relations.
  • the one or more data-driven model(s) may comprise a representation data-driven model and/or a property data-driven model.
  • the one or more data-driven model(s) may comprise one or more embedding layer(s) and/or may be associated with the one or more embedding layer(s).
  • the one or more data-driven model(s) may comprise a representation data-driven model, and wherein the representation data-driven model may be parametrized and/or trained based on historical indications of structures of materials and/or historical building blocks of materials.
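  • The split into a representation data-driven model and a property data-driven model can be sketched as two chained functions, the first of which applies a non-linear relation. The weights below are hand-picked placeholders (a trained model would learn them from historical data), and the function names are hypothetical.

```python
import math

# Hypothetical, hand-picked weights; a real model would learn these from historical data.
W_REPR = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]  # 3 hidden units x 2 inputs
W_PROP = [1.0, -0.5, 0.25]                       # 1 output x 3 hidden units


def representation_model(x):
    """Map input data to a numerical representation via a non-linear relation (tanh)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W_REPR]


def property_model(h):
    """Map the numerical representation to an indication of the target property."""
    return sum(w * hi for w, hi in zip(W_PROP, h))


def data_driven_model(x):
    """Chain both models: input data -> representation -> output data."""
    return property_model(representation_model(x))


y = data_driven_model([1.0, 2.0])
```

  • Because of the tanh step, doubling the input does not double the output, which is what distinguishes a non-linear relation from a linear one.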
  • property, in particular target property, may refer to an environmental property, to a physical property, to a chemical property or a combination thereof.
  • environmental property may comprise at least one of emission data of the material, recyclate content of the material, bio-based content of the material, renewable content of the material, material declaration data, material safety data or a combination thereof.
  • Target property may be at least one property selected from the plurality of properties of the target material.
  • emission data may comprise any data related to environmental footprint.
  • the environmental footprint may refer to an entity and its associated environmental footprint.
  • the environmental footprint may be entity specific.
  • the environmental footprint may relate to a material, a company, a process such as a manufacturing process, a raw material or basic substance, a component, a component assembly, an end product, combinations thereof or additional entity-specific relations.
  • Emission data may include data relating to carbon footprint of a material.
  • Emission data may include data relating to greenhouse gas emissions e.g. released in production of the material.
  • Emission data may include data related to greenhouse gas emissions.
  • Greenhouse gas emissions may include emissions such as carbon dioxide (CO2) emission, methane (CH4) emission, nitrous oxide (N2O) emission, hydrofluorocarbons (HFCs) emission, perfluorocarbons (PFCs) emission, sulphur hexafluoride (SF6) emission, nitrogen trifluoride (NF3) emission, combinations thereof and additional emissions.
  • Emission data may include data related to greenhouse gas emissions of an entity's or company's own operations (production, power plants and waste incineration), commonly referred to as Scope 1.
  • Scope 2 comprises emissions from energy production which is sourced externally.
  • Scope 3 comprises all other emissions along the value chain. Specifically, this includes the greenhouse gas emissions of raw materials obtained from suppliers.
  • PCF: Product Carbon Footprint
  • Chemical property may be a property that can be established by changing the structure of the at least one material.
  • Examples for chemical properties may be acidity, oxidation state or reactivity.
  • Physical property may be one of the following: mechanical properties, electrical properties, optical properties, thermal properties or the like.
  • physical property may comprise one or more of the following: density, scratch resistance, electrical conductivity, color, absorption, heat capacity or the like.
  • material may be associated with one or more building blocks.
  • the structure of the material may be represented by one or more building blocks.
  • the material may be a biological and/or a chemical material.
  • the chemical material may comprise one or more chemical compound(s).
  • the material may be characterized by the structure of the material.
  • the structure of the material may be associated with a spatial orientation of one or more building block(s).
  • the building block(s) may be biological building block(s) and/or chemical building block(s).
  • the building blocks may be associated with at least partially repetitive parts of the structure of the material.
  • the building block may refer to an arrangement of one or more component(s).
  • the material may be associated with a plurality of components.
  • One building block may comprise one or more components.
  • the building block and/or the ensemble of components of the building block may be associated with one property and/or one function.
  • providing may include receiving.
  • providing a numerical representation and/or providing an indication and/or providing a structure may comprise receiving the numerical representation and/or receiving an indication and/or receiving a structure e.g. via a data providing interface such as a user interface.
  • numerical representation may comprise one or more numerical values for representing the data represented by the numerical representation.
  • the numerical representation may comprise a tensor, in particular a vector and/or a matrix.
  • the vector may represent one building block.
  • the matrix may represent two or more building blocks.
  • the numerical representation may be obtained from the data represented by the numerical representation, preferably according to a relation between the data and the corresponding numerical representation.
  • the relation between the data and the corresponding numerical representation may comprise a predefined relation and/or one or more embedding layer(s).
  • the one or more embedding layer(s) may be configured to map data to numerical representations of the data.
  • the one or more embedding layer(s) may be obtained based on a plurality of data points of data to be represented and a size of the target numerical representations of the plurality of data points.
  • the one or more embedding layer(s) may be comprised by the one or more data-driven model(s).
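  • An embedding layer in this sense can be sketched as a lookup table with one row per data point to be represented and one column per dimension of the target numerical representation. The vocabulary, the embedding size and the random initialisation below are assumptions made for illustration; in practice the table entries would be learned.

```python
import random

# Hypothetical vocabulary of building blocks and hypothetical embedding size.
VOCAB = {"[MASK]": 0, "C": 1, "O": 2, "c1ccccc1": 3}
EMBED_DIM = 4

random.seed(0)
# One row per vocabulary entry; randomly initialised here, learned in practice.
EMBEDDING = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]


def embed_block(block):
    """Return the vector representing one building block."""
    return EMBEDDING[VOCAB[block]]


def embed_blocks(blocks):
    """Return the matrix (list of row vectors) representing two or more building blocks."""
    return [embed_block(b) for b in blocks]


vector = embed_block("O")
matrix = embed_blocks(["C", "O", "c1ccccc1"])
```

  • The table shape follows directly from the two quantities named above: the number of data points (here, vocabulary entries) and the size of the target numerical representation.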
  • providing the indication of the target property may include providing the value associated with the target property.
  • the indication of the target property may comprise the values associated with the target property.
  • indication of a target property may be suitable for deriving and/or obtaining the target property.
  • the indication of the target property may comprise the target property, in particular a numerical value associated with the target property.
  • the indication of the structure of the material, in particular the target material, may be associated with a structure of the material.
  • the structure of the material may be derived and/or obtained from the indication of the structure of the material, in particular the target material.
  • the indication of the structure of the material, in particular the target material, may comprise a structural representation of the material.
  • the structural representation of the material may be a digital representation of the material.
  • the indication of the structure of the material, in particular the target material, may be associated with one or more characterizing properties of the material, preferably independent of the one or more target properties.
  • the indication of the structure of the material, in particular the target material, may comprise the structure of the material.
  • the indication of the structure of the chemical compound may be an indication of the target material.
  • the target material may be a target chemical compound.
  • all embodiments applying to the target material may equally apply to the target chemical compound.
  • Materials may comprise one or more chemical compound(s).
  • all embodiments applying to the materials may equally apply to chemical compounds.
  • the numerical representation of the relation between the two or more building blocks may be a numerical representation of a relative position of the two or more building blocks.
  • the relative position of the two or more building blocks may refer to a position of at least one building block in relation to at least one other of the two or more building blocks.
  • the two or more building blocks may comprise at least one target building block.
  • the relative position may refer to a position of the two or more building blocks in relation to at least one target building block.
  • the target building block may be a masked building block.
  • the masked building block may be a placeholder for a building block to be determined.
  • the target building block may be a building block to be generated, e.g. by the one or more data-driven model(s), in particular the representation data-driven model.
  • the building block may comprise one or more components.
  • the material may comprise and/or consist of a plurality of components. Hence, the number of building blocks may be equal to or below the number of components. By doing so, a direct relation between the building blocks can be represented.
  • the relative position allows one or more building blocks to be determined in relation to other building blocks associated with the material.
  • materials typically have complex 3-dimensional structures that result from the exact orientation of the structure of the material. These structures are determined by the exact location of the building blocks associated with the material. Hence, taking the relative position into account provides a better representation of the material. Therefore, the determination of the target properties is improved.
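  • The relative position with respect to a masked target building block can be sketched as a signed offset along a sequence of building blocks. This assumes a one-dimensional sequence index as the position, which is a simplification of positions within a structure; the placeholder token and the function name are hypothetical.

```python
MASK = "[MASK]"  # hypothetical placeholder for the building block to be determined


def relative_positions(blocks, target=MASK):
    """Return the signed offset of every building block from the target block."""
    target_index = blocks.index(target)
    return [i - target_index for i in range(len(blocks))]


blocks = ["C", "C", MASK, "O"]
offsets = relative_positions(blocks)  # [-2, -1, 0, 1]
```

  • The offsets stay the same if the whole sequence is shifted, so they describe the building blocks in relation to the target rather than by absolute location.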
  • the numerical representation of the two or more building blocks may be obtained by providing an indication of a structure of the material, in particular the target material, identifying two or more building blocks associated with the indication of the structure of the material, in particular the target material according to a vocabulary indicative of a plurality of building blocks associated with structures of materials, in particular structure of the target material, and mapping the identified two or more building blocks to a numerical representation of the two or more building blocks by one or more embedding layer(s).
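  • Identifying building blocks according to a vocabulary can be sketched as greedy longest-match splitting of a linear structure string. The vocabulary and the matching strategy are assumptions for illustration; substring matching on a SMILES-like string is not chemically rigorous and is not presented as the method of the disclosure.

```python
# Hypothetical vocabulary; an entry may span several components (atoms).
VOCAB = ["c1ccccc1", "C(=O)O", "CC", "C", "O", "N"]


def identify_building_blocks(structure, vocab=VOCAB):
    """Split a linear structure string into building blocks by greedy longest match."""
    ordered = sorted(vocab, key=len, reverse=True)
    blocks, i = [], 0
    while i < len(structure):
        for entry in ordered:
            if structure.startswith(entry, i):
                blocks.append(entry)
                i += len(entry)
                break
        else:
            # No vocabulary entry matches at this position.
            raise ValueError(f"no vocabulary entry matches at position {i}")
    return blocks


blocks = identify_building_blocks("CCC(=O)O")  # ["CC", "C(=O)O"]
```

  • Both identified building blocks here comprise more than one component, and each could then be mapped to a numerical representation by an embedding layer.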
  • the numerical representation of the two or more building blocks may be generated by mapping the identified two or more building blocks to a numerical representation separately or at once. Where the identified two or more building blocks are mapped separately, they may be mapped one after another. Hence, the two or more building blocks may be mapped to one or more numerical representation(s) per building block.
  • the one or more numerical representation(s) per building block may be combined, i.e. concatenated, into one or more numerical representation(s) of the two or more building blocks. By doing so, a structure of the material can be represented by a structured and machine-processable representation.
  • the structure of the material can be represented with a fixed number of building blocks. This allows for better control of the input size to the data-driven model, while the number of building blocks is smaller than the number of possible structures of materials, because repetitions in the structure of a material can be represented by recurring building blocks. Hence, structures of materials can be represented in an efficient format. Thus, resources for determining the target properties of the material can be saved.
  • the identified two or more building blocks may be mapped separately and/or together to a numerical representation of the two or more building blocks by the one or more embedding layer(s). Mapping the building blocks separately speeds up computation of the matrix operations.
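  • Mapping building blocks one after another and concatenating the per-block representations into a fixed-size input can be sketched as follows. Padding with a dedicated placeholder block is an assumption made here to reach a fixed number of building blocks; the embedding values, the maximum length and the names are hypothetical.

```python
PAD = "[PAD]"   # hypothetical padding block
MAX_BLOCKS = 4  # hypothetical fixed number of building blocks per material

# Hypothetical 2-dimensional numerical representation per building block.
EMBEDDING = {PAD: [0.0, 0.0], "C": [0.1, 0.9], "O": [0.7, -0.2]}


def to_fixed_length(blocks, size=MAX_BLOCKS):
    """Pad the sequence of building blocks to a fixed number of entries."""
    if len(blocks) > size:
        raise ValueError("more building blocks than the fixed input size allows")
    return blocks + [PAD] * (size - len(blocks))


def concatenate(blocks):
    """Map each block separately, then concatenate into one flat representation."""
    flat = []
    for block in to_fixed_length(blocks):
        flat.extend(EMBEDDING[block])
    return flat


representation = concatenate(["C", "O", "C"])
```

  • The result always has MAX_BLOCKS times the embedding size entries, which keeps the input size to the model constant regardless of the material.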
  • the numerical representation of the two or more building blocks and optionally of a relation between the two or more building blocks may be a numerical representation of the structure of the target material. Any one of the methods may further comprise providing a numerical representation of a relation between the two or more building blocks. Processing the numerical representation by one or more data-driven model(s) may comprise generating the numerical representation of the two or more building blocks associated with the target material and a relation between the two or more building blocks by the one or more data-driven model(s).
  • the numerical representation of the structure of the target material and the position of the two or more building blocks within the structure of the material may be generated from the numerical representation of the two or more building blocks associated with the target material and the relation between the two or more building blocks and the numerical representation of the position of the two or more building blocks.
  • the numerical representation of the structure of the material may be a numerical representation of the two or more building blocks associated with the target material and a relation between the two or more building blocks, and wherein the numerical representation of the two or more building blocks associated with the target material and a relation between the two or more building blocks may be obtained by generating a numerical representation associated with the two or more building blocks and the relation between the two or more building blocks from the numerical representation of the two or more building blocks and a numerical representation of the relation between the two or more building blocks by the one or more data-driven models. Taking the relation between the two or more building blocks into account enables the data-driven model to determine the target properties of the target material based on a relative orientation of the two or more building blocks.
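  • Generating one representation from the building-block representation and the representation of their relation can be sketched as an element-wise sum of block vectors and position encodings. The sinusoidal encoding used below is one common choice from sequence models and is assumed here purely for illustration; the disclosure does not specify it.

```python
import math


def position_encoding(offset, dim):
    """Sinusoidal encoding of a relative position (an assumed, common choice)."""
    return [
        math.sin(offset / 10000 ** (d / dim)) if d % 2 == 0
        else math.cos(offset / 10000 ** ((d - 1) / dim))
        for d in range(dim)
    ]


def combine(block_vectors, offsets):
    """Add each block's position encoding to its vector, element-wise."""
    dim = len(block_vectors[0])
    return [
        [b + p for b, p in zip(vec, position_encoding(off, dim))]
        for vec, off in zip(block_vectors, offsets)
    ]


block_vectors = [[0.1, 0.9], [0.1, 0.9]]  # the same building-block type twice
combined = combine(block_vectors, offsets=[0, 1])
```

  • Two building blocks of the same type end up with different combined vectors because their relative positions differ, so a downstream model can distinguish their orientation.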
  • the numerical representation of the two or more building blocks may be obtained by providing an indication of a structure of the material, obtaining a structure of the material from the indication of the structure of the material, in particular the target material, identifying two or more building blocks associated with the structure of the target material according to a vocabulary indicative of a plurality of building blocks associated with structures of materials, and mapping the identified two or more building blocks to a numerical representation of the two or more building blocks by one or more embedding layer(s).
  • Obtaining a structure of the material from the indication may comprise mapping the indication of the structure of the material, in particular the target material to the structure of the material.
  • Obtaining a structure of the material from the indication of the structure of the material, in particular the target material may comprise retrieving the structure of the material based on the indication of the structure of the material, in particular the target material.
  • Retrieving the structure of the material may comprise providing a query associated with and/or obtained from the indication of the structure of the material, in particular the target material and receiving the structure of the material in response to providing the query.
  • the query may be provided to a structured database and/or the query may be a structured query for retrieving the structure of the material.
  • the query may comprise a numerical representation of the indication of the structure of the material, in particular the target material.
  • a similarity score indicative of a distance between the numerical representation of the indication of the structure of the material, in particular the target material and numerical representations of structures of materials may be determined.
  • the received structure of the material may be associated with a similarity score within a predefined similarity range. Obtaining the structure of the material from the indication of the structure of the material, in particular the target material, allows properties to be determined from data related to the material.
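  • Retrieval by similarity score can be sketched as a distance computation between the query representation and stored representations, keeping only structures within a predefined range. The Euclidean distance, the threshold and the stored vectors are assumptions for illustration.

```python
import math


def euclidean_distance(a, b):
    """Similarity score expressed as a distance between two representations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical database: structure (SMILES) -> numerical representation.
DATABASE = {
    "CCO": [0.9, 0.1, 0.0],
    "CC(=O)O": [0.2, 0.8, 0.1],
    "c1ccccc1": [0.0, 0.1, 0.9],
}


def retrieve_structures(query_vector, max_distance):
    """Return structures whose distance to the query is within the predefined range."""
    hits = [(euclidean_distance(query_vector, vec), s) for s, vec in DATABASE.items()]
    return [s for dist, s in sorted(hits) if dist <= max_distance]


matches = retrieve_structures([0.85, 0.15, 0.05], max_distance=0.2)  # ["CCO"]
```

  • Widening max_distance returns more candidates, ordered from most to least similar.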
  • users can provide a name such as a trivial name, IUPAC name or the like to determine properties of the materials.
  • structures of the target material can be retrieved error-free e.g. from a database.
  • efficiency of determining target properties of the target material is improved.
  • obtaining the structure of the material from the indication of the structure of the material, in particular the target material may comprise retrieving the structure of the material based on the indication of the structure of the material, in particular the target material.
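The retrieval step described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the toy `DATABASE`, its vectors, the choice of cosine similarity as the similarity score, and the `min_similarity` threshold are all assumptions made for the example.

```python
import math

# Hypothetical toy database: name -> (numerical representation, structure as SMILES).
DATABASE = {
    "ethanol": ([1.0, 0.0, 0.2], "CCO"),
    "acetic acid": ([0.1, 1.0, 0.3], "CC(=O)O"),
    "benzene": ([0.0, 0.2, 1.0], "c1ccccc1"),
}


def cosine_similarity(a, b):
    """Similarity score between two numerical representations (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_structure(query_vector, min_similarity=0.9):
    """Return (name, structure, score) of the best match whose score lies
    within the predefined similarity range, or None if there is no such match."""
    best = None
    for name, (vector, structure) in DATABASE.items():
        score = cosine_similarity(query_vector, vector)
        if score >= min_similarity and (best is None or score > best[2]):
            best = (name, structure, score)
    return best
```

A query whose representation lies close to the stored vector for ethanol returns the structure `CCO`; a query that is not similar enough to any entry returns `None`.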
  • the structure of the material may be associated with a sequence of the two or more building blocks.
  • the sequence of the two or more building blocks may be linear and/or non-linear.
  • the indication of the structure of the material, in particular the target material, in particular the structure of the material may be transformed into a linearized representation of the indication of the structure of the material, in particular the target material, in particular the structure of the material.
  • SMILES constitute a linearized representation for possibly non-linear structures.
  • the structure of the material may for example comprise a ring element. Transforming the structure of the material to e.g. a SMILES representation allows to represent non-linear structures in a linearized format.
  • Said format can be processed by the one or more data-driven model(s).
  • this enables to process non-linear structures of materials in a sequential manner.
  • data-driven models operating on sequences i.e. generative models, can be used for determining target properties of the materials from the structure of the material.
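As an illustration of how a linearized representation can be turned into a sequence, the sketch below splits a SMILES string into tokens. The token pattern is a simplified assumption that covers only a common subset of SMILES (bracket atoms, a few organic-subset atoms, ring-closure digits, bonds and branches); it is not a full SMILES parser.

```python
import re

# Simplified SMILES token pattern (assumption: common subset only).
TOKEN_PATTERN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFIbcnops]|%\d{2}|\d|[=#\-+()/\\.@]"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into a linear sequence of tokens."""
    tokens = TOKEN_PATTERN.findall(smiles)
    # Reject input containing characters the simplified pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


# Benzene is a ring, yet its SMILES form is a flat sequence; the two "1"
# tokens mark the atoms joined by the ring-closing bond.
benzene_tokens = tokenize_smiles("c1ccccc1")
```

The ring in benzene is encoded by the paired ring-closure digits, so a sequence model can consume the non-linear structure token by token.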
  • any one of the methods may further comprise reducing the numerical representation of the two or more building blocks and/or the numerical representation of the relation between the two or more building blocks by providing the numerical representation of the two or more building blocks and/or the numerical representation of the relation between the two or more building blocks to one or more embedding layer(s).
  • the numerical representation of the two or more building blocks and/or the numerical representation of the relation between the two or more building blocks may be obtained by mapping the two or more building blocks and/or the relation between the two or more building blocks to a numerical representation of the two or more building blocks and/or the numerical representation of the relation between the two or more building blocks according to a relation between a plurality of building blocks and corresponding numerical representations and/or between a plurality of relations of building blocks and corresponding numerical representations.
  • Mapping the two or more building blocks to a numerical representation of the two or more building blocks by one or more embedding layer(s) comprises mapping the identified two or more building blocks to a numerical representation of the two or more building blocks according to a relation between a plurality of building blocks and corresponding numerical representations and reducing the numerical representation of the two or more building blocks by the one or more embedding layer(s), in particular by providing the numerical representation of the two or more building block(s) to the one or more embedding layer(s).
  • Mapping the relation between the two or more building blocks to a numerical representation of the relation between the two or more building blocks by one or more embedding layer(s) may comprise mapping the relation between the two or more building blocks to a numerical representation of the relation between the two or more building blocks according to a relation between relations of the two or more building blocks and corresponding numerical representations, and reducing the numerical representation of the relation between the two or more building blocks by the one or more embedding layer(s), in particular by providing the numerical representation of the relation between the two or more building blocks to the one or more embedding layer(s).
  • the numerical representation of the two or more building blocks may be associated with a number of numerical values equal or higher than a number of the plurality of building blocks.
  • the number of numerical values associated with the reduced numerical representation of the indication of the structure of the material, in particular the target material and/or of the two or more building blocks may be lower than the number of numerical values associated with the numerical representation of the indication of the structure of the material, in particular the target material and/or the two or more building blocks.
  • the number of numerical values associated with the numerical representation of the relation between the two or more building blocks may be equal or higher than a number of the plurality of building blocks.
  • the number of numerical values associated with the reduced numerical representation of the relation between the two or more building blocks may be lower than the number of numerical values associated with the numerical representation of the relation between the two or more building blocks.
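The two-step mapping described above, first to a representation with as many values as there are building blocks and then to a reduced representation, can be sketched with a one-hot vector followed by an embedding matrix. The vocabulary, the embedding dimension, and the randomly initialized matrix are assumptions for illustration; in practice the matrix would be learned.

```python
import random

# Hypothetical vocabulary of building blocks.
VOCABULARY = ["C", "O", "N", "C(=O)O", "c1ccccc1", "[MASK]"]
EMBEDDING_DIM = 3  # assumed; lower than len(VOCABULARY)

rng = random.Random(0)
# Embedding matrix: one row of EMBEDDING_DIM values per vocabulary entry.
# Randomly initialized here as a stand-in for learned parameters.
EMBEDDING = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)] for _ in VOCABULARY
]


def one_hot(block):
    """Numerical representation with one value per building block in the vocabulary."""
    index = VOCABULARY.index(block)
    return [1.0 if i == index else 0.0 for i in range(len(VOCABULARY))]


def embed(one_hot_vector):
    """Reduce the representation by a vector-matrix product with the embedding matrix."""
    return [
        sum(v * row[d] for v, row in zip(one_hot_vector, EMBEDDING))
        for d in range(EMBEDDING_DIM)
    ]
```

Because the input is one-hot, the product simply selects one row of the matrix, which is why embedding layers are usually implemented as a table lookup.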
  • mapping the numerical representation associated with the two or more building blocks and the relation between the two or more building blocks to an indication of the target property of the target material by one or more data-driven model(s) may comprise mapping the numerical representation associated with the two or more building blocks and the relation between the two or more building blocks to a numerical representation of the indication of the target property by a property data-driven model, and, mapping the numerical representation of the indication of the target property according to a relation between numerical representations of data and the corresponding data.
  • the property data-driven model may be configured to map numerical representations of structures of materials to numerical representations of indications of target properties of the material.
  • the property data-driven model may be parametrized and/or trained according to numerical representations of historical indications of one or more structure(s) of one or more material(s) and corresponding indications of properties of the materials.
  • the numerical representations of the historical indication of one or more structures may comprise numerical representations of historical building blocks associated with materials.
  • the numerical representations of the historical indication of one or more structures may be obtained by providing historical structures of materials, historical indications of the structure of the material, in particular the target material and/or historical two or more building blocks to the representation data-driven model.
  • the representation data-driven model may be trained and/or parametrized prior to training and/or parametrizing the property data-driven model.
  • the representation data-driven model may be further trained together with the property data-driven model.
  • the property data-driven model may be specifically trained to determine target properties of materials.
  • the target properties determined by the property data-driven model may achieve higher accuracies as the numerical representation of the structure of the material is determined prior to determining the property.
  • the property data-driven model is provided with meaningful numerical representations.
  • the property data-driven model can focus on determining the target properties and does not need to fulfill several tasks. Ultimately, this increases the accuracy of determining target properties of the material.
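The division of labour between a representation model and a property model can be sketched as follows. Everything here is an assumption for illustration: the per-building-block vectors stand in for the output of a representation model, the property model is a single linear layer with hand-picked weights, and `LABELS` is a hypothetical relation between numerical outputs and indications of the target property.

```python
# Hypothetical relation between numerical outputs (by index) and indications.
LABELS = ["insoluble", "soluble"]


def mean_pool(vectors):
    """Collapse per-building-block vectors into one vector for the whole structure."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def property_model(pooled, weights, bias):
    """Linear layer: one score per possible indication of the target property."""
    return [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]


def decode(scores):
    """Map the numerical representation of the indication back to the indication."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return LABELS[best_index]


# Stand-in for the representation model's output (two building blocks, two values each).
structure_representation = [[1.0, 0.0], [0.0, 1.0]]
WEIGHTS = [[1.0, -1.0], [0.5, 1.0]]  # illustrative, not trained
BIAS = [0.0, 0.0]

scores = property_model(mean_pool(structure_representation), WEIGHTS, BIAS)
indication = decode(scores)
```

Keeping `decode` separate from `property_model` mirrors the text: the model outputs a numerical representation, and a fixed relation maps it to the indication.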
  • generating the numerical representation associated with the two or more building blocks and the relation between the two or more building blocks from the at least two numerical representations by one or more data-driven models may comprise mapping the numerical representation of the two or more building blocks and the numerical representation of the relation between the two or more building blocks by a representation data-driven model.
  • the representation data-driven model may be configured to generate a common numerical representation from two or more numerical representations.
  • Mapping the numerical representation of the two or more building blocks and the numerical representation of the relation between the two or more building blocks may comprise modifying the numerical representations to generate at least one numerical representation of the two or more building blocks and the relation between the two or more building blocks by applying a filter to the numerical representations.
  • the filter may be obtained based on the numerical representations.
  • the filter may be obtained by one or more matrix operation(s) of the numerical representations, in particular one or more matrix multiplication(s) of the numerical representations. Applying the filter may result in weighting a contribution of the two or more building blocks to the numerical representation of the two or more building blocks and the relation between the two or more building blocks.
  • the representation data-driven model is enabled to attend to different parts of the structure according to their contribution to determining one or more target properties of the target material. This increases the accuracy for determining the target properties and thus, contributes to determining target properties robustly and in a scalable manner.
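A filter that is itself computed from the numerical representations by matrix multiplication resembles scaled dot-product self-attention. The sketch below assumes that reading and omits the learned projection matrices a trained model would use; the function names are illustrative.

```python
import math


def softmax(row):
    """Normalize a row of scores into weights that sum to 1."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def apply_filter(representations):
    """Derive a filter from the representations and apply it to them.

    representations: one vector per building block (assumed to already combine
    the building block and its relation to the other building blocks).
    Returns (weights, filtered representations).
    """
    dim = len(representations[0])
    # Filter obtained by matrix multiplication of the representations.
    scores = matmul(representations, transpose(representations))
    weights = [softmax([s / math.sqrt(dim) for s in row]) for row in scores]
    # Applying the filter weights each building block's contribution.
    return weights, matmul(weights, representations)
```

Each row of `weights` states how strongly one building block attends to every other building block when the combined representation is formed.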
  • the representation data-driven model may be parametrized and/or trained based on historical indications of structures of materials and/or historical building blocks of materials.
  • the representation data-driven model may be parametrized and/or trained to determine one or more building blocks associated with the provided one or more numerical representation(s). Hence, during training and/or parametrizing, the representation data-driven model may be provided with numerical representations of two or more building blocks and relations between the two or more building blocks. Further, the representation data-driven model may be triggered to determine one or more building blocks corresponding to the two or more building blocks associated with the provided numerical representations. The representation data-driven model may be triggered by providing a masked building block. Hence, the representation data-driven model may be provided with three or more building blocks. The three or more building blocks may comprise at least one masked building block. The at least one masked building block may be associated with the building block to be generated by the representation data-driven model.
  • historical indications of structures of materials and/or historical building blocks of materials associated with at least one masked building block may be generated, in particular by masking at least one building block associated with the historical indications of structures of materials and/or historical building blocks of materials.
  • Masking at least one building block may comprise exchanging at least one building block by at least one masked building block.
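Generating masked training examples by exchanging a building block for a masked building block can be sketched as below. The `[MASK]` symbol, the function name, and the uniform random choice of position are assumptions for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical symbol for a masked building block


def mask_building_blocks(blocks, rng, n_masked=1):
    """Exchange n_masked building blocks for the mask symbol.

    Returns the masked sequence and a dict mapping each masked position to the
    original building block, which is what the model is trained to determine.
    """
    positions = rng.sample(range(len(blocks)), n_masked)
    masked = list(blocks)
    targets = {}
    for position in positions:
        targets[position] = masked[position]
        masked[position] = MASK
    return masked, targets


example_blocks = ["C", "C", "(=O)", "O"]
masked_blocks, targets = mask_building_blocks(example_blocks, random.Random(0))
```

The original sequence is left untouched, so the same historical structure can be masked at different positions across training epochs.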
  • Materials in this context may include chemical compounds.
  • any one of the methods may further comprise providing a numerical representation of a position of the two or more building blocks within a structure of the material to the one or more data-driven model(s), and combining the numerical representation of the position of the two or more building blocks and the numerical representation of the two or more building blocks and the relation between the two or more building blocks.
  • Combining numerical representations may comprise merging the numerical representations and/or adding numerical values associated with the numerical representations.
  • a numerical representation associated with the two or more building blocks, the relation between the two or more building blocks and the position of the two or more building blocks within the structure of the material may be generated by the one or more data-driven model(s).
  • the numerical representation associated with the two or more building blocks, the relation between the two or more building blocks and the position of the two or more building blocks within the structure of the material may be generated from the numerical representation of the two or more building blocks and the relation between the two or more building blocks and the numerical representation of the position of the two or more building blocks within the structure of the material, e.g. by providing the numerical representation of the two or more building blocks, the relation between the two or more building blocks and the numerical representation of the position of the two or more building blocks within the structure of the material to the one or more data-driven model(s), in particular the representation data-driven model.
  • the numerical representation associated with the two or more building blocks, the relation between the two or more building blocks and the position of the two or more building blocks within the structure of the material may be generated from the modified numerical representation associated with the two or more building blocks and the relation between the two or more building blocks.
  • This allows the one or more data-driven model(s) to attend more strongly to the position of the building blocks associated with the structure of the material.
  • the position is taken more into account for determining the target properties of the target material. Because of the close relation between the properties and the 3-dimensional orientation of parts of the material, this increases the meaningfulness of the numerical representation of the structure of the material and hence, increases the accuracy of determining target properties of the target material from this numerical representation. Ultimately, this contributes to an accurate and scalable determination of the target properties of target materials.
  • any one of the methods as described herein may further comprise generating a numerical representation associated with the two or more building blocks and the position of the two or more building blocks within the structure of the material by the one or more data-driven model(s). Additionally or alternatively, any one of the methods as described herein may further comprise combining the numerical representation associated with the two or more building blocks and the numerical representation associated with the position of the two or more building blocks to the numerical representation associated with the two or more building blocks and the position of the two or more building blocks within the structure of the material. Combining numerical representations may comprise merging the numerical representations and/or adding numerical values associated with the numerical representations.
  • the numerical representation associated with the two or more building blocks and the position of the two or more building blocks within the structure of the material may be generated from the numerical representation of the two or more building blocks and the numerical representation of the position of the two or more building blocks within the structure of the material, e.g. by providing the numerical representation of the two or more building blocks and the numerical representation of the position of the two or more building blocks within the structure of the material to the one or more data-driven model(s), in particular the representation data-driven model.
  • the numerical representation associated with the two or more building blocks and the position of the two or more building blocks within the structure of the material may be generated from the modified numerical representation associated with the two or more building blocks. This allows to attend stronger to the position of the building blocks associated with the structure of the material.
  • the position of the two or more building blocks may be and/or may be indicative of an absolute position of the two or more building blocks within the structure of the material.
  • the absolute position of a building block may be indicative of a placing of the building block within a sequence of the two or more building blocks associated with the structure of the material.
  • the absolute position may be determined by counting the number of building blocks and assigning a number indicative of the placing of the building blocks within the counted number of building blocks.
  • the absolute position is crucial for determining target properties from structures of target materials because of the close relationship between the positions of parts of the material and the corresponding material properties. Further, absolute positions allow to distinguish building blocks with a similar or same relation to another building block clearly. Hence, absolute positions enable a more meaningful representation of structures of target materials. Ultimately, this contributes to more accurate and scalable determination of target properties of target materials from the structure of the target materials.
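Determining absolute positions by counting and combining them with the building-block representations by addition can be sketched as follows. The deterministic `position_vector` is a hypothetical stand-in for a learned position embedding, and one-based counting is an assumption.

```python
def absolute_positions(blocks):
    """Assign each building block its placing within the sequence (counted from 1)."""
    return [index for index, _ in enumerate(blocks, start=1)]


def position_vector(position, dim):
    """Hypothetical stand-in for a learned position embedding."""
    return [position * (d + 1) / 100.0 for d in range(dim)]


def add_positions(block_vectors):
    """Combine building-block and position representations by adding their values."""
    dim = len(block_vectors[0])
    return [
        [v + p for v, p in zip(vector, position_vector(position, dim))]
        for position, vector in enumerate(block_vectors, start=1)
    ]


# The same building block "C" occurs first and last and has identical vectors.
blocks = ["C", "O", "C"]
block_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
combined = add_positions(block_vectors)
```

After the addition the two occurrences of `C` no longer share a representation, which is the distinction between otherwise identical building blocks that the text attributes to absolute positions.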
  • At least one building block may comprise at least two components associated with the structure of the material.
  • at least one building block may consist of one component.
  • the component may be a part of a building block.
  • the vocabulary may be indicative of the plurality of building blocks and optionally a plurality of components.
  • the vocabulary may be indicative of a relation between the plurality of components and the plurality of building blocks.
  • the vocabulary may be obtained by reducing an initial set of building blocks to a target number of building blocks.
  • the initial set of building blocks may be determined by forming at least a part of the historical indications by one or more combinations of the plurality of building blocks. Further, at least one building block may be formed by one or more combinations of two or more components.
  • the historical indications of the structure of the material, in particular the target material can be formed by one or more combinations of the two or more components.
  • At least one building block may consist of one component and/or may be equal to one component. Hence, the building blocks may be regarded as an extension of the vocabulary to avoid redundancies and/or simplify representation of the structure of materials.
  • the historical indications of structures of material may be provided, e.g. via a user interface.
  • a target number of building blocks for forming at least a part of the historical indications by one or more combinations of the plurality of building blocks may be provided.
  • the determination score may be determined per building block of the initial set.
  • the determination scores may be associated with generating the building blocks of the set of building blocks. In particular, the determination scores may be obtained based on occurrences of the building blocks within the historical indications of the structure of the material, in particular the target material.
  • a loss score may be determined per building block of the set of building blocks.
  • the loss scores may be associated with modifying the determination scores by removing the building blocks from the set of building blocks.
  • the set of building blocks may be filtered according to the loss scores. In particular, building blocks associated with loss scores within a predefined range may be removed from the set of building blocks.
  • if the number of building blocks associated with the filtered set is equal to or lower than the target number of building blocks, the filtered set may be provided. Otherwise, the determination scores and the loss scores associated with the filtered set of building blocks may be determined again, and the set filtered again, until the number of building blocks associated with the filtered set is equal to or lower than the target number of building blocks.
  • the provided filtered set of building blocks may be the vocabulary. By doing so, the length of sequences of building blocks associated with the structure of materials can be reduced. This is a consequence of representing reoccurring sequences of components by one or more building blocks. Consequently, longer sequences of a plurality of components can be reduced to shorter sequences of building blocks.
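The iterative reduction of an initial set of building blocks to a target number can be sketched as below. Several simplifications are assumed and are not taken from the source: single characters stand in for components, the initial set is all substrings up to `max_len` characters, the determination score is the occurrence count under greedy longest-match segmentation, and the loss score is the number of extra tokens needed if a building block were split back into components. In each round the building blocks with the lowest loss score are removed.

```python
from collections import Counter


def initial_blocks(structures, max_len=3):
    """Initial set: every substring of up to max_len characters."""
    blocks = set()
    for s in structures:
        for i in range(len(s)):
            for n in range(1, max_len + 1):
                if i + n <= len(s):
                    blocks.add(s[i:i + n])
    return blocks


def segment(structure, blocks):
    """Greedy longest-match segmentation of a structure into building blocks."""
    longest = max(len(b) for b in blocks)
    out, i = [], 0
    while i < len(structure):
        for n in range(min(longest, len(structure) - i), 0, -1):
            if structure[i:i + n] in blocks:
                out.append(structure[i:i + n])
                i += n
                break
        else:
            raise ValueError(f"no building block matches at position {i}")
    return out


def reduce_vocabulary(structures, target_size, max_len=3):
    blocks = initial_blocks(structures, max_len)
    components = {b for b in blocks if len(b) == 1}  # always kept
    while len(blocks) > target_size:
        # Determination score: occurrences of each block in the segmented data.
        counts = Counter(tok for s in structures for tok in segment(s, blocks))
        # Loss score (assumed): extra tokens needed if the block were removed.
        loss = {b: counts[b] * (len(b) - 1) for b in blocks - components}
        if not loss:
            break  # only single components remain
        lowest = min(loss.values())
        removable = sorted(b for b, value in loss.items() if value == lowest)
        excess = len(blocks) - target_size
        blocks -= set(removable[:excess])  # never remove more than needed
    return blocks


vocabulary = reduce_vocabulary(["CCOCCOCCO", "CCO"], target_size=4)
```

With this toy data the recurring unit `CCO` survives the reduction, so a nine-component structure is represented by three building blocks.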
  • the structure of the material may be associated with and/or may be related to at least three components.
  • the two or more building blocks may comprise and/or may be related to the at least three components.
  • the one or more data-driven model(s) may be provided with the numerical representation of the two or more building blocks and the numerical representation of the relation between the two or more building blocks for generating one numerical representation by transforming the at least two numerical representations into one numerical representation according to one or more matrix operations associated with the one or more data-driven model(s).
  • the one or more data-driven model(s) may be provided with the numerical representation of the structure of the target material and the numerical representation of the position of the two or more building blocks for generating one numerical representation by transforming the at least two numerical representations into one numerical representation according to one or more matrix operations associated with the one or more data-driven model(s).
  • the one or more matrix operations may include applying a filter to the numerical representations.
  • the filter may be obtained based on the numerical representations.
  • the filter may be obtained by one or more matrix operation(s) of the numerical representations, in particular one or more matrix multiplication(s) of the numerical representations. Applying the filter may result in weighting a contribution of the two or more building blocks to the numerical representation of the two or more building blocks and the relation between the two or more building blocks.
  • the representation data-driven model is enabled to attend to different parts of the structure according to their contribution to determining one or more target properties of the target material. This increases the accuracy for determining the target properties and thus, contributes to determining target properties robustly and in a scalable manner.
  • the target property, in particular an indication of the target property, of the target material may be determined and/or provided for monitoring and/or controlling chemical and/or biological production. Additionally or alternatively, the target property, in particular an indication of the target property, of the target material may be determined and/or provided for producing and/or processing the material.
  • the numerical representation of the structure of the target material associated with the two or more building blocks may be obtained by providing an indication of a structure of the target material, identifying the two or more building blocks of the target material associated with the indication of the structure of the target material according to a vocabulary indicative of a plurality of building blocks associated with structures of materials, and mapping the identified two or more building blocks to a numerical representation of the two or more building blocks by one or more embedding layer(s), wherein the one or more embedding layer(s) may be configured to map data to numerical representations of the data.
  • a structure of the material can be represented by a structured and machine-processable representation. This enables commonly available structures of material to be processed by data-driven models.
  • the structure of the material can be represented with a fixed amount of building blocks. This allows for a better control of input size to the data-driven model while the number of building blocks is smaller than the number of possible structures of materials. The reason for this is that repetitions in the structure of a material can be represented by reoccurring building blocks. Hence, structures of materials can be represented in an efficient format. Thus, resources for determining the target properties of the material can be saved.
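Identifying building blocks according to a vocabulary and controlling the input size can be sketched as follows. The vocabulary, the `[PAD]` and `[UNK]` symbols, the longest-match rule, and the fixed length of 8 are assumptions for illustration.

```python
import re

# Hypothetical vocabulary; index 0 pads short inputs, index 1 marks unknown symbols.
VOCABULARY = ["[PAD]", "[UNK]", "C", "O", "N", "CC", "C(=O)O", "c1ccccc1"]
BLOCK_TO_ID = {block: index for index, block in enumerate(VOCABULARY)}

# Try longer building blocks first so that the longest match wins;
# the trailing "." catches any single character not in the vocabulary.
_BLOCKS = sorted(VOCABULARY[2:], key=len, reverse=True)
_PATTERN = re.compile("|".join(re.escape(b) for b in _BLOCKS) + "|.")


def identify_building_blocks(structure):
    """Split a structure string into building blocks from the vocabulary."""
    return _PATTERN.findall(structure)


def encode(structure, length=8):
    """Map a structure to a fixed-length list of vocabulary indices."""
    unknown = BLOCK_TO_ID["[UNK]"]
    ids = [BLOCK_TO_ID.get(b, unknown) for b in identify_building_blocks(structure)]
    ids = ids[:length]  # truncate long inputs
    return ids + [BLOCK_TO_ID["[PAD]"]] * (length - len(ids))  # pad short inputs
```

For example, `CCC(=O)O` is split into the two building blocks `CC` and `C(=O)O` rather than eight single characters, and the resulting index list can be passed to an embedding layer.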
  • the numerical representation of the structure of the target material may be obtained by providing an indication of the structure of the target material, obtaining a structure of the target material from the indication of the structure of the target material, mapping the structure of the target material to the numerical representation of the structure of the target material.
  • Mapping the structure of the target material to the numerical representation of the structure of the material may comprise identifying two or more building blocks associated with the structure of the target material according to a vocabulary indicative of a plurality of building blocks associated with structures of materials, and, mapping the identified two or more building blocks to a numerical representation of the two or more building blocks by one or more embedding layer(s).
  • the one or more embedding layer(s) may be configured to map data to numerical representations of the data.
  • Obtaining the structure of the material from the indication of the structure of the material, in particular the target material allows to determine properties of data related to the material. For example, users can provide a name such as a trivial name, IUPAC name or the like to determine properties of the materials.
  • structures of the target material can be retrieved error-free e.g. from a database. Hence, efficiency of determining target properties of the target material is improved.
  • any one of the methods may further comprise reducing the numerical representation of the structure of the target material and/or the numerical representation of the position of the two or more building blocks by providing the numerical representation of the two or more building blocks and/or the numerical representation of the position of the two or more building blocks to one or more embedding layer(s).
  • the numerical representation of the structure of the target material and/or the numerical representation of the position of the two or more building blocks may be obtained by mapping the structure of the target material and/or the position of the two or more building blocks to a numerical representation of the structure of the target material and/or the numerical representation of the position of the two or more building blocks according to a relation between a plurality of structures of materials and corresponding numerical representations and/or between a plurality of positions of building blocks and corresponding numerical representations.
  • Mapping the structure of the target material to a numerical representation of the structure of the target material by one or more embedding layer(s) comprises mapping the identified two or more building blocks to a numerical representation of the two or more building blocks according to a relation between a plurality of building blocks and corresponding numerical representations and reducing the numerical representation of the two or more building blocks by the one or more embedding layer(s), in particular by providing the numerical representation of the two or more building block(s) to the one or more embedding layer(s).
  • Mapping the position of the two or more building blocks to a numerical representation of the position of the two or more building blocks by one or more embedding layer(s) may comprise mapping the position of the two or more building blocks to a numerical representation of the position of the two or more building blocks according to a relation between positions of the plurality of building blocks and corresponding numerical representations, and reducing the numerical representation of the position of the two or more building blocks by the one or more embedding layer(s), in particular by providing the numerical representation of the position of the two or more building blocks to the one or more embedding layer(s).
  • the numerical representation of the structure of the target material may be associated with a number of numerical values equal or higher than a number of the plurality of building blocks.
  • the number of numerical values associated with the reduced numerical representation of the structure of the target material and/or of the two or more building blocks may be lower than the number of numerical values associated with the numerical representation of the structure of the target material and/or the two or more building blocks, in particular before reducing.
  • the number of numerical values associated with the numerical representation of the position of the two or more building blocks may be equal or higher than a number of the plurality of building blocks.
  • the number of numerical values associated with the reduced numerical representation of the position of the two or more building blocks may be lower than the number of numerical values associated with the numerical representation of the position of the two or more building blocks, in particular before reducing.
  • the two or more building blocks comprise three or more components, in particular a sequence of three or more components and/or a sequence of the two or more building blocks.
  • the two or more building blocks may be associated with and/or may comprise three or more components.
  • the target chemical compound may be a polymer.
  • the building blocks may be building blocks of a polymer.
  • the building blocks may be representations associated with two or more monomers of the target chemical compound, in particular the polymer.
  • the building blocks may correspond to a structural unit of the polymer obtained by one or more chemical reaction(s) of the monomers associated with the polymer.
  • the mass of the target chemical compound may be above 1000 g/mol.
  • the target chemical compound may be an organic chemical compound.
  • Polymers and larger molecules typically comprise at least partially repetitive structures. Hence, polymers and larger molecules are especially well suited for determining target properties from the structure of the polymer.
  • any one of the methods may comprise providing a numerical representation of a relation between the two or more building blocks and generating a numerical representation of the two or more building blocks and the relation between the two or more building blocks from the numerical representation of the two or more building blocks and the numerical representation of the relation between the two or more building blocks by the one or more data-driven model(s), wherein mapping the numerical representation of the two or more building blocks to an indication of the target property comprises mapping the numerical representation of the two or more building blocks and the relation between the two or more building blocks to an indication of the target property.
  • the relation between the two or more building blocks may be indicative of one or more chemical bond(s) between the two or more chemical compounds.
  • Data-driven model(s) may have to select the parts of the input data to attend to for determining the properties of the material. Allowing the data-driven model to learn a common representation enables the data-driven model to weight the importance of both factors for determining properties of the material.
  • the type, i.e. the presence of a particular building block, or the orientation of at least one building block may be a stronger factor for determining properties of the material.
  • Acidity of carboxylic acids may be judged mainly based on the types of building block, i.e.
  • the orientation between the building blocks may be of less importance than the type of building blocks.
  • where sequences of e.g. hydrogen-bond-forming building blocks are to be analyzed, the orientation of the building blocks towards each other may be of high importance.
  • Learning a common representation of the type of the building blocks and the orientation of the building blocks enables properties of materials to be determined robustly for a plurality of different materials.
  • Such a data-driven model may allow for accurate and scalable determination of properties. Ultimately, this allows resources and time for developing new materials and/or improving the quality of materials and resultant products to be reduced.
  • any one of the methods may comprise providing a numerical representation of a position of the two or more building blocks within a structure of the target material to the one or more data-driven model(s), and combining the numerical representation of the position of the two or more building blocks and the numerical representation of the two or more building blocks and optionally the relation between the two or more building blocks, wherein mapping the numerical representation of the two or more building blocks to an indication of the target property comprises mapping the numerical representation of the two or more building blocks and the position of the two or more building blocks and optionally the relation between the two or more building blocks to an indication of the target property.
  • mapping the numerical representation of the two or more building blocks to an indication of the target property by the one or more data-driven model(s) may comprise processing the numerical representation of the two or more building blocks by the one or more data-driven model(s), providing a numerical representation of a position of the two or more building blocks, and combining the numerical representation of the position of the two or more building blocks and the processed numerical representation of the two or more building blocks.
  • Mapping the numerical representation of the two or more building blocks to an indication of the target property may comprise mapping the numerical representation of the two or more building blocks and the position of the two or more building blocks and optionally the relation between the two or more building blocks to an indication of the target property.
  • Processing the numerical representation of the two or more building blocks includes modifying the numerical representation of the two or more building blocks, e.g. by one or more matrix operation(s).
  • Processing the numerical representation of the position of the two or more building blocks allows the position of the building blocks to be taken into account, in particular just before providing the output data. This focuses the attention of the data-driven model on the relation between the orientation and the type of the building block just before providing the output data. Attention of the data-driven model may focus on the type of the building blocks during processing of the received input data. Especially complex structures are strongly determined by the orientation of the building blocks. Consequently, taking the position of the building blocks into account after processing the numerical representation of the structure of the material refocuses the attention of the data-driven model. Such processing by the data-driven model may allow for accurate and scalable determination of properties. Ultimately, this allows resources and time for developing new materials and/or improving the quality of materials and resultant products to be reduced.
  • any one of the methods may comprise obtaining a structure of the target chemical compound from the indication of the structure of the target chemical compound.
  • the structure of the target chemical compound may be indicative of the two or more building blocks.
  • the two or more building blocks may be identified based on the obtained structure of the target chemical compound.
  • Obtaining the structure of the target chemical compound may comprise retrieving the structure of the target chemical compound from a database.
  • the indication of the structure of the target chemical compound may be suitable for retrieving the structure of the target chemical compound.
  • the indication of the structure of the target chemical compound may be provided to the database for retrieving the structure of the target chemical compound.
  • the database may be configured to provide the structure of chemical compounds in response to receiving indications of structures of chemical compounds.
  • Obtaining the structure of the material from the indication of the structure of the material, in particular the target material, allows properties to be determined from data related to the material. For example, users can provide a name such as a trivial name, IUPAC name or the like to determine properties of the materials.
  • structures of the target material can be retrieved error-free e.g. from a database. Hence, efficiency of determining target properties of the target material is improved.
  • training the one or more data-driven model(s) may include generating a numerical representation of the structure of the target material and the position of the two or more building blocks within the structure of the material by the one or more data-driven model(s), mapping the numerical representation associated with the two or more building blocks and the position of the two or more building blocks within the structure of the material to an indication of the target property of the target material at least partially by the one or more data-driven models.
  • training the one or more data-driven model(s) may further include merging the historical numerical representations of the structure of the material with a historical numerical representation of a relation between the two or more building blocks and/or merging the historical numerical representations of the structure of the material and the relation between the two or more building blocks with the historical numerical representations of the position of the two or more building blocks.
  • Combining may refer to merging.
  • Merging may include concatenating the at least two numerical representations.
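  • The merging by concatenation described above can be illustrated with a minimal sketch. It assumes that each numerical representation is a matrix with one row per building block; the function name `merge_representations`, the array names and the toy dimensions are hypothetical and are not taken from the described methods.

```python
import numpy as np

def merge_representations(*representations: np.ndarray) -> np.ndarray:
    """Concatenate per-building-block representations along the feature axis."""
    n_blocks = representations[0].shape[0]
    for rep in representations:
        if rep.shape[0] != n_blocks:
            raise ValueError("all representations must cover the same building blocks")
    return np.concatenate(representations, axis=-1)

# Hypothetical toy data for a structure with three building blocks.
block_repr = np.ones((3, 4))                              # type of each building block
relation_repr = np.zeros((3, 2))                          # relation between the building blocks
position_repr = np.arange(3, dtype=float).reshape(3, 1)   # position within the structure

merged = merge_representations(block_repr, relation_repr, position_repr)
print(merged.shape)
```

  • In this sketch every row of `merged` carries the type, relation and position features of one building block side by side, so a subsequent layer may weight them jointly.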
  • the one or more data-driven model(s) may be further trained based on historical numerical representations of a relation between the two or more building blocks.
  • the historical numerical representations of the structure of the one or more material(s) may be obtained by providing historical indications of the structure of the one or more material(s), obtaining a structure of the one or more material(s) from the indication of the structure of the one or more material(s), mapping the structure of the one or more material(s) to the numerical representation of the structure of the one or more material(s).
  • Mapping the structure of the one or more material(s) to the numerical representation of the structure of the one or more material(s) may comprise identifying two or more building blocks associated with the structure of the one or more material(s) according to a vocabulary indicative of a plurality of building blocks associated with structures of materials, and, mapping the identified two or more building blocks to a numerical representation of the two or more building blocks by one or more embedding layer(s).
  • the one or more embedding layer(s) may be configured to map data to numerical representations of the data.
  • the structure of the one or more material(s), in particular the target material may be associated with a sequence of the two or more building blocks.
  • the one or more data-driven model(s) may comprise a representation data-driven model and a property data-driven model.
  • providing the indication of the target property for monitoring and/or controlling the production and/or the processing of the target material may comprise using the indication of the target property for adapting one or more parameter(s) and/or one or more equipment item(s) for producing and/or processing the target material and/or comparing an obtained and/or measured property of the target material during and/or after producing and/or processing with the target property.
  • historical indication may refer to a, in particular previously, determined and/or known structure of the material, e.g. by using a model configured to determine an indication of a structure of a material and/or by performing a measurement.
  • a processor may be a processor of any suitable type, and is preferably a processor configured for parallel processing of at least a hundred or at least a thousand threads in parallel, e.g. a graphical processing unit (GPU).
  • the processor comprises at least a hundred or at least a thousand parallel processing cores.
  • the processor may comprise at least one (preferably at least a thousand) compute unified device architecture (CUDA) core(s), which may allow for using a graphical processing unit as the processor, which may increase computational efficiency.
  • the processor may comprise at least one (e.g. at least a hundred) streaming multiprocessor cores, which may allow for increasing the data throughput.
  • the processor may comprise one or more (e.g.
  • a tensor core may be specifically adapted to perform matrix operations and may allow to accelerate large matrix operations.
  • a tensor core may be configured to perform mixed-precision matrix multiply and accumulate calculations in a single operation. For instance, a tensor core may perform mixed-precision floating-point matrix arithmetic, specifically utilizing FP16 (half-precision) inputs to produce either full-precision (FP32) or half-precision (FP16) outputs.
  • a tensor core may provide a performance boost by storing the intermediate accumulation results in FP32 format, thereby maintaining the precision necessary for accurate results.
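  • The effect of keeping intermediate accumulation results in FP32 can be illustrated on a CPU with a small emulation. This is only a sketch under the assumption that rounding the running sum to a given precision is a fair stand-in for the accumulator format; it does not model actual tensor core hardware, and the function names are hypothetical.

```python
import numpy as np

def mixed_precision_fma(a16, b16, c32):
    """Emulate D = A @ B + C with FP16 inputs and FP32 accumulation."""
    assert a16.dtype == np.float16 and b16.dtype == np.float16
    # Up-cast the FP16 inputs so that products and sums are carried out in FP32.
    return a16.astype(np.float32) @ b16.astype(np.float32) + c32.astype(np.float32)

def dot_with_accumulator(x16, y16, acc_dtype):
    """Dot product of FP16 vectors, rounding the running sum to acc_dtype."""
    acc = acc_dtype(0.0)
    for x, y in zip(x16, y16):
        acc = acc_dtype(acc + acc_dtype(x) * acc_dtype(y))
    return acc

# Matrix multiply-accumulate with FP16 inputs and an FP32 result.
a16 = np.ones((2, 3), dtype=np.float16)
b16 = np.ones((3, 2), dtype=np.float16)
c32 = np.zeros((2, 2), dtype=np.float32)
d = mixed_precision_fma(a16, b16, c32)

# Long reduction: compare an FP16 running sum with an FP32 running sum.
x16 = np.full(4096, 0.1, dtype=np.float16)
y16 = np.ones(4096, dtype=np.float16)
fp16_result = dot_with_accumulator(x16, y16, np.float16)
fp32_result = dot_with_accumulator(x16, y16, np.float32)
print(float(fp16_result), float(fp32_result))
```

  • In this toy reduction the FP16 running sum stops growing once each increment is smaller than half the spacing between representable FP16 values, whereas the FP32 running sum stays close to the reference value. This is the kind of precision loss that FP32 accumulation is meant to avoid.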
  • a tensor processing unit may be an application-specific integrated circuit (ASIC). It may comprise a matrix multiplication unit (MXU), which may be specifically adapted or configured for dense linear algebra operations. TPUs may be configured to handle large-scale matrix operations efficiently, which may provide high computational throughput for AI tasks.
  • a TPU may be equipped with on-chip high-bandwidth memory (HBM), which may enhance the capability for the use of larger models and batch sizes. TPUs may be connected in groups called Pods, which may scale up workloads with minimal code changes.
  • An MXU may be specifically configured for performing matrix multiplications.
  • a TPU may comprise a tensor core.
  • a processor may comprise several thousand tensor cores, each capable of performing 64 floating point FMA (Fused Multiply-Add) operations per clock cycle or (e.g. at least several hundred) tensor processing units (TPUs) being specifically configured for accelerating machine learning (ML) workloads, particularly for cloud-based applications.
  • FPGAs Field-Programmable Gate Arrays
  • ASICs Application-Specific Integrated Circuits
  • a GPU may allow for hundreds of TFLOPs (Tera Floating-Point Operations per Second) of performance in mixed-precision computations.
  • a tensor core may support a variety of numerical formats, including IEEE standard half-precision, single-precision, and double-precision floating-point formats, as well as a range of integer formats.
  • a processor may be a central processing unit (CPU) configured with an advanced architecture, such as Intel's Xeon Scalable processors or AMD's EPYC series.
  • a CPU may be configured for sequential processing and general-purpose computing. These CPUs may incorporate vector instruction sets, such as AVX-512, to accelerate mathematical computations that may e.g. enhance AI model training and inference.
  • CPUs may integrate AI accelerators, i.e. a CPU may be specifically configured for deep learning workloads.
  • the processor may be coupled to memory having a memory bandwidth of at least a hundred gigabytes per second, which may allow efficient handling of extensive data sets and may allow faster reading, processing, and writing compared to a general-purpose processor such as a computational processing unit.
  • the memory may be a high-capacity memory configured to manage the data-intensive nature of AI applications, providing necessary bandwidth and storage capacity for complex datasets.
  • the memory may for instance be DDR4, DDR5, High Bandwidth Memory (HBM) and/or GDDR6X memory, which may improve data transfer rates and reduce latency.
  • HBM High Bandwidth Memory
  • GDDR6X memory may improve data transfer rates and reduce latency.
  • Such memory may enhance e.g. modeling and real-time sensor data for monitoring and control.
  • the memory may be operated with memory optimization techniques, such as caching and prefetching, which may enhance the execution speed of AI algorithms.
  • NVM Non-volatile Memory
  • NAND Flash and 3D XPoint may provide persistent storage solutions with high-speed access, which may enhance rapid data storage and retrieval for AI applications.
  • the target property may be a desired property.
  • the target property may be a property targeted to be obtained and/or measured associated with the target material.
  • the target property may be specified by an application associated with the processed and/or produced target material.
  • position may comprise a relative position and/or an absolute position.
  • FIG. 1 illustrates an embodiment of materials and corresponding properties of the material.
  • FIG. 2 illustrates an embodiment of determining a property of a material.
  • FIG. 3 illustrates an embodiment of determining, in particular measuring, a property of a material.
  • FIG. 4 illustrates embodiments of an indication of a structure of a material.
  • FIG. 5 illustrates an embodiment of obtaining a vocabulary.
  • FIG. 6 illustrates an embodiment of an embedding layer.
  • FIG. 7A illustrates an embodiment of a numerical representation of the indication of the structure of the material, in particular the target material.
  • FIG. 7B illustrates an embodiment of a numerical representation of the indication of the structure of the material, in particular the target material.
  • FIG. 8A illustrates an embodiment of a representation data-driven model 8-148.
  • FIG. 8B illustrates an embodiment of a representation data-driven model 8-148.
  • FIG. 8C illustrates an embodiment of a representation data-driven model 8-148.
  • FIG. 9A illustrates an embodiment of training and/or deploying the representation data-driven model 902.
  • FIG. 9B illustrates an embodiment of training and/or deploying the representation data-driven model 916.
  • FIG. 10 illustrates an embodiment of a vocabulary length and an accuracy of determining properties of materials according to the methods as described herein.
  • FIG. 1 illustrates an embodiment of materials and corresponding properties of the material.
  • Materials comprise one or more building blocks.
  • Material may be a biological material such as a protein.
  • Biological material may comprise a plurality of amino acids.
  • the relative orientation of amino acids within one or more sequences of amino acids determines an interaction between the amino acids. For example, including a plurality of amino acids with an OH-group may allow for establishment of hydrogen bonds across parts of the biological material, in particular parts of the sequence of amino acids. Said hydrogen bonds may strongly determine the spatial orientation of the sequence.
  • a common example may be the spatial orientation of DNA. Due to the interaction of the building blocks of the DNA, a double helical structure may be formed by the DNA strands.
  • the spatial orientation of the biological material may determine the properties of the biological material, e.g. if the biological material, in particular the protein, may fit with another biological material or not. Fitting with another biological material may allow for a structural modification, in particular a well-defined structural modification. Hence, the spatial orientation of the protein may determine its capability to interact with other proteins. Consequently, the structure, in particular the sequence of amino acids, of the biological material may determine the interaction within a biological system.
  • Chemical compounds may be part of a value chain towards a chemical product. Hence, chemical compounds may be allowed to react towards the chemical product, e.g. via one or more intermediate products. The reactions that a chemical compound may undergo may be determined by its chemical structure. Chemical compounds may be associated with rings, adjacent chains or the like. The reason may be that chemical compounds may be defined via the atoms comprised by the chemical compound and the bonding between the atoms. Carbon atoms are typically bonded via four bonds. Hence, chemical compounds may be non-linear. For example, caffeine as shown in FIG. 1 may comprise two rings with additional functional groups. Such structures may be represented e.g. via SMILES strings. SMILES strings may be an unambiguous and sequential representation of a chemical structure.
  • a polymer may be represented by polymer building blocks, in particular monomers.
  • the representation of the chemical compound may be chosen based on the type of chemical structure. Said chemical structure may determine the exact properties of the chemical compound. Even a small change, e.g. exchanging one atom for another in a molecule comprising a hundred or more atoms or changing the orientation of one subgroup of a molecule, may result in the unchanged chemical compound participating in a chemical reaction and the changed chemical compound participating in a different chemical reaction.
  • the chemical structure, in particular the spatial orientation of the atoms and the bonding between the atoms, may determine the properties, especially a reactivity, of the chemical compound.
  • FIG. 2 illustrates an embodiment of determining a property of a material.
  • An indication of a structure of a material may be provided 202.
  • Examples for indication of a structure of a material may include SMILES strings, sequences of amino acids or other biological building blocks, polymer sequences or the like as described in the context of FIG. 4.
  • the indication of the structure may be provided e.g. via a user interface.
  • the indication of the structure of a material may further comprise a denotation associated with the material and/or a composition of the material, optionally reaction conditions for obtaining the material from one or more chemical reaction(s) of the compounds specified by the composition of the material.
  • the indication may be suitable for retrieving a structure of the material and/or may comprise a structure of the material.
  • the structure of the material may be retrieved based on the indication suitable for retrieving the structure of the material.
  • a query obtained from the indication of the structure of the material, in particular the target material may be provided to a database.
  • the database may be configured to provide the structure of the material in response to receiving the query.
  • Said query may be a structured query, e.g. for a SQL database, or may comprise a numerical representation of the indication of the structure of the materials, e.g. for a similarity search.
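  • Retrieving a structure from a database by means of a structured query can be sketched as follows. The sketch assumes an in-memory SQLite database; the table name `structures`, the column names and the two example entries are hypothetical illustrations, not part of the described system.

```python
import sqlite3

def build_demo_database() -> sqlite3.Connection:
    """Create a small in-memory database mapping names to SMILES strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE structures (name TEXT PRIMARY KEY, smiles TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO structures (name, smiles) VALUES (?, ?)",
        [
            ("caffeine", "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"),
            ("acetic acid", "CC(=O)O"),
        ],
    )
    return conn

def retrieve_structure(conn: sqlite3.Connection, indication: str):
    """Return the stored structure for an indication such as a trivial name."""
    row = conn.execute(
        "SELECT smiles FROM structures WHERE name = ?",
        (indication.strip().lower(),),
    ).fetchone()
    return row[0] if row else None

conn = build_demo_database()
print(retrieve_structure(conn, "Caffeine"))
print(retrieve_structure(conn, "unknown compound"))
```

  • A similarity search over numerical representations, as also mentioned above, would replace the exact-match `WHERE` clause with a nearest-neighbour lookup; that variant is not shown here.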
  • Two or more building blocks associated with the indication of the structure of the material, in particular the target material may be identified according to a vocabulary indicative of a plurality of building blocks associated with structures of materials.
  • At least one building block may comprise two or more components.
  • One subelement may comprise, in particular consist of, one building block associated with the structure of the material.
  • One building block may comprise one or more components.
  • an amino acid or a functional group may be a subelement.
  • the building block may comprise one or more amino acid(s) and/or one or more functional group(s).
  • the vocabulary may specify and/or may be indicative of a plurality of building blocks.
  • the vocabulary may be suitable for and/or configured to identify one or more element(s) associated with the indication of the structure of the material, in particular the target material.
  • the vocabulary may be suitable for identifying one or more building blocks comprised by the structure of the material.
  • a tokenizer and/or a tokenizing engine 302 may be configured to identify the two or more building blocks.
  • the vocabulary may be obtained as described in the context of FIG. 5.
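  • Identifying building blocks according to a vocabulary can be sketched with a greedy longest-match tokenizer. This is an assumption made for illustration: the source does not state which matching strategy the tokenizing engine 302 uses, and the function name `tokenize`, the `[UNK]` placeholder and the toy vocabulary are hypothetical.

```python
def tokenize(structure: str, vocabulary: set[str]) -> list[str]:
    """Split a sequential structure into the longest building blocks found in the vocabulary."""
    max_len = max(len(block) for block in vocabulary)
    tokens = []
    i = 0
    while i < len(structure):
        # Try the longest candidate first and shrink until a vocabulary entry matches.
        for length in range(min(max_len, len(structure) - i), 0, -1):
            candidate = structure[i:i + length]
            if candidate in vocabulary:
                tokens.append(candidate)
                i += length
                break
        else:
            # No vocabulary entry matches at this position.
            tokens.append("[UNK]")
            i += 1
    return tokens

# Hypothetical vocabulary containing single atoms and a carboxylic acid group.
vocabulary = {"C", "O", "N", "=", "(", ")", "CC", "C(=O)O"}
tokens = tokenize("CCC(=O)O", vocabulary)
print(tokens)
```

  • With this toy vocabulary the carboxylic acid group is kept as a single building block rather than being split into individual atoms, which illustrates why the choice of vocabulary affects the resulting sequence of building blocks.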
  • identifying the two or more building blocks may comprise mapping the indication of the structure of the material, in particular the target material to a structure of the material, preferably a sequential structure of the material.
  • the material may comprise one or more ring(s) and/or two or more strand(s) and/or two or more chain(s).
  • a sequential structure may be chosen according to the type of the material.
  • the chosen format of the structure of the material may be SMILES strings. By doing so, non-linear structures of materials may be linearized. This may allow for processing of the structure of the material by a tokenizing engine 302, a representation engine 308 and/or a property determination engine 306.
  • parts of the structure of the material may be weighted according to their relevance to determining the numerical representation and/or the property of the material. Weighting allows the attention of the data-driven models to be focused on decisive parts of the structure. For example, in a long molecule with a carboxylic acid group, the carboxylic acid group may be decisive for the properties of the molecule. Similarly, in a chain of amino acids a presence of thiol groups may be decisive for building connections across the chains, i.e. disulfide bridges. Therefore, it enables a realistic and thus robust determination of numerical representations of structures of the material and/or properties of the material. Ultimately, this improves tailoring the properties of materials to a variety of application scenarios.
  • the structure of the material may comprise, in particular consist of, the two or more building blocks.
  • the two or more building blocks may be mapped to a numerical representation of the two or more building blocks 204. Further, the two or more building blocks may be mapped to a numerical representation of a relation between the two or more building blocks 204.
  • the numerical representation of two or more building blocks and/or the relation between the two or more building blocks may be a tensor, in particular a two-dimensional tensor, i.e. a matrix.
  • One building block may be represented by a vector.
  • the numerical representation of the two or more building blocks and/or the relation between the two or more building blocks may be obtained by providing an indication of the two or more building blocks to one or more embedding layer(s).
  • the indication of the two or more building blocks may be a one-hot vector per building block.
  • a vector representing one building block may be obtained after another. Concatenating the two or more vectors associated with the two or more building blocks may result in the numerical representation of the two or more building blocks.
  • the embedding layer may be configured to map an indication of one or more building blocks to a numerical representation of, in particular a two-dimensional tensor representing, the two or more building blocks associated with the indication of the structure of the material, in particular the target material.
  • the one or more embedding layers may be described in more detail in the context of FIG. 6. Hence, two or more numerical representations may be obtained 206.
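  • The mapping from one-hot vectors to a two-dimensional numerical representation can be sketched as follows. The sketch assumes that an embedding layer acts as a matrix whose rows are the vectors of the vocabulary entries; the randomly initialised `embedding_matrix`, the toy vocabulary and the function names are hypothetical, and in practice the matrix entries would be learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary and embedding size.
vocabulary = ["[PAD]", "CC", "C(=O)O", "N"]
embedding_dim = 4
embedding_matrix = rng.normal(size=(len(vocabulary), embedding_dim))

def one_hot(block: str) -> np.ndarray:
    """One-hot vector indicating a single building block."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(block)] = 1.0
    return vec

def embed(blocks: list[str]) -> np.ndarray:
    """Map each building block to a vector and stack the vectors into a matrix."""
    return np.stack([one_hot(block) @ embedding_matrix for block in blocks])

representation = embed(["CC", "C(=O)O"])
print(representation.shape)
```

  • Multiplying a one-hot vector by the matrix simply selects the corresponding row, so each building block is represented by one vector and the stacked vectors form the matrix for the whole sequence.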
  • the relation between the two or more building blocks may be a relative position, i.e. may be indicative of a position of the two or more building blocks in relation to one another.
  • the numerical representations may be mapped to numerical representations of a predefined size. This may include adding numerical values, i.e. matrix entries, to result in the numerical representations of the predefined size. This may be referred to as padding. Additionally or alternatively, this may include eliminating one or more numerical values of the numerical representations to result in the numerical representations of the predefined size.
  • This allows for using data-driven models of a predefined size. Data-driven models are built prior to their usage. They have defined inputs and outputs. Therefore, inputs of varying lengths need to be adapted to result in the predefined input size required by the models.
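  • Padding and eliminating values to reach a predefined size can be sketched as follows. The sketch assumes one row per building block, padding with zeros at the end and truncation at the end; the function name `to_fixed_size` and these choices are assumptions, as the source does not specify where values are added or removed.

```python
import numpy as np

def to_fixed_size(representation: np.ndarray, size: int, pad_value: float = 0.0) -> np.ndarray:
    """Pad or truncate a (n_blocks, dim) matrix to exactly `size` rows."""
    n_blocks, dim = representation.shape
    if n_blocks >= size:
        # Eliminate the rows beyond the predefined size.
        return representation[:size]
    # Add rows filled with the padding value.
    padding = np.full((size - n_blocks, dim), pad_value)
    return np.concatenate([representation, padding], axis=0)

short = np.ones((2, 3))
long = np.ones((7, 3))
padded = to_fixed_size(short, size=5)
truncated = to_fixed_size(long, size=5)
print(padded.shape, truncated.shape)
```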
  • the numerical representations may be modified 210. This may include generating one numerical representation from the two numerical representations by applying a filter to the numerical representations.
  • the filter may be obtained based on the numerical representations. This may be known as self-attention as the filter may be obtained based on the input data.
  • the filter may define one or more matrix operations for combining the numerical representations into one numerical representation and weighting a contribution of the two or more building blocks. Self-attention may be further described in the context of FIG. 8A.
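  • A filter obtained from the input data itself can be sketched with scaled dot-product self-attention. This is an assumption about the form of the filter: the source only states that the filter is derived from the numerical representations and weights the contribution of the building blocks. The projection matrices `w_q`, `w_k` and `w_v` are hypothetical and randomly initialised here, whereas they would normally be learned.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Weight the contribution of each building block using a filter derived from x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # The filter: one row of weights per building block, computed from the input.
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))  # three building blocks, four features each
w_q, w_k, w_v = (rng.normal(size=(4, 4)) for _ in range(3))
output, weights = self_attention(x, w_q, w_k, w_v)
print(output.shape, weights.sum(axis=-1))
```

  • Each row of `weights` sums to one and indicates how strongly the other building blocks contribute to the updated representation of a given building block.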
  • the numerical representations may be provided to a representation data-driven model, e.g. as described in the context of FIG. 3 and FIG. 8A - FIG. 8C.
  • the representation data-driven model may be configured to modify the numerical representations to obtain one numerical representation.
  • the representation data-driven model may be part of a representation engine 308, in particular representation generating engine 304.
  • the two numerical representations may be split into at least two numerical representations per numerical representation, i.e. at least four numerical representations.
  • the so-obtained numerical representations may be modified as described above separately. This speeds up computing of the matrix operations.
  • one numerical representation may be obtained by applying the filter. This may be known as a multihead operation and is described in further detail in the context of FIG. 8A - FIG. 8C.
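  • Splitting a numerical representation, applying the filter to each part separately and recombining the results can be sketched as follows. This is a simplified illustration that omits learned projections and assumes that the split is made along the feature axis; the function name `multi_head_self_attention` is hypothetical.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x: np.ndarray, n_heads: int) -> np.ndarray:
    """Split the features into heads, attend within each head, and concatenate."""
    n_blocks, dim = x.shape
    if dim % n_heads != 0:
        raise ValueError("feature dimension must be divisible by the number of heads")
    outputs = []
    for head in np.split(x, n_heads, axis=-1):
        # Each head computes its own filter from its own slice of the features.
        weights = softmax(head @ head.T / np.sqrt(head.shape[-1]))
        outputs.append(weights @ head)
    # Recombine the separately processed parts into one numerical representation.
    return np.concatenate(outputs, axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8))
combined = multi_head_self_attention(x, n_heads=2)
print(combined.shape)
```

  • Because the heads are independent of one another, their matrix operations can be evaluated in parallel, which is consistent with the speed-up mentioned above.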
  • the obtained numerical representation may be mapped to a numerical representation of an indication of a property of the material 212.
  • This may comprise providing the obtained numerical representation to a property data-driven model.
  • the property data-driven model may be configured to provide properties of materials in response to receiving numerical representations of the structure of materials.
  • the property data-driven model may be a classification model.
  • the property data-driven model may comprise one or more layer(s).
  • the one or more layer(s) may be configured to map numerical representations of structures of materials to properties of said materials.
  • the property data-driven model may be trained together with the representation data-driven model or after training of the representation data-driven model.
  • the representation data-driven model may be trained prior to retraining the representation data-driven model together with training the property data-driven model.
  • the numerical representation of the indication of the property of the material may be mapped to the indication of the property of the material 214.
  • This may be a reverse process to embedding the indication of the structure of the material, in particular the target material, i.e. mapping the indications of the structure of the material, in particular the target material to a numerical representation of the indication of the structure of the material, in particular the target material.
  • This may be referred to as decoding.
  • One or more decoding layer(s) may be configured to map the numerical representation of the indication of the property of the material to the indication of the property of the material.
  • mapping the numerical representation of the indication of the property of the material to the indication of the property of the material may comprise providing the numerical representation of the indication of the property of the material to the one or more decoding layer(s).
  • An example of the one or more decoding layer(s) may be described in the context of FIG. 6.
  • the indication of the one or more properties of the materials may be provided.
  • the indication may be suitable for determining the one or more properties of the material and/or may be indicative of the one or more properties of the material.
  • providing the indication may comprise determining the one or more properties of the material from the indication.
  • the indication of the one or more properties of the material may be numerical values associated with the one or more properties of the material.
  • the indication of the one or more properties may be obtained from a classification model.
  • the classification model may output a numerical value indicating a classification of the material, in particular a classification of the one or more properties of the material.
  • the classification provided by the classification model may be mapped to one or more properties of the material.
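  • Mapping the numerical output of a classification model to a property can be sketched as follows. The class labels in `CLASS_TO_PROPERTY`, the example scores and the function name `decode_property` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical mapping from class index to a property of the material.
CLASS_TO_PROPERTY = {
    0: "low solubility",
    1: "medium solubility",
    2: "high solubility",
}

def decode_property(scores: np.ndarray):
    """Turn raw class scores into a class index, a property label and a probability."""
    shifted = scores - scores.max()
    probabilities = np.exp(shifted) / np.exp(shifted).sum()
    class_index = int(np.argmax(probabilities))
    return class_index, CLASS_TO_PROPERTY[class_index], float(probabilities[class_index])

class_index, label, probability = decode_property(np.array([0.1, 2.0, -1.0]))
print(class_index, label)
```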
  • FIG. 3 illustrates an embodiment of determining, in particular measuring, a property of a material.
  • the structure of the material may determine the properties of the material.
  • the indication of the structure of the material, in particular the target material may be provided as described in the context of FIG. 2.
  • the indication of the structure of the material, in particular the target material, in particular the structure of the material may be provided to a representation engine 308.
  • the representation engine 308 may comprise a tokenizing engine 302 and a representation generating engine 304.
  • the representation engine 308 may be configured to generate a numerical representation of the indication of the structure of the material, in particular the target material, in particular the structure of the material.
  • the numerical representation of the indication of the structure of the material, in particular the target material, in particular the structure of the material may be indicative of two or more building blocks associated with the indication of the structure of the material, in particular the target material, in particular the structure of the material. Further, the numerical representation may be indicative of a relation between the two or more building blocks.
  • the numerical representation may have a predefined size.
  • the numerical representation may be obtained as described within the context of FIG. 2.
  • two or more building blocks may be identified, e.g. as described within the context of FIG. 2.
  • the indication of the structure of the material, in particular the target material may be split into two or more building blocks, i.e..
  • the indication of the structure of the material, in particular the target material may be tokenized.
  • a vocabulary may specify a plurality of building blocks associated with structures of materials, in particular historical structures of materials. Historical structures of materials may refer to previously obtained and/or previously identified structures of materials.
  • the indication of the structure of the material, in particular the target material may be tokenized by a tokenizing engine 302.
  • the tokenizing engine 302 may be configured to identify two or more building blocks associated with data provided to the tokenizing engine 302, in particular of structures of materials.
  • the two or more building blocks associated with the indication of the structure of the material, in particular the target material, in particular the sequence of the two or more identified building blocks may be provided to the representation generating engine 304.
  • the representation generating engine 304 may be configured to generate the numerical representation of the indication of the structure of the material, in particular the target material, in particular the structure of the material, from the two or more building blocks identified by the tokenizing engine 302, e.g. as described within the context of FIG. 2.
  • the representation generating engine 304 may provide the numerical representation of the indication of the structure of the material, in particular the target material, in particular numerical representation of the structure of the material, to a property determination engine 306.
  • the property determination engine 306 may be configured to map the numerical representation of the indication of the structure of the material, in particular the target material to a numerical representation of an indication of a property of the material, e.g. as described in the context of FIG. 2.
  • FIG. 4 illustrates embodiments of an indication of a structure of a material.
  • the material may be a biological and/or a chemical material.
  • Biological materials may be characterized by a biological activity associated with the biological material. Further, biological materials may comprise a sequence of amino acids. Hence, biological materials may comprise macromolecules, e.g. of more than 1000 Da and/or a sequence of at least 100 amino acids.
  • An example of an indication of a structure of a biological material may be a sequence of biological building blocks.
  • Biological building blocks may include one or more amino acids, one or more phosphate group(s), one or more nucleotide(s), one or more nucleobase(s) or the like.
  • Said sequence may be tokenized. Tokenizing the sequence may split the sequence into a plurality of building blocks. Depending on a vocabulary associated with the structure of biological materials, different tokenizations may be available. The vocabulary may specify a plurality of building blocks occurring in biological sequences.
  • Chemical materials may be materials undergoing chemical reactions and/or being obtained by a chemical reaction.
  • a chemical material may comprise a chemical compound.
  • Chemical compounds comprise a plurality of atoms. Chemical compounds may be characterized by the type of the plurality of atoms of the chemical compound and one or more bonds between the plurality of atoms. Chemical material may be in particular a polymer. Polymers may be obtained by one or more chemical reaction(s) of a plurality of monomers. Polymers may be characterized by the type of monomers and the reaction conditions associated with obtaining the polymer.
  • the indication of a structure of a polymer 414 and the indication of a structure of a chemical material 412 can be tokenized differently according to two or more vocabularies associated with the structure of chemical materials, in particular polymers.
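  • As an illustration of how different vocabularies may lead to different tokenizations of the same sequence, a minimal sketch follows. The greedy longest-match strategy, the function name `tokenize` and both vocabularies are assumptions made for this example and are not taken from this description.

```python
def tokenize(sequence, vocabulary):
    """Greedy longest-match split of a sequence into building blocks.

    Assumption: greedy matching is only one possible way of applying a
    vocabulary to a sequence.
    """
    longest = max(len(block) for block in vocabulary)
    blocks, i = [], 0
    while i < len(sequence):
        for size in range(min(longest, len(sequence) - i), 0, -1):
            candidate = sequence[i:i + size]
            if candidate in vocabulary:
                blocks.append(candidate)
                i += size
                break
        else:
            # Symbol not covered by the vocabulary: keep it as its own block.
            blocks.append(sequence[i])
            i += 1
    return blocks


# Hypothetical vocabularies for a short amino-acid sequence.
vocab_single = {"M", "K", "T", "A", "Y"}
vocab_merged = {"M", "K", "T", "A", "Y", "MK", "TAY"}

sequence = "MKTAYMK"
tokens_single = tokenize(sequence, vocab_single)
tokens_merged = tokenize(sequence, vocab_merged)
print(tokens_single)
print(tokens_merged)
```

  • With the larger vocabulary the same sequence is covered by fewer building blocks, which is the effect on sequence length that a suitable vocabulary is intended to achieve.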
  • FIG. 5 illustrates an embodiment of obtaining a vocabulary.
  • Historical indications of structures of materials may be provided 502. Historical indications may refer to previously determined structures of materials, e.g. by using models and/or by performing measurements.
  • An initial set of building blocks forming at least a part of the historical indications by one or more combinations of the plurality of building blocks may be determined.
  • the initial set may be determined randomly.
  • the initial set may comprise a number of building blocks higher than a target number of building blocks associated with a target vocabulary.
  • the target of obtaining the vocabulary may be to reduce the number of building blocks of the vocabulary, i.e. decrease the vocabulary size. Large vocabularies may require more computing resources and/or time. Hence, decreasing the vocabulary size may reduce the resource consumption of determining properties of materials.
  • the target number of building blocks for forming at least a part of the historical indications by one or more combinations of the plurality of building blocks may be provided 506.
  • a determination score per building block of the set of building blocks may be determined 508.
  • the determination scores may be associated with generating the building blocks of the set of building blocks.
  • the determination scores may be determined by a unigram model.
  • the unigram model may assume that the determination score of the building block may be calculated based on a number of occurrences of the building block within the historical indications. Other models for determining said determination scores may be available.
  • the determination score may be indicative of a probability that a building block may occur.
  • a loss score per building block of the set of building blocks may be determined 510.
  • the loss scores may be associated with modifying the determination scores by removing the building blocks from the set of building blocks.
  • the loss scores may be indicative of a probability to modify the determination scores by removing the building blocks associated with the loss and/or determination scores from the set of building blocks.
  • the loss scores may be calculated according to the following equation:
  • X may be the input sequence of the two or more building blocks.
  • D may be a corpus of building blocks associated with the historical indications.
  • the set of building blocks may be filtered according to the loss scores associated with the set of building blocks 512. Hence, a part of the building blocks may be removed from the set of building blocks. The removed building blocks may be associated with loss scores within a predefined removal range. The building blocks remaining in the set after the removal may be associated with loss scores within a predefined target range. Hence, a predefined number of building blocks with the highest loss scores may be selected. It may be determined whether the number of building blocks within the filtered set is equal to or smaller than the target number of building blocks 514. Steps 508 to 514 may be repeated until the number of building blocks within the filtered set is equal to or smaller than the target number.
  • the so-obtained set of building blocks may be the vocabulary for identifying building blocks associated with the indication of the structure of the material, in particular the target material, i.e. for tokenizing the indication of the structure of the material, in particular the target material, preferably the structure of the material.
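  • The loop of steps 508 to 514 may be sketched as follows. This is a minimal sketch under stated assumptions: the initial set is built from all substrings up to a maximum length rather than randomly, the determination score is a relative occurrence count, and the loss score of a building block is taken to be the drop in the corpus log-likelihood (best segmentation per sequence) when that block is removed. The function names, the `keep_fraction` parameter and the rule of always keeping single symbols are hypothetical.

```python
import math
from collections import Counter


def unigram_log_probs(corpus, blocks, max_len):
    """Determination scores: log of the relative occurrence count per block."""
    counts = Counter(seq[i:i + n] for seq in corpus
                     for n in range(1, max_len + 1)
                     for i in range(len(seq) - n + 1))
    kept = {b: counts[b] for b in blocks if counts[b] > 0}
    total = sum(kept.values())
    return {b: math.log(c / total) for b, c in kept.items()}


def best_log_prob(seq, log_p, max_len):
    """Log-probability of the most likely segmentation of seq (Viterbi)."""
    best = [0.0] + [-math.inf] * len(seq)
    for end in range(1, len(seq) + 1):
        for start in range(max(0, end - max_len), end):
            piece = seq[start:end]
            if piece in log_p and best[start] + log_p[piece] > best[end]:
                best[end] = best[start] + log_p[piece]
    return best[-1]


def prune_vocabulary(corpus, target_size, max_len=3, keep_fraction=0.8):
    # Initial set: deliberately larger than the target number of blocks.
    blocks = {seq[i:i + n] for seq in corpus
              for n in range(1, max_len + 1)
              for i in range(len(seq) - n + 1)}
    while len(blocks) > target_size:                      # check (514)
        log_p = unigram_log_probs(corpus, blocks, max_len)  # scores (508)
        full = sum(best_log_prob(s, log_p, max_len) for s in corpus)
        # Assumption: single symbols are never removed, so that every
        # sequence can still be split into known building blocks.
        singles = {b for b in blocks if len(b) == 1}
        losses = {}
        for block in blocks - singles:                    # loss scores (510)
            reduced = {b: p for b, p in log_p.items() if b != block}
            losses[block] = full - sum(
                best_log_prob(s, reduced, max_len) for s in corpus)
        if not losses:
            break  # only single symbols are left
        # Filtering (512): keep the blocks with the highest loss scores.
        ranked = sorted(losses, key=lambda b: (-losses[b], b))
        n_keep = max(target_size - len(singles),
                     int(len(ranked) * keep_fraction))
        n_keep = max(0, min(n_keep, len(ranked) - 1))
        blocks = singles | set(ranked[:n_keep])
    return blocks


corpus = ["MKTAYMK", "MKMKTAY", "TAYTAY"]  # hypothetical historical indications
vocabulary = prune_vocabulary(corpus, target_size=8)
print(sorted(vocabulary))
```

  • Blocks whose removal barely changes the likelihood receive a low loss score and are removed first, so the vocabulary shrinks while the blocks that matter most for describing the corpus are retained.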
  • FIG. 6 illustrates an embodiment of an embedding layer.
  • An input embedding may be obtained by training for example an embedding model such as a continuous bag of words model (CBOW) or a skip-gram model.
  • the embedding layer may be configured to generate the numerical representation of the indication of the structure of the material, in particular the target material.
  • Generating the numerical representation of the indication of the structure of the material, in particular the target material may refer to embedding the indication of the structure of the material, in particular the target material.
  • Embedding the indication of the structure of the material, in particular the target material may result in a numerical representation associated with the indication of the structure of the material, in particular the target material.
  • the indication of the structure of the material, in particular the target material may comprise one or more building blocks.
  • the one or more building blocks may be represented by the input vector 606.
  • the numerical representation of the indication of the structure of the material, in particular the target material 614 and/or the input vector 606 may be machine-readable and/or processable by a processor.
  • the numerical representation of the indication of the structure of the material, in particular the target material 614 and/or the input vector 606 may be a tensor, in particular a first-rank tensor per building block.
  • the input vector 606 may be a one-hot vector per building block or a matrix comprising one one-hot vector per building block.
  • a one-hot vector may be a vector with one entry unequal to zero. Examples of one-hot vectors may be 606 and 618.
  • the entries unequal to zero in the one-hot vector may be indicative of the building block.
  • a lookup table may define the relation between the position of the entries unequal to zero and the building block indicated by the one-hot vector.
  • the lookup table may specify a plurality of different building blocks, preferably the building blocks comprised in the vocabulary.
  • the number of different building blocks may be equal to the number of entries in the one-hot vector.
  • the number of different building blocks may be referred to as vocabulary size.
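  • The relation between one-hot vectors, the lookup table and the vocabulary size may be illustrated by the following minimal sketch. The vocabulary entries and the function names `one_hot` and `decode` are hypothetical.

```python
# Hypothetical vocabulary; its length is the vocabulary size.
vocabulary = ["MK", "TAY", "M", "K"]

# Lookup table: building block -> position of the non-zero entry.
lookup = {block: index for index, block in enumerate(vocabulary)}


def one_hot(block):
    """One-hot vector with as many entries as there are building blocks."""
    vector = [0] * len(vocabulary)
    vector[lookup[block]] = 1
    return vector


def decode(vector):
    """Building block indicated by the position of the non-zero entry."""
    return vocabulary[vector.index(1)]


# A tokenized sequence as a matrix with one one-hot vector per building block.
matrix = [one_hot(block) for block in ["MK", "TAY", "MK"]]
print(matrix)
```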
  • the building blocks may be represented by and/or may be building blocks of the material.
  • a sequence associated with indication of the structure of the material, in particular the target material may be represented by a plurality of building blocks.
  • a building block may represent one or more amino acids, one or more nucleotides, one or more monomers, one or more functional groups, one or more nucleobases or the like.
  • this tokenization of building blocks reduces sequence lengths. For example, where a protein comprises 200 amino acids, using a suitable tokenization, and hence a suitable vocabulary, may allow the sequence associated with said protein to be reduced to a length of 20 building blocks. Consequently, it may be easier to determine a relation between the 20 building blocks and attend to the relevant building blocks as compared to a sequence of 200 building blocks.
  • the numerical representations of the indication of the structure of the material, in particular the target material 614 may represent one or more building blocks accurately and lead to accurate results based on processing the numerical representations of the indication of the structure of the material, in particular the target material 614.
  • the embedding layer may comprise a number of neurons equal to the number of entries in the numerical representation of the indication of the structure of the material, in particular the target material 614.
  • the output layer may generate the output vector 616.
  • the output vector may be a vector and/or may indicate the one or more building blocks of the indication of the structure of the material, in particular the target material.
  • the output vector 616 may indicate the one-hot vector associated with the input vector 606.
  • the output layer may comprise a number of neurons equal to the number of entries of the input vector 606 and/or the output vector 616.
  • the output layer may apply a softmax function to the numerical representations of the indication of the structure of the material, in particular the target material 614.
  • the output vector may comprise the probabilities, i.e. confidence scores, associated with the building blocks associated with the entries of the output vector 616 unequal to zero.
  • the building block associated with vector 618 may correspond to the input vector with a probability of 71 %.
  • Additional or alternative building blocks may correspond to the input vector as indicated by the output vector with lower probability.
  • the model of FIG. 6 may be a continuous bag of words (CBOW) model.
  • the CBOW model may be trained based on a training data set comprising a plurality of input vectors and corresponding output vectors. As the training data set may not be labeled, the training of the CBOW model may be referred to as self-supervised.
  • Before training of the CBOW model, the CBOW model may be initialized with random values assigned to the weights of the neurons.
  • the input vectors may be passed through the initialized embedding layer and the output layer and a loss may be determined by comparing the output vector obtained by passing the input vector 606 through the model to the output vector corresponding to the input vector 606 as specified by the training data set.
  • backpropagation may be applied to determine the gradients associated with the neurons of the embedding layer 602 and the decoding layer 604 to lower the loss.
  • the weights of the neurons may be updated by using a gradient descent algorithm.
  • the training may be terminated and a trained CBOW model may be obtained.
  • the embedding layer 602 may be suitable for embedding input data comprising one or more building blocks.
  • This embedding layer 602 may be used in other machine-learning architectures requiring an embedding layer 602 such as the architectures as described within the context of FIG. 8A - FIG. 8C.
  • a trained embedding layer 602 may be required.
  • a model such as a CBOW model may be trained prior to training the models of FIG. 8A - FIG. 8C.
  • the embedding layer 602 may be trained together with the representation data-driven model 8-148.
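  • The training of an embedding layer and an output layer described above (random initialization, forward pass, loss, backpropagation, gradient descent) may be sketched as follows. This is a minimal CBOW-style sketch in plain Python: the averaging of context embeddings, the cross-entropy loss, the hyperparameters and the training pairs are assumptions chosen for illustration, and the function name `train_cbow` is hypothetical.

```python
import math
import random


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_cbow(pairs, vocab_size, dim=4, lr=0.1, epochs=200, seed=0):
    rng = random.Random(seed)
    # Random initialization of the embedding layer and the output layer.
    emb = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
           for _ in range(vocab_size)]
    out = [[rng.uniform(-0.5, 0.5) for _ in range(vocab_size)]
           for _ in range(dim)]
    history = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for context, target in pairs:
            # Embedding layer: average of the context block embeddings.
            hidden = [sum(emb[c][j] for c in context) / len(context)
                      for j in range(dim)]
            # Output layer with softmax over all building blocks.
            probs = softmax([sum(hidden[j] * out[j][k] for j in range(dim))
                             for k in range(vocab_size)])
            epoch_loss -= math.log(probs[target])
            # Backpropagation of the cross-entropy loss.
            d_logits = [p - (1.0 if k == target else 0.0)
                        for k, p in enumerate(probs)]
            d_hidden = [sum(out[j][k] * d_logits[k]
                            for k in range(vocab_size)) for j in range(dim)]
            # Gradient descent update of both layers.
            for j in range(dim):
                for k in range(vocab_size):
                    out[j][k] -= lr * hidden[j] * d_logits[k]
            for c in context:
                for j in range(dim):
                    emb[c][j] -= lr * d_hidden[j] / len(context)
        history.append(epoch_loss / len(pairs))
    return emb, out, history


# Hypothetical training pairs: (indices of context blocks, index of target).
pairs = [([0, 1], 2), ([1, 2], 3), ([2, 3], 0), ([3, 0], 1)]
emb, out, history = train_cbow(pairs, vocab_size=4)
print(history[0], history[-1])
```

  • After training, the rows of `emb` play the role of the trained embedding layer that may be reused in other architectures.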
  • FIG. 7A illustrates an embodiment of a numerical representation of the indication of the structure of the material, in particular the target material.
  • An indication of a structure of a material 704 may be provided and/or received.
  • the indication of a structure of a material 704 may comprise a sequence of building blocks of the material.
  • the indication of a structure of a material 704 may be split into two or more building blocks, i.e. the indication of a structure of a material 704 may be tokenized.
  • a tokenized indication of the structure of the material, in particular the target material 702 may be obtained from the indication of a structure of a material 704.
  • the tokenized indication of the structure of the material, in particular the target material 702 may comprise the two or more building blocks.
  • the tokenized indication of the structure of the material, in particular the target material 702 may be mapped to a numerical representation of the indication of the structure of the material, in particular the target material 706.
  • the numerical representation of the indication of the structure of the material, in particular the target material 706 may comprise a plurality of vectors or a matrix. The number of vectors and/or the number of rows and/or columns may be equal to the number of building blocks.
  • the numerical representation of the indication of the structure of the material, in particular the target material 706 may be reduced by providing the numerical representation of the indication of the structure of the material, in particular the target material 706 to one or more embedding layer(s) e.g. as described in the context of FIG. 6.
  • the reduced numerical representation of the indication of the structure of the material, in particular the target material 708 may be associated with a different size than the numerical representation of the indication of the structure of the material, in particular the target material 706.
  • the numerical representation of the indication of the structure of the material, in particular the target material 706 may be associated with at least one of a higher number of rows, columns or a combination thereof than the reduced numerical representation of the indication of the structure of the material, in particular the target material 708.
  • the reduced numerical representation of the indication of the structure of the material, in particular the target material 708 may be an example of a numerical representation of the indication of the structure of the material, in particular the target material, in particular of the structure of the material.
  • the context embedding 814 may be trained together with the representation data-driven model 8-148 and/or before training the representation data-driven model 8-148. Hence, the parameters associated with the context embedding 814 may be updated together with updating the parameters associated with the representation data-driven model 8-148.
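  • Reducing the numerical representation 706 to the reduced numerical representation 708 may be pictured as multiplying the matrix of one-hot vectors with the weights of an embedding layer. The following minimal sketch assumes a vocabulary size of four and an embedding dimension of two; the weight values and the helper `matmul` are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# Three building blocks, each a one-hot vector over a vocabulary of size 4.
one_hot_rows = [[1, 0, 0, 0],
                [0, 1, 0, 0],
                [1, 0, 0, 0]]

# Hypothetical embedding weights: vocabulary size 4 -> 2 entries per block.
embedding = [[0.1, 0.2],
             [0.3, 0.4],
             [0.5, 0.6],
             [0.7, 0.8]]

reduced = matmul(one_hot_rows, embedding)
print(reduced)
```

  • Each one-hot row selects one row of the embedding weights, so the reduced representation has fewer columns than the one-hot representation while keeping one row per building block.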
  • FIG. 7B illustrates an embodiment of a numerical representation of the indication of the structure of the material, in particular the target material.
  • numerical representations of a position of and/or a relation between the two or more building blocks are shown.
  • the numerical representation of the position of the two or more building blocks may be a numerical representation of the absolute position of the two or more building blocks.
  • the numerical representation of the relation between the two or more building blocks may be a numerical representation of a relative position of the two or more building blocks in relation to at least one of the two or more building blocks.
  • the numerical representation of the relation between the two or more building blocks may be a numerical representation of a relative position of the two or more building blocks in relation to a masked token, i.e. a building block to be determined by the representation data-driven model.
  • An indication of a structure of a material 704 may be provided and/or received.
  • the indication of a structure of a material 704 may comprise a sequence of building blocks of the material.
  • the indication of a structure of a material 704 may be split into two or more building blocks, i.e. the indication of a structure of a material 704 may be tokenized.
  • a tokenized indication of the structure of the material, in particular the target material 702 may be obtained from the indication of a structure of a material 704.
  • the tokenized indication of the structure of the material, in particular the target material 702 may comprise the two or more building blocks.
  • the tokenized indication of the structure of the material, in particular the target material 702 may be mapped to a numerical representation of a position of and/or a relation between the building blocks associated with the indication of the structure of the material, in particular the target material 738.
  • the numerical representation of a position of and/or a relation between the building blocks associated with the indication of the structure of the material, in particular the target material 738 may be reduced by providing the numerical representation of a position of and/or a relation between the building blocks associated with the indication of the structure of the material, in particular the target material 738 to one or more embedding layer(s) e.g. as described in the context of FIG. 6.
  • the reduced numerical representation of the position of and/or the relation between the building blocks 746 may be associated with a different size than the numerical representation of a position of and/or a relation between the building blocks associated with the indication of the structure of the material, in particular the target material 738.
  • the numerical representation of a position of and/or a relation between the building blocks associated with the indication of the structure of the material, in particular the target material 738 may be associated with at least one of a higher number of rows, columns or a combination thereof than the reduced numerical representation of the position of and/or the relation between the building blocks 746.
  • the reduced numerical representation of the position of and/or the relation between the building blocks 746 may be an example of a numerical representation of the indication of the structure of the material, in particular the target material, in particular of the structure of the material. Reducing the representation may allow for a more efficient processing of the indication of the structure of the material, in particular the target material.
  • properties of materials can be determined by deploying a decreased amount of resources such as time and/or computational resources.
  • position embedding 806 may be trained together with the representation data-driven model 8-148 and/or before training the representation data-driven model 8-148. Hence, the parameters associated with the position embedding 806 may be updated together with updating the parameters associated with the representation data-driven model 8-148.
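  • Absolute positions, relative positions between building blocks and positions relative to a masked token may be computed as in the following minimal sketch. Clipping the relative offsets to a maximum distance, the placeholder "[MASK]" and all function names are assumptions made for this example.

```python
def absolute_positions(blocks):
    """Index of each building block within the sequence."""
    return list(range(len(blocks)))


def relative_positions(blocks, max_distance):
    """Matrix of offsets j - i between block i and block j, clipped."""
    n = len(blocks)
    return [[max(-max_distance, min(max_distance, j - i)) for j in range(n)]
            for i in range(n)]


def positions_relative_to(blocks, index):
    """Offsets of all blocks relative to one block, e.g. a masked token."""
    return [j - index for j in range(len(blocks))]


blocks = ["MK", "[MASK]", "TAY", "MK"]  # hypothetical tokenized sequence
absolute = absolute_positions(blocks)
relative = relative_positions(blocks, max_distance=2)
to_mask = positions_relative_to(blocks, blocks.index("[MASK]"))
print(absolute)
print(relative)
print(to_mask)
```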
  • FIG. 8A illustrates an embodiment of a representation data-driven model 8-148.
  • the building blocks may be identified by a tokenizer.
  • the tokenizer may be configured to tokenize the indication of the structure of the material, in particular the target material, i.e. split the indication of the structure of the material, in particular the target material into two or more building blocks.
  • the two or more building blocks may be embedded via a context embedding 814, i.e. the two or more building blocks may be mapped to a numerical representation of the two or more building blocks.
  • the context or the meaning of the building blocks may be represented via the numerical representation.
  • one or more embedding layers as described in the context of FIG. 6 may be deployed.
  • the two or more building blocks may be embedded via a position embedding 806, in particular a relative position embedding 806 as described in the context of FIG. 7B.
  • the two or more building blocks may be mapped to a numerical representation of a relation between the two or more building blocks, in particular a numerical representation of a position of the two or more building blocks within the sequence of building blocks associated with the indication of the structure of the material, in particular the target material.
  • For the context embedding 814 and/or the position embedding 806, one or more embedding layers, e.g. as described within the context of FIG. 6, may be deployed per embedding operation.
  • the numerical representations of the two or more building blocks and/or the relation between the two or more building blocks may be mapped to a numerical representation of a predefined size related to the numerical representation of the two or more building blocks. This may be referred to as padding.
  • Data-driven model(s) may require data input of a predefined size.
  • padding may allow for processing of input data of irregular size by the data-driven model.
  • Padding may include concatenating a numerical representation independent of the input data with the numerical representation of the two or more building blocks to generate the numerical representation of predefined size related to the numerical representation of the two or more building blocks.
  • the numerical representation independent of the input data may be indicative of a zero.
  • mapping the numerical representations of the two or more building blocks and/or the relation between the two or more building blocks to a numerical representation of a predefined size related to the numerical representation of the two or more building blocks may comprise eliminating at least a part of the numerical representations of the two or more building blocks and/or the relation between the two or more building blocks. This may be referred to as truncating.
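  • Padding and truncating to a predefined size may be sketched as follows. The function name `fit_to_size` and the choice to pad at the end with rows of zeros are assumptions for illustration.

```python
def fit_to_size(rows, size, width):
    """Pad with zero rows or truncate so that exactly `size` rows remain."""
    if len(rows) >= size:
        return rows[:size]  # truncating: drop the surplus rows
    padding = [[0.0] * width for _ in range(size - len(rows))]
    return rows + padding   # padding: append rows independent of the input


rows = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical representation of two blocks
padded = fit_to_size(rows, size=4, width=2)
truncated = fit_to_size(rows, size=1, width=2)
print(padded)
print(truncated)
```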
  • the at least two numerical representations of the two or more building blocks and the relation between the two or more building blocks may be provided to the representation data-driven model 8-148.
  • the representation data-driven model 8-148 may process the at least two numerical representations by multi-head self-attention 808, at least one layer normalization 810 and 816, at least one feed-forward layer 812 and/or at least one softmax layer 832.
  • the embedded input data, i.e. the numerical representations of the two or more building blocks and the relation between the two or more building blocks, may be processed by the representation data-driven model 8-148.
  • the embedded input data may be provided to the layer normalization 810 by a residual connection.
  • the multi-head self-attention 808 may apply a filter obtained from the at least two numerical representations to the at least two numerical representations.
  • Multi-head self-attention 858 may be applied to the numerical representations separately.
  • Multi-head self-attention 858 may comprise the two components multi-head and self-attention.
  • Self-attention may be understood as being a filter applied to the embedded input data.
  • By applying the filter to the embedded input data, the building blocks associated with the embedded input data that contribute to the output data to be generated may be identified for generating the output data.
  • the filter may represent the degree to which the building blocks associated with the embedded input data contribute to the output data to be generated. Applying the filter may be referred to as weighting the building blocks associated with the embedded input data. This is advantageous specifically regarding long sequences of building blocks.
  • the filter may be learned and improved during the training by learning to identify the contribution of building blocks associated with the embedded input data.
  • the self-attention may focus the representation data-driven model 8-148 to attend to specific parts of the input data.
  • Self-attention may refer to attention generated based on the input data.
  • the filter may be determined based on the input data, preferably the embedded input data.
  • the embedded input data may serve as query Q, key K and value V with respect to the self-attention operation.
  • the self-attention may refer to attention based on the received input data.
  • the filter associated with the multi-head self-attention may be calculated according to the following equation: where d_k may correspond to the dimension of the key and A may be a numerical representation obtained based on a cross product of the numerical representation of the two or more building blocks and the numerical representation of the relation between the two or more building blocks.
  • A may be obtained by multiplying the numerical representation of the two or more building blocks with the numerical representation of the relation between the two or more building blocks and/or multiplying the numerical representation of the two or more building blocks with the numerical representation of the two or more building blocks.
  • A may be obtained according to the following formula: where Q constitutes the query, K constitutes the key, d may be the dimension of the numerical representation provided by the one or more embedding layer(s) and/or the dimension of the numerical representation provided to the data-driven model(s).
  • Multi-head self-attention 858 may comprise applying the filter to two or more building blocks of the embedded input data.
  • the results of the two or more heads may be concatenated according to the following equation: where h may refer to the number of heads.
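  • As a sketch only, the equations referred to for the filter, for A and for the concatenation of the heads could, assuming standard scaled dot-product attention with separate terms for the building blocks and for their relation, be written as follows. The superscripts c (building blocks) and r (relation), the index function \delta(i, j) for the relative position and the omission of any output projection are assumptions that are not taken from this description.

```latex
% Filter applied to the value V, scaled by the dimension of the key d_k
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{A}{\sqrt{d_k}}\right) V

% A built from products of building-block (c) and relation (r) representations
A_{ij} = Q^{c}_{i} \left(K^{c}_{j}\right)^{\top} + Q^{c}_{i} \left(K^{r}_{\delta(i,j)}\right)^{\top}

% Concatenation of the results of the h heads
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right)
```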
  • the embedded input data may be transformed via the multi-head self-attention 858 into a context tensor.
  • the context tensor may represent the sequence of building blocks and the relation between two or more building blocks of the input data.
  • the context tensor may be a numerical representation of the two or more building blocks and the relation between the two or more building blocks.
  • the context tensor may be a second rank tensor.
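  • How multi-head self-attention may turn embedded input data into a context tensor is sketched below in plain Python. The sketch assumes that the embedded input data itself serves as query, key and value without learned projections, that each head receives an equal slice of the embedding dimension, and that only the building-block term of the score is used; the relation term is omitted for brevity. All function names are hypothetical.

```python
import math


def softmax(row):
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x."""
    d_k = len(x[0])
    # Filter: one row of weights per building block, each row sums to one.
    weights = [softmax([sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d_k)
                        for kj in x]) for qi in x]
    # Weighted sum of the values gives one context row per building block.
    context = [[sum(w * v[c] for w, v in zip(row, x)) for c in range(d_k)]
               for row in weights]
    return weights, context


def multi_head_self_attention(x, heads):
    """Apply self-attention per slice and concatenate the head outputs."""
    width = len(x[0]) // heads
    outputs = []
    for h in range(heads):
        part = [row[h * width:(h + 1) * width] for row in x]
        outputs.append(self_attention(part)[1])
    return [sum((head[i] for head in outputs), []) for i in range(len(x))]


# Hypothetical embedded input data: three building blocks, four entries each.
embedded = [[1.0, 0.0, 0.5, 0.5],
            [0.0, 1.0, 0.5, -0.5],
            [1.0, 1.0, 0.0, 0.0]]
weights, _ = self_attention(embedded)
context_tensor = multi_head_self_attention(embedded, heads=2)
print(context_tensor)
```

  • The result is a second-rank tensor with one row per building block, in line with the description of the context tensor above.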
  • layer normalization 810 may be applied based on the context tensor and/or the embedded input data from the residual connection. Applying layer normalization 810 may refer to normalizing the context tensor. Normalizing the context tensor may lower the values of the entries of the context tensor. This reduces the computational cost associated with processing the context tensor. Further, it improves the training by helping the loss to converge and by preventing instabilities.
  • Layer normalization 810 may be followed by passing the context tensor to a feed-forward layer 812 again followed by layer normalization 816 based on the residual connection to the context tensor and/or the output of the feed-forward layer 812.
  • the feed-forward layer 812 may be a feed-forward neural network.
  • the feed-forward neural network may comprise a plurality of fully connected neurons. Passing the context tensor through the feed-forward neural network may result in transforming the context tensor linearly. Additionally or alternatively, the neural network may comprise one or more activation functions such as a rectified linear unit (ReLU).
  • the neural network may be configured for performing one or more non-linear operations to the context tensor and/or transforming the context tensor non-linearly.
  • the context tensor may be provided to one or more further layers configured to apply multi-head self-attention 808, layer normalization 810 and 816 or one or more further feed-forward layers 812.
  • Passing the context tensor through the feed-forward layer 812 may adapt the context tensor for the processing by a further attention layer of the one or more further blocks 8-114 for applying a self-attention filter, preferably multi-head self-attention 858.
  • the context vector after being transformed by the layer normalization 816 and the feed-forward layer 812 may be referred to as a hidden state.
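  • The sequence of residual connection, layer normalization 810, feed-forward layer 812 and layer normalization 816 may be sketched for a single row of the context tensor as follows. The sketch omits the learnable scale and shift that layer normalization usually has, uses a two-layer feed-forward network with ReLU, and relies on hypothetical weights and function names.

```python
import math


def layer_norm(vector, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(vector) / len(vector)
    var = sum((v - mean) ** 2 for v in vector) / len(vector)
    return [(v - mean) / math.sqrt(var + eps) for v in vector]


def feed_forward(vector, w1, w2):
    """Two fully connected layers with a ReLU activation in between."""
    hidden = [max(0.0, sum(v * w for v, w in zip(vector, col)))
              for col in zip(*w1)]
    return [sum(h * w for h, w in zip(hidden, col)) for col in zip(*w2)]


def encoder_tail(context_row, embedded_row, w1, w2):
    # Residual connection from the embedded input, then normalization.
    normed = layer_norm([c + e for c, e in zip(context_row, embedded_row)])
    # Feed-forward layer, second residual connection, second normalization.
    out = feed_forward(normed, w1, w2)
    return layer_norm([n + o for n, o in zip(normed, out)])


# Hypothetical weights: 2 inputs -> 3 hidden neurons -> 2 outputs.
w1 = [[0.5, -0.5, 0.1],
      [0.2, 0.3, -0.4]]
w2 = [[0.1, 0.2],
      [0.3, -0.1],
      [-0.2, 0.4]]

hidden_state = encoder_tail([0.2, 0.8], [1.0, 0.0], w1, w2)
print(hidden_state)
```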
  • the obtained numerical representation may be the modified representation referenced in the context of FIG. 2.
  • absolute position embedding 8-140 may be added. This may comprise adding a positional factor indicative of a position of at least one building block within the sequence associated with the indication of the structure of the material, in particular the target material.
  • the positional factor may in particular be indicative of an absolute position of at least one building block within the sequence associated with the indication of the structure of the material, in particular the target material.
  • An example of the absolute position may be seen in FIG. 7B.
  • a numerical representation of the two or more building blocks, a relation between the two or more building blocks and an indication of the absolute position of the two or more building blocks within the sequence associated with the indication of the structure of the material, in particular the target material may be obtained.
  • This numerical representation may be provided to a softmax layer 832.
  • the softmax layer 832 may be configured to apply the softmax function to one or more entries of the provided numerical representation.
  • the so-obtained numerical representation of the two or more building blocks, a relation between the two or more building blocks and an indication of the absolute position of the two or more building blocks within the sequence associated with the indication of the structure of the material, in particular the target material may be provided to determine a property associated with the material, e.g. to a property data-driven model.
  • the property data-driven model may be trained based on historical numerical representations of the structure of materials and corresponding properties.
  • the property data-driven model may be configured to provide properties of materials in response to receiving numerical representations of the structure of materials.
  • the property data-driven model may be a classification model.
  • the property data-driven model may comprise one or more layer(s).
  • the one or more layer(s) may be configured to map numerical representations of structures of materials to properties of said materials.
  • the multi-head self-attention 808 may be masked multi-head self-attention.
  • Masked multi-head self-attention 864 corresponds to the multi-head self-attention 858 as described above, with additional masking of a part of the embedded input data associated with building blocks later in the sequence than the building block to be generated. Additionally or alternatively, the part of the input data associated with building blocks later in the sequence than the building block to be generated may not be received and/or transformed into the embedded input data.
  • the representation data-driven model 8-148 may be configured to generate a subsequent building block to a sequence.
  • FIG. 8B illustrates an embodiment of a representation data-driven model 8-148.
  • the representation data-driven model 8-148 may comprise a recurrent neural network. Tokenization 8-152, context embedding 814, position embedding 806 and the property determination 804 may be analogous to the description of FIG. 8A.
  • the at least two numerical representations of the two or more building blocks and the relation between the two or more building blocks may be provided to the representation data-driven model 8-148.
  • the representation data-driven model 8-148 may process the at least two numerical representations by one or more hidden layer(s) 818.
  • the at least two numerical representations may be provided in parts at different points in time, i.e. time steps.
  • a first part of the at least two numerical representations, in particular a first element per numerical representation may be provided to the representation data-driven model 8-148 at a first time step.
  • a second part of the at least two numerical representations, in particular a second element per numerical representation may be provided to the representation data-driven model 8-148 at a second time step.
  • The one or more hidden layer(s) 818 may be provided with output of the one or more hidden layer(s) from a previous mapping 8-258, in particular with output from the first time step at the second time step.
  • a third part may be provided together with output from the one or more hidden layer(s) 818 obtained by providing the second part to the one or more hidden layer(s) 818.
  • the one or more hidden layer(s) may be configured to map the at least two numerical representations to at least one numerical representation of the two or more building blocks and the relation between the two or more building blocks.
  • the numerical representation of the two or more building blocks and the relation between the two or more building blocks may be provided to the property determination 804, in particular the property data-driven model.
  • FIG. 8C illustrates an embodiment of a representation data-driven model 8-148.
  • the representation data-driven model 8-148 may comprise a convolutional neural network. Tokenization 8-152, context embedding 814, position embedding 806 and the property determination 804 may be analogous to the description of FIG. 8A.
  • the at least two numerical representations of the two or more building blocks and the relation between the two or more building blocks may be provided to the representation data-driven model 8-148.
  • the representation data-driven model 8-148 may process the at least two numerical representations by one or more convolutional layer(s) 822, one or more pooling layer(s) 824, one or more fully-connected layer(s) 826 to a numerical representation of the two or more building blocks and the relation between the two or more building blocks.
  • The one or more convolutional layer(s) 822 may be configured to change a format associated with the at least two numerical representations and may be configured to combine the at least two numerical representations into one numerical representation, e.g. by concatenating the at least two numerical representations.
  • The one or more pooling layer(s) 824 may be configured to change a dimensionality associated with the at least two numerical representations.
  • the one or more fully-connected layer(s) 826 may be configured to modify the numerical values associated with the output from the one or more pooling layer(s) 824 and/or one or more convolutional layer(s) 822.
  • Processing the at least two numerical representations by the one or more convolutional layer(s) 822, the one or more pooling layer(s) 824 and/or the one or more fully-connected layer(s) 826 may comprise mapping the at least two numerical representations to one numerical representation of the two or more building blocks and the relation between the two or more building blocks and modifying the numerical representation of the two or more building blocks and the relation between the two or more building blocks.
  • the numerical representation of the two or more building blocks and the relation between the two or more building blocks may be provided to the property determination 804, in particular the property data-driven model.
  • FIG. 9A illustrates an embodiment of training and/or deploying the representation data-driven model 902.
  • the representation data-driven model 902 may be associated with an architecture as described within the context of FIG. 8A- FIG. 8C.
  • The output data generated by the representation data-driven model 902 may comprise a numerical representation of one or more building blocks, in particular corresponding to the building blocks of the sequence provided.
  • the representation data-driven model 902 may be provided with a numerical representation of the indication of the structure of the material, in particular the target material.
  • the numerical representation of the indication of the structure of the material, in particular the target material may comprise N token(s), i.e. building blocks.
  • the representation data-driven model 902 may be trained to generate building blocks corresponding to the numerical representation of sequence of building blocks associated with the indication of the structure of the material, in particular the target material 924, i.e. generate one of the building blocks 1-N.
  • the representation data-driven model 8-148 may combine the numerical representations of the two or more building blocks and the relation between the two or more building blocks to at least one numerical representation of the two or more building blocks and the relation between the two or more building blocks as described above.
  • the representation data-driven model 8-148 may be trained to generate and/or provide the modified numerical representation of the two or more building blocks and the relation between the two or more building blocks.
  • the numerical representation of the two or more building blocks and the relation between the two or more building blocks may be combined with a numerical representation of the position of the two or more building blocks, e.g. as provided by an absolute position embedding 934.
  • the two or more building blocks may be part of one or more sequence(s), in particular associated with the indication of the structure of the material. Hence, the two or more building blocks may be associated with a position within the one or more sequences.
  • the numerical representation of the position may be indicative of the position of the two or more building blocks within the indication of the structure of the material, in particular the target material, in particular within the structure of the material.
  • the numerical representation of the position of the two or more building blocks may be a representation of an absolute position of the two or more building blocks, i.e.
  • the numerical representation of the position of the two or more building blocks and the modified numerical representation may be combined.
  • the numerical values of the numerical representation of the position of the two or more building blocks may be added to the numerical values of the modified numerical representation.
  • one numerical representation of the two or more building blocks, the relation between the two or more building blocks and the position of the two or more building blocks may be obtained.
  • The building block to be generated may be a masked building block.
  • the masked building block may indicate that the building block at the position of the masked building block may be generated corresponding to the sequence of 1-N building blocks.
  • the masked building block may be different from the building blocks used for representing the structure of materials.
  • the masked building block may be independent of a building block of a structure of a material.
  • Training the representation data-driven model 902 may comprise providing, in particular historical, numerical representations of sequences of building blocks to the representation data-driven model 916.
  • The, in particular historical, numerical representations of sequences of building blocks may comprise one or more masked token(s) per sequence.
  • the representation data-driven model 916 may generate a building block corresponding to the sequences, i.e. propose a building block K, wherein K may be within a range between 1 and N.
  • the proposed building blocks associated with the one or more masked building block(s) may be compared to target building blocks.
  • the target building blocks may be specified in a training data set.
  • the training data set may comprise the in particular historical, numerical representations of sequences of building blocks.
  • For training the representation data-driven model 902, at least a part of the building blocks associated with the, in particular historical, numerical representations of sequences of building blocks may be exchanged for masked building blocks.
  • the building blocks specified by the numerical representation of the indication of the structure of the material, in particular the target material before the exchange at the position of the masked building blocks may be target building blocks.
  • parameters associated with the representation data-driven model 902 may be updated to decrease a deviation of the proposed building blocks from the target building blocks.
  • the representation data-driven model 902 may be trained to generate building blocks at different positions of a sequence.
  • the representation data-driven model 902 may be trained to determine the best suited building blocks for masked building blocks based on received sequences comprising the masked building blocks.
  • the context embedding 814 and/or the position embedding 806 may be trained together with the representation data-driven model 8-148 and/or before training the representation data-driven model 8-148.
  • The parameters associated with the context embedding 814 and/or position embedding 806 may be updated together with updating the parameters associated with the representation data-driven model 8-148.
  • FIG. 9B illustrates an embodiment of training and/or deploying the representation data-driven model 916.
  • the representation data-driven model 916 may be associated with an architecture as described within the context of FIG. 8A.
  • The output data generated by the representation data-driven model 902 may comprise a numerical representation of one or more building blocks, in particular corresponding to the building blocks of the sequence provided.
  • The representation data-driven model 916 may be provided with a numerical representation of the indication of the structure of the material, in particular the target material.
  • The numerical representation of the indication of the structure of the material, in particular the target material, may comprise N building block(s).
  • the representation data-driven model 916 may be configured to generate building blocks corresponding to the numerical representation of sequence of building blocks associated with the indication of the structure of the material, in particular the target material 924, i.e. generate a further building block N+1 to the sequence of building blocks 1-N.
  • the representation data-driven model 916 may be trained to generate a following building block N+1 to the building block N.
  • Said representation data-driven model 916 may be associated with an encoder-decoder or decoder architecture. Training the representation data-driven model 916 may comprise providing, in particular historical, numerical representations of sequences of building blocks to the representation data-driven model 916.
  • the representation data-driven model 916 may generate further building blocks to the sequences, i.e. propose further building blocks.
  • the proposed and/or further building blocks generated by the representation data-driven model 916 may be compared to target building blocks.
  • the target building blocks may be specified in a training data set.
  • the training data set may comprise the in particular historical, numerical representations of sequences of building blocks and corresponding target building blocks.
  • the in particular historical, numerical representations may represent fractions of sequences of building blocks and the target building blocks may be the building blocks following the provided fractions of sequences.
  • A further building block N+2 may be generated by the representation data-driven model 916 upon receiving a sequence of N+1 building blocks, wherein the (N+1)-th building block may be generated by the representation data-driven model 916.
  • parameters associated with the representation data-driven model 916 may be updated to decrease a deviation of the proposed building blocks from the target building blocks. By doing so, the representation data-driven model 916 may be trained to generate following building blocks of a sequence.
  • FIG. 10 illustrates an embodiment of a vocabulary length and an accuracy of determining properties of materials according to the methods as described herein.
  • FIG. 10 may show the number of components associated with one building block. It can be seen that the majority of building blocks may be associated with several components.
  • Determining the properties of materials, in particular biological materials characterized by a sequence of amino acids, as described herein provides an improved accuracy compared with other models available in the state of the art. Said other models are associated with a higher number of parameters. Followingly, the model as presented herein may be at least as accurate as models with the higher number of parameters while using fewer resources during inference.
  • “Determining” also includes “initiating or causing to determine”.
  • “Generating” also includes “initiating and/or causing to generate”.
  • “Provisioning” also includes “initiating or causing to determine, generate, select, send and/or receive”.
  • “Initiating or causing to perform an action” includes any processing signal that triggers a computing node or device to perform the respective action.
  • The indefinite article “a” or “an” and the definite article “the” do not exclude a plurality.
  • The indefinite article “a” or “an” may be replaced with “one or more” and the definite article “the” may be replaced with “the one or more”.
  • A single building block or other unit may fulfill the functions of several entities or items recited in the claims.
  • The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.
  • Providing in the scope of this disclosure may include any interface configured to provide data. This may include an application programming interface, a human-machine interface such as a display and/or a software module interface. Providing may include communication of data or submission of data to the interface, in particular display to a user or use of the data by the receiving entity.
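
The masked-building-block training described above in the context of FIG. 9A may be illustrated by the following minimal Python sketch. It is not the implementation of the disclosure: the `[MASK]` placeholder, the toy vocabulary, the raw scores standing in for model output and the use of cross-entropy as the deviation measure are all assumptions made for illustration.

```python
import math

MASK = "[MASK]"  # hypothetical placeholder, distinct from real building blocks


def mask_sequence(blocks, positions):
    """Exchange the building blocks at `positions` for MASK.

    Returns the masked sequence and the original blocks at the masked
    positions, which serve as target building blocks.
    """
    masked = list(blocks)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def deviation(scores, vocabulary, target):
    """Cross-entropy (assumed deviation measure) of the proposal vs. the target."""
    probs = softmax(scores)
    return -math.log(probs[vocabulary.index(target)])


vocabulary = ["A", "G", "L", "K"]      # toy vocabulary of building blocks
sequence = ["A", "G", "L", "K", "A"]   # toy sequence of N = 5 building blocks
masked, targets = mask_sequence(sequence, positions=[2])

# Hypothetical raw scores a model might emit for the masked position,
# one score per vocabulary entry.
scores = [0.1, 0.2, 2.0, 0.3]
loss = deviation(scores, vocabulary, targets[2])
```

In a training loop, the parameters of the model producing `scores` would be updated so that `loss` decreases, i.e. so that the proposed building block moves towards the target building block.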

Abstract

The invention relates to a resource- and time-efficient development of new materials by determining target properties of materials reliably and scalably, and provides methods for determining target properties of materials, apparatuses for determining target properties of materials, use of an indication of a target property of a target material, use of a numerical representation of a structure of the target material and the position of two or more building blocks associated with the structure of the target material, and a numerical representation of two or more building blocks associated with the target material and the position of two or more building blocks associated with the structure of the target material.

Description

240437W001
DETERMINING PROPERTIES OF MATERIALS
TECHNICAL FIELD
The disclosure relates to a resource- and time-efficient development of new materials by determining target properties of materials reliably and scalably, and provides methods for determining target properties of materials, apparatuses for determining target properties of materials, use of an indication of a target property of a target material, use of a numerical representation of a structure of the target material and the position of two or more building blocks associated with the structure of the target material, and a numerical representation of two or more building blocks associated with the target material and the position of two or more building blocks associated with the structure of the target material.
TECHNICAL BACKGROUND
Properties of materials need to be tailored well to the application areas of the materials. The properties of materials correlate strongly with the 3-dimensional structure of the materials. Nevertheless, the relation between properties of materials and the corresponding structure may be unforeseeable in some regards. Hence, experiments need to be conducted to determine properties of materials reliably. Conducting said experiments requires a lot of resources. Followingly, it is desired to reduce the resources for determining properties of materials.
SUMMARY
Any disclosure, embodiments and examples described herein relate to the methods, the systems, apparatuses, chemical products and computer elements outlined above and below. Advantageously, the benefits provided by any of the embodiments and examples equally apply to all other embodiments and examples.
In an aspect, it relates to a, in particular computer-implemented, method for determining a target property of a target material, the method comprising:
Providing and/or obtaining, in particular receiving, a numerical representation of two or more building blocks associated with the target material,
Providing and/or obtaining, in particular receiving, a numerical representation of a relation between the two or more building blocks,
Generating a numerical representation associated with the two or more building blocks and the relation between the two or more building blocks from the at least two numerical representations by one or more data-driven models, wherein the one or more data-driven model(s) are trained based on historical indications of one or more structure(s) of one or more material(s) and corresponding indications of properties of the materials,
Mapping the numerical representation associated with the two or more building blocks and the relation between the two or more building blocks to an indication of the target property of the target material by one or more data-driven models,
Providing the indication of the target property of the target material.
In another aspect, it relates to a, in particular computer-implemented, method for obtaining one or more data-driven model(s) for determining a target property of a target material, the method comprising: providing and/or obtaining, in particular receiving, a plurality of historical numerical representations of a structure of one or more material(s), wherein at least one of the historical numerical representations is associated with two or more building blocks, providing and/or obtaining, in particular receiving, a plurality of indications of the structure of the material(s), in particular digital representations of the structure of the material(s), providing and/or obtaining, in particular receiving, at least one numerical representation of a position of the two or more building blocks within a structure of the materials, providing the indications of the structure of the material(s), the historical numerical representations of the structure of the material and the at least one historical numerical representation of the position to the one or more data-driven model(s) for training the one or more data-driven model(s) to relate the indications of the structure of the material(s) to the numerical representations, optionally providing the one or more data-driven model(s) for providing numerical representations to determine the target property from the numerical representations.
In another aspect, it relates to a method for monitoring and/or controlling producing and/or processing of a target material, the method comprising:
Providing and/or obtaining, in particular receiving, a numerical representation of a structure of the target material associated with two or more building blocks, preferably via an interface,
Processing, preferably via a processor, the numerical representation by one or more data-driven model(s), wherein the one or more data-driven model(s) are obtained as described herein,
Mapping, preferably via the processor, the numerical representation associated with the two or more building blocks to an indication of the target property of the target material at least partially by the one or more data-driven models,
Providing, preferably via the interface, the indication of the target property of the target material for monitoring and/or controlling production and/or processing of the target material.
In another aspect, it relates to a method for obtaining one or more data-driven model(s) for monitoring and/or controlling producing and/or processing of a target material, the method comprising: providing and/or obtaining, in particular receiving, a plurality of historical numerical representations of a structure of one or more material(s), wherein at least one of the historical numerical representations is associated with two or more building blocks, preferably via an interface, providing and/or obtaining, in particular receiving, a plurality of indications of the structure of the material(s), in particular digital representations of the structure of the material(s), preferably via the interface, providing and/or obtaining, in particular receiving, at least one numerical representation of a position of the two or more building blocks within a structure of the materials, preferably via the interface, providing the indications of the structure of the material(s), the historical numerical representations of the structure of the material and the at least one historical numerical representation of the position to the one or more data-driven model(s) for training the one or more data-driven model(s) to relate the indications of the structure of the material(s) to the numerical representations, preferably via a processor, optionally providing the one or more data-driven model(s) for providing numerical representations to determine the target property from the numerical representations, preferably via the interface.
In another aspect, it relates to a, in particular computer-implemented, method for determining a target property of a target chemical compound, the method comprising: providing and/or obtaining, in particular receiving, an indication of a structure of the chemical compound, identifying two or more building blocks associated with the indication of the structure of the target chemical compound according to a vocabulary indicative of a plurality of building blocks associated with structures of chemical compounds, wherein at least one building block comprises at least two components, mapping the identified two or more building blocks to a numerical representation of the two or more building blocks by one or more data-driven model(s), wherein the one or more data-driven model(s) are trained based on historical indications of one or more structure(s) of one or more material(s) and corresponding indications of properties of the materials, mapping the numerical representation of the two or more building blocks to an indication of the target property by the one or more data-driven model(s), providing the indication of the target property of the target material.
In another aspect, it relates to a, in particular computer-implemented, method for obtaining one or more data-driven model(s) for determining a target property of a target material, the method comprising: providing and/or obtaining, in particular receiving, a plurality of historical numerical representations of a structure of one or more material(s), wherein at least one of the historical numerical representations is associated with two or more building blocks, providing and/or obtaining, in particular receiving, at least one historical numerical representation of a position of the two or more building blocks within a structure of the materials, providing the historical numerical representations of the structure of the material and the at least one historical numerical representation of the position to the one or more data-driven model(s) for training the one or more data-driven model(s), optionally providing the one or more data-driven model(s).
In another aspect, it relates to a, in particular computer-implemented, method for determining a target property of a target material, the method comprising: providing and/or obtaining, in particular receiving, a numerical representation of a structure of the target material associated with two or more building blocks, processing the numerical representation by one or more data-driven model(s), wherein the one or more data-driven model(s) are obtained by claim 1, providing a numerical representation of a position of the two or more building blocks within a structure of the target material to the one or more data-driven model(s), generating a numerical representation of the structure of the target material and the position of the two or more building blocks within the structure of the material by the one or more data-driven model(s), mapping the numerical representation associated with the two or more building blocks and the position of the two or more building blocks within the structure of the material to an indication of the target property of the target material at least partially by the one or more data-driven models, providing the indication of the target property of the target material.
In another aspect, it relates to an apparatus for determining a target property of a target material and/or for monitoring and/or controlling producing and/or processing of a target material, the apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the apparatus to perform any one of the methods as described herein.
In another aspect, it relates to use of an indication of a target property of a target material as obtained by any one of the methods as described herein for determining a target property of a target material and/or monitoring and/or controlling producing and/or processing of a target material.
In another aspect, it relates to use of a numerical representation of a position of two or more building blocks associated with the structure of one or more material(s) for training one or more data-driven model(s) for determining a target property of a target material and/or monitoring and/or controlling producing and/or processing of a target material as described herein.
In another aspect, it relates to a numerical representation of two or more building blocks associated with the one or more material(s) and a position of two or more building blocks associated with the structure of the one or more material(s) for training one or more data-driven model(s) for determining a target property of a target material and/or monitoring and/or controlling producing and/or processing of a target material as described herein.
EMBODIMENTS
In the following, terminology as used herein and/or the technical field of the present disclosure will be outlined by way of definitions and/or examples. Where examples are given, it is to be understood that the present disclosure is not limited to said examples.
These and other objects, which become apparent upon reading the following description, are solved by the subject matters of the independent claims. The dependent claims refer to embodiments of the invention.
Properties of materials need to be tailored to their intended application. Otherwise, resulting products might be of lower quality and/or unusable. Additionally or alternatively, materials with properties deviating slightly from target properties may not be processable by the envisaged processes, e.g. due to a differing viscosity. This would constitute a waste of material. Hence, controlling and tailoring properties of materials is of high technical relevance. The properties of materials highly depend on the structure of the material. Even small modifications in the structure may lead to completely different properties of the material associated with the modified structure. Said dependency may be non-obvious, in particular where small modifications change properties significantly. In particular, a modification of a building block of the material may completely change the 3-dimensional structure of the material. Followingly, developing new materials requires a lot of research effort, including a large number of experiments to be conducted. This consumes a lot of time and of material for experimentation. Therefore, it is desired to reduce time and resources for experimentation for determining properties of materials.
As pointed out above, the structure of the material determines the properties of the material, e.g. via determining the 3-dimensional structure and thus the interaction within the material and towards other materials. This structure is determined via the type of the building blocks and the orientation of the building blocks associated with the material. Thus, the orientation and the type of building blocks associated with the material need to be taken into account for reliably determining properties of the material. Processing a numerical representation of the structure of the target material and generating, by the one or more data-driven model(s), a numerical representation associated with the two or more building blocks and the position of the two or more building blocks within the structure of the material from the numerical representation of the two or more building blocks and a numerical representation of a position of the two or more building blocks allows the position of the building blocks to be taken into account just before providing the output data. This focuses the attention of the data-driven model on the relation between the orientation and the type of the building block just before providing the output data. Attention of the data-driven model may focus on the type of the building blocks during processing of the received input data. Especially complex structures are strongly determined by the orientation of the building blocks. Followingly, taking the position of the building blocks into account after processing the numerical representation of the structure of the material refocuses the attention of the data-driven model. Such processing by the data-driven model may allow for accurate and scalable determination of properties. Ultimately, this allows resources and time for developing new materials to be reduced and/or the quality of materials and resultant products to be improved.
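As a minimal sketch of taking the position into account after the building blocks have been processed, the following Python snippet adds an absolute position vector to each already processed building-block vector by element-wise addition. The sinusoidal form of `absolute_position_embedding`, the function names and the toy values are assumptions made for illustration; the disclosure does not prescribe a particular position embedding.

```python
import math


def absolute_position_embedding(position, dim):
    """Sinusoidal embedding (an assumed choice) of an absolute position."""
    return [
        math.sin(position / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(position / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]


def add_positions(processed):
    """Add a position vector to each already processed building-block vector."""
    dim = len(processed[0])
    return [
        [v + p for v, p in zip(vec, absolute_position_embedding(pos, dim))]
        for pos, vec in enumerate(processed)
    ]


# Hypothetical output of a representation model for three building blocks;
# the first and third building block are of the same type.
processed = [
    [0.5, -0.1, 0.3, 0.0],
    [0.2, 0.4, -0.3, 0.1],
    [0.5, -0.1, 0.3, 0.0],
]
with_positions = add_positions(processed)
```

In this sketch the first and third building block share the same processed vector, and only the added position vector makes them distinguishable before the result is handed to a property determination.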
In an embodiment, a data-driven model may refer to a model suitable for describing one or more non-linear relations between input data and output data. Input data may refer to data to be provided to the data-driven model and/or to data being received by the data-driven model. Output data may be data to be received from the data-driven model and/or to be provided by the data-driven model. Hence, the data-driven model may determine the output data based on transforming the input data via one or more non-linear relations. The one or more data-driven model(s) may comprise a representation data-driven model and/or a property data-driven model. Further, the one or more data-driven model(s) may comprise one or more embedding layer(s) and/or may be associated with the one or more embedding layer(s). The representation data-driven model may be parametrized and/or trained based on historical indications of structures of materials and/or historical building blocks of materials.
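The division into a representation data-driven model and a property data-driven model can be pictured with the following toy Python sketch. The mean pooling, the tanh layer, the logistic classifier and all parameter values are hypothetical choices made for illustration and are not taken from the disclosure.

```python
import math


def representation_model(block_vectors, weights):
    """Map per-block vectors to one vector: mean-pool, then a tanh layer."""
    dim = len(block_vectors[0])
    pooled = [
        sum(vec[i] for vec in block_vectors) / len(block_vectors)
        for i in range(dim)
    ]
    # The tanh makes the relation between input and output non-linear.
    return [math.tanh(sum(w * x for w, x in zip(row, pooled))) for row in weights]


def property_model(representation, weights, bias):
    """Logistic classifier: score in (0, 1) for a binary target property."""
    z = sum(w * x for w, x in zip(weights, representation)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, untrained parameters and toy input vectors.
rep_weights = [[0.5, -0.2], [0.1, 0.8]]
prop_weights = [1.0, -1.0]
blocks = [[1.0, 0.0], [0.0, 1.0]]

rep = representation_model(blocks, rep_weights)
prob = property_model(rep, prop_weights, bias=0.0)
```

Keeping the two stages separate mirrors the text: the first stage produces a numerical representation, the second maps that representation to an indication of a property.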
In an embodiment, property, in particular target property, may refer to an environmental property, to a physical property, to a chemical property or a combination thereof. In an embodiment, environmental property may comprise at least one of emission data of the material, recyclate content of the material, bio-based content of the material, renewable content of the material, material declaration data, material safety data or a combination thereof. Target property may be at least one property selected from the plurality of properties of the target material.
In an embodiment, emission data may comprise any data related to environmental footprint. The environmental footprint may refer to an entity and its associated environmental footprint. The environmental footprint may be entity specific. For instance, the environmental footprint may relate to a material, a company, a process such as a manufacturing process, a raw material or basic substance, a component, a component assembly, an end product, combinations thereof or additional entity-specific relations. Emission data may include data relating to carbon footprint of a material. Emission data may include data relating to greenhouse gas emissions, e.g. released in production of the material. Greenhouse gas emissions may include emissions such as carbon dioxide (CO2) emission, methane (CH4) emission, nitrous oxide (N2O) emission, hydrofluorocarbons (HFCs) emission, perfluorocarbons (PFCs) emission, sulphur hexafluoride (SF6) emission, nitrogen trifluoride (NF3) emission, combinations thereof and additional emissions. Emission data may include data related to greenhouse gas emissions of an entity's or company's own operations (production, power plants and waste incineration), i.e. scope 1 emissions. Scope 2 comprises emissions from energy production which is sourced externally. Scope 3 comprises all other emissions along the value chain. Specifically, this includes the greenhouse gas emissions of raw materials obtained from suppliers. A Product Carbon Footprint (PCF) sums up greenhouse gas emissions and removals from the consecutive and interlinked process steps related to a particular product. A cradle-to-gate PCF sums up greenhouse gas emissions based on selected process steps: from the extraction of resources up to the factory gate where the product leaves the company. Such PCFs are called partial PCFs.
In order to achieve such summation, each company providing any products must be able to provide the scope 1 and scope 2 contributions to the PCF for each of its products as accurately as possible, and obtain reliable and consistent data for the PCFs of purchased energy (scope 2) and their raw materials (scope 3).
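By way of a non-limiting illustration, the summation underlying a cradle-to-gate PCF may be sketched as follows. The class `ProcessStep`, the function `cradle_to_gate_pcf` and all numerical values are hypothetical placeholders introduced for this sketch only and do not describe any real product.

```python
from dataclasses import dataclass


@dataclass
class ProcessStep:
    """One process step; all values are assumed to be in kg CO2e per kg product."""
    name: str
    scope1: float  # emissions of the company's own operations
    scope2: float  # emissions of externally sourced energy
    scope3_upstream: float  # emissions of raw materials obtained from suppliers


def cradle_to_gate_pcf(steps):
    """Sum the emissions of the selected process steps up to the factory gate."""
    return sum(s.scope1 + s.scope2 + s.scope3_upstream for s in steps)


# Placeholder values for illustration only.
steps = [
    ProcessStep("raw material extraction", 0.0, 0.0, 1.2),
    ProcessStep("synthesis", 0.5, 0.3, 0.0),
    ProcessStep("purification", 0.1, 0.2, 0.0),
]
partial_pcf = cradle_to_gate_pcf(steps)
```

In this sketch, the partial PCF is only as reliable as the per-step scope 1, scope 2 and scope 3 contributions supplied to it.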
Chemical property may be a property that can be established by changing the structure of the at least one material. Examples for chemical properties may be acidity, oxidation state or reactivity. Physical property may be one of the following: mechanical properties, electrical properties, optical properties, thermal properties or the like. For example, physical property may comprise one or more of the following density, scratch resistance, electrical conductivity, color, absorption, heat capacity or the like.
In an embodiment, material may be associated with one or more building blocks. Hence, the structure of the material may be represented by one or more building blocks. The material may be a biological and/or a chemical material. The chemical material may comprise one or more chemical compound(s). The material may be characterized by the structure of the material. The structure of the material may be associated with a spatial orientation of one or more building block(s). The building block(s) may be biological building block(s) and/or chemical building block(s). The building blocks may be associated with at least partially repetitive parts of the structure of the material. The building block may refer to an arrangement of one or more component(s). The material may be associated with a plurality of components. One building block may comprise one or more components. The building block and/or the ensemble of components of the building block may be associated with one property and/or one function.
In an embodiment, providing may include receiving. For example, providing a numerical representation and/or providing an indication and/or providing a structure may comprise receiving the numerical representation and/or receiving an indication and/or receiving a structure e.g. via a data providing interface such as a user interface.
In an embodiment, numerical representation may comprise one or more numerical values for representing the data represented by the numerical representation. The numerical representation may comprise a tensor, in particular a vector and/or a matrix. The vector may represent one building block. The matrix may represent two or more building blocks. The numerical representation may be obtained from the data represented by the numerical representation, preferably according to a relation between the data and the corresponding numerical representation. The relation between the data and the corresponding numerical representation may comprise a predefined relation and/or one or more embedding layer(s). The one or more embedding layer(s) may be configured to map data to numerical representations of the data. The one or more embedding layer(s) may be obtained based on a plurality of data points of data to be represented and a size of the target numerical representations of plurality of data points. The one or more embedding layer(s) may be comprised by the one or more data-driven model(s).
In an embodiment, providing the indication of the target property may include providing the value associated with the target property. The indication of the target property may comprise the values associated with the target property.
In an embodiment, indication of a target property may be suitable for deriving and/or obtaining the target property. The indication of the target property may comprise the target property, in particular a numerical value associated with the target property.
In an embodiment, the indication of the structure of the material, in particular the target material, may be associated with a structure of the material. Preferably, the structure of the material may be derived and/or obtained from the indication of the structure of the material, in particular the target material. The indication of the structure of the material, in particular the target material, may comprise a structural representation of the material. The structural representation of the material may be a digital representation of the material. Hence, the indication of the structure of the material, in particular the target material, may be associated with one or more characterizing properties of the material, preferably independent of the one or more target properties. The indication of the structure of the material, in particular the target material, may comprise the structure of the material. The indication of the structure of the chemical compound may be an indication of the target material. Hence, all embodiments applicable to the indication of the structure of the target material may equally apply to the indication of the structure of the chemical compound. The target material may be a target chemical compound. Hence, all embodiments applying to the target material may equally apply to the target chemical compound. Materials may comprise one or more chemical compound(s). Hence, all embodiments applying to the materials may equally apply to chemical compounds.
In an embodiment, the numerical representation of the relation between the two or more building blocks may be a numerical representation of a relative position of the two or more building blocks. The relative position of the two or more building blocks may refer to a position of at least one building block in relation to at least another of the two or more building blocks. The two or more building blocks may comprise at least one target building block. The relative position may refer to a position of the two or more building blocks in relation to at least one target building block. The target building block may be a masked building block. The masked building block may be a placeholder for a building block to be determined. The target building block may be a building block to be generated, e.g. by the one or more data-driven model(s), in particular the representation data-driven model. The building block may comprise one or more components. The material may comprise and/or may consist of a plurality of components. Hence, the number of building blocks may be equal to or below the number of components. By doing so, a direct relation between the building blocks can be represented. The relative position allows one or more building blocks to be determined in relation to other building blocks associated with the material. Typically, materials have complex 3-dimensional structures resulting from the exact orientation of the structure of the material. These structures are determined by the exact location of the building blocks associated with the material. Hence, taking the relative position into account provides a better representation of the material. Therefore, the determination of the target properties is improved.
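As a non-limiting sketch, a relative position in relation to a target building block may be encoded as a signed offset within a sequence of building blocks. The signed-offset encoding, the `[MASK]` placeholder string and the function names are assumptions made for this illustration; other encodings of the relative position are equally conceivable.

```python
MASK = "[MASK]"  # hypothetical placeholder for the masked (target) building block


def relative_positions(blocks, target_index):
    """Signed offset of every building block relative to the target building block."""
    return [i - target_index for i in range(len(blocks))]


def pairwise_relative_positions(n):
    """Matrix whose entry [i][j] is the offset of block j as seen from block i."""
    return [[j - i for j in range(n)] for i in range(n)]


blocks = ["C", "C", MASK, "O"]
offsets = relative_positions(blocks, blocks.index(MASK))
pairwise = pairwise_relative_positions(len(blocks))
```

Here the target building block receives the offset 0, and the remaining building blocks are described by their distance and direction from it.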
In an embodiment, the numerical representation of the two or more building blocks may be obtained by providing an indication of a structure of the material, in particular the target material, identifying two or more building blocks associated with the indication of the structure of the material, in particular the target material, according to a vocabulary indicative of a plurality of building blocks associated with structures of materials, in particular the structure of the target material, and mapping the identified two or more building blocks to a numerical representation of the two or more building blocks by one or more embedding layer(s). The one or more embedding layer(s) may be configured to map data to numerical representations of the data. The one or more embedding layer(s) may be obtained based on a plurality of data points of data to be represented and a size of the target numerical representations of the plurality of data points. The one or more embedding layer(s) may be comprised by the one or more data-driven model(s). The numerical representation of the two or more building blocks may be generated by mapping the identified two or more building blocks to a numerical representation of the two or more building blocks separately or at once. Where the identified two or more building blocks are mapped separately, the identified two or more building blocks may be mapped one after another. Hence, the two or more building blocks may be mapped to one or more numerical representation(s) per building block. The one or more numerical representation(s) per building block may be combined, e.g. concatenated, into one or more numerical representation(s) of the two or more building blocks. By doing so, a structure of the material can be represented by a structured and machine-processable representation. This enables commonly available structures of materials to be processed by data-driven models. Furthermore, the structure of the material can be represented with a fixed number of building blocks.
This allows for a better control of input size to the data-driven model while the number of building blocks is smaller than the number of possible structures of materials. The reason for this is that repetitions in the structure of a material can be represented by reoccurring building blocks. Hence, structures of materials can be represented in an efficient format. Thus, resources for determining the target properties of the material can be saved.
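The identification of building blocks according to a vocabulary and the subsequent mapping by an embedding layer may, purely as an illustration, be sketched as follows. The vocabulary entries, the greedy longest-match segmentation, the padding to a fixed length and the randomly initialised embedding table are assumptions of this sketch; in practice the embedding table would be obtained by training.

```python
import random

# Hypothetical vocabulary of building blocks, including padding and unknown entries.
VOCAB = {"[PAD]": 0, "[UNK]": 1, "C": 2, "O": 3, "N": 4, "c1ccccc1": 5}
EMBED_DIM = 4

# Stand-in for a trained embedding layer: one vector per vocabulary entry.
_rng = random.Random(0)
EMBEDDING_TABLE = [[_rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]


def identify_blocks(structure, vocab):
    """Greedy longest-match segmentation of a structure string into building blocks."""
    entries = sorted((k for k in vocab if not k.startswith("[")), key=len, reverse=True)
    blocks, i = [], 0
    while i < len(structure):
        for entry in entries:
            if structure.startswith(entry, i):
                blocks.append(entry)
                i += len(entry)
                break
        else:  # no vocabulary entry matches at position i
            blocks.append("[UNK]")
            i += 1
    return blocks


def embed(blocks, vocab, table, max_len):
    """Map building blocks to vectors, truncated or padded to a fixed number of rows."""
    ids = [vocab.get(b, vocab["[UNK]"]) for b in blocks][:max_len]
    ids += [vocab["[PAD]"]] * (max_len - len(ids))
    return [list(table[i]) for i in ids]


blocks = identify_blocks("CCOc1ccccc1", VOCAB)
representation = embed(blocks, VOCAB, EMBEDDING_TABLE, max_len=6)
```

The reoccurring building block `C` is mapped to the same vector each time it appears, which reflects how repetitions in a structure can be represented by reoccurring building blocks.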
In an embodiment, the identified two or more building blocks may be mapped separately and/or together to a numerical representation of the two or more building blocks by the one or more embedding layer(s). Mapping the building blocks separately may speed up computation of the matrix operations.
In an embodiment, the numerical representation of the two or more building blocks and optionally of a relation between the two or more building blocks may be a numerical representation of the structure of the target material. Any one of the methods may further comprise providing a numerical representation of a relation between the two or more building blocks. Processing the numerical representation by one or more data-driven model(s) may comprise generating the numerical representation of the two or more building blocks associated with the target material and a relation between the two or more building blocks by the one or more data-driven model(s). The numerical representation of the structure of the target material and the position of the two or more building blocks within the structure of the material may be generated from the numerical representation of the two or more building blocks associated with the target material and the relation between the two or more building blocks and the numerical representation of the position of the two or more building blocks. The numerical representation of the structure of the material may be a numerical representation of the two or more building blocks associated with the target material and a relation between the two or more building blocks, wherein the numerical representation of the two or more building blocks associated with the target material and a relation between the two or more building blocks may be obtained by generating a numerical representation associated with the two or more building blocks and the relation between the two or more building blocks from the numerical representation of the two or more building blocks and a numerical representation of the relation between the two or more building blocks by the one or more data-driven models.
Taking the relation between the two or more building blocks into account enables the data-driven model to determine the target properties of the target material based on a relative orientation of the two or more building blocks. This allows the two components of the orientation of the two or more building blocks, i.e. the relative and the absolute position, to be taken into account for determining the target properties of the target material. Such processing by the data-driven model may allow for accurate and scalable determination of properties. Ultimately, this allows reducing the resources and time required for developing new materials and/or improving the quality of materials and resultant products.
In an embodiment, the numerical representation of the two or more building blocks may be obtained by providing an indication of a structure of the material, obtaining a structure of the material from the indication of the structure of the material, in particular the target material, identifying two or more building blocks associated with the structure of the target material according to a vocabulary indicative of a plurality of building blocks associated with structures of materials, and, mapping the identified two or more building blocks to a numerical representation of the two or more building blocks by one or more embedding layer(s). The one or more embedding layer(s) may be configured to map data to numerical representations of the data. The one or more embedding layer(s) may be obtained based on a plurality of data points of data to be represented and a size of the target numerical representations of plurality of data points. The one or more embedding layer(s) may be comprised by the one or more data-driven model(s). Obtaining a structure of the material from the indication may comprise mapping the indication of the structure of the material, in particular the target material to the structure of the material. Obtaining a structure of the material from the indication of the structure of the material, in particular the target material may comprise retrieving the structure of the material based on the indication of the structure of the material, in particular the target material. Retrieving the structure of the material may comprise providing a query associated with and/or obtained from the indication of the structure of the material, in particular the target material and receiving the structure of the material in response to providing the query. For example, the query may be provided to a structured database and/or the query may be a structured query for retrieving the structure of the material. 
Additionally or alternatively, the query may comprise a numerical representation of the indication of the structure of the material, in particular the target material. In response to providing the numerical representation of the indication of the structure of the material, in particular the target material, a similarity score indicative of a distance between the numerical representation of the indication of the structure of the material, in particular the target material, and numerical representations of structures of materials may be determined. The received structure of the material may be associated with the similarity score being within a predefined similarity range. Obtaining the structure of the material from the indication of the structure of the material, in particular the target material, allows properties to be determined from data related to the material. For example, users can provide a name such as a trivial name, IUPAC name or the like to determine properties of the materials. Thus, structures of the target material can be retrieved error-free e.g. from a database. Hence, efficiency of determining target properties of the target material is improved.
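A retrieval based on a similarity score may, as a non-limiting sketch, look as follows. The use of cosine similarity, the threshold value, the function names and the small in-memory database with its vectors are assumptions of this illustration; the description above does not prescribe a particular similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_structure(query_vec, database, min_similarity=0.9):
    """Return the most similar stored structure within the similarity range, else None."""
    best_structure, best_score = None, min_similarity
    for structure, vec in database.items():
        score = cosine_similarity(query_vec, vec)
        if score >= best_score:
            best_structure, best_score = structure, score
    return best_structure


# Hypothetical database mapping structures to placeholder numerical representations.
database = {"CCO": [1.0, 0.0, 0.2], "c1ccccc1": [0.0, 1.0, 0.1]}
match = retrieve_structure([0.9, 0.1, 0.2], database)
no_match = retrieve_structure([0.0, 0.0, 1.0], database)
```

A query whose similarity score falls outside the predefined range yields no structure, so that an unrelated structure is not returned by accident.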
In an embodiment, obtaining the structure of the material from the indication of the structure of the material, in particular the target material may comprise retrieving the structure of the material based on the indication of the structure of the material, in particular the target material.
In an embodiment, the structure of the material may be associated with a sequence of the two or more building blocks. The sequence of the two or more building blocks may be linear and/or non-linear. Where the structure of the material may be associated with non-linear sequences of building blocks, the indication of the structure of the material, in particular the target material, in particular the structure of the material, may be transformed into a linearized representation of the indication of the structure of the material, in particular the target material, in particular the structure of the material. For example, SMILES constitute a linearized representation for possibly non-linear structures. The structure of the material may for example comprise a ring element. Transforming the structure of the material to e.g. a SMILES representation allows to represent non-linear structures in a linearized format. Said format can be processed by the one or more data-driven model(s). Hence, this enables to process non-linear structures of materials in a sequential manner. By doing so, data-driven models operating on sequences, i.e. generative models, can be used for determining target properties of the materials from the structure of the material.
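As an illustration of how a linearized representation may be processed sequentially, a SMILES string can be split into a sequence of tokens. The regular expression below is a deliberately simplified assumption covering only common SMILES symbols; it is not a complete SMILES grammar and is not presented as the tokenization used by the one or more data-driven model(s).

```python
import re

# Simplified pattern: bracket atoms, two-letter halogens, two-digit ring closures,
# single-letter atoms, ring-closure digits, and bond/branch symbols.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[()=#+\-/\\.:@]")


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; reject strings with unsupported characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES string: {smiles!r}")
    return tokens


ring_tokens = tokenize_smiles("c1ccccc1")    # benzene: ring closed via the digit 1
branch_tokens = tokenize_smiles("CC(=O)O")   # acetic acid: branch in parentheses
```

The ring element of benzene is expressed by the two matching ring-closure digits, so the non-linear structure becomes a plain sequence of tokens.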
In an embodiment, any one of the methods may further comprise reducing the numerical representation of the two or more building blocks and/or the numerical representation of the relation between the two or more building blocks by providing the numerical representation of the two or more building blocks and/or the numerical representation of the relation between the two or more building blocks to one or more embedding layer(s). The numerical representation of the two or more building blocks and/or the numerical representation of the relation between the two or more building blocks may be obtained by mapping the two or more building blocks and/or the relation between the two or more building blocks to a numerical representation of the two or more building blocks and/or the numerical representation of the relation between the two or more building blocks according to a relation between a plurality of building blocks and corresponding numerical representations and/or between a plurality of relations of building blocks and corresponding numerical representations.
Mapping the two or more building blocks to a numerical representation of the two or more building blocks by one or more embedding layer(s) comprises mapping the identified two or more building blocks to a numerical representation of the two or more building blocks according to a relation between a plurality of building blocks and corresponding numerical representations and reducing the numerical representation of the two or more building blocks by the one or more embedding layer(s), in particular by providing the numerical representation of the two or more building block(s) to the one or more embedding layer(s). Mapping the relation between the two or more building blocks to a numerical representation of the relation between the two or more building blocks by one or more embedding layer(s) may comprise mapping the relation between the two or more building blocks to a numerical representation of the relation between the two or more building blocks according to a relation between relations of the two or more building blocks and corresponding numerical representations, and reducing the numerical representation of the relation between the two or more building blocks by the one or more embedding layer(s), in particular by providing the numerical representation of the relation between the two or more building blocks to the one or more embedding layer(s). The numerical representation of the two or more building blocks may be associated with a number of numerical values equal or higher than a number of the plurality of building blocks. The number of numerical values associated with the reduced numerical representation of the indication of the structure of the material, in particular the target material and/or of the two or more building blocks may be lower than the number of numerical values associated with the numerical representation of the indication of the structure of the material, in particular the target material and/or the two or more building blocks. 
The number of numerical values associated with the numerical representation of the relation between the two or more building blocks may be equal or higher than a number of the plurality of building blocks. The number of numerical values associated with the reduced numerical representation of the relation between the two or more building blocks may be lower than the number of numerical values associated with the numerical representation of the relation between the two or more building blocks. By doing so, a compressed representation of the structure of the material may be processed. This uses less computational resources and/or allows to process the numerical representation faster. Hence, this contributes to a resource-efficient determination of the one or more target properties associated with the target material.
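The reduction described above may be illustrated, under stated assumptions, by a one-hot representation whose length equals the number of building blocks in the vocabulary, which an embedding layer maps to a shorter vector. The one-hot encoding, the vocabulary size of five, the reduced size of two and the weight values are hypothetical choices made for this sketch.

```python
def one_hot(index, size):
    """Numerical representation with one value per building block in the vocabulary."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def reduce_representation(vec, weights):
    """Embedding layer as a matrix product: length-V vector times V x d weight matrix."""
    d = len(weights[0])
    return [sum(vec[i] * weights[i][j] for i in range(len(vec))) for j in range(d)]


# Hypothetical weights of an embedding layer (V = 5 building blocks, d = 2 values).
WEIGHTS = [
    [0.1, -0.3],
    [0.7, 0.2],
    [-0.5, 0.9],
    [0.4, 0.4],
    [0.0, -0.8],
]

full = one_hot(2, size=5)
reduced = reduce_representation(full, WEIGHTS)
```

For a one-hot input the product simply selects one row of the weight matrix, which is why such a layer is commonly implemented as a table lookup.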
In an embodiment, mapping the numerical representation associated with the two or more building blocks and the relation between the two or more building blocks to an indication of the target property of the target material by one or more data-driven model(s) may comprise mapping the numerical representation associated with the two or more building blocks and the relation between the two or more building blocks to a numerical representation of the indication of the target property by a property data-driven model, and, mapping the numerical representation of the indication of the target property according to a relation between numerical representations of data and the corresponding data. The property data-driven model may be configured to map numerical representations of structures of materials to numerical representations of indications of target properties of the material. The property data-driven model may be parametrized and/or trained according to numerical representations of historical indications of one or more structure(s) of one or more material(s) and corresponding indications of properties of the materials. The numerical representations of the historical indication of one or more structures may comprise numerical representations of historical building blocks associated with materials. The numerical representations of the historical indication of one or more structures may be obtained by providing historical structures of materials, historical indications of the structure of the material, in particular the target material and/or historical two or more building blocks to the representation data-driven model. In particular, the representation data-driven model may be trained and/or parametrized prior to training and/or parametrizing the property data-driven model. In an embodiment, the representation data-driven model may be further trained together with the property data-driven model. 
By doing so, the property data-driven model may be specifically trained to determine target properties of materials. Hence, the target properties determined by the property data-driven model may achieve higher accuracies as the numerical representation of the structure of the material is determined prior to determining the property. Consequently, the property data-driven model is provided with meaningful numerical representations. Thereby, the property data-driven model can focus on determining the target properties and does not need to fulfill several tasks. Ultimately, this increases the accuracy of determining target properties of the material.
In an embodiment, generating the numerical representation associated with the two or more building blocks and the relation between the two or more building blocks from the at least two numerical representations by one or more data-driven models may comprise mapping the numerical representation of the two or more building blocks and the numerical representation of the relation between the two or more building blocks by a representation data-driven model. The representation data-driven model may be configured to generate a common numerical representation from two or more numerical representations. Mapping the numerical representation of the two or more building blocks and the numerical representation of the relation between the two or more building blocks may comprise modifying the numerical representations to generate at least one numerical representation of the two or more building blocks and the relation between the two or more building blocks by applying a filter to the numerical representations. The filter may be obtained based on the numerical representations. The filter may be obtained by one or more matrix operation(s) of the numerical representations, in particular one or more matrix multiplication(s) of the numerical representations. Applying the filter may result in weighting a contribution of the two or more building blocks to the numerical representation of the two or more building blocks and the relation between the two or more building blocks. Hence, the representation data-driven model is enabled to attend to different parts of the structure according to their contribution to determining one or more target properties of the target material. This increases the accuracy for determining the target properties and thus, contributes to determining target properties robustly and in a scalable manner.
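One conceivable way of obtaining such a filter by matrix multiplication is scaled dot-product self-attention, sketched below without learned projection matrices. Reading the filter as self-attention, summing the building-block and relation representations before attending, and all function names are assumptions of this illustration rather than a statement of how the representation data-driven model is implemented.

```python
import math


def matmul(a, b):
    """Product of an n x m matrix and an m x p matrix given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax(row):
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_filter(x):
    """Filter obtained from the representations themselves: softmax(x x^T / sqrt(d))."""
    scale = math.sqrt(len(x[0]))
    return [softmax([s / scale for s in row]) for row in matmul(x, transpose(x))]


def combine_with_filter(block_repr, relation_repr):
    """Sum both representations (assumed), then weight contributions with the filter."""
    x = [[b + r for b, r in zip(b_row, r_row)]
         for b_row, r_row in zip(block_repr, relation_repr)]
    return matmul(attention_filter(x), x)


block_repr = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
relation_repr = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
weights = attention_filter(block_repr)
common_repr = combine_with_filter(block_repr, relation_repr)
```

Each row of the filter sums to one, so each output row is a weighted mixture of all building-block representations.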
The representation data-driven model may be parametrized and/or trained based on historical indications of structures of materials and/or historical building blocks of materials. The representation data-driven model may be parametrized and/or trained to determine one or more building blocks associated with the provided one or more numerical representation(s). Hence, during training and/or parametrizing, the representation data-driven model may be provided with numerical representations of two or more building blocks and relations between the two or more building blocks. Further, the representation data-driven model may be triggered to determine one or more building blocks corresponding to the two or more building blocks associated with the provided numerical representations. The representation data-driven model may be triggered by providing a masked building block. Hence, the representation data-driven model may be provided with three or more building blocks. The three or more building blocks may comprise at least one masked building block. The at least one masked building block may be associated with the building block to be generated by the representation data-driven model. For training and/or parametrizing, historical indications of structures of materials and/or historical building blocks of materials associated with at least one masked building block may be generated, in particular by masking at least one building block associated with the historical indications of structures of materials and/or historical building blocks of materials. Masking at least one building block may comprise exchanging at least one building block by at least one masked building block. Materials in this context may include chemical compounds. By doing so, the representation data-driven model may be trained and/or parametrized to generate robust and meaningful representations of structures of materials.
This enables an accurate relation between the structure of the target material with the one or more target properties. Ultimately, this contributes to increase the accuracy and robustness of determining target properties in a scalable manner.
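The generation of training examples by masking may be sketched as follows. The `[MASK]` placeholder string, the function name and the random choice of the masked position are assumptions of this illustration.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder for a masked building block


def mask_blocks(blocks, rng, n_masked=1):
    """Exchange n_masked building blocks by the mask; return the input and the targets."""
    positions = sorted(rng.sample(range(len(blocks)), n_masked))
    masked = list(blocks)
    targets = {}
    for position in positions:
        targets[position] = masked[position]  # building block the model should generate
        masked[position] = MASK
    return masked, targets


rng = random.Random(0)
historical_blocks = ["C", "C", "O", "N"]
masked_blocks, targets = mask_blocks(historical_blocks, rng)
```

During training, the masked sequence would be provided to the representation data-driven model, and the entries of `targets` would serve as the building blocks to be generated.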
In an embodiment, any one of the methods may further comprise providing a numerical representation of a position of the two or more building blocks within a structure of the material to the one or more data-driven model(s), and combining the numerical representation of the position of the two or more building blocks and the numerical representation of the two or more building blocks and the relation between the two or more building blocks. Combining numerical representations may comprise merging the numerical representations and/or adding numerical values associated with the numerical representations. Additionally or alternatively, a numerical representation associated with the two or more building blocks, the relation between the two or more building blocks and the position of the two or more building blocks within the structure of the material may be generated by the one or more data-driven model(s). The numerical representation associated with the two or more building blocks, the relation between the two or more building blocks and the position of the two or more building blocks within the structure of the material may be generated from the numerical representation of the two or more building blocks and the relation between the two or more building blocks and the numerical representation of the position of the two or more building blocks within the structure of the material, e.g. by providing the numerical representation of the two or more building blocks, the relation between the two or more building blocks and the numerical representation of the position of the two or more building blocks within the structure of the material to the one or more data-driven model(s), in particular the representation data-driven model. 
In an embodiment, the numerical representation associated with the two or more building blocks, the relation between the two or more building blocks and the position of the two or more building blocks within the structure of the material may be generated from the modified numerical representation associated with the two or more building blocks and the relation between the two or more building blocks. This allows the position of the building blocks associated with the structure of the material to be attended to more strongly. Consequently, the position is taken more into account for determining the target properties of the target material. Because of the close relation between the properties and the 3-dimensional orientation of parts of the material, this increases the meaningfulness of the numerical representation of the structure of the material and hence, increases the accuracy of determining target properties of the target material from this numerical representation. Ultimately, this contributes to an accurate and scalable determination of the target properties of target materials.
In an embodiment, any one of the methods as described herein may further comprise generating a numerical representation associated with the two or more building blocks and the position of the two or more building blocks within the structure of the material by the one or more data-driven model(s). Additionally or alternatively, any one of the methods as described herein may further comprise combining the numerical representation associated with the two or more building blocks and the numerical representation associated with the position of the two or more building blocks to the numerical representation associated with the two or more building blocks and the position of the two or more building blocks within the structure of the material. Combining numerical representations may comprise merging the numerical representations and/or adding numerical values associated with the numerical representations. The numerical representation associated with the two or more building blocks and the position of the two or more building blocks within the structure of the material may be generated from the numerical representation of the two or more building blocks and the numerical representation of the position of the two or more building blocks within the structure of the material, e.g. by providing the numerical representation of the two or more building blocks and the numerical representation of the position of the two or more building blocks within the structure of the material to the one or more data-driven model(s), in particular the representation data-driven model. In an embodiment, the numerical representation associated with the two or more building blocks and the position of the two or more building blocks within the structure of the material may be generated from the modified numerical representation associated with the two or more building blocks. This allows to attend stronger to the position of the building blocks associated with the structure of the material. 
Consequently, the position is given more weight when determining the target properties of the target material. Because of the close relation between the properties and the 3-dimensional orientation of parts of the material, this increases the meaningfulness of the numerical representation of the structure of the material and hence increases the accuracy of determining target properties of the target material from this numerical representation. Ultimately, this contributes to an accurate and scalable determination of the target properties of target materials. In an embodiment, the position of the two or more building blocks may be and/or may be indicative of an absolute position of the two or more building blocks within the structure of the material. The absolute position of a building block may be indicative of a placing of the building block within a sequence of the two or more building blocks associated with the structure of the material. The absolute position may be determined by counting the number of building blocks and assigning to each building block a number indicative of its placing within the counted number of building blocks. The absolute position is crucial for determining target properties from structures of target materials because of the close relationship between the positions of parts of the material and the corresponding material properties. Further, absolute positions make it possible to clearly distinguish building blocks that have a similar or the same relation to another building block. Hence, absolute positions enable a more meaningful representation of structures of target materials. Ultimately, this contributes to more accurate and scalable determination of target properties of target materials from the structure of the target materials.
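As a non-limiting illustration, the following Python sketch shows one way in which absolute positions could be obtained by counting building blocks, and in which the numerical representation of the building blocks could then be combined with the numerical representation of their positions, either by adding numerical values or by merging (here: concatenating). The lookup tables BLOCK_TABLE and POSITION_TABLE, the function names and all toy values are assumptions made for this example and are not taken from the disclosure.

```python
from typing import List

# Hypothetical toy lookup tables (assumed values, for illustration only).
BLOCK_TABLE = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
POSITION_TABLE = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]


def absolute_positions(blocks: List[str]) -> List[int]:
    """Count the building blocks and assign each one its placing in the sequence."""
    return list(range(len(blocks)))


def combine(block_vecs, pos_vecs, mode="add"):
    """Combine block vectors with position vectors.

    mode="add":   element-wise addition of numerical values
    mode="merge": concatenation of the two representations
    """
    if len(block_vecs) != len(pos_vecs):
        raise ValueError("one position vector is needed per building block")
    if mode == "add":
        return [[b + p for b, p in zip(bv, pv)]
                for bv, pv in zip(block_vecs, pos_vecs)]
    if mode == "merge":
        return [list(bv) + list(pv) for bv, pv in zip(block_vecs, pos_vecs)]
    raise ValueError("mode must be 'add' or 'merge'")


blocks = ["A", "B", "A"]
positions = absolute_positions(blocks)
block_vecs = [BLOCK_TABLE[b] for b in blocks]
pos_vecs = [POSITION_TABLE[i] for i in positions]

added = combine(block_vecs, pos_vecs, mode="add")
merged = combine(block_vecs, pos_vecs, mode="merge")
```

In this sketch the same building block "A" occurs at the first and the third placing; after combination the two occurrences carry different vectors, which reflects how absolute positions may distinguish otherwise identical building blocks.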
In an embodiment, at least one building block may comprise at least two components associated with the structure of the material. In an embodiment, at least one building block may consist of one component. The component may be a part of a building block. The vocabulary may be indicative of the plurality of building blocks and optionally a plurality of components. In particular, the vocabulary may be indicative of a relation between the plurality of components and the plurality of building blocks. The vocabulary may be obtained by reducing an initial set of building blocks to a target number of building blocks. The target number of building blocks may be equal to the number of the plurality of building blocks. Reducing the initial set of building blocks may comprise determining the initial set of building blocks based on historical indications of structures of materials. The initial set of building blocks may be determined by forming at least a part of the historical indications by one or more combinations of the plurality of building blocks. Further, at least one building block may be formed by one or more combinations of two or more components. The historical indications of the structure of the material, in particular the target material, can be formed by one or more combinations of the two or more components. At least one building block may consist of one component and/or may be equal to one component. Hence, the building blocks may be regarded as an extension of the vocabulary to avoid redundancies and/or simplify representation of the structure of materials. The historical indications of structures of material may be provided, e.g. via a user interface. Further, a target number of building blocks for forming at least a part of the historical indications by one or more combinations of the plurality of building blocks may be provided. A determination score may be determined per building block of the initial set.
The determination scores may be associated with generating the building blocks of the set of building blocks. In particular, the determination scores may be obtained based on occurrences of the building blocks within the historical indications of the structure of the material, in particular the target material. A loss score may be determined per building block of the set of building blocks. The loss scores may be associated with modifying the determination scores by removing the building blocks from the set of building blocks. The set of building blocks may be filtered according to the loss scores. In particular, building blocks associated with loss scores within a predefined range may be removed from the set of building blocks. If the number of building blocks associated with the filtered set of building blocks is equal to or lower than the target number of building blocks, the filtered set may be provided. Otherwise, the determination scores and the loss scores associated with the filtered set of building blocks may be determined again until the number of building blocks associated with the filtered set of building blocks is equal to or lower than the target number of building blocks. The provided filtered set of building blocks may be the vocabulary. By doing so, the length of sequences of building blocks associated with the structure of materials can be reduced. This is a consequence of representing reoccurring sequences of components by one or more building blocks. Consequently, longer sequences of a plurality of components can be reduced to shorter sequences of building blocks. Thereby, long and complex sequences can be reduced to a size processable by a data-driven model while providing meaningful representations of the structures of the material. Typically, biological and/or chemical material can be associated with long, yet at least partially repetitive sequences. Consequently, this contributes to representing structures of materials efficiently.
The processing of the length-reduced sequences makes it possible to use fewer computational resources for a similarly accurate or more accurate representation of the structure of the material. Ultimately, this allows for an efficient and accurate determination of target properties of target materials. The structure of the material may be associated with and/or may be related to at least three components. The two or more building blocks may comprise and/or may be related to the at least three components.
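As a non-limiting illustration, the iterative reduction of an initial set of building blocks to a target number may be sketched in Python as follows. Several choices in this sketch are assumptions and are not specified by the disclosure: components are single characters, the initial set is every substring of up to max_len components found in the historical sequences, the determination score is the raw occurrence count, sequences are split by greedy longest match, and the loss score of a building block is the increase in total sequence length that its removal would cause. All names (segment, build_vocabulary, HISTORY) are hypothetical.

```python
from collections import Counter


def segment(seq, vocab, max_len):
    """Greedy longest-match split of a component sequence into building blocks."""
    out, i = [], 0
    while i < len(seq):
        for n in range(min(max_len, len(seq) - i), 0, -1):
            piece = seq[i:i + n]
            if n == 1 or piece in vocab:  # single components are always allowed
                out.append(piece)
                i += n
                break
    return out


def total_length(corpus, vocab, max_len):
    """Total number of building blocks needed to represent all sequences."""
    return sum(len(segment(s, vocab, max_len)) for s in corpus)


def build_vocabulary(corpus, target_size, max_len=4, drop_per_round=2):
    # Determination scores (assumed): occurrences of each candidate block.
    determination = Counter(
        s[i:i + n]
        for s in corpus
        for n in range(1, max_len + 1)
        for i in range(len(s) - n + 1)
    )
    vocab = set(determination)                      # initial set
    singles = {b for b in vocab if len(b) == 1}     # never removed
    while len(vocab) > target_size:
        base = total_length(corpus, vocab, max_len)
        # Loss scores (assumed): growth in total length if the block is removed.
        loss = {
            b: total_length(corpus, vocab - {b}, max_len) - base
            for b in vocab - singles
        }
        if not loss:
            break
        ranked = sorted(loss, key=lambda b: (loss[b], determination[b], b))
        for b in ranked[:drop_per_round]:           # filter lowest-loss blocks
            vocab.discard(b)
            if len(vocab) <= target_size:
                break
    return vocab


HISTORY = ["ABABABAB", "ABABCC", "CCABAB"]          # toy historical sequences
vocabulary = build_vocabulary(HISTORY, target_size=5)
reduced_length = total_length(HISTORY, vocabulary, max_len=4)
```

Under these assumptions, the 20 components of the toy history are represented by fewer than 20 building blocks once the vocabulary retains multi-component blocks, which mirrors the length reduction described above.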
In an embodiment, the one or more data-driven model(s) may be provided with the numerical representation of the two or more building blocks and the numerical representation of the relation between the two or more building blocks for generating one numerical representation by transforming the at least two numerical representations into one numerical representation according to one or more matrix operations associated with the one or more data-driven model(s). The one or more data-driven model(s) may be provided with the numerical representation of the structure of the target material and the numerical representation of the position of the two or more building blocks for generating one numerical representation by transforming the at least two numerical representations into one numerical representation according to one or more matrix operations associated with the one or more data-driven model(s). The one or more matrix operations may include applying a filter to the numerical representations. The filter may be obtained based on the numerical representations. The filter may be obtained by one or more matrix operation(s) of the numerical representations, in particular one or more matrix multiplication(s) of the numerical representations. Applying the filter may result in weighting a contribution of the two or more building blocks to the numerical representation of the two or more building blocks and the relation between the two or more building blocks. Hence, the representation data-driven model is enabled to attend to different parts of the structure according to their contribution to determining one or more target properties of the target material. This increases the accuracy for determining the target properties and thus contributes to determining target properties robustly and in a scalable manner.
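As a non-limiting illustration, a filter obtained by matrix multiplication of the numerical representations and then applied to them may be sketched as follows. The sketch assumes a scaled dot-product self-attention without learned projection matrices; this particular form, the row-wise softmax normalisation and all function names are assumptions for the example and are not stated in the disclosure.

```python
import math


def matmul(a, b):
    """Plain matrix multiplication of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_filter(x):
    """Filter obtained from the representations themselves: softmax(x x^T / sqrt(d))."""
    d = len(x[0])
    scores = matmul(x, transpose(x))
    return [softmax([s / math.sqrt(d) for s in row]) for row in scores]


def apply_filter(x):
    """Weight the contribution of every building block to each output row."""
    return matmul(attention_filter(x), x)


# Three building blocks, each represented by two numerical values (toy data).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = attention_filter(x)
weighted = apply_filter(x)
```

Each row of the filter sums to one, so every output row is a weighted mixture of the block representations; blocks whose representations align more strongly with a given block receive a larger weight.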
In an embodiment, the target property, in particular an indication of the target property, of the target material may be determined and/or provided for monitoring and/or controlling chemical and/or biological production. Additionally or alternatively, the target property, in particular an indication of the target property, of the target material may be determined and/or provided for producing and/or processing the material. In an embodiment, the numerical representation of the structure of the target material associated with the two or more building blocks may be obtained by providing an indication of a structure of the target material, identifying the two or more building blocks of the target material associated with the indication of the structure of the target material according to a vocabulary indicative of a plurality of building blocks associated with structures of materials, and mapping the identified two or more building blocks to a numerical representation of the two or more building blocks by one or more embedding layer(s), wherein the one or more embedding layer(s) may be configured to map data to numerical representations of the data. By doing so, a structure of the material can be represented by a structured and machine-processable representation. This enables commonly available structures of material to be processed by data-driven models. Furthermore, the structure of the material can be represented with a fixed amount of building blocks. This allows for a better control of input size to the data-driven model while the number of building blocks is smaller than the number of possible structures of materials. The reason for this is that repetitions in the structure of a material can be represented by reoccurring building blocks. Hence, structures of materials can be represented in an efficient format. Thus, resources for determining the target properties of the material can be saved.
In an embodiment, the numerical representation of the structure of the target material may be obtained by providing an indication of the structure of the target material, obtaining a structure of the target material from the indication of the structure of the target material, mapping the structure of the target material to the numerical representation of the structure of the target material. Mapping the structure of the target material to the numerical representation of the structure of the material may comprise identifying two or more building blocks associated with the structure of the target material according to a vocabulary indicative of a plurality of building blocks associated with structures of materials, and, mapping the identified two or more building blocks to a numerical representation of the two or more building blocks by one or more embedding layer(s). The one or more embedding layer(s) may be configured to map data to numerical representations of the data. Obtaining the structure of the material from the indication of the structure of the material, in particular the target material allows to determine properties of data related to the material. For example, users can provide a name such as a trivial name, IUPAC name or the like to determine properties of the materials. Thus, structures of the target material can be retrieved error-free e.g. from a database. Hence, efficiency of determining target properties of the target material is improved.
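As a non-limiting illustration, the chain from an indication of a structure (e.g. a name) to a numerical representation may be sketched as follows. The dictionary STRUCTURE_DB standing in for a database, the toy vocabulary, the greedy longest-match identification of building blocks and the randomly initialised embedding table are all assumptions made for this example.

```python
import random

# Hypothetical stand-in for a database mapping an indication to a structure.
STRUCTURE_DB = {"toy-polymer": "ABABCC"}

# Hypothetical vocabulary of building blocks and their indices.
VOCAB = ["A", "B", "C", "AB", "CC"]
BLOCK_TO_ID = {block: i for i, block in enumerate(VOCAB)}

# Embedding layer as a lookup table; values are random placeholders that a
# real model would learn during training.
EMBED_DIM = 4
_rng = random.Random(0)
EMBEDDING = [[_rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in VOCAB]


def obtain_structure(indication):
    """Retrieve the structure for a given indication (e.g. a name)."""
    return STRUCTURE_DB[indication]


def identify_blocks(structure):
    """Identify building blocks by greedy longest match against the vocabulary."""
    max_len = max(len(block) for block in VOCAB)
    ids, i = [], 0
    while i < len(structure):
        for n in range(min(max_len, len(structure) - i), 0, -1):
            piece = structure[i:i + n]
            if piece in BLOCK_TO_ID:
                ids.append(BLOCK_TO_ID[piece])
                i += n
                break
        else:
            raise ValueError(f"no building block matches at position {i}")
    return ids


def embed(ids):
    """Map block indices to their numerical representations."""
    return [EMBEDDING[i] for i in ids]


structure = obtain_structure("toy-polymer")
block_ids = identify_blocks(structure)   # "AB", "AB", "CC"
vectors = embed(block_ids)
```

In this sketch the repeated substructure "AB" is identified twice and mapped to the same vector, which illustrates how repetitions in a structure can be represented by reoccurring building blocks.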
In an embodiment, any one of the methods may further comprise reducing the numerical representation of the structure of the target material and/or the numerical representation of the position of the two or more building blocks by providing the numerical representation of the two or more building blocks and/or the numerical representation of the position of the two or more building blocks to one or more embedding layer(s). The numerical representation of the structure of the target material and/or the numerical representation of the position of the two or more building blocks may be obtained by mapping the structure of the target material and/or the position of the two or more building blocks to a numerical representation of the structure of the target material and/or the numerical representation of the position of the two or more building blocks according to a relation between a plurality of structures of materials and corresponding numerical representations and/or between a plurality of positions of building blocks and corresponding numerical representations. Mapping the structure of the target material to a numerical representation of the structure of the target material by one or more embedding layer(s) comprises mapping the identified two or more building blocks to a numerical representation of the two or more building blocks according to a relation between a plurality of building blocks and corresponding numerical representations and reducing the numerical representation of the two or more building blocks by the one or more embedding layer(s), in particular by providing the numerical representation of the two or more building block(s) to the one or more embedding layer(s).
Mapping the position of the two or more building blocks to a numerical representation of the position of the two or more building blocks by one or more embedding layer(s) may comprise mapping the position of the two or more building blocks to a numerical representation of the position of the two or more building blocks according to a relation between positions of the plurality of building blocks and corresponding numerical representations, and reducing the numerical representation of the position of the two or more building blocks by the one or more embedding layer(s), in particular by providing the numerical representation of the position of the two or more building blocks to the one or more embedding layer(s). The numerical representation of the structure of the target material may be associated with a number of numerical values equal or higher than a number of the plurality of building blocks. The number of numerical values associated with the reduced numerical representation of the structure of the target material and/or of the two or more building blocks may be lower than the number of numerical values associated with the numerical representation of the structure of the target material and/or the two or more building blocks, in particular before reducing. The number of numerical values associated with the numerical representation of the position of the two or more building blocks may be equal or higher than a number of the plurality of building blocks. The number of numerical values associated with the reduced numerical representation of the position of the two or more building blocks may be lower than the number of numerical values associated with the numerical representation of the position of the two or more building blocks, in particular before reducing. By doing so, a compressed representation of the structure of the material may be processed. This uses less computational resources and/or allows to process the numerical representation faster. 
Hence, this contributes to a resource-efficient determination of the one or more target properties associated with the target material.
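As a non-limiting illustration, the reduction of a numerical representation by an embedding layer may be sketched as follows. The sketch assumes that the unreduced representation is a one-hot vector with as many numerical values as there are building blocks in the vocabulary and that the embedding layer is a single weight matrix; both are assumptions for this example, as are the sizes and weight values.

```python
VOCAB_SIZE = 6     # assumed number of building blocks in the vocabulary
REDUCED_DIM = 2    # assumed size of the reduced representation

# Hypothetical embedding weights (VOCAB_SIZE rows, REDUCED_DIM columns).
WEIGHTS = [[0.1 * (row + 1), -0.1 * (row + 1)] for row in range(VOCAB_SIZE)]


def one_hot(index, size):
    """Unreduced representation: one numerical value per vocabulary entry."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def reduce_dimension(vector, weights):
    """Multiply the one-hot vector with the embedding matrix."""
    return [
        sum(v * row[j] for v, row in zip(vector, weights))
        for j in range(len(weights[0]))
    ]


unreduced = one_hot(3, VOCAB_SIZE)
reduced = reduce_dimension(unreduced, WEIGHTS)
```

Multiplying a one-hot vector with the weight matrix selects a single row of that matrix, so the reduced representation carries fewer numerical values than the unreduced one while still identifying the building block.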
In an embodiment, the two or more building blocks comprise three or more components, in particular a sequence of three or more components and/or a sequence of the two or more building blocks. Hence, the two or more building blocks may be associated with and/or may comprise three or more components. The target chemical compound may be a polymer. The building blocks may be building blocks of a polymer. The building blocks may be representations associated with two or more monomers of the target chemical compound, in particular the polymer. The building blocks may correspond to a structural unit of the polymer obtained by one or more chemical reaction(s) of the monomers associated with the polymer. The mass of the target chemical compound may be above 1000 g/mol. The target chemical compound may be an organic chemical compound. Polymers and larger molecules typically comprise at least partially repetitive structures. Hence, polymers and larger molecules are especially well suited for determining target properties from the structure of the polymer.
In an embodiment, any one of the methods may comprise providing a numerical representation of a relation between the two or more building blocks and generating a numerical representation of the two or more building blocks and the relation between the two or more building blocks from the numerical representation of the two or more building blocks and the numerical representation of the relation between the two or more building blocks by the one or more data-driven model(s), wherein mapping the numerical representation of the two or more building blocks to an indication of the target property comprises mapping the numerical representation of the two or more building blocks and the relation between the two or more building blocks to an indication of the target property. The relation between the two or more building blocks may be indicative of one or more chemical bond(s) between the two or more chemical compounds. Taking the relation between the two or more building blocks into account for determining target properties allows both the orientation and the identity of the building blocks to be attended to. This is especially important where structures are complex. Data-driven model(s) may have to select the parts of the input data to attend to for determining the properties of the material. Allowing the data-driven model to learn a common representation enables the data-driven model to weight the importance of both factors for determining properties of the material. In some examples, the type, i.e. the presence of a particular building block, or the orientation of at least one building block may be a stronger factor for determining properties of the material. Acidity of carboxylic acids may be judged mainly based on the type of building block, i.e. a carboxylic acid group at the end of a chain of alkylic carbon atoms. Said carboxylic acid groups are typically at the end of chains.
Hence, the orientation between the building blocks may be of less importance than the type of building blocks. On the other hand, where sequences of e.g. hydrogen bonds forming building blocks are to be analyzed, the orientation of the building blocks towards each other may be of high importance. Learning a common representation of the type of the building blocks and the orientation of the building blocks enables to determine properties of materials robustly for a plurality of different materials. Such a data-driven model may allow for accurate and scalable determination of properties. Ultimately, this allows to reduce resources and time for developing new materials and/or improving the quality of materials and resultant products.
In an embodiment, any one of the methods may comprise providing a numerical representation of a position of the two or more building blocks within a structure of the target material to the one or more data-driven model(s), and combining the numerical representation of the position of the two or more building blocks and the numerical representation of the two or more building blocks and optionally the relation between the two or more building blocks, wherein mapping the numerical representation of the two or more building blocks to an indication of the target property comprises mapping the numerical representation of the two or more building blocks and the position of the two or more building blocks and optionally the relation between the two or more building blocks to an indication of the target property.
In an embodiment, mapping the numerical representation of the two or more building blocks to an indication of the target property by the one or more data-driven model(s) may comprise processing the numerical representation of the two or more building blocks by the one or more data-driven model(s), and, providing a numerical representation of a position of the two or more building blocks, and, combining the numerical representation of the position of the two or more building blocks and the processed numerical representation of the two or more building blocks. Mapping the numerical representation of the two or more building blocks to an indication of the target property may comprise mapping the numerical representation of the two or more building blocks and the position of the two or more building blocks and optionally the relation between the two or more building blocks to an indication of the target property. Processing the numerical representation of the two or more building blocks includes modifying the numerical representation of the two or more building blocks, e.g. by one or more matrix operation(s).
Processing the numerical representation of the position of the two or more building blocks allows the position of the building blocks to be taken into account, in particular just before providing the output data. This focusses the attention of the data-driven model towards the relation between the orientation and the type of the building block just before providing the output data. Attention of the data-driven model may focus on the type of the building blocks during processing of the received input data. Especially complex structures are strongly determined by the orientation of the building blocks. Consequently, taking the position of the building blocks into account after processing the numerical representation of the structure of the material refocuses the attention of the data-driven model. Such processing by the data-driven model may allow for accurate and scalable determination of properties. Ultimately, this makes it possible to reduce resources and time for developing new materials and/or improving the quality of materials and resultant products.
In an embodiment, any one of the methods may comprise obtaining a structure of the target chemical compound from the indication of the structure of the target chemical compound. The structure of the target chemical compound may be indicative of the two or more building blocks. The two or more building blocks may be identified based on the obtained structure of the target chemical compound. Obtaining the structure of the target chemical compound may comprise retrieving the structure of the target chemical compound from a database. The indication of the structure of the target chemical compound may be suitable for retrieving the structure of the target chemical compound. The indication of the structure of the target chemical compound may be provided to the database for retrieving the structure of the target chemical compound. The database may be configured to provide the structure of chemical compounds in response to receiving indications of structures of chemical compounds. Obtaining the structure of the material from the indication of the structure of the material, in particular the target material allows to determine properties of data related to the material. For example, users can provide a name such as a trivial name, IUPAC name or the like to determine properties of the materials. Thus, structures of the target material can be retrieved error-free e.g. from a database. Hence, efficiency of determining target properties of the target material is improved.
In an embodiment, training the one or more data-driven model(s) may include generating a numerical representation of the structure of the target material and the position of the two or more building blocks within the structure of the material by the one or more data-driven model(s), mapping the numerical representation associated with the two or more building blocks and the position of the two or more building blocks within the structure of the material to an indication of the target property of the target material at least partially by the one or more data-driven models.
In an embodiment, training the one or more data-driven model(s) may further include merging the historical numerical representations of the structure of the material with a historical numerical representation of a relation between the two or more building blocks and/or merging the historical numerical representations of the structure of the material and the relation between the two or more building blocks with the historical numerical representations of the position of the two or more building blocks. Combining may refer to merging. Merging may include concatenating the at least two numerical representations. The one or more data-driven model(s) may be further trained based on historical numerical representations of a relation between the two or more building blocks.
In an embodiment, the historical numerical representations of the structure of the one or more material(s) may be obtained by providing historical indications of the structure of the one or more material(s), obtaining a structure of the one or more material(s) from the indication of the structure of the one or more material(s), mapping the structure of the one or more material(s) to the numerical representation of the structure of the one or more material(s). Mapping the structure of the one or more material(s) to the numerical representation of the structure of the one or more material(s) may comprise identifying two or more building blocks associated with the structure of the one or more material(s) according to a vocabulary indicative of a plurality of building blocks associated with structures of materials, and, mapping the identified two or more building blocks to a numerical representation of the two or more building blocks by one or more embedding layer(s). The one or more embedding layer(s) may be configured to map data to numerical representations of the data.
In an embodiment, the structure of the one or more material(s), in particular the target material, may be associated with a sequence of the two or more building blocks. In an embodiment, the one or more data-driven model(s) may comprise a representation data-driven model and a property data-driven model.
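As a non-limiting illustration, training on historical data with a representation data-driven model followed by a property data-driven model may be sketched as follows. The sketch makes strong simplifying assumptions that are not taken from the disclosure: the representation model is a fixed mean over hand-chosen block embeddings, the property model is a linear map trained by stochastic gradient descent on a squared error, and the historical sequences and property values in HISTORY are invented toy data.

```python
# Hypothetical fixed block embeddings (a real model would learn these).
EMBED = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "AB": [0.0, 0.0, 1.0]}
DIM = 3

# Invented historical data: (sequence of building blocks, measured property).
HISTORY = [
    (["A", "A"], 1.0),
    (["B", "B"], 0.0),
    (["AB", "A"], 0.8),
    (["AB", "B"], 0.3),
]


def represent(blocks):
    """Representation model (assumed): mean of the block embeddings."""
    vecs = [EMBED[b] for b in blocks]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def predict(weights, bias, rep):
    """Property model (assumed): linear map from representation to property."""
    return sum(w * r for w, r in zip(weights, rep)) + bias


def mean_squared_error(weights, bias, data):
    errors = [(predict(weights, bias, represent(x)) - y) ** 2 for x, y in data]
    return sum(errors) / len(errors)


def train(data, epochs=200, learning_rate=0.1):
    """Fit the property model to the historical data by gradient descent."""
    weights, bias = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for blocks, target in data:
            rep = represent(blocks)
            error = predict(weights, bias, rep) - target
            weights = [w - learning_rate * error * r
                       for w, r in zip(weights, rep)]
            bias -= learning_rate * error
    return weights, bias


initial_error = mean_squared_error([0.0] * DIM, 0.0, HISTORY)
trained_weights, trained_bias = train(HISTORY)
final_error = mean_squared_error(trained_weights, trained_bias, HISTORY)
```

The split into represent and predict mirrors the separation into a representation model and a property model; in practice both parts may be trained jointly.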
In an embodiment, providing the indication of the target property for monitoring and/or controlling the production and/or the processing of the target material may comprise using the indication of the target property for adapting one or more parameter(s) and/or one or more equipment item(s) for producing and/or processing the target material and/or comparing an obtained and/or measured property of the target material during and/or after producing and/or processing with the target property.
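As a non-limiting illustration, using the target property for monitoring and controlling may be sketched as a comparison of a measured property with the target property followed by a proportional adaptation of one process parameter. The tolerance, the gain, the direction of the correction and the example quantities (a viscosity-like property and a temperature-like parameter) are assumptions made for this example.

```python
def within_tolerance(measured, target, tolerance):
    """Monitoring: is the measured property close enough to the target property?"""
    return abs(measured - target) <= tolerance


def adapt_parameter(current, measured, target, gain):
    """Controlling: shift a process parameter in proportion to the deviation.

    Assumes that increasing the parameter lowers the property; the sign of
    the correction would have to be chosen per process.
    """
    return current - gain * (measured - target)


target_property = 10.0      # e.g. a target viscosity (arbitrary units)
measured_property = 12.0    # value obtained during production
temperature = 80.0          # current value of the adapted parameter

needs_action = not within_tolerance(measured_property, target_property, 0.5)
if needs_action:
    temperature = adapt_parameter(temperature, measured_property,
                                  target_property, gain=0.5)
```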
In an embodiment, historical indication may refer to a, in particular previously, determined and/or known structure of the material, e.g. by using a model configured to determine an indication of a structure of a material and/or by performing a measurement.
A processor may be a processor of any suitable type, and is preferably a processor configured for parallel processing of at least a hundred or at least a thousand threads, e.g. a graphical processing unit (GPU). For instance, the processor comprises at least a hundred or at least a thousand parallel processing cores. In particular, the processor may comprise at least one (preferably at least a thousand) compute unified device architecture (CUDA) core(s), which may allow for using a graphical processing unit as the processor, which may increase computational efficiency. For instance, the processor may comprise at least one (e.g. at least a hundred) streaming multiprocessor cores, which may allow for increasing the data throughput. As a further example, the processor may comprise one or more (e.g. at least a hundred) tensor core(s) and/or (e.g. at least a hundred) tensor processing units (TPUs). A tensor core may be specifically adapted to perform matrix operations and may accelerate large matrix operations. A tensor core may be configured to perform mixed-precision matrix multiply and accumulate calculations in a single operation. For instance, a tensor core may perform mixed-precision floating-point matrix arithmetic, specifically utilizing FP16 (half-precision) inputs to produce either full-precision (FP32) or half-precision (FP16) outputs. In the case of FP16 output, a tensor core may provide a performance boost by storing the intermediate accumulation results in FP32 format, thereby maintaining the precision necessary for accurate results. A tensor processing unit may be an application-specific integrated circuit (ASIC). It may comprise a matrix multiplication unit (MXU), which may be specifically adapted or configured for dense linear algebra operations. TPUs may be configured to handle large-scale matrix operations efficiently, which may provide high computational throughput for AI tasks.
A TPU may be equipped with on-chip high-bandwidth memory (HBM), which may enhance the capability for the use of larger models and batch sizes. TPUs may be connected in groups called Pods, which may scale up workloads with minimal code changes. An MXU may be specifically configured for performing matrix multiplications. A TPU may comprise a tensor core.
For example, a processor may comprise several thousand tensor cores, each capable of performing 64 floating point FMA (Fused Multiply-Add) operations per clock cycle or (e.g. at least several hundred) tensor processing units (TPUs) being specifically configured for accelerating machine learning (ML) workloads, particularly for cloud-based applications. Additionally, Field-Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) may provide flexibility and performance benefits for specific AI tasks. With these capabilities, such a GPU may allow for hundreds of TFLOPs (Tera Floating-Point Operations per Second) of performance in mixed-precision computations. Furthermore, a tensor core may support a variety of numerical formats, including IEEE standard half-precision, single-precision, and double-precision floating-point formats, as well as a range of integer formats.
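As a non-limiting illustration, the benefit of storing intermediate accumulation results in FP32 while using FP16 inputs may be emulated in software as follows. The sketch rounds values through the half-precision and single-precision formats of the Python struct module; it only emulates the rounding behaviour and is not a model of any particular tensor core. The function names and the test vectors are assumptions for this example.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]


def to_fp32(value):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def dot(a, b, round_accumulator):
    """Dot product with FP16 inputs and a configurable accumulator precision."""
    accumulator = 0.0
    for x, y in zip(a, b):
        product = to_fp16(x) * to_fp16(y)          # inputs held in FP16
        accumulator = round_accumulator(accumulator + product)
    return accumulator


ones = [1.0] * 4096
fp16_accumulated = dot(ones, ones, to_fp16)  # accumulator rounded to FP16
fp32_accumulated = dot(ones, ones, to_fp32)  # accumulator rounded to FP32
```

With a half-precision accumulator the running sum stops growing once it reaches 2048, because adding 1 no longer changes the nearest representable value, whereas the single-precision accumulator reaches the exact result of 4096.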
A processor may be a central processing units (CPU) configured with an advanced architecture, such as Intel's Xeon Scalable processors or AMD's EPYC series. A CPU may be configured for sequential processing and general-purpose computing. These CPUs may incorporate vector instruction sets, such as AVX-512, to accelerate mathematical computations that may e.g. enhance Al model training and inference. Furthermore, CPUs may integrate Al accelerators i.e. a CPU may be specifically configured for deep learning workloads.
The processor may be coupled to memory having a memory bandwidth of at least a hundred gigabytes per second, which may allow efficient handling of extensive data sets and may allow faster reading, processing, and writing compared to a general-purpose processor such as a computational processing unit.
The memory may be a high-capacity memory configured to manage the data-intensive nature of AI applications, providing necessary bandwidth and storage capacity for complex datasets. The memory may for instance be DDR4, DDR5, High Bandwidth Memory (HBM) and/or GDDR6X memory, which may improve data transfer rates and reduce latency. Such memory may enhance e.g. modeling and the processing of real-time sensor data for monitoring and control. Further, the memory may be operated with memory optimization techniques, such as caching and prefetching, which may enhance the execution speed of AI algorithms. Non-volatile Memory (NVM) technologies, including NAND Flash and 3D XPoint, may provide persistent storage solutions with high-speed access, which may enhance rapid data storage and retrieval for AI applications.
In an embodiment, the target property may be a desired property. The target property may be a property targeted to be obtained and/or measured associated with the target material. The target property may be specified by an application associated with the processed and/or produced target material.
In an embodiment, position may comprise a relative position and/or an absolute position.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
In the following, the present disclosure is further described with reference to the enclosed figures. The same reference numbers in the drawings and this disclosure are intended to refer to the same or like building blocks, components, and/or parts.
FIG. 1 illustrates an embodiment of materials and corresponding properties of the material.
FIG. 2 illustrates an embodiment of determining a property of a material.
FIG. 3 illustrates an embodiment of determining, in particular measuring, a property of a material.
FIG. 4 illustrates embodiments of an indication of a structure of a material.
FIG. 5 illustrates an embodiment of obtaining a vocabulary.
FIG. 6 illustrates an embodiment of an embedding layer.
FIG. 7A illustrates an embodiment of a numerical representation of the indication of the structure of the material, in particular the target material.
FIG. 7B illustrates an embodiment of a numerical representation of the indication of the structure of the material, in particular the target material.
FIG. 8A illustrates an embodiment of a representation data-driven model 8-148.
FIG. 8B illustrates an embodiment of a representation data-driven model 8-148.
FIG. 8C illustrates an embodiment of a representation data-driven model 8-148.
FIG. 9A illustrates an embodiment of training and/or deploying the representation data-driven model 902.
FIG. 9B illustrates an embodiment of training and/or deploying the representation data-driven model 916.
FIG. 10 illustrates an embodiment of a vocabulary length and an accuracy of determining properties of materials according to the methods as described herein.
DETAILED DESCRIPTION
The following embodiments are mere examples for implementing the method, the system or application device disclosed herein and shall not be considered limiting.
FIG. 1 illustrates an embodiment of materials and corresponding properties of the material.
Materials comprise one or more building blocks. A material may be a biological material such as a protein. A biological material may comprise a plurality of amino acids. The relative orientation of amino acids within one or more sequences of amino acids determines an interaction between the amino acids. For example, including a plurality of amino acids with an OH-group may allow for establishment of hydrogen bonds across parts of the biological material, in particular parts of the sequence of amino acids. Said hydrogen bonds may strongly determine the spatial orientation of the sequence. A common example may be the spatial orientation of DNA. Due to the interaction of the building blocks of the DNA, a double helical structure may be formed by the DNA strands. The spatial orientation of the biological material may determine the properties of the biological material, e.g. whether the biological material, in particular the protein, may fit with another biological material or not. Fitting with another biological material may allow for a structural modification, in particular a well-defined structural modification. Hence, the spatial orientation of the protein may determine its capability to interact with other proteins. Consequently, the structure, in particular the sequence of amino acids, of the biological material may determine the interaction within a biological system.
Chemical compounds may be part of a value chain towards a chemical product. Hence, chemical compounds may be reacted to form the chemical product, e.g. via one or more intermediate products. The reactions that a chemical compound may undergo may be determined by its chemical structure. Chemical compounds may be associated with rings, adjacent chains or the like. The reason may be that chemical compounds may be defined via the atoms comprised by the chemical compound and the bonding between the atoms. Carbon atoms typically form four bonds. Hence, chemical compounds may be non-linear. For example, caffeine as shown in FIG. 1 may comprise two rings with additional functional groups. Such structures may be represented e.g. via SMILES strings. SMILES strings may be an unambiguous and sequential representation of a chemical structure. In another example, a polymer may be represented by polymer building blocks, in particular monomers. Hence, the representation of the chemical compound may be chosen based on the type of chemical structure. Said chemical structure may determine the exact properties of the chemical compound. Even a small change, e.g. exchanging one atom for another in a molecule comprising a hundred or more atoms, or changing the orientation of one subgroup of a molecule, may result in the unchanged chemical compound participating in one chemical reaction and the changed chemical compound participating in a different chemical reaction. Hence, the chemical structure, in particular the spatial orientation of the atoms and the bonding between the atoms, may determine the properties, especially the reactivity, of the chemical compound.
Typically, materials with predefined properties are desired. Said properties may be determined via analysis of the respective material. This testing requires considerable resources for generating the material and for performing the analysis of the material. Such resources can be saved if a property of the material can be determined reliably from the structure of the material. Consequently, it is desired to determine properties of materials based on a robust relation between the structure and the properties of materials.
FIG. 2 illustrates an embodiment of determining a property of a material.
An indication of a structure of a material may be provided 202. Examples for indication of a structure of a material may include SMILES strings, sequences of amino acids or other biological building blocks, polymer sequences or the like as described in the context of FIG. 4. The indication of the structure may be provided e.g. via a user interface. The indication of the structure of a material may further comprise a denotation associated with the material and/or a composition of the material, optionally reaction conditions for obtaining the material from one or more chemical reaction(s) of the compounds specified by the composition of the material. Hence, the indication may be suitable for retrieving a structure of the material and/or may comprise a structure of the material. In an embodiment, the structure of the material may be retrieved based on the indication suitable for retrieving the structure of the material. For example, a query obtained from the indication of the structure of the material, in particular the target material may be provided to a database. The database may be configured to provide the structure of the material in response to receiving the query. Said query may be a structured query, e.g. for a SQL database, or may comprise a numerical representation of the indication of the structure of the materials, e.g. for a similarity search.
Two or more building blocks associated with the indication of the structure of the material, in particular the target material, may be identified according to a vocabulary indicative of a plurality of building blocks associated with structures of materials. At least one building block may comprise two or more components. One subelement may comprise, in particular consist of, one building block associated with the structure of the material. One building block may comprise one or more components. For example, an amino acid or a functional group may be a subelement. Hence, the building block may comprise one or more amino acid(s) and/or one or more functional group(s). The vocabulary may specify and/or may be indicative of a plurality of building blocks. Hence, the vocabulary may be suitable for and/or configured to identify one or more element(s) associated with the indication of the structure of the material, in particular the target material. In particular, the vocabulary may be suitable for identifying one or more building blocks comprised by the structure of the material. A tokenizer and/or a tokenizing engine 302 may be configured to identify the two or more building blocks. The vocabulary may be obtained as described in the context of FIG. 5.
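The identification of building blocks according to a vocabulary may be sketched as follows. This is a minimal illustration, assuming a greedy longest-match strategy and a hypothetical toy vocabulary of amino-acid fragments; the disclosure does not prescribe this matching strategy, and the function and variable names are illustrative only.

```python
def tokenize(sequence, vocabulary):
    """Split a sequence into building blocks using greedy longest match.

    `vocabulary` is a hypothetical set of building-block strings.
    """
    max_len = max(len(block) for block in vocabulary)
    blocks = []
    i = 0
    while i < len(sequence):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(max_len, len(sequence) - i), 0, -1):
            candidate = sequence[i:i + length]
            if candidate in vocabulary:
                blocks.append(candidate)
                i += length
                break
        else:
            raise ValueError(f"no building block matches at position {i}")
    return blocks

# Toy vocabulary: single amino acids plus two multi-component building blocks.
toy_vocabulary = {"M", "K", "T", "A", "Y", "MK", "TAY"}
blocks = tokenize("MKTAYA", toy_vocabulary)
print(blocks)
```

Here the six-letter sequence is covered by three building blocks, two of which comprise more than one component.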
In an embodiment, identifying the two or more building blocks may comprise mapping the indication of the structure of the material, in particular the target material, to a structure of the material, preferably a sequential structure of the material. For example, the material may comprise one or more ring(s) and/or two or more strand(s) and/or two or more chain(s). A sequential structure may be chosen according to the type of the material. For example, where the material may be a chemical compound, the chosen format of the structure of the material may be SMILES strings. By doing so, non-linear structures of materials may be linearized. This may allow for processing of the structure of the material by a tokenizing engine 302, a representation engine 308 and/or a property determination engine 306. Consequently, parts of the structure of the material may be weighted according to their relevance to determining the numerical representation and/or the property of the material. Weighting allows the attention of the data-driven models to be focused on decisive parts of the structure. For example, in a long molecule with a carboxylic acid group, the carboxylic acid group may be decisive for the properties of the molecule. Similarly, in a chain of amino acids, the presence of thiol groups may be decisive for building connections across the chains, i.e. disulfide bridges. Therefore, this enables a realistic and thus robust determination of numerical representations of structures of the material and/or properties of the material. Ultimately, this improves tailoring the properties of materials to a variety of application scenarios.
The structure of the material may comprise, in particular consist of, the two or more building blocks. The two or more building blocks may be mapped to a numerical representation of the two or more building blocks 204. Further, the two or more building blocks may be mapped to a numerical representation of a relation between the two or more building blocks 204. The numerical representation of the two or more building blocks and/or of the relation between the two or more building blocks may be a tensor, in particular a two-dimensional tensor, i.e. a matrix. One building block may be represented by a vector. The numerical representation of the two or more building blocks and/or of the relation between the two or more building blocks may be obtained by providing an indication of the two or more building blocks to one or more embedding layer(s). The indication of the two or more building blocks may be a one-hot vector per building block. The vectors representing the building blocks may be obtained one after another. Concatenating the two or more vectors associated with the two or more building blocks may result in the numerical representation of the two or more building blocks. The embedding layer may be configured to map an indication of one or more building blocks to a numerical representation of, in particular a two-dimensional tensor representing, the two or more building blocks associated with the indication of the structure of the material, in particular the target material. The one or more embedding layers may be described in more detail in the context of FIG. 6. Hence, two or more numerical representations may be obtained 206. The relation between the two or more building blocks may be a relative position, i.e. may be indicative of a position of the two or more building blocks in relation to one another.
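The mapping of one-hot vectors to a matrix of building-block vectors, together with a second matrix for the positions, may be sketched as follows. The embedding values, the vocabulary and all names are hypothetical, and indexing positions by their order in the sequence is an assumption made for brevity; the relation described above may be encoded differently.

```python
def one_hot(index, size):
    """Return a vector with a single non-zero entry at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def embed(one_hot_vector, embedding_matrix):
    """Vector-matrix product; for a one-hot input this selects one row."""
    rows = len(embedding_matrix)
    cols = len(embedding_matrix[0])
    return [
        sum(one_hot_vector[r] * embedding_matrix[r][c] for r in range(rows))
        for c in range(cols)
    ]

# Hypothetical vocabulary and (normally trained) embedding values.
vocabulary = ["MK", "TAY", "A"]
block_embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
position_embeddings = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]

blocks = ["TAY", "A"]
# One vector per building block, stacked into a matrix.
block_matrix = [
    embed(one_hot(vocabulary.index(b), len(vocabulary)), block_embeddings)
    for b in blocks
]
# Second representation: one vector per position in the sequence.
position_matrix = [position_embeddings[p] for p in range(len(blocks))]
print(block_matrix, position_matrix)
```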
The numerical representations may be mapped to numerical representations of a predefined size. This may include adding numerical values, i.e. matrix entries, to result in the numerical representations of the predefined size. This may be referred to as padding. Additionally or alternatively, this may include eliminating one or more numerical values of the numerical representations to result in the numerical representations of the predefined size. This allows for using data-driven models of a predefined size. Data-driven models are built prior to their usage. They have defined inputs and outputs. Therefore, inputs of varying lengths need to be adjusted to the predefined input size required by the models.
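Padding and eliminating values to reach a predefined size may be sketched as follows. The function name, the zero padding value and the choice to drop trailing rows are assumptions for illustration.

```python
def to_fixed_size(matrix, target_rows, pad_value=0.0):
    """Pad with rows of `pad_value` or drop trailing rows to reach `target_rows`."""
    width = len(matrix[0]) if matrix else 0
    trimmed = matrix[:target_rows]
    padding = [[pad_value] * width for _ in range(target_rows - len(trimmed))]
    return trimmed + padding

padded = to_fixed_size([[1.0, 2.0]], 3)
truncated = to_fixed_size([[1.0], [2.0], [3.0]], 2)
print(padded, truncated)
```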
The numerical representations may be modified 210. This may include generating one numerical representation from the two numerical representations by applying a filter to the numerical representations. The filter may be obtained based on the numerical representations. This may be known as self-attention, as the filter may be obtained based on the input data. The filter may define one or more matrix operations for combining the numerical representations into one numerical representation and weighting a contribution of the two or more building blocks. Self-attention may be further described in the context of FIG. 8A. The numerical representations may be provided to a representation data-driven model, e.g. as described in the context of FIG. 3 and FIG. 8A - FIG. 8C. The representation data-driven model may be configured to modify the numerical representations to obtain one numerical representation. The representation data-driven model may be part of a representation engine 308, in particular representation generating engine 304.
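A filter derived from the input itself may be sketched with a simplified scaled dot-product self-attention. This is an assumption about one common form of self-attention, not necessarily the form used in the embodiments: the two representations are summed element-wise, and no trained query, key or value projections are applied. All names are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(block_matrix, position_matrix):
    """Combine two representations and weight building blocks by similarity."""
    # Combine the two numerical representations into one (element-wise sum).
    x = [
        [b + p for b, p in zip(block_row, pos_row)]
        for block_row, pos_row in zip(block_matrix, position_matrix)
    ]
    dim = len(x[0])
    # The filter (attention weights) is computed from the input itself.
    weights = [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in x])
        for q in x
    ]
    # Each output row is a weighted mix of all building-block vectors.
    output = [
        [sum(w * row[c] for w, row in zip(weight_row, x)) for c in range(dim)]
        for weight_row in weights
    ]
    return output, weights

output, weights = self_attention(
    [[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]
)
print(weights)
```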
In an embodiment, the two numerical representations may each be split into at least two numerical representations, i.e. into at least four numerical representations in total. The so-obtained numerical representations may be modified separately as described above. This may speed up computing of the matrix operations. After modifying the at least four numerical representations, one numerical representation may be obtained by applying the filter. This may be known as a multi-head operation and is described in further detail in the context of FIG. 8A - FIG. 8C.
Further processing steps of the numerical representation obtained by modification may be described in the context of FIG. 8A - FIG. 8C.
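Splitting a numerical representation into several smaller ones and recombining them afterwards may be sketched as follows. Splitting along the columns is an assumption, the function names are hypothetical, and the separate modification of each part is omitted here.

```python
def split_heads(matrix, num_heads):
    """Split a matrix column-wise into `num_heads` equally wide matrices."""
    dim = len(matrix[0])
    if dim % num_heads != 0:
        raise ValueError("number of columns must be divisible by num_heads")
    head_dim = dim // num_heads
    return [
        [row[h * head_dim:(h + 1) * head_dim] for row in matrix]
        for h in range(num_heads)
    ]

def merge_heads(heads):
    """Concatenate the per-head matrices column-wise into one matrix."""
    return [
        [value for head in heads for value in head[r]]
        for r in range(len(heads[0]))
    ]

representation = [[1, 2, 3, 4], [5, 6, 7, 8]]
heads = split_heads(representation, 2)
# Each head could now be modified independently, e.g. in parallel.
merged = merge_heads(heads)
print(heads, merged)
```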
The obtained numerical representation may be mapped to a numerical representation of an indication of a property of the material 212. This may comprise providing the obtained numerical representation to a property data-driven model. The property data-driven model may be configured to provide properties of materials in response to receiving numerical representations of the structure of materials. For example, the property data-driven model may be a classification model. The property data-driven model may comprise one or more layer(s). The one or more layer(s) may be configured to map numerical representations of structures of materials to properties of said materials. The property data-driven model may be trained together with the representation data-driven model or after training of the representation data-driven model. In an embodiment, the representation data-driven model may be trained prior to retraining the representation data-driven model together with training the property data-driven model.
The numerical representation of the indication of the property of the material may be mapped to the indication of the property of the material 214. This may be a reverse process to embedding the indication of the structure of the material, in particular the target material, i.e. mapping the indications of the structure of the material, in particular the target material to a numerical representation of the indication of the structure of the material, in particular the target material. This may be referred to as decoding. One or more decoding layer(s) may be configured to map the numerical representation of the indication of the property of the material to the indication of the property of the material. Hence, mapping the numerical representation of the indication of the property of the material to the indication of the property of the material may comprise providing the numerical representation of the indication of the property of the material to the one or more decoding layer(s). An example of the one or more decoding layer(s) may be described in the context of FIG. 6. The indication of the one or more properties of the materials may be provided. The indication may be suitable for determining the one or more properties of the material and/or may be indicative of the one or more properties of the material. Hence, providing the indication may comprise determining the one or more properties of the material from the indication. In an embodiment, the indication of the one or more properties of the material may be numerical values associated with the one or more properties of the material. In another example, the indication of the one or more properties may be obtained from a classification model. The classification model may output a numerical value indicating a classification of the material, in particular a classification of the one or more properties of the material. 
The classification provided by the classification model may be mapped to one or more properties of the material.
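Mapping the output of a classification model to a property may be sketched as follows. The class scores and the property labels are hypothetical, and selecting the class with the highest score is an assumption about the decoding step.

```python
def decode_property(class_scores, property_by_class):
    """Return the property label of the class with the highest score."""
    best_class = max(range(len(class_scores)), key=class_scores.__getitem__)
    return property_by_class[best_class]

# Hypothetical mapping from class index to a material property.
property_by_class = {0: "not water-soluble", 1: "water-soluble"}
predicted = decode_property([0.2, 0.8], property_by_class)
print(predicted)
```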
FIG. 3 illustrates an embodiment of determining, in particular measuring, a property of a material.
As described in the context of FIG. 1, the structure of the material may determine the properties of the material. The indication of the structure of the material, in particular the target material, may be provided as described in the context of FIG. 2.
The indication of the structure of the material, in particular the target material, in particular the structure of the material, may be provided to a representation engine 308. The representation engine 308 may comprise a tokenizing engine 302 and a representation generating engine 304. The representation engine 308 may be configured to generate a numerical representation of the indication of the structure of the material, in particular the target material, in particular the structure of the material. The numerical representation of the indication of the structure of the material, in particular the target material, in particular the structure of the material, may be indicative of two or more building blocks associated with the indication of the structure of the material, in particular the target material, in particular the structure of the material. Further, the numerical representation may be indicative of a relation between the two or more building blocks. The numerical representation may have a predefined size. The numerical representation may be obtained as described within the context of FIG. 2. From the indication of the structure of the material, in particular the target material, two or more building blocks may be identified, e.g. as described within the context of FIG. 2. The indication of the structure of the material, in particular the target material, may be split into two or more building blocks, i.e. the indication of the structure of the material, in particular the target material, may be tokenized. A vocabulary may specify a plurality of building blocks associated with structures of materials, in particular historical structures of materials. Historical structures of materials may refer to previously identified structures of materials. The indication of the structure of the material, in particular the target material, may be tokenized by the tokenizing engine 302. 
The tokenizing engine 302 may be configured to identify two or more building blocks associated with data provided to the tokenizing engine 302, in particular of structures of materials. The two or more building blocks associated with the indication of the structure of the material, in particular the target material, in particular the sequence of the two or more identified building blocks may be provided to the representation generating engine 304. The representation generating engine 304 may be configured to generate the numerical representation of the indication of the structure of the material, in particular the target material, in particular the structure of the material, from the two or more building blocks identified by the tokenizing engine 302, e.g. as described within the context of FIG. 2. The representation generating engine 304 may provide the numerical representation of the indication of the structure of the material, in particular the target material, in particular numerical representation of the structure of the material, to a property determination engine 306. The property determination engine 306 may be configured to map the numerical representation of the indication of the structure of the material, in particular the target material to a numerical representation of an indication of a property of the material, e.g. as described in the context of FIG. 2.
FIG. 4 illustrates embodiments of an indication of a structure of a material.
The material may be a biological and/or a chemical material. Biological materials may be characterized by a biological activity associated with the biological material. Further, biological materials may comprise a sequence of amino acids. Hence, biological materials may comprise macromolecules, e.g. of more than 1000 Da and/or a sequence of at least 100 amino acids. An example of an indication of a structure of a biological material may be a sequence of biological building blocks. Biological building blocks may include one or more amino acid(s), one or more phosphate group(s), one or more nucleotide(s), one or more nucleobase(s) or the like. In FIG. 4 an example of a sequence of amino acids is depicted. Said sequence may be tokenized. Tokenizing the sequence may split the sequence into a plurality of building blocks. Depending on a vocabulary associated with the structure of biological materials, different tokenizations may be available. The vocabulary may specify a plurality of building blocks occurring in biological sequences.
Chemical materials may be materials undergoing chemical reactions and/or being obtained by a chemical reaction. A chemical material may comprise a chemical compound. Chemical compounds comprise a plurality of atoms. Chemical compounds may be characterized by the type of the plurality of atoms of the chemical compound and one or more bonds between the plurality of atoms. Chemical material may be in particular a polymer. Polymers may be obtained by one or more chemical reaction(s) of a plurality of monomers. Polymers may be characterized by the type of monomers and the reaction conditions associated with obtaining the polymer. Analogous to the tokenization of the indication of a structure of a biological material 402, the indication of a structure of a polymer 414 and the indication of a structure of a chemical material 412 can be tokenized differently according to two or more vocabularies associated with the structure of chemical materials, in particular polymers.
FIG. 5 illustrates an embodiment of obtaining a vocabulary.
Historical indications of structures of materials may be provided 502. Historical indications may refer to previously determined structures of materials, e.g. by using models and/or by performing measurements.
An initial set of building blocks forming at least a part of the historical indications by one or more combinations of the plurality of building blocks may be determined. The initial set may be determined randomly. The initial set may comprise a number of building blocks higher than a target number of building blocks associated with a target vocabulary. The target of obtaining the vocabulary may be to reduce the number of building blocks of the vocabulary, i.e. to decrease the vocabulary size. Large vocabularies may require more computing resources and/or time. Hence, decreasing the vocabulary size may reduce the resource consumption of determining properties of materials.
The target number of building blocks for forming at least a part of the historical indications by one or more combinations of the plurality of building blocks may be provided 506.
A determination score per building block of the set of building blocks may be determined 508. The determination scores may be associated with generating the building blocks of the set of building blocks. For example, the determination scores may be determined by a unigram model. The unigram model may assume that the determination score of the building block may be calculated based on a number of occurrences of the building block within the historical indications. Other models for determining said determination scores may be available. The determination score may be indicative of a probability that a building block may occur.
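A unigram-style determination score based on occurrence counts may be sketched as follows. Using the relative frequency as the score is an assumption consistent with the description above; the corpus and all names are hypothetical.

```python
from collections import Counter

def determination_scores(tokenized_corpus):
    """Return the relative frequency of each building block in the corpus."""
    counts = Counter(
        block for sequence in tokenized_corpus for block in sequence
    )
    total = sum(counts.values())
    return {block: count / total for block, count in counts.items()}

# Hypothetical historical indications, already split into building blocks.
corpus = [["MK", "A"], ["A", "A"]]
scores = determination_scores(corpus)
print(scores)
```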
A loss score per building block of the set of building blocks may be determined 510. The loss scores may be associated with modifying the determination scores by removing the building blocks from the set of building blocks. Hence the loss scores may be indicative of a probability to modify the determination scores by removing the building blocks associated with the loss and/or determination scores from the set of building blocks. For example, the loss scores may be calculated according to the following equation:
X may be the input sequence of the two or more building blocks, D may be a corpus of building blocks associated with the historical indications.
The set of building blocks may be filtered according to the loss scores associated with the set of building blocks 512. Hence, a part of the building blocks may be removed from the set of building blocks. The removed building blocks may be associated with loss scores within a predefined removal range. The building blocks remaining in the set after the removal may be associated with loss scores within a predefined target range. Hence, a predefined number of highest loss scores may be selected. It may be determined whether the number of building blocks within the filtered set is equal to or smaller than the target number of building blocks 514. Steps 508 - 514 may be repeated until the number of building blocks within the filtered set is equal to or smaller than the target number.
The so-obtained set of building blocks may be the vocabulary for identifying building blocks associated with the indication of the structure of the material, in particular the target material, i.e. for tokenizing the indication of the structure of the material, in particular the target material, preferably the structure of the material.
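The repeated scoring and filtering may be sketched as a loop that keeps the building blocks with the highest loss scores until the target number is reached. The score and loss functions are passed in as hypothetical callables, because the exact loss formula is not reproduced here; the toy loss below is a placeholder, and the `keep_fraction` parameter is likewise an assumption.

```python
def prune_vocabulary(blocks, target_size, score_fn, loss_fn, keep_fraction=0.8):
    """Repeatedly drop the building blocks with the lowest loss scores."""
    current = set(blocks)
    while len(current) > target_size:
        scores = score_fn(current)                       # cf. step 508
        losses = {b: loss_fn(b, scores) for b in current}  # cf. step 510
        # Keep a fraction per round, but never fewer than the target number.
        n_keep = max(target_size, int(len(current) * keep_fraction))
        n_keep = min(n_keep, len(current) - 1)  # guarantee progress
        ranked = sorted(current, key=lambda b: (-losses[b], b))
        current = set(ranked[:n_keep])                   # cf. step 512
    return current

# Hypothetical occurrence counts of building blocks in historical indications.
toy_counts = {"A": 50, "K": 30, "MK": 12, "TAY": 5, "KT": 2, "YA": 1}

def toy_scores(current):
    total = sum(toy_counts[b] for b in current)
    return {b: toy_counts[b] / total for b in current}

def toy_loss(block, scores):
    # Placeholder assumption: removing a frequent block changes scores most.
    return scores[block]

vocabulary = prune_vocabulary(list(toy_counts), 3, toy_scores, toy_loss)
print(sorted(vocabulary))
```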
FIG. 6 illustrates an embodiment of an embedding layer.
An input embedding may be obtained by training, for example, an embedding model such as a continuous bag of words (CBOW) model or a skip-gram model. The embedding layer may be configured to generate the numerical representation of the indication of the structure of the material, in particular the target material. Generating the numerical representation of the indication of the structure of the material, in particular the target material, may refer to embedding the indication of the structure of the material, in particular the target material. Embedding the indication of the structure of the material, in particular the target material, may result in a numerical representation associated with the indication of the structure of the material, in particular the target material. The indication of the structure of the material, in particular the target material, may comprise one or more building blocks. The one or more building blocks may be represented by the input vector 606. In particular, the numerical representation of the indication of the structure of the material, in particular the target material 614, and/or the input vector 606 may be machine-readable and/or processable by a processor. For this purpose, the numerical representation of the indication of the structure of the material, in particular the target material 614, and/or the input vector 606 may be a tensor, in particular a first-rank tensor per building block. Specifically, the input vector 606 may be a one-hot vector per building block or a matrix comprising one one-hot vector per building block. A one-hot vector may be a vector with one entry unequal to zero. Examples for one-hot vectors may be 606 and 618. The entry unequal to zero in the one-hot vector may be indicative of the building block. For example, a lookup table may define the relation between the position of the entry unequal to zero and the building block indicated by the one-hot vector. 
The lookup table may specify a plurality of different building blocks, preferably the building blocks comprised in the vocabulary. The number of different building blocks may be equal to the number of entries in the one-hot vector. The number of different building blocks may be referred to as vocabulary size. In an example, the building blocks may be represented by and/or may be building blocks of the material. A sequence associated with the indication of the structure of the material, in particular the target material, may be represented by a plurality of building blocks. A building block may represent one or more amino acids, one or more nucleotides, one or more monomers, one or more functional groups, one or more nucleobases or the like. Ultimately, this tokenization of building blocks reduces sequence lengths. For example, where a protein comprises 200 amino acids, using a suitable tokenization and hence vocabulary may allow the sequence associated with said protein to be reduced to a length of 20 building blocks. Consequently, it may be easier to determine a relation between the 20 building blocks and attend to the relevant building blocks as compared to a sequence of 200 building blocks. Hence, the numerical representations of the indication of the structure of the material, in particular the target material 614, may represent one or more building blocks accurately and lead to accurate results based on processing the numerical representations of the indication of the structure of the material, in particular the target material 614.
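The relation between a lookup table and one-hot vectors may be sketched as follows. The lookup table contents and the function names are hypothetical; the length of each vector equals the vocabulary size.

```python
def one_hot_encode(block, lookup_table):
    """Return a one-hot vector whose non-zero position identifies `block`."""
    vector = [0] * len(lookup_table)
    vector[lookup_table.index(block)] = 1
    return vector

def one_hot_decode(vector, lookup_table):
    """Return the building block indicated by the non-zero entry."""
    return lookup_table[vector.index(1)]

# Hypothetical lookup table; its length is the vocabulary size.
lookup_table = ["MK", "TAY", "A"]
encoded = [one_hot_encode(block, lookup_table) for block in ["TAY", "A"]]
decoded = [one_hot_decode(vector, lookup_table) for vector in encoded]
print(encoded, decoded)
```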
For transforming the input vector 606 into the numerical representation of the indication of the structure of the material, in particular the target material 614, the embedding layer may comprise a number of neurons equal to the number of entries in the numerical representation of the indication of the structure of the material, in particular the target material 614. Based on the numerical representations of the indication of the structure of the material, in particular the target material 614, the output layer may generate the output vector 616. The output vector may be a vector and/or may indicate the one or more building blocks of the indication of the structure of the material, in particular the target material. The output vector 616 may indicate the one-hot vector associated with the input vector 606. For this purpose, the output layer may comprise a number of neurons equal to the number of entries of the input vector 606 and/or the output vector 616. The output layer may apply a softmax function to the numerical representations of the indication of the structure of the material, in particular the target material 614. By doing so, the output vector may comprise the probabilities, i.e. confidence scores, associated with the building blocks associated with the entries of the output vector 616 unequal to zero. Hence, from the output vector 616 one or more building blocks may be obtained with a corresponding probability. In the example of FIG. 6, the building block associated with vector 618 may correspond to the input vector with a probability of 71 %. Additional or alternative building blocks may correspond to the input vector as indicated by the output vector with lower probability. By defining a threshold to which the probability may be compared, the selection of the corresponding building blocks may be tailored to the needs of the use case.
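A minimal sketch of the forward pass through an embedding layer and a softmax output layer, followed by a threshold-based selection, is given below. All weights, the two-neuron embedding size and the threshold value are hypothetical values chosen for illustration.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


VOCABULARY = ["ALA", "GLY", "SER", "LYS"]
# Hypothetical weights: 2 embedding neurons, 4 output neurons.
W_embed = [[0.5, -1.0, 0.3, 0.8],
           [1.2, 0.4, -0.7, 0.1]]
W_out = [[2.0, 0.5], [-1.0, 1.0], [0.3, -2.0], [1.0, 0.2]]

input_vector = [1, 0, 0, 0]                      # one-hot vector for "ALA"
embedding = matvec(W_embed, input_vector)        # numerical representation
probabilities = softmax(matvec(W_out, embedding))  # one probability per building block

THRESHOLD = 0.2  # assumed, use-case specific cut-off
selected = [(block, p) for block, p in zip(VOCABULARY, probabilities) if p >= THRESHOLD]
```

Raising the threshold narrows the selection to the most probable building blocks, whereas lowering it admits additional candidates.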
The model of FIG. 6 may be a continuous bag of words (CBOW) model. The CBOW model may be trained based on a training data set comprising a plurality of input vectors and corresponding output vectors. As the training data set may not be labeled, the training of the CBOW model may be referred to as self-supervised. Before training of the CBOW model, the CBOW model may be initialized with random values assigned to the weights of the neurons. During the training of the CBOW model, the input vectors may be passed through the initialized embedding layer and the output layer, and a loss may be determined by comparing the output vector obtained by passing the input vector 606 through the model to the output vector corresponding to the input vector 606 as specified by the training data set. Based on the determined loss, backpropagation may be applied to determine the gradients associated with the neurons of the embedding layer 602 and the decoding layer 604 to lower the loss. According to the determined gradients, the weights of the neurons may be updated by using a gradient descent algorithm. Once a predetermined loss is achieved by the CBOW model, the training may be terminated and a trained CBOW model may be obtained. From the trained CBOW model, the embedding layer 602 may be suitable for embedding input data comprising one or more building blocks. This embedding layer 602 may be used in other machine-learning architectures requiring an embedding layer 602, such as the architectures described within the context of FIG. 8A - FIG. 8C. For training these architectures, a trained embedding layer 602 may be required. Hence, a model such as a CBOW model may be trained prior to training the models of FIG. 8A - FIG. 8C.
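The training procedure described above (random initialization, forward pass, loss, backpropagation, gradient descent, termination at a predetermined loss) may be sketched as follows. The sketch assumes the usual CBOW setup in which the input is the average of the one-hot vectors of the two neighbouring building blocks; the toy sequence, the layer sizes, the learning rate and the loss threshold are hypothetical.

```python
import math
import random


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(W_embed, W_out, x):
    h = [sum(w * xi for w, xi in zip(row, x)) for row in W_embed]        # embedding layer
    p = softmax([sum(w * hi for w, hi in zip(row, h)) for row in W_out])  # output layer
    return h, p


def train_step(W_embed, W_out, x, target, lr):
    h, p = forward(W_embed, W_out, x)
    loss = -math.log(p[target])                       # cross-entropy loss
    d_scores = [pj - (1.0 if j == target else 0.0) for j, pj in enumerate(p)]
    # Gradient with respect to the embedding, computed before W_out is changed.
    d_h = [sum(W_out[j][k] * d_scores[j] for j in range(len(W_out))) for k in range(len(h))]
    for j, row in enumerate(W_out):                   # gradient descent update
        for k in range(len(h)):
            row[k] -= lr * d_scores[j] * h[k]
    for k, row in enumerate(W_embed):
        for i in range(len(x)):
            row[i] -= lr * d_h[k] * x[i]
    return loss


random.seed(0)
V, D = 5, 3                                           # vocabulary size, embedding size
W_embed = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(D)]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]

sequence = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]             # building blocks as vocabulary indices
samples = []
for t in range(1, len(sequence) - 1):
    x = [0.0] * V
    for neighbour in (sequence[t - 1], sequence[t + 1]):
        x[neighbour] += 0.5                           # averaged one-hot context
    samples.append((x, sequence[t]))                  # target taken from the data itself

losses = []
for epoch in range(300):
    epoch_loss = sum(train_step(W_embed, W_out, x, y, lr=0.2) for x, y in samples) / len(samples)
    losses.append(epoch_loss)
    if epoch_loss < 0.05:                             # assumed predetermined loss
        break
```

After training, the rows of `W_embed` play the role of the trained embedding layer that may be reused in a downstream architecture.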
In an embodiment, the embedding layer 602 may be trained together with the representation data-driven model 8-148.
FIG. 7A illustrates an embodiment of a numerical representation of the indication of the structure of the material, in particular the target material.
In particular, numerical representations of the two or more building blocks are shown. An indication of a structure of a material 704 may be provided and/or received. The indication of a structure of a material 704 may comprise a sequence of building blocks of the material. The indication of a structure of a material 704 may be split into two or more building blocks, i.e. the indication of a structure of a material 704 may be tokenized. A tokenized indication of the structure of the material, in particular the target material 702 may be obtained from the indication of a structure of a material 704. The tokenized indication of the structure of the material, in particular the target material 702 may comprise the two or more building blocks. The tokenized indication of the structure of the material, in particular the target material 702 may be mapped to a numerical representation of the indication of the structure of the material, in particular the target material 706. The numerical representation of the indication of the structure of the material, in particular the target material 706 may comprise a plurality of vectors or a matrix. The number of vectors and/or the number of rows and/or columns may be equal to the number of building blocks, i.e. tokens. The numerical representation of the indication of the structure of the material, in particular the target material 706 may be reduced by providing the numerical representation of the indication of the structure of the material, in particular the target material 706 to one or more embedding layer(s), e.g. as described in the context of FIG. 6. The reduced numerical representation of the indication of the structure of the material, in particular the target material 708 may be associated with a different size than the numerical representation of the indication of the structure of the material, in particular the target material 706.
Preferably, the numerical representation of the indication of the structure of the material, in particular the target material 706 may be associated with at least one of a higher number of rows, columns or a combination thereof than the reduced numerical representation of the indication of the structure of the material, in particular the target material 708. The reduced numerical representation of the indication of the structure of the material, in particular the target material 708 may be an example of a numerical representation of the indication of the structure of the material, in particular the target material, in particular of the structure of the material. Reducing the representation may allow for a more efficient processing of the indication of the structure of the material, in particular the target material. Hence, properties of materials can be determined by deploying a decreased amount of resources such as time and/or computational resources. In an embodiment, the context embedding 814 may be trained together with the representation data-driven model 8-148 and/or before training the representation data-driven model 8-148. Hence, the parameters associated with the context embedding 814 may be updated together with updating the parameters associated with the representation data-driven model 8-148.
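The reduction of a one-hot representation by an embedding layer may be illustrated as a matrix product, as sketched below. The vocabulary size, the embedding size and the entries of the embedding matrix `E` are hypothetical.

```python
VOCAB_SIZE, EMBED_DIM = 6, 2
# Hypothetical embedding matrix: one row of EMBED_DIM values per vocabulary entry.
E = [[0.1, 0.9], [0.4, -0.2], [-0.5, 0.3], [0.7, 0.7], [0.0, -1.0], [0.2, 0.2]]

token_ids = [3, 0, 4]  # tokenized structure, given as vocabulary indices
one_hot = [[1 if j == t else 0 for j in range(VOCAB_SIZE)] for t in token_ids]

# Matrix product: one_hot (3 x 6) times E (6 x 2) gives the reduced form (3 x 2).
reduced = [[sum(row[j] * E[j][k] for j in range(VOCAB_SIZE)) for k in range(EMBED_DIM)]
           for row in one_hot]
```

Because each row of `one_hot` has a single non-zero entry, the product simply selects one row of `E` per building block, which is why the reduced representation has fewer columns than the original.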
FIG. 7B illustrates an embodiment of a numerical representation of the indication of the structure of the material, in particular the target material.
In particular, numerical representations of a position of and/or a relation between the two or more building blocks are shown. Preferably, the numerical representation of the position of the two or more building blocks may be a numerical representation of the absolute position of the two or more building blocks. Preferably, the numerical representation of the relation between the two or more building blocks may be a numerical representation of a relative position of the two or more building blocks in relation to at least one of the two or more building blocks. In particular, the numerical representation of the relation between the two or more building blocks may be a numerical representation of a relative position of the two or more building blocks in relation to a masked token, i.e. a building block to be determined by the representation data-driven model. An indication of a structure of a material 704 may be provided and/or received. The indication of a structure of a material 704 may comprise a sequence of building blocks of the material. The indication of a structure of a material 704 may be split into two or more building blocks, i.e. the indication of a structure of a material 704 may be tokenized. A tokenized indication of the structure of the material, in particular the target material 702 may be obtained from the indication of a structure of a material 704. The tokenized indication of the structure of the material, in particular the target material 702 may comprise the two or more building blocks. The tokenized indication of the structure of the material, in particular the target material 702 may be mapped to a numerical representation of a position of and/or a relation between the building blocks associated with the indication of the structure of the material, in particular the target material 738.
The numerical representation of a position of and/or a relation between the building blocks associated with the indication of the structure of the material, in particular the target material 738 may be reduced by providing the numerical representation of a position of and/or a relation between the building blocks associated with the indication of the structure of the material, in particular the target material 738 to one or more embedding layer(s) e.g. as described in the context of FIG. 6. The reduced numerical representation of the position of and/or the relation between the building blocks 746 may be associated with a different size than the numerical representation of a position of and/or a relation between the building blocks associated with the indication of the structure of the material, in particular the target material 738. Preferably, the numerical representation of a position of and/or a relation between the building blocks associated with the indication of the structure of the material, in particular the target material 738 may be associated with at least one of a higher number of rows, columns or a combination thereof than the reduced numerical representation of the position of and/or the relation between the building blocks 746. The reduced numerical representation of the position of and/or the relation between the building blocks 746 may be an example of a numerical representation of the indication of the structure of the material, in particular the target material, in particular of the structure of the material. Reducing the representation may allow for a more efficient processing of the indication of the structure of the material, in particular the target material. Hence, properties of materials can be determined by deploying a decreased amount of resources such as time and/or computational resources.
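The distinction between absolute positions and positions relative to a masked token may be sketched as follows. The token strings and the `[MASK]` marker are hypothetical placeholders.

```python
MASK = "[MASK]"
tokens = ["MKT", "AYIA", MASK, "KQR", "Q"]

# Absolute position: index of each building block within the sequence.
absolute_positions = list(range(len(tokens)))

# Relative position of each building block with respect to the masked token.
mask_index = tokens.index(MASK)
relative_to_mask = [i - mask_index for i in range(len(tokens))]

# Pairwise relative positions: entry [i][j] is the offset of block j seen from block i.
relative_matrix = [[j - i for j in range(len(tokens))] for i in range(len(tokens))]
```

Either integer representation may then be passed to an embedding layer in the same way as the building blocks themselves.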
In an embodiment, position embedding 806 may be trained together with the representation data-driven model 8-148 and/or before training the representation data-driven model 8-148. Hence, the parameters associated with the position embedding 806 may be updated together with updating the parameters associated with the representation data-driven model 8-148.
FIG. 8A illustrates an embodiment of a representation data-driven model 8-148.
From the indication of the structure of the material, in particular the target material, two or more building blocks comprised in a sequence associated with the indication of the structure of the material, in particular the target material may be identified as described within the context of FIG. 2. This process may be referred to as tokenization 8-152. The building blocks may be identified by a tokenizer. The tokenizer may be configured to tokenize the indication of the structure of the material, in particular the target material, i.e. split the indication of the structure of the material, in particular the target material into two or more building blocks. The two or more building blocks may be embedded via a context embedding 814, i.e. the two or more building blocks may be mapped to a numerical representation of the two or more building blocks. In layman's terms, the context or the meaning of the building blocks may be represented via the numerical representation. For this purpose, one or more embedding layers as described in the context of FIG. 6 may be deployed. Further, the two or more building blocks may be embedded via a position embedding 806, in particular a relative position embedding 806 as described in the context of FIG. 7B. Hence, the two or more building blocks may be mapped to a numerical representation of a relation between the two or more building blocks, in particular a numerical representation of a position of the two or more building blocks within the sequence of building blocks associated with the indication of the structure of the material, in particular the target material. To apply context embedding 814 and/or position embedding 806, one or more embedding layers, e.g. as described within the context of FIG. 6, may be deployed per embedding operation.
Further, the numerical representations of the two or more building blocks and/or the relation between the two or more building blocks may be mapped to a numerical representation of a predefined size related to the numerical representation of the two or more building blocks. This may be referred to as padding. Data-driven model(s) may require data input of a predefined size. Hence, padding may allow for processing of input data of irregular size by the data-driven model. Padding may include concatenating a numerical representation independent of the input data with the numerical representation of the two or more building blocks to generate the numerical representation of predefined size related to the numerical representation of the two or more building blocks. The numerical representation independent of the input data may be indicative of a zero. Additionally or alternatively, mapping the numerical representations of the two or more building blocks and/or the relation between the two or more building blocks to a numerical representation of a predefined size related to the numerical representation of the two or more building blocks may comprise eliminating at least a part of the numerical representations of the two or more building blocks and/or the relation between the two or more building blocks. This may be referred to as truncating.
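Padding and truncating to a predefined size may be sketched as follows. The function name `fit_to_length` and the choice of appending zero vectors at the end of the sequence are assumptions of this sketch.

```python
def fit_to_length(vectors, target_len):
    """Map a list of per-building-block vectors to exactly `target_len` vectors."""
    dim = len(vectors[0])
    if len(vectors) >= target_len:
        return vectors[:target_len]                   # truncating: drop the surplus part
    padding = [[0.0] * dim for _ in range(target_len - len(vectors))]
    return vectors + padding                          # padding: append zero vectors


vectors = [[1.0, 2.0], [3.0, 4.0]]
padded = fit_to_length(vectors, 4)
truncated = fit_to_length(vectors, 1)
```

The zero vectors carry no information about the input data, which corresponds to the numerical representation independent of the input data mentioned above.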
The at least two numerical representations of the two or more building blocks and the relation between the two or more building blocks may be provided to the representation data-driven model 8-148. The representation data-driven model 8-148 may process the at least two numerical representations by multi-head self-attention 808, at least one layer normalization 810 and 816, at least one feed-forward layer 812 and/or at least one softmax layer 832. The embedded input data, i.e. the numerical representations of the two or more building blocks and the relation between the two or more building blocks, may be processed by the representation data-driven model 8-148. The embedded input data may be provided to the layer normalization 810 by a residual connection. The multi-head self-attention 808 may apply a filter obtained from the at least two numerical representations to the at least two numerical representations.
Multi-head self-attention 858 may be applied to the numerical representations separately. Multi-head self-attention 858 may comprise the two components multi-head and self-attention. Self-attention may be understood as being a filter applied to the embedded input data. By applying the filter to the embedded input data, the building blocks associated with the embedded input data that contribute to the output data to be generated may be identified for generating the output data. Hence, the filter may represent the degree to which the building blocks associated with the embedded input data contribute to the output data to be generated. Applying the filter may be referred to as weighting the building blocks associated with the embedded input data. This is advantageous specifically regarding long sequences of building blocks. The filter may be learned and improved during the training by learning to identify the contribution of building blocks associated with the embedded input data. The self-attention may focus the representation data-driven model 8-148 to attend to specific parts of the input data. Self-attention may refer to attention generated based on the input data. Hence, the filter may be determined based on the input data, preferably the embedded input data. The embedded input data may serve as query Q, key K and value V with respect to the self-attention operation. Where the embedded input data may comprise at least two numerical representations, the filter associated with the multi-head self-attention may be calculated according to an equation in terms of $d_k$ and $A$, where $d_k$ may correspond to the dimension of the key and $A$ may be a numerical representation obtained based on a cross product of the numerical representation of the two or more building blocks and the numerical representation of the relation between the two or more building blocks.
Preferably, $A$ may be obtained by multiplying the numerical representation of the two or more building blocks with the numerical representation of the relation between the two or more building blocks and/or multiplying the numerical representation of the two or more building blocks with the numerical representation of the two or more building blocks. In particular, $A$ may be obtained according to a formula in which $Q$ constitutes the query, $K$ constitutes the key, and $d$ may be the dimension of the numerical representation provided by the one or more embedding layer(s) and/or the dimension of the numerical representation provided to the data-driven model(s).
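By way of illustration only, one form of the filter and of $A$ that would be consistent with the symbols defined above is sketched below. It assumes standard scaled dot-product attention with the score matrix $A$ taking the place of $QK^{\top}$, and it assumes that the two products named above are summed. The subscripts $c$ (building blocks) and $r$ (relation between building blocks) are notation introduced for this sketch, and the role of $d$ in the formula for $A$ is left open because it cannot be recovered from the text.

```latex
\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{A}{\sqrt{d_k}}\right) V,
\qquad
A = Q_c K_c^{\top} + Q_c K_r^{\top}
\]
```

Here $Q_c K_c^{\top}$ would correspond to multiplying the representation of the building blocks with itself, and $Q_c K_r^{\top}$ to multiplying it with the representation of the relation between the building blocks.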
For improving the efficiency of the transformer model further, multiple heads may be used to apply the filter, resulting in the multi-head self-attention 858. Multi-head self-attention 858 may comprise applying the filter to two or more building blocks of the embedded input data. Hence, the tensor may be split into two or more building blocks and the filter may be applied to the two or more building blocks separately by two or more heads according to the following equation: $\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V)$ with parameter matrices $W_i^Q \in \mathbb{R}^{d \times d_q}$, $W_i^K \in \mathbb{R}^{d \times d_k}$ and $W_i^V \in \mathbb{R}^{d \times d_v}$, where $i$ may be an index associated with one of the plurality of heads, $d_v$, $d_k$ and $d_q$ may refer to the dimensions of the value, key and query, and $d$ may refer to a dimension of input data to the data-driven model(s) and/or the hidden dimension of the model.
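The results of the two or more heads may be concatenated according to the following equation: $\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O$ with $W^O \in \mathbb{R}^{h d_v \times d}$, where $h$ may refer to the number of heads.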
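A minimal sketch of multi-head self-attention is given below. It uses the standard scaled dot-product formulation, in which the scores are computed as the product of query and transposed key, rather than the specific matrix $A$ described above; the input values and the parameter matrices are hypothetical.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax_rows(matrix):
    out = []
    for row in matrix:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q, k, v):
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]              # transpose of the key
    scores = matmul(q, k_t)
    weights = softmax_rows([[s / math.sqrt(d_k) for s in row] for row in scores])
    return matmul(weights, v)                         # the filter applied to the values


def multi_head(x, heads, w_o):
    # Self-attention: query, key and value are all derived from the same input x.
    outputs = [attention(matmul(x, wq), matmul(x, wk), matmul(x, wv)) for wq, wk, wv in heads]
    concat = [sum((head[i] for head in outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)


# Hypothetical embedded input: 3 building blocks, hidden dimension d = 4.
x = [[1.0, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.5],
     [1.0, 1.0, 0.5, 0.5]]
first_half = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]   # d x d_k projection
second_half = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
heads = [(first_half,) * 3, (second_half,) * 3]                 # h = 2 heads
w_o = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
context = multi_head(x, heads, w_o)
```

Each head attends to a different projection of the input, and the concatenated result has one row per building block, corresponding to the context tensor.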
The embedded input data may be transformed via the multi-head self-attention 858 into a context tensor. The context tensor may represent the sequence of building blocks and the relation between two or more building blocks of the input data. Hence, the context tensor may be a numerical representation of the two or more building blocks and the relation between the two or more building blocks. The context tensor may be a second-rank tensor. After the multi-head self-attention 858, layer normalization 810 may be applied based on the context tensor and/or the embedded input data from the residual connection. Applying layer normalization 810 may refer to normalizing the context tensor. Normalizing the context tensor may lower the values of the entries of the context tensor. This may reduce the computational cost associated with processing the context tensor. Further, it may improve the training by helping the loss to converge and by preventing instabilities.
Layer normalization 810 may be followed by passing the context tensor to a feed-forward layer 812, again followed by layer normalization 816 based on the residual connection to the context tensor and/or the output of the feed-forward layer 812. The feed-forward layer 812 may be a feed-forward neural network. The feed-forward neural network may comprise a plurality of fully connected neurons. Passing the context tensor through the feed-forward neural network may result in transforming the context tensor linearly. Additionally or alternatively, the neural network may comprise one or more activation functions such as a rectified linear unit (ReLU). Hence, the neural network may be configured for performing one or more non-linear operations on the context tensor and/or transforming the context tensor non-linearly. After the context tensor has been transformed and/or normalized by the feed-forward layer 812 and the layer normalization 816, the context tensor may be provided to one or more further layers configured to apply multi-head self-attention 808, layer normalization 810 and 816 or one or more further feed-forward layers 812.
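The sequence of residual connection, layer normalization, feed-forward layer and second layer normalization may be sketched for a single building block as follows. The sketch omits the learnable gain and bias that layer normalization commonly includes, and all weights are hypothetical.

```python
import math


def layer_norm(vec, eps=1e-5):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def feed_forward(vec, w1, b1, w2, b2):
    """Two fully connected layers with a ReLU activation in between."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec)) + b) for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]


def sublayers(embedded, attended, w1, b1, w2, b2):
    x = layer_norm([e + a for e, a in zip(embedded, attended)])   # residual + normalization 810
    y = feed_forward(x, w1, b1, w2, b2)                           # feed-forward layer 812
    return layer_norm([xi + yi for xi, yi in zip(x, y)])          # residual + normalization 816


w1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, -1.0, 0.0]]
b1 = [0.0, 0.0, 0.0, 0.0]
w2 = [[0.5, 0.0, 0.0, 0.0], [0.0, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
b2 = [0.0, 0.0, 0.0]
hidden_state = sublayers([1.0, 2.0, 3.0], [0.5, 0.0, -0.5], w1, b1, w2, b2)
```

The normalized output has the same size as the input, so the same sequence of operations may be stacked in one or more further blocks.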
Passing the context tensor through the feed-forward layer 812 may adapt the context tensor for processing by a further attention layer of the one or more further blocks 8-114 for applying a self-attention filter, preferably multi-head self-attention 858. The context tensor, after being transformed by the layer normalization 816 and the feed-forward layer 812, may be referred to as hidden state.
The obtained numerical representation may be the modified representation referenced in the context of FIG. 2. To the obtained context tensor, absolute position embedding 8-140 may be added. This may comprise adding a positional factor indicative of a position of at least one building block within the sequence associated with the indication of the structure of the material, in particular the target material. The positional factor may in particular be indicative of an absolute position of at least one building block within the sequence associated with the indication of the structure of the material, in particular the target material. An example of the absolute position may be seen in FIG. 7B. Hence, a numerical representation of the two or more building blocks, a relation between the two or more building blocks and an indication of the absolute position of the two or more building blocks within the sequence associated with the indication of the structure of the material, in particular the target material may be obtained. This numerical representation may be provided to a softmax layer 832. The softmax layer 832 may be configured to apply the softmax function to one or more entries of the provided numerical representation. The so-obtained numerical representation of the two or more building blocks, a relation between the two or more building blocks and an indication of the absolute position of the two or more building blocks within the sequence associated with the indication of the structure of the material, in particular the target material may be provided to determine a property associated with the material, e.g. to a property data-driven model. The property data-driven model may be trained based on historical numerical representations of the structure of materials and corresponding properties. The property data-driven model may be configured to provide properties of materials in response to receiving numerical representations of the structure of materials. 
For example, the property data-driven model may be a classification model. The property data-driven model may comprise one or more layer(s). The one or more layer(s) may be configured to map numerical representations of structures of materials to properties of said materials.
In an embodiment, the multi-head self-attention 808 may be masked multi-head self-attention. Masked multi-head self-attention 864 may correspond to the multi-head self-attention 858 as described above, with the addition of masking a part of the embedded input data associated with building blocks later in the sequence than the building block to be generated. Additionally or alternatively, the part of the input data associated with building blocks later in the sequence than the building block to be generated may not be received and/or transformed into the embedded input data. Thus, the representation data-driven model 8-148 may be configured to generate a subsequent building block to a sequence.
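Masking of later building blocks may be sketched as follows. The sketch assumes the common approach of setting the attention scores of later positions to negative infinity before the softmax function is applied; the description does not prescribe this particular mechanism.

```python
import math


def causal_softmax(scores):
    """Row-wise softmax in which position i may only attend to positions j <= i."""
    weights = []
    for i, row in enumerate(scores):
        visible = [s if j <= i else -math.inf for j, s in enumerate(row)]
        peak = max(visible)
        exps = [math.exp(s - peak) for s in visible]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


scores = [[0.0, 0.0, 0.0] for _ in range(3)]
weights = causal_softmax(scores)
```

With uniform scores, the first building block attends only to itself, whereas the last building block attends equally to all building blocks up to and including itself.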
FIG. 8B illustrates an embodiment of a representation data-driven model 8-148.
In an embodiment, the representation data-driven model 8-148 may comprise a recurrent neural network. Tokenization 8-152, context embedding 814, position embedding 806 and the property determination 804 may be analogous to the description of FIG. 8A.
The at least two numerical representations of the two or more building blocks and the relation between the two or more building blocks may be provided to the representation data-driven model 8-148. The representation data-driven model 8-148 may process the at least two numerical representations by one or more hidden layer(s) 818. The at least two numerical representations may be provided in parts at different points in time, i.e. time steps. A first part of the at least two numerical representations, in particular a first element per numerical representation, may be provided to the representation data-driven model 8-148 at a first time step. A second part of the at least two numerical representations, in particular a second element per numerical representation, may be provided to the representation data-driven model 8-148 at a second time step. Further, the one or more hidden layer(s) 818 may be provided with output of the one or more hidden layer(s) from a previous mapping 8-258, in particular with output from the first time step at the second time step. Analogously, a third part may be provided together with output from the one or more hidden layer(s) 818 obtained by providing the second part to the one or more hidden layer(s) 818. The one or more hidden layer(s) may be configured to map the at least two numerical representations to at least one numerical representation of the two or more building blocks and the relation between the two or more building blocks. The numerical representation of the two or more building blocks and the relation between the two or more building blocks may be provided to the property determination 804, in particular the property data-driven model.
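The stepwise processing by a recurrent hidden layer may be sketched as follows. The sketch assumes a simple recurrent cell with a tanh activation and assumes that the elements of the two numerical representations are concatenated at each time step; the weights and input values are hypothetical.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One time step: combine the current input with the previous hidden output."""
    h_new = []
    for wx_row, wh_row, bias in zip(w_x, w_h, b):
        total = sum(w * x for w, x in zip(wx_row, x_t))
        total += sum(w * h for w, h in zip(wh_row, h_prev))
        h_new.append(math.tanh(total + bias))
    return h_new


def run_rnn(context_seq, position_seq, w_x, w_h, b):
    h = [0.0] * len(b)                                # initial hidden state
    for c_t, p_t in zip(context_seq, position_seq):
        # One element per numerical representation is provided at each time step.
        h = rnn_step(c_t + p_t, h, w_x, w_h, b)
    return h


context_seq = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # representation of the building blocks
position_seq = [[0.0], [0.5], [1.0]]                  # representation of their relation
w_x = [[0.4, -0.3, 0.2], [0.1, 0.6, -0.5]]
w_h = [[0.3, 0.0], [-0.2, 0.5]]
b = [0.0, 0.1]
final_state = run_rnn(context_seq, position_seq, w_x, w_h, b)
```

The final hidden state summarizes the whole sequence and could be passed on to a property data-driven model.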
FIG. 8C illustrates an embodiment of a representation data-driven model 8-148.
In an embodiment, the representation data-driven model 8-148 may comprise a convolutional neural network. Tokenization 8-152, context embedding 814, position embedding 806 and the property determination 804 may be analogous to the description of FIG. 8A.
The at least two numerical representations of the two or more building blocks and the relation between the two or more building blocks may be provided to the representation data-driven model 8-148. The representation data-driven model 8-148 may process the at least two numerical representations by one or more convolutional layer(s) 822, one or more pooling layer(s) 824 and/or one or more fully-connected layer(s) 826 to obtain a numerical representation of the two or more building blocks and the relation between the two or more building blocks. The one or more convolutional layer(s) 822 may be configured to change a format associated with the at least two numerical representations and may be configured to combine the at least two numerical representations into one numerical representation, e.g. by concatenating the at least two numerical representations. The one or more pooling layer(s) 824 may be configured to change a dimensionality associated with the at least two numerical representations. The one or more fully-connected layer(s) 826 may be configured to modify the numerical values associated with the output from the one or more pooling layer(s) 824 and/or one or more convolutional layer(s) 822. Processing the at least two numerical representations by the one or more convolutional layer(s) 822, the one or more pooling layer(s) 824 and/or the one or more fully-connected layer(s) 826 may comprise mapping the at least two numerical representations to one numerical representation of the two or more building blocks and the relation between the two or more building blocks and modifying the numerical representation of the two or more building blocks and the relation between the two or more building blocks. The numerical representation of the two or more building blocks and the relation between the two or more building blocks may be provided to the property determination 804, in particular the property data-driven model.
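The combination of convolutional, pooling and fully-connected layers may be sketched as follows. The sketch assumes a one-dimensional convolution along the sequence, non-overlapping max pooling and a single fully-connected output neuron; the kernel, weights and input values are hypothetical.

```python
def conv1d(rows, kernel):
    """Valid 1D convolution along the sequence; the kernel spans all features."""
    width = len(kernel)
    return [sum(k * v
                for k_row, v_row in zip(kernel, rows[i:i + width])
                for k, v in zip(k_row, v_row))
            for i in range(len(rows) - width + 1)]


def max_pool(values, size):
    """Non-overlapping max pooling, reducing the dimensionality."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def fully_connected(values, weights, bias):
    return [sum(w * v for w, v in zip(row, values)) + b for row, b in zip(weights, bias)]


context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
position = [[0.0], [0.1], [0.2], [0.3], [0.4]]
# Combine the two representations by concatenation per building block (5 x 3).
combined = [c + p for c, p in zip(context, position)]

kernel = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]           # covers two neighbouring blocks
features = conv1d(combined, kernel)
pooled = max_pool(features, 2)
output = fully_connected(pooled, [[1.0, -1.0]], [0.0])
```

In this sketch the five building blocks are mapped to four convolution features, two pooled values and finally one output value.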
FIG. 9A illustrates an embodiment of training and/or deploying the representation data-driven model 902.
The representation data-driven model 902 may be associated with an architecture as described within the context of FIG. 8A - FIG. 8C. The output data generated by the representation data-driven model 902 may comprise a numerical representation of one or more building blocks, in particular corresponding to the building blocks of the sequence provided. The representation data-driven model 902 may be provided with a numerical representation of the indication of the structure of the material, in particular the target material. The numerical representation of the indication of the structure of the material, in particular the target material may comprise N token(s), i.e. building blocks.
In an embodiment, the representation data-driven model 902 may be trained to generate building blocks corresponding to the numerical representation of sequence of building blocks associated with the indication of the structure of the material, in particular the target material 924, i.e. generate one of the building blocks 1-N. During training the representation data-driven model 8-148 may combine the numerical representations of the two or more building blocks and the relation between the two or more building blocks to at least one numerical representation of the two or more building blocks and the relation between the two or more building blocks as described above. The representation data-driven model 8-148 may be trained to generate and/or provide the modified numerical representation of the two or more building blocks and the relation between the two or more building blocks. The numerical representation of the two or more building blocks and the relation between the two or more building blocks may be combined with a numerical representation of the position of the two or more building blocks, e.g. as provided by an absolute position embedding 934. The two or more building blocks may be part of one or more sequence(s), in particular associated with the indication of the structure of the material. Hence, the two or more building blocks may be associated with a position within the one or more sequences. The numerical representation of the position may be indicative of the position of the two or more building blocks within the indication of the structure of the material, in particular the target material, in particular within the structure of the material. The numerical representation of the position of the two or more building blocks may be a representation of an absolute position of the two or more building blocks, i.e. independent of other building blocks. 
The numerical representation of the position of the two or more building blocks and/or the numerical representation of the two or more building blocks and/or the numerical representation of the relation between the two or more building blocks may be described in further detail in the context of FIG. 7A and FIG. 7B.
The numerical representation of the position of the two or more building blocks and the modified numerical representation may be combined. Preferably, the numerical values of the numerical representation of the position of the two or more building blocks may be added to the numerical values of the modified numerical representation. Hence, one numerical representation of the two or more building blocks, the relation between the two or more building blocks and the position of the two or more building blocks may be obtained.
The building block to be generated may be a masked building block. The masked building block may indicate that the building block at the position of the masked building block is to be generated in correspondence with the sequence of building blocks 1-N. The masked building block may be different from the building blocks used for representing the structure of materials. The masked building block may be independent of a building block of a structure of a material.
Training the representation data-driven model 902 may comprise providing numerical representations, in particular historical numerical representations, of sequences of building blocks to the representation data-driven model 916. These numerical representations of sequences of building blocks may comprise one or more masked building block(s) per sequence. The representation data-driven model 916 may generate a building block corresponding to the sequences, i.e. propose a building block K, wherein K may be within a range between 1 and N. The proposed building blocks associated with the one or more masked building block(s) may be compared to target building blocks. The target building blocks may be specified in a training data set. The training data set may comprise the numerical representations, in particular the historical numerical representations, of sequences of building blocks. For training the representation data-driven model 902, at least a part of the building blocks associated with these numerical representations of sequences of building blocks may be replaced by masked building blocks. The building blocks that the numerical representation of the indication of the structure of the material, in particular the target material, specified at the positions of the masked building blocks before the replacement may be the target building blocks. Upon training, parameters associated with the representation data-driven model 902 may be updated to decrease a deviation of the proposed building blocks from the target building blocks. By doing so, the representation data-driven model 902 may be trained to generate building blocks at different positions of a sequence. Hence, the representation data-driven model 902 may be trained to determine the best suited building blocks for masked building blocks based on received sequences comprising the masked building blocks.
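The masked training procedure described above may be sketched as follows. This is an illustrative toy, not the model of the disclosure: the mean-pooled summary, the single trainable matrix `W`, the cross-entropy deviation, the learning rate and the randomly generated "historical" sequences are all assumptions, and the helper names (`mask_sequence`, `propose`, `loss_and_grad`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 5            # building blocks 1..N are encoded as ids 0..N-1
MASK_ID = N      # extra id reserved for the masked building block
dim = 8

embedding = rng.normal(size=(N + 1, dim))  # fixed representations (incl. mask)
W = np.zeros((dim, N))                     # trainable output parameters


def mask_sequence(seq, pos):
    """Replace the block at `pos` by the mask; return masked sequence and target."""
    masked = list(seq)
    target = masked[pos]
    masked[pos] = MASK_ID
    return masked, target


def propose(masked_seq, W):
    """Return a summary of the sequence and a probability per building block."""
    h = embedding[masked_seq].mean(axis=0)  # crude, position-agnostic summary
    logits = h @ W
    logits -= logits.max()                  # numerical stability
    p = np.exp(logits)
    return h, p / p.sum()


def loss_and_grad(batch, W):
    """Mean cross-entropy deviation from the targets and its gradient w.r.t. W."""
    loss, grad = 0.0, np.zeros_like(W)
    for masked_seq, target in batch:
        h, p = propose(masked_seq, W)
        loss -= np.log(p[target])
        p[target] -= 1.0                    # d(loss)/d(logits) = p - one_hot
        grad += np.outer(h, p)
    return loss / len(batch), grad / len(batch)


# Hypothetical "historical" sequences, each with one masked position.
sequences = [rng.integers(0, N, size=6).tolist() for _ in range(20)]
batch = [mask_sequence(s, int(rng.integers(0, 6))) for s in sequences]

initial_loss, _ = loss_and_grad(batch, W)
for _ in range(200):
    _, grad = loss_and_grad(batch, W)
    W -= 0.05 * grad                        # update parameters to reduce deviation
final_loss, _ = loss_and_grad(batch, W)
print(initial_loss, final_loss)
```

With `W` initialised to zero, every building block is proposed with equal probability, so the initial deviation equals log(N). The parameter updates then reduce the deviation between proposed and target building blocks. A practical model would additionally use the relation and position information described above rather than a mean over the sequence.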
In an embodiment, the context embedding 814 and/or the position embedding 806 may be trained together with the representation data-driven model 8-148 and/or before training the representation data-driven model 8-148. Hence, the parameters associated with the context embedding 814 and/or position embedding 806 may be updated together with updating the parameters associated with the representation data-driven model 8-148.
FIG. 9B illustrates an embodiment of training and/or deploying the representation data-driven model 916.
The representation data-driven model 916 may be associated with an architecture as described within the context of FIG. 8A. The output data generated by the representation data-driven model 902 may comprise a numerical representation of one or more building blocks, in particular corresponding to the building blocks of the sequence provided. The representation data-driven model 916 may be provided with a numerical representation of the indication of the structure of the material, in particular the target material. The numerical representation of the indication of the structure of the material, in particular the target material, may comprise N building block(s).
In an embodiment, the representation data-driven model 916 may be configured to generate building blocks corresponding to the numerical representation of a sequence of building blocks associated with the indication of the structure of the material, in particular the target material 924, i.e. to generate a further building block N+1 for the sequence of building blocks 1-N. In particular, the representation data-driven model 916 may be trained to generate a building block N+1 following the building block N. Said representation data-driven model 916 may be associated with an encoder-decoder or decoder architecture. Training the representation data-driven model 916 may comprise providing numerical representations, in particular historical numerical representations, of sequences of building blocks to the representation data-driven model 916. The representation data-driven model 916 may generate further building blocks for the sequences, i.e. propose further building blocks. The proposed and/or further building blocks generated by the representation data-driven model 916 may be compared to target building blocks. The target building blocks may be specified in a training data set. The training data set may comprise the numerical representations, in particular the historical numerical representations, of sequences of building blocks and corresponding target building blocks. In an embodiment, these numerical representations may represent fractions of sequences of building blocks and the target building blocks may be the building blocks following the provided fractions of sequences. In an embodiment, a further building block N+2 may be generated by the representation data-driven model 916 upon receiving a sequence of N+1 building blocks, wherein the (N+1)-th building block may have been generated by the representation data-driven model 916.
Upon training, parameters associated with the representation data-driven model 916 may be updated to decrease a deviation of the proposed building blocks from the target building blocks. By doing so, the representation data-driven model 916 may be trained to generate following building blocks of a sequence.
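The generation of following building blocks may be illustrated by the sketch below. It shows how fractions of sequences may be paired with the building block that follows them, and how a generated building block N+1 may be fed back to generate building block N+2. The counting model used here is a deliberately simple stand-in chosen for illustration, not the encoder-decoder or decoder architecture mentioned above. The function names and the single-letter building blocks are hypothetical.

```python
from collections import Counter, defaultdict


def prefix_target_pairs(sequence):
    """Split a sequence into (fraction of sequence, following building block) pairs."""
    return [(sequence[:i], sequence[i]) for i in range(1, len(sequence))]


def train_follow_counts(sequences):
    """Count which block follows the last block of each fraction (stand-in model)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prefix, target in prefix_target_pairs(seq):
            counts[prefix[-1]][target] += 1
    return counts


def generate(counts, sequence, steps):
    """Append up to `steps` blocks, feeding each generated block back as input."""
    out = list(sequence)
    for _ in range(steps):
        followers = counts.get(out[-1])
        if not followers:
            break  # no training evidence for what follows this block
        out.append(followers.most_common(1)[0][0])
    return out


# Hypothetical sequences of building blocks (single letters for brevity).
history = [list("ABCD"), list("ABCE"), list("ABCD")]
model = train_follow_counts(history)
print(generate(model, list("AB"), steps=2))
```

In this toy data set, the block C always follows B, and D follows C more often than E does, so the sequence A, B is extended first by C and then by D. A trained neural model would condition on the whole fraction of the sequence instead of only its last building block.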
FIG. 10 illustrates an embodiment of a vocabulary length and an accuracy of determining properties of materials according to the methods as described herein.
FIG. 10 may show the number of components associated with one building block. It can be seen that the majority of building blocks may be associated with several components. By building the tokenizing engine 302 through training the tokenizer, building the representation engine 308 through training the representation data-driven model, and building the property determination engine 306 through training a property data-driven model, the properties of materials, in particular biological materials characterized by a sequence of amino acids, may be determined with improved accuracy compared with other models available in the state of the art. Said other models are associated with a higher number of parameters. Consequently, the model as presented herein may be at least as accurate as models with the higher number of parameters while using fewer resources during inference.
The present disclosure has been described in conjunction with preferred embodiments and examples. However, other variations can be understood and effected by those persons skilled in the art and practicing the claimed invention, from a study of the drawings, this disclosure and the claims. Notably, the steps presented can be performed in any order, i.e. the present invention is not limited to a specific order of these steps. Moreover, it is also not required that the different steps are performed at a certain place or at one node of a distributed system, i.e. each of the steps may be performed at different nodes using different equipment/data processing.
As used herein, "determining" also includes "initiating or causing to determine", "generating" also includes "initiating and/or causing to generate", and "providing" also includes "initiating or causing to determine, generate, select, send and/or receive". "Initiating or causing to perform an action" includes any processing signal that triggers a computing node or device to perform the respective action.
In the claims as well as in the description, the word "comprising" does not exclude other building blocks or steps. The indefinite article "a" or "an" and the definite article "the" do not exclude a plurality. In particular, the indefinite article "a" or "an" may be replaced with "one or more" and the definite article "the" may be replaced with "the one or more". A single building block or other unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.
Any disclosure and embodiments described herein relate to the methods, the systems, the devices and the computer program building block outlined above, and vice versa. Advantageously, the benefits provided by any of the embodiments and examples equally apply to all other embodiments and examples and vice versa. Providing within the scope of this disclosure may include any interface configured to provide data. This may include an application programming interface, a human-machine interface such as a display and/or a software module interface. Providing may include communication of data or submission of data to the interface, in particular display to a user or use of the data by the receiving entity.

Claims

CLAIMS What is claimed is:
1. A computer-implemented method for obtaining one or more data-driven model(s) for determining a target property of a target biological material, the method comprising: obtaining, in particular receiving a plurality of historical numerical representations of a structure of one or more material(s), wherein at least one of the historical numerical representations is associated with two or more building blocks, obtaining, in particular receiving a plurality of indications of the structure of the material(s), obtaining, in particular receiving at least one numerical representation of a position of the two or more building blocks within a structure of the materials, providing the indications of the structure of the material(s), the historical numerical representations of the structure of the material and the at least one historical numerical representation of the position to the one or more data-driven model(s) for training the one or more data-driven model(s) to relate the indications of the structure of the material(s) to the numerical representations, optionally providing the one or more data-driven model(s) for providing numerical representations to determine the target property from the numerical representations.
2. The method of claim 1, wherein the position of the two or more building blocks may be indicative of and/or may be an absolute position of the two or more building blocks within the structure of the material.
3. The method of claim 1 or 2, wherein training the one or more data-driven model(s) may further include combining the historical numerical representations of the structure of the material with a historical numerical representation of a relation between the two or more building blocks and combining the historical numerical representations of the structure of the material and the relation between the two or more building blocks with the historical numerical representations of the position of the two or more building blocks.
4. A computer-implemented method for determining a target property of a target biological material, the method comprising: obtaining, in particular receiving, a numerical representation of a structure of the target biological material associated with two or more building blocks, processing the numerical representation by one or more data-driven model(s), wherein the one or more data-driven model(s) are obtained by claim 1, mapping the numerical representation associated with the two or more building blocks to an indication of the target property of the target biological material at least partially by the one or more data-driven models, providing the indication of the target property of the target biological material.
5. The method of any one of claims 1 to 4, wherein the numerical representation of the structure of the material, in particular the target material, associated with the two or more building blocks is obtained by providing an indication of a structure of the material, in particular the target material, identifying the two or more building blocks of the material, in particular target material associated with the indication of the structure of the material, in particular the target material according to a vocabulary indicative of a plurality of building blocks associated with structures of materials, and mapping the identified two or more building blocks to a numerical representation of the two or more building blocks by one or more embedding layer(s), wherein the one or more embedding layer(s) are configured to map data to numerical representations of the data.
6. The method of claim 5, wherein the identified two or more building blocks are mapped separately to a numerical representation of the two or more building blocks by the one or more embedding layer(s).
7. The method of any one of claims 4 to 6, wherein the numerical representation of the structure of the target material is obtained by providing an indication of the structure of the target material, obtaining a structure of the target material from the indication of the structure of the target material, mapping the structure of the target material to the numerical representation of the structure of the target material.
8. The method of any one of claims 1 to 7, wherein the structure of the one or more material(s), in particular the target material, is associated with a sequence of the two or more building blocks.
9. The method of any one of claims 1 to 8, further comprising reducing the numerical representation of the structure of the target material and/or the at least one numerical representation of the position of the two or more building blocks by providing the numerical representation of the two or more building blocks and/or the numerical representation of the position of the two or more building blocks to one or more embedding layer(s).
10. The method of any one of claims 1 to 9, wherein the one or more data-driven model(s) comprise a representation data-driven model and a property data-driven model.
11. The method of any one of claims 4 to 10, further comprising providing a numerical representation of a relation between the two or more building blocks and wherein processing the numerical representation by one or more data-driven model(s) comprises generating the numerical representation of the two or more building blocks associated with the target material and a relation between the two or more building blocks by the one or more data-driven model(s), and wherein the numerical representation of the structure of the target material is generated from the numerical representation of the two or more building blocks associated with the target material and the relation between the two or more building blocks.
12. An apparatus for determining a target property of a target material, the apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor, configure the apparatus to perform any one of the methods of any one of claims 1 to 11.
13. Use of an indication of a target property of a target material as obtained by any one of the methods of any one of claims 4 to 11 for determining a target property of a target material.
14. Use of a numerical representation of a position of two or more building blocks associated with the structure of one or more material(s) for training one or more data-driven model(s) for determining a target property of a target material according to any one of claims 1-11.
15. A numerical representation of two or more building blocks associated with the one or more material(s) and a position of two or more building blocks associated with the structure of the one or more material(s) for training one or more data-driven model(s) for determining a target property of a target material according to any one of claims 1-11.
PCT/EP2025/059233 2024-04-12 2025-04-04 Determining properties of materials Pending WO2025214888A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
EP24170052 2024-04-12
EP24170055 2024-04-12
EP24170052.5 2024-04-12
EP24170041 2024-04-12
EP24170055.8 2024-04-12
EP24170041.8 2024-04-12

Publications (1)

Publication Number Publication Date
WO2025214888A1 (en) 2025-10-16

Family

ID=95248899

Family Applications (3)

Application Number Title Priority Date Filing Date
PCT/EP2025/059233 Pending WO2025214888A1 (en) 2024-04-12 2025-04-04 Determining properties of materials
PCT/EP2025/059238 Pending WO2025214889A1 (en) 2024-04-12 2025-04-04 Determining properties of materials
PCT/EP2025/059230 Pending WO2025214887A1 (en) 2024-04-12 2025-04-04 Determining properties of materials

Country Status (1)

Country Link
WO (3) WO2025214888A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115424663A (en) * 2022-10-14 2022-12-02 徐州工业职业技术学院 RNA modification site prediction method based on attention bidirectional representation model
WO2023156543A1 (en) * 2022-02-16 2023-08-24 Basf Se Method for predicting a technical application property of a polymer
WO2024054900A1 (en) * 2022-09-07 2024-03-14 Georgia Tech Research Corporation Systems and methods for predicting polymer properties

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHANGWEN XU ET AL: "TransPolymer: a Transformer-based language model for polymer property predictions", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 26 April 2023 (2023-04-26), XP091493835, DOI: 10.1038/S41524-023-01016-5 *
WANG XINYI ET AL: "Self-Attention Based Neural Network for Predicting RNA-Protein Binding Sites", vol. 20, no. 2, 1 March 2023 (2023-03-01), US, pages 1469 - 1479, XP093210526, ISSN: 1545-5963, Retrieved from the Internet <URL:https://ieeexplore.ieee.org/stampPDF/getPDF.jsp?tp=&arnumber=9878191&ref=aHR0cHM6Ly9pZWVleHBsb3JlLmllZWUub3JnL2RvY3VtZW50Lzk4NzgxOTE=> DOI: 10.1109/TCBB.2022.3204661 *

Also Published As

Publication number Publication date
WO2025214887A1 (en) 2025-10-16
WO2025214889A1 (en) 2025-10-16


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 25715745

Country of ref document: EP

Kind code of ref document: A1