WO2023198592A1

WO2023198592A1 - Method of determining a composition of molecule fragments via a combined experimental – machine learning approach, corresponding data processing circuit and computer program

Info

Publication number: WO2023198592A1
Application number: PCT/EP2023/059148
Authority: WO
Inventors: Martin ELSTNER; Torsten Heinemann
Original assignee: Covestro Deutschland Ag
Priority date: 2022-04-14
Filing date: 2023-04-06
Publication date: 2023-10-19

Abstract

The present invention generally relates to a method of determining a composition of molecule fragments via a combined experimental – machine learning approach and a corresponding data processing circuit and computer program. The method comprises the step of recording sample spectroscopic data of a sample mixture of molecule fragments. The method also comprises the step of determining data sets of expected spectroscopic data. The method further comprises the step of determining matching levels between the sample spectroscopic data and expected spectroscopic data of data sets stored in a data storage device. The data sets each comprise at least one data tuple of expected spectroscopic data and a molecule fragment composition associated thereto. In at least one of the steps of determining data sets of expected spectroscopic data and determining matching levels between the sample spectroscopic data and expected spectroscopic data, the determining is carried out using a deep neural network that is trained based on machine learning. Furthermore, the method comprises the step of outputting data identifying the molecule fragment composition of a particular data set whose associated expected spectroscopic data yield a matching level that reaches or exceeds a predetermined matching threshold.

Description

Method of determining a composition of molecule fragments via a combined experimental - machine learning approach, corresponding data processing circuit and computer program

The present invention generally relates to a method of determining a composition of molecule fragments via a combined experimental - machine learning approach and a corresponding data processing circuit and computer program.

In industrial chemical processes, appropriate and accurate analysis of chemical compounds (mixtures) is of paramount importance. Knowing the exact ingredients of a compound makes it possible to adapt the chemical processes so that the yield of a treatment is maximized, since the treatment can be precisely tailored to the specific ingredients of the compound.

Mass spectrometry can generally detect the relative occurrences of different elemental species within a compound. However, it provides only a superposed measuring signal, whereas the compound under test may comprise different molecular fragments, each having a different molecular geometry and a different ground state energy. Because of this superposition of the relative occurrences of different elemental species, compounds that differ from each other in the properties or relative amounts of their ingredients may still produce similar measuring signals in mass spectrometry. Additional measurement techniques are therefore usually applied to better characterize a mixture under test.

In this regard, WO 2012/146787 A1 discloses a method for analyzing mass spectral data obtained from a sample in two-dimensional mass spectrometry. In essence, mass spectral data of an analyte are compared with mass spectral data of candidate compounds of known structures stored in a data library. Multiple candidate compounds are then identified from the library based on similarities of their mass spectral data. Moreover, for each candidate compound, a value of at least one analytical property is predicted using a quantitative model based on a plurality of molecular descriptors. A match score is then calculated for each candidate compound based on the predicted value and a measured value of the analytical property for the analyte. The analysis is performed in a computer-assisted fashion. However, it relies on a complex technique that makes use of multiple molecular descriptors.

WO 2018/066587 A1 discloses a system for quantifying a composition of a target sample scanned by a first sensor type. A reference database is compared to a custom database and multiple modules are applied. The first sensor type is modelled, predetermined sample compositions are modelled under standard conditions, and a composition of the target sample is estimated from the scan output using a modified analytical model. However, a calibration process is needed to appropriately model the first sensor type and the data measured with it, and this calibration is included in multiple different stages of the determination procedure.

Thus, the analysis methods according to the prior art are highly complex and computationally expensive. Accordingly, the objective technical problem may be considered to be overcoming, or at least reducing, the disadvantages of the prior art. In particular, there is a need for a method, a data processing circuit, and a computer program which allow a specific composition to be determined from acquired data with reduced complexity and reduced computational expense.

The objective technical problem is solved by the subject matter of the independent claims. Additional embodiments are indicated in the dependent claims and the following description, each of which, individually or in combination, may represent aspects of the disclosure. Some aspects of the present disclosure are presented with regard to methods, others with regard to respective devices; however, the features apply correspondingly in both directions. A summary of certain embodiments disclosed herein is set forth below. It should be understood that these aspects are presented merely to provide a brief summary of these embodiments and are not intended to limit the scope of this disclosure. This disclosure may encompass a variety of aspects that may not be set forth below.

According to an aspect of the present invention, a method of determining a composition of molecule fragments via a combined experimental - machine learning approach is provided. The method comprises at least the steps of:

S1 experimentally recording sample spectroscopic data of a sample mixture of molecule fragments under test;

S2 determining data sets of expected spectroscopic data, each data set comprising at least one data tuple of expected spectroscopic data and a molecule fragment composition associated thereto;

S3 determining matching levels between the sample spectroscopic data and the expected spectroscopic data; and

S4 outputting data identifying the molecule fragment composition of a particular data set whose associated expected spectroscopic data yield a matching level, determined in step S3, that reaches or exceeds a predetermined matching threshold;

wherein in at least one of the steps S2 and S3 the determining is carried out using a deep neural network, the deep neural network being trained based on machine learning. Step S1 may be carried out by at least one sensing device.
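The four steps can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not the claimed implementation: spectra are assumed to be sampled on one shared energy grid, the composition format and the threshold value are hypothetical, and cosine similarity is only a stand-in for the matching level of step S3, which in the method may instead be determined by a trained deep neural network (as may the data sets of step S2).

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Dict, List, Sequence

Spectrum = Sequence[float]  # intensities sampled on one shared energy grid (assumption)


@dataclass
class DataSet:
    """Data tuple of step S2: expected spectroscopic data plus the associated composition."""
    expected: List[float]
    composition: Dict[str, float]  # hypothetical format: fragment name -> fraction


def cosine_matching_level(sample: Spectrum, expected: Spectrum) -> float:
    """Stand-in for step S3: cosine similarity, in [0, 1] for non-negative spectra."""
    dot = sum(s * e for s, e in zip(sample, expected))
    norm = sqrt(sum(s * s for s in sample)) * sqrt(sum(e * e for e in expected))
    return dot / norm if norm else 0.0


def determine_composition(
    sample: Spectrum,                 # S1: recorded by the sensing device
    data_sets: List[DataSet],         # S2: precomputed or predicted, e.g. by a trained network
    matching_level: Callable[[Spectrum, Spectrum], float] = cosine_matching_level,
    threshold: float = 0.95,          # hypothetical predetermined matching threshold
) -> List[Dict[str, float]]:
    """S3 + S4: return every composition whose matching level reaches or exceeds the threshold."""
    return [ds.composition for ds in data_sets
            if matching_level(sample, ds.expected) >= threshold]


library = [
    DataSet([0.0, 1.0, 0.2, 0.0], {"keto": 1.0}),
    DataSet([0.9, 0.1, 0.0, 0.5], {"alcohol": 1.0}),
]
print(determine_composition([0.0, 2.0, 0.4, 0.0], library))  # [{'keto': 1.0}]
```

Passing the matching function as a parameter keeps step S3 interchangeable, so a trained network could replace the cosine placeholder without changing the rest of the pipeline.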

As mentioned above, within the method according to the invention the determining in at least one of the steps S2 (determining data sets of expected spectroscopic data, each data set comprising at least one data tuple of expected spectroscopic data and a molecule fragment composition associated thereto) or S3 (determining matching levels between the sample spectroscopic data and the expected spectroscopic data) is carried out using a deep neural network. This may be understood to mean that both determination steps S2 and S3 may be carried out using a deep neural network. Alternatively, only one of the steps S2 or S3 may be carried out using a deep neural network.

According to the present invention, the deep neural network used to carry out the determining in step S2 and/or step S3 is trained based on machine learning. Training a deep neural network based on machine learning is generally known in the art and is described, by way of example, in "ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost" (J.S. Smith, O. Isayev, A.E. Roitberg, Chemical Science 4, 2017 (https://pubs.rsc.org/en/content/articlelanding/2017/sc/c6sc05720a)).

Accordingly, a method is provided which establishes a stringent and compact identification procedure for assigning the sample mixture of molecule fragments under test to a specific molecule fragment composition. The identification procedure is based on expected spectroscopic data which have been acquired prior to the actual measurement. This makes it possible to implement straightforward assignment techniques. The method does not rely on a specific spectroscopic technique for which particular properties of a sensing device need to be taken into account when determining the expected spectroscopic data. Moreover, the method dispenses with the need to include molecular descriptors or calibration measurements in the identification procedure. Hence, the complexity of the method and the computational expenses are advantageously reduced.

Several methods for determining the composition of polymer formulations are known in the art. For example, US 2022/0099566 A1 discloses a method for deformulation of spectra of arbitrary compound formulations such as polymer formulations. These methods, however, have several drawbacks. In particular, the method disclosed in US 2022/0099566 A1 relies on experimental spectra to train a classification model for classifying individual chemical components. These spectra must be acquired from pure substances and mixtures of pure substances, which is often not feasible because many polymeric materials cannot be separated in practice. This method also lacks the aspect of pre-computing fragments and training the analysis neural network on pre-computed spectra, i.e. the expected spectroscopic data. Whereas the method according to the present invention can simulate many binary combinations of fragments and thus capture many functional group interactions that can shift the spectral lines, the prior-art method cannot consider binary interactions because of insufficient training data for its machine learning model.

A method for "intelligently building Raman spectroscopy data" using machine learning is known from CN 113 378 680 A. In this method, training data are generated by mixing simulated spectra of pure substances. However, these simulated spectra do not include neighbour effects (e.g. H-bridges) that would influence the spectra. Accordingly, this method yields inaccurate results.

The publication "Deep Learning for Vibrational Spectral Analysis: Recent Progress and a Practical Guide" by Yang Jie et al. (ANALYTICA CHIMICA ACTA, ELSEVIER, AMSTERDAM, NL, vol. 1081, 8 June 2019) addresses some general aspects of calculating spectra with machine learning methods. There, theoretical spectra are calculated with conventional methods such as DFT. The drawback of this approach is that, as a purely statistical method, it would require an accuracy of 0.01 kcal/mol, which is almost unachievable in the present context.

Within the present context, a composition may be regarded as a mixture of at least two, in particular multiple, different chemical substances. Typical candidates include solutions, suspensions, and colloids. The chemical substances may comprise pure elemental species or, more usually, inorganic or organic molecules, each comprising atoms of a single elemental species or of different elemental species. Within the present context, a molecule fragment may be regarded as a portion of a generally unstable molecular ion that dissociates after an ionization process has been applied. For example, mass spectrometry may cause energetically unstable molecules to dissociate into various molecule fragments. During the mass spectrometry procedure, the molecule may be excited from a ground state into an ionized excited energy state. Since the excited state is an unstable state of the molecule, the molecule may de-excite by dissociating into different molecule fragments. After the dissociation process, the molecule fragments may generally be stable.

By way of example, and without limiting the scope of the present invention, Fig. A shows typical molecule fragments under test. For these, data sets of expected spectroscopic data may be determined in accordance with step S2, yielding at least one data tuple of expected spectroscopic data and a molecule fragment composition associated thereto.

If the two fragments shown in Fig. A are mixed in a real solution, the keto-oxygen and the alcohol-oxygen could form an H-bridge (a special case of a weak interaction, see Fig. B). To include such effects, data sets of expected spectroscopic data of various fragment mixtures need to be calculated. A further specific advantage of the method according to the invention is that the model applied does not need to know about the potential H-bridge between exactly the two fragments shown in Fig. A. In the present example, it is sufficient for the training data set of the trained deep neural network to contain some other keto-oxygen and alcohol-oxygen combination; the model can then extrapolate the effect to other combinations. However, the fragmentation of the underlying molecule may make it difficult to identify that molecule from data measured on the molecule fragments after the ionization process. The identification is further complicated because the compound under investigation usually comprises molecule fragments originating from different molecules. This is one of the reasons why an efficient identification procedure is sought.
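To make the idea of "various fragment mixtures" concrete, the following sketch enumerates all binary combinations of a small fragment library, i.e. the candidate pairs for which expected spectroscopic data would be computed in step S2; n fragments yield n(n+1)/2 pairs. The fragment names and the donor/acceptor labels are hypothetical and deliberately coarse; they only flag pairs in which an H-bridge such as the keto-oxygen/alcohol-oxygen interaction of Fig. B could occur.

```python
from itertools import combinations_with_replacement
from typing import List, Tuple

# Hypothetical, deliberately coarse labels for illustration only.
H_BRIDGE_DONORS = {"alcohol"}                      # fragments carrying an O-H group
H_BRIDGE_ACCEPTORS = {"alcohol", "keto", "ether"}  # fragments with an oxygen lone pair


def binary_fragment_mixtures(fragments: List[str]) -> List[Tuple[str, str]]:
    """All unordered fragment pairs, including each fragment paired with itself.

    Expected spectroscopic data would be computed for each pair in step S2 so that
    pair interactions (e.g. H-bridges) are represented in the training data.
    """
    return list(combinations_with_replacement(sorted(set(fragments)), 2))


def can_form_h_bridge(a: str, b: str) -> bool:
    """True if one fragment can donate and the other can accept a hydrogen bond."""
    return ((a in H_BRIDGE_DONORS and b in H_BRIDGE_ACCEPTORS)
            or (b in H_BRIDGE_DONORS and a in H_BRIDGE_ACCEPTORS))


pairs = binary_fragment_mixtures(["keto", "alcohol", "ether"])
print(len(pairs))  # 6
print([p for p in pairs if can_form_h_bridge(*p)])
# [('alcohol', 'alcohol'), ('alcohol', 'ether'), ('alcohol', 'keto')]
```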

Thus, within the present context, a sample mixture of molecule fragments may be considered to represent a single collection of different molecule fragments. The mixture may be liquid, gaseous, solid, or in a hybrid state (plasma). It is assumed that the molecule fragments are statistically (evenly) distributed within the mixture. However, this need not be the case: polarization effects between the molecule fragments or other types of interaction, such as interface effects, may cause deviations from an even distribution. While a statistical distribution makes the determination easier, deviations may be taken into account, for example if systematic errors occur.

Within the present context, "experimentally recording" sample spectroscopic data may be understood as measuring, in particular with at least one sensing device, spectroscopic data of a sample mixture of molecule fragments. For example, the sensing device may be configured to utilize at least one out of an infrared, an ultraviolet, a nuclear magnetic resonance, a mass, a Raman, and a gas chromatography spectroscopy technique and obtain corresponding data. The data obtained by the sensing device may then be provided for enabling further processing of the data, for example, by utilizing one or more of a data processing circuit, a data storage device, or an interface.

Within the present context, a data processing circuit may be provided by a data processor. A data processor may be understood as a microcontroller, a digital signal processor, an ASIC (Application Specific Integrated Circuit) and/or a programmable logic circuit (for example an FPGA = Field Programmable Gate Array or a CPLD = Complex Programmable Logic Device). Optionally, the data processing circuit may be coupled with a data storage device and may access the data sets stored therein. In this context, the data processing circuit may comprise a computer program. Optionally, the computer program may be stored on a computer-readable medium that is connectable to the data processing circuit and executable by it.

Within the present context, a data storage device may be considered as hardware used for storing, porting or extracting data files and objects. The data storage device may hold information temporarily or permanently. It may be internal or external to a computer, server or computing device. For example, the data storage device may be a physical unit, such as part of a computer, or a virtual unit, for example in the case of cloud storage. For example, the data storage device may store data sets of expected spectroscopic data, the data sets comprising at least one data tuple of expected spectroscopic data and a molecule fragment composition associated thereto. Optionally, the data storage device may be internal to the data processing circuit. The data storage device may comprise one or more of RAM (Random Access Memory), ROM (Read Only Memory) and flash memory.
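One possible way to store such data sets is sketched below, assuming a JSON file on a local or cloud-backed storage device. The class names, field names and file name are hypothetical; the only structure taken from the text is that each data set holds at least one data tuple pairing expected spectroscopic data with its molecule fragment composition.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class DataTuple:
    expected_spectrum: List[float]   # expected spectroscopic data on a fixed grid
    composition: Dict[str, float]    # hypothetical format: fragment name -> fraction


@dataclass
class DataSet:
    name: str
    tuples: List[DataTuple] = field(default_factory=list)  # at least one tuple in practice


def save_data_sets(path: str, data_sets: List[DataSet]) -> None:
    """Persist data sets as JSON (asdict converts nested dataclasses recursively)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump([asdict(ds) for ds in data_sets], fh, indent=2)


def load_data_sets(path: str) -> List[DataSet]:
    """Rebuild the dataclass objects from the JSON file."""
    with open(path, encoding="utf-8") as fh:
        raw = json.load(fh)
    return [DataSet(d["name"], [DataTuple(**t) for t in d["tuples"]]) for d in raw]


library = [DataSet("keto+alcohol",
                   [DataTuple([0.1, 0.8, 0.3], {"keto": 0.5, "alcohol": 0.5})])]
path = os.path.join(tempfile.mkdtemp(), "expected_spectra.json")
save_data_sets(path, library)
print(load_data_sets(path) == library)  # True
```

A plain-text format keeps the library inspectable; a binary or database format would be the natural choice once the number of pre-computed fragment combinations grows large.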

Within the present context, a sensing device may be considered measurement equipment configured to obtain the sample spectroscopic data, such as a detector, for example an infrared absorption detector or a nuclear magnetic resonance detector. The measured data may then be provided by the sensing device for further processing, for example by a data processing circuit. According to an example, the detector may include or be coupled to a light emitting device. In this case, the mixture under test may be irradiated by the light emitting device with so-called white light, i.e. light covering a broad wavelength range at a predetermined intensity. Accordingly, light having multiple different (presumably quasi-continuous) wavelengths within the wavelength range is used to irradiate the mixture under test. For example, the wavelength range may comprise infrared light. The detector may then be configured to detect an absorption spectrum of the mixture under test. In other words, for each wavelength (or at least for each delimited partial wavelength range) within the wavelength range, the detector measures the intensity of the light reaching it. Owing to the intrinsic properties of the ingredients of the mixture, light of specific wavelengths is generally absorbed (more generally, modified) by the ingredients. For example, light of a suitable wavelength may excite at least some of the ingredients of the mixture under test. As a result of such absorption processes, the intensity of the light at these wavelengths is reduced (more generally, modified) when it reaches the detector. Usually, the intensity variations are element dependent and therefore allow conclusions to be drawn about the properties of the mixture under test as well as about its ingredients.
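The reduction in intensity described above is commonly expressed as absorbance, A = log10(I0 / I), where I0 is the intensity reaching the detector without the mixture and I the intensity after passing through it; under the Beer-Lambert law, A is proportional to concentration and path length. The sketch below is a minimal illustration, assuming a separately recorded white-light reference on the same wavelength grid.

```python
from math import log10
from typing import List, Sequence


def absorbance_spectrum(reference: Sequence[float], transmitted: Sequence[float]) -> List[float]:
    """Absorbance A = log10(I0 / I) per wavelength bin (equivalently -log10 of transmittance).

    reference   -- intensity I0 reaching the detector without the mixture (white-light baseline)
    transmitted -- intensity I reaching the detector after passing through the mixture
    """
    if len(reference) != len(transmitted):
        raise ValueError("both spectra must share the same wavelength grid")
    absorbance = []
    for i0, i in zip(reference, transmitted):
        if i0 <= 0 or i <= 0:
            raise ValueError("intensities must be positive")
        absorbance.append(log10(i0 / i))
    return absorbance


# 100 %, 10 % and 1 % transmission in three wavelength bins
print([round(a, 6) for a in absorbance_spectrum([1000.0] * 3, [1000.0, 100.0, 10.0])])  # [0.0, 1.0, 2.0]
```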

Various measurement devices known in the art, such as transmission or reflection measurement setups, may be applied within the sensing device. Moreover, intrinsic processes such as luminescence, in which an energy shift occurs between the excitation energy and the energy of the light emitted by the luminescent molecule, may be taken into account when testing the mixture. However, the present method is not limited in this regard and may be applied with various implementations of the sensing device.

Here, sample spectroscopic data are acquired using the at least one sensing device. Usually, the spectroscopic data may be represented as two-dimensional spectra of absorption (or light) intensity versus the energy of the light. The energy of the light may be expressed in electron volts (eV), as a wavelength, or as a frequency. The light intensity is commonly detected using photon counters (such as photomultipliers) or the like. For each measurement cycle, the photon counter counts the photons reaching the detector within a predetermined sample period in the inspected partial wavelength range. Due to the interaction between the mixture under test and the light irradiated onto it, intensity variations (variations between the repeated counts) occur as a function of energy. The spectra therefore usually comprise various spectral features, which may be referred to as peaks (absorption maxima or minima) or the like.
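The following sketch illustrates two of the operations mentioned above: converting a wavelength into photon energy (E = hc/λ, with hc ≈ 1239.84 eV nm) or into a wavenumber, and locating peaks as strict local maxima of the photon counts. The peak criterion and the min_height parameter are simplifying assumptions; practical tools (for example scipy.signal.find_peaks) additionally handle noise, plateaus and peak prominence. Absorption minima can be found by passing negated counts.

```python
from typing import List, Sequence, Tuple

HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV*nm


def wavelength_nm_to_ev(wavelength_nm: float) -> float:
    """Photon energy E = h*c / lambda."""
    return HC_EV_NM / wavelength_nm


def wavelength_nm_to_wavenumber(wavelength_nm: float) -> float:
    """Wavenumber in cm^-1 (1 cm = 1e7 nm), the usual axis of infrared spectra."""
    return 1e7 / wavelength_nm


def find_peaks(energies: Sequence[float], counts: Sequence[float],
               min_height: float = 0.0) -> List[Tuple[float, float]]:
    """Return (energy, counts) for every strict local maximum with counts >= min_height."""
    peaks = []
    for k in range(1, len(counts) - 1):
        if counts[k - 1] < counts[k] > counts[k + 1] and counts[k] >= min_height:
            peaks.append((energies[k], counts[k]))
    return peaks


print(wavelength_nm_to_wavenumber(5000.0))  # 2000.0
energies = [0.10, 0.11, 0.12, 0.13, 0.14, 0.15]  # eV
counts = [5, 9, 4, 3, 8, 2]                     # photon counts per sample period
print(find_peaks(energies, counts))                  # [(0.11, 9), (0.14, 8)]
print(find_peaks(energies, counts, min_height=8.5))  # [(0.11, 9)]
```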

For example, the detector configured to obtain the sample spectroscopic data may be a nuclear magnetic resonance detector. In this context, the mixture under test may be subjected to electromagnetic radiation, for example an oscillating electromagnetic wave such as a radio wave. An absorption signal may be produced by absorption of electromagnetic radiation of an appropriate frequency, causing the nuclei of the atoms of at least some of the ingredients of the mixture under test to undergo transitions from lower-energy to higher-energy spin states. In other words, subjecting the atomic nuclei of at least some of the ingredients of the mixture under test to an external magnetic field and applying electromagnetic radiation of a specific frequency may lead to nuclear magnetic resonance, i.e. a resonance transition between different magnetic energy levels. For example, the nuclear magnetic resonance detector may be configured to detect the absorption signals for subsequently generating a corresponding nuclear magnetic resonance spectrum. From the positions, intensities, and fine structure of the resonance peaks in the nuclear magnetic resonance spectrum, the structures of the ingredients of the mixture under test may then be studied quantitatively. In other words, the nuclear magnetic resonance spectrum makes it possible to draw conclusions about the properties of the mixture under test as well as about the quantitative relations of its ingredients.
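The resonance condition and the peak positions can be illustrated with two standard relations: a nucleus resonates at the Larmor frequency ν = (γ/2π)·B0, and peak positions are usually reported as chemical shifts δ = (ν - ν_ref)/ν_ref × 10^6 in ppm, which makes them independent of the field strength of the spectrometer and thus comparable with expected spectroscopic data. The sketch below assumes 1H nuclei and uses the rounded value γ/2π ≈ 42.577 MHz/T.

```python
GAMMA_BAR_1H_MHZ_PER_T = 42.577  # gyromagnetic ratio / 2*pi of 1H in MHz/T (rounded)


def larmor_frequency_mhz(b0_tesla: float, gamma_bar: float = GAMMA_BAR_1H_MHZ_PER_T) -> float:
    """Resonance frequency nu = (gamma / 2 pi) * B0 for nuclei in an external field B0."""
    return gamma_bar * b0_tesla


def chemical_shift_ppm(nu_hz: float, nu_ref_hz: float) -> float:
    """Peak position relative to a reference line: delta = (nu - nu_ref) / nu_ref * 1e6."""
    return (nu_hz - nu_ref_hz) / nu_ref_hz * 1e6


print(round(larmor_frequency_mhz(9.4), 1))  # 400.2  (a "400 MHz" proton spectrometer)
# A line 800 Hz above a 400 MHz reference sits at 2 ppm; at 600 MHz the same line is 1200 Hz above it.
print(round(chemical_shift_ppm(400_000_800.0, 400_000_000.0), 6))  # 2.0
print(round(chemical_shift_ppm(600_001_200.0, 600_000_000.0), 6))  # 2.0
```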

Optionally, the sensing device, such as a detector, may represent measurement equipment that is configured to obtain sample spectroscopic data and is integrated in an industrial process. In other words, the sensing device may be adapted, optimized, and integrated in a running industrial process for measuring sample spectroscopic data. For example, the sample spectroscopic data may be obtained by on-line or in-line measurement. For example, the measurement of sample spectroscopic data may be applied for process control and/or quality control of an industrial process. In this context, the industrial process may be a chemical industrial process in which one or more chemical reactions proceed. The industrial process may be conducted at different scales. Preferably, the industrial process is a large-scale industrial production process. For example, the industrial process may include, but is not limited to, the production and/or use of one or more polyols. For example, the process control and/or quality control of an industrial process includes, but is not limited to, the investigation of the ingredients of a mixture under test.

Within the present context, a matching level may be considered to represent a degree of accordance between two compared entities. Here, the sample spectroscopic data are compared to the expected spectroscopic data. The comparison is generally drawn in view of the spectral features contained in the sample spectra derived from the sample spectroscopic data and in the expected spectra derived from the expected spectroscopic data. In other words, the peaks of the different spectra are compared with regard to both their intensity and the energy at which they occur. Optionally, further characteristics such as the peak width may be taken into account. The comparison is usually performed using mathematical fit functions, e.g. polynomial functions of a specific degree. The fit function may be applied to evaluate which modifications need to be made to the expected spectra to adapt their shape to the sample spectra. For example, the outcome of such a fitting procedure may be the coefficients of the fit function that need to be applied to adapt the expected spectra to the sample spectra. Ideally, the scaling factor would be 1, which would mean that the sample spectra and the expected spectra are identical; however, this is commonly not the case. In any case, a matching level representing the degree of accordance may be derived from the result of the fitting procedure. In this regard, the predetermined matching threshold may be considered to represent a matching level at which the accordance between the sample spectra and the expected spectra is so close that the spectra are considered to correspond to each other. Once the predetermined matching threshold is reached or exceeded, the respective data set containing the corresponding expected spectroscopic data may easily be selected.
Since the data set contains the particular molecule fragment composition from which the respective expected spectroscopic data were obtained, the formerly unknown sample mixture of molecule fragments may be considered to correspond to this particular molecule fragment composition. Accordingly, the formerly unknown sample mixture of molecule fragments may be efficiently determined (identified).
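A minimal sketch of such a fitting procedure, assuming a first-degree polynomial fit sample ≈ c0 + c1·expected on a shared energy grid. The coefficient c1 plays the role of the scaling factor that ideally equals 1. How the fit result is turned into a matching level is not specified in the text; the rule used here (coefficient of determination R², reduced by min(c1, 1/c1)) and the threshold value are hypothetical choices for illustration.

```python
from typing import Sequence, Tuple


def linear_fit(expected: Sequence[float], sample: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit sample ~ c0 + c1 * expected; returns (c0, c1, r_squared)."""
    n = len(expected)
    if n != len(sample) or n < 2:
        raise ValueError("need two spectra of equal length >= 2")
    me, ms = sum(expected) / n, sum(sample) / n
    see = sum((e - me) ** 2 for e in expected)
    sss = sum((s - ms) ** 2 for s in sample)
    ses = sum((e - me) * (s - ms) for e, s in zip(expected, sample))
    if see == 0 or sss == 0:
        raise ValueError("spectra must not be constant")
    c1 = ses / see                       # scaling factor, ideally 1
    c0 = ms - c1 * me                    # baseline offset, ideally 0
    r_squared = ses * ses / (see * sss)  # goodness of fit in [0, 1]
    return c0, c1, r_squared


def matching_level(expected: Sequence[float], sample: Sequence[float]) -> float:
    """Hypothetical matching level in [0, 1]: fit quality, penalised as c1 departs from 1."""
    _, c1, r_squared = linear_fit(expected, sample)
    if c1 <= 0:
        return 0.0
    return r_squared * min(c1, 1.0 / c1)


expected = [0.0, 1.0, 0.2, 0.0, 0.6]
threshold = 0.95  # hypothetical predetermined matching threshold
for sample in (expected, [2.0 * x for x in expected]):
    level = matching_level(expected, sample)
    print(round(level, 6), level >= threshold)
# 1.0 True
# 0.5 False
```

Whether a scaled copy of the expected spectrum should count as a match depends on whether intensities are normalized beforehand; the penalty term is where such a design decision would be encoded.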

Alternatively, the matching procedure need not rely on the predetermined matching threshold being reached or exceeded. Instead, the highest matching level may be decisive for determining the similarity of the sample spectroscopic data and the expected spectroscopic data. Accordingly, the data processing circuit may output at the interface data identifying the molecule fragment composition of the particular data set whose expected spectroscopic data yield the highest matching level with respect to the sample spectroscopic data.
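The highest-matching-level variant needs no threshold, only an argmax over the stored data sets. In the sketch below, the matching level 1/(1 + sum of squared differences) is a hypothetical placeholder; any function that returns larger values for closer agreement, including a trained network, could be passed in instead.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Library = List[Tuple[Sequence[float], Dict[str, float]]]  # (expected spectrum, composition)


def inverse_sse_level(sample: Sequence[float], expected: Sequence[float]) -> float:
    """Hypothetical matching level: 1 / (1 + sum of squared differences); 1.0 if identical."""
    return 1.0 / (1.0 + sum((s - e) ** 2 for s, e in zip(sample, expected)))


def best_match(sample: Sequence[float], library: Library,
               level: Callable[[Sequence[float], Sequence[float]], float] = inverse_sse_level
               ) -> Tuple[Dict[str, float], float]:
    """Return the composition with the highest matching level; no threshold is involved."""
    if not library:
        raise ValueError("the library of data sets is empty")
    best_level, best_composition = max(
        ((level(sample, expected), composition) for expected, composition in library),
        key=lambda item: item[0],
    )
    return best_composition, best_level


library = [
    ([1.0, 0.0, 0.0], {"A": 1.0}),
    ([0.0, 1.0, 0.0], {"B": 1.0}),
    ([0.5, 0.5, 0.0], {"A": 0.5, "B": 0.5}),
]
composition, level = best_match([0.45, 0.55, 0.0], library)
print(composition, round(level, 3))  # {'A': 0.5, 'B': 0.5} 0.995
```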

In step S4, the output may be provided at an interface. The interface may be a human machine interface, such as a display or the like. The data identifying a particular molecule fragment composition are thus provided such that a human may take note of the identified molecule fragment composition and/or process the provided information in subsequent process steps, for example in even more sophisticated evaluation procedures.

Alternatively or cumulatively, the matching level may be determined based on a regression algorithm. For example, a linear regression procedure, a nonlinear regression procedure, an interpolation and/or extrapolation procedure or the like may be applied to determine the matching level between the sample spectroscopic data and the expected spectroscopic data. Regression analysis techniques are tools for evaluating the degree of accordance between different data sets when there are more equations than unknown variables. In this case, the overall system is overdetermined and generally cannot be solved exactly in analytical form. In other words, the regression algorithm makes it possible to determine the matching level even though no exact solution for all variables is available.
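As one example of a linear regression procedure, the sample spectrum can be modelled as a weighted sum of expected spectra. Each spectral point contributes one equation and each expected spectrum one unknown weight, so the system is overdetermined and is solved in the least-squares sense via the normal equations (A^T A) w = A^T s. The sketch below assumes a shared energy grid and linearly independent expected spectra, and uses the coefficient of determination R² as a hypothetical matching level; in practice a library routine such as numpy.linalg.lstsq would replace the hand-written solver.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system a x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: expected spectra are not linearly independent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_mixture(sample: Sequence[float], components: List[Sequence[float]]) -> Tuple[List[float], float]:
    """Least-squares weights w minimising ||sample - sum_k w_k * component_k||^2; returns (w, R^2)."""
    k = len(components)
    ata = [[sum(p * q for p, q in zip(components[i], components[j])) for j in range(k)]
           for i in range(k)]
    ats = [sum(p * s for p, s in zip(components[i], sample)) for i in range(k)]
    w = solve(ata, ats)  # normal equations (A^T A) w = A^T s
    fitted = [sum(w[i] * components[i][t] for i in range(k)) for t in range(len(sample))]
    mean = sum(sample) / len(sample)
    ss_res = sum((s - f) ** 2 for s, f in zip(sample, fitted))
    ss_tot = sum((s - mean) ** 2 for s in sample)
    if ss_tot == 0:
        raise ValueError("sample spectrum must not be constant")
    return w, 1.0 - ss_res / ss_tot


components = [[1.0, 0.0, 0.0, 1.0, 0.0],   # expected spectrum of composition 1
              [0.0, 1.0, 0.0, 0.0, 1.0]]   # expected spectrum of composition 2
sample = [0.3, 0.7, 0.0, 0.3, 0.7]
weights, r_squared = fit_mixture(sample, components)
print([round(w, 6) for w in weights], round(r_squared, 6))  # [0.3, 0.7] 1.0
```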

Optionally, the regression algorithm is applied by a data processing circuit. Then, the matching levels may be evaluated in an efficient fashion. Moreover, the regression algorithm may also be applied automatically, if the sensing device provides sample spectroscopic data. To this end, the data processing circuit may be coupled with the data storage device and may access the data sets stored therein.

Moreover, the regression algorithm may be trained. Put differently, the regression algorithm may be applied to training mixtures of molecule fragments, for which sample spectroscopic data are obtained. Furthermore, data sets relating to the training mixtures, including the respective molecule fragment composition and the expected spectroscopic data associated thereto, may already be stored in the data storage device. After the sample spectroscopic data recorded for a training mixture of molecule fragments have been received, the regression algorithm may be applied. The resulting matching level should indicate that the recorded sample spectroscopic data correspond to the stored expected spectroscopic data associated with the known training mixture. The regression algorithm may then be fed with the information whether or not the matching level has been determined appropriately; this may depend, for example, on whether the obtained matching level reaches or exceeds the predetermined matching threshold. Based on this feedback, the regression algorithm may adapt its intrinsic determination procedure to improve it. Several training mixtures of molecule fragments may be provided in this way until the matching level is determined appropriately on a regular basis. Subsequently, the trained regression algorithm may be applied in an automated fashion.

Optionally, the trained regression algorithm may be applied by a deep neural network. In this regard, the deep neural network may be trained based on machine learning.
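The training loop described above can be illustrated with a very small network written in plain Python: an input layer, one tanh hidden layer and a sigmoid output read as a matching level, with randomly initialized weights that are adapted by steepest descent whenever a training example states whether a pair of spectra matches. The input features (absolute per-bin differences between a sample and an expected spectrum), the layer sizes, the learning rate and the synthetic training data are all hypothetical; a practical network would be built in a tensor framework such as PyTorch.

```python
import math
import random
from typing import List, Sequence, Tuple


class TinyMLP:
    """Input layer -> one tanh hidden layer -> sigmoid output (read as a matching level)."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # random initialisation of all weights
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        # each hidden node is computed from the weighted inputs of the previous layer
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def train_step(self, x: Sequence[float], target: float, lr: float = 0.5) -> float:
        """One steepest-descent step on the binary cross-entropy; returns the loss before the step."""
        h, y = self.forward(x)
        loss = -(target * math.log(y + 1e-12) + (1.0 - target) * math.log(1.0 - y + 1e-12))
        dz = y - target  # gradient of the loss w.r.t. the output pre-activation
        for j, hj in enumerate(h):
            dh = dz * self.w2[j] * (1.0 - hj * hj)  # back-propagate through tanh (old w2)
            self.w2[j] -= lr * dz * hj
            self.b1[j] -= lr * dh
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dh * xi
        self.b2 -= lr * dz
        return loss


# Synthetic training mixtures: label 1 = spectra match, label 0 = they do not.
rng = random.Random(1)
data = []
for _ in range(200):
    data.append(([rng.uniform(0.0, 0.1) for _ in range(3)], 1.0))
    data.append(([rng.uniform(0.5, 1.0) for _ in range(3)], 0.0))

net = TinyMLP(n_in=3, n_hidden=4)
epoch_losses = []
for epoch in range(20):
    epoch_losses.append(sum(net.train_step(x, t) for x, t in data) / len(data))

print(round(net.forward([0.0, 0.0, 0.0])[1], 3), round(net.forward([0.9, 0.8, 1.0])[1], 3))
```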

Within the present context, machine learning may be considered to represent a computer-based algorithm that can improve automatically through experience and through the use of data, as exemplarily described above. In other words, machine learning algorithms build a model based on training data in order to make predictions or decisions without being explicitly programmed to do so. Procedures learned during the training phase of the algorithm may then be applied during the application phase and may even be extended to include additional parameters in the decision-making process.

Within the present context, a deep neural network consists of a large number of connected neurons organized in layers. Deep neural networks allow features to be learned automatically from training examples. In this regard, a neural network is considered to be "deep" if it has an input layer, an output layer and at least one hidden middle layer. Each node is calculated from the weighted inputs from multiple nodes in the previous layer. Put differently, during learning, the algorithm starts from a randomly initialized policy describing the mechanism of the deep neural network. The weights of all neurons may be considered to represent a specific mapping policy from an input state to an output state of the neural network. During training, the mapping policy may be altered, based on the fed information of an appropriate or inappropriate determination of the matching level, by adapting the weights of the neurons relative to each other. This leads to a modified output state of the neural network and may lead to an improved determination of the matching level. The accuracy of the matching level may be tested during the training phase since the training mixtures of molecule fragments are known per se. Accordingly, a powerful tool is achieved for determining the matching level also outside a training environment, such that the computational expenses may be kept low. Optionally, a generative adversarial network may be applied within the context of training a deep neural network. Generative adversarial networks may generally be used for different applications, such as semantic image editing, style transfer, image synthesis, image super-resolution and classification. Within the present context, the image may be a spectrum obtained from the experimental measurement of sample spectroscopic data or from the determination of data sets of expected spectroscopic data.
In this context, the generative adversarial network may learn by deriving backpropagation signals from a competitive process between two networks: the discriminator and the generator. The generator is a network that can generate images from random noise, whereas the discriminator is a discriminant network that can determine whether or not a given image belongs to a real distribution. The discriminator may receive an input image and produce a corresponding output representing the probability that the image belongs to a real image distribution. An output value of 1 indicates a high probability that the image belongs to the real image distribution, whereas an output of 0 indicates a fake image. Ideally, a value of 0.5 indicates that the discriminator is unable to differentiate between fake and real samples. During the training process of the generative adversarial network, the parameters of both the generator and the discriminator networks may be updated iteratively.
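The competitive objective described above is commonly written as a pair of binary cross-entropy losses. The following Python sketch assumes that standard formulation, which the source does not specify. `d_real` and `d_fake` are hypothetical discriminator outputs in (0, 1), one for a real spectrum and one for a generated spectrum.

```python
import math


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy for the discriminator: rewards D(real) -> 1 and D(fake) -> 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake: float) -> float:
    """Non-saturating generator loss: rewards fooling the discriminator, D(fake) -> 1."""
    return -math.log(d_fake)


# At the ideal equilibrium the discriminator outputs 0.5 for every input,
# so its loss settles at 2 * ln(2), roughly 1.386.
equilibrium_loss = discriminator_loss(0.5, 0.5)

# A confident, correct discriminator has a much lower loss,
# while the generator it is beating has a high loss.
confident_loss = discriminator_loss(0.9, 0.1)
beaten_generator_loss = generator_loss(0.1)
```

During training, the two losses would typically be minimized in alternation, each with respect to its own network's parameters. This corresponds to the iterative updating of both networks described above.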

By way of example, the neural network, which according to the present invention is used for at least one of the determining carried out in steps S2 and S3, may be trained on spectral data (labels). The atomic environment may be described by radial basis functions around each atom, using as descriptors a set of neighbor atoms within a fixed cut-off radius. These descriptors may be generated for various molecular configurations and fed into the neural network. The neural network consists of an input layer, an output layer and one or more hidden layers. Preferably, steepest descent optimization is employed to modify the weights entering each neuron's activation function until convergence towards the desired output (spectral feature recall) is achieved. The neural network may preferably be coded in a tensor-based framework (PyTorch). The skilled artisan is familiar with relevant publications relating to similar approaches, e.g. "ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost" (J.S. Smith, O. Isayev, A.E. Roitberg, published in Chemical Science 4, 2017 (https://pubs.rsc.org/en/content/articlelanding/2017/sc/c6sc05720a)).
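A minimal sketch of the descriptor step is given below, in the spirit of the radial symmetry functions used by ANI-type potentials. It makes the following assumptions, none of which come from the source:

- The radial basis functions are Gaussians `exp(-eta * (r - r_s)**2)` with a cosine cut-off.
- The centres `r_s`, the value of `eta`, the 5.2 Å cut-off and the water geometry are illustrative values only.
- Plain Python is used instead of PyTorch, to keep the sketch dependency-free.
- The "network" is reduced to a linear read-out fitted by steepest descent. This is a deliberate simplification of the multilayer network described above.

```python
import math


def cosine_cutoff(r: float, r_cut: float) -> float:
    """Smooth cut-off: 1 at r = 0, decays to 0 at r_cut, and is 0 beyond it."""
    if r >= r_cut:
        return 0.0
    return 0.5 * math.cos(math.pi * r / r_cut) + 0.5


def radial_descriptors(coords, centers, eta=4.0, r_cut=5.2):
    """One vector per atom: Gaussian radial basis functions summed over all
    neighbour atoms inside the cut-off sphere (radial terms only)."""
    descriptors = []
    for i, ri in enumerate(coords):
        g = [0.0] * len(centers)
        for j, rj in enumerate(coords):
            if i == j:
                continue
            r = math.dist(ri, rj)
            fc = cosine_cutoff(r, r_cut)
            for k, r_s in enumerate(centers):
                g[k] += math.exp(-eta * (r - r_s) ** 2) * fc
        descriptors.append(g)
    return descriptors


def steepest_descent(features, targets, lr=0.1, tol=1e-12, max_iter=100_000):
    """Fit linear weights w (targets ~ w . features) by steepest descent on the MSE."""
    n, d = len(features), len(features[0])
    w = [0.0] * d
    previous = math.inf
    for _ in range(max_iter):
        errors = [sum(wk * xk for wk, xk in zip(w, x)) - t
                  for x, t in zip(features, targets)]
        loss = sum(e * e for e in errors) / n
        if previous - loss < tol:  # converged: the loss no longer decreases
            break
        previous = loss
        gradient = [2.0 / n * sum(e * x[k] for e, x in zip(errors, features))
                    for k in range(d)]
        w = [wk - lr * gk for wk, gk in zip(w, gradient)]
    return w, loss


# Hypothetical water-like geometry in angstrom: O at the origin, two H atoms.
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (0.0, 0.96, 0.0)]
water_descriptors = radial_descriptors(water, centers=[0.9, 1.4, 2.0])
```

The two symmetry-equivalent hydrogen atoms receive identical descriptor vectors. This invariance is the reason descriptors of this type are preferred over raw Cartesian coordinates as network inputs.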

Preferably, the sample spectroscopic data are processed prior to step S3 using a statistical method to remove outliers. For example, the sample spectroscopic data may be flattened. Such statistical methods may also be applied to only part of the sample spectroscopic data, for example outside the energy ranges in which the spectra show spectral features such as peaks. Examples of suitable statistical methods include variance analysis, mutual information and Z-score. The dimensions of the sample spectroscopic data may also be reduced to the essential spectral features. The statistical analysis also makes it possible to identify common features or clusters in the data, for example by using scatter plots or principal component methods.
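The Z-score variant, applied only outside the spectral peaks, can be sketched as follows. The sketch assumes that the peak windows are known in advance (here they are supplied by hand). The 3-sigma cut is a common convention, not a value given in the source.

```python
import statistics


def remove_outliers(energies, intensities, peak_windows, z_max=3.0):
    """Drop points whose |z-score| exceeds z_max, but only outside the peak windows.

    The mean and standard deviation are estimated from the non-peak (baseline)
    points, so genuine spectral peaks are never discarded.
    """
    def in_peak(e):
        return any(lo <= e <= hi for lo, hi in peak_windows)

    baseline = [y for e, y in zip(energies, intensities) if not in_peak(e)]
    mean = statistics.fmean(baseline)
    std = statistics.pstdev(baseline)
    kept = []
    for e, y in zip(energies, intensities):
        z = (y - mean) / std if std > 0 else 0.0
        if in_peak(e) or abs(z) <= z_max:
            kept.append((e, y))
    return kept


# Toy spectrum: flat baseline, one spike at index 5, a real peak at 10-12.
energies = list(range(20))
intensities = [1.0] * 20
intensities[5] = 50.0
intensities[10:13] = [30.0, 40.0, 30.0]
cleaned = remove_outliers(energies, intensities, peak_windows=[(10, 12)])
```

A single outlier inflates the standard deviation it is judged against. As a result, its population z-score can never exceed sqrt(n - 1) for n baseline points, so a 3-sigma cut needs at least ten baseline points. Robust variants based on the median absolute deviation avoid this masking effect.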

Optionally, the sample spectroscopic data and the expected spectroscopic data each comprise at least one out of infrared, ultraviolet, nuclear magnetic resonance, mass, electron, Raman, and gas chromatography spectroscopic data. These techniques are powerful for identifying certain properties of generally unknown sample mixtures of molecule fragments. All these techniques may be applied to obtain and record respective sample spectroscopic data in view of a generally unknown sample mixture. Accordingly, the recorded sample spectra may be compared to data sets previously obtained and stored within the data storage device.

Alternatively or cumulatively, the expected spectroscopic data of an associated molecule fragment composition are determined using a deep neural network. Optionally, the deep neural network may determine the expected spectroscopic data in correspondence with the underlying sensing technique applied by the sensing device.

Moreover, if the expected spectroscopic data are determined using a deep neural network, this deep neural network may be the same one that is also applied for determining the matching levels. However, without limiting the present disclosure, different deep neural networks may also be applied in this regard.

In particular, the deep neural network, which may be applied for determining the expected spectroscopic data of an associated molecule fragment composition, may be trained based at least on the steps of:

SO-1 determining, using a first determination method, a ground state energy and a molecular geometry for at least one chemical formula of at least one molecule fragment;

SO-2 providing, as an input to a machine learning algorithm for training the deep neural network, the at least one chemical formula of the at least one molecule fragment;

SO-3 reproducing, using a second determination method, as an output of the machine learning algorithm for training the deep neural network the ground state energy and the molecular geometry for the at least one chemical formula of the at least one molecule fragment, the second determination method being different from the first determination method.

Within the present context, the chemical formula of a molecule fragment may be considered to represent a sum formula of the respective molecule fragment. Put differently, the chemical formula may specify the relative frequency of atoms of different elements within that particular molecule fragment.
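If the chemical formula is treated as a sum formula, the input for a molecule fragment can be reduced to element counts. The sketch below is an illustrative assumption about how such an input might be prepared. It handles flat formulas only (no parentheses, charges or isotopes) and uses glycerol, C3H8O3, as an example of a small polyol.

```python
import re
from collections import Counter
from fractions import Fraction

_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")
_FLAT_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def parse_sum_formula(formula: str) -> dict:
    """Count atoms per element in a flat sum formula such as 'C3H8O3'."""
    if not _FLAT_FORMULA.fullmatch(formula):
        raise ValueError(f"not a flat sum formula: {formula!r}")
    counts = Counter()
    for element, number in _ELEMENT.findall(formula):
        counts[element] += int(number) if number else 1
    return dict(counts)


def relative_frequencies(formula: str) -> dict:
    """Fraction of all atoms in the fragment contributed by each element."""
    counts = parse_sum_formula(formula)
    total = sum(counts.values())
    return {element: Fraction(n, total) for element, n in counts.items()}


glycerol = parse_sum_formula("C3H8O3")  # a simple triol
glycerol_fractions = relative_frequencies("C3H8O3")
```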

Within the present context, the ground state energy may be considered to represent the potential energy level of the particular molecule fragment when fully de-excited within an environment in which external interactions with the molecule fragment may be neglected (state at absolute zero temperature, also called vacuum state).

Within the present context, the molecular geometry may be considered to represent the relative positions of the atoms of the respective molecule fragment with regard to each other. The determination methods may generally be considered different mathematical models or algorithms, which may be applied, for example, to obtain the ground state energy and the molecular geometry of a molecule fragment. Such determination methods may show varying precision and may cause different computational costs depending on the underlying modelling technique. For example, a first determination method may be based on electron densities, whereas a second determination method may be based on simulated interactions. This may generally lead to (slightly) different results at different computational expenses. However, both the first and the second determination method may be considered to represent techniques capable of obtaining the ground state energy and the molecular geometry of a particular molecule fragment at sufficient (predetermined) precision.

Accordingly, a training procedure is provided by which the deep neural network reproduces the ground state energy and the molecular geometry of a particular molecule fragment as determined independently of the deep neural network. Put differently, machine learning is applied to train the deep neural network to adapt its determination process to an independent determination procedure. In other words, the deep neural network is supposed to adapt its determination procedure such that its results imitate those obtained with the independent approach.

This training procedure generally enables the deep neural network to be applied to obtain the expected spectroscopic data in view of different molecule fragments in an automated fashion since the ground state energy and the molecular geometry of a particular molecule fragment are key ingredients to obtain appropriate expected spectroscopic data.

Notably, the molecular geometry generally influences the ground state energy. Consider an organic molecule fragment with several carbon atoms arranged as a chain and comprising multiple functional groups: its ground state energy generally depends on the locations of the functional groups. For example, if the functional groups are coupled to the same carbon atom at an end of the chain, additional interaction effects may occur. The ground state energy of the molecule fragment with this specific molecular geometry may then be higher than for a geometry in which the functional groups are distributed along the carbon chain. The deep neural network may take these dependencies into account during training.

Optionally, the first determination method may comprise at least a density functional theory (DFT)-based algorithm. Density functional theory-based algorithms may be applied to many-electron systems in order to determine the quantum mechanical ground state of the system under test. This may be achieved without directly solving the many-electron Schrödinger equation, which generally describes the quantum mechanical state and its temporal development (presuming nonrelativistic systems). Even without solving the Schrödinger equation in this way, fundamental properties such as the ground state energy and the molecular geometry of a molecule fragment may be determined using the DFT-based algorithm. Compared to other approaches, DFT-based algorithms already provide significant computational savings. Optionally, the DFT-based algorithm in step SO-1 may be applied by the same deep neural network that is trained to reproduce the DFT-based results using the second determination method.

Preferably, the second determination method comprises at least a force field method. Within the force field method, the potential energy is parametrized. On this basis, force fields are established which are mediated by the chemical bonds of the molecule fragment under test. Additional interactions not mediated by the chemical bonds may also be included in the model. The resulting forces alter the chemical bonds, which directly influences the ground state energy; the ground state energy may therefore be derived from the force field as well. From this, the molecular geometry may be determined precisely. Compared to DFT-based algorithms, the computational expenses of force field methods are greatly reduced. However, how well a molecule fragment under test is modelled depends crucially on how well the chosen force field parameters fit the properties of the respective molecule fragment.
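A force field of this kind can be written as a sum of bonded and non-bonded terms. The sketch below assumes the simplest common choices: a harmonic bond-stretching term and a Lennard-Jones term for non-bonded pairs, with a single parameter set shared by all atoms. Real force fields add angle, torsion and electrostatic terms and use parameters specific to each atom type. All parameter values here are hypothetical.

```python
import math


def harmonic_bond_energy(coords, bonds, k, r0):
    """Bonded term: sum over bonded atom pairs of k * (r - r0)**2."""
    return sum(k * (math.dist(coords[i], coords[j]) - r0) ** 2 for i, j in bonds)


def lennard_jones_energy(coords, pairs, epsilon, sigma):
    """Non-bonded term: 4 * eps * ((sigma/r)**12 - (sigma/r)**6) per atom pair."""
    energy = 0.0
    for i, j in pairs:
        sr6 = (sigma / math.dist(coords[i], coords[j])) ** 6
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
    return energy


def force_field_energy(coords, bonds, nonbonded_pairs, params):
    """Total parametrized potential energy; params holds k, r0, epsilon, sigma."""
    return (harmonic_bond_energy(coords, bonds, params["k"], params["r0"])
            + lennard_jones_energy(coords, nonbonded_pairs,
                                   params["epsilon"], params["sigma"]))


# Hypothetical water-like geometry (angstrom) and parameter set.
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (0.0, 0.96, 0.0)]
params = {"k": 450.0, "r0": 0.96, "epsilon": 0.05, "sigma": 1.5}
energy = force_field_energy(water, bonds=[(0, 1), (0, 2)],
                            nonbonded_pairs=[(1, 2)], params=params)
```

The energy depends only on the parameters and the geometry. This is why the quality of the model, as noted above, hinges on how well `k`, `r0`, `epsilon` and `sigma` suit the fragment in question.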

Preferably, the deep neural network may vary the force field parameters of a force field method applied in step SO-3 over repetitions of that step, in order to train the deep neural network. The variation continues until a confidence level reaches or exceeds a predetermined confidence threshold. This confidence level compares the reproduced ground state energy and molecular geometry with the ground state energy and molecular geometry determined in step SO-1 for the at least one chemical formula of the at least one respective molecule fragment. In other words, in step SO-1 a DFT-based algorithm may be applied to obtain the ground state energy and the molecular geometry of a particular molecule fragment. However, the DFT-based algorithm still causes substantial computational expenses. Thus, in step SO-3 a force field method may be applied to reproduce the DFT-based ground state energy and molecular geometry of the molecule fragment. In this regard, machine learning is applied to adapt the underlying force field parameters of the force field method such that the reproduction is achieved at a satisfactory confidence level. Since the force field method crucially depends on the force field parameters applied to the molecule fragment under inspection, such training may efficiently reduce deviations arising from the different determination methods. Accordingly, the mapping policy of the deep neural network may be adapted during the machine learning process such that the results obtained by the force field method reproduce the DFT-based results at a desired accuracy. This desired accuracy corresponds to the predetermined confidence threshold. Put differently, once the deep neural network has been sufficiently trained by machine learning, it has adapted the intrinsic mapping policy of its neurons.
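The iterative variation of force field parameters against a reference can be sketched as a simple optimisation loop. The sketch rests on several assumptions:

- The DFT reference is replaced by synthetic energies of a single bond.
- The force field is a single harmonic bond with parameters `k` and `r0`.
- The confidence level is taken to be the coefficient of determination R².
- The deep neural network that varies the parameters is replaced by a gradient-free compass (pattern) search.

It only illustrates the loop structure of steps SO-1 to SO-3.

```python
def harmonic_energy(r: float, k: float, r0: float) -> float:
    """Force field stand-in: a single harmonic bond."""
    return k * (r - r0) ** 2


def confidence(params, distances, reference):
    """Hypothetical confidence level: R^2 of force field vs. reference energies."""
    k, r0 = params
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((harmonic_energy(r, k, r0) - e) ** 2
                 for r, e in zip(distances, reference))
    ss_tot = sum((e - mean_ref) ** 2 for e in reference)
    return 1.0 - ss_res / ss_tot


def fit_force_field(distances, reference, start, steps, threshold=0.99, max_iter=10_000):
    """Vary (k, r0) by compass search until confidence >= threshold (or give up)."""
    params, steps = list(start), list(steps)
    best = confidence(params, distances, reference)
    for _ in range(max_iter):
        if best >= threshold:
            break
        improved = False
        for i in range(len(params)):
            for sign in (1.0, -1.0):
                trial = params.copy()
                trial[i] += sign * steps[i]
                level = confidence(trial, distances, reference)
                if level > best:
                    params, best, improved = trial, level, True
                    break
        if not improved:  # no move helped: refine the search pattern
            steps = [s / 2.0 for s in steps]
    return tuple(params), best


# Synthetic reference energies (a stand-in for DFT): k = 16, r0 = 1.0.
distances = [0.9, 0.95, 1.0, 1.05, 1.1]
reference = [16.0 * (r - 1.0) ** 2 for r in distances]
fitted, level = fit_force_field(distances, reference, start=(10.0, 1.05),
                                steps=(1.0, 0.01), threshold=0.999)
```

The returned parameters play the role of the "confidence force field parameters". These are the values kept once the threshold is reached, for later use on other fragments.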
The force field parameters are then chosen such that the ground state energy and the molecular geometry of the respective molecule fragment are reproduced at acceptable confidence. Subsequently, the deep neural network may also be applied outside the training environment using the mapping policy obtained during the training procedure. Of course, the mapping policy may generally also be further adapted during the application phase of the deep neural network.

Optionally, the method may further comprise at least the step of determining, using the second determination method, expected spectroscopic data for each of a plurality of molecule fragment compositions. Here, for each chemical formula of a respective molecule fragment composition, those specific force field parameters (confidence force field parameters) are used for which the predetermined confidence threshold is reached or exceeded, as described above.

This means that the trained deep neural network is applied to systems of higher complexity. A molecule fragment composition is now provided to the deep neural network. Each molecule fragment composition comprises different molecule fragments, each having a chemical formula. Subsequently, the deep neural network is applied to determine the expected spectroscopic data, which may generally be obtained from the ground state energy and the molecular geometry of a respective molecule fragment. Since the deep neural network has been trained accordingly, it generally knows the force field parameters for which the predetermined confidence threshold was achieved with respect to the DFT-based results. These so-called confidence force field parameters are now applied to determine the expected spectroscopic data of a respective molecule fragment. This procedure is repeated for all molecule fragments included in a particular molecule fragment composition. As a result, overall expected spectroscopic data may be determined for the entire molecule fragment composition in question. In this regard, the overall expected spectroscopic data of a molecule fragment composition may, for example, represent the superposition of the individual expected spectroscopic data obtained for the individual molecule fragments. Such a superposition may be assumed, for example, if interactions between the individual molecule fragments do not influence the recording of the sample spectroscopic data by the sensing device. In other words, if the individual molecule fragments do not interact with each other and the recorded spectra remain uninfluenced, a superposition may be applied. Alternatively, determining the overall expected spectroscopic data of a particular molecule fragment composition may require more sophisticated techniques for combining the individual expected spectroscopic data of the individual molecule fragments.
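If the fragments do not interact, the superposition reduces to a weighted sum of the fragment spectra. The weights are the relative amounts of the fragments in the composition. In the sketch below, the Gaussian line shape, the wavenumber grid, the peak positions and the fragment names are hypothetical.

```python
import math


def gaussian_spectrum(grid, peaks, width=10.0):
    """Broaden (position, intensity) stick peaks with Gaussians on a common grid."""
    return [sum(h * math.exp(-0.5 * ((x - p) / width) ** 2) for p, h in peaks)
            for x in grid]


def superpose(composition, fragment_spectra):
    """Overall expected spectrum as the amount-weighted sum of fragment spectra.

    Valid only if the fragments do not interact in a way that changes the
    recorded spectrum (the non-interacting case).
    """
    length = len(next(iter(fragment_spectra.values())))
    total = [0.0] * length
    for fragment, amount in composition.items():
        for i, y in enumerate(fragment_spectra[fragment]):
            total[i] += amount * y
    return total


# Hypothetical IR-like example: wavenumber grid in cm^-1 and two fragments.
grid = [float(x) for x in range(2800, 3601, 4)]
spectra = {
    "fragment_A": gaussian_spectrum(grid, [(2900.0, 1.0)]),
    "fragment_B": gaussian_spectrum(grid, [(3400.0, 0.8)]),
}
mixture = superpose({"fragment_A": 0.25, "fragment_B": 0.75}, spectra)
```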

After the expected spectroscopic data have been determined for a first molecule fragment composition, this procedure is repeated for the remaining molecule fragment compositions of the plurality of molecule fragment compositions. Here, the large computational savings and the high level of automation provide substantial advantages over the prior art. Both advantages are achieved by the deep neural network, which performs the determination procedure in an automated fashion. Furthermore, the deep neural network applies the force field method, which strongly reduces the computational expenses.

Preferably, data sets of data tuples are stored in the data storage device. Each data tuple comprises at least a specific molecule fragment composition and the expected spectroscopic data determined for it. Accordingly, a database is established that enables an efficient comparison of sample spectroscopic data with the expected spectroscopic data associated with each molecule fragment composition. Thus, by comparing the sample data to the expected data, conclusions may be drawn as to whether or not the mixture under test corresponds to the molecule fragment composition associated with the expected spectroscopic data.
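One possible layout for such data tuples is sketched below. The field names, the JSON serialization and the example values are assumptions made for illustration. The source only requires that each tuple pair a molecule fragment composition with its expected spectroscopic data.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DataTuple:
    """A molecule fragment composition together with its expected spectrum."""
    composition: tuple  # ((fragment_name, relative_amount), ...)
    spectrum: tuple     # expected intensities on a shared grid


def to_json(data_set) -> str:
    """Serialize a data set (list of DataTuple) for the data storage device."""
    return json.dumps([{"composition": [list(pair) for pair in t.composition],
                        "spectrum": list(t.spectrum)} for t in data_set])


def from_json(text: str) -> list:
    """Restore a data set previously written by to_json."""
    return [DataTuple(tuple((name, amount) for name, amount in record["composition"]),
                      tuple(record["spectrum"]))
            for record in json.loads(text)]


data_set = [
    DataTuple((("fragment_A", 0.25), ("fragment_B", 0.75)), (0.1, 0.9, 0.4)),
    DataTuple((("fragment_A", 1.0),), (0.4, 0.2, 0.0)),
]
restored = from_json(to_json(data_set))
```

Freezing the dataclass keeps stored entries immutable. Tuples rather than lists are used so that entries remain hashable and compare by value.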

Alternatively or cumulatively, the molecule fragment compositions of the plurality of molecule fragment compositions differ from each other in the relative amounts of molecule fragments and/or in the molecule fragment types. This means that the different molecule fragment compositions may at least partially comprise different ingredients and/or different fractional ratios of the ingredients. Both aspects may generally lead to different expected spectroscopic data. Hence, a large database of possible molecule fragment compositions and their associated expected spectroscopic data may be established. The database may even comprise information on molecule fragment compositions that differ from each other in multiple aspects. Optionally, the molecule fragments comprise at least polyol molecule fragments.
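A candidate set of compositions that differ in fragment types and/or relative amounts can be enumerated systematically. The sketch below assumes a fixed grid of amounts (multiples of 1/steps that sum to one) and a limit on the number of distinct fragment types per composition. The fragment type names are hypothetical placeholders for polyol fragments.

```python
from fractions import Fraction
from itertools import combinations, product


def enumerate_compositions(fragment_types, max_components=2, steps=4):
    """Compositions that differ in fragment types and/or relative amounts.

    Each composition uses 1..max_components distinct fragment types whose
    relative amounts are positive multiples of 1/steps summing to one.
    """
    compositions = []
    for n in range(1, max_components + 1):
        for types in combinations(fragment_types, n):
            for parts in product(range(1, steps + 1), repeat=n):
                if sum(parts) == steps:
                    amounts = [Fraction(p, steps) for p in parts]
                    compositions.append(tuple(zip(types, amounts)))
    return compositions


# Hypothetical polyol fragment types.
library = enumerate_compositions(["PEG", "PPG", "polycarbonate_polyol"],
                                 max_components=2, steps=4)
```

The number of candidates grows combinatorially with `max_components` and `steps`. In practice, the grid would therefore have to be chosen with the available computational budget in mind.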

Within the present context, a polyol may be considered an organic molecule comprising at least two alcoholic functional groups. Preferably, the polyol comprises multiple alcoholic functional groups. Preferably, the polyol represents a polymeric polyol. For example, the polyol may comprise polyether polyols such as polyethylene oxide, polyethylene glycol or polypropylene glycol, or polyester, polycarbonate or acrylic polyols. The alcoholic functional groups may be arranged in various different ways within the underlying organic molecule, which may lead to various different molecule fragments. At the same time, precise determination of the sample mixture is highly relevant for subsequent treatment procedures of such mixtures. For example, the quality of chemical processes crucially depends on the purity of the treated substances. Such purities may be efficiently inspected by applying the present method.

According to another aspect, a data processing circuit comprising means for carrying out at least the steps S2 to S4 of the method as described herein is provided. In particular, the data processing circuit may also comprise means for carrying out the steps SO-1 to SO-3 as described herein.

According to a different aspect, a system is provided. The system comprises at least a sensing device, a data processing circuit, a data storage device, and an interface. The data processing circuit is coupled to the sensing device, the data storage device, and the interface. The data storage device may optionally be internal to the data processing circuit. The sensing device is configured to carry out step S1 as described before. The data processing circuit is configured to carry out at least steps S2 to S4 of the method as described before.

According to yet another aspect, a computer program comprising instructions which, when the program is executed by a computer, cause the computer to carry out at least the steps S2 to S4 of the method as described herein is provided. In particular, the computer program may also comprise instructions which, when the program is executed by a computer, cause the computer to carry out at least the steps SO-1 to SO-3 as described herein.

According to an additional aspect, a computer-readable medium is provided. The computer-readable medium comprises instructions which, when executed by a computer, cause the computer to carry out at least the steps S2 to S4 of the method as described herein. In particular, the computer-readable medium may also comprise instructions which, when executed by a computer, cause the computer to carry out at least the steps SO-1 to SO-3 as described herein.

The foregoing aspects and further advantages of the claimed subject matter will become more readily appreciated, as the same become better understood by reference to the following detailed description when taken in conjunction with the accompanying drawings. In the drawings,

- Fig. 1 is a schematic drawing of a method of determining a composition of molecule fragments; and

- Fig. 2 is a schematic drawing of a system for carrying out the method.

The detailed description set forth below in connection with the appended drawings, where like numerals reference like elements, is intended as a description of various embodiments of the disclosed subject matter and is not intended to represent the only embodiments. Each embodiment described in this disclosure is provided merely as an example or illustration and should not be construed as preferred or advantageous over other embodiments. The illustrative examples provided herein are not intended to be exhaustive or to limit the claimed subject matter to the precise forms disclosed. Various modifications to the described embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the described embodiments. Thus, the described embodiments are not limited to the embodiments shown but are to be accorded the widest scope consistent with the principles and features disclosed herein.

All of the features disclosed hereinafter with respect to the example embodiments and/or the accompanying figures can alone or in any subcombination be combined with features of the aspects of the present disclosure including features of preferred embodiments thereof, provided the resulting feature combination is reasonable to a person skilled in the art.

For the purposes of the present disclosure, the phrase "at least one of A, B, and C", for example, means (A), (B), (C), (A and B), (A and C), (B and C), or (A, B, and C), including all further possible permutations when greater than three elements are listed. In other words, the term "at least one of A and B" generally means "A and/or B", namely "A" alone, "B" alone or "A and B".

Figure 1 is a schematic drawing of a method 10 of determining a composition of molecule fragments. The method 10 comprises mandatory and optional steps. Optional steps are shown in dashed lines and may individually or in (sub)combinations be combined with the mandatory steps of the method 10.

The method 10 comprises the step 12 of recording, using at least one sensing device, sample spectroscopic data of a sample mixture of molecule fragments under test. The sample mixture of the molecule fragments may, for example, be provided after having been analyzed by means of a gas spectrometry setup. This setup may ionize the formerly unfragmented molecules such that the excited molecules break up into several molecule fragments. Accordingly, the exact ingredients of the sample mixture of molecule fragments under test are generally unknown, and the mixture may need to be further characterized in order to identify it. The sensing device is utilized to detect corresponding sample spectroscopic data by means of a suitable measurement technique, such as infrared spectroscopy or nuclear magnetic resonance spectroscopy.

The method 10 further comprises a step 34 of determining expected spectroscopic data. The expected spectroscopic data may be determined using a deep neural network 20 that is trained based on a machine learning process. The deep neural network 20 may be trained in a specific fashion so as to be tailored in view of the expected spectroscopic data to be provided. In other words, the deep neural network 20 may be particularly trained such that the expected spectroscopic data may be provided at high confidence and at low computational expenses.

Subsequently, the method 10 comprises the step 14 of determining matching levels between the sample spectroscopic data and expected spectroscopic data of data sets stored in a data storage device. The data sets each comprise at least one data tuple of expected spectroscopic data and a molecule fragment composition associated thereto. In other words, it is checked which stored expected spectroscopic data the sample spectroscopic data fit. In this regard, regression algorithms may be applied, in particular trained regression algorithms, optionally executed by a deep neural network. The regression algorithm may be trained on known sample mixtures to achieve a fitting procedure that determines appropriate matching levels. After the training phase, the regression algorithm may be applied in an automated fashion whenever the at least one sensing device provides sample spectroscopic data.
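A minimal example of a regression-based matching level is sketched below. It fits the sample spectrum as a scaled copy of one stored expected spectrum (least squares through the origin) and reports the coefficient of determination R² of that fit. This untrained, one-parameter regression is an assumption. It stands in for the trained regression algorithm or deep neural network mentioned above. Both spectra are assumed to share one grid.

```python
def matching_level(sample, expected):
    """Least-squares fit sample ~ a * expected; return (a, R^2).

    R^2 close to 1 means the expected spectrum explains the sample well;
    it can become negative for a poor match. The sample must not be constant.
    """
    see = sum(e * e for e in expected)
    if see == 0:
        raise ValueError("expected spectrum is all zeros")
    a = sum(s * e for s, e in zip(sample, expected)) / see
    mean_s = sum(sample) / len(sample)
    ss_res = sum((s - a * e) ** 2 for s, e in zip(sample, expected))
    ss_tot = sum((s - mean_s) ** 2 for s in sample)
    return a, 1.0 - ss_res / ss_tot


expected = [0.0, 1.0, 4.0, 1.0, 0.0]
scale, level = matching_level([0.0, 2.0, 8.0, 2.0, 0.0], expected)
```

The fitted scale factor absorbs differences in overall intensity, for example due to sample concentration. As a result, the matching level reflects only the spectral shape.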

The method 10 also comprises the step 16 of outputting, preferably at an interface, data identifying the molecule fragment composition of a particular data set. This is the data set whose expected spectroscopic data have a matching level, determined in step 14, that reaches or exceeds a predetermined matching threshold. Simply put, the regression algorithm evaluates a "best fit" between the sample spectroscopic data and the expected spectroscopic data in step 14, and the predetermined matching threshold specifies when a "best fit" is considered to be achieved. Each data tuple includes the underlying molecule fragment composition along with the expected spectroscopic data. A user or a downstream processing circuit may therefore be notified, generally at the interface, which molecule fragment composition the sample mixture is determined to represent: the one whose expected spectroscopic data reach or exceed the predetermined matching threshold. Consequently, method 10 enables the previously unknown sample mixture of molecule fragments to be identified in an automated and reliable fashion at low computational expenses.
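Step 16 can be sketched as selecting the best-matching stored data tuple and reporting its composition only if the threshold is reached. The following are assumptions for illustration:

- cosine similarity as the matching level;
- a threshold of 0.95;
- the toy data.

Any other matching measure, such as a trained regression, could be passed in via `match` instead.

```python
import math


def cosine_match(sample, expected):
    """Cosine similarity of two spectra on the same grid (1.0 = identical shape)."""
    dot = sum(s * e for s, e in zip(sample, expected))
    norm = math.sqrt(sum(s * s for s in sample)) * math.sqrt(sum(e * e for e in expected))
    return dot / norm if norm else 0.0


def identify_composition(sample, data_sets, threshold=0.95, match=cosine_match):
    """Return (composition, matching level) of the best-matching data tuple if the
    level reaches or exceeds the threshold; otherwise return None."""
    best = None
    for composition, expected in data_sets:
        level = match(sample, expected)
        if best is None or level > best[1]:
            best = (composition, level)
    if best is not None and best[1] >= threshold:
        return best
    return None


# Toy data sets: (composition label, expected spectrum) pairs on a shared grid.
data_sets = [
    ("composition_A", [0.0, 1.0, 4.0, 1.0, 0.0]),
    ("composition_B", [4.0, 1.0, 0.0, 0.0, 0.0]),
]
result = identify_composition([0.0, 2.0, 8.0, 2.0, 0.0], data_sets)
no_match = identify_composition([1.0, 1.0, 1.0, 1.0, 1.0], data_sets, threshold=0.99)
```

Returning `None` when no stored spectrum reaches the threshold lets the caller report "no identification" explicitly, rather than outputting a poor best guess.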

Optionally, the method 10 may also comprise the step 18 of processing the sample spectroscopic data prior to step 14 using a statistical method to remove outliers. The removal of outliers may only partially be performed within the dataset of the sample spectroscopic data, such as outside spectral features (peaks). Removing outliers may enable the determination of the matching levels to be achieved at greatly enhanced confidence.

The method 10 may be further developed in view of the expected spectroscopic data. These may be determined using a deep neural network 20. Optionally, the deep neural network 20 is trained in a specific fashion so as to be tailored in view of the expected spectroscopic data to be provided. In other words, the deep neural network 20 is particularly trained such that the expected spectroscopic data may be provided at high confidence and at low computational expenses.

In this regard, in optional step 22, a ground state energy and a molecular geometry for at least one chemical formula of at least one molecule fragment may be determined using a first determination method. In particular, the first determination method may comprise a DFT-based algorithm. In this way, characteristics of the respective molecule fragment are determined from which the expected spectroscopic data may generally be obtained.

Subsequently, the method 10 may comprise the optional step 24 of providing, as an input to a machine learning algorithm for training the deep neural network 20, the at least one chemical formula of the at least one molecule fragment. Hence, the machine learning algorithm is fed with basic parameters concerning the molecule fragment in question.

The method 10 may proceed with the optional step 26 of reproducing, using a second determination method, as an output of the machine learning algorithm for training the deep neural network 20 the ground state energy and the molecular geometry for the at least one chemical formula of the at least one molecule fragment determined in step 22. The second determination method is different from the first determination method. In particular, the second determination method may comprise a force field method.

The arrow 28 emphasizes that the aim of step 26 is to reproduce the exact ground state energy and molecular geometry of the molecule fragment in question, which have also been determined in step 22, but using a different determination method. In other words, the intention is to provide a deep neural network and to train it to obtain a determination method with lower computational expenses than the first determination method. This is possible because the second determination method may cause lower computational expenses than the first determination method while still reliably reproducing its results.

The method 10 may also comprise the optional step 30 of repeating step 26 during the training procedure. In this case, the deep neural network 20 varies the force field parameters of the force field method over the repetitions of step 26, in order to train the deep neural network 20. The variation continues until a confidence level reaches or exceeds a predetermined confidence threshold. This confidence level compares the reproduced ground state energy and molecular geometry with the ground state energy and molecular geometry determined in step 22 for the at least one chemical formula of the at least one respective molecule fragment. Thus, the training continues until the precision of the second determination method, obtained by varying the underlying force field parameters, is considered to be sufficient. Hence, reaching the predetermined confidence threshold may be regarded as indicating that the differently determined ground state energies and molecular geometries agree sufficiently. Therefore, utilizing the training procedure, sufficient or even similar precision may be achieved when determining the ground state energy and the molecular geometry of the molecule fragment in question.

Subsequently, as mentioned above, the method 10 comprises the step 34 of determining expected spectroscopic data for each of a plurality of molecule fragment compositions, optionally using the second determination method. Here, for each chemical formula of a respective molecule fragment composition, those specific confidence force field parameters are used for which the predetermined confidence threshold is reached or exceeded in step 30. Each of the molecule fragment compositions comprises chemical formulas of a plurality of molecule fragments. In other words, after the predetermined confidence threshold has been achieved, the respective force field parameters (so-called confidence force field parameters) may be extracted by the deep neural network, as indicated by arrow 36. Subsequently, the expected spectroscopic data may be determined for each molecule fragment based on these confidence force field parameters. Hence, appropriate expected spectroscopic data are obtained for each molecule fragment of a particular molecule fragment composition. This procedure is repeated for the various different compositions such that expected spectroscopic data may be obtained for a plurality of different molecule fragment compositions. In this regard, the molecule fragment compositions of the plurality may differ from each other in the relative amounts of molecule fragments and/or in the molecule fragment types.

Afterwards, the method 10 may comprise the optional step 37 of storing data sets of data tuples in the data storage device. Each data tuple comprises at least a specific molecule fragment composition and the expected spectroscopic data determined for it in step 34. Hence, the determined expected spectroscopic data are provided in a usable fashion for further application. For the different molecule fragment compositions, a large database may be set up comprising data tuples of the expected spectroscopic data and the respective molecule fragment composition associated thereto.

In particular, the stored information may be used in step 14, as indicated by arrow 38, where the matching levels may be determined with respect to the expected spectroscopic data stored in the data storage device.
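The claims provide that the matching levels may be determined based on a regression algorithm, without naming one. A minimal sketch, assuming a simple linear least-squares fit of the sample spectrum onto each stored expected spectrum, with the coefficient of determination R² serving as the matching level, could look as follows. The function names, the sample values and the threshold of 0.95 are hypothetical.

```python
from typing import Dict, List, Tuple

Spectrum = List[float]


def matching_level(sample: Spectrum, expected: Spectrum) -> float:
    """R^2 of the least-squares fit sample ~ a * expected + b (ASSUMED matching measure)."""
    n = len(sample)
    mean_x = sum(expected) / n
    mean_y = sum(sample) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    syy = sum((y - mean_y) ** 2 for y in sample)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, sample))
    if sxx == 0 or syy == 0 or sxy <= 0:
        return 0.0  # flat or anti-correlated spectra are not treated as a match
    return sxy * sxy / (sxx * syy)


def matching_compositions(sample: Spectrum,
                          data_tuples: List[Tuple[Dict[str, float], Spectrum]],
                          threshold: float) -> List[Tuple[float, Dict[str, float]]]:
    """Compositions whose matching level reaches or exceeds the threshold, best first."""
    scored = [(matching_level(sample, spectrum), comp) for comp, spectrum in data_tuples]
    hits = [(level, comp) for level, comp in scored if level >= threshold]
    hits.sort(key=lambda item: item[0], reverse=True)
    return hits


database = [
    ({"A": 1.0}, [1.0, 0.0, 0.5]),
    ({"B": 1.0}, [0.0, 1.0, 0.5]),
    ({"A": 0.5, "B": 0.5}, [0.5, 0.5, 0.5]),
]
sample = [2.1, 0.1, 1.1]  # hypothetical recorded spectrum
hits = matching_compositions(sample, database, threshold=0.95)
```

Because the fit includes a scale factor and an offset, this measure is insensitive to overall intensity and to a constant baseline, which is one reason a regression-based matching level may be preferred over a direct point-by-point comparison.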

Since the expected spectroscopic data are determined using a deep neural network 20, they may be provided in an automated fashion. Furthermore, the computational expense is reduced, since the deep neural network applies a computationally inexpensive determination method (the force field method). Nevertheless, because the deep neural network is appropriately trained using machine learning, the precision is sufficient and may even be comparable to that of more sophisticated determination methods (such as DFT-based algorithms). In addition, the matching levels may also be evaluated in an automated fashion, optionally also using a deep neural network, and further optionally using the same deep neural network that is used for determining the expected spectroscopic data.
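The idea of varying force field parameters until the reproduced results agree with a more expensive reference method to within a predetermined confidence threshold can be illustrated with a deliberately simplified sketch. It assumes a one-term harmonic bond-stretch force field, compares energies only (not the full ground state energy and molecular geometry), uses 1 - SSE/SST as the confidence level, and uses a plain grid search in place of the deep neural network that varies the parameters in the described method. The reference energies are hypothetical values standing in for a DFT scan.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def harmonic_energy(r: float, k: float, r0: float) -> float:
    """Toy force field: harmonic bond-stretch energy k * (r - r0)^2."""
    return k * (r - r0) ** 2


def confidence_level(reproduced: Sequence[float], reference: Sequence[float]) -> float:
    """1 - SSE/SST: equals 1.0 for perfect agreement, penalises scale and offset errors."""
    mean_ref = sum(reference) / len(reference)
    sse = sum((p - q) ** 2 for p, q in zip(reproduced, reference))
    sst = sum((q - mean_ref) ** 2 for q in reference)
    return 1.0 - sse / sst


def fit_force_field(distances: Sequence[float], reference_energies: Sequence[float],
                    k_values: Sequence[float], r0_values: Sequence[float],
                    threshold: float) -> Optional[Tuple[float, float, float]]:
    """Vary (k, r0) until the confidence level reaches or exceeds the threshold.

    Returns (k, r0, confidence) of the first parameter set that does so, i.e. the
    'confidence force field parameters', or None if no candidate qualifies.
    """
    for k, r0 in product(k_values, r0_values):  # grid search stands in for the network
        reproduced = [harmonic_energy(r, k, r0) for r in distances]
        level = confidence_level(reproduced, reference_energies)
        if level >= threshold:
            return k, r0, level
    return None


# Hypothetical reference energies standing in for a DFT scan of one bond length.
distances = [0.5, 0.75, 1.0, 1.25, 1.5]
reference = [0.5, 0.125, 0.0, 0.125, 0.5]
result = fit_force_field(distances, reference, [1.0, 1.5, 2.0, 2.5], [0.9, 1.0, 1.1],
                         threshold=0.99)
```

Once a parameter set passes the threshold, only the cheap force field expression has to be evaluated for further molecule fragment compositions, which is the source of the computational savings mentioned above.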

Figure 2 is a schematic drawing of a system 40 for carrying out the method 10. There is a sample mixture 42 of molecule fragments. The constituents of the sample mixture 42, and hence its composition, are generally unknown.

Optionally, and without limiting the present disclosure, the molecule fragments may comprise at least polyol molecule fragments.

A sensing device 44 is used to obtain sample spectroscopic data of the sample mixture 42 under test. For example, the sensing device 44 may be configured to use at least one of an infrared, an ultraviolet, a nuclear magnetic resonance, a mass, a Raman, and a gas chromatography spectroscopy technique and to obtain the corresponding data. In effect, the sensing device 44 is configured to carry out at least step 12 of the method 10.
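The claims further provide that the sample spectroscopic data may be processed with a statistical method to remove outliers before the matching levels are determined, without naming the method. A minimal sketch, assuming that the sensing device records several replicate spectra on a common grid, drops outlying values at each grid point using the modified z-score of Iglewicz and Hoaglin (cutoff 3.5), and averages the remaining values. The replicate set-up, the cutoff and the function names are assumptions for illustration.

```python
from statistics import mean, median
from typing import List, Sequence


def modified_z_scores(values: Sequence[float]) -> List[float]:
    """Modified z-scores (x - median) / scale with a MAD-based scale (Iglewicz and Hoaglin)."""
    med = median(values)
    deviations = [abs(v - med) for v in values]
    mad = median(deviations)
    if mad > 0:
        scale = mad / 0.6745
    else:
        # Fallback when more than half of the values coincide: mean absolute deviation.
        scale = 1.253314 * mean(deviations)
    if scale == 0:
        return [0.0] * len(values)  # all values identical, nothing to flag
    return [(v - med) / scale for v in values]


def robust_average(replicates: List[List[float]], cutoff: float = 3.5) -> List[float]:
    """Average replicate spectra point by point after dropping outlying values."""
    averaged = []
    for values in zip(*replicates):  # all replicate readings at one spectral point
        scores = modified_z_scores(values)
        kept = [v for v, z in zip(values, scores) if abs(z) <= cutoff]
        averaged.append(mean(kept))
    return averaged


# Hypothetical replicate recordings; the fourth has a spike at the second point.
replicates = [[1.0, 2.0], [1.2, 2.2], [0.8, 1.8], [1.1, 9.0], [0.9, 2.1]]
cleaned = robust_average(replicates)
```

The median-based scale is used because, with only a handful of replicates, a single spike inflates the ordinary standard deviation so much that a classical z-score may never exceed its cutoff.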

A data processing circuit 46 is coupled to the sensing device 44, a data storage device 48, and an interface 50. The data storage device 48 may also be internal to the data processing circuit 46. The sample spectroscopic data obtained by the sensing device 44 are provided to the data processing circuit 46. Optionally, the sample spectroscopic data may also be stored, at least temporarily, in the data storage device 48 before being provided to the data processing circuit 46. The data processing circuit 46 is configured to carry out at least steps 14 and 16 of the method 10. In step 16, the identified molecule fragment composition may be provided to the interface 50 and reported to a user, or provided such that the information may be processed further. For example, the interface 50 may be a display or an additional downstream data processing circuit.
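The coupling of the components of system 40 may be sketched in software as follows. This is an illustrative arrangement only: the `Protocol` classes, the method names `record`, `data_tuples` and `notify`, the use of cosine similarity as the matching measure and the stand-in classes are all hypothetical and do not describe any actual hardware interface.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Protocol, Tuple

Spectrum = List[float]
Composition = Dict[str, float]


class SensingDevice(Protocol):  # sensing device 44
    def record(self) -> Spectrum: ...


class DataStorage(Protocol):  # data storage device 48
    def data_tuples(self) -> List[Tuple[Composition, Spectrum]]: ...


class Interface(Protocol):  # interface 50, e.g. a display or downstream circuit
    def notify(self, composition: Composition, level: float) -> None: ...


@dataclass
class DataProcessingCircuit:  # data processing circuit 46
    sensing_device: SensingDevice
    storage: DataStorage
    interface: Interface
    match: Callable[[Spectrum, Spectrum], float]
    threshold: float

    def run(self) -> Optional[Composition]:
        sample = self.sensing_device.record()  # step 12, performed by the sensing device
        best: Optional[Composition] = None
        best_level = -math.inf
        for composition, expected in self.storage.data_tuples():
            level = self.match(sample, expected)  # step 14: matching level
            if level >= self.threshold and level > best_level:
                best, best_level = composition, level
        if best is not None:
            self.interface.notify(best, best_level)  # step 16: output via the interface
        return best


def cosine_similarity(a: Spectrum, b: Spectrum) -> float:
    """Hypothetical matching measure used only for this sketch."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Minimal stand-ins for the hardware components (hypothetical).
class FakeSensor:
    def record(self) -> Spectrum:
        return [2.0, 0.0, 1.0]


class ListStorage:
    def __init__(self, tuples: List[Tuple[Composition, Spectrum]]) -> None:
        self._tuples = tuples

    def data_tuples(self) -> List[Tuple[Composition, Spectrum]]:
        return self._tuples


class RecordingInterface:
    def __init__(self) -> None:
        self.messages: List[Tuple[Composition, float]] = []

    def notify(self, composition: Composition, level: float) -> None:
        self.messages.append((composition, level))


interface = RecordingInterface()
circuit = DataProcessingCircuit(
    sensing_device=FakeSensor(),
    storage=ListStorage([({"A": 1.0}, [1.0, 0.0, 0.5]), ({"B": 1.0}, [0.0, 1.0, 0.5])]),
    interface=interface,
    match=cosine_similarity,
    threshold=0.95,
)
result = circuit.run()
```

Keeping the sensing device, storage and interface behind small structural interfaces reflects the description's point that the data storage device may be internal or external to the circuit, and that the interface may be a display or a further processing circuit.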

The data processing circuit 46 may also comprise a computer program 52 comprising instructions which, when executed by the data processing circuit 46, cause the data processing circuit 46 to carry out at least steps 14 and 16 of the method 10.

Alternatively, or additionally, such a computer program 52 may be stored on a computer-readable medium 54 that is connectable to, and executable by, the data processing circuit 46.

The present application may reference quantities and numbers. Unless specifically stated, such quantities and numbers are not to be considered restrictive, but exemplary of the possible quantities or numbers associated with the present application. Also, in this regard, the present application may use the term "plurality" to reference a quantity or number. In this regard, the term "plurality" is meant to be any number that is more than one, for example, two, three, four, five, etc. The terms "about", "approximately", "near" etc., mean plus or minus 5% of the stated value.

Although the disclosure has been illustrated and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art upon the reading and understanding of this specification and the annexed drawings. In addition, while a particular feature of the disclosure may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application.

Claims

1. A method (10) of determining a composition of molecule fragments via a combined experimental - machine learning approach, wherein the molecule fragments may be stable fragments of a dissociated molecular ion or molecular clusters, the method (10) comprising at least the steps of:

S1 experimentally recording sample spectroscopic data of a sample mixture (42) of molecule fragments under test;

S4 outputting data identifying a molecule fragment composition of a particular data set, for which associated respective expected spectroscopic data a matching level is determined in step S3 to reach or to exceed a predetermined matching threshold; wherein in at least one of the steps S2 and S3 the determining is carried out using a deep neural network (20), the deep neural network (20) being trained based on machine learning.

2. The method (10) of claim 1, wherein the matching levels are determined based on a regression algorithm.

3. The method (10) of any of claims 1 or 2, wherein the sample spectroscopic data is processed prior to step S3 using a statistical method to remove outliers.

4. The method (10) of any of the claims 1 to 3, wherein the sample spectroscopic data and the expected spectroscopic data each comprises at least one out of infrared, ultraviolet, nuclear magnetic resonance, mass, Raman, and gas chromatography spectroscopic data.

5. The method of any of claims 1 to 4, wherein in case that in step S2 of claim 1 the determining is carried out using the deep neural network (20), the deep neural network (20) is trained based at least on the following steps:

S0-2 providing, as an input to a machine learning algorithm for training the deep neural network (20), the at least one chemical formula of the at least one molecule fragment;

S0-3 reproducing, using a second determination method, as an output of the machine learning algorithm for training the deep neural network (20) the ground state energy and the molecular geometry for the at least one chemical formula of the at least one molecule fragment determined in step S0-1, the second determination method being different from the first determination method.

6. The method (10) of claim 5, wherein the first determination method comprises at least a density functional theory-based algorithm.

7. The method (10) of claim 5 or 6, wherein the second determination method comprises at least a force field method.

8. The method (10) of claim 7, wherein the deep neural network (20) varies force field parameters underlying the force field method upon repetitions of step S0-3 for training the deep neural network (20) until a confidence level between the reproduced ground state energy and the molecular geometry and the ground state energy and the molecular geometry determined in step S0-1 for the at least one chemical formula of the at least one respective molecule fragment reaches or exceeds a predetermined confidence threshold.

9. The method (10) of claim 8, wherein the method (10) further comprises at least the step of:

S0-4 determining, using the second determination method, expected spectroscopic data for each of a plurality of molecule fragment compositions, wherein for each chemical formula of a respective molecule fragment composition those specific confidence force field parameters are used based on which the predetermined confidence threshold is reached or exceeded in claim 8, wherein each of the molecule fragment compositions comprises chemical formulas of a plurality of molecule fragments.

10. The method (10) of claim 9, wherein data sets of data tuples are stored in a data storage device (48), wherein each data tuple comprises at least a specific molecule fragment composition and the expected spectroscopic data determined therefrom in step S0-4.

11. The method (10) of claim 9 or 10, wherein the molecule fragment compositions of the plurality of molecule fragment compositions distinguish from each other with respect to different relative amounts of molecule fragments and/or with respect to different molecule fragment types.

12. The method (10) of any of the preceding claims, wherein the molecule fragments comprise at least polyol molecule fragments.

13. A data processing circuit (46) comprising means for carrying out at least the steps S2 to S4 of the method (10) of any of the claims 1 to 12, in particular also the steps S0-1 to S0-3 of claim 5, further in particular also step S0-4 of claim 9.

14. A system for carrying out a method (10) of determining a composition of molecule fragments via a combined experimental - machine learning approach according to one of claims 1 to 12, comprising

- a sensing device (44),

- a data processing circuit (46),

- a data storage device (48), and

- an interface (50), wherein the data processing circuit (46) is coupled to the sensing device (44), the data storage device (48), and the interface (50), wherein the sensing device (44) is configured to carry out step S1, wherein the data processing circuit (46) is configured to carry out at least steps S2 to S4.

15. A computer program (52) comprising instructions which, when the program is executed by a computer, cause the computer to carry out at least the steps S2 to S4 of the method (10) according to any of the claims 1 to 12, in particular also the steps S0-1 to