EP4194107A1 - Apparatus and method for classifying material objects - Google Patents

Apparatus and method for classifying material objects

Info

Publication number
EP4194107A1
Authority
EP
European Patent Office
Prior art keywords
deep learning
material object
training
classification
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP22152672.6A
Other languages
German (de)
French (fr)
Inventor
Steffen RÜGER
Jann GOSCHENHOFER
Alexander Ennen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Original Assignee
Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Publication of EP4194107A1

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B07 SEPARATING SOLIDS FROM SOLIDS; SORTING
    • B07C POSTAL SORTING; SORTING INDIVIDUAL ARTICLES, OR BULK MATERIAL FIT TO BE SORTED PIECE-MEAL, e.g. BY PICKING
    • B07C5/00 Sorting according to a characteristic or feature of the articles or material being sorted, e.g. by control effected by devices which detect or measure such characteristic or feature; Sorting by manually actuated devices, e.g. switches
    • B07C5/34 Sorting according to other particular properties
    • B07C5/3416 Sorting according to other particular properties according to radiation transmissivity, e.g. for light, x-rays, particle radiation

Definitions

  • the present application concerns the field of classifying material objects, more specifically deep learning-based material object classification for use in, for instance, deep learning-based sorting based on dual energy X-ray transmission data.
  • Embodiments relate to an apparatus and a method for classifying material objects.
  • Fig. 17 shows exemplarily a flow chart showing the sequence of an X-ray sorting system and the individual processes through which a material stream passes until it is sorted.
  • BMD is a method for the characterization of material samples based on two X-ray spectra, e.g., a dual energy X-ray image.
  • the inventors of the present application realized that one problem encountered when trying to classify material objects stems from the fact that present methods need to be parameterized by human experts. According to the first aspect of the present application, this difficulty is overcome by using a deep learning model. Concurrently, the deep learning model is trained in a supervised manner and subsequently yields strong predictive performance, e.g., in terms of quality and robustness of the classification of the material objects. Even further, the usage of the deep learning model increases the flexibility in classifying material objects and allows an efficient adaptation in response to new and/or additional material objects to be classified.
  • an apparatus for classifying material objects comprises a deep learning model.
  • the apparatus is configured to, in an initialization phase, subject the deep learning model to supervised learning (in the following also understood as supervised training or supervised model training or training in a supervised manner).
  • the initialization phase might be at an initialization of the apparatus, i.e. an initialization phase of the apparatus.
  • the supervised model training is based on training data obtained from, for each material object of a training set of material objects, a respective pair of sensor data and label information.
  • the training data is obtained from a plurality of pairs of sensor data and label information, wherein each pair is associated with one of the material objects of the training set of material objects (e.g., a one-to-one correspondence, i.e. a bijective correspondence).
  • the respective sensor data is obtained by a measurement of the respective material object and the respective label information is associating the respective material object with a target classification.
  • the apparatus is configured to, using the deep learning model, classify a predetermined material object based on sensor data obtained by a measurement of the predetermined material object.
  • the material objects are aluminum pieces.
  • the material objects are aluminum trash pieces and the classification discriminates between high-grade pure aluminum and a low-grade residual.
  • the apparatus might be configured to classify a predetermined material object as a high-grade pure aluminum or a low-grade residual.
  • the aluminum trash pieces comprise aluminum flakes and flakes of one or more other materials, wherein the aluminum flakes, for example, should be classified as high-grade pure aluminum and the flakes of the one or more other materials, for example, should be classified as low-grade residual.
  • the usage of the deep learning model for the classification improves a sorting quality in a traceable way.
  • the apparatus comprises a measurement sensor configured to perform the measurement of the training set of material objects and the predetermined material object.
  • one and the same measurement sensor is used for performing a measurement of each material object of the training set of material objects and for performing a measurement of the predetermined material object.
  • the measurements of the material objects of the training set of material objects may be used by the apparatus for the supervised model training of the deep learning model and the measurement of the predetermined material object may be used by the apparatus for classifying the predetermined material object.
  • the inventors found that it is advantageous that the deep learning model is trained based on measurement data received from the same measurement sensor as the measurement data of the predetermined material object, which has to be classified by the apparatus. This feature increases the performance of the deep learning model and the quality of the classification.
  • the measurement sensor comprises a dual energy X-ray sensor and, optionally, a conveyor belt for passing the training set of material objects and the predetermined material object by the dual energy X-ray sensor so as to be scanned by the dual energy X-ray sensor.
  • the conveyor belt is configured to transport the material objects of the training set of material objects and the predetermined material object.
  • the conveyor belt leads the material objects of the training set of material objects and the predetermined material object past the dual energy X-ray sensor.
  • Multiple material objects may be distributed on the conveyor belt and the dual energy X-ray sensor might be configured to perform a measurement of each of the multiple material objects.
  • the usage of the dual energy X-ray sensor is advantageous since it provides meaningful information about the materials of the material objects and can therefore increase the quality of the classification.
  • the conveyor belt is advantageous in terms of efficiency, since it makes it possible to classify multiple material objects in a short time, it allows the apparatus to be integrated into a processing line with additional systems, and it improves the handling of the material flow.
  • the sensor data comprises dual energy X-ray images.
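As an illustration only (the function name and data layout are assumptions, not part of the application), a dual energy X-ray measurement of one material object can be represented as a two-channel sample, stacking the low- and high-energy transmission images before feeding them to a model:

```python
def to_dual_energy_sample(low, high):
    """Stack the low- and high-energy X-ray transmission images of one
    material object into a two-channel sample (channels-first layout).
    `low` and `high` are nested lists of intensity rows of equal shape."""
    assert len(low) == len(high)
    assert all(len(a) == len(b) for a, b in zip(low, high))
    return [low, high]
```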
  • the apparatus comprises a sorting stage configured to subject the predetermined material object to sorting according to the classification.
  • the apparatus can achieve a low error rate at the sorting, since the apparatus achieves a high accuracy at the classification by using the deep learning model.
  • the label information for the respective material object further comprises a confidence value for the indication of the target classification.
  • the confidence value may indicate a probability for the respective target classification being correct.
  • the target classification may represent a mean of two or more preliminary target classifications associated with the respective material object.
  • the confidence value may indicate whether the two or more preliminary target classifications differ among each other or whether they are consistent.
  • the apparatus can be configured so that the supervised training of the deep learning model is less sensitive to sensor data of material objects whose label information comprises lower confidence values compared to sensor data of material objects whose label information comprises higher confidence values. This is based on the finding that labeling noise/annotation noise can be reduced by defining the sensitivity of the apparatus for the sensor data.
  • a confidence value for example, may be regarded as a high confidence value, if the confidence value is equal to or greater than a predetermined threshold.
  • the predetermined threshold may indicate a confidence of 60 %, 66 %, 75 %, 80 %, 85 % or 90 %.
  • the apparatus is configured so that the label information comprises, for the respective material object, at least two labels each indicating a target class for the respective material object.
  • the at least two labels are each indicating additionally a confidence value for the indication of the target class.
  • the apparatus is configured so that the supervised training of the deep learning model is less sensitive to sensor data of material objects whose label information comprises labels indicative of different target classes and/or lower confidence values compared to sensor data of material objects whose label information comprises labels indicative of the same target classes and/or higher confidence values. This is based on the finding that labeling noise/annotation noise can be reduced by defining the sensitivity of the apparatus for the sensor data.
  • the model performance can be improved by defining the sensitivity of the apparatus so that it is only sensitive to sensor data of material objects whose label information comprises labels indicative of the same target classes and/or higher confidence values, and insensitive to sensor data of material objects whose label information comprises labels indicative of different target classes and/or lower confidence values.
  • the usage of two or more labels and/or of the confidence values results in more stable supervised training of the deep learning model, since the training data has a higher quality.
  • the apparatus further comprises a user interface.
  • the apparatus may be configured to obtain, for the respective material object, the at least two labels via the user interface.
  • the respective label information might be provided via the user interface.
  • the label information might be acquired only once, e.g. during the supervised model training or at an initialization phase of the supervised model training.
  • a user might assign one of the two labels to the respective material object and provide same to the apparatus via the user interface.
  • the apparatus may be configured so that the classification of the predetermined material object depends on performance metric input via the user interface.
  • the performance metric being indicative of a scalar metric differently measuring misclassifications of different classes according to class-specific weights; e.g., the user inputs weights associated with the different error types (type I/II errors or, in other words, false positives/false negatives) for one or more classes.
  • a class-specific weight associated with a class may weight a precision of the classification of material objects associated with the class.
  • the performance metric may indicate different precisions for different classes.
  • the performance metric may indicate that a misclassification of material objects belonging to a first class has less influence, e.g., on a sorting of the material objects according to their assigned class, than a misclassification of material objects belonging to a second class.
  • the performance metric can be indicative of a trade-off between false positives and false negatives. The inventors found that in the field of recycling, a monetary gain and the accuracy of a sorting of the material objects according to their respective classification can be improved if the performance metric indicates to favor false negatives over false positives, given that aluminum corresponds to the positive class.
  • the residual material objects may correspond to the negative class.
  • the dependency of the classification on the performance metric is advantageous in terms of an improvement of a quality of the classification of the predetermined material object by the apparatus.
  • the dependence on the performance metric in particular allows adapting the classification performed by the apparatus to the needs of a user. Therefore, a highly adjustable and accurate classification can be achieved.
  • the performance metric is indicative of misclassification costs associated with misclassifications of one or more classes discriminated by the classification of the deep learning model.
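A scalar cost-based metric of this kind could be computed as sketched below; the cost table with an expensive false positive for the residual class and a cheap false negative for aluminum is a hypothetical example, not a value from the application:

```python
def misclassification_cost(y_true, y_pred, cost):
    """Scalar performance metric: sum of class-specific costs over all
    misclassified objects. `cost[(true, pred)]` is the penalty for
    predicting class `pred` when the target class is `true`."""
    return sum(cost.get((t, p), 0.0) for t, p in zip(y_true, y_pred) if t != p)
```

For instance, setting `cost[("res", "al")]` higher than `cost[("al", "res")]` expresses that a false positive (residual sorted as aluminum) is more expensive than a false negative.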
  • the apparatus is configured so that the classification of the predetermined material object depends on the performance metric, in that the performance metric is taken into account by controlling the supervised model training so that the deep learning model meets the performance metric with respect to the training data, or is optimized with respect to the performance metric.
  • the apparatus is configured to, in subjecting the deep learning model to the supervised model training based on the training data in the initialization phase, take the performance metric into account. Therefore the performance metric should be defined at the start of the supervised model training, so that the apparatus is configured to perform the classification of the predetermined material object dependent on the performance metric using the trained deep learning model. This enables an efficient and accurate classification of material objects according to the user needs by the apparatus, since the performance metric is already considered by the deep learning model.
  • the apparatus is configured so that the classification of the predetermined material object depends on the performance metric by subjecting a set of candidate deep learning models to the supervised model training based on the training data obtained and by selecting the candidate deep learning model out of the set of candidate deep learning models which meets, or is best in terms of, the performance metric.
  • These candidate models may differ in the configurations of their respective hyperparameters, e.g., a learning rate, a dropout rate or properties of the architecture.
  • the apparatus is configured to, in subjecting the deep learning model to the supervised model training based on the training data in the initialization phase, take the performance metric into account.
  • the ability to select the candidate deep learning model out of a set allows the apparatus to select, based on predictive performance and model inference time, the best deep learning model to be trained by the supervised model training, e.g., depending on the material objects to be classified and/or on the performance metric.
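The candidate-model selection described above can be sketched as a small search over hyperparameter configurations (learning rate and dropout rate, as named in the application); the enumeration and selection helpers below are illustrative assumptions:

```python
from itertools import product

def candidate_configs(learning_rates, dropout_rates):
    """Enumerate candidate hyperparameter configurations for the set of
    candidate deep learning models."""
    return [{"lr": lr, "dropout": d}
            for lr, d in product(learning_rates, dropout_rates)]

def select_best(configs, metric):
    """Select the candidate configuration that minimizes the performance
    metric (e.g., a misclassification cost on validation data)."""
    return min(configs, key=metric)
```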
  • the apparatus further comprises a user interface, e.g., the interface described above.
  • the apparatus is configured to obtain, via the user interface, a threshold value, and compare the threshold value with an output of the deep learning model applied to the sensor data of the predetermined material object so as to decide on whether the predetermined material object is attributed to a predetermined class.
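The threshold comparison could look as follows; the function and the default class names ("aluminum"/"residual") are hypothetical, chosen to match the aluminum sorting example elsewhere in the description:

```python
def decide_class(score, threshold, positive="aluminum", negative="residual"):
    """Attribute the object to the positive class if and only if the model
    output score reaches the user-provided threshold value."""
    return positive if score >= threshold else negative
```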
  • the deep learning model comprises a convolutional neural network.
  • the apparatus is configured to, intermittently, subject the deep learning model to semi-supervised model training.
  • the apparatus may be configured to obtain training data for the semi-supervised model training based on sensor data without associated label information, i.e., unlabeled data, and based on pairs of sensor data and label information, i.e., labeled data.
  • the semi-supervised model training can be performed based on labeled and unlabeled data.
  • the sensor data for the semi-supervised model training may be obtained by measurements of material objects of a further training set of material objects, wherein only for some of the material objects also label information is provided additionally to the respective sensor data.
  • the labeled data for the semi-supervised model training may be obtained from pairs of sensor data and label information of material objects of the training set of material objects for the supervised model training and the unlabeled data may be obtained for the material objects of the further training set of material objects.
  • with the semi-supervised model training it is possible to increase the amount of training data and therefore the performance of the deep learning model.
  • the semi-supervised model training results in reduced time and costs involved with obtaining the label information, since it is not necessary to provide the respective label information for all material objects based on which the training data for the semi-supervised model training is obtained.
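One common semi-supervised technique that fits this description is pseudo-labeling: the current model's predictions on unlabeled data are turned into extra training labels when they are sufficiently confident. The application does not name this technique; the sketch below is an assumed illustration:

```python
def pseudo_label(unlabeled_scores, threshold=0.9):
    """Confidence-based pseudo-labeling: for each unlabeled sample given as
    a class-probability vector, keep the argmax class as a pseudo label if
    its probability reaches the threshold. Returns (index, class) pairs."""
    extra = []
    for idx, probs in enumerate(unlabeled_scores):
        cls = max(range(len(probs)), key=lambda k: probs[k])
        if probs[cls] >= threshold:
            extra.append((idx, cls))
    return extra
```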
  • the apparatus is configured to, intermittently, subject the deep learning model to unsupervised model training.
  • the advantage of this feature is that it is not necessary to provide label information for each material object of a training set used for the unsupervised model training, which increases the efficiency of obtaining a trained deep learning model. The time-consuming and costly labeling can be reduced. Additionally, the amount of training data can easily be increased, since it is possible to add a further training set of material objects without label information for the unsupervised model training. This increase in training data improves the performance of the deep learning model. In particular, the usage of both the supervised model training and the unsupervised model training achieves a good compromise between model performance and efficiency in the training of the deep learning model.
  • the apparatus is configured to obtain the training data from the pairs of sensor data and label information by means of augmentation to artificially increase the variety of the training database for model training. This has a regularizing effect on the model while training, reducing overfitting and thus improving model performance.
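A minimal sketch of such augmentation, assuming simple geometric transforms on an X-ray image given as nested lists of intensity rows (the specific transforms are assumptions, not listed in the application):

```python
def augment(image):
    """Return the original image plus three augmented views: horizontal
    flip, vertical flip, and 180 degree rotation."""
    h = [row[::-1] for row in image]        # horizontal flip
    v = image[::-1]                         # vertical flip
    r = [row[::-1] for row in image[::-1]]  # 180 degree rotation
    return [image, h, v, r]
```

Each view keeps the object's label, so the training set grows fourfold without additional labeling effort.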
  • a further embodiment relates to a sorting system including one of the apparatus described herein.
  • a further embodiment relates to a method performed by any of the above apparatuses and systems.
  • a further embodiment relates to a method for classifying material objects, comprising in an initialization phase, subjecting a deep learning model to supervised model training based on training data obtained from, for each of a training set of material objects, a pair of sensor data obtained by a measurement of the respective material object and label information associating the respective material object with a target classification. Additionally, the method comprises, using the deep learning model, classifying a predetermined material object based on sensor data obtained by a measurement of the predetermined material object.
  • the method as described above is based on the same considerations as the above-described apparatus.
  • the method can, by the way, be completed with all features and functionalities, which are also described with regard to the apparatus.
  • a further embodiment relates to a computer program which, when executed on a computer, instructs the computer to perform the herein described method.
  • Fig. 1 shows an apparatus 100 for classifying one or more material objects 110 (e.g., see 110 0 and 110 1 to 110 n ).
  • the apparatus 100 comprises a deep learning model 120.
  • the deep learning model 120 represents or comprises a convolutional neural network.
  • the apparatus 100 is configured to, using the deep learning model 120, classify 130 a predetermined material object 110 0 based on sensor data 140 0 obtained by a measurement of the predetermined material object 110 0 .
  • the sensor data 140 0 e.g., is obtained by a measurement using a dual energy X-ray sensor or using a camera or using a measurement system for detecting one or more properties of the predetermined material object 110 0 .
  • the apparatus 100 is configured to subject the deep learning model 120 to supervised model training 150.
  • the supervised model training 150 of the deep learning model 120 is, for example, performed only once at a set-up or a first start of the apparatus 100.
  • the initialization phase represents a phase initialized in response to a change in settings or requirements for the classification 130, e.g., in response to new or additional classes of material objects 110, in response to a change in a measurement system for obtaining the sensor data 140 or a change in performance metrics.
  • the deep learning model 120 for example, is adapted to new settings or requirements for the classification 130.
  • the supervised model training 150 is performed to achieve a high performance deep learning model 120 for a high accuracy at the classification 130 of the material objects 110.
  • the supervised model training 150 is based on training data 152 obtained from, for each of a training set 154 of material objects 110 1 to 110 n , a pair 156 (see 156 1 to 156 n ) of sensor data 140 (see 140 1 to 140 n ) obtained by a measurement of the respective material object 110, i.e. one of 110 1 to 110 n , and label information 158 (see 158 1 to 158 n ) associating the respective material object 110 with a target classification.
  • Each material object 110 1 to 110 n of the training set 154 is associated with object-individual sensor data 140 and label information 158.
  • a measurement of each material object 110 1 to 110 n of the training set 154 is performed, e.g., by an external device or by the apparatus 100, to obtain the sensor data 140 1 to 140 n .
  • the sensor data 140 1 to 140 n might be obtained the same way as the sensor data 140 0 for the predetermined material object 110 0 .
  • the respective sensor data 140 might represent dual energy X-ray transmission data of the respective material object 110. However, it is also possible that the respective sensor data 140 represents any other conceivable data domain, like an RGB image.
  • the label information 158 of the respective material object 110 might be obtained based on the sensor data 140 of the respective material object 110.
  • the target classification might indicate for the respective material object 110 a class with which the material object 110 is associated.
  • the target classification might indicate one out of two or more different classes.
  • Fig. 1 shows exemplarily two different material classes, i.e. material 1 and material 2.
  • the target classification might indicate the material or main material of the respective material object 110.
  • the respective label information 158 is provided by a user of the apparatus 100, e.g., via a user interface of the apparatus 100.
  • the user might analyze the sensor data 140 of the respective material object 110 for determining the target classification of the respective material object 110.
  • the label information 158 comprises two or more labels, each indicating a target classification, wherein each of the two or more labels is provided by a different user via the user interface, e.g., see Fig. 2 .
  • the supervised model training 150 enables the deep learning model 120 to determine, based on the sensor data 140 0 of the predetermined material object 110 0 , a classification 160 of the predetermined material object 110 0 with high accuracy.
  • the classification 160 indicates for the predetermined material object 110 0 one out of two or more different classes.
  • the two or more different classes should be the same classes as the ones selectable for the target classification.
  • Fig. 1 shows exemplarily that the apparatus 100 classifies 130 the predetermined material object 110 0 as material 2.
  • the apparatus 100 may be configured to obtain the training data 152 from the pairs 156 of sensor data 140 and label information 158 by means of augmentation. Augmentation artificially creates additional training data by processing the sensor data 140 in different ways or by combining multiple such processing steps. This makes it possible to increase the amount of training data 152 and thereby the performance of the deep learning model 120 trained by supervised model training with the training data 152.
  • the material objects 110 comprise aluminum pieces.
  • the label information 158 may indicate for such material objects 110 aluminum as target classification.
  • the classification 160 of such material objects 110 should be aluminum, e.g., in case the apparatus 100 correctly classifies 130 the respective predetermined material object 110.
  • the material objects 110 are aluminum trash pieces and the classification 130 discriminates between high-grade pure aluminum and a low-grade residual.
  • the aluminum trash pieces comprise, e.g., flakes of different materials, wherein the material objects 110 made of aluminum, i.e. high-grade pure aluminum, are of interest and the material objects 110 made of other materials, i.e. low-grade residual, are not of interest. This type of classification makes it possible to identify the material objects 110 of interest.
  • the apparatus 100 can comprise one or more of the features and/or functionalities described with regard to Figs. 2 and 3 .
  • Fig. 2 shows an apparatus 100 for classifying material objects 110.
  • the apparatus 100 comprises a deep learning model 120, and is configured to, using the deep learning model 120, classify 130 a predetermined material object 110 0 based on sensor data 140 0 obtained by a measurement of the predetermined material object 110 0 .
  • the classifying 130 of the predetermined material object 110 0 results in classification information 160 for the predetermined material object 110 0 .
  • the deep learning model is applied to the sensor data 140 0 to obtain a probability score.
  • the apparatus can be configured to map the probability score to a class and that class can represent the classification 160.
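This mapping from probability score to class can be sketched as a simple argmax over the class probabilities; the function and class names are hypothetical illustrations:

```python
def map_score_to_class(probs, classes):
    """Map the model's class-probability vector to the class name with
    the highest probability, which then represents the classification."""
    return classes[max(range(len(probs)), key=lambda i: probs[i])]
```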
  • the apparatus 100 is configured to, in an initialization phase, subject 122 the deep learning model 120 to supervised model training 150 based on training data 152 obtained from, for each of a training set 154 of material objects 110 1 to 110 n , a pair of sensor data 140 (see 140 1 to 140 n ) obtained by a measurement of the respective material object 110 and label information 158 (see 158 1 to 158 n ) associating the respective material object 110 with a target classification, e.g., a user individual target classification 157 1 /157 2 or a mean target classification 157 0 .
  • the apparatus 100 may be configured so that the respective label information 158 comprises, for the respective material object 110, at least two labels 158a and 158b each indicating a target class 157 1 /157 2 for the respective material object 110.
  • the at least two labels 158a and 158b are provided by different users. Therefore, a respective target class 157 1 /157 2 for the respective material object 110 may represent a user individual target classification.
  • the apparatus 100 may be configured so that the respective label information 158 comprises a mean target classification 157 0 for the respective material object 110 and a confidence value 159 for the indication of the mean target classification 157 0 .
  • the respective mean target classification 157 0 may represent a mean of two or more user individual target classifications 157 1 and 157 2 associated with the respective material object 110.
  • the respective confidence value 159 may indicate a confidence in selecting the respective mean target classification 157 0 based on the respective user individual target classifications 157 1 and 157 2 .
  • the apparatus 100 may be configured to determine, for each material object 110 of the training set 154, the respective confidence value based on a statistic over all respective user individual target classifications 157 1 /157 2 .
  • in case the respective user individual target classifications 157 1 /157 2 indicate the same class, the confidence value 159 is 1.0 or 100 %, and in case the respective user individual target classifications 157 1 /157 2 indicate different classes, e.g., see the label information 158 3 , the confidence value 159 is smaller than 1.0 or 100 %; e.g., for N labels comprised by the respective label information 158, the confidence value may be Z/N, wherein Z represents the number of user individual target classifications 157 1 /157 2 indicating the same target class as the mean target classification 157 0 .
  • the respective label information 158 comprises all labels associated with the respective material object, or one label out of two or more labels associated with the respective material object, or one label indicating the mean target classification 157 0 .
  • the usage of only one label out of two or more labels associated with the respective material object may reduce the quality of the respective label information 158, e.g., provide limitations with regard to the certainty of the respective label information 158.
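The aggregation of several user labels into a mean target classification with a Z/N confidence value can be sketched as a majority vote; the helper below is an illustrative assumption:

```python
from collections import Counter

def aggregate_labels(labels):
    """Compute the mean target classification as the majority class over
    N user labels, with confidence Z/N, where Z is the number of labels
    agreeing with the majority class."""
    counts = Counter(labels)
    mean_class, z = counts.most_common(1)[0]
    return mean_class, z / len(labels)
```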
  • the apparatus 100 is configured so that the supervised model training 150 is less sensitive to sensor data 140 of material objects 110 whose label information 158 comprises labels 158a and 158b indicative of different target classes 157 and/or lower confidence values 159 compared to sensor data 140 of material objects 110 whose label information 158 comprises labels 158a and 158b indicative of the same target classes 157 and/or higher confidence values 159.
  • the apparatus 100 might prefer the sensor data 140 1 , 140 2 and 140 n over the sensor data 140 3 for the supervised model training 150.
  • the apparatus 100 may be configured to obtain the training data 152 only from pairs of sensor data 140 and label information 158, which indicate the same target classes 157 in both labels 158a and 158b of the respective label information 158.
  • the usage of two or more labels per label information 158 and the selection of the training data 152 based on the information provided by the two labels 158a and 158b, i.e. the respective target classification 157 and/or the respective confidence value 159, improves the supervised model training.
  • the confidence value 159 may indicate the reliability of the target class provided by the respective label.
  • the supervised model training 150 can be improved. This is based on the idea that sensor data with uncertain or possibly false target classification 157 are not considered in the training data 152, or only to a small extent. This increases the accuracy of the deep learning model 120.
  • the apparatus may comprise a measurement sensor 170 configured to perform the measurement of the training set 154 of material objects 110 1 to 110 n and of the predetermined material object 110 0 .
  • the measurement sensor 170 may comprise or represent a dual energy X-ray sensor for obtaining a dual energy X-ray image as the sensor data 140, an X-ray sensor for obtaining a spectral X-ray image as the sensor data 140, a multi-energy X-ray sensor for obtaining a multi-energy X-ray image as the sensor data 140, a camera for obtaining an RGB image as the sensor data 140 or any combination thereof.
  • the apparatus 100 comprises a conveyor belt 175 for passing the training set 154 of material objects 110 1 to 110 n and the predetermined material object 110o by the measurement sensor 170 so that the material objects 110 1 to 110 n and the predetermined material object 110 0 can be scanned by the measurement sensor 170.
  • the sensor data 140 of the scanned material object 110 may comprise one or more images, e.g., dual energy X-ray images, spectral X-ray images, multi-energy X-ray images and/or RGB-images, of the respective material object 110.
  • the apparatus 100 may comprise a sorting stage 180 configured to subject the predetermined material object 110o to sorting according to the classification information 160.
  • the sorting stage 180 sorts the predetermined material object 110o to material objects 110 of a first class 182 or to material objects 110 of a second class 184 dependent on the classification information 160.
  • the classification information 160 can indicate for the predetermined material object 110 0 one class out of two or more classes.
  • a class might be associated with a certain material, e.g., a metal, plastic, glass, wood or a textile.
  • the apparatus 100 may comprise a user interface 190 via which, for example, a user may provide information to the apparatus 100.
  • the user interface 190 can be used for various options.
  • the apparatus 100 can be configured to obtain, for the respective material object 110, the at least two labels 158a and 158b via the user interface 190.
  • the at least two labels 158a and 158b for the respective material object 110 can be provided by different users.
  • a first user may provide a first label 158a and a second user may provide a second label 158b. This can also be the reason for a discrepancy between the target classifications of the at least two labels 158a and 158b. If the class of the respective material object 110 cannot be clearly determined based on the respective sensor data 140, different users may provide different labels for the same sensor data 140.
  • the apparatus is configured to obtain, via the user interface 190, a threshold value 132, and compare the threshold value 132 with an output of the deep learning model 120 applied to the sensor data 140 0 of the predetermined material object 110o so as to decide on whether the predetermined material object 110o is attributed to a predetermined class, e.g., 182 or 184.
  • the deep learning model 120 is configured to output a probability score indicating a likelihood of the predetermined material object 110o belonging to the predetermined class based on the sensor data 140 0 , e.g., the classification yields a model prediction probability score [0; 1].
  • the threshold value 132 can be used to map probability scores to hard class assignments as a post-hoc step, e.g., the threshold value 132 can be used to cut the probability score, i.e. the classification 160, into a binary decision.
  • the model output from the sorting stage 180 may be this binary decision.
  • This threshold value 132 could be adapted by a user after the supervised model training of the deep learning model 120 to calibrate model predictions further to their needs depending on their requirements regarding the accuracy of the classification 130 for a predetermined class of material objects 110.
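A minimal sketch of this post-hoc thresholding (the helper name `to_binary_class` is hypothetical):

```python
def to_binary_class(probability, threshold=0.5):
    """Map a model probability score in [0, 1] to a hard binary class.

    The threshold is user-adjustable, so the operator can calibrate
    the decision after training without retraining the model.
    """
    return 1 if probability >= threshold else 0

default_decision = to_binary_class(0.7)        # threshold 0.5 -> class 1
stricter_decision = to_binary_class(0.7, 0.8)  # threshold 0.8 -> class 0
```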
  • the apparatus 100 is configured so that the classification 130 of the predetermined material object 110 0 depends on performance metric 134 input via the user interface 190.
  • the performance metric 134 can be indicative of misclassification costs associated with misclassifications of one or more classes discriminated by the classification 130 using the deep learning model 120. For example, a misclassification in one class can induce higher costs than a misclassification in another class. Therefore, the performance metric 134 may indicate whether the deep learning model 120 has to be more accurate in predicting a first class than in predicting a second class, e.g., the first class might only be determined by the deep learning model 120 at a high probability.
  • the deep learning model has to be very sure that the predetermined material object 110o belongs to the first class to select the first class for the predetermined material object 110 0 .
  • the apparatus 100 can be configured to control the supervised model training 150 so that the deep learning model 120 meets, with respect to the training data 152, the performance metric 134, or is optimized with respect to the performance metric 134.
  • the deep learning model 120 assesses the sensor data 140 0 of the predetermined material object 110 0 according to the performance metric 134.
  • the classification 130 of the predetermined material object 110 0 depends on the performance metric 134.
  • the apparatus 100 is configured to subject a set 124 of candidate deep learning models (DLM) to the supervised model training 150.
  • the apparatus 100 may select the candidate deep learning models for the set 124 based on the obtained training data 152.
  • the apparatus 100 can be configured to select one candidate deep learning model, e.g., DLM 2, out of the set 124 of candidate deep learning models which meets or is best in terms of the performance metric 134.
  • the deep learning model 120 is chosen according to the performance metric 134.
  • the classification 130 of the predetermined material object 110 0 depends on the performance metric 134, since the deep learning model 120 is applied onto the sensor data 140 0 of the predetermined material object 110o at the classification 130.
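The metric-guided selection out of the set 124 could be sketched as follows (the candidate scores are invented for illustration):

```python
def select_model(candidates, metric):
    """Pick the candidate deep learning model that is best in terms
    of the given performance metric."""
    return max(candidates, key=metric)

# Toy candidates: (name, validation score under the performance metric)
candidates = [("DLM 1", 0.88), ("DLM 2", 0.93), ("DLM 3", 0.90)]
best = select_model(candidates, metric=lambda m: m[1])
```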
  • Fig. 3 shows an embodiment for a training of a deep learning model 120.
  • the shown training can be performed by one of the herein described apparatuses 100.
  • the apparatus 100 may be configured to decide 400 between three different training methods, e.g., the supervised model training 150, the unsupervised model training 151 and the semi-supervised model training 350, for training the deep learning model 120.
  • the apparatus 100 may be configured to subject the deep learning model 120 to supervised model training 150.
  • the apparatus 100 can be configured to, e.g., intermittently, subject the deep learning model 120 to unsupervised model training 151 and/or semi-supervised model training 350.
  • the training set 154 of material objects 110 can differ among supervised model training 150, semi-supervised model training 350 and unsupervised model training 151.
  • Unsupervised model training 151 may be performed using training data 152 without any label information 158 for each material object 110 of the training set of material objects 110 for the unsupervised model training 151.
  • the semi-supervised model training 350 may be performed using training data 152 comprising labeled data of some material objects 110 of the training set of material objects for the semi-supervised model training 350 and unlabeled data of the remaining material objects 110 of the training set of material objects 110 for the semi-supervised model training 350.
  • Fig. 4 shows a block diagram of a method 200 for classifying material objects.
  • the method 200 comprises, in an initialization phase, subjecting 210 a deep learning model to supervised model training 150 based on training data obtained from, for each of a training set of material objects, a pair of sensor data obtained by a measurement of the respective material object and label information associating the respective material object with a target classification. Additionally, the method 200 comprises, using the deep learning model, classifying 220 a predetermined material object based on sensor data obtained by a measurement of the predetermined material object.
  • the apparatus 100 may acquire an application-oriented database from shredded MHA, e.g., the training set 154 of material objects 110 1 to 110 n , on a conveyor belt 175 with an integrated dual energy X-ray scanner 170.
  • the apparatus 100 may consider a customized performance metric 134 to account for the specific purity requirements that are present in such recycling tasks.
  • the proposed solution involves the interaction of multiple processes to pursue the overall goal of reducing the need for expert knowledge in the classification of material objects.
  • the advantages are illustrated by way of example with respect to the sorting of scrap materials.
  • a deep learning based approach to increase the flexibility of sorting system operators and to improve the sorting quality in a traceable way is proposed.
  • the flowchart shown in Fig. 5 outlines a sorting system 300 and its individual processes.
  • the flowchart of the proposed deep learning based X-ray sorting system comprises the individual sub-processes through which a material stream passes until it is sorted.
  • the system 300 comprises or includes one of the herein described apparatuses 100 and a measurement system 170, e.g., an X-ray system, for obtaining sensor data 140 of a material object 110.
  • the apparatus 100 may comprise a machine learning (ML) model or a deep learning (DL) model to be used for classifying the material objects 110.
  • the apparatus 100 may be configured to receive from the measurement system 170 the sensor data 140 of one or more material objects and output for each of the one or more material objects 110 a classification 160, e.g., a decision to which class the respective material object belongs or a probability score indicating to which class the respective material object may belong.
  • system 300 may comprise a sorting system 180 configured to sort the respective material object 110 based on its classification 160, e.g., map the probability score 160 onto a binary class 181.
  • apparatus 100 may comprise the measurement system 170 and/or the sorting system 180.
  • the developed deep learning (DL) model 120 is integrated into the flow of the sorting system 300 and replaces previous rule-based solutions, which base their sorting decision on pre-parameterized rule-based methods. This enables the learning of relevant features for a sorting decision directly from the raw data without human guidance.
  • the input provided to the DL model 120 is composed of the underlying material stream and the annotation of data samples (e.g., referred to as 'labeling'; e.g., comprised by the label information 158) that is provided by human experts once to enable the model training, e.g., the supervised model training 150.
  • the system 300 may comprise a user interface 190.
  • the apparatus 100 may comprise the user interface 190.
  • the 'user interaction' module, i.e. the user interface 190, may provide a performance metric 134 that is provided by the human expert and reflects the requirements on the trained model 120.
  • performance metric can reflect the degree of purity for specific sorting classes for instance.
  • the input data, i.e. the sensor data 140, for the DL model 120 is acquired in the first process step by an X-ray system 170 that provides dual energy X-ray transmission data (XRT data) from the raw material samples, i.e. the material objects 110, on a conveyor belt 175.
  • a supervised DL model architecture i.e. the deep learning model 120, is trained to distinguish the different classes according to the provided performance metric 134. This method allows material separation to be performed with little prior knowledge of the underlying XRT data.
  • Fig. 6 illustrates exemplary submodules inside the Deep Learning Model 120.
  • Fig. 6 shows the different subprocesses of the DL model 120, starting with the preprocessing 126 (see Fig. 7 for a description) of the XRT data 140 to a machine-readable input.
  • individual objects 110 are collected from the continuous flow of material using standard computer vision techniques such as morphological operations and filtering, e.g., in step 126a.
  • the so extracted flakes, i.e. the material objects 110, are then centered on an equally sized grid of 224x224 pixels for each of the two dual energy X-ray dimensions, e.g., in step 126b.
  • Fig. 7 illustrates the pre-processing 126 of the scrap samples, i.e. the material objects 110: individual samples 110 are located and cut out from the material flow, e.g., in step 126a, and then stored as individual samples 140a and 140b according to the two dual energy channels, e.g., in step 126b. These samples 140a and 140b, e.g., comprised by the sensor data 140, are then fed to the DL model 120, e.g. as training data 152 for the supervised model training 150 of the DL model 120.
  • the human expert provides annotations (label information 158) for representative material samples, e.g. for the material objects 110 1 to 110 n .
  • annotation tools such as the proprietary "DE-Kit (dual-energy-kit)" could be used to collect these annotations 158.
  • the human expert selects an appropriate metric 134 according to which the DL model 120 is optimized in alignment with model requirements. This performance metric 134 guides the model selection, e.g., the selection out of the set 124 of candidate deep learning models, and could reflect the different costs associated with different types of errors (e.g. False Positives vs. False Negatives) for instance.
  • Fig. 8 illustrates an embodiment of the DL model training process, e.g., a supervised model training 150, from data input to the final model 120 2 .
  • the model 120 1 is subsequently trained 150 on this annotated training data set 152 and the model hyperparameters are tuned via a valid model validation strategy (such as a nested holdout-5-fold cross validation over separately split validation and test data sets) and with an appropriate criterion that is reflected by the selected performance metric 134.
  • the final model 120 2 is integrated into the sorting system 300 and the resulting model performance (estimated generalization error) and other statistical metrics are provided to the user.
  • the DL model 120 outputs probability scores [0; 1] for each of the classes which are translated into hard class predictions via a threshold 132.
  • this threshold is by default set to 0.5.
  • the proposed system 300 or apparatus 100 enables the end user to interact with this system 300 or apparatus 100 by adapting this threshold 132 and thereby calibrate the decision making, i.e. the classification 130. Practically this implies that the user can control the purity of the resulting sorting after model training and deployment.
  • representative training data 152 is crucial for tackling a specific task. Furthermore, for the observation and evaluation of machine learning models 120 and of how they perform in an application-oriented field, realistic samples, i.e. realistic material objects 110, are mandatory. Therefore, data 152 should be selected with a focus on two circumstances: first, that the use case to be tackled is of interest in the specific domain, e.g., here the recycling industry; second, that the data 152 exists in that domain with regard to the acquisition system, i.e. the measurement system 170, sample shape and variety.
  • Those systems can handle tough and dirty surroundings which are predominant in the recycling sector.
  • those X-ray systems can be integrated in a processing line with additional systems.
  • systems built for crushing and shredding used materials to a certain size constitute preceding steps. This serves a better handling of the material flow and aims at sorting with a higher purity.
  • the measurement system 170 may comprise an X-ray source, e.g., set to 160 keV and 2 mA, a dual energy line detector, e.g., with 896 pixels and a pitch of 1.6 mm, and a conveyor belt 175 on which the material stream is transported.
  • the two pieces of information per pixel can be processed with conventional image processing methods, e.g. filtering, morphological operations, etc.
  • the dual energy information can be processed taking into consideration the physical properties of the underlying measurement setup and the law of attenuation of X-rays. This allows utilizing a technique called basis material decomposition (BMD), which is used to distinguish between different basis materials [1].
  • a deep learning based approach for processing the dual energy information is proposed, for which, for example, a few preprocessing steps are applied to the raw dual energy information, as described in the following section.
  • Fig. 9 shows an overview of the applied steps to preprocess the measured dual energy information.
  • step 1 shows the low energy information of a few pixel lines.
  • step 2 illustrates a part of the pixel lines merged into an image (only the low energy information is shown, as a grayscale image).
  • in step 3 (126a), the region of interest, in which the relevant information is present, is marked for each individual flake 110.
  • the last step 126b shows the extracted flake 110 and its location in a 2x224x224 shaped dual energy data sample.
  • the herein discussed sensor data 140 may comprise two individual samples 140a and 140b according to the two dual energy channels.
  • each sample (flake) 110 is captured in one individual resulting data sample 140a and 140b, respectively. This holds for each energy channel. This is necessary because the data stream acquired from the dual energy detector, e.g., comprised by the measurement system 170, is a single line for each energy information, whereas the proposed deep learning model 120 operates in the image domain.
  • the respective energy information per pixel for each flake 110 is centered in a 224x224 sized image, see 140a and 140b. This size is chosen so that the underlying extracted flakes 110, whose height and width lie, for example, between 10 and 120 mm after shredding, do not exceed it. Further, a filter may check whether the extracted flakes 110 exceed the image dimensions, e.g., 224 pixels in height or width, and remove large flakes 110 accordingly.
  • single dual energy pixel lines are processed to a data stream and individual flake images 140a and 140b are extracted with related spatial information, e.g., as the sensor data 140, before feeding it into a deep learning pipeline, e.g., the supervised model training 150.
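The centering of an extracted flake on the 224x224 grid for both energy channels might look like the following sketch (assuming NumPy arrays; the helper name is illustrative):

```python
import numpy as np

def center_flake(flake, size=224):
    """Center a cut-out flake channel on a zero-filled size x size grid;
    flakes exceeding the grid would be filtered out beforehand."""
    h, w = flake.shape
    if h > size or w > size:
        raise ValueError("flake exceeds grid dimensions")
    canvas = np.zeros((size, size), dtype=flake.dtype)
    top, left = (size - h) // 2, (size - w) // 2
    canvas[top:top + h, left:left + w] = flake
    return canvas

low = np.ones((50, 80))   # low energy channel cut-out
high = np.ones((50, 80))  # high energy channel cut-out
sample = np.stack([center_flake(low), center_flake(high)])  # (2, 224, 224)
```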
  • Fig. 10 shows an exemplary result 142 of the base material decomposition (BMD) 144 and the visualization of the output in a blue to greenish color scheme.
  • a value of 1 represents complete agreement, while at values smaller than zero no agreement beyond chance among the annotators is observed.
  • Fig. 11 shows a table of the Cohen's Kappa scores between two experts in the field of recycling and X-ray physics and three non-expert annotators.
  • the Kappa agreements among the annotators shown in the table indicate that substantial agreement is achieved in the comparison of the experts with each other. The same holds for two of the three non-expert human annotators with regard to their agreement with expert number one.
  • the remaining Kappa scores among experts and non-experts show a moderate agreement level.
  • an acceptable agreement lies at 0.41 [4].
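Cohen's kappa, as used for the agreement scores above, can be computed directly from its definition kappa = (p_o - p_e) / (1 - p_e), with observed agreement p_o and chance agreement p_e; a small self-contained sketch:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa between two annotators' label sequences."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k]                    # chance agreement
              for k in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```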
  • the inventors conclude that non-experts are able to annotate after an instruction by experts.
  • 3000 randomly chosen flake samples 110 are annotated, i.e. the respective label information 158 is provided.
  • the apparatus 100 may decide whether to perform an additional split of the labeled samples into one fraction in which all three annotators agree with their votes and the rest of the labeled samples. According to an embodiment, this may result in 2172 certain labeled samples and 828 noisy labeled samples.
  • the apparatus 100 may be configured so that the supervised model training 150 is less sensitive to sensor data 140 of certain labeled samples compared to sensor data 140 of noisy labeled samples.
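One simple way to make the training less sensitive to noisy labeled samples is per-sample loss weighting; a hypothetical sketch (the concrete weight values are not from the application, which only requires reduced sensitivity):

```python
def sample_weight(annotations, certain_weight=1.0, noisy_weight=0.25):
    """Full weight for samples on which all annotators agree,
    reduced weight for noisy (disagreeing) samples."""
    return certain_weight if len(set(annotations)) == 1 else noisy_weight

weights = [sample_weight(a) for a in [(0, 0, 0), (0, 1, 0), (1, 1, 1)]]
```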
  • the available data consists of 7346 single flake samples X , i.e. material objects 110, that come, e.g., from major household appliance recycling scrap.
  • the appliances are shredded beforehand, which, for example, leads to resulting flake sizes between 10 and 120 mm.
  • 3000 flakes X l are randomly selected and annotated with the herein proposed annotation scheme into two classes Y ∈ {0, 1}, e.g., the target classification 0 represents pure aluminum and the target classification 1 represents the rest.
  • This annotation can be provided as the label information 158.
  • a distribution, e.g., 1:3, between material objects 110 associated with a first target classification 157 and material objects 110 associated with a second target classification 157, e.g., of pure aluminum flakes and rest flakes, may be in existence among the annotated samples X l , e.g., the sensor data 140 1 to 140 n and the label information 158 1 to 158 n associated with the material objects 110 1 to 110 n of the training set 154 of material objects 110 1 to 110 n .
  • the apparatus 100 may be configured to split the annotated samples into certain ( X l train,cert , X l val , X l test ) and noisy ( X l train, noisy ) labeled samples.
  • the following sections (architectures, data augmentation strategies, annotation noise and performance metric) describe features and/or functionalities that a herein described apparatus 100 may comprise to improve the performance of the deep learning model 120.
  • Fig. 13 shows exemplary four data augmentation procedures usable within model training, e.g., to increase the amount of the training data 152.
  • it helps regularize the model 120 and avoid overfitting.
  • It is proposed to use one or more of the set of data augmentation procedures shown in Fig. 13 and to wrap them via the RandAugment procedure [7] to facilitate the tuning towards the herein discussed use case, e.g., the classification of material objects 110.
  • the following data augmentation procedures can be applied to the individual flakes 110 within the 224x224 grid, i.e. to the individual samples 140a and 140b of the sensor data 140:
  • Each of these procedures should be parametrized with a magnitude parameter mag ⁇ [0; 1] that controls the strength of the augmentation.
  • for 1), mag controls the degree of the rotation; for 2), it refers to the probability of a flip; for 3), it refers to the standard deviation of the Gaussian noise distribution; and for 4), it controls the ratio of salt and pepper pixels (white and black) as well as the amount of those noisy pixels.
  • the RandAugment strategy [7] can serve as a wrapper for different data augmentation procedures to allow an elegant tuning of those individual procedures towards the task by introducing two hyperparameters n and mag. Therein, n controls the number of procedures that are randomly selected at each training step and mag controls their strength as described above.
  • RandAugment follows the rationale that potentially detrimental data augmentation procedures would be averaged out in the training process, reducing their potential harm.
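A much simplified RandAugment-style wrapper around two of the described procedures could look as follows (NumPy-based sketch; the real procedure set and parametrization follow [7]):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_noise(img, mag):
    """Add zero-mean Gaussian noise; mag scales the standard deviation."""
    return img + rng.normal(0.0, mag, img.shape)

def salt_and_pepper(img, mag):
    """Set roughly a mag fraction of pixels to white (1) or black (0)."""
    out = img.copy()
    mask = rng.random(img.shape) < mag
    out[mask] = rng.integers(0, 2, img.shape)[mask].astype(img.dtype)
    return out

def rand_augment(img, procedures, n, mag):
    """Apply n randomly selected procedures, each with strength mag."""
    for idx in rng.integers(0, len(procedures), size=n):
        img = procedures[idx](img, mag)
    return img

img = np.full((224, 224), 0.5)
augmented = rand_augment(img, [gaussian_noise, salt_and_pepper], n=2, mag=0.1)
```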
  • for each sample i there are three annotations 157, y i1 , y i2 and y i3 , with y ij ∈ {0, 1}.
  • This allows for different schemes to aggregate the individual annotations 157 to one global annotation y i * used as target in model training.
  • the apparatus 100 may be configured to aggregate, for each of the training set 154 of material objects 110 1 to 110 n , the two or more labels, e.g., 158a and 158b, to one global label.
  • the apparatus 100 may be configured to obtain the training data 152 according to one of the following procedures:
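A majority vote is one such aggregation procedure; a minimal sketch (the application does not fix a specific scheme):

```python
from collections import Counter

def majority_vote(annotations):
    """Aggregate the individual annotations for one sample into a
    single global label y* by majority vote."""
    return Counter(annotations).most_common(1)[0][0]

global_labels = [majority_vote(a) for a in [(0, 0, 1), (1, 1, 1), (0, 1, 1)]]
```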
  • the main objective of this work is the training 150 of a model 120 that allows the separation of aluminum and residual flakes where aluminum recyclate is more valuable than residual recyclate.
  • therefore, a performance metric 134 is defined that reflects this purity requirement.
  • this performance metric 134 has to reflect the interest in a high precision at the potential cost of a low recall for the aluminum class.
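One common way to encode a preference for precision over recall is an F-beta score with beta < 1; a sketch of such a candidate metric (the application does not prescribe this formula):

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision higher than recall,
    reflecting the purity requirement for the aluminum class."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With beta = 0.5, a high-precision/low-recall operating point scores better than the mirrored low-precision/high-recall one, which is exactly the asymmetry described above.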
  • the underlying data are dual energy X-ray transmission data derived from shredded household appliances (major household appliances, MHA).
  • This material flow was shredded and passed onto a conveyor belt 175 on which an integrated dual energy X-ray system, e.g., comprised by the measurements system 170, was applied to create dual-energy scans of the material objects 110 as subsequent measurement.
  • This continuous material stream may then be preprocessed, e.g., into 7346 individual flakes 110, as data input and stored in machine-readable tensor data, i.e. the sensor data 140.
  • the final material stream thus may consist of 7346 individual flakes 110.
  • 3000 flakes 110 may be randomly selected and presented to three human annotators for data annotation, e.g., to obtain the label information 158. These 3000 flakes 110 may be transformed via a Basis Material Decomposition 144 and then presented to the annotator via the proprietary "DE-Kit". In early experiments with domain experts it has been found that the decision making about the label, e.g., the target classification 157, is ambiguous in especially hard corner cases due to the complexity of the task. It is proposed to mine data annotations from three different human annotators per sample, i.e. per material object 110 of the training set 154 of material objects 110 1 to 110 n , to create a high-quality training database, i.e.
  • the training data 152 which guides the model training process, e.g., the supervised model training 150, to increase the robustness of the resulting data annotations, e.g., the label information 158 or the target classification 157 comprised by the label information 158.
  • for the unsupervised model training 151, the use of two machine learning based clustering methods trained on feature-engineered dual energy XRT data has been investigated.
  • the investigated unsupervised model training techniques 151 were Gaussian-Mixture-Models and k-Means clustering and two different approaches of human engineered feature extraction methods were used.
  • As a first feature extraction method, histograms for each energy channel were calculated. These histograms are binned over the range from the minimum to the maximum values present in the data, yielding a summarization of the two energy values.
  • the second feature extraction method yields statistical measures (the arithmetic mean and the standard deviation) as features computed individually from each energy channel.
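The two hand-engineered feature sets can be combined per energy channel as in the following NumPy sketch (bin count and layout are illustrative):

```python
import numpy as np

def channel_features(channel, bins=16):
    """Normalized histogram binned from the channel's minimum to
    maximum value, plus arithmetic mean and standard deviation."""
    hist, _ = np.histogram(channel, bins=bins,
                           range=(channel.min(), channel.max()))
    return np.concatenate([hist / channel.size,
                           [channel.mean(), channel.std()]])

low = np.random.default_rng(1).random((224, 224))   # low energy channel
high = np.random.default_rng(2).random((224, 224))  # high energy channel
features = np.concatenate([channel_features(low), channel_features(high)])
```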
  • the training data 152 may comprise the five different folds of training, validation and testing data.
  • the models e.g., the deep learning model 120, may be trained via standard Gradient Descent Backpropagation using a standard Cross-Entropy loss function.
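As a stand-in for the actual convolutional model, the following toy example shows gradient descent with a cross-entropy loss on a linearly separable binary problem (purely illustrative; the real training operates on image tensors and a CNN):

```python
import numpy as np

# Toy binary problem standing in for the flake classification task.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
    grad = (p - y) / len(y)                 # d(cross-entropy)/d(logits)
    w -= lr * (X.T @ grad)                  # gradient descent step
    b -= lr * grad.sum()

accuracy = ((p >= 0.5) == y).mean()
```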
  • a performance metric 134 may be defined for the model selection step, e.g., for the selection of the deep learning model 120 out of the set 124 of candidate deep learning models.
  • this system is expected to distinguish new unseen aluminum flakes 110 from residual flakes 110 with a Precision of 0.9308, i.e., out of 1000 flakes 110 selected as aluminum by the model 120, 69 would be False Positives whilst 931 would correctly correspond to the aluminum class.
  • a threshold 132 of 0.5 was used to map the probability scores [0; 1] predicted by the model 120 to hard binary class predictions (class 0: Aluminum, class 1: Residual fraction).
  • This threshold 132 could be adapted by the end user after deployment of the model 120 to calibrate model predictions further to their needs depending on their requirements regarding the purity of the resulting sorting.
  • Fig. 14 shows a confusion matrix of the final model 120 on the unseen test data. A threshold 132 of 0.5 was used to map the probability scores [0; 1] predicted by the model to hard binary class predictions (class 0: Aluminum, class 1: Residual fraction).
  • the apparatus 100 is configured to classify a plurality of predetermined material objects 110 0 , based on the respective sensor data 140 0 associated with the respective predetermined material object 110 0 .
  • the inventors used a 5-fold Cross Validation scheme as an outer loop in the model validation step [10]. Further, an inner loop with a holdout validation split has been included to tune the hyperparameters of the different model architectures and backbones.
  • Fig. 15 illustrates the evaluation scheme. A test and validation ratio of 20% respectively was applied and the tuning budget was set to 100 GPU hours while the performance metric 134 described above has been used. Fig. 15 shows a model selection and validation scheme. GE stands for Generalization Error, HPC for Hyperparameter Configuration and HPC* is the optimal HPC per fold. Mean and standard deviation of model performance across folds are reported in the experimental results section below.
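The index bookkeeping of such an outer k-fold split with an inner holdout can be sketched as follows (illustrative; the 20% ratios follow the figures above):

```python
def nested_splits(n_samples, n_folds=5, val_ratio=0.2):
    """Outer k-fold test splits combined with an inner holdout
    validation split, yielding (train, validation, test) index lists."""
    indices = list(range(n_samples))
    fold_size = n_samples // n_folds
    for i in range(n_folds):
        test = indices[i * fold_size:(i + 1) * fold_size]
        rest = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        n_val = int(len(rest) * val_ratio)
        validation, train = rest[:n_val], rest[n_val:]
        yield train, validation, test

splits = list(nested_splits(100))
```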
  • Fig. 16 shows a table with the final results of the herein proposed methods and their experimental setups.
  • the results are structured according to the above proposed methods: 3.1 Architectures, 3.2 Data Augmentation Strategies and 3.3 Annotation Noise.
  • a small amount of labeled data 156 also renders model evaluation a difficult problem.
  • a 5-fold nested cross validation scheme with a hold-out validation set for hyperparameter tuning has been employed to ensure a realistic and comparable performance estimate of the final model.
  • the final deployment of such deep learning approaches depends on the acceptance by the practitioners and their trust in such models. Therefore, local explainable AI methods have been employed to open these black box models and provide insights into how the model 120 makes its predictions over different flakes 110. This can foster trust amongst practitioners, and it is an important step towards the final deployment of such automated sorting systems 300.
  • the human operator of the system has the possibility to interact with the DL model 120 by integrating their prior knowledge about the material stream to be sorted into the system. Potential options:
  • BMD Basis Material Decomposition
  • the proposed solution was developed for the sorting of EoL aluminum scrap in the context of white goods and MHA in the recycling industry.
  • the herein described approach is not limited to use in the recycling industry; it is also suitable for other domains in which sorting is relevant, such as the mining industry.

Abstract

The application concerns an apparatus and a method for classifying material objects. The apparatus comprises a deep learning model. The apparatus is configured to, in an initialization phase, subject the deep learning model to supervised learning based on training data obtained from, for each material object of a training set of material objects, a pair of sensor data obtained by a measurement of the respective material object and label information associating the respective material object with a target classification. Additionally, the apparatus is configured to, using the deep learning model, classify a predetermined material object based on sensor data obtained by a measurement of the predetermined material object.

Description

  • The present application concerns the field of classifying material objects, more specifically deep learning-based material object classification for use in, for instance, deep learning-based sorting on dual energy X-ray transmission data. Embodiments relate to an apparatus and a method for classifying material objects.
  • The finite nature of primary raw materials and the increase in the global consumption of raw materials are placing greater focus on the use of secondary raw materials, as industry is dependent on stable availability of raw materials. This substantiates the goals of the circular economy in which the recycling industry plays an important role, making it possible to process end-of-life materials (material flows) efficiently and sustainably for reuse and to secure the availability of secondary raw materials.
  • This requires robust and accurate sorting systems that can distinguish between different materials. Currently deployed systems mainly use rule-based methods that need to be parameterized by human experts and provide hardly any information about the actual sorting decision, which would be valuable for the recycler itself and for the downstream industries.
  • Currently, in practical applications, the criteria for a sorting decision are parameterized manually by trained personnel on the system used. For this purpose, random samples are drawn, the result of the sorting decision is analyzed and, based on these results, the involved human experts adjust the system further, making it a very labor-intensive task. The low availability of experts is a bottleneck in this context.
  • An adaptation of the system to new material flows requires the adaptation of the underlying parameters for the separation of different materials by human experts. Such parameterization is usually requested as a service order from the manufacturer of the plant. In this case, it is difficult for the operator to use his existing expertise since the technical know-how about the detection technology and its correct parameterization is usually not available. Therefore, it is only possible to adapt slowly and with high manual effort to changes in material flows or new requirements. This also makes it difficult for companies to implement or optimize their own processes. Fig. 17 shows exemplarily a flow chart showing the sequence of an X-ray sorting system and the individual processes through which a material stream passes until it is sorted.
  • Therefore, it is desired to provide a concept which achieves a better compromise between increasing efficiency in processing end-of-life materials, providing robust and accurate sorting systems that can distinguish between different materials, providing information about the actual sorting decision, and improving efficiency in adapting processing parameters or decision parameters in response to changes in material flows or new requirements.
  • This is achieved by the subject matter of the independent claims of the present application.
  • Further embodiments according to the invention are defined by the subject matter of the dependent claims of the present application.
  • Herein, a deep learning based classification concept is provided which shows improved characteristics; its advantages are illustrated by applying this concept in a deep learning based sorting system that makes it possible to learn relevant features for sorting decisions from scratch from the collected data and to optimize them specifically for an appropriate performance metric. This alleviates the need for human expertise for the parameterization of the currently used rule-based systems and reduces the effort for model usage after deployment, as no basis material decomposition (BMD) needs to be executed for the data inputs. BMD is a method for the characterization of material samples based on two X-ray spectra, e.g., a dual energy X-ray image.
  • In accordance with a first aspect of the present invention, the inventors of the present application realized that one problem encountered when trying to classify material objects stems from the fact that present methods need to be parameterized by human experts. According to the first aspect of the present application, this difficulty is overcome by using a deep learning model. Concurrently, the deep learning model is trained in a supervised manner and subsequently yields strong predictive performance, e.g., in terms of quality and robustness of the classification of the material objects. Even further, the usage of the deep learning model increases flexibility in classifying material objects and allows an efficient adaptation in response to new and/or additional material objects to be classified.
  • Accordingly, in accordance with a first aspect of the present application, an apparatus for classifying material objects comprises a deep learning model. The apparatus is configured to, in an initialization phase, subject the deep learning model to supervised learning (in the following also understood as supervised training or supervised model training or training in a supervised manner). The initialization phase might be at an initialization of the apparatus, i.e. an initialization phase of the apparatus. The supervised model training is based on training data obtained from, for each material object of a training set of material objects, a respective pair of sensor data and label information. The training data is obtained from a plurality of pairs of sensor data and label information, wherein each pair is associated with one of the material objects of the training set of material objects (e.g., a one-to-one correspondence, i.e. a bijective correspondence). The respective sensor data is obtained by a measurement of the respective material object and the respective label information associates the respective material object with a target classification. Additionally, the apparatus is configured to, using the deep learning model, classify a predetermined material object based on sensor data obtained by a measurement of the predetermined material object.
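To make the training-on-pairs idea above concrete, the following is a deliberately minimal, illustrative sketch in which a single logistic unit stands in for the deep learning model and each training pair consists of a sensor-derived feature vector and a binary target classification. It is not the patented system; all function names and the choice of model are assumptions for illustration only.

```python
import math

def train_supervised(pairs, epochs=300, lr=0.5):
    """pairs: list of (feature_vector, target) with target in {0, 1},
    i.e. one (sensor data, label information) pair per material object."""
    n = len(pairs[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in pairs:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # model output (probability)
            g = p - y                        # gradient of the log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def classify(model, x):
    """Classify a predetermined object from its sensor-derived features."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0
```

In the described apparatus the model would instead be a convolutional neural network operating on dual energy X-ray images, but the supervised-pairs structure of the training data is the same.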
  • According to an embodiment the material objects are aluminum pieces. According to an alternative embodiment the material objects are aluminum trash pieces and the classification discriminates between high-grade pure aluminum and a low-grade residual. In other words, the apparatus might be configured to classify a predetermined material object as high-grade pure aluminum or as a low-grade residual. For example, the aluminum trash pieces comprise aluminum flakes and flakes of one or more other materials, wherein the aluminum flakes, for example, should be classified as high-grade pure aluminum and the flakes of the one or more other materials, for example, should be classified as low-grade residual. The usage of the deep learning model for the classification improves the sorting quality in a traceable way.
  • According to an embodiment, the apparatus comprises a measurement sensor configured to perform the measurement of the training set of material objects and the predetermined material object. In other words, one and the same measurement sensor is used for performing a measurement of each material object of the training set of material objects and for performing a measurement of the predetermined material object. The measurements of the material objects of the training set of material objects may be used by the apparatus for the supervised model training of the deep learning model and the measurement of the predetermined material object may be used by the apparatus for classifying the predetermined material object. By including the measurement sensor in the apparatus it is possible to increase the efficiency of classifying material objects, since the apparatus has direct access to the measurement data. Furthermore, the inventors found it advantageous that the deep learning model is trained based on measurement data received from the same measurement sensor as the measurement data of the predetermined material object which has to be classified by the apparatus. This feature increases the performance of the deep learning model and the quality of the classification.
  • According to an embodiment, the measurement sensor comprises a dual energy X-ray sensor and, optionally, a conveyor belt for passing the training set of material objects and the predetermined material object by the dual energy X-ray sensor so as to be scanned by the dual energy X-ray sensor. The conveyor belt is configured to transport the material objects of the training set of material objects and the predetermined material object. The conveyor belt leads the material objects of the training set of material objects and the predetermined material object past the dual energy X-ray sensor. Multiple material objects may be distributed on the conveyor belt and the dual energy X-ray sensor might be configured to perform a measurement of each of the multiple material objects. The usage of the dual energy X-ray sensor is advantageous since it provides meaningful information on the materials of the material objects and can therefore increase the quality of the classification. The conveyor belt is advantageous in terms of efficiency, since it enables classifying multiple material objects in a short time, enables integrating the apparatus into a processing line with additional systems, and serves for a better handling of the material flow.
  • According to an embodiment, the sensor data comprises dual energy X-ray images.
  • According to an embodiment, the apparatus comprises a sorting stage configured to subject the predetermined material object to sorting according to the classification. The apparatus can achieve a low error rate in sorting, since it achieves a high accuracy in the classification by using the deep learning model.
  • Optionally, the label information, for the respective material object, further comprises a confidence value for the indication of the target classification. The confidence value may indicate a probability of the respective target classification being correct. For example, the target classification may represent a mean of two or more preliminary target classifications associated with the respective material object. The confidence value may indicate whether the two or more preliminary target classifications differ among each other or whether they are consistent. The apparatus, for example, can be configured so that the supervised training of the deep learning model is less sensitive to sensor data of material objects whose label information comprises lower confidence values compared to sensor data of material objects whose label information comprises higher confidence values. This is based on the finding that labeling noise/annotation noise can be reduced by defining the sensitivity of the apparatus for the sensor data. This sensitivity increases the quality of the training data, since the apparatus may be configured to obtain the training data only from pairs of sensor data and label information for which the respective label information indicates a high confidence value. A confidence value, for example, may be regarded as high if it is equal to or greater than a predetermined threshold. The predetermined threshold may indicate a confidence of 60 %, 66 %, 75 %, 80 %, 85 % or 90 %.
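One plausible way to derive such a mean target classification and confidence value from several preliminary target classifications is a simple majority vote, where the confidence is the fraction of annotations agreeing with the majority class. The exact aggregation used in the described system is not specified here, so this sketch is an illustrative assumption:

```python
from collections import Counter

def mean_target_and_confidence(preliminary):
    """preliminary: target classifications from two or more annotators.
    Returns the majority (mean) target classification and the fraction
    of annotations agreeing with it, used as confidence value."""
    counts = Counter(preliminary)
    target, agreeing = counts.most_common(1)[0]
    return target, agreeing / len(preliminary)
```

With full agreement the confidence is 1.0; any disagreement lowers it, so downstream training can discount or discard such samples.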
  • According to an embodiment, the apparatus is configured so that the label information comprises, for the respective material object, at least two labels each indicating a target class for the respective material object. Optionally, the at least two labels each additionally indicate a confidence value for the indication of the target class. The apparatus is configured so that the supervised training of the deep learning model is less sensitive to sensor data of material objects whose label information comprises labels indicative of different target classes and/or lower confidence values than to sensor data of material objects whose label information comprises labels indicative of the same target class and/or higher confidence values. This is based on the finding that labeling noise/annotation noise can be reduced by defining the sensitivity of the apparatus for the sensor data. The model performance can be improved by defining the sensitivity of the apparatus so that it is mainly sensitive to sensor data of material objects whose label information comprises labels indicative of the same target class and/or higher confidence values. The usage of two or more labels and/or of the confidence values results in a more stable supervised training of the deep learning model, since the training data has a higher quality.
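The reduced sensitivity to low-confidence samples described above can be realized, for instance, as a confidence-weighted loss in which samples below a confidence threshold contribute little or nothing to the training signal. The weighting scheme below is an illustrative assumption, not the specific mechanism of the described apparatus:

```python
import math

def weighted_log_loss(preds, targets, confidences, min_conf=0.6):
    """Binary log-loss in which each sample is weighted by its label
    confidence; samples below min_conf (e.g. disagreeing annotators)
    are ignored entirely."""
    total, weight_sum = 0.0, 0.0
    for p, y, c in zip(preds, targets, confidences):
        w = c if c >= min_conf else 0.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```

A noisy, low-confidence label thus cannot pull the model toward a possibly wrong target class.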
  • According to an embodiment, the apparatus further comprises a user interface.
  • Additionally, the apparatus may be configured to obtain, for the respective material object, the at least two labels via the user interface. For each of the training set of material objects, the respective label information might be provided via the user interface. The label information might be acquired only once, e.g. during the supervised model training or at an initialization phase of the supervised model training. A user might assign one of the two labels to the respective material object and provide same to the apparatus via the user interface.
  • Additionally, or alternatively, the apparatus may be configured so that the classification of the predetermined material object depends on a performance metric input via the user interface. For example, the performance metric is indicative of a scalar metric that measures misclassifications of different classes differently according to class-specific weights; e.g., the user inputs weights associated with the different error types (type I/II errors or, in other words, false positives/false negatives) for one or more classes. A class-specific weight associated with a class may weight a precision of the classification of material objects associated with the class.
  • The performance metric may indicate different precisions for different classes. The performance metric, for example, may indicate that a misclassification of material objects belonging to a first class has less influence, e.g., on a sorting of the material objects according to their assigned class, than a misclassification of material objects belonging to a second class. For example, the performance metric can be indicative of a trade-off between false positives and false negatives. The inventors found that, in the field of recycling, the monetary gain and the accuracy of a sorting of the material objects according to their respective classification can be improved if the performance metric indicates to favor false negatives over false positives, given that aluminum corresponds to the positive class. The residual material objects may correspond to the negative class. The dependency of the classification on the performance metric is advantageous in terms of improving the quality of the classification of the predetermined material object by the apparatus. The dependence on the performance metric especially allows adapting the classification performed by the apparatus to the needs of a user. Therefore, a highly adjustable and accurate classification can be achieved.
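A scalar performance metric of this kind can be sketched as a cost function with class-specific error weights. The concrete costs below (a false positive, i.e. residual sorted into the aluminum fraction, weighted five times a false negative) are purely illustrative assumptions; the described apparatus would obtain such weights via the user interface:

```python
def sorting_cost(y_true, y_pred, fp_cost=5.0, fn_cost=1.0):
    """y = 1: aluminum (positive class); y = 0: residual (negative class).
    Returns the average misclassification cost per material object."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 0:
            cost += fp_cost   # residual contaminates the aluminum fraction
        elif p == 0 and t == 1:
            cost += fn_cost   # aluminum lost to the residual fraction
    return cost / len(y_true)
```

With fp_cost > fn_cost, minimizing this metric favors false negatives over false positives, matching the preference stated above.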
  • According to an embodiment, the performance metric is indicative of misclassification costs associated with misclassifications of one or more classes discriminated by the classification of the deep learning model.
  • According to an embodiment, the apparatus is configured so that the classification of the predetermined material object depends on the performance metric, by taking the performance metric into account when controlling the supervised model training so that the deep learning model meets, with respect to the training data, the performance metric, or is optimized with respect to the performance metric. The apparatus is configured to, in subjecting the deep learning model to the supervised model training based on the training data in the initialization phase, take the performance metric into account. Therefore, the performance metric should be defined at the start of the supervised model training, so that the apparatus is configured to perform the classification of the predetermined material object dependent on the performance metric using the trained deep learning model. This enables an efficient and accurate classification of material objects according to the user's needs by the apparatus, since the performance metric is already considered by the deep learning model.
  • According to an embodiment, the apparatus is configured so that the classification of the predetermined material object depends on the performance metric, by taking the performance metric into account by subjecting a set of candidate deep learning models to the supervised model training based on the obtained training data and by selecting the one candidate deep learning model out of the set of candidate deep learning models which meets or is best in terms of the performance metric. These candidate models may differ in the configurations of their respective hyperparameters, e.g., the learning rate, the dropout rate or properties of the architecture. The apparatus is configured to, in subjecting the deep learning model to the supervised model training based on the training data in the initialization phase, take the performance metric into account. The ability to select a candidate deep learning model out of a set allows the apparatus to select, based on predictive performance and model inference time, the best deep learning model to be trained by the supervised model training, e.g., depending on the material objects to be classified and/or on the performance metric.
  • According to an embodiment, the apparatus further comprises a user interface, e.g., the interface described above. The apparatus is configured to obtain, via the user interface, a threshold value, and to compare the threshold value with an output of the deep learning model applied to the sensor data of the predetermined material object so as to decide whether the predetermined material object is attributed to a predetermined class. Thus, it is possible for the user to adapt the threshold to his or her needs and thereby improve the classification of the material objects according to the precision needed by the user.
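The effect of such a user-adjustable threshold can be illustrated as follows: the model's probability output is compared with the threshold, and sweeping the threshold trades recovery of aluminum against purity of the sorted fraction. All names here are illustrative assumptions:

```python
def decide_class(probability, threshold):
    """probability: model output for the predetermined material object."""
    return "aluminum" if probability >= threshold else "residual"

def purity_and_recovery(probabilities, labels, threshold):
    """labels: 1 = aluminum. A higher threshold tends to raise the purity
    of the aluminum fraction while lowering the recovered share."""
    picked = [l for p, l in zip(probabilities, labels) if p >= threshold]
    purity = sum(picked) / len(picked) if picked else 1.0
    recovery = sum(picked) / sum(labels) if sum(labels) else 0.0
    return purity, recovery
```

A user who needs a very pure aluminum fraction would thus enter a high threshold, accepting that some aluminum ends up in the residual.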
  • According to an embodiment, the deep learning model comprises a convolutional neural network.
  • According to an embodiment, the apparatus is configured to, intermittently, subject the deep learning model to semi-supervised model training. The apparatus may be configured to obtain training data for the semi-supervised model training based on sensor data without associated label information, i.e., unlabeled data, and based on pairs of sensor data and label information, i.e., labeled data. The semi-supervised model training can be performed based on labeled and unlabeled data. The sensor data for the semi-supervised model training may be obtained by measurements of material objects of a further training set of material objects, wherein only for some of the material objects label information is provided in addition to the respective sensor data. Alternatively, the labeled data for the semi-supervised model training may be obtained from pairs of sensor data and label information of material objects of the training set of material objects for the supervised model training, and the unlabeled data may be obtained for the material objects of the further training set of material objects. By using semi-supervised model training it is possible to increase the amount of training data and therefore the performance of the deep learning model. The semi-supervised model training reduces the time and costs involved in obtaining the label information, since it is not necessary to provide label information for all material objects from which the training data for the semi-supervised model training is obtained.
  • According to an embodiment, the apparatus is configured to, intermittently, subject the deep learning model to unsupervised model training. The advantage of this feature is that it is not necessary to provide label information for each material object of a training set used for the unsupervised model training, increasing the efficiency of obtaining a trained deep learning model. The time-consuming and costly labeling can be reduced. Additionally, the amount of training data can easily be increased, since it is possible to add a further training set of material objects without label information for the unsupervised model training. This increase in training data improves the performance of the deep learning model. In particular, the usage of both the supervised model training and the unsupervised model training achieves a good compromise between model performance and efficiency in the training of the deep learning model.
  • According to an embodiment, the apparatus is configured to obtain the training data from the pairs of sensor data and label information by means of augmentation to artificially increase the variety of the training database for model training. This has a regularizing effect on the model while training, reducing overfitting and thus improving model performance.
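Such augmentation can be sketched, purely for illustration, with simple label-preserving geometric transforms on 2D sensor images (e.g. flips and rotations, which plausibly do not change the material class of a flake). The specific transforms chosen here are assumptions, not the ones used in the described system:

```python
def flip_horizontal(img):
    return [row[::-1] for row in img]

def flip_vertical(img):
    return img[::-1]

def rotate_90(img):
    return [list(row) for row in zip(*img[::-1])]

def augment(pairs):
    """pairs: list of (image, label); returns the originals plus one
    augmented copy per transform, enlarging the training database."""
    out = []
    for img, label in pairs:
        out.append((img, label))
        for transform in (flip_horizontal, flip_vertical, rotate_90):
            out.append((transform(img), label))
    return out
```

Each labeled pair thus yields several training samples, which has the regularizing effect mentioned above.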
  • A further embodiment relates to a sorting system including one of the apparatus described herein.
  • A further embodiment relates to a method performed by any of the above apparatuses and systems.
  • A further embodiment relates to a method for classifying material objects, comprising in an initialization phase, subjecting a deep learning model to supervised model training based on training data obtained from, for each of a training set of material objects, a pair of sensor data obtained by a measurement of the respective material object and label information associating the respective material object with a target classification. Additionally, the method comprises, using the deep learning model, classifying a predetermined material object based on sensor data obtained by a measurement of the predetermined material object.
  • The method as described above is based on the same considerations as the above-described apparatus. The method can, moreover, be completed with all features and functionalities which are also described with regard to the apparatus.
  • A further embodiment relates to a computer program which, when executed on a computer, instructs the computer to perform the herein described method.
  • Embodiments of the present invention are now described in further detail with reference to the accompanying drawings. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention. In the following description, various embodiments of the invention are described with reference to the following drawings, in which:
  • Fig. 1
    shows a schematic view of an apparatus for classifying material objects;
    Fig. 2
    shows a schematic view of an apparatus for classifying material objects;
    Fig. 3
    shows schematically a training of a deep learning model;
    Fig. 4
    shows a block diagram of a method for classifying material objects;
    Fig. 5
    shows a schematic view of a sorting system;
    Fig. 6
    shows a block diagram of submodules inside a deep learning model;
    Fig. 7
    shows schematically a preprocessing of measurement data obtained from material objects;
    Fig. 8
    shows schematically a supervised model training of a deep learning model;
    Fig. 9
    shows schematically a detailed preprocessing of measurement data obtained from material objects;
    Fig. 10
    shows schematically a base material decomposition;
    Fig. 11
    shows a table of Cohens Kappa scores;
    Fig. 12
    shows a block diagram of a data base for training of a deep learning model;
    Fig. 13
    shows exemplarily four data augmentation procedures;
    Fig. 14
    shows a confusion matrix of a deep learning model;
    Fig. 15
    shows a block diagram of an evaluation scheme;
    Fig. 16
    shows a table with evaluation results; and
    Fig. 17
    shows a flow chart of an X-ray sorting system.
  • Equal or equivalent elements or elements with equal or equivalent functionality are denoted in the following description by equal or equivalent reference numerals even if occurring in different figures.
  • In the following description, a plurality of details is set forth to provide a more thorough explanation of embodiments of the present invention. However, it will be apparent to those skilled in the art that embodiments of the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form rather than in detail in order to avoid obscuring embodiments of the present invention. In addition, features of the different embodiments described hereinafter may be combined with each other, unless specifically noted otherwise.
  • Fig. 1 shows an apparatus 100 for classifying one or more material objects 110 (e.g., see 1100 and 1101 to 110n). The apparatus 100 comprises a deep learning model 120. The deep learning model 120 represents or comprises a convolutional neural network. The apparatus 100 is configured to, using the deep learning model 120, classify 130 a predetermined material object 1100 based on sensor data 1400 obtained by a measurement of the predetermined material object 1100. The sensor data 1400, e.g., is obtained by a measurement using a dual energy X-ray sensor, a camera, or a measurement system for detecting one or more properties of the predetermined material object 1100.
  • In an initialization phase, the apparatus 100 is configured to subject the deep learning model 120 to supervised model training 150. The supervised model training 150 of the deep learning model 120, for example, is performed only once at a set-up or a first start of the apparatus 100. Alternatively, the initialization phase represents a phase initialized in response to a change in settings or requirements for the classification 130, e.g., in response to new or additional classes of material objects 110, in response to a change in a measurement system for obtaining the sensor data 140, or in response to a change in performance metrics. During the initialization phase, the deep learning model 120, for example, is adapted to new settings or requirements for the classification 130.
  • The supervised model training 150 is performed to achieve a high-performance deep learning model 120 for high accuracy in the classification 130 of the material objects 110. The supervised model training 150 is based on training data 152 obtained from, for each of a training set 154 of material objects 1101 to 110n, a pair 156 (see 1561 to 156n) of sensor data 140 (see 1401 to 140n) obtained by a measurement of the respective material object 110, i.e. one of 1101 to 110n, and label information 158 (see 1581 to 158n) associating the respective material object 110 with a target classification. Each material object 1101 to 110n of the training set 154 is associated with object-individual sensor data 140 and label information 158.
  • The measurement of each material object 1101 to 110n of the training set 154 is performed, e.g., by an external device or by the apparatus 100, to obtain the sensor data 1401 to 140n. The sensor data 1401 to 140n might be obtained in the same way as the sensor data 1400 for the predetermined material object 1100. The respective sensor data 140 might represent dual energy X-ray transmission data of the respective material object 110. However, it is also possible that the respective sensor data 140 represents any other conceivable data domain, like an RGB image.
  • The label information 158 of the respective material object 110 might be obtained based on the sensor data 140 of the respective material object 110.
  • The target classification might indicate for the respective material object 110 a class with which the material object 110 is associated. For example, the target classification might indicate one out of two or more different classes. Fig. 1 shows exemplarily two different material classes, i.e. material 1 and material 2. The target classification might indicate the material or main material of the respective material object 110.
  • According to an embodiment, the respective label information 158 is provided by a user of the apparatus 100, e.g., via a user interface of the apparatus 100. The user might analyze the sensor data 140 of the respective material object 110 for determining the target classification of the respective material object 110. Optionally, the label information 158 comprises two or more labels, each indicating a target classification, wherein each of the two or more labels is provided by another user via the user interface, e.g., see Fig. 2.
  • The supervised model training 150 enables the deep learning model 120 to determine, based on the sensor data 1400 of the predetermined material object 1100, a classification 160 of the predetermined material object 1100 with high accuracy. The classification 160, for example, indicates for the predetermined material object 1100 one out of two or more different classes. The two or more different classes should be the same classes as the ones selectable for the target classification. Fig. 1 shows exemplarily that the apparatus 100 classifies 130 the predetermined material object 1100 as material 2.
  • The apparatus 100 may be configured to obtain the training data 152 from the pairs 156 of sensor data 140 and label information 158 by means of augmentation. Augmentation artificially creates additional training data through different ways of processing the sensor data 140 or through combinations of multiple processings of the sensor data 140. This makes it possible to increase the amount of training data 152 and thereby the performance of the deep learning model 120 trained by supervised model training with the training data 152.
  • According to an embodiment, the material objects 110 comprise aluminum pieces. The label information 158 may indicate for such material objects 110 aluminum as target classification.
  • The classification 160 of such material objects 110 should be aluminum, e.g., in case the apparatus 100 correctly classifies 130 the respective predetermined material object 110.
  • According to an embodiment, the material objects 110 are aluminum trash pieces and the classification 130 discriminates between high-grade pure aluminum and a low-grade residual. The aluminum trash pieces comprise, e.g., flakes of different materials, wherein the material objects 110 made of aluminum, i.e. high-grade pure aluminum, are of interest and the material objects 110 made of other materials, i.e. low-grade residual, are not of interest. This type of classification allows identifying the material objects 110 of interest.
  • The apparatus 100 can comprise one or more of the features and/or functionalities described with regard to Figs. 2 and 3.
  • Fig. 2 shows an apparatus 100 for classifying material objects 110. The apparatus 100 comprises a deep learning model 120, and is configured to, using the deep learning model 120, classify 130 a predetermined material object 1100 based on sensor data 1400 obtained by a measurement of the predetermined material object 1100. The classifying 130 of the predetermined material object 1100 results in classification information 160 for the predetermined material object 1100. At the classification 130, for example, the deep learning model is applied onto the sensor data 1400 to obtain a probability score. The apparatus can be configured to map the probability score to a class and that class can represent the classification 160. Additionally, the apparatus 100 is configured to, in an initialization phase, subject 122 the deep learning model 120 to supervised model training 150 based on training data 152 obtained from, for each of a training set 154 of material objects 1101 to 110n, a pair of sensor data 140 (see 1401 to 140n) obtained by a measurement of the respective material object 110 and label information 158 (see 1581 to 158n) associating the respective material object 110 with a target classification, e.g., a user individual target classification 1571/1572 or a mean target classification 1570.
  • The apparatus 100 may be configured so that the respective label information 158 comprises, for the respective material object 110, at least two labels 158a and 158b each indicating a target class 1571/1572 for the respective material object 110. The at least two labels 158a and 158b are provided by different users. Therefore, a respective target class 1571/1572 for the respective material object 110 may represent a user individual target classification.
  • Alternatively, the apparatus 100 may be configured so that the respective label information 158 comprises a mean target classification 1570 for the respective material object 110 and a confidence value 159 for the indication of the mean target classification 1570. The respective mean target classification 1570 may represent a mean of two or more user individual target classifications 1571 and 1572 associated with the respective material object 110. The respective confidence value 159 may indicate a confidence in selecting the respective mean target classification 1570 based on the respective user individual target classifications 1571 and 1572. The apparatus 100 may be configured to determine, for each material object 110 of the training set 154, the respective confidence value based on a statistic over all respective user individual target classifications 1571/1572. In case all respective user individual target classifications 1571/1572 indicate the same class, e.g., see the label information 1581, 1582 and 158n, the confidence value 159 is 1.0 or 100 %, and in case the respective user individual target classifications 1571/1572 indicate different classes, e.g., see the label information 1583, the confidence value 159 is smaller than 1.0 or 100 %. For example, for N labels comprised by the respective label information 158, the confidence value may be Z/N, wherein Z represents the number of user individual target classifications 1571/1572 indicating the same target class as the mean target classification 1570.
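The determination of the mean target classification and the confidence value Z/N described above can be sketched as follows (the function name is hypothetical):

```python
# Minimal sketch: the mean target classification is taken as the majority
# vote over the N user individual labels, and the confidence value is Z/N,
# where Z counts the labels agreeing with the majority.
from collections import Counter

def mean_target_and_confidence(labels):
    counts = Counter(labels)
    mean_target, z = counts.most_common(1)[0]  # Z = size of the majority
    return mean_target, z / len(labels)

# All annotators agree -> confidence 1.0 (cf. label information 1581):
assert mean_target_and_confidence(["aluminum"] * 3) == ("aluminum", 1.0)
# Two of three agree -> confidence 2/3 (cf. label information 1583):
target, conf = mean_target_and_confidence(["aluminum", "aluminum", "residual"])
assert target == "aluminum" and abs(conf - 2 / 3) < 1e-9
```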
  • According to an embodiment, it is possible that the respective label information 158 comprises all labels associated with the respective material object, or one label out of two or more labels associated with the respective material object, or one label indicating the mean target classification 1570. The usage of only one label out of two or more labels associated with the respective material object may reduce the quality of the respective label information 158, e.g., provide limitations with regard to the certainty of the respective label information 158.
  • The apparatus 100, for example, is configured so that the supervised model training 150 is less sensitive to sensor data 140 of material objects 110 whose label information 158 comprises labels 158a and 158b indicative of different target classes 157 and/or lower confidence values 159 compared to sensor data 140 of material objects 110 whose label information 158 comprises labels 158a and 158b indicative of the same target classes 157 and/or higher confidence values 159. For example, the apparatus 100 might prefer the sensor data 1401, 1402 and 140n over the sensor data 1403 for the supervised model training 150. To improve the performance of the deep learning model 120, the apparatus 100 may be configured to obtain the training data 152 only from pairs of sensor data 140 and label information 158 which indicate the same target classes 157 in both labels 158a and 158b of the respective label information 158. The usage of two or more labels per label information 158 and the selection of the training data 152 based on the information provided by the two labels 158a and 158b, i.e. the respective target classification 157 and/or the respective confidence value 159, improves the supervised model training. Thus, it is possible to efficiently obtain a high-performance deep learning model 120. The confidence value 159 may indicate the reliability of the target class provided by the respective label. By evaluating the confidence values 159 and/or by comparing the target classifications 157 of the labels of the same label information 158 to obtain the training data 152, the supervised model training 150 can be improved. This is based on the idea that sensor data with an unsure or possibly false target classification 157 are not considered in the training data 152, or only to a small extent. This increases the accuracy of the deep learning model 120.
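One way to make the training less sensitive to ambiguous pairs can be sketched as follows, assuming two labels per material object; the down-weighting scheme and all names are hypothetical illustrations, not the claimed implementation:

```python
# Hypothetical sketch: keep agreeing pairs at full weight; either drop
# disagreeing pairs entirely or keep them with a strongly reduced weight.

def select_training_data(pairs, keep_noisy=False, noisy_weight=0.1):
    """pairs: iterable of (sensor_data, label_a, label_b).
    Returns (sensor_data, target_class, sample_weight) triples."""
    selected = []
    for sensor, label_a, label_b in pairs:
        if label_a == label_b:
            selected.append((sensor, label_a, 1.0))
        elif keep_noisy:  # ambiguous label information: down-weighted
            selected.append((sensor, label_a, noisy_weight))
    return selected

pairs = [("s1", "al", "al"), ("s2", "al", "rest"), ("s3", "rest", "rest")]
assert len(select_training_data(pairs)) == 2                   # noisy pair dropped
assert len(select_training_data(pairs, keep_noisy=True)) == 3  # or down-weighted
```

The returned sample weight could then scale the per-sample loss during the supervised model training 150.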
  • The apparatus may comprise a measurement sensor 170 configured to perform the measurement of the training set 154 of material objects 1101 to 110n and of the predetermined material object 1100. The measurement sensor 170 may comprise or represent a dual energy X-ray sensor for obtaining a dual energy X-ray image as the sensor data 140, an X-ray sensor for obtaining a spectral X-ray image as the sensor data 140, a multi-energy X-ray sensor for obtaining a multi-energy X-ray image as the sensor data 140, a camera for obtaining an RGB image as the sensor data 140 or any combination thereof.
  • Optionally, the apparatus 100 comprises a conveyor belt 175 for passing the training set 154 of material objects 1101 to 110n and the predetermined material object 110o by the measurement sensor 170 so that the material objects 1101 to 110n and the predetermined material object 1100 can be scanned by the measurement sensor 170. Depending on the sensor type or the number of sensors comprised by the measurement sensor 170, the sensor data 140 of the scanned material object 110 may comprise one or more images, e.g., dual energy X-ray images, spectral X-ray images, multi-energy X-ray images and/or RGB-images, of the respective material object 110.
  • The apparatus 100 may comprise a sorting stage 180 configured to subject the predetermined material object 110o to sorting according to the classification information 160. For example, the sorting stage 180 sorts the predetermined material object 110o to material objects 110 of a first class 182 or to material objects 110 of a second class 184 dependent on the classification information 160. The classification information 160 can indicate for the predetermined material object 1100 one class out of two or more classes. A class might be associated with a certain material, e.g., a metal, plastic, glass, wood or a textile.
  • The apparatus 100 may comprise a user interface 190 via which, for example, a user may provide information to the apparatus 100. The user interface 190 can be used for various options.
  • The apparatus 100, for example, can be configured to obtain, for the respective material object 110, the at least two labels 158a and 158b via the user interface 190. The at least two labels 158a and 158b for the respective material object 110 can be provided by different users. A first user may provide a first label 158a and a second user may provide a second label 158b. This can also be the reason for a discrepancy between the target classifications of the at least two labels 158a and 158b. If the class of the respective material object 110 cannot be clearly determined based on the respective sensor data 140, different users may provide different labels for the same sensor data 140.
  • The apparatus, for example, is configured to obtain, via the user interface 190, a threshold value 132, and compare the threshold value 132 with an output of the deep learning model 120 applied to the sensor data 1400 of the predetermined material object 1100 so as to decide on whether the predetermined material object 1100 is attributed to a predetermined class, e.g., 182 or 184. The deep learning model 120, for example, is configured to output a probability score indicating a likelihood of the predetermined material object 1100 belonging to the predetermined class based on the sensor data 1400, e.g., the classification yields a model prediction probability score in [0; 1]. The threshold value 132 can be used to map probability scores to hard class assignments as a post-hoc step, e.g., the threshold value 132 can be used to cut the probability score, i.e. the classification 160, into a binary decision. The model output from the sorting stage 180 may be this binary decision. This threshold value 132 could be adapted by a user after the supervised model training of the deep learning model 120 to calibrate the model predictions to their needs, depending on their requirements regarding the accuracy of the classification 130 for a predetermined class of material objects 110.
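The post-hoc mapping of the probability score to a binary decision via the threshold value 132 amounts to a simple comparison, e.g.:

```python
def to_binary_class(probability_score, threshold=0.5):
    """Map a model probability score in [0, 1] to a hard binary decision."""
    return 1 if probability_score >= threshold else 0

# Raising the threshold after training demands more certainty for class 1,
# which a user may do to increase the purity of that class:
assert to_binary_class(0.6, threshold=0.5) == 1
assert to_binary_class(0.6, threshold=0.8) == 0
```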
  • The apparatus 100, for example, is configured so that the classification 130 of the predetermined material object 1100 depends on a performance metric 134 input via the user interface 190. The performance metric 134 can be indicative of misclassification costs associated with misclassifications of one or more classes discriminated by the classification 130 using the deep learning model 120. For example, a misclassification in one class can induce higher costs than a misclassification in another class. Therefore, the performance metric 134 may indicate whether the deep learning model 120 has to be more accurate in predicting a first class than in predicting a second class, e.g., the first class might only be selected by the deep learning model 120 at a high probability. The deep learning model has to be very sure that the predetermined material object 1100 belongs to the first class in order to select the first class for the predetermined material object 1100.
  • The apparatus 100, for example, can be configured to control the supervised model training 150 so that the deep learning model 120 meets, with respect to the training data 152, the performance metric 134, or is optimized with respect to the performance metric 134. After the supervised model training 150, the deep learning model 120 assesses the sensor data 1400 of the predetermined material object 1100 according to the performance metric 134. Thus, the classification 130 of the predetermined material object 1100 depends on the performance metric 134.
  • The apparatus 100, for example, is configured to subject a set 124 of candidate deep learning models (DLM) to the supervised model training 150. The apparatus 100 may select the candidate deep learning models for the set 124 based on the obtained training data 152. Additionally, the apparatus 100 can be configured to select one candidate deep learning model, e.g., DLM 2, out of the set 124 of candidate deep learning models which meets or is best in terms of the performance metric 134. The deep learning model 120 is chosen according to the performance metric 134. The classification 130 of the predetermined material object 1100 depends on the performance metric 134, since the deep learning model 120 is applied onto the sensor data 1400 of the predetermined material object 110o at the classification 130.
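The selection of the best candidate out of the set 124 can be sketched as follows; toy models and a toy accuracy metric stand in for the actual candidate deep learning models and the performance metric 134:

```python
# Sketch: pick the candidate deep learning model that scores best in terms
# of the user-provided performance metric on held-out validation data.

def accuracy(model, data):
    """Toy stand-in for the performance metric 134."""
    return sum(model(x) == y for x, y in data) / len(data)

def select_model(candidates, validation_data, metric):
    return max(candidates, key=lambda m: metric(m, validation_data))

always_zero = lambda x: 0
identity = lambda x: x
validation = [(0, 0), (1, 1), (1, 1)]
assert select_model([always_zero, identity], validation, accuracy) is identity
```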
  • Fig. 3 shows an embodiment for a training of a deep learning model 120. The shown training can be performed by one of the herein described apparatuses 100. The apparatus 100 may be configured to decide 400 between three different training methods, e.g., the supervised model training 150, the unsupervised model training 151 and the semi-supervised model training 350, for training the deep learning model 120. The apparatus 100 may be configured to subject the deep learning model 120 to supervised model training 150. The apparatus 100 can be configured to, e.g., intermittently, subject the deep learning model 120 to unsupervised model training 151 and/or semi-supervised model training 350. The training set 154 of material objects 110 can differ among supervised model training 150, semi-supervised model training 350 and unsupervised model training 151. Unsupervised model training 151 may be performed using training data 152 without any label information 158 for each material object 110 of the training set of material objects 110 for the unsupervised model training 151. The semi-supervised model training 350 may be performed using training data 152 comprising labeled data of some material objects 110 of the training set of material objects for the semi-supervised model training 350 and unlabeled data of the remaining material objects 110 of the training set of material objects 110 for the semi-supervised model training 350.
  • Fig. 4 shows a block diagram of a method 200 for classifying material objects. The method 200 comprises, in an initialization phase, subjecting 210 a deep learning model to supervised model training 150 based on training data obtained from, for each of a training set of material objects, a pair of sensor data obtained by a measurement of the respective material object and label information associating the respective material object with a target classification. Additionally, the method 200 comprises, using the deep learning model, classifying 220 a predetermined material object based on sensor data obtained by a measurement of the predetermined material object.
  • In the following further embodiments according to the invention are presented. All of the above described details shall be understood as being, individually or in combination, combinable with the features and/or functionalities of the subsequently presented embodiments to yield even further embodiments.
  • For example, in the context of major household appliance (MHA) recycling, the apparatus 100 may acquire an application-oriented database from shredded MHA, e.g., the training set 154 of material objects 1101 to 110n, on a conveyor belt 175 with an integrated dual energy X-ray scanner 170. These dual energy X-ray images (e.g., comprised by the sensor data 140) of the flakes (i.e. the material objects 110) are then processed with a convolutional neural network model (i.e. the deep learning model 120) to automatically separate material objects 110 and to distinguish high-value aluminum flakes from low-value trash flakes.
  • As is often the case in the recycling context, one faces the trade-off between optimizing precision and recall as a performance metric. In this context, the apparatus 100 may consider a customized performance metric 134 to account for the specific purity requirements that are present in such recycling tasks. Experimental data obtained with one of the herein described apparatuses or methods shows promising results for automated aluminum sorting via deep learning, as an Fβ=0.2 of 91.74 % and a resulting precision of 93.08 % are achieved.
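The Fβ score combines precision and recall; with β = 0.2 as above, precision is weighted much more strongly than recall, which matches the purity requirement. A minimal sketch (the example values are illustrative, not the reported results):

```python
def f_beta(precision, recall, beta=0.2):
    """F_beta = (1 + b^2) * P * R / (b^2 * P + R), with b = beta."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

assert f_beta(1.0, 1.0) == 1.0
# With beta < 1 the score rewards precision over recall:
assert f_beta(0.9, 0.5) > f_beta(0.5, 0.9)
```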
  • Contributions of the herein described invention, for example, are the following:
    1. 1) applicability of different deep learning architectures of varying complexity and size for automated trash sorting of aluminum from low-grade residual flakes based on dual energy X-ray scans of shredded MHA flakes, e.g., see the above-described set 124 of candidate deep learning models;
    2. 2) a multi-rater annotation scheme for data collection in such complex settings, e.g., the usage of the at least two labels 158a and 158b;
    3. 3) focus on a realistic model evaluation scheme throughout the experiments, e.g., by optimizing the training data 152, by improving the training 150 of the deep learning model 120 and by improving the classification 130 itself;
    4. 4) usage of explainable artificial intelligence (AI) methods to foster trust of the practitioners in such an automated system, which is crucial for its adoption in practice.
  • The proposed solution involves the interaction of multiple processes to pursue the overall goal of reducing the need for expert knowledge in the classification of material objects. In the following, the advantages are illustrated by way of example with respect to the sorting of scrap materials. A deep learning based approach to increase the flexibility of sorting system operators and to improve the sorting quality in a traceable way is proposed. The flowchart shown in Fig. 5 outlines a sorting system 300 and its individual processes. The flowchart of the proposed deep learning based X-ray sorting system comprises the individual sub-processes through which a material stream passes until it is sorted.
  • For example, the system 300 comprises or includes one of the herein described apparatuses 100 and a measurement system 170, e.g., an X-ray system, for obtaining sensor data 140 of a material object 110. The apparatus 100 may comprise a machine learning (ML) model or a deep learning (DL) model to be used for classifying the material objects 110. The apparatus 100 may be configured to receive from the measurement system 170 the sensor data 140 of one or more material objects and output for each of the one or more material objects 110 a classification 160, e.g., a decision to which class the respective material object belongs or a probability score indicating to which class the respective material object may belong. Additionally, the system 300 may comprise a sorting system 180 configured to sort the respective material object 110 based on its classification 160, e.g., map the probability score 160 onto a binary class 181. Optionally, the apparatus 100 may comprise the measurement system 170 and/or the sorting system 180.
  • The developed deep learning (DL) model 120 is integrated into the flow of the sorting system 300 and replaces previous rule-based solutions, which base their sorting decision on pre-parameterized rule-based methods. This enables the learning of relevant features for a sorting decision directly from the raw data without human guidance. The input provided to the DL model 120 is composed of the underlying material stream and the annotation of data samples (e.g., referred to as 'labeling'; e.g., comprised by the label information 158) that is provided by human experts once to enable the model training, e.g., the supervised model training 150. Furthermore, the system 300 may comprise a user interface 190. Optionally, the apparatus 100 may comprise the user interface 190. The 'user interaction' module, i.e. the user interface 190, may provide a performance metric 134 that is provided by the human expert which reflects the requirements on the trained model 120. Such performance metric can reflect the degree of purity for specific sorting classes for instance. The input data, i.e. the sensor data 140, for the DL model 120 is acquired in the first process step by an X-ray system 170 that provides dual energy X-ray transmission data (XRT data) from the raw material samples, i.e. the material objects 110, on a conveyor belt 175. Based on these inputs 140, a supervised DL model architecture, i.e. the deep learning model 120, is trained to distinguish the different classes according to the provided performance metric 134. This method allows material separation to be performed with little prior knowledge of the underlying XRT data.
  • Fig. 6 illustrates exemplary submodules inside the Deep Learning Model 120. Fig. 6 shows the different subprocesses of the DL model 120, starting with the preprocessing 126 (see Fig. 7 for a description) of the XRT data 140 to a machine-readable input. For this purpose, for example, individual objects 110 are collected from the continuous flow of material using standard computer vision techniques such as morphological operations and filtering, e.g., in step 126a. The flakes extracted in this way, i.e. the material objects 110, are then centered on an equally sized grid of 224x224 pixels for each of the two dual energy X-ray dimensions, e.g., in step 126b. This results in a tensor representation of each flake 110 of the dimensions 2x224x224 (energy channels x height x width). This dimensionality is not a requirement for the use of our system 300 or apparatus 100; it can easily be adapted to alternatively sized flakes 110. Fig. 7 illustrates the pre-processing 126 of the scrap samples, i.e. the material objects 110: individual samples 110 are located and cut out from the material flow, e.g., in step 126a, and then stored as individual samples 140a and 140b according to the two dual energy channels, e.g., in step 126b. These samples 140a and 140b, e.g., comprised by the sensor data 140, are then fed to the DL model 120, e.g. as training data 152 for the supervised model training 150 of the DL model 120.
  • As shown in Fig. 6, for model training, the human expert provides annotations (label information 158) for representative material samples, e.g. for the material objects 1101 to 110n. For this annotation process, annotation tools such as the proprietary "DE-Kit (dual-energy-kit)" could be used to collect these annotations 158. Given the complexity of the annotation process, the use of more than one annotator helps to create robust annotations 158. Furthermore, the human expert selects an appropriate metric 134 according to which the DL model 120 is optimized in alignment with model requirements. This performance metric 134 guides the model selection, e.g., the selection out of the set 124 of candidate deep learning models, and could reflect the different costs associated with different types of errors (e.g. False Positives vs. False Negatives) for instance.
  • Fig. 8 illustrates an embodiment of the DL model training process, e.g., a supervised model training 150, from data input to the final model 1202. The model 1201 is subsequently trained 150 on this annotated training data set 152 and the model hyperparameters are tuned via a valid model validation strategy (such as a nested holdout-5-fold cross validation over separately split validation and test data sets) and with an appropriate criterion that is reflected by the selected performance metric 134. Once trained, the final model 1202 is integrated into the sorting system 300 and the resulting model performance (estimated generalization error) and other statistical metrics are provided to the user. Furthermore, the DL model 120 outputs probability scores [0; 1] for each of the classes which are translated into hard class predictions via a threshold 132. In the 2-class setting, this threshold is by default set to 0.5. The proposed system 300 or apparatus 100 enables the end user to interact with this system 300 or apparatus 100 by adapting this threshold 132 and thereby calibrate the decision making, i.e. the classification 130. Practically this implies that the user can control the purity of the resulting sorting after model training and deployment.
  • In the following, possible features and/or functionalities of the herein described apparatus 100 regarding the sensor data 140, the label information 158, the measurement system 170, the data processing 126 and the training data 152 are explained in more detail.
  • In the sense of machine learning, a representative database, e.g., representative training data 152, is crucial for tackling a specific task. Further, for the observation and evaluation of machine learning models 120 and of how they perform in an application-oriented field, realistic samples, i.e. realistic material objects 110, are mandatory. Therefore, data 152 should be selected with a focus on two circumstances. First, the use case to be tackled should be of interest in the specific domain, e.g., here the recycling industry. Second, the data 152 should exist in that domain with regard to the acquisition system, i.e. the measurement system 170, the sample shape and the variety.
  • The following subsections X-ray system, data preprocessing for deep learning, annotation scheme and final database describe steps on how to process the acquired sensor data 140 and how to make the measured data 140 machine-readable for deep learning models 120 trained in a supervised manner 150.
  • X-ray system
  • In the field of the recycling industry, X-ray systems, e.g. comprised by the measurement system 170, are solid and robust devices for acquiring relevant information about used materials 110. Those systems can handle the tough and dirty surroundings which are predominant in the recycling sector. Further, those X-ray systems can be integrated into a processing line with additional systems. In most cases, systems built for crushing and shredding used materials to a certain size constitute previous steps. That serves for a better handling of the material flow and aims at sorting with a higher purity.
  • The measurement system 170 may comprise an X-ray source, e.g., set to 160 keV and 2 mA, a dual energy line detector, e.g., with 896 pixels and a pitch of 1.6 mm, and a conveyor belt 175 on which the material stream is transported.
  • The two pieces of information per pixel can be processed with conventional image processing methods, e.g. filtering, morphological operations, etc. Further, the dual energy information can be processed under consideration of the physical properties of the underlying measurement setup and the law of attenuation of X-rays. That allows utilizing a technique called basis material decomposition (BMD), which is used to distinguish between different basis materials [1]. Herein, a deep learning based approach for processing the dual energy information is proposed, for which, for example, a few preprocessing steps are applied to the raw dual energy information; these are described in the following section.
  • data preprocessing for deep learning
  • As mentioned earlier, it is crucial in machine learning to have consistent and representative data 152 for model training and evaluation. Therefore, the preprocessing steps focusing on this are summarized in Fig. 9.
  • Fig. 9 shows an overview of the applied steps to preprocess the measured dual energy information. Step 1 shows the low energy information of a few measured pixel lines. Step 2 illustrates a part of the pixel lines merged into an image (only the low energy information is shown, as a grayscale image). In step 3 (126a), the region of interest, in which the relevant information is present, is marked for each individual flake 110. The last step 126b shows the extracted flake 110 and its location in a 2x224x224 shaped dual energy data sample. The herein discussed sensor data 140 may comprise two individual samples 140a and 140b according to the two dual energy channels.
  • Those steps are made to ensure that the entire spatial information of each sample (flake) 110 is captured in one individual resulting data sample 140a and 140b. This holds for each energy channel. This is necessary because the data stream acquired from the dual energy detector, e.g., comprised by the measurement system 170, is a single line for each energy information, whereas the proposed deep learning model 120 operates in the image domain. To ensure that the input dimensions fed into the supervised model training 150 are fixed, the respective energy information per pixel for each flake 110 is centered in a 224x224 sized image, see 140a and 140b. This size is chosen such that the underlying extracted flakes 110, whose size lies, for example, between 10 and 120 mm after shredding, do not exceed the maximum height and width. Further, a filter may check whether the extracted flakes 110 exceed the image dimensions, e.g., 224 pixels in height or width, and remove large flakes 110 accordingly.
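The centering of an extracted flake on the fixed 224x224 grid, applied per energy channel, can be sketched as follows (an illustrative stdlib-only example; the function name is hypothetical):

```python
# Sketch: place a smaller flake crop in the middle of a fixed-size grid
# (zeros elsewhere); oversized crops are rejected, mirroring the filter step.

GRID = 224

def center_on_grid(crop, grid=GRID):
    height, width = len(crop), len(crop[0])
    if height > grid or width > grid:
        return None  # oversized flake: removed by the filter
    out = [[0.0] * grid for _ in range(grid)]
    top, left = (grid - height) // 2, (grid - width) // 2
    for i in range(height):
        for j in range(width):
            out[top + i][left + j] = crop[i][j]
    return out

sample = center_on_grid([[1.0, 2.0], [3.0, 4.0]])
assert len(sample) == GRID and len(sample[0]) == GRID
assert sample[111][111] == 1.0  # the 2x2 crop starts at (111, 111)
assert center_on_grid([[0.0] * 300]) is None  # wider than 224: filtered out
```

Running this once per energy channel yields the 2x224x224 dual energy data sample 140a/140b.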
  • In summary, single dual energy pixel lines are processed into a data stream and individual flake images 140a and 140b are extracted with the related spatial information, e.g., as the sensor data 140, before being fed into a deep learning pipeline, e.g., the supervised model training 150.
  • annotation scheme
  • As shown in Fig. 9, the resulting data information, e.g., the sensor data 140, of one shredded and scanned flake 110 can be illustrated as a grayscale image or as two or more grayscale images, e.g., 140a and 140b. These show the intensities I(E), which depend on the energy-dependent attenuation μ(E) and the material-dependent areal density ρ in terms of Lambert-Beer's law: I(E) = I0(E) · e^(−μ(E)·ρ).
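Lambert-Beer's law can be evaluated directly for one energy channel, e.g.:

```python
import math

def transmitted_intensity(i0, mu, rho):
    """I(E) = I0(E) * exp(-mu(E) * rho) for one energy E."""
    return i0 * math.exp(-mu * rho)

# Without material (rho = 0) the beam is unattenuated:
assert transmitted_intensity(100.0, 0.5, 0.0) == 100.0
# A higher areal density attenuates the beam more strongly:
assert transmitted_intensity(100.0, 0.5, 2.0) < transmitted_intensity(100.0, 0.5, 1.0)
```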
  • For the human visual system, the grayscale representations 140a and 140b are hard to distinguish, especially when differentiating between two materials. Therefore, for annotation, a different color scheme and a processing of the dual energy information with the base material decomposition 144, which is shown in Fig. 10, are proposed to improve the annotation conditions for the task of voting each flake into the class "pure aluminum" or "low-grade residual", e.g., as the target classification 157. The respective sensor data 140 of the respective material object 110, for example, is processed by the apparatus 100 using base material decomposition 144 to obtain a processed image 142 of the respective material object 110 and the apparatus 100 or a user may obtain the respective target classification 157 based on the processed image 142 of the respective material object 110. Fig. 10 shows an exemplary result 142 of the base material decomposition (BMD) 144 and the visualization of the output in a blue to greenish color scheme.
  • Three human annotators may be briefed by two human experts coming from the recycling and physics fields of expertise. To observe the annotation quality of the non-expert annotators, the agreement level, e.g., the confidence value 159, among the experts and non-experts regarding their votes in the annotation task, e.g., on 100 randomly chosen flake samples 110, can be measured. The used measure, e.g., the confidence value 159, can be Cohen's Kappa score, which ranges from -1 to 1. A value of 1 represents complete agreement, while a value smaller than zero indicates no agreement of the annotators. A further interpretation of the resulting Kappa scores is: 0.01-0.20 as slight, 0.21-0.40 as fair, 0.41-0.60 as moderate, 0.61-0.80 as substantial, and 0.81-1.00 as almost perfect agreement [2], [3] and [4].
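Cohen's Kappa for two annotators can be computed as sketched below (a stdlib-only illustration of the standard formula; class labels are arbitrary):

```python
# Sketch: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
# agreement and p_e the agreement expected by chance, derived from the two
# annotators' label frequencies.
from collections import Counter

def cohens_kappa(votes_a, votes_b):
    n = len(votes_a)
    p_o = sum(a == b for a, b in zip(votes_a, votes_b)) / n
    freq_a, freq_b = Counter(votes_a), Counter(votes_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Perfect agreement over a balanced task yields kappa = 1:
assert cohens_kappa([0, 1, 0, 1], [0, 1, 0, 1]) == 1.0
# Systematic disagreement yields kappa = -1:
assert cohens_kappa([0, 1, 0, 1], [1, 0, 1, 0]) == -1.0
```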
  • Fig. 11 shows a table of the Cohen's Kappa scores between two experts in the field of recycling and X-ray physics and three non-expert annotators. The Kappa agreements among the annotators in the table show that substantial results are achieved for the comparison of the experts among each other. That also holds for two of the three non-expert human annotators with regard to their agreement with expert number one. The remaining Kappa scores among experts and non-experts show a moderate agreement level. According to a health-related study which uses the Cohen's Kappa score, an acceptable agreement lies at 0.41 [4]. The inventors conclude that non-experts are able to annotate after an instruction from experts. As a result, 3000 randomly chosen flake samples 110 are annotated, i.e. the respective label information 158 is provided. Afterwards, a further investigation of the agreement is done and evaluated. This might lead the apparatus 100 to decide whether to perform an additional split of the labeled samples into one fraction in which all three annotators agree with their votes and the rest of the labeled samples. According to an embodiment, this may result in 2172 certainly labeled samples and 828 noisily labeled samples. The apparatus 100 may be configured so that the supervised model training 150 is less sensitive to sensor data 140 of noisily labeled samples compared to sensor data 140 of certainly labeled samples.
  • final database
  • According to an embodiment shown in Fig. 12, the available data consist of 7346 single flake samples X, i.e. material objects 110, that come, e.g., from major household appliance recycling scrap. The appliances are shredded beforehand, which, for example, leads to resulting flake sizes between 10 and 120 mm. For supervised training, i.e. the supervised model training 150, 3000 flakes Xl are randomly selected and annotated with the herein proposed annotation scheme into two classes Y ∈ {0, 1}, e.g., the target classification 0 represents pure aluminum and the target classification 1 represents the rest. This annotation can be provided as the label information 158. A distribution, e.g., 1:3, between material objects 110 associated with a first target classification 157 and material objects 110 associated with a second target classification 157, e.g., of pure aluminum flakes and rest flakes, may exist among the annotated samples Xl, e.g., the sensor data 1401 to 140n and the label information 1581 to 158n associated with the material objects 1101 to 110n of the training set 154 of material objects 1101 to 110n. The apparatus 100 may be configured to split the annotated samples into certain (Xl train,cert, Xl val, Xl test) and noisy (Xl train,noisy) labeled samples. For certain samples, all three labels comprised by the respective label information 158 indicate the same target classification, i.e. the three annotators agree on the class assignment. A further 4346 unlabeled samples Xu remain for studies with different learning approaches such as un- and semi-supervised model training. An overview of an exemplary available or final database is given in Fig. 12.
  • The following sections (architectures, data augmentation strategies, annotation noise and performance metric) describe features and/or functionalities comprisable by a herein described apparatus 100 to improve the performance of the deep learning model 120.
  • architectures
  • For the herein described set 124 of candidate deep learning models, different backbone architectures that were established for image classification tasks such as the present one could be used. Next to the ResNet18 model architecture [5], different versions of the EfficientNet architecture [6] might be advantageous. However, other deep learning models could also be used. This latter architecture class offers a methodology to scale the complexity and size of a modular basic architecture and was introduced with the rationale of providing a scalable choice for the trade-off between the architecture size, as expressed in model complexity, and the required inference time. In particular, the B0, B1, B2 and B4 versions of the EfficientNet are especially advantageous in the field of classifying material objects 110. The use of this model architecture allows to evaluate the effect of the model complexity on both predictive performance and model inference time. All those model architectures were initially designed to work with 3-channel RGB images, whereas the herein proposed data can have a 2-channel dual energy format as outlined above. Therefore, it is proposed to add a learnable linear layer to the beginning of the model architecture that projects the input data from a 224x224x2 to a 224x224x3 format in order to train the above mentioned architectures with the dual energy data.
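The proposed learnable projection from the 224x224x2 dual energy format to the 224x224x3 format expected by the backbones can be sketched framework-independently as a per-pixel linear layer, equivalent to a 1x1 convolution with bias. The NumPy class below is an illustrative assumption, not the concrete implementation used:

```python
import numpy as np

class ChannelProjection:
    """Learnable per-pixel linear map from 2 input channels to 3 channels,
    placed in front of a backbone (e.g., ResNet18 / EfficientNet) that
    expects 3-channel images."""

    def __init__(self, in_ch=2, out_ch=3, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(0.0, 0.1, size=(in_ch, out_ch))  # trainable weight
        self.b = np.zeros(out_ch)                            # trainable bias

    def __call__(self, x):
        # x: (H, W, in_ch) dual energy image -> (H, W, out_ch)
        return x @ self.W + self.b

proj = ChannelProjection()
dual_energy = np.zeros((224, 224, 2))  # one 224x224 dual energy sample
rgb_like = proj(dual_energy)           # shape (224, 224, 3)
```

In a deep learning framework the weight and bias would simply be registered as trainable parameters and optimized jointly with the backbone.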
  • data augmentation strategies
  • Fig. 13 shows four exemplary data augmentation procedures usable within model training, e.g., to increase the amount of the training data 152. For example, in scenarios with a limited amount of labeled data, data augmentation helps regularizing the model 120 and avoids overfitting. It is proposed to use one or more of the set of data augmentation procedures shown in Fig. 13 and to wrap them via the RandAugment procedure [7] to facilitate the tuning towards the herein discussed use case, e.g., the classifying of material objects 110.
  • The following data augmentation procedures can be applied to the individual flakes 110 within the 224x224 grid, i.e. to the individual samples 140a and 140b of the sensor data 140:
    1) Random Rotation
    2) Horizontal Flipping
    3) the addition of Gaussian Noise and
    4) the addition of Salt Pepper Noise
  • Each of these procedures should be parametrized with a magnitude parameter mag ∈ [0; 1] that controls the strength of the augmentation. For 1) Random Rotation, mag controls the degree of the rotation, for 2) it refers to the probability of a flip, for 3) it refers to the standard deviation of the Gaussian noise distribution and for 4) it controls the ratio of salt and pepper pixels (white and black) as well as the amount of those noisy pixels. The RandAugment strategy [7] can serve as a wrapper for different data augmentation procedures to allow an elegant tuning of those individual procedures towards the task by introducing two hyperparameters n and mag. Therein, n controls the number of procedures that are randomly selected at each training step and mag controls their strength as described above. Next to facilitating the tuning procedure, RandAugment follows the rationale that potentially detrimental data augmentation procedures are averaged out in the training process, reducing their potential harm.
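A simplified, dependency-free sketch of the four procedures and a RandAugment-style wrapper is given below. Note that Random Rotation is approximated here by 90° steps to avoid interpolation code, whereas the actual procedure rotates by arbitrary angles controlled by mag; all names are illustrative:

```python
import numpy as np

def random_rotation(img, mag, rng):
    # Simplified stand-in: rotate by a random multiple of 90 degrees when
    # mag is nonzero (the real procedure rotates by mag-controlled angles).
    k = rng.integers(0, 4) if mag > 0 else 0
    return np.rot90(img, k=k, axes=(0, 1))

def horizontal_flip(img, mag, rng):
    # mag is the probability of flipping.
    return img[:, ::-1] if rng.random() < mag else img

def gaussian_noise(img, mag, rng):
    # mag is the standard deviation of the additive Gaussian noise.
    return img + rng.normal(0.0, mag, size=img.shape)

def salt_pepper_noise(img, mag, rng):
    # mag controls the fraction of pixels set to the extreme values.
    out = img.copy()
    mask = rng.random(img.shape[:2]) < mag
    out[mask] = rng.choice([img.min(), img.max()], size=int(mask.sum()))[:, None]
    return out

def rand_augment(img, n, mag, rng):
    """Apply n randomly chosen procedures, each with strength mag in [0, 1]."""
    ops = [random_rotation, horizontal_flip, gaussian_noise, salt_pepper_noise]
    for op in rng.choice(len(ops), size=n, replace=False):
        img = ops[op](img, mag, rng)
    return img
```

In the tuning procedure, only n and mag are exposed as hyperparameters, which is what makes the wrapper attractive for model selection.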
  • annotation noise
  • As described above and motivated by the complexity of the annotation task, it is proposed to mine annotations, e.g. the target classification 157, from two or more, e.g., three, human annotators in the data labeling process. Thus, for each annotated flake xi 1101 to 110n, there are three annotations 157 yi1, yi2, yi3 with yij ∈ {0,1}. This allows for different schemes to aggregate the individual annotations 157 into one global annotation yi* used as target in model training. The apparatus 100 may be configured to aggregate, for each of the training set 154 of material objects 1101 to 110n, the two or more labels, e.g., 158a and 158b, into one global label. The apparatus 100 may be configured to obtain the training data 152 according to one of the following procedures:
    1. Clean Annotations: only those flakes 110 for which all three annotators align in their annotation decision are selected as training data 152. Practically, this means that "unconfident" flakes 110, e.g., where yi1=0, yi2=1, yi3=0, are excluded to guarantee a noise-free training data set 152 at the cost of reducing the amount of training samples.
    2. Majority Vote: the majority of the three annotations 158 is selected as global annotation yi*, 157. This procedure allows the use of all annotated samples in model training, i.e. in the supervised model training 150, at the cost of ignoring the uncertainty in the data annotations [8].
    3. Soft Annotations: the three annotations 158 are averaged to a continuous score 159 as global annotation 1570 yi* ∈ {0.0, 0.33, 0.67, 1.0}, i.e., soft labels. This scheme allows the use of the whole training data set while reflecting the annotation noise in model training, i.e. in the supervised model training 150.
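The three aggregation procedures can be sketched as follows; the function and scheme names are illustrative:

```python
def aggregate(annotations, scheme):
    """Aggregate per-flake annotation triples into global training targets.

    annotations: list of (y1, y2, y3) with each vote in {0, 1}.
    Returns a list of (sample_index, target); "clean" drops ambiguous flakes.
    """
    targets = []
    for i, votes in enumerate(annotations):
        if scheme == "clean":
            if len(set(votes)) == 1:              # all annotators agree
                targets.append((i, votes[0]))
        elif scheme == "majority":
            targets.append((i, int(sum(votes) >= 2)))
        elif scheme == "soft":
            # Average of the three binary votes -> {0.0, 0.33, 0.67, 1.0}.
            targets.append((i, round(sum(votes) / len(votes), 2)))
    return targets
```

For a flake voted (0, 1, 0), "clean" discards the sample, "majority" yields the hard target 0, and "soft" yields the soft label 0.33.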
  • As part of experiments, those three aggregation schemes have been empirically compared and the results are described in more detail in the section experimental results.
  • Conceptually, it is proposed to distinguish Label Noise from Annotator Noise. The former describes label noise on the macro level of the task and can, for instance, be modelled via noise transition matrices with estimated probabilities for the mis-annotation of the true label as other labels [9]. These approaches try to estimate the inherent noise of the task annotations. On the other hand, Annotator Noise describes the disagreement of multiple observers on annotations on the micro level of individual samples [8]. As we mined annotations 157 from three different annotators, this setting of Annotator Noise fits the herein described task. While there exists a multitude of different approaches to modelling annotator noise, it is proposed to restrict the approaches to the three settings described above. However, other approaches might also be usable.
  • performance metric
  • The main objective of this work is the training 150 of a model 120 that allows the separation of aluminum and residual flakes, where aluminum recyclate is more valuable than residual recyclate. As the value of recycled aluminum in down-stream re-use is heavily dependent on the purity of the resulting recyclate, it is proposed to employ a performance metric 134 that reflects this purity requirement. Specifically, this performance metric 134 has to reflect the interest in a high precision at the potential cost of a low recall for the aluminum class. The Fβ-Score allows for the adjustment of this trade-off via the β-parameter via its definition:

    Fβ = (1 + β²) · (Precision · Recall) / (β² · Precision + Recall)
  • It was found that a False Positive is 5 times as costly as a False Negative in the field of material classification of aluminum trash objects, given that a positive label corresponds to the aluminum class. Practically speaking, the misclassification of a true residual flake as aluminum incurs a cost that is 5 times higher than the misclassification of a true aluminum flake as residual. It is proposed to integrate this requirement into the performance metric 134 via a β=0.2-Score, attributing precision a relative weight of 5 over the Recall for the aluminum class.
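The Fβ-Score defined above and its β=0.2 setting can be sketched as:

```python
def f_beta(precision, recall, beta):
    """F-beta score; beta < 1 shifts the weight towards precision.
    beta = 0.2 attributes precision a relative weight of 5 over recall,
    matching the 5:1 cost ratio of False Positives to False Negatives."""
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
```

For a model with, e.g., precision 0.9 and recall 0.6 for the aluminum class, the β=0.2 score stays close to the precision value, whereas the balanced F1 score (β=1) would be pulled down by the low recall.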
  • Concrete embodiment
  • As concrete application, it is proposed to use above proposed approach to automate the sorting of End-of-Life (EoL) aluminum trash into a high-grade pure aluminum fraction and a low-grade residual fraction.
  • Specifically, it is proposed to use dual energy X-ray transmission data derived from shredded household appliances (Major Household Appliance MHA). This material flow was shredded and passed onto a conveyor belt 175 on which an integrated dual energy X-ray system, e.g., comprised by the measurements system 170, was applied to create dual-energy scans of the material objects 110 as subsequent measurement. This continuous material stream may then be preprocessed, e.g., into 7346 individual flakes 110, as data input and stored in machine-readable tensor data, i.e. the sensor data 140. After the application of filtering steps to ensure that only reasonably sized flakes 110 are being selected, the final material stream thus may consist of 7346 individual flakes 110.
  • Out of these 7346 flakes 110, 3000 flakes 110 may be randomly selected and presented to three human annotators for data annotation, e.g., to obtain the label information 158. These 3000 flakes 110 may be transformed via a Basis Material Decomposition 144 and then presented to the annotator via the proprietary "DE-Kit". In early experiments with domain experts it has been found that the decision making about the label, e.g., the target classification 157, is ambiguous in especially hard corner cases due to the complexity of the task. It is proposed to mine data annotations from three different human annotators per sample, i.e. per material object 110 of the training set 154 of material objects 1101 to 110n, to create a high-quality training database, i.e. the training data 152, which guides the model training process, e.g., the supervised model training 150, to increase the robustness of the resulting data annotations, e.g., the label information 158 or the target classification 157 comprised by the label information 158.
  • Given the large amount of 4346 unlabeled samples the inventors experimented with a) supervised model training 150 on the 3000 labeled flakes 110, b) Semi-supervised model training 350 on the 3000 labeled and the 4346 unlabeled flakes and c) unsupervised model training 151 without any labels using 7346 unlabeled samples only. The experiments revealed the supervised approach 150 to show the strongest model performance. Thus, it is proposed to focus on this learning paradigm.
  • For the sake of completeness, those un- and semi-supervised solution approaches and their results are described in the following.
  • For unsupervised model training 151, the use of two machine learning based clustering methods trained on feature-engineered dual energy XRT data has been investigated. The investigated unsupervised model training techniques 151 were Gaussian Mixture Models and k-Means clustering, and two different approaches of human-engineered feature extraction were used. As a first feature extraction method, histograms for each energy channel were calculated. These histograms are binned over the range from the minimum to the maximum values present in the data. The second feature extraction method yields statistical measures (the arithmetic mean and the standard deviation) as features, computed individually for each energy channel. The proposed 5-fold cross validation scheme also holds for training, validation and testing in the unsupervised experiments. As a result, k-Means with histogram features showed the best performance of Fβ=0.2 = 0.763 among the investigated unsupervised model training techniques 151. This model performance served as a lower bound in the experiments.
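A minimal sketch of the histogram feature extraction and a k-Means clustering on top of it is given below; the bin count and the deterministic initialization are illustrative assumptions, and the reported experiments may differ in these details:

```python
import numpy as np

def histogram_features(flake, bins=16):
    """Concatenate per-channel histograms, binned over each channel's
    min-to-max value range, as an unsupervised feature vector."""
    feats = []
    for c in range(flake.shape[-1]):
        ch = flake[..., c].ravel()
        hist, _ = np.histogram(ch, bins=bins, range=(ch.min(), ch.max()))
        feats.append(hist / hist.sum())  # normalize per channel
    return np.concatenate(feats)

def kmeans(X, k=2, iters=50):
    """Minimal k-Means sketch: returns cluster assignments for rows of X."""
    centers = X[:k].astype(float).copy()  # simple deterministic init
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each row to its nearest center, then update the centers.
        d = np.linalg.norm(X[:, None] - centers[None], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels
```

In the described pipeline, the feature vectors of all flakes would be stacked into a matrix and clustered into two groups corresponding to the aluminum and residual fractions.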
  • For semi-supervised model training 350, the recently published FixMatch architecture has been used, which allows the inclusion of the unlabeled data next to the labeled data, i.e. the pairs 156 of sensor data 140 and label information 158, in the training process. These experiments yielded a final model performance of Fβ=0.2 = 0.832, which is better than the results from the unsupervised training paradigm 151 but worse than the supervised approaches 150. Hence, it is proposed to focus on the supervised model training paradigm 150 in the project, and this is proposed as the best solution.
  • Subsequently this annotated data, i.e. the pairs 156 of sensor data 140 and label information 158, may be split into five different folds of training, validation and testing data. The training data 152 may comprise the five different folds of training, validation and testing data. The models, e.g., the deep learning model 120, may be trained via standard Gradient Descent Backpropagation using a standard Cross-Entropy loss function.
  • A performance metric 134 may be defined for the model selection step, e.g., for the selection of the deep learning model 120 out of the set 124 of candidate deep learning models. In this use case, a high level of purity for the pure aluminum fraction was given more importance than purity in the residual fraction. This was reflected in the choice of an F β=0.2 score as performance measure 134 which weights precision for the aluminum class 5x higher than the recall. Practically speaking this means that misclassifying a residual flake as aluminum incurs 5 times the cost of misclassifying a high-grade aluminum flake as low-grade residual.
  • In settings with limited labeled data 156, it is proposed to use data augmentation to artificially increase the variety of the training database, i.e. of the training data 152, for model training, i.e. for the supervised model training 150. This has a regularizing effect on the model 120 during training 150, reducing overfitting and thus improving model performance. It is proposed to use one or more of the following data augmentation procedures: Rotation, Horizontal Flip, Additive Gaussian Noise, Additive Salt Pepper Noise. It is furthermore proposed to use the established RandAugment procedure as a wrapper around these strategies to improve the robustness of model training 150. RandAugment offers the parametrization of the data augmentation procedure via the parameters N and magnitude, which are optimized according to the Fβ=0.2 performance metric 134 in the model selection step.
  • After establishment of this model evaluation and selection strategy, different model architectures of increasing model complexity (ResNet18, EfficientNet-B0, EfficientNet-B1, EfficientNet-B2, EfficientNet-B4) and hyperparameters (learning rate, dropout ratio, N, magnitude) have been experimented with as part of the model selection, as measured by the Fβ=0.2 performance metric 134. The model architectures used are all based on Convolutional Neural Networks. This process revealed the EfficientNet-B4 to show the strongest model performance with a model accuracy of 0.865, an Fβ=0.2 = 0.9174 and a Precision for the aluminum class of 0.9308, outperforming several baseline models. Thus, this system is expected to distinguish new unseen aluminum flakes 110 from residual flakes 110 with a Precision of 0.9308, i.e., out of 1000 flakes 110 selected as aluminum by the model 120, 69 would be False Positives whilst 931 would correctly correspond to the aluminum class. As a default, a threshold 132 of 0.5 was used to map the probability scores [0; 1] predicted by the model 120 to hard binary class predictions (class 0: Aluminum, class 1: Residual fraction). This threshold 132 could be adapted by the end user after deployment of the model 120 to calibrate the model predictions further to their needs depending on their requirements regarding the purity of the resulting sorting. Fig. 14 shows a confusion matrix of the final model 120 on the unseen test data, i.e. a plurality of sensor data 1400 of a plurality of predetermined material objects 1100, to illustrate its strong predictive performance.
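The mapping of probability scores to hard class predictions via the adjustable threshold 132 can be sketched as follows; that the score is the residual-class probability is an assumption here, as the source does not fix the direction of the score:

```python
def to_hard_class(prob_residual, threshold=0.5):
    """Map probability scores in [0, 1] to hard binary class predictions.
    Class 0: Aluminum, class 1: Residual fraction. Under the assumption
    that the score is the residual-class probability, lowering the
    threshold assigns fewer, more confident flakes to aluminum."""
    return [0 if p < threshold else 1 for p in prob_residual]
```

The end user can move the threshold after deployment to trade the purity of the aluminum fraction against its yield, without retraining the model.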
  • evaluation protocol for the experiments
  • The limited size of the annotated dataset 156 left a small test data set, i.e. a plurality of sensor data 1400 of a plurality of predetermined material objects 1100. For example, the apparatus 100 is configured to classify a plurality of predetermined material objects 1100 based on the respective sensor data 1400 associated with the respective predetermined material object 1100. To yield a realistic estimation of the generalization performance of the final model 120, the inventors used a 5-fold Cross Validation scheme as an outer loop in the model validation step [10]. Further, an inner loop with a holdout validation split has been included to tune the hyperparameters of the different model architectures and backbones. For hyperparameter tuning, the inventors used the Hyperband algorithm with an equal tuning budget across folds [11]. Fig. 15 illustrates the evaluation scheme. A test and validation ratio of 20% respectively was applied and the tuning budget was set to 100 GPU hours, while the performance metric 134 described above has been used. Fig. 15 shows a model selection and validation scheme. GE stands for Generalization Error, HPC for Hyperparameter Configuration and HPC* is the optimal HPC per fold. Mean and standard deviation of model performance across folds are reported in the experimental results section below.
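The outer 5-fold split with an inner holdout validation split can be sketched as plain index bookkeeping; the exact ratios and the shuffling are illustrative assumptions:

```python
import random

def outer_folds(n_samples, n_folds=5, val_ratio=0.2, seed=0):
    """Yield (train_idx, val_idx, test_idx) per outer fold.
    The test fold estimates the generalization error (GE); the inner
    holdout validation split serves hyperparameter tuning (HPC search)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold_size = n_samples // n_folds
    for f in range(n_folds):
        test = idx[f * fold_size:(f + 1) * fold_size]
        rest = idx[:f * fold_size] + idx[(f + 1) * fold_size:]
        n_val = int(len(rest) * val_ratio)  # holdout split of the remainder
        yield rest[n_val:], rest[:n_val], test
```

Each outer fold would then run its own Hyperband search on the train/validation split before reporting the test-fold score, and the per-fold scores are averaged to the final estimate.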
  • experimental results
  • Fig. 16 shows a table with the final results of the herein proposed methods and their experimental setups. The results are structured according to the above proposed methods: 3.1 Architectures, 3.2 Data Augmentation Strategies and 3.3 Annotation Noise.
  • The results of the experiments with different backbone architectures show that better performance can be achieved with an increasing available parameter space of the selected architectures (ResNet18, EffNet-b0, EffNet-b2, or EffNet-b4) as backbone. This is observed in terms of the proposed performance score 134 Fβ, which was also used as a criterion for the tuning procedure of the architectures. EffNet-b4 shows the best results in this setup.
  • Considering the data augmentation strategy, in particular RandAugment, in the tuning procedure for training and in terms of the proposed evaluation protocol, the method shows a slight but consistent performance increase compared to the case in which no data augmentation strategy is applied. An Fβ of 0.88 +/- 0.02 is reached.
  • The outcome of the proposed schemes for dealing with annotation noise shows increased performance for both strategies (majority vote and soft annotations) that consider the total amount of annotated samples and the individual criteria of the strategy for training. Within the experimented schemes, the best performance was achieved by the majority vote over the three human annotations, with an Fβ score of 0.89 +/- 0.02.
  • The overall best performance of the experiments is achieved with the EffNet-b4 architecture as backbone, with an Fβ of 0.92 +/- 0.01. An investigation of applying RandAugment and of the amount of majority voted samples is still pending for this architecture; therefore, the ResNet18 architecture has been used as backbone as explained above.
  • A pipeline is proposed for the distinction of high-value aluminum flakes 110 from low-value trash flakes 110. It includes the processing of acquired dual energy XRT data coming from a material stream of recycling scrap on a conveyor belt 175. The data is processed with supervised machine learning techniques and state-of-the-art deep learning architectures. A performance score Fβ=0.2 of 91.74 % is achieved, which addresses the precision and recall trade-off regarding the aim of a high purity in the aluminum sorting.
  • In this context, we deal with common challenges in machine learning: the collection of sufficient annotated data 156, a crucial component especially for the training 150 of supervised learning models 120. Herein, a detailed description of a data annotation scheme via three human annotators and the integration of expert knowledge from the recycling domain to compile a representative labeled dataset is provided. Further, special attention has been paid to the handling of annotator noise that stems from the use of three human annotators during model training, and different suitable model architectures for this problem have been explored. In order to prevent the model from overfitting to the relatively small amount of training data, a set of data augmentation strategies tailored towards dual energy X-ray data has been successfully used.
  • A small amount of labeled data 156 also renders model evaluation a difficult problem. Thus, a 5-fold nested cross validation scheme with a hold-out validation set for hyperparameter tuning has been employed to ensure a realistic and comparable performance estimate of the final model. Next to sufficiently strong model performance, the final deployment of such deep learning approaches depends on the acceptance by the practitioners and their trust in such models. Therefore, local explainable AI methods have been employed to open these black box models and provide insights into how the model 120 makes its predictions over different flakes 110. This can foster trust amongst practitioners, and it is an important step towards the final deployment of such automated sorting systems 300.
  • First results from unsupervised model training 151 are also promising under the consideration that no label information 158 is present. The achieved performance score is an Fβ=0.2 of 77.80 %. These results can be increased with semi-supervised model training techniques 350 to an Fβ=0.2 of 89.00 %. Both were evaluated with the proposed evaluation scheme.
  • In the following effects and advantages of the previously described special technical features are described.
  • With the existing domain knowledge of the operators of such a system, we thus increase their independence and their ability to respond independently to different material flows, the currently applicable market economy situation, and the needs of the customers (processing industry of secondary raw materials).
  • The human operator of the system has the possibility to interact with the DL model 120 via integrating his prior knowledge about the material stream to be sorted into the system. Potential Options:
    • Training of the system 300 or apparatus 100 given a small amount of materials A / B.
    • Individual optimization of the system 300 or apparatus 100 to the appropriate performance metric 134
    • Calibration of the finally deployed model 120 by setting the threshold 132 to map probability scores to hard class assignments as a post-hoc step.
    • Monitoring of the sorting results, based on statistical metrics and the sorted materials
  • Furthermore, such a system 300 or apparatus 100 does not require the application of a Basis Material Decomposition (BMD) 144 for model inference once it is trained. This reduces the computational and human effort required by the BMD 144 dramatically.
  • Additional notes
    • Alternatively, sensor data 140 that do not originate from X-ray sensors (here dual energy XRT) may be used.
    • Convolution-based neural network architectures are used, which exploit information from local structures present in the input data and are optimized for this exploitation. Alternative model architectures could potentially be used in this context.
  • The proposed solution was developed for the sorting of EoL aluminum scrap in the context of white goods and MHA in the recycling industry. In addition, it is also conceivable to optimize deep learning models for other material flows for the recycling industry using the conceptual approach. Additionally, the herein described approach is not only limited to the use in the recycling industry, but also suitable for other domains for which sorting is relevant such as the mining industry.
  • References
    1. [1] Firsching, M., Nachtrab, F., Uhlmann, N., & Hanke, R. (2011). Multi-Energy X-ray Imaging as a Quantitative Method for Materials Characterization. Advanced Materials, 23, pp. 2655-2656.
    2. [2] Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, pp. 37-46. doi:10.1177/001316446002000104
    3. [3] Artstein, R., & Poesio, M. (2008). Inter-coder agreement for computational linguistics. Computational Linguistics, 34, pp. 555-596.
    4. [4] McHugh, M. L. (2012). Interrater reliability: the kappa statistic. Biochemia medica, 22, pp. 276-282.
    5. [5] He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, (pp. 770-778).
    6. [6] Tan, M., & Le, Q. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. International Conference on Machine Learning, 97, pp. 6105-6114.
    7. [7] Cubuk, E. D., Zoph, B., Shlens, J., & Le, Q. V. (2020). RandAugment: Practical automated data augmentation with a reduced search space. Neural Information Processing Systems, 34, pp. 18613-18624.
    8. [8] Tanno, R., Saeedi, A., Sankaranarayanan, D., Alexander, D. C., & Silberman, N. (2019). Learning From Noisy Labels by Regularized Estimation of Annotator Confusion. IEEE/CVF Conference on Computer Vision and Pattern Recognition, (pp. 11236-11245). doi:10.1109/CVPR.2019.01150
    9. [9] Song, H., Kim, M., Park, D., Shin, Y., & Lee, J.-G. (2020). Learning from noisy labels with deep neural networks: A survey. arXiv.
    10. [10] Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning. Springer Series in Statistics.
    11. [11] Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., & Talwalkar, A. (2017). Hyperband: A novel bandit-based approach to hyperparameter optimization. The Journal of Machine Learning Research, 18, pp. 6765-6816.

Claims (22)

  1. Apparatus (100) for classifying (130) material objects (110), comprising
    a deep learning model (120), and
    wherein the apparatus (100) is configured to, in an initialization phase, subject the deep learning model (120) to supervised learning (150) based on training data (152) obtained from, for each of a training set (154) of material objects (1101 to 110n), a pair (156) of sensor data (140) obtained by a measurement of the respective material object (110) and label information (158) associating the respective material object (110) with a target classification (157),
    wherein the apparatus (100) is configured to, using the deep learning model (120), classify (130) a predetermined material object (1100) based on sensor data (1400) obtained by a measurement of the predetermined material object (1100).
  2. Apparatus (100) according to claim 1, wherein the material objects (110) are aluminum pieces.
  3. Apparatus (100) according to claim 1, wherein the material objects (110) are aluminum trash pieces and the classification (130) discriminates between high-grade pure aluminum and a low-grade residual.
  4. Apparatus (100) according to any of the previous claims, wherein the apparatus (100) comprises a measurement sensor (170) configured to perform the measurement of the training set (154) of material objects (1101 to 110n) and the predetermined material object (1100).
  5. Apparatus (100) according to claim 4, wherein the measurement sensor (170) comprises a dual energy X-ray sensor and a conveyor belt (175) for passing the training set (154) of material objects (1101 to 110n) and the predetermined material object (1100) by the dual energy X-ray sensor so as to be scanned by the dual energy X-ray sensor.
  6. Apparatus (100) according to any of the previous claims, wherein the sensor data (140) comprises dual energy X-ray images.
  7. Apparatus (100) according to any of the previous claims, wherein the apparatus (100) comprises a sorting stage (180) configured to subject the predetermined material object (1100) to sorting according to the classification (130).
  8. Apparatus (100) according to any of the previous claims, wherein the apparatus (100) is configured so that the label information (158) comprises, for the respective material object (110), at least two labels (158a, 158b) each indicating a target class (157) for the respective material object (110), wherein the apparatus (100) is configured so that the supervised learning (150) is less sensitive to sensor data (140) of material objects (110) whose label information (158) comprises labels (158a, 158b) indicative of different target classes (157) compared to sensor data (140) of material objects (110) whose label information (158) comprises labels (158a, 158b) indicative of the same target classes (157).
  9. Apparatus (100) according to claim 8,
    wherein the apparatus (100) further comprises a user interface (190), and
    wherein the apparatus (100) is configured to obtain, for the respective material object, the at least two labels (158a, 158b) via the user interface (190).
  10. Apparatus (100) according to any of the previous claims, wherein the apparatus (100) is configured so that the label information (158) comprises, for the respective material object (110), a mean target classification (1570) as the target classification (157) for the respective material object (110) and a confidence value for the indication of the target classification (1570), wherein the apparatus (100) is configured so that the supervised learning (150) is less sensitive to sensor data (140) of material objects (110) whose label information (158) comprises a lower confidence value (159) compared to sensor data (140) of material objects (110) whose label information (158) comprises a higher confidence value (159).
  11. Apparatus (100) according to any of the previous claims,
    wherein the apparatus (100) further comprises a user interface (190), and
    wherein the apparatus (100) is configured so that the classification (130) of the predetermined material object (1100) depends on a performance metric (134) input via the user interface (190).
  12. Apparatus (100) according to claim 11, wherein the performance metric (134) is indicative of miss-classification costs associated with miss-classifications of one or more classes discriminated by the classification of the deep learning model (120).
  13. Apparatus (100) according to claim 11 or 12, wherein the apparatus (100) is configured so that the classification of the predetermined material object (1100) depends on the performance metric (134) by
    the apparatus (100) being configured to, in subjecting the deep learning model (120) to the supervised learning (150) based on the training data (152) in the initialization phase, take the performance metric (134) into account by controlling the supervised learning (150) so that the deep learning model (120) meets, with respect to the training data (152), the performance metric (134), or is optimized with respect to the performance metric (134).
  14. Apparatus (100) according to any of claims 11 to 13, wherein the apparatus (100) is configured so that the classification of the predetermined material object (1100) depends on the performance metric (134) by
    the apparatus (100) being configured to, in subjecting the deep learning model (120) to the supervised learning (150) based on the training data (152) in the initialization phase, take the performance metric (134) into account by
    subjecting a set of candidate deep learning models to supervised learning (150) based on the training data (152) obtained and
    selecting one candidate deep learning model out of the set of candidate deep learning models which meets or is best in terms of the performance metric (134).
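The candidate-selection scheme of claims 13 and 14 amounts to training several configurations and keeping the one that scores best on the user-supplied performance metric. The sketch below is illustrative only; `train` and `evaluate_metric` are hypothetical stand-ins for the real training pipeline and metric:

```python
def select_candidate(candidates, train, evaluate_metric):
    # Train every candidate deep learning configuration, score each
    # trained model with the performance metric, and keep the best.
    best_model, best_score = None, float("-inf")
    for cand in candidates:
        model = train(cand)
        score = evaluate_metric(model)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score
```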
  15. Apparatus (100) according to any of claims 1 to 14,
    wherein the apparatus (100) further comprises a user interface (190), and
    wherein the apparatus (100) is configured to
    obtain, via the user interface (190), a threshold value (132), and
    compare the threshold value (132) with an output of the deep learning model applied to the sensor data (1400) of the predetermined material object (1100) so as to decide on whether the predetermined material object (1100) is attributed to a predetermined class.
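The threshold comparison of claim 15 can be sketched in a few lines; the function and class names are illustrative assumptions, not taken from the patent:

```python
def attribute_to_class(model_score, threshold, positive_class, reject="reject"):
    # Compare the deep learning model's output score for the class of
    # interest against a user-chosen threshold; objects scoring below
    # the threshold are not attributed to that class.
    return positive_class if model_score >= threshold else reject
```

Raising the threshold trades recall for precision, which is why exposing it via the user interface lets an operator tune the sorter to the cost structure of a given material stream.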
  16. Apparatus (100) according to any of the previous claims, wherein the deep learning model comprises a convolutional neural network.
  17. Apparatus (100) according to any of the previous claims, wherein the apparatus (100) is configured to, intermittently, subject the deep learning model (120) to unsupervised learning (151) and/or to semi-supervised learning (350).
  18. Apparatus (100) according to any of the previous claims, wherein the apparatus (100) is configured to obtain the training data (152) from the pairs (156) of sensor data (140) and label information (158) by means of augmentation.
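One simple form of the augmentation of claim 18 is to jitter each sensor reading while keeping its label fixed, as sketched below. This toy Gaussian-noise scheme is an assumption for illustration; automated policies such as RandAugment (cited in this document) search over far richer transformations:

```python
import random

def augment_pairs(pairs, n_copies=3, noise=0.01, seed=0):
    # Expand (sensor_data, label) pairs into a larger training set by
    # adding small random perturbations to the sensor values while
    # leaving the associated label unchanged.
    rng = random.Random(seed)
    augmented = []
    for data, label in pairs:
        augmented.append((data, label))
        for _ in range(n_copies):
            augmented.append(([x + rng.gauss(0.0, noise) for x in data], label))
    return augmented
```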
  19. Sorting system including the apparatus (100) according to any of the previous claims.
  20. Method performed by any of the above apparatuses (100) and systems.
  21. Method (200) for classifying material objects, comprising
    in an initialization phase, subjecting (210) a deep learning model to supervised learning based on training data obtained from, for each of a training set of material objects, a pair of sensor data obtained by a measurement of the respective material object and label information associating the respective material object with a target classification,
    wherein the method comprises, using the deep learning model, classifying (220) a predetermined material object based on sensor data obtained by a measurement of the predetermined material object.
  22. Computer program for instructing a computer, when being executed on the computer, to perform the method of claim 20 or of claim 21.
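The two phases of the claimed method, supervised initialization on labelled sensor-data pairs followed by classification of new sensor data, can be sketched as below. A nearest-centroid rule stands in for the deep learning model purely for illustration; all names are assumptions:

```python
class MaterialClassifier:
    # Initialization phase: fit on (sensor_data, label) pairs.
    # Operation phase: classify new sensor data.
    def fit(self, pairs):
        sums, counts = {}, {}
        for data, label in pairs:
            acc = sums.setdefault(label, [0.0] * len(data))
            for i, x in enumerate(data):
                acc[i] += x
            counts[label] = counts.get(label, 0) + 1
        # One centroid per target classification.
        self.centroids = {
            lbl: [v / counts[lbl] for v in acc] for lbl, acc in sums.items()
        }
        return self

    def classify(self, data):
        # Attribute the object to the class with the nearest centroid.
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(data, c))
        return min(self.centroids, key=lambda lbl: sq_dist(self.centroids[lbl]))
```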
EP22152672.6A 2021-12-10 2022-01-21 Apparatus and method for classifying material objects Pending EP4194107A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP21213912 2021-12-10

Publications (1)

Publication Number Publication Date
EP4194107A1 true EP4194107A1 (en) 2023-06-14

Family

ID=79231071

Family Applications (1)

Application Number Title Priority Date Filing Date
EP22152672.6A Pending EP4194107A1 (en) 2021-12-10 2022-01-21 Apparatus and method for classifying material objects

Country Status (1)

Country Link
EP (1) EP4194107A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210229133A1 (en) * 2015-07-16 2021-07-29 Sortera Alloys, Inc. Sorting between metal alloys


Non-Patent Citations (11)

Title
ARTSTEIN, R., POESIO, M.: "Inter-coder agreement for computational linguistics", COMPUTATIONAL LINGUISTICS, vol. 34, 2008, pages 555 - 596, XP058245377, DOI: 10.1162/coli.07-034-R2
COHEN, J.: "A coefficient of agreement for nominal scales", EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, vol. 20, 1960, pages 37 - 46
CUBUK, E. D., ZOPH, B., SHLENS, J., LE, Q. V.: "RandAugment: Practical automated data augmentation with a reduced search space", NEURAL INFORMATION PROCESSING SYSTEMS, vol. 34, 2020, pages 18613 - 18624
FIRSCHING, M., NACHTRAB, F., UHLMANN, N., HANKE, R.: "Multi-Energy X-ray Imaging as a Quantitative Method for Materials Characterization", ADVANCED MATERIALS, vol. 23, 2011, pages 2655 - 2656
HASTIE, T., TIBSHIRANI, R., FRIEDMAN, J.: "The elements of statistical learning", SPRINGER SERIES IN STATISTICS, 2009
HE, K., ZHANG, X., REN, S., SUN, J.: "Deep residual learning for image recognition", IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2016, pages 770 - 778, XP055536240, DOI: 10.1109/CVPR.2016.90
LI, L. ET AL.: "Hyperband: A novel bandit-based approach to hyperparameter optimization", THE JOURNAL OF MACHINE LEARNING RESEARCH, 2017, pages 6765 - 6816
MCHUGH, M. L.: "Interrater reliability: the kappa statistic", BIOCHEMIA MEDICA, vol. 22, 2012, pages 276 - 282
SONG, H. ET AL.: "Learning from noisy labels with deep neural networks: A survey", ARXIV, 2020
TAN, M., LE, Q.: "EfficientNet: Rethinking model scaling for convolutional neural networks", INTERNATIONAL CONFERENCE ON MACHINE LEARNING, vol. 97, 2019, pages 6105 - 6114
TANNO, R., SAEEDI, A., SANKARANARAYANAN, D., ALEXANDER, D. C., SILBERMAN, N.: "Learning From Noisy Labels by Regularized Estimation of Annotator Confusion", IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 2019, pages 11236 - 11245, XP033686493, DOI: 10.1109/CVPR.2019.01150

Similar Documents

Publication Publication Date Title
Kukreja et al. A Deep Neural Network based disease detection scheme for Citrus fruits
KR102137184B1 (en) Integration of automatic and manual defect classification
US9715723B2 (en) Optimization of unknown defect rejection for automatic defect classification
US9607233B2 (en) Classifier readiness and maintenance in automatic defect classification
Kaur et al. Classification and grading rice using multi-class SVM
Moses et al. Deep CNN-based damage classification of milled rice grains using a high-magnification image dataset
Chopra et al. Efficient fruit grading system using spectrophotometry and machine learning approaches
JP2015506023A (en) Method and apparatus for automatically detecting features in an image and method for training the apparatus
JP2017509903A (en) Cargo inspection method and system
EP3896602A1 (en) A method and system for training a machine learning model for classification of components in a material stream
US20230148321A1 (en) Method for artificial intelligence (ai) model selection
US11620804B2 (en) Data band selection using machine learning
WO2023003727A1 (en) Automated detection of chemical component of moving object
Deulkar et al. An automated tomato quality grading using clustering based support vector machine
Fauvel et al. A combined support vector machines classification based on decision fusion
WO2010059679A2 (en) Constructing enhanced hybrid classifiers from parametric classifier families using receiver operating characteristics
EP4194107A1 (en) Apparatus and method for classifying material objects
KR102158967B1 (en) Image analysis apparatus, image analysis method and recording medium
KR20230063147A (en) Efficient Lightweight CNN and Ensemble Machine Learning Classification of Prostate Tissue Using Multilevel Feature Analysis Method and System
Srinivasaiah et al. Analysis and prediction of seed quality using machine learning.
Bhanumathi et al. Underwater Fish Species Classification Using Alexnet
Richter et al. Optical filter selection for automatic visual inspection
Aravapalli An automatic inspection approach for remanufacturing components using object detection
Millan-Arias et al. Anomaly Detection in Conveyor Belt Using a Deep Learning Model
Palmquist Detecting defects on cheese using hyperspectral image analysis

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20231214

RBV Designated contracting states (corrected)

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR