EP4097646A1

EP4097646A1 - Hardware-accelerated calculation of convolutions

Info

Publication number: EP4097646A1
Application number: EP21701465.3A
Authority: EP
Inventors: Armin Runge; Taha Ibrahim Ibrahim SOLIMAN; Leonardo Luiz Ecco
Original assignee: Robert Bosch GmbH
Current assignee: Robert Bosch GmbH
Priority date: 2020-01-31
Filing date: 2021-01-20
Publication date: 2022-12-07
Also published as: DE102020201182A1; WO2021151749A1; JP2023513064A

Abstract

Method (100) for calculating a convolution (4) of an input tensor (1) of input data (1a) using a tensorial convolution kernel (2), wherein • the convolution kernel (2) is guided (110) in a predefined grid of positions (21, 22) within the input tensor (1), • the convolution kernel (2) is applied at each of these positions (21, 22) by forming (120), from the input data (1a) in that region of the input tensor (1) which is covered by the convolution kernel (2) at its current position (21, 22), a sum (3) weighted with the values (2a) of the convolution kernel (2), and • this weighted sum (3) is assigned (130) in the convolution (4) to the current position (21, 22) of the convolution kernel (2), wherein • the weighted sum (3) is calculated (121) using at least one hardware accelerator (5) which has an input memory (51) and a fixed number of multipliers (52) which each retrieve their operands (52a, 52b) from predefined memory locations (51a-51h) of the input memory (51), and • more summands than correspond (122) to a depth (11) of the input tensor (1) are processed in at least one operation of the hardware accelerator (5), wherein • the assignment between operands (52a, 52b) and memory locations (51a-51h) of the input memory (51) is varied (123) when calculating the convolution (4), and/or input data (1a) and/or values (2a) of the convolution kernel (2) are repeatedly stored (124) in the input memory (51).

Description


Title:

Hardware-accelerated calculation of convolutions

The present invention relates to the computation of the convolution of input data with a convolution kernel by means of a hardware accelerator.

State of the art

Convolutional neural networks (CNNs) have established themselves primarily for processing image data and audio data. Evaluation methods for such data that use CNNs have a significantly higher performance potential than methods that are not based on artificial intelligence. However, they also require a significantly higher computational effort. Even comparatively small CNNs can comprise several million parameters and require billions of arithmetic operations to process a set of input quantities into a set of output quantities.

A large proportion of this effort is accounted for by convolutions of data with convolution kernels. In these convolutions, sums of data weighted with weights from the convolution kernels are calculated. A great many products of data and weights are therefore formed and summed. In the broadest sense, inner products of data and convolution kernels are calculated.

Dedicated hardware accelerators, such as inner product computing units, are increasingly being used for this basic task. These units are designed to compute a complete inner product of two vectors of fixed length in one clock cycle.

Disclosure of the invention

Within the scope of the invention, a method for calculating a convolution of an input tensor of input data with a tensorial convolution kernel was developed.

In this convolution, the convolution kernel is guided in a predetermined grid of positions within the input tensor. The distance between adjacent positions in this grid is also known as the “stride”. The convolution kernel is applied at each of these positions by forming, from the input data in the area of the input tensor covered by the convolution kernel at its current position, a sum weighted with the values of the convolution kernel. In the convolution, this weighted sum is assigned to the current position of the convolution kernel.
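For reference, the convolution just described can be written as a direct loop over the grid of kernel positions. The following Python function is a minimal sketch, not the patented method: it assumes a single kernel (hence one feature map), no padding, and tensors given as nested lists indexed [row][column][depth]; the name `convolve` is hypothetical. As in most CNN frameworks, the kernel is not flipped.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed [row][column][depth]


def convolve(inp: Tensor3, kernel: Tensor3, stride: int = 1) -> List[List[float]]:
    """Direct convolution of a 3-D input tensor with one 3-D kernel.

    The kernel is guided over a grid of positions spaced `stride` apart; at each
    position the covered input values are weighted with the kernel values, summed,
    and the sum is assigned to that position of the output (feature map).
    """
    h, w, d = len(inp), len(inp[0]), len(inp[0][0])
    kh, kw = len(kernel), len(kernel[0])
    assert len(kernel[0][0]) == d, "kernel depth must match input depth"
    out: List[List[float]] = []
    for r in range(0, h - kh + 1, stride):
        row: List[float] = []
        for c in range(0, w - kw + 1, stride):
            s = 0.0
            for i in range(kh):
                for j in range(kw):
                    for k in range(d):
                        s += inp[r + i][c + j][k] * kernel[i][j][k]
            row.append(s)  # weighted sum assigned to position (r, c)
        out.append(row)
    return out


# 3x3 input of depth 1 holding 1..9, 2x2 kernel of ones.
x = [[[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]], [[7.0], [8.0], [9.0]]]
k = [[[1.0], [1.0]], [[1.0], [1.0]]]
print(convolve(x, k))  # [[12.0, 16.0], [24.0, 28.0]]
```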

If, for example, the convolution kernel is used to identify a certain searched feature in the input data, then the weighted sum is greatest for those positions of the convolution kernel at which there is the greatest correspondence between the searched feature embodied in this convolution kernel and the input data. The result of the convolution with a convolution kernel is therefore also referred to as a feature map in relation to this convolution kernel.

The weighted sum is calculated using at least one hardware accelerator. This hardware accelerator has an input memory and a fixed number of multipliers which fetch their operands, that is to say here input data and values of the convolution kernel, from predetermined storage locations in the input memory. For example, an inner product arithmetic unit that calculates the inner product of two vectors of fixed length typically contains as many multipliers as each vector has elements. All the multiplications required to calculate the inner product can therefore be carried out at the same time, and the resulting products then only have to be accumulated with adders. Overall, the inner product can thus be calculated in fewer clock cycles. When calculating convolutions, such hardware accelerators are usually operated such that each operation processes at most as many summands as correspond to the depth of the input tensor, i.e. the extent of the input tensor, measured in number of elements, along one dimension. For example, if the input data comprise RGB image data, the input tensor has a depth of 3, because each image pixel is assigned three intensity values for red, green and blue. During the convolution with the convolution kernel, these three intensity values belonging to a specific pixel are always summed up, weighted with values of the convolution kernel. This calculation is repeated for all pixels currently covered by the convolution kernel, and then all inner products obtained are added.

Within the scope of the method, this procedure is modified so that more summands than correspond to the depth of the input tensor are processed in at least one operation of the hardware accelerator.

In the RGB image example mentioned, not only the intensity values and kernel values for a single pixel, but also the corresponding data for other pixels can be loaded into the input memory of the hardware accelerator, so that the hardware accelerator is ideally operated with a completely filled input memory. Since the pixel-wise intermediate results calculated up to now are all added anyway to obtain the final result of the convolution, it makes no difference to the result if the calculation for several, or ideally all, pixels is combined in one operation of the hardware accelerator. However, the result is delivered much faster because significantly fewer hardware accelerator operations are required overall.
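The difference between the conventional, pixel-wise operation and the packed operation can be counted with a small model. In this sketch, `accelerator_op` is a hypothetical stand-in for one operation of an inner product unit with `n` multipliers, and the 3x3 kernel size and the example values are assumptions for illustration. Both modes yield the same weighted sum; the packed mode simply needs far fewer operations.

```python
from typing import List, Sequence, Tuple

Pixel = Sequence[float]  # the `depth` values stored for one pixel


def accelerator_op(a: Sequence[float], b: Sequence[float], n: int) -> float:
    """Model of one accelerator operation: an inner product of at most n elements."""
    assert len(a) == len(b) <= n
    return sum(x * y for x, y in zip(a, b))


def weighted_sum_per_pixel(patch: List[Pixel], weights: List[Pixel],
                           n: int) -> Tuple[float, int]:
    """Conventional mode: one operation per covered pixel (depth summands each)."""
    total, ops = 0.0, 0
    for px, wt in zip(patch, weights):
        total += accelerator_op(px, wt, n)
        ops += 1
    return total, ops


def weighted_sum_packed(patch: List[Pixel], weights: List[Pixel],
                        n: int) -> Tuple[float, int]:
    """Packed mode: all covered pixels are flattened and fed n elements at a time."""
    a = [v for px in patch for v in px]
    b = [v for wt in weights for v in wt]
    total, ops = 0.0, 0
    for start in range(0, len(a), n):
        total += accelerator_op(a[start:start + n], b[start:start + n], n)
        ops += 1
    return total, ops


# Hypothetical example: a 3x3 kernel over RGB data covers 9 pixels of depth 3.
patch = [[3.0 * p + c for c in range(3)] for p in range(9)]
weights = [[1.0, 0.5, 0.25] for _ in range(9)]
print(weighted_sum_per_pixel(patch, weights, n=128))  # (198.0, 9)
print(weighted_sum_packed(patch, weights, n=128))     # (198.0, 1)
```

With a smaller unit (`n=8`), the packed mode would need four operations for the 27 products, which is still fewer than nine.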

This is based on the insight that every operation of the hardware accelerator always takes the same amount of time, regardless of the content of the input memory, since all the necessary multiplications are carried out at the same time.

The time saving is particularly great if a convolution is calculated in a layer of a CNN that has a large lateral extent and, at the same time, a shallow depth. For example, the aforementioned RGB image can have Full HD resolution (1,920 x 1,080 pixels) with a depth of only 3. If, for example, an inner product computing unit for vectors with a length of 128 elements is used, this arithmetic unit performs only three multiplications instead of 128 per operation in the conventional operating mode. Almost 98% of the available computing capacity is thus idle. With the method proposed here, the hardware accelerator is utilized much better.
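The “almost 98%” follows directly from the ratio of used to available multipliers, 3/128 ≈ 2.3%. The sketch below computes it; the function name `utilization` is hypothetical, and the 3x3 kernel in the packed case is an assumed example not taken from the text.

```python
def utilization(summands_per_op: int, vector_length: int) -> float:
    """Fraction of the accelerator's multipliers doing useful work in one operation."""
    return min(summands_per_op, vector_length) / vector_length


VECTOR_LENGTH = 128  # inner product unit for 128-element vectors (example from the text)
RGB_DEPTH = 3        # depth of the RGB input tensor

busy = utilization(RGB_DEPTH, VECTOR_LENGTH)
print(f"conventional: {busy:.2%} busy, {1 - busy:.2%} idle")
# prints: conventional: 2.34% busy, 97.66% idle

# Hypothetical 3x3 kernel: all 27 products of one position packed into one operation.
packed = utilization(3 * 3 * RGB_DEPTH, VECTOR_LENGTH)
print(f"packed (3x3 kernel): {packed:.2%} busy")
# prints: packed (3x3 kernel): 21.09% busy
```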

Due to the usually fixed assignment between the operands of the multipliers and the storage locations in the input memory of the hardware accelerator, it is not enough to simply reorganize the arithmetic operations.

Only in the special case in which the positions at which the convolution kernel is applied are always spaced apart by the extent of the convolution kernel, i.e. the stride equals the kernel size, do the areas of the input tensor processed at adjacent positions of the convolution kernel in the grid not overlap. Each value in the input tensor is then included in the calculation of the convolution for only one position of the convolution kernel. In this special case, the input data can be loaded into the input memory of the hardware accelerator according to a rule that is the same for all positions of the convolution kernel, and this mere reorganization is sufficient to obtain the same result much faster than before.
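Whether adjacent kernel positions overlap can be checked by counting how often each input pixel is covered over the whole grid. This is an illustrative sketch with a hypothetical function name and assumed sizes (4x4 input, 2x2 kernel); it ignores the depth dimension, which does not affect the overlap.

```python
from collections import Counter


def usage_counts(h: int, w: int, kh: int, kw: int, stride: int) -> Counter:
    """How often each input pixel (row, col) is covered by the kernel over the grid."""
    counts: Counter = Counter()
    for r in range(0, h - kh + 1, stride):
        for c in range(0, w - kw + 1, stride):
            for i in range(kh):
                for j in range(kw):
                    counts[(r + i, c + j)] += 1
    return counts


# Stride equal to the kernel size: every pixel is used exactly once.
print(max(usage_counts(4, 4, 2, 2, stride=2).values()))  # 1
# Stride 1: neighbouring positions overlap, interior pixels are reused.
print(max(usage_counts(4, 4, 2, 2, stride=1).values()))  # 4
```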

In the general case, however, one and the same input value is used several times in the calculation of the convolution, at different positions of the convolution kernel, and each time it must be weighted with a different value of the convolution kernel. In other words, the input value must be located in the list of input data in the input memory of the hardware accelerator at the same position at which the matching value of the convolution kernel is located in the list of kernel values in the input memory. The method provides two options for ensuring this, so that the result of the convolution, which was previously obtained with very poor utilization of the hardware accelerator, can now be reproduced exactly with greatly improved utilization.

The first possibility is to vary, during the calculation of the convolution, the assignment between operands and storage locations of at least one input memory and/or the assignment between operands and storage locations of at least one parameter memory for values of the convolution kernel. For this purpose, a multiplexer can in particular be connected between at least one multiplier and at least one input memory. Alternatively or in combination with this, a multiplexer can be connected between at least one multiplier and at least one parameter memory for values of the convolution kernel. One and the same multiplexer can, for example, have access both to the at least one input memory and to the at least one parameter memory. In this way, the operand for a specific multiplication can be read from a choice of several possible memory locations, which gives the hardware accelerator at least limited random access to the input memory and/or the parameter memory. This freedom of choice is sufficient to reuse input data that are in the correct place in the input memory for one position of the convolution kernel also for the next position of the convolution kernel in the course of the convolution. At the same time, the circuitry effort is significantly lower than, for example, for a bus system or a “Network on Chip”. In particular, a 4:1 multiplexer has turned out to be an optimal compromise between freedom of choice, and thus efficiency, on the one hand and hardware costs on the other. The multiplexing can be applied to the input data, to the values of the convolution kernel, or to both.

The second possibility, which can be used alternatively or in combination, is to store input data and/or values of the convolution kernel multiple times in the input memory. A separate copy can then be stored in the input memory, for example, for each intended use of a specific input value in the course of the convolution. For example, for each intended operation of the hardware accelerator, the collection of input values to be processed in that operation can be stored in the correct order in the input memory. When the convolution then progresses to the respective operation, the multipliers of the hardware accelerator can retrieve these values en bloc.

For example, an input memory with at least one separate memory or memory area, also called a partition or bank, for each multiplier can be selected. Those input data or values of the convolution kernel that the respective multiplier needs in the course of the calculation of the convolution can then be loaded into this memory or memory area. When changing from one position of the convolution kernel to the next, only the next value has to be retrieved from each partition and fed to the respective multiplier. The hardware accelerator needs no random access to the input memory for this; it is sufficient, for example, to implement the partitions as shift registers.
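The per-multiplier banks can be sketched as follows. The sketch simplifies to a 1-D input of depth 1 and a single kernel, and uses Python's `collections.deque` to stand in for a shift register; `build_banks` and `conv1d_from_banks` are hypothetical names. It also makes the memory cost visible: with stride 1, each input value is stored once for every multiplier that needs it.

```python
from collections import deque
from typing import Deque, List, Sequence


def build_banks(x: Sequence[float], k: int, stride: int) -> List[Deque[float]]:
    """Bank m holds, in order, the input value that multiplier m needs at each position."""
    positions = range(0, len(x) - k + 1, stride)
    return [deque(x[p + m] for p in positions) for m in range(k)]


def conv1d_from_banks(banks: List[Deque[float]], kernel: Sequence[float]) -> List[float]:
    """Each step shifts one value out of every bank; weight m stays with multiplier m."""
    out = []
    while banks[0]:
        out.append(sum(w * bank.popleft() for w, bank in zip(kernel, banks)))
    return out


x = [1, 2, 3, 4, 5, 6, 7]
banks = build_banks(x, k=3, stride=1)
print(sum(len(b) for b in banks))           # 15 stored values for 7 distinct inputs
print(conv1d_from_banks(banks, [1, 1, 1]))  # [6, 9, 12, 15, 18]
```

With a stride of 3, equal to the kernel size, `build_banks(x, 3, 3)` stores only six values, since no input value is needed twice.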

Whether the input data, the values of the convolution kernel, or both are replicated is a free design choice. The effect is always the same: those input values and kernel values whose product actually occurs in the desired convolution are brought together at the multipliers of the hardware accelerator. Replicating the values of the convolution kernel, i.e. the weights, can be particularly useful in layers of neural networks that process input data of little depth and have only a few filters. It is precisely in these layers that the hardware storage units available for the weights are often not fully utilized, so that the weights can be replicated with little or no cost for additional hardware storage.

The preceding explanations show that the ability to process more summands in each operation of the hardware accelerator, and thus to utilize it better, does not come free of charge. Rather, a price must be paid up front in the form of additional hardware for at least limited random access to the input memory and/or in the form of increased memory requirements in the input memory.

In a particularly advantageous embodiment, an inner product arithmetic unit for vectors with a length between 16 and 128 elements is selected as the hardware accelerator. The greater the number of elements, the more data can be processed with each operation and the greater the gain in speed with an optimized utilization of this processing unit. However, the said effort for providing the correct input data to the correct multipliers also increases. The inventors' investigations have shown that the range between 16 and 128 elements is an optimal compromise.

A main application for CNNs is the processing of measurement data into output variables relevant for the respective application. For example, in the context of at least partially automated driving, better utilization of the hardware accelerator means that lower costs are incurred for the hardware of a corresponding evaluation system and energy consumption is also reduced accordingly.

The invention therefore also relates generally to a method for evaluating measurement data recorded with at least one sensor, and/or realistic synthetic measurement data from this at least one sensor, to obtain one or more output variables with at least one neural network. Realistic synthetic measurement data can be used, for example, instead of or in combination with physically recorded measurement data in order to train the evaluation system. Typically, a data set of realistic synthetic measurement data from a sensor is difficult to distinguish from measurement data physically recorded with this sensor. The neural network has at least one convolution layer. In this convolution layer, a convolution of a tensor of input data with at least one predefined convolution kernel is determined. This convolution is calculated using the method described above. As explained above, this means that, for given hardware resources, the desired output variables can be obtained from the input variables particularly quickly. At a given processing speed, the evaluation can instead be carried out with fewer hardware resources and thus also with lower energy consumption.

In a particularly advantageous embodiment, the convolution in the first convolution layer through which the measurement data pass is calculated using the method described above, while this method is not used in at least one convolution layer passed through later. As explained above, the speed gain from the method described above is greatest in those layers of the CNN which have a large lateral extent but only a shallow depth. The corresponding circuitry for the at least restricted random access of the hardware accelerator to the input memory, or the corresponding storage space in the input memory of the hardware accelerator, should therefore preferably be used for such layers.

In a particularly advantageous embodiment, the measurement data include image data of at least one optical camera or thermal camera, and/or audio data, and/or measurement data obtained by probing a spatial area with ultrasound, radar radiation or LiDAR. It is precisely these data that, in the state in which they are fed into the CNN, are laterally very extensive and highly resolved but of comparatively shallow depth. The lateral resolution is successively reduced by the convolutions from layer to layer, while the depth can increase.

The output variables sought can in particular include, for example:

• at least one class of a given classification, and/or

• at least one regression value of a sought regression variable, and/or

• a detection of at least one object, and/or

• a semantic segmentation of the measurement data with respect to classes and/or objects.

These are output variables which CNNs are preferably used to obtain from high-dimensional input variables.

In a further particularly advantageous embodiment, a control signal is formed from the output variable or variables. A robot, and/or a vehicle, and/or a classification system, and/or a system for monitoring areas, and/or a system for quality control of mass-produced products, and/or a system for medical imaging is controlled with this control signal. Using the method described above for calculating the convolution, these systems react more quickly to measurement data recorded by sensors for given hardware resources for the evaluation. Conversely, if the response time is specified, hardware resources can be saved.

In particular, the methods can be implemented in whole or in part by a computer. The invention therefore also relates to a computer program with machine-readable instructions which, when they are executed on one or more computers, cause the computer or computers to carry out one of the described methods. In this sense, control devices for vehicles and embedded systems for technical devices, which are also able to execute machine-readable instructions, are to be regarded as computers.

The invention also relates to a machine-readable data carrier and/or to a download product with the parameter set and/or with the computer program. A download product is a digital product that can be transmitted via a data network, i.e. downloaded by a user of the data network, and that can be offered for sale in an online shop for immediate download, for example. Furthermore, a computer can be equipped with the computer program, with the machine-readable data carrier or with the download product.

Further measures improving the invention are illustrated in more detail below together with the description of the preferred exemplary embodiments of the invention with reference to figures.

Embodiments

The figures show:

FIG. 1: an exemplary embodiment of the method 100 for calculating a convolution 4;

FIG. 2: an illustration of the basic mechanism that accelerates the calculation;

FIG. 3: a change in the assignment of operands 52a, 52b to memory locations 51a-51h in the input memory 51 of a hardware accelerator 5 by means of a multiplexer 53;

FIG. 4: multiple storage of input data 1a and/or values 2a of a convolution kernel 2 in the input memory 51 for more efficient processing;

FIG. 5: an exemplary embodiment of the method 200 for evaluating measurement data 61, 62.

FIG. 1 is a schematic flow diagram of an exemplary embodiment of the method 100 with which the convolution 4 of an input tensor 1 of input data 1a with a tensorial convolution kernel 2 is calculated. In step 110, the convolution kernel 2 is guided in a predetermined grid of positions 21, 22 within the input tensor 1. In step 120, the convolution kernel 2 is applied at each of these positions 21, 22 by forming, from the input data 1a in the area of the input tensor 1 covered by the convolution kernel 2 at its current position 21, 22, a sum 3 weighted with the values 2a of the convolution kernel 2. In step 130, this weighted sum 3 is assigned in the convolution 4 to the current position 21, 22 of the convolution kernel 2. By successively working through all positions 21, 22 of the convolution kernel 2, the overall result of the convolution 4 is obtained.

When the convolution kernel 2 is applied in step 120, a hardware accelerator 5 is used according to block 121; here, according to block 125, an inner product arithmetic unit for vectors with a length between 16 and 128 elements is selected, for example. According to block 122, more summands than correspond to the depth 11 of the input tensor 1 are processed in at least one operation of the hardware accelerator 5.

The more operations of the hardware accelerator 5 are better utilized in this way, the sooner the overall result of the convolution 4 is obtained.

Inside box 122, it is broken down how, when the hardware accelerator 5 is used, it is ensured that at all positions 21, 22 of the convolution kernel 2 the correct input data 1a are brought together with the correct values 2a of the convolution kernel 2, as operands 52a, 52b, in multiplications in the multipliers 52 of the hardware accelerator 5.

According to block 123, as explained above, the assignment between operands 52a, 52b and memory locations 51a-51h of the input memory 51 can be varied during the calculation of the convolution 4 in order to give the multipliers 52 at least limited random access to the input memory 51. For this purpose, a multiplexer 53 can be used according to block 123a, which is explained in more detail in FIG. 3.

According to block 124, input data 1a and/or values 2a of the convolution kernel 2 can be stored in the input memory 51 several times. The input data 1a and values 2a can thus be fed to the hardware accelerator 5 at each position 21, 22 of the convolution kernel 2 in an arrangement relative to one another which ensures that the hardware accelerator 5 actually calculates summands occurring in the weighted sum 3. This is explained in more detail in FIG. 4.

For example, according to block 124a, an input memory 51 with at least one separate memory or memory area for each multiplier 52 can be selected. According to block 124b, those input data 1a and values 2a of the convolution kernel 2 that the respective multiplier 52 needs in the course of the calculation of the convolution 4 can then be loaded into this memory or memory area.

FIG. 2 explains the basic principle of the improved utilization of a hardware accelerator 5. In the illustrative example shown in FIG. 2, … weights are used. The input tensor 1 has a depth 11 of 3 in this example.

With the conventional use of the hardware accelerator 5, only as many input data 1a and values 2a of the convolution kernel 2 would be processed in each operation of the hardware accelerator 5 as are stacked along the depth 11. Processing all of the shaded values 1a, 2a would therefore require three operations of the hardware accelerator 5. If, however, all of the values 1a or 2a to be processed are each combined into one vector in the input memory 51 of the hardware accelerator 5, the weighted sum of the shaded values 1a, 2a can be calculated with just one operation of the hardware accelerator 5.

As explained above, for this purpose the correct input data 1a from the input tensor 1 must be multiplied by the correct values 2a of the convolution kernel 2 at each position 21, 22 of the convolution kernel 2, so that the weighted sum 3 contains only those summands that actually occur in the convolution 4. FIGS. 3 and 4 illustrate the ways, explained above, in which this can be ensured.

FIG. 3 illustrates the use of a multiplexer 53 to give a multiplier 52 in a hardware accelerator 5 at least restricted random access to the input memory 51 of the hardware accelerator 5. In this illustrative example, eight storage locations 51a-51h of the input memory 51 are shown. The 4:1 multiplexer 53 can be used to select whether a value 1a is retrieved from the memory location 51a, 51c, 51e or 51g of the input memory 51 and fed to the multiplier 52 as the first operand 52a. FIG. 3 shows two exemplary possible sources for the second operand 52b. The second operand 52b can, for example, be fed to the multiplier 52 from the memory location 51b of the input memory 51. Alternatively, the multiplexer 53, or a further multiplexer, can have access to different storage locations 55a-55d of the parameter memory 55, each of which stores a different value 2a of the convolution kernel 2. The multiplexer 53 can then supply one of these values 2a as the second operand 52b to the multiplier 52. This option is shown in dashed lines in FIG. 3.
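The effect of a 4:1 multiplexer in front of each multiplier can be modelled in a few lines. This sketch assumes a 1-D input of depth 1, stride 1, kernel values held fixed per multiplier, and a common select value for all multiplexers; the names are hypothetical. Here multiplier m chooses its input operand from the four contiguous locations m to m+3, which is an assumption for illustration (FIG. 3 shows the selection 51a, 51c, 51e, 51g). A single load of the input memory then serves up to four consecutive kernel positions, with only the select signal changing.

```python
from typing import List, Sequence, Tuple

MUX_INPUTS = 4  # a 4:1 multiplexer in front of each multiplier


def mux_op(memory: Sequence[float], weights: Sequence[float], select: int) -> float:
    """One accelerator operation: multiplier m, holding weight m, reads memory[m + select]."""
    assert 0 <= select < MUX_INPUTS
    return sum(w * memory[m + select] for m, w in enumerate(weights))


def conv1d_with_mux(x: Sequence[float], kernel: Sequence[float]) -> Tuple[List[float], int]:
    """Stride-1 1-D convolution; returns the outputs and the number of memory loads."""
    k = len(kernel)
    n_pos = len(x) - k + 1
    out: List[float] = []
    loads = 0
    for base in range(0, n_pos, MUX_INPUTS):
        # One load of the input memory covers up to MUX_INPUTS consecutive positions.
        memory = list(x[base:base + k + MUX_INPUTS - 1])
        loads += 1
        for select in range(min(MUX_INPUTS, n_pos - base)):
            # Only the multiplexer select changes; the memory contents stay put.
            out.append(mux_op(memory, kernel, select))
    return out, loads


print(conv1d_with_mux([1, 2, 3, 4, 5, 6, 7], [1, 1, 1]))  # ([6, 9, 12, 15, 18], 2)
```

Without the multiplexer, the same five positions would need five separate loads of the input memory in this model.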

The multiplier 52 multiplies the two operands 52a and 52b and delivers the product 52c as the result. A further multiplier 52' shown by way of example likewise supplies such a product 52c, which it has computed from other operands 52a and 52b. Products 52c supplied by different multipliers 52, 52' are added by adders 54 to give intermediate results 54a. The intermediate results 54a are in turn accumulated by further adders 54 (not shown in FIG. 4) until the weighted sum 3, or at least a part thereof, is finally obtained. The maximum increase in efficiency results when the complete weighted sum 3 for a position 21, 22 of the convolution kernel 2 can be calculated with just one operation of the hardware accelerator 5. However, efficiency already increases as soon as a single such operation is saved in the course of calculating a weighted sum 3. Any desired compromise between hardware costs and increased efficiency can be set via the number of multipliers 52, 52' in the hardware accelerator 5.

FIG. 4 illustrates the replication of input data 1a in the input memory 51 of the hardware accelerator 5, with the aim of being able to retrieve the correct input data 1a as operands 52a for the multiplier 52 at each position 21, 22 of the convolution kernel 2. In this illustrative example, the input tensor 1 comprises three levels, i.e. it has a depth 11 of 3. Several different positions of input data 1a in these levels are identified by different hatching.

In the input memory 51, some values 1a from the input tensor 1 are written one below the other, for the positions 21 and 22, in the order in which they are required for the multiplications with values 2a of the convolution kernel 2. One of these values is picked out for illustration and denoted by the reference character 1a. At the first position 21 of the convolution kernel 2, this value 1a is at the fourth place from the top in the input memory 51: starting from the upper left corner of the levels of the input tensor 1, a “column” is first processed in the direction of the depth 11 of the input tensor 1, and the value 1a forms the beginning of the second such “column”. If, however, the convolution kernel 2 advances to position 22, the value 1a must be multiplied by the first value 2a of the convolution kernel 2.

For this position 22, the value 1a is therefore required at the first place in the input memory 51. For this purpose, the input data 1a are replicated in the input memory 51 as shown in FIG. 4.
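The layout described for FIG. 4 can be reproduced by listing, for each kernel position, the input-tensor coordinates in the order they are written. This is a sketch under assumptions: the 2x2 kernel size, the stride of 1 and the row-major order of kernel pixels are chosen for illustration and are not taken from the figure, and the function names are hypothetical. The kernel values would be stored in the same order, so entry i of a block is multiplied by kernel value i. The sketch shows the effect described above: the value at the start of the second depth column sits at the fourth place for the first position and at the first place for the second position, so it is stored twice.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[int, int, int]  # (row, column, depth index) in the input tensor


def load_order(row: int, col: int, kh: int, kw: int, depth: int) -> List[Coord]:
    """Input-tensor coordinates in the order they are written for one kernel position:
    kernel pixels row by row, and for each pixel one "column" along the depth."""
    return [(row + i, col + j, d)
            for i in range(kh) for j in range(kw)
            for d in range(depth)]


def replicated_memory(positions: Sequence[Tuple[int, int]],
                      kh: int, kw: int, depth: int) -> List[List[Coord]]:
    """One block per kernel position; values shared by positions are stored again."""
    return [load_order(r, c, kh, kw, depth) for r, c in positions]


# Hypothetical 2x2 kernel at two horizontally adjacent positions (stride 1), depth 3.
blocks = replicated_memory([(0, 0), (0, 1)], kh=2, kw=2, depth=3)
value = (0, 1, 0)  # first element of the second depth column, standing in for "1a"
print(blocks[0].index(value), blocks[1].index(value))  # 3 0
```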

FIG. 5 is a schematic flow diagram of an exemplary embodiment of the method 200 for evaluating measurement data. This can be any mixture of measurement data 61, which were physically recorded with at least one sensor 6, and realistic synthetic measurement data 62 from this at least one sensor 6.

In step 210, the measurement data 61, 62 are processed with a neural network 8 into output variables 7. The neural network 8 comprises a plurality of convolution layers 81-83 through which the measurement data 61, 62 pass one after the other. That is, layer 81 processes the measurement data 61, 62 into an intermediate result (“feature map”), which layer 82 then processes into a further intermediate result, and layer 83 into the final output variables 7.

In each convolution layer 81-83, a convolution 4 of a tensor 1 of input data 1a with at least one predefined convolution kernel 2 is determined. According to block 210a, at least one such convolution 4 is calculated using the method 100 described above.

In particular, according to block 210b, the convolution 4 in the first convolution layer 81, which the measurement data 61, 62 pass through, can be calculated with the method 100, while this method is not used in at least one convolution layer 82, 83 passed through later. As explained above, in this way the additional effort required for combining many calculations in one operation of the hardware accelerator 5 can preferably be concentrated on those convolution layers in which the gain in efficiency is particularly great due to their comparatively small depth.

In step 220, a control signal 220a is formed from the output variables 7. In step 230, a robot 91, and/or a vehicle 92, and/or a classification system 93, and/or a system 94 for monitoring areas, and/or a system 95 for the quality control of mass-produced products, and/or a system 96 for medical imaging is controlled with this control signal 220a.

Claims


1. Method (100) for calculating a convolution (4) of an input tensor (1) of input data (1a) with a tensorial convolution kernel (2), wherein

• the convolution kernel (2) is guided (110) in a predetermined grid of positions (21, 22) within the input tensor (1),

• the convolution kernel (2) is applied at each of these positions (21, 22) by forming (120), from the input data (1a) in the area of the input tensor (1) covered by the convolution kernel (2) at its current position (21, 22), a sum (3) weighted with the values (2a) of the convolution kernel (2), and

• this weighted sum (3) is assigned (130) in the convolution (4) to the current position (21, 22) of the convolution kernel (2), wherein

• the weighted sum (3) is calculated (121) with at least one hardware accelerator (5) which has an input memory (51) and a fixed number of multipliers (52), each of which retrieves its operands (52a, 52b) from predetermined memory locations (51a-51h) of the input memory (51), and

• more summands than correspond to a depth (11) of the input tensor (1) are processed (122) in at least one operation of the hardware accelerator (5), wherein

• the assignment between operands (52a, 52b) and memory locations (51a-51h) of the input memory (51) is varied (123) during the calculation of the convolution (4), and/or

• input data (1a) and/or values (2a) of the convolution kernel (2) are stored (124) several times in the input memory (51).

2. The method (100) according to claim 1, wherein an inner product arithmetic unit for vectors with a length between 16 and 128 elements is selected as hardware accelerator (5) (125).

3. The method (100) according to claim 1 or 2, wherein the assignment between operands (52a, 52b) and storage locations (51a-51h) of the input memory (51), and/or the assignment between operands (52a, 52b) and storage locations (55a-55d) of a parameter memory (55) for values (2a) of the convolution kernel (2), is varied (123a) with at least one multiplexer (53) connected between at least one multiplier (52) and at least one input memory (51), and/or between at least one multiplier (52) and at least one parameter memory (55).

4. The method (100) according to claim 3, wherein a 4:1 multiplexer is selected as the multiplexer (53).

5. The method (100) according to any one of claims 1 to 4, wherein an input memory (51) with at least one separate memory or memory area for each multiplier (52) is selected (124a), and wherein those input data (1a) and values (2a) of the convolution kernel (2) which the respective multiplier (52) requires in the course of the calculation of the convolution (4) are loaded (124b) into this memory or memory area.

6. Method (200) for evaluating measurement data (61) recorded with at least one sensor (6), and/or realistic synthetic measurement data (62) from this at least one sensor (6), to obtain one or more output variables (7) with at least one neural network (8), this neural network (8) having at least one convolution layer (81-83), wherein in the convolution layer (81-83) a convolution (4) of a tensor (1) of input data (1a) with at least one predetermined convolution kernel (2) is determined (210), this convolution (4) being calculated (210a) using the method (100) according to one of claims 1 to 5.

7. The method (200) according to claim 6, wherein the convolution (4) in the first convolution layer (81) through which the measurement data (61, 62) pass is calculated (210b) with the method (100) according to one of claims 1 to 5, while the convolution (4) in at least one convolution layer (82-83) passed through later is not calculated (210c) with the method (100) according to one of claims 1 to 5.

8. The method (200) according to claim 6 or 7, wherein the measurement data (61, 62) comprise image data of at least one optical camera or thermal camera, and/or audio data, and/or measurement data obtained by probing a spatial area with ultrasound, radar radiation or LiDAR.

9. The method (200) according to any one of claims 6 to 8, wherein the output variables (7) include

• at least one class of a given classification, and/or

• at least one regression value of a sought regression variable, and/or

• a detection of at least one object, and/or

• a semantic segmentation of the measurement data with respect to classes and/or objects.

10. The method (200) according to any one of claims 6 to 9, wherein a control signal (220a) is formed (220) from the output variable(s) (7), and wherein a robot (91), and/or a vehicle (92), and/or a classification system (93), and/or a system (94) for monitoring areas, and/or a system (95) for the quality control of mass-produced products, and/or a system (96) for medical imaging is controlled (230) with this control signal (220a).

11. Computer program containing machine-readable instructions which, when executed on one or more computers, cause the computer or computers to carry out a method (100, 200) according to one of claims 1 to 9.

12. Machine-readable data carrier with the computer program according to claim 11.

13. Computer equipped with the computer program according to claim 11 and/or with the machine-readable data carrier according to claim 12.