DE102021002363B3

DE102021002363B3 - Device for position-independent optical surface inspection

Info

Publication number: DE102021002363B3
Application number: DE102021002363.3A
Authority: DE
Inventors: Norbert Mitschke; Michael Heizmann
Original assignee: Karlsruher Institut fuer Technologie KIT
Current assignee: Karlsruher Institut fuer Technologie KIT
Priority date: 2021-05-04
Filing date: 2021-05-04
Publication date: 2022-09-08
Anticipated expiration: 2041-05-05

Abstract

The invention relates to a computer-implemented method and a device for the optical surface inspection of objects. The method comprises: providing a statistical segmentation model that has been trained on a plurality of training images of an object in different states, the segmentation model taking an image as input and delivering as its result a classification of the pixels of the input image into at least one class; determining a trajectory that can be parameterized by a parameter p on the basis of at least one image of an object to be inspected that has been recorded by a camera and segmented using the segmentation model; recording, with the camera, a temporal sequence of images of the object to be inspected while the camera moves around the object on the trajectory; segmenting the images of the temporal sequence using the segmentation model; extracting at least one (partial) surface of the object to be inspected from each of the images recorded by the camera using the segmentation result for the respective image; fusing the extracted (partial) surfaces into a two-dimensional representation, the two-dimensional representation having the parameter p and an image coordinate orthogonal to the trajectory as its spatial coordinates; and performing defect detection on the fused two-dimensional representation.

Description

The invention relates to a device and a method for position-independent optical surface inspection of objects.

In particular, the invention relates to a device for position-independent optical surface inspection of objects (such as workpieces) that applies deep neural networks to video sequences.

Devices and methods for the optical surface inspection of workpieces are known from the prior art. In publication [1], video data are processed with neural networks for the recognition/segmentation and inspection of bridge components. Past frames are used to improve the classification of the current frame, among other things through context information. The neural network is trained on simulated data. An RNN and an FCN are used as neural networks, both of which reduce the spatial resolution of the input image. No defect detection is performed on the extracted components.

Publication [2] describes a method that stitches together the frames of a video of fuel channels in nuclear power plants and can thereby generate a 2D image of the inner bore. This makes anomalies easier to detect over time. The images were recorded in the channel at known camera orientations (0°, 60°, 120°, 180°, 240°, 300°). The six strips obtained in this way are fused using knowledge of the position, orientation and speed of the camera ("knowledge-based image stitching"). This method presupposes that the camera positions are known when the images are stitched.

Publication [3] presents a neural network for the recognition, segmentation and detection of defects on railway tracks that takes images of a track bed as input. Each frame of a series of images is examined individually. Information is therefore acquired from single images and not from video sequences.

Publication [4] presents a visual inspection device that inspects a workpiece located in a defined position by means of video scanning.

Publications [5] to [7] describe applications in which neural networks are used for visual inspection/defect detection. None of them, however, uses a camera that moves during image acquisition.

[1]: Narazaki, Yasutaka, et al. "Automated bridge component recognition using video data." arXiv preprint arXiv:1806.06820 (2018).
[2]: Murray, Paul, et al. "Automated image stitching for enhanced visual inspections of nuclear power stations." (2013).
[3]: Gibert, Xavier, Vishal M. Patel, and Rama Chellappa. "Deep multitask learning for railway track inspection." IEEE Transactions on Intelligent Transportation Systems 18.1 (2016): 153-164.
[4]: Nielson, Paul C., and Wolfgang Kaufman. "Machine visual inspection device and method." U.S. Patent No. 4,760,444. 26 Jul. 1988.
[5]: Bastian, Blossom Treesa, et al. "Visual inspection and characterization of external corrosion in pipelines using deep neural network." NDT & E International 107 (2019): 102134.
[6]: Li, Bing, and Biao Jiang. "Deep Neural Network based Visual Inspection with 3D Metric Measurement of Concrete Defects using Wall-climbing Robot." (2019).
[7]: Nagata, F., et al. "Basic application of deep convolutional neural network to visual inspection." Proceedings of International Conference on Industrial Application Engineering (ICIAE2018). 2018.

In conventional methods for the optical surface inspection of objects (e.g. workpieces), several images of the object are usually examined individually, or line scan cameras are used. Alternatively, a fused image is used that is created by moving to predefined points and then scanning the object. Both approaches place high demands on calibration and require prior knowledge of the object position and of the time-dependent position of the camera.

The object of the invention is to provide an improved method and a corresponding device for the optical surface inspection of objects that make a position-independent optical inspection possible.

This object is achieved by a device for optical surface inspection, a computer-implemented method for optical surface inspection and a computer program product having the features specified in the independent claims. Preferred variants and embodiments are the subject matter of the dependent claims.

According to a first aspect of the invention, a device for the optical surface inspection of objects is provided. The device comprises:

a movable camera (such as a video camera) that can be moved around an object to be inspected on a trajectory that can be parameterized by a parameter p, the camera being configured to record, while it moves on the trajectory, a temporal sequence (e.g. in the form of a video or video sequence) of images of the object from different positions on the trajectory;
a computing device comprising:
- a model provision unit configured to provide a statistical segmentation model, the segmentation model having been trained on a plurality of training images of the object in different states, the segmentation model taking an image as input and delivering as its result a classification of the pixels of the input image into at least one class, the at least one class corresponding to a (partial) surface of the object to be inspected;
- an image segmentation unit configured to segment an image recorded by the camera (the input image for the segmentation) using the segmentation model;
- a trajectory determination unit configured to determine the trajectory on the basis of at least one image of the object recorded by the camera and segmented by the image segmentation unit;
- an image transformation unit configured to extract, from each image of the temporal sequence of images recorded by the camera during its movement on the trajectory, at least one (partial) surface of the object to be inspected on the basis of the segmentation result for that image, and to fuse the extracted (partial) surfaces into a two-dimensional (planar) representation, the two-dimensional representation having the parameter p and an image coordinate orthogonal to the trajectory as its spatial coordinates;
- a defect detection unit configured to perform defect detection on the fused two-dimensional representation. The defect detection can be carried out with a conventional defect detection method.
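
The fusion performed by the image transformation unit can be illustrated with a minimal sketch. It assumes, purely for illustration, that every frame contributes exactly one pixel column (a strip orthogonal to the trajectory) and that the value of p at which each frame was recorded is known. The names `extract_strip` and `fuse_strips` are hypothetical and do not originate from the application; plain Python lists are used to keep the sketch free of dependencies.

```python
def extract_strip(image, mask, column):
    """Return one image column, keeping only pixels the mask marks as surface.

    image and mask are lists of rows; mask entries are True for pixels that
    belong to the (partial) surface to be inspected. Other pixels become None.
    """
    return [row[column] if mask_row[column] else None
            for row, mask_row in zip(image, mask)]


def fuse_strips(frames):
    """Fuse one strip per frame into a two-dimensional representation.

    frames is a list of (p, image, mask, column) tuples. The result is
    indexed as fused[v][k], where v is the image coordinate orthogonal to
    the trajectory and k enumerates the frames in ascending order of p.
    """
    ordered = sorted(frames, key=lambda frame: frame[0])
    strips = [extract_strip(img, msk, col) for _, img, msk, col in ordered]
    # Transpose so that rows correspond to v and columns to p.
    return [list(values) for values in zip(*strips)]


# Two toy 2x2 frames, deliberately passed in the wrong order of p.
frame_a = (0.0, [[1, 2], [3, 4]], [[True, True], [True, False]], 0)
frame_b = (0.5, [[5, 6], [7, 8]], [[True, True], [True, True]], 1)
fused = fuse_strips([frame_b, frame_a])
```

A conventional defect detection method could then operate on `fused` as on an ordinary image, independently of where the camera was located in space.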

The device for the optical surface inspection of objects uses a segmentation model that serves both to extract the (partial) surfaces of the object to be inspected and to determine an object-specific, parameterizable trajectory on which the camera moves.

As it moves along the trajectory, the movable camera records images of the object from different directions and, optionally, from different distances between camera and object. The position and/or orientation of the object relative to the camera need not be further specified, restricted or given in advance. They can be determined from the images recorded by the camera using the segmentation model.

The device for optical surface inspection according to the first aspect thus makes it possible to detect irregularities reliably and quickly on essentially arbitrary objects, and in particular on substantially rotationally symmetric objects, by merging individual images without knowledge of the positions of object and camera.

In contrast to known devices and methods for the optical surface inspection of objects, the exact position of the camera or of the object does not have to be known. Furthermore, it is a temporal sequence (e.g. in the form of a video or video sequence), and not a series of individual images recorded from predetermined positions, that is segmented using the segmentation model, transformed and used to assemble the (partial) surface to be inspected. The requirements for camera calibration can consequently be reduced considerably, which enables fast and reliable inspection.

The computing device can further comprise a model inference unit for inferring, i.e. deriving, the segmentation model. The model inference unit is configured to train an original, untrained segmentation model, the training being carried out with a plurality of training images of the object in different states. The computing device can also comprise a training image input interface for providing the training images.

The segmentation model, or the image segmentation unit, takes an image as input and delivers as its result a classification of the pixels of the input image into at least one class, the at least one class corresponding to a (partial) surface of the object to be inspected. In other words, the segmentation model outputs a modeled (e.g. predicted or estimated) class membership for the pixels of the input image. After an image has been segmented with the segmentation model, each image pixel carries the information about which class it belongs to. Several classes can be defined. The classes can, for example, correspond to several (partial) surfaces of the object to be inspected (one class per (partial) surface to be inspected). If the object comprises several components, the segmentation model can classify the pixels of the input image into several classes, each class corresponding to a particular component. In addition, a "dummy" class can be defined that collects all pixels that cannot be assigned to any particular component. The pixels assigned to the individual classes (e.g. the individual component classes and, where applicable, the "dummy" class) together then yield the segmentation result for the object as a whole. It is also possible to define a dedicated class for the object itself.
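
The per-pixel classification with a "dummy" class can be sketched as follows. The sketch assumes that the model outputs one score per class for every pixel and that a pixel falls into the "dummy" class when its best score stays below a threshold; both the score format and the threshold of 0.5 are illustrative assumptions, and `classify_pixels` is a hypothetical name.

```python
DUMMY = "dummy"


def classify_pixels(scores, class_names, threshold=0.5):
    """Assign each pixel the best-scoring class, or DUMMY if no score suffices.

    scores[y][x] is a list holding one score per entry of class_names.
    """
    labels = []
    for row in scores:
        label_row = []
        for pixel_scores in row:
            # Index of the class with the highest score for this pixel.
            best = max(range(len(class_names)), key=lambda i: pixel_scores[i])
            if pixel_scores[best] >= threshold:
                label_row.append(class_names[best])
            else:
                label_row.append(DUMMY)
        labels.append(label_row)
    return labels


# One image row with two pixels and two component classes.
example_scores = [[[0.9, 0.1], [0.2, 0.3]]]
example_labels = classify_pixels(example_scores, ["commutator", "shaft"])
```

Here the first pixel is assigned to the commutator class, while the second pixel has no sufficiently high score and ends up in the "dummy" class.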

The modeled class membership of each pixel of the input image can be output, for example, in the form of segmentation masks. A segmentation mask corresponds to the group of pixels assigned to a particular class (such as a particular object, a particular component and/or a particular (partial) surface of the object to be inspected). If the object is an armature, the different (partial) surfaces can be the commutator surface, the shaft surface and/or the pinion surface. More generally, for a substantially rotationally symmetric object, the surface to be inspected can be the lateral surface of the object or a part of it.
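
As a minimal sketch of how segmentation masks relate to per-pixel class labels, the following hypothetical helper `masks_from_labels` turns a label map into one Boolean mask per class. The representation of labels as strings is an assumption made for readability.

```python
def masks_from_labels(labels):
    """Build one Boolean segmentation mask per class from a label map.

    labels[y][x] is the class assigned to the pixel at row y, column x.
    Returns a dict mapping each class to a mask of the same shape.
    """
    classes = sorted({label for row in labels for label in row})
    return {cls: [[label == cls for label in row] for row in labels]
            for cls in classes}


label_map = [["shaft", "dummy"],
             ["shaft", "commutator"]]
masks = masks_from_labels(label_map)
```

Each pixel is True in exactly one of the resulting masks, so the masks together cover the whole image.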

The segmentation model can be a statistical model that has been trained on a large number of training images. It can, for example, comprise several model parameters (such as the weights of a neural network) that are changed or adapted during training with the training images. Examples of statistical models are linear and nonlinear regression models (such as linear regression, nonlinear regression, nonlinear regression with an attention mechanism, nonlinear multi-task regression, non-parametric or semi-parametric regression, etc.), classification models, machine learning models, etc.

For example, the segmentation model can consist of or comprise a neural network that has been trained with the training images of the object and with the (partial) surfaces of the object relevant to the inspection. Examples of such networks are deep neural networks, CNNs (convolutional neural networks), U-Nets, etc. A suitable neural network for segmenting input images of objects (specifically, armatures of electric motors) is described in publications [8] and [9], whose corresponding passages form an integral part of the disclosure of the present application:

[8] N. Mitschke and M. Heizmann, "Semantische Segmentierung von Ankerkomponenten von Elektromotoren" (Semantic segmentation of armature components of electric motors), 2020. Forum Bildverarbeitung 2020. Ed.: T. Längle; M. Heizmann, 329-340, KIT Scientific Publishing;
[9] N. Mitschke and M. Heizmann, "Image-Based Visual Servoing of Rotationally Invariant Objects Using a U-Net Prediction", 2021. 7th International Conference on Automation, Robotics and Applications (ICARA), 235-240, Institute of Electrical and Electronics Engineers (IEEE). doi:10.1109/ICARA51699.2021.9376577.

In one example, the segmentation model (e.g. the neural network) can operate at a lower resolution, i.e. it can have been trained with low-resolution training images of the object in different states. The segmentation result can additionally be improved or stabilized by classical image processing and/or by using a semantic object model. When a particular object is inspected, the camera can record images of the object at a higher resolution, and the relevant surfaces of the object under examination can be extracted using the segmentation model (such as the trained neural network).
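
The text leaves open which classical image-processing step stabilizes the segmentation result. As one possible illustration (an assumption, not the method of the application), a 3x3 majority filter removes isolated misclassified pixels from a Boolean mask and fills small holes. The name `majority_filter` is hypothetical.

```python
def majority_filter(mask):
    """Smooth a Boolean mask with a 3x3 majority vote.

    Each output pixel is True if more than half of the pixels in its 3x3
    neighborhood (clipped at the image border) are True.
    """
    height, width = len(mask), len(mask[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            votes = [mask[j][i]
                     for j in range(max(0, y - 1), min(height, y + 2))
                     for i in range(max(0, x - 1), min(width, x + 2))]
            row.append(sum(votes) * 2 > len(votes))
        result.append(row)
    return result


# A single spurious pixel disappears, a single hole is filled.
speckle = [[False, False, False],
           [False, True, False],
           [False, False, False]]
hole = [[True, True, True],
        [True, False, True],
        [True, True, True]]
cleaned_speckle = majority_filter(speckle)
cleaned_hole = majority_filter(hole)
```

Morphological opening and closing would be common alternatives with a similar effect.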

The (trained) segmentation model can be stored, for example, in a storage device such as a database, a computer and/or a data or computing cloud.

The device can further comprise a camera movement device that is configured to move the camera and is in signal communication with the trajectory determination unit. The camera movement device can, for example, comprise a robot arm. The device can also comprise a platform on which the object rests and/or an object movement unit (e.g. a robot arm).

If the input image for the segmentation has a higher resolution than the segmentation or the segmentation model, the input image can first be resampled to the resolution of the segmentation. The segmentation model can then be applied to the resampled image.
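
The resampling step can be sketched as follows, assuming nearest-neighbor resampling and a segmentation model given as a plain Python callable. Mapping the resulting label map back to the original resolution is an additional assumption made here so that the result can be used on the high-resolution image; the names `resample_nearest` and `segment_at_model_resolution` are hypothetical.

```python
def resample_nearest(image, out_height, out_width):
    """Resample a list-of-rows image to a new size by nearest-neighbor lookup."""
    in_height, in_width = len(image), len(image[0])
    return [[image[y * in_height // out_height][x * in_width // out_width]
             for x in range(out_width)]
            for y in range(out_height)]


def segment_at_model_resolution(image, model, model_height, model_width):
    """Resample the image to the model resolution, segment it, and map the
    label map back to the resolution of the original image."""
    small = resample_nearest(image, model_height, model_width)
    small_labels = model(small)
    return resample_nearest(small_labels, len(image), len(image[0]))


def toy_model(image):
    """Stand-in for a trained model: bright pixels count as surface."""
    return [[value > 5 for value in row] for row in image]


high_res = [[0, 0, 9, 9],
            [0, 0, 9, 9],
            [9, 9, 0, 0],
            [9, 9, 0, 0]]
labels_high_res = segment_at_model_resolution(high_res, toy_model, 2, 2)
```

In practice an interpolating or area-averaging resampler would usually be preferred for the image itself, while nearest-neighbor lookup remains appropriate for label maps because it does not create intermediate class values.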

As described above, the trajectory is determined on the basis of at least one image of the object recorded by the camera and segmented by the image segmentation unit, i.e. on the basis of the segmentation result for at least one image recorded by the camera.

In one example, the trajectory can be determined such that all points lying on the trajectory are at a predetermined or predeterminable distance from an axis (e.g. a long axis) of the object.

To determine the axis, the trajectory determination unit can carry out a principal component analysis of the segmented image, the axis being determined from the result of the principal component analysis. It is thus possible to determine, automatically and efficiently, an object-specific trajectory along which the camera moves and records images for the inspection of the object.
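The axis determination described here can be illustrated with a minimal sketch. It assumes the segmentation result is available as a binary mask (a list of rows, truthy where a pixel belongs to the object) and uses the closed-form orientation of the dominant eigenvector of the 2x2 covariance matrix of the object pixel coordinates. The function name `principal_axis` and the mask representation are illustrative assumptions, not taken from the patent.

```python
import math


def principal_axis(mask):
    """Return (centroid, unit direction) of the long axis of a binary mask.

    `mask` is a list of rows; truthy entries mark object pixels
    (hypothetical representation of the segmentation result).
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if len(pts) < 2:
        raise ValueError("mask needs at least two object pixels")
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    # Entries of the 2x2 covariance matrix of the pixel coordinates.
    sxx = sum((x - cx) ** 2 for x, _ in pts) / n
    syy = sum((y - cy) ** 2 for _, y in pts) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pts) / n
    # Angle of the eigenvector with the largest eigenvalue (first principal component).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), (math.cos(theta), math.sin(theta))
```

For an elongated object the returned direction follows the long side of the mask; for a nearly round mask the direction is poorly defined, which is one reason the text also mentions edge refinement before the analysis.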

In one example, all points on the trajectory can be at a constant distance from the axis. In this case, the trajectory is a circle, with the determined axis of the object as the normal through the center of the circle. However, the trajectory can also be a curve along which the distance to the axis varies. The distance can be fixed or adjustable, and can thus serve as a degree of freedom in determining the trajectory.
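For the constant-distance case, the circular trajectory can be written as a function of a single parameter p. The sketch below assumes that p runs from 0 to 1 for one full revolution and that the axis is given in 3D by a point on it (`center`) and a direction (`axis_dir`); both conventions, and the function name, are assumptions made for illustration.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    norm = math.sqrt(sum(c * c for c in a))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / norm for c in a)


def circular_trajectory(center, axis_dir, radius, p):
    """Camera position for parameter p on a circle around the object axis.

    The circle lies in the plane through `center` with normal `axis_dir`;
    p in [0, 1) is assumed to cover one full revolution.
    """
    d = _unit(axis_dir)
    # Any helper vector that is not parallel to the axis will do.
    helper = (1.0, 0.0, 0.0) if abs(d[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = _unit(_cross(d, helper))   # first in-plane direction
    v = _cross(d, u)               # second in-plane direction, orthogonal to u and d
    angle = 2.0 * math.pi * p
    return tuple(center[i] + radius * (math.cos(angle) * u[i] + math.sin(angle) * v[i])
                 for i in range(3))
```

A trajectory with varying distance could be obtained in the same way by making `radius` a function of p.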

Before the principal component analysis is carried out, the segmented image can be subjected to a further image transformation, e.g. to improve the edges. Various edge enhancement methods can be used for this. For example, the segmentation result can be combined with a depth map of the object. A depth map of the object is generated automatically when a depth camera is used, for example. Alternatively, the computing device can be configured to generate a depth map of the object from a plurality of the images recorded by the camera. As a rule, the object edges determined from the depth map are more accurate than the edges from the segmentation. This simplifies the axis determination and increases its accuracy.

The computing device can be configured, for example,
to provide a depth map of the object;
to determine from the depth map the depth edges of all objects that rise from a ground plane; and
to improve the edges of the object having the largest intersection with the object from the segmentation by means of the determined depth edges.
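One possible reading of these three steps is sketched below. It assumes the depth map has already been converted to a height above the ground plane, treats every 4-connected region above a height threshold as one raised object, and replaces the segmentation mask by the raised region that overlaps it most. The function names, the height representation and the threshold are assumptions for illustration; the patent does not prescribe this particular procedure.

```python
from collections import deque


def components_above_ground(height, min_height):
    """Return 4-connected pixel sets whose height above the ground exceeds min_height."""
    rows, cols = len(height), len(height[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or height[r][c] <= min_height:
                continue
            comp, queue = set(), deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill of one raised region
                y, x = queue.popleft()
                comp.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and height[ny][nx] > min_height):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append(comp)
    return comps


def refine_mask(height, seg_mask, min_height=0.0):
    """Replace the segmentation mask by the raised region with the largest overlap."""
    seg = {(y, x) for y, row in enumerate(seg_mask) for x, v in enumerate(row) if v}
    comps = components_above_ground(height, min_height)
    best = max(comps, key=lambda comp: len(comp & seg), default=set())
    if not best & seg:
        # No raised region overlaps the segmentation: keep the mask unchanged.
        return [list(row) for row in seg_mask]
    return [[1 if (y, x) in best else 0 for x in range(len(seg_mask[0]))]
            for y in range(len(seg_mask))]
```

The boundary of the returned mask then follows the depth edges rather than the possibly ragged segmentation boundary.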

With the aid of the segmentation model it is also possible to determine an optimal orientation and/or distance of the camera relative to the object and/or an initial position of the camera relative to the object. The device can accordingly comprise a camera orientation unit that is configured to determine an (optimal) orientation and/or distance of the camera relative to the object and/or an (optimal) initial position of the camera relative to the object using at least one image of the object recorded by the camera and segmented by the image segmentation unit. The initial position can lie on the determined trajectory.

An optimal orientation of the camera with respect to the object can be, for example, an orientation in which the optical axis of the camera is substantially orthogonal to a (e.g. long) axis of the object. An optimal distance between the camera and the object can be, for example, a distance at which all relevant (partial) surfaces to be inspected are visible in the image and defect detection is possible. The optimal distance can depend on the resolution and/or the opening angle (field of view) of the camera.

For an optimal orientation and/or distance of the camera relative to the object, one or more of the following conditions can, for example, be required to be met:

- The object is at a predetermined position in the image, e.g. in the center of the image;
- The object occupies a predetermined proportion of the image, e.g. 20%, 30%, 50%, etc.;
- The object has a predetermined orientation in the image. For example, it can be required that a (long) axis of the object is substantially parallel to one of the image coordinate axes.
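The three conditions above can be checked directly on a segmented image. The following sketch assumes a binary mask (list of rows) and measures the offset of the object centroid from the image center, the fraction of the image covered by the object, and the angle of the object's principal axis. The function names and all tolerance values are hypothetical defaults chosen for illustration.

```python
import math


def framing_metrics(mask):
    """Centroid offset from the image center, area fraction and axis angle of a mask."""
    rows, cols = len(mask), len(mask[0])
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask contains no object pixels")
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    sxx = sum((x - cx) ** 2 for x, _ in pts) / n
    syy = sum((y - cy) ** 2 for _, y in pts) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pts) / n
    # Orientation of the first principal component, in degrees within [-90, 90].
    angle = math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))
    return {
        "offset": (cx - (cols - 1) / 2.0, cy - (rows - 1) / 2.0),
        "fraction": n / (rows * cols),
        "angle_deg": angle,
    }


def is_well_framed(mask, target_fraction=0.3, fraction_tol=0.1,
                   max_offset=2.0, max_angle_deg=5.0):
    """True if the object is centered, fills the target fraction and is axis-aligned."""
    m = framing_metrics(mask)
    dx, dy = m["offset"]
    a = abs(m["angle_deg"])
    axis_deviation = min(a, 90.0 - a)  # deviation from the nearer image axis
    return (math.hypot(dx, dy) <= max_offset
            and abs(m["fraction"] - target_fraction) <= fraction_tol
            and axis_deviation <= max_angle_deg)
```

In a closed-loop setup, these metrics could serve as the error signal that moves the camera until all conditions hold.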

In particular, pose parameters of the camera relative to the object can be determined from an image of the object segmented with the aid of the segmentation model. The pose parameters comprise the position and orientation of the object relative to the camera and to the coordinate system defined by the camera. Using the pose parameters, the camera can be oriented accordingly and moved to a corresponding position relative to the object. This can be done, for example, by means of an image-based controller. An exemplary image-based controller is described in publication [9].

When the camera moves along the trajectory determined in this way, the camera can automatically adopt the determined (optimal) orientation relative to the object.

Since the camera moves between the individual images of the temporal sequence recorded by the camera, the result is partially overlapping images of the same scene, which includes the at least one (partial) surface of the object to be inspected.

The image transformation unit can be configured to transform each of the images recorded by the camera, and the corresponding segmented images, before the extraction.

The transformation can comprise, for example, one or more of the following operations:

unrolling the respective image, the unrolling comprising projecting the at least one (partial) surface of the object to be inspected onto a plane;
compensating for the perspective distortion of the respective image; and
normalizing the respective image.

The transformation can serve, for example, to remove or compensate for the perspective distortion of the images recorded from different camera positions. Furthermore, the (partial) surface to be inspected is generally a curved surface. The transformation can therefore serve to transform the curved (partial) surface into a planar surface. Unrolling thus removes the curvature of the (partial) surface to be inspected, i.e. the curved surface geometry, in order to obtain a planar surface. In an exemplary embodiment, only those pixels in the respective image that belong to the (partial) surface to be inspected or to the object to be inspected are unrolled or transformed.

In principle, both images are transformed, i.e. the image recorded by the camera and the corresponding segmented image. The compensation of the perspective distortion of the images recorded from different camera positions can take place before the unrolling, i.e. before the at least one (partial) surface of the object to be inspected is projected onto a planar surface or plane.

Unrolling the image recorded by the camera (optionally normalized) and the corresponding segmented (optionally normalized) image can comprise a transformation or remapping of at least one of the coordinates (e.g. the y-coordinate) of the respective image in order to project the at least one (partial) surface of the object to be inspected onto the plane. In particular, unrolling the image recorded by the camera and the corresponding segmented image can comprise the following operations:

determining a (e.g. long) axis of the object from the segmented image by means of principal component analysis;
determining at least one distance from the axis to an edge of the object in the segmented image;
determining a coordinate transformation (e.g. of at least one of the two image coordinates, such as the y-coordinate) based on the at least one distance; and
transforming the image recorded by the camera and the corresponding segmented image using the coordinate transformation.
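As an illustration of such a coordinate remapping, the sketch below unrolls an image of a roughly cylindrical object. It assumes that the image has already been normalized so that the object axis is a horizontal image row (`axis_row`), that the distance from the axis to the object edge is `radius` pixels, and that the projection is approximately orthographic. Under these assumptions a pixel at signed distance y from the axis corresponds to the arc length s = radius * asin(y / radius) on the surface, so the output row for arc length s is read from y = radius * sin(s / radius). The cylinder model and all names are assumptions; the patent only requires that the transformation be derived from the measured distance.

```python
import math


def unroll_rows(image, axis_row, radius):
    """Remap the y-coordinate so that equal row steps correspond to equal arc lengths.

    `image` is a list of rows, `axis_row` the row index of the object axis and
    `radius` the axis-to-edge distance in pixels (cylinder assumption).
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    # The visible half of the cylinder has arc length pi * radius.
    half = int(math.floor(radius * math.pi / 2.0))
    out = []
    for k in range(-half, half + 1):
        y = radius * math.sin(k / radius)          # source offset from the axis
        src = int(round(axis_row + y))             # nearest-neighbor lookup
        src = min(max(src, 0), len(image) - 1)     # stay inside the image
        out.append(list(image[src]))
    return out
```

The same function would be applied to the camera image and to the corresponding segmented image so that both stay aligned after unrolling.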

Normalizing can comprise, for example, one or more of the following operations:

- normalizing the orientation of the object in the image, e.g. by estimating the orientation of the object in the image and rotating the image so that the axis of the object is parallel to one of the coordinate axes (e.g. the x-axis) of the image;
- normalizing the position and/or the proportion of the object to be examined in the image, e.g. by cropping ("crop" operation);
- normalizing the size of the image or image section so that it has a predetermined, fixed size (e.g. in pixels), e.g. by a "resize" operation.
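The crop and resize operations from this list can be sketched as follows, assuming images and masks are lists of rows of equal size. Cropping uses the bounding box of the mask, and resizing uses nearest-neighbor sampling, which is one simple choice among several interpolation methods. The function names and the `margin` parameter are illustrative assumptions.

```python
def crop_to_object(image, mask, margin=0):
    """Crop `image` to the bounding box of the truthy pixels in `mask`."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        raise ValueError("mask contains no object pixels")
    y0 = max(min(ys) - margin, 0)
    y1 = min(max(ys) + margin, len(image) - 1)
    x0 = max(min(xs) - margin, 0)
    x1 = min(max(xs) + margin, len(image[0]) - 1)
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]


def resize_nearest(image, out_rows, out_cols):
    """Resize to a fixed size by nearest-neighbor sampling."""
    in_rows, in_cols = len(image), len(image[0])
    return [[image[(r * in_rows) // out_rows][(c * in_cols) // out_cols]
             for c in range(out_cols)]
            for r in range(out_rows)]
```

Applying `crop_to_object` followed by `resize_nearest` to both the camera image and its segmentation yields image sections of a fixed size in which the object occupies a comparable share of the frame.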

Preferably, the normalization is carried out before the image is unrolled.

The transformations described above, and in particular the unrolling of the images before extraction, facilitate and improve both the extraction of the relevant (partial) surfaces to be inspected from the images recorded by the camera and the fusion of the extracted (partial) surfaces into a two-dimensional (planar) representation, which is subsequently examined for defects.

Various known image registration and fusion methods can be used to fuse the images of the at least one (partial) surface of the object to be inspected that were recorded from different positions.

Fusing the extracted images into the two-dimensional representation can comprise, for example, the following operations:

for each pair of two consecutive images that have been recorded by the camera and optionally transformed:
determining a plurality of features at particular points in each of the images;
finding pairs of points with matching features in the two consecutive images (e.g. by means of feature matching);
determining a (e.g. affine) transformation matrix between the consecutive images;
fusing the consecutive images by bringing one of the images into registration with the other using the transformation matrix, wherein, in the region of overlap, the part of the respective image that is closer to the center of that image is used.
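The step of determining an affine transformation matrix from matched point pairs can be sketched as a least-squares fit. The example assumes the point pairs have already been found by some feature detector and matcher, which are outside the scope of this sketch, and solves the normal equations for the two rows of the 2x3 affine matrix. The function names are hypothetical, and a practical system would typically add outlier rejection.

```python
def _solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("point pairs are degenerate (e.g. collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def estimate_affine(src, dst):
    """Least-squares 2x3 affine matrix mapping points `src` onto points `dst`.

    Returns [[a, b, tx], [c, d, ty]] with u = a*x + b*y + tx and v = c*x + d*y + ty.
    """
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least three matched point pairs")
    normal = [[0.0] * 3 for _ in range(3)]
    rhs_u, rhs_v = [0.0] * 3, [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                normal[i][j] += row[i] * row[j]   # accumulates M^T M
            rhs_u[i] += row[i] * u                # accumulates M^T u
            rhs_v[i] += row[i] * v                # accumulates M^T v
    return [_solve3(normal, rhs_u), _solve3(normal, rhs_v)]
```

With the estimated matrix, one image of the pair can be warped into the coordinate frame of the other before the overlap rule is applied.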

The above steps are repeated for all image pairs of the temporal sequence of images.

Alternatively, fusing the extracted images into the two-dimensional representation can comprise the following operations:

for each pair of two consecutive images that have been recorded by the camera and optionally transformed:
- finding a translation of one of the images relative to the other that maximizes a correlation function between the two images (template matching can be used for this); and
- fusing the consecutive images by bringing one of the images into registration with the other using the translation, wherein, in the region of overlap, the part of the respective image that is closer to the center of that image is used.
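The translation-based variant can be sketched for the simplified case of a purely horizontal shift between two images of equal size. The example searches for the shift that maximizes the normalized cross-correlation over the overlapping columns and then fuses the two images, taking each overlapping pixel from the image whose center is closer. Restricting the search to one direction, the choice of normalized cross-correlation and the function names are assumptions made to keep the example short.

```python
import math


def _ncc(a, b):
    """Normalized cross-correlation of two equally long value lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def best_shift(left, right, min_overlap=3):
    """Horizontal shift of `right` relative to `left` that maximizes the correlation."""
    cols = len(left[0])
    best_s, best_score = None, -2.0
    for s in range(1, cols - min_overlap + 1):
        a = [v for row in left for v in row[s:]]           # right part of the left image
        b = [v for row in right for v in row[:cols - s]]   # left part of the right image
        score = _ncc(a, b)
        if score > best_score:
            best_s, best_score = s, score
    return best_s


def fuse(left, right, shift):
    """Fuse two images; in the overlap, use the image whose center is closer."""
    cols = len(left[0])
    center = (cols - 1) / 2.0
    out = []
    for lrow, rrow in zip(left, right):
        row = []
        for x in range(shift + cols):
            in_left, in_right = x < cols, x >= shift
            if in_left and in_right:
                use_left = abs(x - center) <= abs((x - shift) - center)
                row.append(lrow[x] if use_left else rrow[x - shift])
            elif in_left:
                row.append(lrow[x])
            else:
                row.append(rrow[x - shift])
        out.append(row)
    return out
```

Repeating this for every consecutive pair and accumulating the shifts would place each image at its position along the trajectory parameter in the fused representation.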

According to a second aspect of the invention, a computer-implemented method for the optical surface inspection of objects is provided. The method can be carried out, for example, using the device described above. The method comprises:

providing a statistical segmentation model that has been trained using a plurality of training images of an object in different states, the segmentation model taking an image as input and delivering as its result a classification of the pixels in the input image into at least one class, the at least one class corresponding to a (partial) surface of the object to be inspected;
determining a trajectory, parameterizable by a parameter p, around an object to be inspected using at least one image of the object recorded by a camera and segmented using the segmentation model;
recording a temporal sequence of images of the object to be inspected with a camera that moves around the object on the trajectory and, during its movement, records images of the object from different positions on the trajectory;
segmenting the images of the temporal sequence of images of the object using the segmentation model;
extracting at least one (partial) surface of the object to be inspected from each of the images recorded by the camera using the segmentation result for the respective image;
fusing the extracted (partial) surfaces into a two-dimensional (planar) representation, the two-dimensional representation having the parameter p and an image coordinate orthogonal to the trajectory as spatial coordinates; and
performing defect detection on the fused two-dimensional representation.

Determining the trajectory can comprise the following operations:

determining a (e.g. long) axis of the object by means of a principal component analysis of the segmented image; and
determining the trajectory such that the points lying on the trajectory are at a predetermined or predeterminable distance from the determined axis.

The method can further comprise:

providing a depth map of the object;
determining, from the depth map, depth edges of all objects that rise from a ground plane; and
improving (e.g. strengthening, straightening) the edges of the object having the largest intersection with the object from the segmentation with the aid of the determined depth edges.

The method can further comprise:

determining an (optimal) orientation of the camera relative to the object using at least one image of the object recorded by the camera and segmented with the aid of the segmentation model.

Furthermore, the method can comprise transforming each of the images recorded by the camera, and the corresponding segmented images, before the extraction, the transforming comprising one or more of the following operations:

unrolling the respective image, comprising projecting the at least one (partial) surface of the object to be inspected onto a plane;
compensating for the perspective distortion of the respective image; and
normalizing the respective image.

Unrolling the image recorded by the camera and the corresponding segmented image can comprise the following operations:

determining a (e.g. long) axis of the object from the segmented image by means of principal component analysis;
determining at least one distance from the axis to an edge of the object in the segmented image;
determining a coordinate transformation (e.g. of at least one of the two image coordinates, such as the y-coordinate) based on the at least one distance; and
transforming the image recorded by the camera and the corresponding segmented image using the coordinate transformation.

An exemplary fusing of the extracted images into the two-dimensional representation can comprise the following operations:

for each pair of two consecutive images that have been recorded by the camera and optionally transformed:
- determining a plurality of features at particular points in each of the images;
- finding pairs of points with matching features in the two consecutive images (e.g. by means of feature matching);
- determining a (e.g. affine) transformation matrix between the consecutive images;
- fusing the consecutive images by bringing one of the images into registration with the other using the transformation matrix, wherein, in the region of overlap, the part of the respective image that is closer to the center of that image is used.

Another exemplary fusing of the extracted images into the two-dimensional representation can comprise the following operations:

With regard to the method according to the second aspect and the examples described above, the definitions, embodiments, variants, examples and advantages set out in connection with the device described above apply analogously.

According to a third aspect, a computer program product or computer program is provided which is configured, when loaded and executed on a computer, to carry out an exemplary method for the optical surface inspection of objects according to the second aspect and the examples described above.

The computer program product or computer program can be stored on a physical storage medium or program carrier. The computer program product can also be present as a program signal.

With regard to the computer program product or computer program, the definitions, embodiments, variants, examples and advantages set out in connection with the device described above apply analogously.

The above-mentioned devices (e.g. the computing device) and units for providing, determining, specifying or calculating data (e.g. model provision unit, trajectory determination unit, image segmentation unit, image transformation unit, defect detection unit, etc.) can be implemented by suitably configured or programmed data processing devices (in particular specialized hardware modules, computers or computer systems, such as computer or data clouds) with corresponding computing units, electronic interfaces, memory and data transmission units.

The above-mentioned devices and/or units can be in signal connection with one another and exchange data (e.g. the results of the segmentation, trajectory determination, image transformations, etc.). Furthermore, the above-mentioned devices and/or units can have suitable interfaces that allow data (e.g. training images, camera images, parameters of the trajectory, etc.) to be transmitted, entered and/or read out. The devices can also comprise at least one storage unit, for example in the form of a database, which stores the data used.

The devices and/or units can further comprise at least one, preferably interactive, graphical user interface (GUI) that allows a user to view, enter and/or modify data.

In the following, preferred embodiments of the present invention are described by way of example with reference to the accompanying figures. Individual elements of the described embodiments are not limited to the respective embodiment. Rather, elements of the embodiments can be combined with one another as desired to create new embodiments. The figures show:

Fig. 1A an exemplary training image of an object;
Fig. 1B an exemplary annotation of the training image shown in Fig. 1;
Fig. 2A a schematic view of an exemplary object;
Fig. 2B a schematic view of an exemplary camera-object arrangement;
Fig. 3A a schematic view of an exemplary optimal position of the camera relative to the object;
Fig. 3B an exemplary image of the object captured by the camera in the optimal position;
Fig. 4A an exemplary image of an object to be inspected;
Fig. 4B an exemplary planar representation determined from a temporal sequence of images of the object;
Fig. 5A the mask of an armature, which can be derived from an exemplary segmented image;
Fig. 5B the image shown in Fig. 5A with the eigenvectors determined by means of a principal component analysis;
Fig. 5C the image shown in Fig. 5B after normalization;
Fig. 6 a schematic image of an exemplary object 10;
Fig. 7A an image of an exemplary object captured by the camera;
Fig. 7B the result of segmenting the image of the object shown in Fig. 7A;
Fig. 8A the result of normalizing the image shown in Fig. 7A;
Fig. 8B the result of normalizing the segmented image shown in Fig. 7B;
Fig. 8C schematically, the determination of an axis from the segmented image shown in Fig. 8B;
Fig. 9A the result of unrolling the normalized image shown in Fig. 8A;
Fig. 9B the result of unrolling the segmented and normalized image shown in Fig. 8B;
Fig. 10A a schematic view of an image from the temporal sequence of images of an exemplary object;
Fig. 10B the result of fusing the (partial) surface of the object shown in Fig. 10, extracted from the temporal sequence of images;
Fig. 11 an exemplary method for registering camera images.

An exemplary method for position-independent optical surface inspection of an object (e.g. a workpiece) is described below. The method can operate at two different resolution levels.

In a first phase (model creation phase or learning phase), a segmentation model of the object is first created from images of the object in different states. The result (i.e. the output of the segmentation model) is at least one segmentation mask of the object and/or of the (partial) surfaces of the object to be inspected. The images used to create the segmentation model are called training images in the context of the present application.

The different states of the object can be, for example, different designs of the object, different relative positions of the object and the camera, etc. The resolution of the training images can be relatively low, e.g. 112x112 to 448x448 pixels. The number of training images can be, for example, 25 to 200. A higher or lower number is also possible.

In a second phase (inspection phase or examination phase), a specific object is examined for defects. For this purpose, a temporal sequence of images of the object (e.g. in the form of a video) is recorded by a movable camera. The recorded images can have a second resolution, which is higher than the first resolution and enables defect detection. For example, the first resolution can be between 112x112 and 448x448 pixels and does not necessarily have to match the resolution of the training images, and the second resolution can be, for example, in the range of 1333x750 to 3266x1837 pixels. With the aid of the segmentation model, the object and the (partial) surfaces of the object to be inspected are extracted from the images of the object recorded by the camera. The extracted surfaces are registered and fused so that they can be examined for defects.

The segmentation model can be a statistical model that is trained using the training images. For example, the segmentation model can be or comprise a trained neural network, the neural network having been trained to segment images of the object in different states, preferably with augmentation. The (partial) surfaces of the object that are relevant for the inspection can be learned separately in the learning phase. In the example of an armature, these can be the shaft and the commutator surface.

In order to stabilize the parameters and the segmentation masks, a semantic model can additionally be used in the segmentation. Previous segmentation results and/or plausibility constraints (e.g. an armature can have at most one commutator) can be used here. For this purpose, the masks are brought into correspondence by means of suitable feature extractors (e.g. ORB features), i.e. by keypoint matching. Furthermore, the segmentation masks can be subjected to further image processing methods, e.g. to improve the edges.

Fig. 1 shows an exemplary annotation of an object (an armature) when creating the segmentation model, where Fig. 1A shows the original image and Fig. 1B shows the image of the object annotated with the aid of a segmentation model. In this example, the following components and the corresponding (partial) surfaces are learned in the learning phase: armature, shaft, commutator and pinion. The segmentation model used is a neural network, as described, for example, in publication [8].

If a specific object is to be examined, a temporal sequence (e.g. in the form of a video) of preferably high-resolution images of the object is recorded by a movable camera. The camera moves on an object-typical trajectory and records a temporal sequence of images of the object from different positions on the trajectory. Fig. 2A shows a schematic side view of an exemplary object 10 (an armature) and Fig. 2B shows an exemplary camera-object arrangement during the movement of the camera 12 on an object-typical trajectory 14.

In order to obtain a good image for the extraction and the registration, a suitable (optimal) position and/or orientation of the camera 12 with respect to the object 10 can first be determined. This can be done, for example, by means of image-based control. For this purpose, the parameters of the position and/or orientation of the camera with respect to the object can be determined (e.g. estimated) from the segmentation result of at least one image of the object recorded with the camera. These parameters are also called pose parameters in the context of the present application. Exemplary pose parameters are size, orientation and the inclination between the object plane and the camera plane. An exemplary method for image-based control is described in publication [8].

An exemplary optimal position and orientation of the camera 12 relative to the object is shown in Fig. 3A. Fig. 3B shows an exemplary image of the object recorded by the camera 12 in the optimal position and orientation relative to the object 10. The camera 12 can be moved to the previously determined position and/or aligned according to the previously determined orientation by means of a robot arm 16. Alternatively or additionally, the object 10 to be examined can be moved relative to the camera 12.

Once the object 10 is in the previously determined position and/or orientation with respect to the camera 12, the camera 12 is moved over the object 10 along an object-typical trajectory 14 that can be parameterized with the parameter p. Alternatively or additionally, the object 10 can be moved relative to the camera 12. The object 10 can, for example, be rotated by a second robot arm in order to inspect its rear side.

For the example of a rotationally symmetrical object with an (e.g. long) axis x, such as the object shown in Fig. 2A, a suitable trajectory is a partial circle (e.g. a semicircle) or a circle in the yz-plane. The yz-plane is the plane perpendicular to the axis x of the object, as shown, for example, in Fig. 2B. The semicircle or circle can be parameterized by the angle ϑ. The angle ϑ is the angle between the z-axis and the straight line that passes through the center of the semicircle and the respective position of the camera on the trajectory.
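The parameterization by the angle ϑ can be illustrated with a minimal sketch. It assumes that the circle is centered on the object axis, that ϑ is given in radians, and that a semicircle runs from -π/2 to +π/2; the names `camera_position`, `semicircle_trajectory`, `radius` and `x0` are hypothetical and do not appear in the description.

```python
import math


def camera_position(theta, radius, x0=0.0):
    """Camera position on a circle in the yz-plane.

    theta is measured from the z-axis (radians). radius (distance from the
    object axis) and x0 (position along the axis x) are assumed parameters.
    """
    y = radius * math.sin(theta)
    z = radius * math.cos(theta)
    return (x0, y, z)


def semicircle_trajectory(radius, steps, x0=0.0):
    """Sample steps + 1 positions for theta from -pi/2 to +pi/2 (assumed range)."""
    return [
        camera_position(-math.pi / 2 + math.pi * i / steps, radius, x0)
        for i in range(steps + 1)
    ]
```

For ϑ = 0 the camera lies on the z-axis directly above the object axis; every sampled position has the same distance `radius` from the axis x.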

While the camera 12 moves along the trajectory 14, the high-resolution images recorded by the camera are resampled to the resolution of the segmentation, segmented by the segmentation model and optionally post-processed. The post-processing can comprise, for example, applying a semantic model and/or image processing.
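The interplay of the two resolution levels can be sketched as follows. The sketch assumes nearest-neighbor resampling and images stored as nested lists; both are illustrative choices, and the function names are hypothetical. The low-resolution mask produced by the segmentation is resampled back to the camera resolution and used to cut the object out of the high-resolution image.

```python
def resample_nearest(image, out_height, out_width):
    """Nearest-neighbor resampling of a 2D list to out_height x out_width."""
    in_height, in_width = len(image), len(image[0])
    return [
        [image[r * in_height // out_height][c * in_width // out_width]
         for c in range(out_width)]
        for r in range(out_height)
    ]


def extract_object(high_res_image, low_res_mask, background=0):
    """Apply a low-resolution binary mask to a high-resolution image."""
    height, width = len(high_res_image), len(high_res_image[0])
    mask = resample_nearest(low_res_mask, height, width)
    return [
        [pixel if mask[r][c] else background
         for c, pixel in enumerate(row)]
        for r, row in enumerate(high_res_image)
    ]
```

In practice an interpolating filter would probably be preferred for downsampling the camera image; nearest-neighbor is used here only to keep the sketch short.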

With the aid of the segmentation result and, optionally, the pose parameters, the object can be extracted from each of the high-resolution images recorded by the camera 12. The objects extracted from the individual images are then registered, fused and transferred into a coordinate system that is spanned by the parameter p (e.g. the angle ϑ) and the image coordinate orthogonal to the trajectory. A displacement of the camera position by Δp along the trajectory, or a change in the angle ϑ, thus leads to a translation along the ordinate axis in the new coordinate system. The registration and fusion can be carried out, for example, by means of template matching.
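Because a displacement along the trajectory becomes a pure translation in the new coordinate system, registering two successive extracted strips reduces to finding one shift. The following is a minimal sketch of such a template match, assuming strips stored as lists of rows, a non-negative integer shift and a mean squared difference as the matching cost; none of these choices is prescribed by the description, and the function names are hypothetical.

```python
def estimate_shift(prev_strip, next_strip, max_shift):
    """Return the shift s (in rows) that best aligns next_strip with prev_strip.

    A shift s means that next_strip[i] shows the same surface row as
    prev_strip[i + s]. The cost is the mean squared difference over the overlap.
    """
    best_shift, best_cost = 0, float("inf")
    for s in range(max_shift + 1):
        overlap = min(len(prev_strip) - s, len(next_strip))
        if overlap <= 0:
            break
        cost = 0.0
        for i in range(overlap):
            row_a, row_b = prev_strip[i + s], next_strip[i]
            cost += sum((a - b) ** 2 for a, b in zip(row_a, row_b))
        cost /= overlap
        if cost < best_cost:
            best_shift, best_cost = s, cost
    return best_shift


def fuse(prev_strip, next_strip, shift):
    """Append the rows of next_strip that extend beyond prev_strip."""
    return prev_strip + next_strip[len(prev_strip) - shift:]
```

Since the shift is estimated from the image content alone, the sketch needs no knowledge of the actual camera position on the trajectory.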

The (partial) surfaces relevant for the inspection are then cut out of, or extracted from, the fused image of the object with the aid of the segmentation model so that they can be inspected.

Fig. 4A shows an image of an exemplary object with overlaid segmentation masks. Fig. 4B shows an extracted, registered and assembled (partial) surface of the object in a two-dimensional (planar) representation.

Different methods can be used for the defect detection. For example, a so-called "novelty detection" (e.g. by means of a neural network) can be used, in which a series of good images (images of defect-free objects) is extracted in the manner described above and used to train an autoencoder. Defects (or so-called "novelties") are detected by a poor performance of the autoencoder. Classical image processing methods (with the aid of suitably defined features) are likewise possible for this defect detection.
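The decision step of such a novelty detection can be sketched without committing to a particular autoencoder. The sketch below assumes that the trained autoencoder is available as a callable `reconstruct` (a hypothetical placeholder), that "poor performance" is measured as the mean squared reconstruction error, and that the threshold is set to the mean plus k standard deviations of the errors on good images. These are illustrative assumptions, not details taken from the description.

```python
import statistics


def reconstruction_error(image, reconstruct):
    """Mean squared error between an image and its reconstruction."""
    original = [v for row in image for v in row]
    restored = [v for row in reconstruct(image) for v in row]
    return sum((a - b) ** 2 for a, b in zip(original, restored)) / len(original)


def fit_threshold(good_errors, k=3.0):
    """Threshold from the errors on good images (assumed rule: mean + k * std)."""
    return statistics.mean(good_errors) + k * statistics.pstdev(good_errors)


def is_defective(image, reconstruct, threshold):
    """Flag an image whose reconstruction error exceeds the threshold."""
    return reconstruction_error(image, reconstruct) > threshold
```

An image of the fused surface is flagged when the autoencoder, having seen only good surfaces, fails to reproduce it well.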

As described above, in contrast to known devices and methods for the optical surface inspection of objects, the exact position of the camera or of the object does not have to be known. Furthermore, a temporal sequence (e.g. in the form of a video), rather than a series of individual images recorded from predetermined positions, is segmented with the aid of a segmentation model, transformed and used to assemble the (partial) surface to be inspected.

Furthermore, the current parameter p(t_i) of the trajectory does not have to be known, since the merging is not based on prior knowledge but can be carried out, for example, via template matching.

The image-based control and the extraction can take place in real time. In this case, the method is preferably carried out on a suitably configured computing unit with a high degree of parallelism, such as GPUs, TPUs or corresponding FPGAs and ASICs. In addition, the camera can move smoothly over the surface without having to approach specific points at which it briefly dwells.

The individual steps of an exemplary method for optical surface inspection (such as the method described above) are described in detail below.

Segmentation model

As described above, the segmentation model can be a statistical model that is trained with a plurality of training images. An exemplary segmentation model can be a neural network, e.g. a CNN or another deep neural network.

A suitable segmentation model is described in publication [9]. The segmentation model comprises a U-Net, which takes an RGB image as input and provides the estimated class membership of each pixel in the image as output. The eponymous U shape results from the feature maps becoming deeper towards the middle of the network and from the skip connections, in which feature maps of the same size are concatenated. An exemplary U-Net has, for example, an input size of 224 × 224 pixels, five depth levels and approximately 31 million trainable parameters. Each 3 × 3 convolutional layer is followed by batch normalization and a ReLU activation. For upsampling, the 2 × 2 interpolation filter is also learned during training. The generalized Sørensen-Dice coefficient is used as the objective function. It forms the weighted sum of the Sørensen-Dice coefficients of the individual classes.
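The weighted sum of per-class Sørensen-Dice coefficients can be sketched as follows. The sketch assumes soft predictions in [0, 1] stored as flat lists per class, a small constant `eps` to avoid division by zero, and externally supplied class weights; the weighting scheme actually used in publication [9] is not stated here, so `weights` is a hypothetical parameter.

```python
def dice_coefficient(pred, target, eps=1e-7):
    """Soft Sørensen-Dice coefficient of two flat lists with values in [0, 1]."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + eps) / (sum(pred) + sum(target) + eps)


def generalized_dice(pred_by_class, target_by_class, weights):
    """Weighted sum of the per-class Dice coefficients.

    pred_by_class and target_by_class map a class name to a flat list of
    pixel values; weights maps a class name to its (assumed) weight.
    """
    return sum(
        weight * dice_coefficient(pred_by_class[name], target_by_class[name])
        for name, weight in weights.items()
    )
```

If the weights sum to one, a perfect segmentation yields a value of 1; a training loss could then be formed as one minus this value.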

To train the U-Net, a plurality of images of the object (e.g. 100, 150, 200, etc.) can be used. The images can also have a relatively low resolution. Before being fed into the segmentation model, the training images can be subjected to various image processing operations in order to improve the accuracy of the segmentation. The image processing operations can comprise one or more of the following operations: augmentation, regularization and normalization.

The augmentation can comprise one or more of the following augmentation operations: mirroring, affine transformations, color manipulations, noise, scaling and cropping. The augmentation prevents the segmentation model from being trained on the "wrong" object (e.g. the background).

Furthermore, additional information and/or semantic models can be used to increase the accuracy of the segmentation. For example, adding edge information in one of the later layers of a deep neural network with several layers can lead to an improvement, since the boundaries of the object in the image coincide with edges in the image. Various known methods can be used for the edge extraction. For example, the Marr-Hildreth operator can be used. The result can then be normalized. The edge information can, for example, be either added to or appended after the topmost concatenation layer.

In order to stabilize the output of the segmentation model (e.g. of the neural network), a semantic model of the object can be used. Previous segmentation results and/or plausibility constraints (e.g. an armature can have at most one commutator) can be used here.
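A plausibility constraint such as "at most one commutator" can be checked directly on a segmentation mask by counting connected regions of the class in question. The following sketch assumes a mask stored as a 2D list of integer class labels and 4-connectivity; the label values and function names are hypothetical.

```python
def count_components(mask, label):
    """Count the 4-connected regions of `label` in a 2D list of class labels."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] != label or seen[r][c]:
                continue
            count += 1  # new region found, flood-fill it
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] == label and not seen[ny][nx]:
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return count


def is_plausible(mask, label, max_count=1):
    """True if the class occurs in at most max_count connected regions."""
    return count_components(mask, label) <= max_count
```

A mask that fails the check could, for example, be replaced by the result of the previous frame or reduced to its largest region; which of these strategies is appropriate is left open here.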

An exemplary method for improving the segmentation result with the aid of an object model is described in publication [9]. The method is based on the assumption that the same object is visible throughout an image sequence. It is further assumed that the object has similar features in successive images.

As a rule, the objects to be examined are connected objects. A regularization that penalizes long contours can therefore also be useful. In this way, holes, small false detections or other artifacts can be reduced.

Determination of an optimal position and/or orientation of the camera relative to the object

For an optimal position and orientation of the camera 12 relative to the object 10, one or more of the following properties can be required:

a) The object 10 should be fully visible in the image. This implies that the camera 12 is oriented such that it looks in the direction of the object 10 and that a suitable distance between the camera 12 and the object 10 is chosen. The orientation of the camera 12 can be chosen such that the object 10 is approximately in the center of the image.
b) The object 10 should be large enough in the image that relevant details (e.g. defects) are recognizable, with the object 10 preferably being completely in the image at the same time. This requirement leaves a certain amount of leeway in the distance between the object 10 and the camera 12, which is essentially determined by the resolution and the opening angle of the camera. In order to comply with the requirements, it can be required, for example, that the object 10 occupies a constant portion of the image (e.g. 12%, 20%, 30%, 50%, etc.).
c) The (partial) surface or (partial) surfaces of the object 10 to be examined should be visible. In the case of a rotationally symmetrical object 10, it can be required, for example, that the orientation between the camera 12 and the object 10 is chosen such that the lateral surface of the object 10 is visible and, optionally, that the base surfaces (which are not to be inspected) are not visible.
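Criteria a) and b) can be evaluated directly on a segmentation mask. The following is a minimal sketch, assuming a binary NumPy mask in which object pixels are True; the function name `framing_metrics` and the border test used as a stand-in for "fully visible" are illustrative assumptions and are not taken from the description.

```python
import numpy as np

def framing_metrics(mask):
    """Evaluate criteria a) and b) on a binary object mask (True = object pixel)."""
    mask = np.asarray(mask).astype(bool)
    h, w = mask.shape
    if not mask.any():
        raise ValueError("mask contains no object pixels")
    ys, xs = np.nonzero(mask)
    # a) Assumption: the object counts as fully visible if the mask
    #    does not touch the image border.
    touches_border = bool(
        ys.min() == 0 or xs.min() == 0 or ys.max() == h - 1 or xs.max() == w - 1
    )
    # Offset (x, y) of the mask centroid from the image center, in pixels.
    center_offset = np.array([xs.mean() - (w - 1) / 2.0, ys.mean() - (h - 1) / 2.0])
    # b) Fraction of the image occupied by the object.
    area_fraction = float(mask.sum()) / float(h * w)
    return {
        "fully_visible": not touches_border,
        "center_offset": center_offset,
        "area_fraction": area_fraction,
    }

# Hypothetical example: a 20 x 100 pixel object centered in a 100 x 200 image.
example_mask = np.zeros((100, 200), dtype=bool)
example_mask[40:60, 50:150] = True
metrics = framing_metrics(example_mask)
```

An image-based controller could, for example, drive `center_offset` towards zero and `area_fraction` towards the desired constant portion of the image.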

The orientation of the object 10 in the image, on the other hand, can be chosen freely and represents a degree of freedom. Without loss of generality, it can be required, for example, that a (long) axis of the object 10 is parallel to the x-axis of the image coordinate system.

An exemplary optimal position and orientation of the camera 12 relative to the object 10 is shown in Fig. 3A. The image captured by the camera in this position is shown in Fig. 3B.

As described above, the optimal position and orientation of the camera 12 relative to the object 10 can be determined by means of an image-based controller. Alternatively or additionally, the position and/or orientation of the camera can be determined manually (either by moving the camera or by suitably positioning the object in front of a fixed camera) or by a controller based on a neural network (e.g. by means of "reinforcement learning").

Determination of the trajectory of the camera

The trajectory can be determined from at least one image of the object captured by the camera. The image on the basis of which the trajectory is determined can be an image captured from the above-described optimal position and orientation of the camera relative to the object (see e.g. Fig. 3B).

The object can have at least one (partial) surface to be inspected. In addition to the at least one (partial) surface to be inspected, the object can have further surfaces that do not need to be inspected.

The trajectory of the camera is determined such that an inspection of the (partial) surface or (partial) surfaces of an object is made possible. The orientation of the camera during its movement along the trajectory can be the optimal orientation described above.

In one example, the object is essentially rotationally symmetrical. Exemplary objects include cylinders, cones and composites thereof. Deviations from the rotationally symmetrical shape (e.g. from the cylindrical or conical shape) are possible. The method can also be applied to (slightly) curved, convex and (approximately) rotationally symmetrical objects, as long as all relevant parts or (partial) surfaces of the object are visible in the majority of, or in essentially all, images of the trajectory. The object can, for example, be curved to a certain degree in an S-shape or in a wave shape.

In the case of an essentially rotationally symmetrical (e.g. cylindrical or conical) object, the surface to be inspected can be the lateral surface of the object or a part of the lateral surface. The lateral surface can be significantly larger than the base surfaces. In this example, the two base surfaces are surfaces that are not inspected. The base surfaces do not have to be identical.

The object can have a (e.g. long) axis (axis of rotation) which, for example, runs approximately centrally through the object. The object-specific trajectory then results as a circle or a circular arc (e.g. the arc of a semicircle) with the axis as the normal through the center point or centroid of the associated circle and with the current camera position as a point on this circle or circular arc. As described below, the axis can be estimated or determined by means of principal component analysis from an image segmented with the aid of the segmentation model.
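The circular trajectory described above can be sketched as follows. This is a minimal illustration, assuming that the axis is already known in 3D as a point and a direction, and that the trajectory parameter p is an angle in radians with p = 0 at the current camera position; the function name `circular_trajectory` and this choice of parametrization are assumptions made for the example.

```python
import numpy as np

def circular_trajectory(axis_point, axis_dir, cam_pos, p):
    """Camera positions on a circle whose normal is the object axis.

    axis_point: any 3D point on the axis
    axis_dir:   direction of the axis (need not be normalized)
    cam_pos:    current camera position, taken as the point p = 0
    p:          angle(s) in radians along the circle
    """
    a = np.asarray(axis_point, dtype=float)
    d = np.asarray(axis_dir, dtype=float)
    d = d / np.linalg.norm(d)
    c0 = np.asarray(cam_pos, dtype=float)
    # Circle center: orthogonal projection of the camera position onto the axis.
    center = a + np.dot(c0 - a, d) * d
    u = c0 - center          # radius vector at p = 0
    v = np.cross(d, u)       # same length as u, orthogonal to u and to the axis
    p = np.atleast_1d(np.asarray(p, dtype=float))[:, None]
    return center + np.cos(p) * u + np.sin(p) * v

# Hypothetical example: axis along x through the origin, camera 0.5 above it.
# A semicircle from one side of the object over the top to the other side.
positions = circular_trajectory(
    [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.5],
    np.linspace(-np.pi / 2, np.pi / 2, 5),
)
```

Every returned position has the same distance to the axis as the current camera position, so the camera-object distance stays constant along this trajectory.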

Determination of the axis

At least one image captured by the camera 12 is segmented with the aid of the segmentation model. From the segmented image, i.e. from the segmentation result, the axis 20 can be determined or estimated by means of principal component analysis.

The segmented image provides a mask 18 with pixels i belonging to the object, having the coordinates $x_i = [x_{1,i}, x_{2,i}]$ (see Fig. 5A). A principal component analysis can be carried out with this pixel set: the center point results from the mean of all coordinates of the pixel set, $\bar{x} = \frac{1}{N} \sum_i x_i$, where N is the number of pixels in the set. Using the center point, a 2x2 covariance matrix $\mathrm{Cov}(x) = E_i\left((x_i - \bar{x})(x_i - \bar{x})^T\right)$ can then be determined. The covariance matrix has two eigenvalues and two eigenvectors. The eigenvector with the larger eigenvalue corresponds to the direction in which the object extends the most. The other eigenvector is orthogonal to the first. The long axis is then estimated as the straight line that passes through the center point $\bar{x}$ and has the direction of the eigenvector with the larger eigenvalue. Both eigenvectors are drawn in the middle segmented image (see Fig. 5B). The image shown in Fig. 5C has been normalized, the normalization comprising a rotation of the image so that the axis 20 is parallel to one of the image axes (e.g. the x-axis). On the basis of the (long) axis determined by means of principal component analysis, the (long) axis of the object in space or in relation to the camera can be determined.
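The principal component analysis on the mask pixels can be sketched as follows. This is a minimal illustration in NumPy, assuming a binary mask in which object pixels are True; the function name `estimate_axis` is hypothetical. The sign of the returned direction vectors is arbitrary, since an eigenvector and its negative describe the same axis.

```python
import numpy as np

def estimate_axis(mask):
    """Estimate center point and long axis of the object from a binary mask."""
    ys, xs = np.nonzero(np.asarray(mask).astype(bool))
    if xs.size == 0:
        raise ValueError("mask contains no object pixels")
    pts = np.stack([xs, ys], axis=1).astype(float)   # rows are x_i = [x_1i, x_2i]
    mean = pts.mean(axis=0)                          # center point (mean of x_i)
    centered = pts - mean
    cov = centered.T @ centered / len(pts)           # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    long_dir = eigvecs[:, 1]                         # direction of largest extent
    short_dir = eigvecs[:, 0]                        # orthogonal to long_dir
    return mean, long_dir, short_dir, eigvals

# Hypothetical example: an axis-aligned 20 x 100 pixel rectangle.
example_mask = np.zeros((100, 200), dtype=bool)
example_mask[40:60, 50:150] = True
center, long_dir, short_dir, eigvals = estimate_axis(example_mask)
```

The angle between `long_dir` and the image x-axis, for instance `np.arctan2(long_dir[1], long_dir[0])`, is the rotation that the normalization has to undo so that the axis becomes parallel to the x-axis.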

Further optional normalization steps are cropping the relevant image region ("bounding box") and scaling the cropped image region by means of a resize operation in order to keep the size of the mask (e.g. its absolute number of pixels) constant (e.g. 4,000 pixels).

Before the principal component analysis, the segmentation result can optionally be improved with the aid of a depth map, since the edges of the depth map are usually more accurate than those of the segmentation. If a depth camera (e.g. Kinect / Zivid) is used, the depth map is supplied automatically by the camera. From the depth map, all contours of objects that stand out from the ground plane are extracted. The edges of the object that has the largest overlap with the object from the segmentation are then refined using the depth map edges.

The ground plane can be estimated as follows:

The goal is to estimate one slope along the x-axis and one along the y-axis of the image. For each row of the image, a straight line is first estimated using a least-squares estimator. Outliers are removed with RANSAC. Errors in the depth map and/or the objects lying on the plate can, for example, be regarded as outliers. It is implicitly assumed that a large part of the base plate is visible. The mean of the slopes of the estimated lines is the slope m_x in the x-direction. Using the y-intercepts of the estimated lines and RANSAC, a straight line g_y in the y-direction is estimated. The ground plane is determined by the y-intercept of g_y, the slope of g_y and m_x.
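The ground plane estimation described above can be sketched as follows. This is a simplified illustration, assuming a dense depth map given as a 2D NumPy array without missing values; the function names, the inlier tolerance `tol` and the number of RANSAC iterations are assumptions chosen for the example and are not values from the description.

```python
import numpy as np

def ransac_line(t, z, n_iter=100, tol=0.01, rng=None):
    """Fit z = m * t + b: RANSAC inlier selection, then a least-squares refit.

    The values in t must be pairwise distinct (e.g. pixel indices).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    best = np.zeros(len(t), dtype=bool)
    for _ in range(n_iter):
        i, j = rng.choice(len(t), size=2, replace=False)
        m = (z[j] - z[i]) / (t[j] - t[i])
        b = z[i] - m * t[i]
        inliers = np.abs(z - (m * t + b)) < tol
        if inliers.sum() > best.sum():
            best = inliers
    m, b = np.polyfit(t[best], z[best], 1)   # least squares on the inliers only
    return m, b

def estimate_ground_plane(depth, tol=0.01, rng=None):
    """Return (m_x, m_y, b) of the plane z = m_x * x + m_y * y + b."""
    rng = np.random.default_rng(0) if rng is None else rng
    depth = np.asarray(depth, dtype=float)
    h, w = depth.shape
    x = np.arange(w, dtype=float)
    y = np.arange(h, dtype=float)
    # One robust line per image row: slope and intercept at x = 0.
    fits = np.array([ransac_line(x, depth[r], tol=tol, rng=rng) for r in range(h)])
    m_x = fits[:, 0].mean()                  # mean slope in x-direction
    # Line g_y through the row intercepts gives the slope in y and the offset.
    m_y, b = ransac_line(y, fits[:, 1], tol=tol, rng=rng)
    return m_x, m_y, b

# Hypothetical example: tilted plate with one object raised above it.
yy, xx = np.mgrid[0:40, 0:60]
example_depth = 0.002 * xx + 0.001 * yy + 1.0
example_depth[15:25, 20:40] -= 0.2           # object closer to the camera
m_x, m_y, b = estimate_ground_plane(example_depth)
```

Pixels whose depth deviates from the fitted plane by more than a threshold can then be treated as belonging to objects that stand out from the ground plane.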

The above-described edge refinement of the segmented image can also be carried out before the further image transformation operations (such as extracting the at least one (partial) surface to be inspected).

Movement of the camera along the trajectory

The camera is moved around the object on the calculated trajectory and, in doing so, captures a temporal sequence of images of the object, which are further processed as described. The images can be captured continuously, e.g. in the form of a video. It is not necessary for the camera to move to certain predetermined points along the trajectory and to remain there in order to capture images.

The trajectory can be kept essentially constant. If, for example, the distance between the object and the camera is kept essentially constant, the camera has to be moved one step along the circle around the object in order to reach the next point of the trajectory.

It is possible to change the trajectory dynamically. In accordance with the above-described optimal orientation of the camera relative to the object, there are three degrees of freedom: the rotation of the camera (which changes only the orientation of the object in the image, but not the trajectory of the camera), the elevation ϑ between the Y and Z axes (which essentially determines the trajectory, see e.g. Figs. 2B and 3A) and the camera-object distance. The trajectory can be changed by means of the latter. A change of the trajectory can be useful if boundary conditions of a robot arm or of another camera movement device are to be observed. For example, the trajectory can be changed if a certain region or certain regions in the range of motion of the robot arm are to be avoided because of singularities or imminent collisions.

Transformation of the captured images

The images captured by the camera are optionally normalized and then unrolled. When the images are unrolled, the y-axis of the image, for example, is transformed or remapped in order to obtain an estimate of the planar (partial) surface of the object, a section of which is contained in the original image.

In one example, both the images of the object captured by the camera (and normalized where applicable) and the corresponding segmentation images are unrolled. For further processing (such as fusing), however, the unrolled camera images are used. The corresponding unrolled segmented images serve as a plausibility check and can also be omitted. Furthermore, it is not necessary to unroll the entire image. It is sufficient to unroll only those pixels of the image that belong to the object, as in the following example.

The assumption underlying the unrolling of an image is that the image shows a part of the (partial) surface of the object (e.g. of the lateral surface of an essentially rotationally symmetrical object), and that this part depends on the distance between the object and the camera as well as on the distance or radius r from the axis to the edge of the object in the image.

To unroll an image, the distance r(x) (e.g. in pixels) is first estimated in the image. The distance r(x) denotes the distance from the axis of the object to the edge of the object in the image, where r1(x) denotes the measured distance "upwards" (i.e. for y > 0), r2(x) the measured distance "downwards" (i.e. for y < 0), "x" an image coordinate and "y" the image coordinate orthogonal to the x-coordinate. The distance r(x), or the distances r1(x) and r2(x), are in general a function of the x-coordinate in the image. In other words, the distance between the axis and the edges of the object in the image is in general not constant with respect to the position along the axis (x-coordinate) or with respect to the direction in the image ("downwards" / "upwards").

Fig. 6 shows a schematic image of an exemplary object 10 with a long axis 20 and the distances r1(x_i) and r2(x_i), where x_i denotes the i-th pixel along the x-axis.

The maximum size of the (partial) surface (e.g. lateral surface) of the object visible in the image can be up to 50% of the entire (partial) surface (e.g. of the lateral surface), i.e. up to 180°. Using the estimated distances r1 and r2 (e.g. in pixels) and the distance from the camera to the object, it can be calculated which portion of the (partial) surface of the object (e.g. of the lateral surface of the object) is visible in the image.

In one example, it is assumed that the distance is so large that the portion of the (partial) surface to be inspected (e.g. of the lateral surface) visible in the image corresponds to φ° (φ°/2 upwards and φ°/2 downwards). The y-coordinate of each column in the image (i.e. for a fixed x) is warped as follows:

$\vartheta(x) = \arcsin\left(\frac{y(x)}{r_1(x)} \cdot \sin\left(\frac{\varphi^\circ}{2}\right)\right)$ for $y > 0$

and

$y'(x) = \arcsin\left(\frac{y(x)}{r_2(x)} \cdot \sin\left(\frac{\varphi^\circ}{2}\right)\right)$ for $y < 0$,

where ϑ and y' denote the new coordinate for y > 0 and y < 0, respectively.

For example, φ° can be 178° (89° upwards and 89° downwards). In this case:

$\vartheta(x) = \arcsin\left(\frac{y(x)}{r_1(x)} \cdot \sin(89^\circ)\right)$ for $y > 0$

and

$y'(x) = \arcsin\left(\frac{y(x)}{r_2(x)} \cdot \sin(89^\circ)\right)$ for $y < 0$.

With the aid of the above transformation, the images captured by the camera (and normalized where applicable) and optionally the masks or segmented images can be transformed in order to obtain correspondingly unrolled images.
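The column-wise unrolling can be sketched as follows. This is a minimal illustration, assuming a normalized image whose object axis lies on the row `axis_row`, positive distances r1(x) and r2(x) in pixels, and nearest-neighbor sampling. Instead of evaluating the arcsine for every input pixel, the sketch samples the output on a uniform angle grid and uses the inverse relation y = r · sin(angle) / sin(φ°/2). The function names, the output height `out_rows` and this inverse-mapping strategy are assumptions made for the example.

```python
import numpy as np

def unrolled_angle(y, r, phi_deg=178.0):
    """Forward mapping: angle in degrees for a signed offset y from the axis."""
    s = np.sin(np.radians(phi_deg) / 2.0)
    return np.degrees(np.arcsin(np.clip(y / r * s, -1.0, 1.0)))

def unroll(image, r1, r2, axis_row, phi_deg=178.0, out_rows=181):
    """Unroll an image column by column onto a uniform angle grid.

    r1[x], r2[x]: distance from the axis to the upper / lower object edge.
    Output row k corresponds to an angle between +phi/2 (top) and -phi/2.
    """
    h, w = image.shape[:2]
    half = np.radians(phi_deg) / 2.0
    theta = np.linspace(half, -half, out_rows)
    out = np.zeros((out_rows, w) + image.shape[2:], dtype=image.dtype)
    for x in range(w):
        r = np.where(theta >= 0, r1[x], r2[x])
        # Inverse of angle = arcsin(y / r * sin(phi/2)).
        y = r * np.sin(theta) / np.sin(half)
        rows = np.rint(axis_row - y).astype(int)   # image rows grow downwards
        valid = (rows >= 0) & (rows < h) & (r > 0)
        out[valid, x] = image[rows[valid], x]
    return out

# Hypothetical example: each pixel stores its own row index.
example_image = np.tile(np.arange(21)[:, None], (1, 3))
radii = np.full(3, 10.0)
unrolled = unroll(example_image, radii, radii, axis_row=10, out_rows=5)
```

In the unrolled image, equal row steps correspond to equal angle steps on the lateral surface, so pixels near the object edge, which are compressed in the original image, are stretched accordingly.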

It is possible to generalize the above transformation. For instance, a separate transformation can be calculated for each pixel or point. If, for example, a depth map is available and the surface normal can be estimated, a transformation onto a plane can be calculated for each pixel or for each point of the point cloud.

Preferably, the images captured by the camera and/or the segmented images are preprocessed before unrolling. The preprocessing comprises normalizing the respective image and optionally refining the edges (for example on the basis of a depth map, as described above). Normalizing the image can comprise one or more of the following operations: normalizing the orientation of the object in the image by rotation; normalizing the position of the object to be examined in the image (e.g. by a "crop" operation); and normalizing the size of the image or of the image crop (e.g. by a "resize" operation).

Figs. 7A to 9B show the results of an exemplary image transformation on the basis of images of an armature as the object.

Fig. 7A shows an image of the armature to be inspected, captured by the camera. Fig. 7B shows the image of the armature segmented with the aid of the segmentation model. These images are the input to the normalization, i.e. they are subjected to a normalization operation.

Figs. 8A to 8C show the results of the normalization operations (rotation, cropping, scaling), in other words the output of the normalization. Fig. 8A shows the normalized camera image of the armature. Fig. 8B shows the normalized segmented image of the armature. Fig. 8C schematically shows the determination of the axis and of the distances r1 and r2 on the basis of the segmented image of the armature shown in Fig. 8B.

When normalizing an image, one or more of the following operations can be performed:

- Estimating the orientation of the armature and rotating the armature so that its axis is parallel to the x-axis of the image. This can be done as described above;
- Estimating a bounding box of the object, for example using the segmentation, and cropping the image, where the cropped image corresponds to the image section defined by the estimated bounding box. The coordinates of the bounding box or of the image section can be determined as follows:
- ◯ y_max = northernmost point of the segmentation mask for the object,
- ◯ y_min = southernmost point of the segmentation mask for the object,
- ◯ x_max = easternmost point of the segmentation mask for the object,
- ◯ x_min = westernmost point of the segmentation mask for the object;
- Normalizing the size ("resize") of the image so that the image section has a specified size or contains a predetermined number of pixels (e.g. 4,000 pixels).
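The cropping and resizing steps listed above can be sketched as follows. This is only a minimal illustration, assuming the image has already been rotated, that the mask is a boolean NumPy array, and that nearest-neighbour resampling is acceptable; the function names `bounding_box`, `crop` and `resize_to_pixel_count` are hypothetical and do not come from the patent.

```python
import numpy as np

def bounding_box(mask):
    """Return (x_min, x_max, y_min, y_max) of the True pixels in the mask.

    Array rows grow downward, so y_min is the topmost row index here.
    """
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(xs.max()), int(ys.min()), int(ys.max())

def crop(image, box):
    """Cut out the image section defined by the bounding box (inclusive)."""
    x_min, x_max, y_min, y_max = box
    return image[y_min:y_max + 1, x_min:x_max + 1]

def resize_to_pixel_count(image, target_pixels):
    """Nearest-neighbour resize so that the result has roughly target_pixels pixels."""
    h, w = image.shape[:2]
    scale = np.sqrt(target_pixels / (h * w))   # same factor in x and y
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows][:, cols]
```

Applying the same box and the same scale factor to the camera image and to its segmentation mask keeps both in register, which the later steps rely on.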

Subsequently, as described above, the axis of the object and the distances r1 and r2 can be estimated or determined from the normalized images. For example, for each column of the cropped image, the pixels within a segmentation mask for the object whose y-coordinate is smaller (r1) or larger (r2) than the position of the armature axis can be counted. In the example shown in FIG. 8C, the axis lies approximately at y=100.
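The per-column counting described above might look like the following sketch. It assumes a boolean mask whose rows correspond to the y-coordinate and a known axis row; the name `column_distances` is hypothetical.

```python
import numpy as np

def column_distances(mask, axis_row):
    """Count, for every column, the mask pixels on either side of the axis.

    r1[x] counts mask pixels with row index smaller than axis_row,
    r2[x] counts mask pixels with row index larger than axis_row.
    """
    mask = np.asarray(mask, dtype=bool)
    rows = np.arange(mask.shape[0])[:, None]       # column vector of row indices
    r1 = np.sum(mask & (rows < axis_row), axis=0)
    r2 = np.sum(mask & (rows > axis_row), axis=0)
    return r1, r2
```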

Based on the determined distances r1 and r2, the normalized camera image and optionally the normalized segmented image can be unrolled as described above.

As described above, unrolling comprises remapping the y-coordinate in the image. The curvature of the armature surface is compensated sinusoidally in order to obtain an estimate of the planar lateral surface of the armature, a section of which is visible in the original image.

The starting point can be the normalized segmentation result, i.e. the axis of the object is parallel to the x-axis of the image (see FIG. 8C).

Then r1 and r2 are the distances in pixels between the axis of the object and the point where the perpendicular to the axis intersects the edge (see FIG. 8C for r1). Using the radius and the distance to the object, the portion of the lateral surface that is visible can be calculated.

As an example, it can be assumed that the distance is large enough for the visible portion of the lateral surface to correspond to 178° (89° upward and 89° downward). The y-coordinate of each column (i.e. for a fixed x) is then remapped as follows: ϑ(x) = arcsin(y(x) / r1(x) · sin(89°)) for y > 0 and ϑ(x) = arcsin(y(x) / r2(x) · sin(89°)) for y < 0, where y is measured from the axis and ϑ is the new coordinate of the unrolled image.
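The remapping formula can be illustrated with a short sketch. It assumes that y is measured from the axis, that r1 and r2 are positive, and that the result is wanted in degrees; the function name `unrolled_angle` and the clipping of the arcsin argument are additions for illustration only.

```python
import numpy as np

def unrolled_angle(y, r1, r2, half_angle_deg=89.0):
    """Map axis-relative y-coordinates of one column to the unrolled coordinate.

    Uses r1 for y >= 0 and r2 for y < 0, following
    theta = arcsin(y / r * sin(half_angle)).
    """
    y = np.asarray(y, dtype=float)
    r = np.where(y >= 0, r1, r2)
    arg = y / r * np.sin(np.radians(half_angle_deg))
    arg = np.clip(arg, -1.0, 1.0)     # guard against pixels slightly outside the mask
    return np.degrees(np.arcsin(arg))
```

With r1 = 50 and r2 = 40, a pixel on the upper edge (y = 50) maps to 89°, a pixel on the axis maps to 0°, and a pixel on the lower edge (y = -40) maps to -89°.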

FIGS. 9A and 9B show the result of unrolling the images shown in FIGS. 8A and 8B. FIG. 9A shows a section (the middle part) of the unrolled normalized original image of the object, and FIG. 9B shows a section (the middle part) of the unrolled normalized segmented image of the object. These images are used for the subsequent registration. The middle part is used for registration because, at the edge of the respective image, the distortions caused by small segmentation errors can no longer be neglected.

Image registration

Image registration is a process in digital image processing that brings several images of the same scene or of the same object into the best possible alignment. In principle, any registration method can be used.

As described above, the object can be extracted from the high-resolution image and registered with the aid of the segmentation model and, optionally, the pose parameters determined from camera images. The objects extracted and registered from the images of a temporal image sequence (e.g. a video sequence) are then fused and transferred into a coordinate system spanned by the parameter p and the image coordinate orthogonal to the trajectory. A displacement of the camera position by "Op" thus leads to a translation along the ordinate axis in the new coordinate system.

FIG. 10 schematically shows an exemplary transformation of a chronological sequence of images of an object (an armature) recorded by the camera into a two-dimensional image which has the angle ϑ and the image coordinate orthogonal to the trajectory as location coordinates. The chronological sequence of images is recorded by a movable camera which moves along the trajectory shown in FIG. 2B with the angle ϑ as parameter. FIG. 10A schematically shows one of the recorded images and the distance r(x) (for y < 0). FIG. 10B shows the result of the transformation.
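How extracted strips could be placed in such a two-dimensional representation is sketched below. The sketch assumes that the parameter p of every frame is known exactly (so no registration is needed), that p runs along the rows (the ordinate), and that all strips have the same width; `fuse_strips` and `rows_per_unit` are hypothetical names. Where strips overlap, the row closer to the center of its own strip is kept, since the distortion is smallest there.

```python
import numpy as np

def fuse_strips(strips, p_values, rows_per_unit=1.0):
    """Place unrolled strips on a common canvas whose row index encodes p."""
    heights = [s.shape[0] for s in strips]
    # Top row of each strip so that its center sits at p * rows_per_unit.
    tops = [int(round(p * rows_per_unit)) - h // 2
            for p, h in zip(p_values, heights)]
    shift = min(tops)
    tops = [t - shift for t in tops]
    n_rows = max(t + h for t, h in zip(tops, heights))
    canvas = np.zeros((n_rows, strips[0].shape[1]), dtype=strips[0].dtype)
    best = np.full(n_rows, np.inf)     # distance to strip center of the current content
    for strip, top, h in zip(strips, tops, heights):
        dist = np.abs(np.arange(h) - (h - 1) / 2.0)
        rows = top + np.arange(h)
        closer = dist < best[rows]     # keep rows nearer to their strip center
        canvas[rows[closer]] = strip[closer]
        best[rows[closer]] = dist[closer]
    return canvas
```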

Two exemplary registration methods are described below.

First exemplary registration method

Transformed (e.g. normalized and unrolled) camera images are used for the registration (see e.g. FIG. 9A). First, features are computed at specific points ("points of interest") in each transformed image. The points and the features can be determined using known methods, e.g. ORB (see e.g. "ORB: an efficient alternative to SIFT or SURF", Proceedings of the IEEE International Conference on Computer Vision, doi:10.1109/ICCV.2011.6126544) or SIFT (see e.g. D. G. Lowe, "Distinctive image features from scale-invariant keypoints", International Journal of Computer Vision, 60(2):91-110, 2004). The points of each pair of consecutive images are then matched ("feature matching"). This can be done by finding the pairs of points with the lowest Euclidean distance. From the pairs of points found, a Euclidean transformation matrix (for example an affine transformation without shearing) between the two consecutive images can be calculated. The two images are then assembled, i.e. fused, by bringing one image into alignment with the other using the transformation matrix. In the region of overlap, the part of the respective image that is closer to the center of that image is used, i.e. where |p| (e.g. |ϑ|) is lowest, since the distortion errors of the transformation are smallest there.
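The matching and transformation steps can be sketched without a feature library. The sketch assumes that keypoint coordinates and real-valued descriptors have already been computed (for instance by ORB or SIFT), matches each descriptor to its nearest neighbour in Euclidean distance, and estimates rotation and translation by a least-squares fit (the Kabsch method, which is one possible choice and is not named in the patent). The function names are hypothetical.

```python
import numpy as np

def match_descriptors(desc_a, desc_b):
    """For each descriptor in desc_a, return the index of the nearest one in desc_b."""
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.linalg.norm(diff, axis=2)        # pairwise Euclidean distances
    return np.argmin(dist, axis=1)

def estimate_euclidean_transform(pts_a, pts_b):
    """Least-squares rotation R and translation t with pts_b ~ pts_a @ R.T + t."""
    ca, cb = pts_a.mean(axis=0), pts_b.mean(axis=0)
    h = (pts_a - ca).T @ (pts_b - cb)          # 2x2 cross-covariance
    u, _, vt = np.linalg.svd(h)
    # Force a proper rotation (no reflection, no shear, no scaling).
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    r = vt.T @ np.diag([1.0, sign]) @ u.T
    t = cb - r @ ca
    return r, t
```

In practice a robust estimator (e.g. outlier rejection before the fit) would usually be added, because nearest-neighbour matching alone can return false pairs.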

Second exemplary registration method

In the second exemplary registration method, the segmentation result can be improved during registration by estimating the ground plane and/or by using a semantic object model (such as the semantic object model described in publications [8] and [9]). As described above, the segmentation result is used to normalize the size and orientation of the object so that the object has the same size and orientation in every image. A transformation (e.g. unrolling) then takes place. With this pre-processing, the registration is reduced to a search for the best translation. Template matching can be used for this purpose. For each (reasonable) shift of the second image in the x and y directions, a correlation with the unshifted first image is calculated. The best translation is given by the position of the maximum of the correlation function. The images are then fused as described in connection with the first method.
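The translation search can be sketched as an exhaustive loop over shifts. The sketch assumes two grayscale images of equal shape, uses the normalized cross-correlation of the overlapping region as the correlation measure (one common choice; the patent does not fix the measure), and limits the search to a hypothetical `max_shift` in pixels.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def best_translation(reference, moving, max_shift):
    """Return ((dy, dx), score) maximizing the correlation.

    A shift (dy, dx) means moving[y, x] is compared with reference[y + dy, x + dx].
    """
    h, w = reference.shape
    best_shift, best_score = (0, 0), -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # Overlap of the shifted image with the reference, in reference coordinates.
            y0, y1 = max(0, dy), min(h, h + dy)
            x0, x1 = max(0, dx), min(w, w + dx)
            ref = reference[y0:y1, x0:x1]
            mov = moving[y0 - dy:y1 - dy, x0 - dx:x1 - dx]
            score = ncc(ref, mov)
            if score > best_score:
                best_shift, best_score = (dy, dx), score
    return best_shift, best_score
```

The returned score can serve as the quantity that is compared against a threshold before an image is fused into the model.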

FIG. 11 shows an exemplary method for registering images from the chronological sequence of images by means of template matching, using a model M1. The model corresponds to the transformed camera images that have already been assembled. If the current camera image is the first one, it is used as the initial model for the following camera images. In a first step S1, a new camera image is acquired as a new input. The camera image is transformed in a second step S2. For example, the camera image can be normalized and unrolled as described above. In a third step S3, template matching is carried out between the transformed image and the model M1. In a fourth step S4, it is decided whether the result of the template matching is above a predetermined threshold value "t". If the result is above the threshold value "t", the "yes" path is taken and the model M1 is fused with the current input (step S5), i.e. the model is extended by part of the current input (M2). The model M2 updated in this way is used as the input model for the next input image. If the result of the template matching is below the predetermined threshold value "t", the "no" path is taken and a new input is awaited (step S6).
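The control flow of steps S1 to S6 can be sketched as a small loop. The callables `transform`, `match` and `fuse` are placeholders that a caller would have to supply; their names and signatures are assumptions made for this illustration, not part of the patent.

```python
def register_sequence(frames, transform, match, fuse, t):
    """Grow a model from a sequence of frames, following steps S1 to S6.

    transform(frame)            -> transformed image (e.g. normalized and unrolled)
    match(model, image)         -> (score, offset) from template matching
    fuse(model, image, offset)  -> extended model
    """
    model = None
    for frame in frames:                         # S1: acquire a new input
        image = transform(frame)                 # S2: transform the camera image
        if model is None:
            model = image                        # first image is the initial model
            continue
        score, offset = match(model, image)      # S3: template matching
        if score > t:                            # S4: compare with threshold t
            model = fuse(model, image, offset)   # S5: M1 becomes M2
        # otherwise S6: skip this input and wait for the next one
    return model


# Toy usage with 1-D "images" given as lists of integers.
def toy_match(model, image):
    """Score 1.0 if a suffix of the model equals a prefix of the image."""
    for off in range(len(model)):
        tail = model[off:]
        if len(tail) <= len(image) and image[:len(tail)] == tail:
            return 1.0, off
    return 0.0, 0

def toy_fuse(model, image, off):
    return model[:off] + image
```

In the toy usage, a frame that does not overlap with the model scores 0.0 and is skipped, which mirrors the "no" path of step S4.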

Whereas the first registration method described above matches successive unrolled images using ORB features in order to find corresponding regions, the second method uses template matching to extend the model of the lateral surface.

A disadvantage of ORB features for some objects (such as armatures) can be that the surface texture is approximately periodic, so that a relatively large number of points may be matched incorrectly. Template matching works better here owing to small imperfections on the surface and because individual elements (such as the copper wires of an armature) have a stronger influence.

Defect detection

Various methods known per se can be used for defect detection. For example, the following methods can be used:

1. Novelty detection with CNNs (convolutional neural networks) or another suitable neural network: Here, an autoencoder is trained with non-annotated images. Since defects usually occur only very rarely and their appearance can vary, the autoencoder cannot represent the defects in its feature space. In one example, the autoencoder can therefore reconstruct only the defect-free regions. The difference image between the original image and the reconstructed image indicates whether and where there are defects in the image. For a binary classification, a thresholding method with regard to the size and/or amplitude of the defects can be defined.
2. Spectral analysis: Defects in periodic textures (in the case of an armature, this applies to the commutator) can be detected in the spectrogram of the image. For this purpose, the texture is extracted using the segmentation result (e.g. by windowing) and Fourier-transformed. The texture can then be identified by peaks in the spectrogram. Spectral components with a smaller amplitude are disturbances. Defects can be detected by suitable pre-processing of the images (comprising, for example, removal of image noise and/or of fluctuations in the illumination) and a suitable thresholding method with regard to the size and/or amplitude of the defects. Defects that are characterized by a preferred direction in the image (e.g. grinding marks) can also be detected with this procedure.
3. Deflectometry or fringe projection for reflective surfaces: By projecting a specific pattern onto the workpiece, both errors in the geometry of the workpiece (bumps, dents) and contamination or defects that cause poor reflection properties can be inferred if the reflection does not match the expected reflection.
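The spectral analysis in item 2 can be illustrated with a minimal sketch. It assumes a grayscale texture patch as a NumPy array, treats every spectral component whose magnitude exceeds a fraction of the largest magnitude as part of the periodic texture, and thresholds what remains; the parameter names and default values are arbitrary choices for this illustration.

```python
import numpy as np

def spectral_residual(texture, peak_fraction=0.5):
    """Remove the dominant spectral peaks and transform back to the image domain."""
    spectrum = np.fft.fft2(texture)
    magnitude = np.abs(spectrum)
    peaks = magnitude > peak_fraction * magnitude.max()   # the periodic texture
    spectrum[peaks] = 0
    return np.real(np.fft.ifft2(spectrum))

def detect_defects(texture, peak_fraction=0.5, amplitude_threshold=0.25):
    """Boolean map of pixels whose residual amplitude exceeds the threshold."""
    residual = spectral_residual(texture, peak_fraction)
    return np.abs(residual) > amplitude_threshold
```

On real images, the pre-processing mentioned in the text (noise removal, illumination correction) and a size criterion on the detected regions would normally precede and follow this step.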

The above defect detection methods are merely exemplary. Other suitable defect detection methods can also be used.

List of reference signs

10: object
12: camera
14: trajectory
16: robot arm
18: segmentation mask
20: axis
r1, r2: distances from the axis to an edge of the object in the image
M1-M2: input model and updated model
T: template
S1-S6: method steps

Claims

1. Device for the optical surface inspection of objects, comprising: a movable camera (12) which can be moved around an object (10) to be inspected on a trajectory (14) that can be parameterized with a parameter p, the camera (12) being set up to record, during its movement, a chronological sequence of images of the object (10) to be inspected from different positions on the trajectory (14); a computing device comprising: a model provision unit which is set up to provide a statistical segmentation model, the segmentation model having been trained using a plurality of training images of the object (10) in different states, the segmentation model taking an image as input and delivering as a result a classification of the pixels in the input image into at least one class, the at least one class corresponding to a (partial) surface of the object (10) to be inspected; an image segmentation unit that is set up to segment an image recorded by the camera (12) using the segmentation model; a trajectory determination unit that is set up to determine the trajectory (14) using at least one image of the object (10) recorded by the camera (12) and segmented by the image segmentation unit; an image transformation unit that is set up to extract at least one (partial) surface of the object (10) to be inspected from each image in the chronological sequence of images recorded by the camera (12) as it moves along the trajectory (14), on the basis of the segmentation result for the image, and to fuse the extracted (partial) surfaces in a two-dimensional representation, the two-dimensional representation having the parameter p and an image coordinate orthogonal to the trajectory (14) as location coordinates; and a defect detection unit that is set up to carry out defect detection on the fused two-dimensional representation.

2. Device according to claim 1, wherein the trajectory determination unit is set up to perform a principal component analysis of the segmented image, to determine an axis of the object (10) based on the result of the principal component analysis, and to determine the trajectory (14) in such a way that the points lying on the trajectory (14) have a predetermined or predeterminable distance from the axis.

3. Device according to claim 1 or 2, wherein the computing device is further set up to provide a depth map of the object (10); to determine from the depth map the depth edges of all objects rising from a ground plane; and to improve, by means of the determined depth edges, the edges of the object (10) that has the greatest overlap with the object (10) from the segmentation.

4. Device according to one of the preceding claims, wherein the computing device further comprises a camera orientation unit which is set up to determine an orientation of the camera (12) with respect to the object (10) using at least one image of the object (10).

5. Device according to one of the preceding claims, wherein the image transformation unit is set up to carry out a transformation of each of the images recorded by the camera (12) and of the corresponding segmented images before the extraction, the transformation comprising: unrolling the respective image, comprising projecting the at least one (partial) surface of the object (10) to be inspected onto a plane; and/or compensating for the perspective distortion of the respective image; and/or normalizing the respective image.

6. Device according to claim 5, wherein the unrolling of the image recorded by the camera (12) and of the corresponding segmented image comprises: determining an axis of the object (10) from the segmented image using principal component analysis; determining at least one distance from the axis to an edge of the object (10) in the segmented image; determining a coordinate transformation based on the at least one distance; and transforming the image recorded by the camera (12) and the corresponding segmented image using the coordinate transformation.

7. Device according to one of the preceding claims, wherein the fusing of the extracted images in the two-dimensional representation comprises, for each pair of two consecutive images recorded by the camera (12) and optionally transformed: determining a plurality of features at specific points in each of the images; finding pairs of points with matching features in the two consecutive images; determining a transformation matrix between the consecutive images; and fusing the consecutive images by bringing one of the images into alignment with the other image using the transformation matrix, wherein, in the region of overlap, the part of the respective image that is closer to the center of that image is used.

8. Device according to one of claims 1 to 6, wherein the fusing of the extracted images in the two-dimensional representation comprises, for each pair of two consecutive images recorded by the camera (12) and optionally transformed: finding a translation of one of the images with respect to the other image that maximizes a correlation function between the two images; and fusing the consecutive images by bringing one of the images into alignment with the other image by means of the translation, wherein, in the region of overlap, the part of the respective image that is closer to the center of that image is used.

9. Computer-implemented method for the optical surface inspection of objects, comprising: providing a statistical segmentation model, the segmentation model having been trained using a plurality of training images of an object (10) in different states, the segmentation model taking an image as input and delivering as a result a classification of the pixels in the input image into at least one class, the at least one class corresponding to a (partial) surface of the object (10) to be inspected; determining a trajectory (14), which can be parameterized with a parameter p, around an object to be inspected using at least one image of the object (10) recorded by a camera (12) and segmented using the segmentation model; recording a chronological sequence of images of the object (10) to be inspected with the camera (12), the camera moving on the trajectory (14) around the object (10) and recording, during its movement, images of the object (10) from different positions on the trajectory (14); segmenting the images of the chronological sequence of images using the segmentation model; extracting at least one (partial) surface of the object (10) to be inspected from each of the images recorded by the camera (12) using the segmentation result for the respective image; fusing the extracted (partial) surfaces in a two-dimensional representation, the two-dimensional representation having the parameter p and an image coordinate orthogonal to the trajectory (14) as location coordinates; and performing defect detection on the fused two-dimensional representation.

10. Method according to claim 9, wherein determining the trajectory (14) comprises: determining an axis of the object (10) based on a principal component analysis of the segmented image, and determining the trajectory (14) such that the points on the trajectory (14) have a predetermined or predeterminable distance from the axis.

11. Method according to claim 9 or 10, further comprising: providing a depth map of the object (10); determining from the depth map the depth edges of all objects rising from a ground plane; and improving, by means of the depth edges, the edges of the object (10) that has the greatest overlap with the object (10) from the segmentation.

12. Method according to one of claims 9 to 11, further comprising determining an orientation of the camera (12) with respect to the object (10) based on at least one image of the object (10) recorded by the camera (12) and segmented using the segmentation model.

13. Method according to one of claims 9 to 12, further comprising transforming each of the images recorded by the camera (12) and the corresponding segmented images before the extraction, the transforming comprising: unrolling the respective image, comprising projecting the at least one (partial) surface of the object (10) to be inspected onto a plane; and/or compensating for the perspective distortion of the respective image; and/or normalizing the respective image.

14. Method according to claim 13, wherein the unrolling of the image recorded by the camera (12) and of the corresponding segmented image comprises: determining an axis of the object (10) using a principal component analysis of the segmented image; determining at least one distance from the axis to an edge of the object (10) in the segmented image; determining a coordinate transformation based on the at least one distance; and transforming the image recorded by the camera (12) and the corresponding segmented image using the coordinate transformation.

15. Method according to one of claims 9 to 14, wherein fusing the extracted images in the two-dimensional representation comprises, for each pair of two consecutive images recorded by the camera (12) and optionally transformed: determining a plurality of features at specific points in each of the images; finding pairs of points with matching features in the two consecutive images; determining a transformation matrix between the consecutive images; and fusing the consecutive images by bringing one of the images into alignment with the other image using the transformation matrix, wherein, in the region of overlap, the part of the respective image that is closer to the center of that image is used.

16. Method according to one of claims 9 to 14, wherein fusing the extracted images in the two-dimensional representation comprises, for each pair of two consecutive images recorded by the camera (12) and optionally transformed: finding a translation of one of the images with respect to the other image that maximizes a correlation function between the two images; and fusing the consecutive images by bringing one of the images into alignment with the other image by means of the translation, wherein, in the region of overlap, the part of the respective image that is closer to the center of that image is used.

17. Computer program product which is set up, when loaded and executed on a computer, to perform a method for the optical surface inspection of objects (10) according to one of claims 9 to 16.