EP4185986A1 - Procédé et système ou dispositif pour reconnaître un objet dans une image électronique - Google Patents

Procédé et système ou dispositif pour reconnaître un objet dans une image électronique

Info

Publication number
EP4185986A1
Authority
EP
European Patent Office
Prior art keywords
image
scene
transformed
binarized
pixels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP21746686.1A
Other languages
German (de)
English (en)
Inventor
Michael Engel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vision Components Gesellschaft fur Bildverarbeitungssysteme mbH
Original Assignee
Vision Components Gesellschaft fur Bildverarbeitungssysteme mbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vision Components Gesellschaft fur Bildverarbeitungssysteme mbH filed Critical Vision Components Gesellschaft fur Bildverarbeitungssysteme mbH
Publication of EP4185986A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/753Transform-based matching, e.g. Hough transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/754Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries involving a deformation of the sample pattern or of the reference pattern; Elastic matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes

Definitions

  • the invention relates to a method and a system or device for machine vision and image recognition for recognizing an object in an electronic image. It belongs to the field of pattern recognition, in which general regularities, repetitions, similarities, or rules are to be identified in a set of data. Typical application areas of pattern recognition are speech recognition, text recognition and face recognition. Pattern recognition is also of central importance for more general areas such as artificial intelligence or data mining.
  • a pattern recognition process can be broken down into several sub-steps, starting with the detection and ending with a determined classification.
  • data or signals are recorded and digitized by means of sensors; for example, a digital image of a scene is recorded using a digital camera.
  • Patterns are obtained from the mostly analog signals, which can be represented mathematically in vectors, so-called feature vectors, and matrices.
  • the signals are pre-processed to reduce data and improve quality.
  • the patterns are then transformed into a feature space during feature extraction.
  • the dimension of the feature space, in which the patterns are now represented as points, is limited to the essential features during feature reduction.
  • the final core step is the classification using a classifier, which assigns the features to different classes.
  • the classification method can be based on a learning process using a sample.
  • a pre-processing usually takes place.
  • the removal or reduction of unwanted or irrelevant signal components does not yet reduce the amount of data to be processed; that reduction only takes place during feature extraction.
  • Possible methods of pre-processing include signal averaging, application of a threshold value and normalization.
  • the desired results of the preprocessing are the reduction of noise and the mapping to a uniform range of values.
  • Methods of feature reduction include the analysis of variance, in which it is checked whether one or more features are separable, and discriminant analysis, in which the smallest possible number of separable non-elementary features is formed by combining elementary features.
  • the last and essential step in pattern recognition is the classification of features into classes. There are various classification methods for this.
  • the invention is directed to a sub-area of pattern recognition and digital image processing, namely image recognition (image analysis).
  • In image recognition, an attempt is made to extract useful information from a digital image using automatic image processing techniques, e.g. by means of a computer, an electronic circuit, a digital camera or a mobile phone.
  • two-dimensional images are recognized in machine vision and three-dimensional images in medicine.
  • the image processing techniques used include the recognition of two- and three-dimensional objects (object recognition) and segmentation. Segmentation is a subfield of digital image processing and machine vision.
  • Segmentation is the generation of content-related regions by combining neighboring pixels (picture elements in a two-dimensional image) or voxels (volume elements in a three-dimensional grid) according to a specific homogeneity criterion. Objects in an image are segmented and assigned a symbolic description. In machine vision, segmentation is usually the first step in image analysis, whose sequence is as follows: scene, image acquisition, image preprocessing, segmentation, feature extraction, classification, statement.
  • Binarization is the preliminary stage of segmentation.
  • the most common binarization method is certainly the threshold value method. It is based on a threshold that is best determined from a histogram.
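The histogram-based threshold determination mentioned here can be illustrated with a small sketch (not part of the patent text); it uses Otsu's method, one common way of deriving the threshold from the gray-value histogram, which the text does not name explicitly:

```python
import numpy as np

def histogram_threshold(img: np.ndarray) -> int:
    """Derive a binarization threshold from the gray-value histogram
    by maximizing the between-class variance (Otsu's method)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    cum_w = np.cumsum(hist)                   # pixels with value <= t
    cum_m = np.cumsum(hist * np.arange(256))  # gray-value sum of those pixels
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0, w1 = cum_w[t], total - cum_w[t]
        if w0 == 0 or w1 == 0:
            continue
        m0 = cum_m[t] / w0                    # mean of the dark class
        m1 = (cum_m[-1] - cum_m[t]) / w1      # mean of the bright class
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Tiny bimodal example: dark pixels around 10, bright pixels around 200.
img = np.array([[10, 12, 11, 200],
                [13, 210, 205, 11]], dtype=np.uint8)
t = histogram_threshold(img)
binary = img > t   # separates the dark from the bright pixels
```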
  • a problem with many segmentation algorithms is their susceptibility to changing lighting within the image. This can result in only one part of the image being segmented correctly, but the segmentation in the other parts of the image is unusable.
  • the invention relates to the field of machine vision or image recognition (also called image understanding). These terms generally describe the computer-aided solution of tasks that are based on the capabilities of the human visual system.
  • Typical tasks of machine vision are object recognition and the measurement of the geometric structure of objects and of movements (external movement, own movement).
  • Image processing algorithms for example segmentation, and pattern recognition processes, for example for classifying objects, are used here.
  • object recognition is also referred to as pattern recognition.
  • Machine-seeing systems and devices are primarily used in industrial manufacturing processes in the areas of automation technology and quality assurance. Other areas of application can be found, for example, in traffic engineering - from simple radar traps to "seeing vehicles" - and in security engineering (access control, automatic detection of dangerous situations).
  • the following tasks are solved, for example: product control through automatic optical inspection, defect detection under surfaces, form and dimension inspection, position detection, surface inspection, object detection, layer thickness measurements, completeness check.
  • Image understanding techniques are used in industrial environments. For example, computers support quality control and measure simple objects. The advantages can be a higher level of quality, analysis of disturbance variables and process improvement, less waste, protection of the supply chain, monitoring of highly dynamic production processes and cost optimization. In the state of the art, it is important for the algorithms to run error-free that the specified environmental conditions are largely complied with (camera position, lighting, speed of the assembly line, position of the objects, etc.).
  • Shims on a conveyor belt are checked for dimensional accuracy, reducing the error rate in the end product by several powers of ten.
  • Welding robots are steered to the correct welding position.
  • Sorting and error detection of parts and workpieces such as bulk goods, circuit boards, photo prints.
  • For such tasks, electronic sensors with integrated image processing are increasingly being used as imaging sensors. Typically, a two-dimensional image of the scene is captured with this type of image processing sensor. It is desirable if recognition can already take place on parts of the complete pattern, because in practice objects are often partially covered.
  • a first known method for object recognition is based on contour-based pattern recognition of objects in an image, for example using the “VISOR® Object” sensor from SensoPart Industriesensorik GmbH with Canny edge detection and a generalized Hough transform.
  • the advantage of this method is its high-resolution, precise part finding (position and orientation), largely independent of the orientation and scaling of the object. Its disadvantages, however, are that it requires a large amount of equipment and computing power and is therefore expensive and slow.
  • NCC: normalized cross-correlation
  • a sensor with background comparison was therefore proposed in EP 3 118 812 A1, in which image reduction, smoothing, gradient calculation and a two-stage search method are used for image processing.
  • An image processing sensor with integrated image processing was also described in EP 3 258 442 A1, in which objects are recognized by comparison with a background image.
  • the solutions proposed there have the advantage that they can be implemented inexpensively and work very quickly.
  • however, this object detection requires a constant background, which limits its practical usability.
  • a method for extracting 3D data is known from US 2015/0003736 A1. This document relates to a method for evaluating stereo images. In such methods, two images of a scene are recorded from different positions.
  • the two images are first epipolar-rectified, after which the images from both cameras can be matched line by line using a correlation method: a pattern is selected in one image and a similar pattern is searched for in the other.
  • a census transform can be used in the correlation method; see e.g. C. Ahlberg et al., "The genetic algorithm census transform: evaluation of census windows of different size and level of sparseness through hardware-in-the-loop training", Journal of Real-Time Image Processing (2021) 18:539-559, published online 6 July 2020.
  • the pattern used to match the two stereo images is not part of an object recorded in the scene; instead, a predetermined, simple pattern is used for this purpose, for example a bright spot with a two-dimensional Gaussian intensity distribution, which is projected onto the scene when the images are recorded, for example by means of a laser.
  • the projected pattern is transformed and stored using the modified census transformation.
  • the two stereo images are also transformed by a modified census transform and the transformed images are compared to the transformed template to locate it therein. Since the projected pattern is relatively small or simple, small windows of 3 x 3, 5 x 5 or 7 x 7 pixels are sufficient for the comparison.
  • the known method is carried out iteratively in order to find the points of the projected pattern successively in different magnification levels of the images.
  • the present invention is based on the object of creating a method and a system or device for machine vision and image recognition for recognizing an object in an electronic image, with which an object in an image of a scene can be recognized with little equipment and computing effort, and therefore quickly and inexpensively.
  • a pattern can also be recognized instead of an object.
  • object and pattern or object and pattern recognition are treated as synonymous.
  • a method according to the invention for machine vision and image recognition for recognizing an object in an electronic image, in which a scene is recorded by means of an optical sensor and an electronic image of the scene is generated, and the electronic image of the scene is checked for the presence of the object by comparing it with a reference image of the object using a correlation method, includes a learning phase in which a reference image of the object to be recognized is transformed and binarized using a modified census transformation, by comparing pixels of the transformed reference image with the mean value formed from these pixels of the transformed reference image and setting the value of a pixel to 1 if it is greater than the mean value and to 0 if it is smaller than the mean value, and the result of this transformation is stored in a transformed, binarized reference vector; it further includes a working phase in which the image of the scene is transformed and binarized in the same way and the resulting transformed, binarized scene vectors are compared with the reference vector.
  • the method according to the invention has the special feature that a modified census transformation, in combination with a maximum determination and a threshold value setting, is carried out as a special, simplified correlation method with which the electronic image of the scene is checked for the presence of the object.
  • This enables a simplified binarized comparison, by means of which the invention can be implemented with significantly less effort and cost compared to the prior art and, moreover, results can be delivered with an extremely fast frame rate, so that object recognition is possible in real time.
  • a method according to the invention is very well suited for implementation in an electronic circuit, e.g. in a Field Programmable Gate Array (FPGA).
  • the algorithm of a method according to the invention for performing a pattern comparison or for recognizing an object in a scene comprises the modified census transformation of a pattern taught in the learning phase (of the object to be recognized) and the recorded image of the scene, combined with a binarization of transformed object and transformed scene, and subsequent binary comparison (exclusive OR) with calculation of the Hamming distance as a measure of the match between the taught pattern and the recorded image.
  • In a census transformation, the surroundings of each pixel of an image window are examined and the pixel is transformed using this neighborhood data. Very often these are the eight pixels around the central pixel of a 3x3 neighborhood. Each of the eight pixels is compared with the center pixel: if the value of the pixel is greater than or equal to the value of the central pixel, the census transformation outputs a binary 1 for that pixel, otherwise a 0. Transformation thus means that the brightness of different pixels in an image window is compared with a central pixel. A binary value is formed for each comparison with the central pixel, and a census vector is formed from all binary values, describing the binarized relative brightness or gray-value distribution in the image window. In a correlation method, the generated census vectors can then be compared with one another in order to find the same object as reliably as possible in the camera image and the reference image.
  • In the modified census transformation, the output for a pixel is a binary 1 if the value of the pixel is greater than or equal to the mean value of all pixels in the area under consideration, otherwise a 0.
  • In the present case, the modified census transformation merely sets the value of a pixel to 1 if it is greater than the mean and to 0 if it is less than the mean.
  • the case where the pixel is equal to the mean can be handled in two alternative ways.
  • the method can be performed such that the value of a pixel is set to 1 if it is greater than or equal to the mean and to 0 if it is less than the mean, or such that the value of a pixel is set to 1 if it is greater than the mean and to 0 if it is less than or equal to the mean.
  • It is therefore possible to use the modified census transformation to describe an n x m gray-value pattern. This is done by comparing each of the n x m pixels with the mean value formed from the n x m pixels and storing the binary result of each comparison in an n x m bit result vector.
  • An important advantage of the method according to the invention is that the result vector is independent of a linear transformation of the input pixels, i.e. it is independent of the contrast and brightness offset of the image data.
  • the modified census transformation results in a vector with only n x m bits. This corresponds to a data reduction by a factor of 8 to 10. Nevertheless, essential properties of the original pattern, which are likewise independent of a linear transformation, are retained during the transformation. To check two patterns for similarity, it is sufficient to compare the bit vectors of the two patterns in binary form after the modified census transformation (via bitwise XOR or EQUIV) and to count the number of matching or differing bits, which is called the Hamming distance.
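As an illustration (not part of the patent text), the modified census transformation and the binary Hamming-distance comparison described above can be sketched in a few lines of Python; the function names are ours:

```python
import numpy as np

def modified_census(window: np.ndarray) -> np.ndarray:
    """Modified census transform: compare every pixel of the window
    with the mean of all window pixels; a pixel maps to 1 if it is
    greater than the mean, otherwise to 0."""
    return (window > window.mean()).astype(np.uint8).ravel()

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits between two binary vectors
    (bitwise XOR followed by counting the set bits)."""
    return int(np.count_nonzero(a ^ b))

# A 3x3 reference pattern and a copy under a linear transformation
# (doubled contrast plus a brightness offset).
ref = np.array([[10, 20, 30],
                [40, 50, 60],
                [70, 80, 90]])
scene = ref * 2 + 100
v_ref = modified_census(ref)
v_scene = modified_census(scene)
# The result vector is unchanged by the linear transformation,
# so the Hamming distance between the two vectors is 0.
```

This directly exhibits the invariance property: scaling the contrast and shifting the brightness leaves every pixel on the same side of the window mean, so the bit vector, and hence the Hamming distance, is unchanged.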
  • the inventive method is particularly simple and can be implemented very easily on logic circuits such as an FPGA.
  • the storage requirement for a transformed, binarized reference vector of a reference image with n ⁇ m bits is also comparatively small.
  • a further advantage of the invention is that objects can also be recognized if they are not completely present in the image of the scene. This often happens in practice, for example when an object lies at the edge of the image or is partially concealed by another object. Since the method according to the invention does not search for a complete match but only for the maximum match, objects can also be recognized from a part of the complete object, i.e. recognition can already take place on parts of the complete pattern. Practice has shown that objects can still be recognized even if up to 10% of them is not visible in the scene image.
  • the reference image of the object to be recognized can be recorded by means of an optical sensor in the learning phase.
  • This can expediently be the same optical sensor that is used to record the scene in the working phase that is being checked for the presence of the object.
  • the optical sensor for recording the reference image or the scene is preferably an image acquisition device that supplies digital greyscale images, for example a CCD or a CMOS sensor, a camera module, a circuit board camera, a housing camera or a digital camera.
  • the optical sensor or the object to be taught is first positioned in such a way that the object to be taught lies in the image of the optical sensor.
  • the image section and the zoom size can then be adjusted so that the object to be taught fills the image well.
  • the object is then selected for teaching and saved as a reference. If several objects are to be taught, for example for later sorting tasks in the work phase, this teaching is carried out separately for each object.
  • the recorded scenes are then compared with the reference object or reference objects.
  • the reference image of the object to be recognized can be calculated theoretically in the learning phase, or the reference image of the object to be recognized or the transformed, binarized reference vector can be read from a database. If the reference image is not read in using the optical sensor but is instead calculated theoretically from properties of the object, for example its shape or contour, or if the reference image or the transformed, binarized reference vector is provided from a database, for example from an earlier acquisition with an optical sensor or an earlier theoretical calculation, the method according to the invention can be switched very quickly and easily between different detection tasks (e.g. the presence or the position of a changed object to be detected) without an image of the object having to be recorded with the optical sensor.
  • a further advantageous embodiment can consist in transforming and binarizing a reference image of a plurality of objects to be recognized in the learning phase using a modified census transformation and storing the results of these transformations in transformed, binarized reference vectors, and in the working phase the results of the scene transformation as transformed, binarized scene vectors are sequentially compared to the transformed, binarized reference vectors to detect the multiple objects to be recognized in the scene.
  • a further advantage of the method according to the invention is that in this way several objects to be recognized can be recognized very easily and the associated patterns can be compared.
  • the reference images are recorded in the learning phase using an optical sensor and the transformed, binarized reference vectors are formed therefrom and stored, or the reference images of the object to be recognized are calculated theoretically, or the reference images of the objects to be recognized or the transformed, binarized reference vectors are read from a database.
  • An advantageous modification of this embodiment can consist in the fact that in the learning phase reference images of a plurality of objects to be recognized are transformed and binarized by means of a modified census transformation and the results of these transformations are stored in transformed, binarized reference vectors, and during the working phase the results of the scene transformation, as transformed, binarized scene vectors, are compared in parallel with the transformed, binarized reference vectors in order to recognize the multiple objects in the scene simultaneously.
  • a further advantage of the method according to the invention is that in this way a number of objects to be recognized can also be recognized very easily and the associated patterns can be compared at the same time.
  • the reference images are recorded in the learning phase using an optical sensor and the transformed, binarized reference vectors are formed and stored, or the reference images of the objects to be recognized are calculated theoretically, or the reference images of the objects to be recognized or the transformed, binarized reference vectors are read from a database.
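A software stand-in for comparing one scene vector against several stored reference vectors at once might look like the following sketch (vectorized rather than truly parallel as in an FPGA; all names are ours):

```python
import numpy as np

def best_reference(scene_vec: np.ndarray, ref_vecs: np.ndarray):
    """Compare one transformed, binarized scene vector against a stack
    of reference vectors in a single vectorized step (a software
    analogue of the parallel hardware comparison) and return the index
    of the reference with the smallest Hamming distance, plus that
    distance."""
    dists = np.count_nonzero(ref_vecs ^ scene_vec, axis=1)
    i = int(np.argmin(dists))
    return i, int(dists[i])

# Three taught reference vectors (toy length 4) and one scene vector.
refs = np.array([[0, 1, 1, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=np.uint8)
scene_vec = np.array([1, 1, 0, 0], dtype=np.uint8)
i, d = best_reference(scene_vec, refs)   # refs[1] matches exactly
```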
  • reference images of the object can be recorded in different scales and rotational positions during the learning phase, or these can be calculated synthetically from one recording (or calculated theoretically, or read from a database), and the modified census transformations of these different manifestations of the same object can be stored as several transformed, binarized reference vectors, which are searched for in the scene one after the other or simultaneously during the working phase.
  • the comparison can then take place sequentially, just as with different objects, or - preferably in an integrated circuit or FPGA - even in parallel; by searching for the highest match, not only the existence and position of the object can be determined, but also its rotational position and/or scaling.
  • the methods explained for reducing the image data can be combined in any way and carried out in any order.
  • a first advantageous embodiment can consist in the image of the scene recorded by the optical sensor not being checked for the presence of the object completely in one step, but by means of a search window which contains an image section of the scene and which is moved over the image of the scene so that it sweeps the entire image, each search window position being checked sequentially for the presence of the object by means of transformed, binarized scene vectors.
  • a moving average of the pixels can be determined in the search window; both for the calculation of the moving average and for the modified census transformation, image data need only be stored for as many lines as the vertical extent of the search window.
  • the size of a search window can advantageously be between 8 ⁇ 8 and 128 ⁇ 128 pixels, preferably 48 ⁇ 48 pixels. It is therefore large enough that the reference image of the object to be recognized or the object to be recognized is completely contained in it, even if it is a complex or extensive object.
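The sequential search-window sweep described above might be sketched as follows (a toy 4 x 4 window rather than the 48 x 48 pixels the text prefers; function and variable names are ours, and the simple double loop stands in for the line-buffered hardware implementation):

```python
import numpy as np

def find_best_match(scene: np.ndarray, ref_vec: np.ndarray, win: int):
    """Slide a win x win search window over the scene image, apply the
    modified census transform at each window position and return the
    position with the smallest Hamming distance to the reference
    vector (sequential search over all positions)."""
    best_pos, best_dist = None, win * win + 1
    h, w = scene.shape
    for y in range(h - win + 1):
        for x in range(w - win + 1):
            window = scene[y:y + win, x:x + win]
            bits = (window > window.mean()).astype(np.uint8).ravel()
            dist = int(np.count_nonzero(bits ^ ref_vec))
            if dist < best_dist:
                best_pos, best_dist = (y, x), dist
    return best_pos, best_dist

# Toy example: a 4x4 gradient pattern embedded in an empty 10x10 scene.
pattern = np.arange(16, dtype=np.float64).reshape(4, 4)
scene_img = np.zeros((10, 10))
scene_img[2:6, 3:7] = pattern
ref_vec = (pattern > pattern.mean()).astype(np.uint8).ravel()
pos, dist = find_best_match(scene_img, ref_vec, 4)   # found at (2, 3)
```

Because only the maximum match is sought, the same search also returns a best position when the pattern is partially occluded; the distance is then small but nonzero.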
  • a second advantageous embodiment can consist in the number of pixels in the image of the scene recorded by the optical sensor being reduced before the transformed, binarized scene vectors are formed. In practice it is often necessary to reduce the image of the optical sensor (e.g. a CMOS sensor with VGA resolution, 640 x 480 pixels, or higher resolutions of 1280 x 800 pixels or more) for use in the method according to the invention.
  • the large number of pixels leads to a high computational effort during processing, especially since frame rates of over 100 frames per second are desirable for use in factory automation. For this reason, in one of the first processing steps, the number of pixels in the image of the scene captured by the optical sensor can be reduced. This can be done in a number of ways, individually or in combination.
  • a first variant consists in that a partial image (so-called “region of interest”) is selected from the image of the scene recorded by the optical sensor. Only the partial image is then checked for the presence of the object, the other parts of the scene are ignored.
  • a partial image can be selected, for example, by setting a window or by "cropping".
  • a second variant consists in the image of the scene recorded by the optical sensor being reduced in resolution.
  • This primarily means a reduction of the physical image resolution, i.e. the number of picture elements per unit length or the pixel density, but the gray-value resolution (e.g. from 16 to 8 bits) can also be reduced.
  • the reduction of the resolution can preferably be achieved by suitable binning (combining adjacent picture elements), for example by summation or averaging of adjacent pixels, or by an image pyramid (smoothing and downsampling).
  • the reduction in resolution can be selected in variably adjustable steps.
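Binning by averaging 2 x 2 blocks of neighboring pixels, one of the reduction options named above, can be sketched as follows (an illustration, not taken from the patent; the function name is ours):

```python
import numpy as np

def bin2x2(img: np.ndarray) -> np.ndarray:
    """Reduce resolution by 2x2 binning: average each block of
    2x2 adjacent pixels (image dimensions assumed to be even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

img = np.array([[0, 2, 4, 6],
                [2, 4, 6, 8],
                [10, 12, 14, 16],
                [12, 14, 16, 18]], dtype=np.float64)
small = bin2x2(img)   # 4x4 image reduced to 2x2
```

Summation binning would use `.sum(...)` instead of `.mean(...)`; an image pyramid would additionally smooth before each downsampling step.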
  • a third variant consists in the image of the scene recorded by the optical sensor being processed by subsampling, with only individual or some pixels of the image being read out and processed into transformed, binarized scene vectors while the others are left out. Omitting pixels from the image of the scene, which can also be referred to as "thinning out", so that they are ignored and not taken into account in the object detection according to the invention, can have various advantages.
  • In this way, a size of the transformed, binarized scene vectors can be achieved that is particularly well suited for digital processing, for example by matching their word size to that of the hardware used.
  • This is particularly advantageous if the algorithm according to the invention is implemented with a microprocessor or digital signal processor (DSP). Some microprocessor architectures (e.g. TI TMS320C64xx, ARM NEON) have special instructions that can also be used to efficiently calculate the Hamming distance.
  • On the one hand, this reduces the memory required for the comparison vectors; on the other hand, the circuit complexity and the time required for sequential processing are also reduced. It is, for example, only necessary to compare the thinned-out pixels in the window with the mean value of the window.
  • the pixels of the image of the scene from which transformed, binarized scene vectors are formed can be selected in various ways, for example according to a fixed scheme (e.g. certain rows and columns or certain areas) or according to a random or pseudo-random scheme.
  • a random or pseudo-random scheme can use, for example, certain pseudo-random sequences.
  • since common pseudo-random sequences often have the problem that they are intrinsically correlated, the use of a random sequence generated from physical noise is preferred.
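The selection of sampling positions can be sketched as follows. This illustration uses Python's PRNG with an arbitrary seed value, whereas the text prefers a sequence derived from physical noise; such a sequence could be substituted for the PRNG without changing the rest of the scheme:

```python
import random

def make_sample_positions(window_size, k, seed=12345):
    """Choose k distinct pixel positions inside a square search window.
    The fixed seed (an arbitrary value for this sketch) makes the
    sequence identical for every search window, so it can be stored
    once, e.g. in a ROM as described later in the text."""
    rng = random.Random(seed)
    all_positions = [(y, x) for y in range(window_size)
                     for x in range(window_size)]
    # Sampling without replacement spreads the positions over the window.
    return rng.sample(all_positions, k)

positions = make_sample_positions(48, 2304)  # one position per census bit
```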
  • in a first stage, the object is quickly searched for and recognized using a method according to the invention, if appropriate according to one or more of the advantageous further developments; for example, the number of pixels in the image of the scene recorded by the optical sensor is reduced before the transformed, binarized scene vectors are formed.
  • the result found in the first stage is then verified by additionally carrying out a more precise object recognition in the area of the image of the scene in which the object was recognized in the first stage.
  • the more precise object recognition in the second stage can be achieved, for example, with a method according to the invention in which the number of pixels in the image of the scene recorded by the optical sensor is not reduced, or is reduced to a lesser extent than in the first stage, before the transformed, binarized scene vectors are formed, or by means of a common method for machine vision and image recognition known from the prior art for recognizing an object in an electronic image.
  • in the first stage, the object is thus recognized quickly and roughly, and in the second stage this result is checked by finer or more precise image recognition. If the second-stage result confirms that of the first stage, it is accepted as verified; otherwise the result is discarded.
  • a method according to the invention can advantageously be used in the field of machine vision in industrial environments, manufacturing and applications as described above in the introductory part of this patent application. These include, in particular, recognizing the presence of an object (pattern), i.e. for example a distinction between an object being present and not present, or a qualitative statement about a scene such as good/bad or right/wrong; recognizing the position of an object (e.g. for bonders, placement machines and gluing processes); recognizing the rotational position of an object; or performing pattern comparisons (e.g. to select one object among many, for example for sorting tasks).
  • a system according to the invention and a device according to the invention for detecting an object in an electronic image of a scene, comprising an optical sensor for recording an electronic image of a scene and a digital data processing unit for processing image data, are characterized in that the system or the device is designed to carry out a method according to any one of the preceding claims.
  • a device can in particular be an image processing sensor which, integrated on a circuit board, comprises an optical sensor for recording an electronic image of a scene and a digital data processing unit for processing image data according to the method according to the invention.
  • the digital data processing unit can preferably comprise an FPGA module, a processor, a memory and a peripheral interface.
  • the method according to the invention can be modified in various ways. These modifications include, for example, the following (instead of modified census transformation and comparison):
  • a first modification consists in that the object in the image of the scene is not sought by comparing binarized vectors transformed with the modified census transformation, but by means of an absolute difference correlation (ADF).
  • the gray values of the image of the scene and of the object (pattern) in the search window are subtracted pixel by pixel, and the absolute value of the difference is summed up as an error measure.
  • the procedure works with any norm, e.g. also with the Euclidean norm.
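A sketch of the absolute-difference correlation for one search-window position, assuming the window and pattern gray values are given as flattened lists of equal length:

```python
def abs_diff_error(window_pixels, pattern_pixels):
    """Absolute-difference correlation: subtract the gray values of
    scene window and pattern pixel by pixel and sum the absolute
    differences as an error measure (0 means a perfect match)."""
    return sum(abs(w - p) for w, p in zip(window_pixels, pattern_pixels))
```

The Euclidean norm mentioned above would replace `abs(w - p)` by `(w - p) ** 2` (with an optional square root over the sum).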
  • a second modification consists in that the object in the image of the scene is not sought by comparing binarized vectors transformed with the modified census transformation, but by means of a normalized correlation function (NCF).
  • brightness and contrast are normalized both for the search window in the image of the scene and for the object (pattern).
  • the normalization of the object (pattern) can already take place in the learning phase; the normalization for the search window is carried out using a telescoping method, that is to say with a sliding mean value in the search window.
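A sketch of the normalized correlation for one window position, assuming non-constant gray-value signals (normalizing to zero mean and unit energy removes brightness and contrast, as described above):

```python
def ncf(window_pixels, pattern_pixels):
    """Normalized correlation: both signals are shifted to zero mean and
    scaled to unit energy, so that brightness and contrast differences
    cancel out. The result lies in [-1, 1], with 1 for a perfect
    match up to brightness and contrast."""
    def normalize(v):
        m = sum(v) / len(v)
        centered = [x - m for x in v]          # remove brightness
        norm = sum(x * x for x in centered) ** 0.5
        return [x / norm for x in centered]    # remove contrast
    return sum(a * b for a, b in
               zip(normalize(window_pixels), normalize(pattern_pixels)))
```

Because of the normalization, a window that is a brightened or contrast-stretched copy of the pattern still scores 1.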
  • a third modification consists in the fact that when the image of the scene is undersampled, the pixels of the image of the scene from which transformed, binarized scene vectors are formed are selected to lie along an object contour. This takes into account the fact that image areas with constant image brightness contain little information overall. Rather, the information content lies in image areas with strong changes, i.e. in the contours, which are distinctive for a specific object. This saves the comparison of pixels which contribute little to the description of the object anyway.
  • the disadvantage of this method is that if there are several reference patterns, the union of all contour pixels of every object used must be compared, since otherwise the determination of the maximum will not work. This can quickly make this strategy inefficient.
  • a fourth modification may be as follows.
  • the objects to be searched for (pattern) or the object to be searched for (pattern) are typically taught once and no longer changed for a search task.
  • Use in an optical computer mouse is also possible. Since it is easily possible to compare several samples in parallel, this method makes it possible to measure not only a linear movement in the X and Y directions, but also a rotation.
  • FIG. 1 shows a simplified diagram of a method according to the invention
  • FIG. 2 shows a census transformation of a search window
  • FIG. 3 shows a modified census transformation of a search window
  • FIG. 4 shows a modified census transformation with a random selection of pixels in the search window
  • FIG. 5 the application principle of a search window
  • FIG. 6 shows an exemplary embodiment of a hardware implementation of the invention
  • FIG. 7 the determination of the maximum in detail
  • FIG. 8 the acceleration of the method by means of several “embedded block RAMs” (EBR) and
  • FIG. 9 shows an exemplary basic sketch of components of a system according to the invention.
  • FIG. 1 illustrates the basic sequence of a pattern recognition method according to the invention for the case that in the learning phase L the reference image of the object to be recognized is recorded by means of an optical sensor 1 .
  • the learning phase L is shown in the upper part of FIG.
  • An optical sensor 1 is used to take a picture 2 of a reference image of the object to be recognized later in a scene.
  • Such an object can be any object that is characterized in particular by its shape, contour, size or rotational position, but can also be individualized by other parameters (surface texture, labeling, etc.).
  • an output image has 1280 x 800 pixels.
  • the recording 2 is followed by a pre-processing 3 of the image data, which in particular involves setting a search window or data reduction by selecting a sub-image, reducing the resolution, for example by binning or an image pyramid, or undersampling, for example by using a fixed scheme, a random or pseudo-random scheme, or physical noise. This reduces the image size to 128 x 128 pixels, for example.
  • the features are reduced by means of a modified census transformation 4, and the result of this transformation is stored in a transformed, binarized reference vector.
  • the learning phase L is carried out once for an object to be recognized.
  • the learning phase L is carried out once for each of several different objects.
  • the reference image of the object to be recognized can be calculated theoretically in the learning phase L, or the reference image of the object to be recognized or the transformed, binarized reference vector can be read from a database.
  • the work phase A is shown in the lower part of FIG.
  • an image 2 of a scene is generated by means of an optical sensor 1, which is checked for the presence of the object learned in the learning phase L to be recognized.
  • an output image has 1920 x 1080 or 1280 x 800 pixels with a refresh rate of 100 Hz.
  • the acquisition 2 is in turn followed by pre-processing 3 of the image data, which in particular involves setting a search window or data reduction by selecting a partial image, reducing the resolution, for example by binning or an image pyramid, or undersampling, for example by using a fixed scheme, a random or pseudo-random scheme, or physical noise.
  • the image can also be enlarged or reduced according to the settings selected by the user (zoom function).
  • the image size is reduced by preprocessing 3 to 48 x 48 or 128 x 128 pixels, for example. After that, the features are reduced by means of a modified census transformation 4, and the results of this scene transformation are stored and processed as transformed, binarized scene vectors.
  • the classification leading to statement 8 is carried out using a pattern comparison 5, in which the transformed, binarized scene vectors are compared with the transformed, binarized reference vector: the number of matching bits between the transformed, binarized scene vectors and the transformed, binarized reference vector (the complement of the Hamming distance, which counts differing bits) is determined as a measure of agreement, and in a maximum determination 6 that transformed, binarized scene vector is determined which has the greatest agreement with the transformed, binarized reference vector.
  • a threshold value 7 is used. Images that do not meet the threshold are assumed not to contain the object.
  • the setting of the threshold value 7 thus determines the degree of correlation between the object and the scene required for a positive statement 8 .
  • the object to be recognized is classified as recognized in the scene, or an affirmative statement 8 is made, if the degree of agreement of the transformed, binarized scene vector which has the greatest agreement with the transformed, binarized reference vector exceeds the predetermined threshold value 7.
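The classification just described (match measure per scene vector, maximum determination, then threshold test) can be sketched as follows; the function name and the integer-packed vector representation are assumptions of this illustration, not part of the patent:

```python
def classify(scene_vectors, reference, k, threshold):
    """Find the scene vector with the greatest agreement with the
    reference vector, and accept the detection only if its number of
    matching bits exceeds the threshold. All vectors are k-bit
    integers."""
    best_pos, best_score = None, -1
    for pos, v in enumerate(scene_vectors):
        # Matching bits = k minus the Hamming distance (differing bits).
        score = k - bin(v ^ reference).count("1")
        if score > best_score:
            best_pos, best_score = pos, score
    found = best_score > threshold  # threshold test for statement 8
    return found, best_pos, best_score
```

A scene whose best score does not exceed the threshold is assumed not to contain the object, exactly as stated for threshold value 7.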
  • the invention relates to a method for machine vision and image recognition for recognizing an object in an electronic image captured by an optical sensor 1. It is proposed to teach in a learning phase L a reference image of the object to be recognized and to compare it with the image of the scene in a working phase A, with the pattern comparison 5 between object and scene being carried out by means of a modified census transformation 4 with maximum determination 6; for a positive statement 8, the degree of agreement must exceed a threshold value 7.
  • the invention thus relates to the optical detection of objects, with an image of a scene being compared with a reference image of the object and the object being identified in the image using a correlation method.
  • the correlation method is based on a modified census transformation of object and image of the scene, the calculation of the Hamming distance of the vectors resulting from the transformation and a maximum determination with threshold value setting in order to identify the object to be recognized in the image of the scene.
  • the pattern comparison 5, with a respective maximum search for each object, can be carried out in parallel between the scene vectors (transformed and binarized only once) and the transformed, binarized reference vector belonging to each object.
  • a match value is determined for each of the stored objects. This calculation can be done in parallel and at the same time for all objects.
  • Such an embodiment can be used, for example, when performing a sorting task when multiple objects need to be distinguished.
  • the object that has the greatest correspondence to the recorded image is then output.
  • the match value must be greater than a threshold value so that the object is classified as recognized.
  • work phase A can be repeated for this object with transformed, binarized scene vectors that belong to the object and its immediate surroundings in the image of the scene.
  • in the pre-processing 3 of the image data, no data reduction, or less data reduction than before the first statement 8, is carried out, so that the more precise repetition of the work phase A in the area of the scene belonging to the found object checks the statement 8 more accurately, for example at a higher resolution, and thus makes it more reliable.
  • the preprocessing 3 in the preceding learning phase L is to be adapted accordingly to the changed preprocessing of the working phase A, if necessary. Additional checking of a specific detected area requires very little additional processing time.
  • the statement 8 on an object recognized in the work phase A can also be made using the recording 2, or its image data after a pre-processing 3, by a common machine vision and image recognition method known from the prior art for detecting an object in an electronic image.
  • Figure 2 shows a census transform for a 3 x 3 pixel area. The comparison of pixels "1", “2", “3”, etc. is made with pixel C, for example in this order.
  • Figure 3 shows a modified census transform for a 3 x 3 pixel region.
  • the comparison of the pixels I0, I1, ..., I8 takes place with the mean value avg
  • Figure 4 shows a modified census transform in a 48 x 48 window.
  • the comparison of the pixels I0, I1, ..., Ik with k ≤ 2303 takes place with the mean value avg
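A sketch of the modified census transformation shown in FIGS. 3 and 4, assuming the selected pixels are given as a flat list and bit i of the result corresponds to pixel Ii:

```python
def modified_census(pixels):
    """Modified census transform: each selected pixel is compared with
    the mean value avg of the window (instead of with a center pixel,
    as in the classical census transform of FIG. 2), and the resulting
    bits are packed into an integer."""
    avg = sum(pixels) / len(pixels)
    bits = 0
    for i, p in enumerate(pixels):
        if p > avg:
            bits |= 1 << i  # bit i is 1 if pixel i exceeds the mean
    return bits
```

For a 48 x 48 window with a random pixel selection, `pixels` would hold the gray values at the selected positions, yielding a vector of up to 2304 bits.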
  • FIG. 5 shows the application principle of a search window in the method according to the invention, in which the pattern comparison is carried out by means of a modified census transformation.
  • the image of the scene 9 recorded by the optical sensor 1, which may have been reduced in a preprocessing 3, is not completely checked in one step for the presence of the object (in the exemplary embodiment shown, with a resolution of 128 x 128 pixels), but by means of a search window 10.
  • the search window 10 in each case comprises an image section 11 of the scene, in the exemplary embodiment shown with 48 x 48 pixels, and it is guided over the image of the scene 9 in such a way that it sweeps over the image of the scene 9, each search window 10 being sequentially checked for the presence of the object by means of transformed, binarized scene vectors.
  • the pixels forming the transformed, binarized scene vectors in the search window 10 are randomly selected according to FIG. 4.
  • the mean value avg is the mean value of all pixels in the 48 x 48 pixel search window 10
  • the bits b_i of the scene vector, the length of which is k ≤ 2303 bits, are set to 0 if I_i ≤ avg and to 1 if I_i > avg.
  • FIG. 6 shows an exemplary embodiment of a hardware implementation of the invention.
  • a CMOS sensor with a resolution of 1280 x 800 pixels and a global shutter is used as the optical sensor 1.
  • Its video data is output as "Mipi CSI2", for example.
  • the image data output by the optical sensor 1 are reduced by means of a pre-processing 3 .
  • the preprocessing 3 comprises two areas, namely the selection 12 of image data on the optical sensor 1 itself or the limitation of the image data recorded by the optical sensor 1, and the reduction 13 of the image data output by the optical sensor 1.
  • in the selection 12, binning to 640 x 400 pixels and the selection of a partial image of the scene ("region of interest") by "cropping" are carried out, together with the control of shutter and gain.
  • the reduction 13 takes place by means of an image pyramid.
  • the factor of the image reduction can be chosen in variably adjustable steps, e.g. integer increments.
  • the selection 12 takes place directly on the CMOS sensor, and the reduction 13 (averaging, pyramid) in a stage that is implemented in the FPGA of the device, like all function blocks marked with (*) in FIG. 6.
  • the resulting gray image of video data of the reduced image 14 then only has a resolution of 128 x 128 pixels.
  • the search window was realized with a fixed size of 48 x 48 pixels.
  • the moving average avg is first determined in the search window. This is preferably done with a so-called telescope: once the mean value for all image windows in the top line has been calculated, only two additions and two subtractions as well as a normalization are required for each further result, because most pixels, and hence most of the sum, are shared with the neighboring search window. This speeds up the calculation of the mean value, since it does not have to be recalculated from scratch over all the pixels taken into account; instead, only the pixels changed by shifting the search window enter the running calculation. To calculate the moving average, and also for the modified census transformation, it is necessary to store the image data for as many lines as the vertical extent of the search window.
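The telescoping update can be illustrated in one dimension; extending it to the 2-D window adds a second pass over columns. The function below is a sketch under these assumptions, not the FPGA implementation:

```python
def sliding_sums(values, width):
    """Telescoping window sum: the first window is summed once; every
    further window adds the entering value and subtracts the leaving
    one (one addition, one subtraction) instead of resumming, which is
    what keeps the moving average avg cheap."""
    s = sum(values[:width])
    sums = [s]
    for i in range(width, len(values)):
        s += values[i] - values[i - width]  # update instead of resum
        sums.append(s)
    return sums

# Dividing by the window size (the "normalization") yields the means.
means = [s / 3 for s in sliding_sums([1, 2, 3, 4, 5], 3)]
```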
  • the memory 15 was implemented as an "embedded block RAM" in an FPGA, specifically in six EBR blocks of 1 kbyte each, each of which is configured as a dual-port RAM.
  • the RAM is loaded sequentially and then addressed via the described random sequence.
  • the position of the selected pixels in the search window is distributed as randomly and evenly as possible in the window; however, the sequence is the same for all search windows, which is why it can be stored in the FPGA, e.g. in a ROM.
  • for each x-y position of the search window, an address generator generates the random sequence for the RAM, which outputs the corresponding gray-value information for the pixel. This is compared in the pattern matching stage 18 with the previously calculated moving average avg, yielding one bit of the modified census transform for the search window.
  • This result bit can be compared immediately by means of an XOR logic comparison with the corresponding bit of a previously stored, transformed, binarized reference vector R1, which belongs to the object sought.
  • the reference vector R1 is preferably stored in a shift register. The number of matching pixels is counted in a counter Z1. After sufficiently many (fixed value k) "samples" have been compared, the search window moves one pixel to the right or, at the last pixel in a line, to the beginning (left) of the next line.
  • FIG. 6 also shows that it is possible with relatively little effort to simultaneously compare several stored objects with the modified census transformation of the search window and thus to search in the search window or the image of the scene at the same time.
  • a transformed, binarized reference vector is stored for each object to be checked (in the exemplary embodiment R1, R2, R3 and R4), and the XOR comparison with the search window takes place in parallel and at the same time, with the number of matching pixels being stored in a respective counter (in the exemplary embodiment Z1, Z2, Z3 and Z4).
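A sketch of this comparison against several stored reference vectors; in the FPGA the XOR comparisons for all objects happen simultaneously in hardware, while this illustrative Python version loops over them:

```python
def match_counts(scene_bits, references):
    """Compare each census bit of the search window with the
    corresponding bit of several stored reference vectors, maintaining
    one counter of matching bits per object (the counters labelled
    Z1..Z4 in the figure). Bits are given as lists of 0/1 values."""
    counters = [0] * len(references)
    for i, s in enumerate(scene_bits):
        for n, ref in enumerate(references):
            counters[n] += 1 - (s ^ ref[i])  # XOR == 0 means a match
    return counters
```

The object whose counter is largest (subject to the threshold) is then selected in the subsequent maximum determination.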
  • in a subsequent determination of the maximum, the respective sliding maximum of the counter or correspondence value, its position in the x and y directions, and the identification of the corresponding object are stored.
  • these values or results are globally valid for the entire image and can be read out by a microprocessor via the readout 19. It is also possible to use the microprocessor to read out the partial values immediately when they are generated and to determine the maximum in a program.
  • the readout 19 takes place via a DMA channel 20 for the microprocessor, via which the video data for the reduced image 14 can also be transmitted.
  • This type of maximum determination is also referred to as a "winner-takes-all" strategy.
  • a threshold value is used to enable an object to be recognized with sufficient accuracy. Images of the scene that do not meet the threshold are assumed not to contain the object.
  • FIG. 7 illustrates the determination of the maximum in detail.
  • the current modified census transformation values 21 provided by the pattern comparison stage are compared with the reference vectors R and the number of matching pixels is counted in a respective counter Z.
  • the current image position 22 is provided from the x and y registers.
  • in the maximum determination 23, the respective maximum match is determined, and the x-position, the y-position, the identification n of the associated object and the counter value 24 for the maximum found are saved. Via readout 19, these values are output once per image to a microprocessor for further evaluation.
  • the maximum determination can also be carried out in the FPGA instead of in a microprocessor.
  • FIG. 8 shows how the method can be accelerated by means of a number of “embedded block RAMs” (EBR).
  • additional acceleration is achieved by the fact that each of the EBRs used (a total of six block RAMs) can be read out in parallel, each with two ports, which makes parallelization and a speed increase by a factor of 12 possible. 6144 bytes of buffer memory are required to store 48 lines with 128 pixels (eight bits each).
  • the FPGA used provides EBR memory with 1024 bytes each.
  • the EBRs can be configured as dual port RAM.
  • the pixel input 25 is compared in parallel with the comparison value avg by means of the six EBRs and twelve comparators 26. In this way, twelve comparisons can be carried out simultaneously per cycle, which means a speedup by a factor of twelve. Only one cycle per processing step must be reserved for entering and shifting in new pixels.
  • FIG. 9 shows an exemplary basic sketch of components of a system according to the invention.
  • the optical sensor 1 has a resolution of 1280 x 800 pixels, for example.
  • the pre-processing 3 of the image data is carried out with an FPGA, for example by means of an image pyramid or by controlling the binning and cropping of the optical sensor 1.
  • the video data from the optical sensor 1 are transmitted via two Mipi CSI2 lanes to the FPGA, which controls the optical sensor via an I2C interface.
  • the reduced video data is transferred from the FPGA in parallel to a microprocessor 27 or microcontroller with data memory (RAM), program memory (QSPI) and DMA.
  • the microprocessor controls the FPGA via an I2C and SPI interface.
  • various peripheral interfaces 28, e.g. Ethernet, LAN, I2C, SPI, serial, IO-Link, Profinet, can enable the microprocessor to communicate with the periphery.
  • a display and control unit 29 is optionally provided.
  • a power supply 30 can serve as a power sequencer, monitor and reset.


Abstract

The invention relates to a method for machine vision and image recognition, for recognizing an object in an electronic image captured by means of an optical sensor (1). According to the invention, a reference image of the object to be recognized is taught in a learning phase (L) and is compared with the image of the scene in a working phase (A); the pattern comparison (5) between the object and the scene is carried out by means of a modified census transform (4) with maximum determination (6), and, for a positive statement (8), the degree of agreement must exceed a threshold value (7).
EP21746686.1A 2020-07-21 2021-07-19 Procédé et système ou dispositif pour reconnaître un objet dans une image électronique Pending EP4185986A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102020119243.6A DE102020119243A1 (de) 2020-07-21 2020-07-21 Verfahren und System bzw. Vorrichtung zum Erkennen eines Objektes in einem elektronischen Bild
PCT/EP2021/070133 WO2022018019A1 (fr) 2020-07-21 2021-07-19 Procédé et système ou dispositif pour reconnaître un objet dans une image électronique

Publications (1)

Publication Number Publication Date
EP4185986A1 true EP4185986A1 (fr) 2023-05-31

Family

ID=77104042

Family Applications (1)

Application Number Title Priority Date Filing Date
EP21746686.1A Pending EP4185986A1 (fr) 2020-07-21 2021-07-19 Procédé et système ou dispositif pour reconnaître un objet dans une image électronique

Country Status (5)

Country Link
US (1) US20230154144A1 (fr)
EP (1) EP4185986A1 (fr)
JP (1) JP2023535005A (fr)
DE (1) DE102020119243A1 (fr)
WO (1) WO2022018019A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023249973A1 (fr) * 2022-06-20 2023-12-28 Lean Ai Technologies Ltd. Réseaux neuronaux associés à des articles fabriqués

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150003573A (ko) 2013-07-01 2015-01-09 한국전자통신연구원 영상 패턴 검출 방법 및 그 장치
US9864758B2 (en) * 2013-12-12 2018-01-09 Nant Holdings Ip, Llc Image recognition verification
JP6278108B2 (ja) 2014-03-14 2018-02-14 オムロン株式会社 画像処理装置、画像センサ、画像処理方法
JP6730855B2 (ja) 2016-06-13 2020-07-29 株式会社キーエンス 画像処理センサ、画像処理方法、画像処理プログラム及びコンピュータで読み取り可能な記録媒体並びに記録した機器

Also Published As

Publication number Publication date
US20230154144A1 (en) 2023-05-18
DE102020119243A1 (de) 2022-01-27
WO2022018019A1 (fr) 2022-01-27
JP2023535005A (ja) 2023-08-15


Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20230221

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)