RU2778906C1

RU2778906C1 - Method for automatically recognizing scenes and objects in an image

Info

Publication number: RU2778906C1
Application number: RU2021107871A
Authority: RU
Inventors: Владимир Алексеевич Тупиков; Валерия Анатольевна Павлова; Сергей Николаевич Крюков; Владимир Александрович Бондаренко
Original assignee: Акционерное общество Научно-производственное предприятие "Авиационная и Морская Электроника"
Filing date: 2021-03-23
Publication date: 2022-08-29

Abstract

FIELD: object identification.

SUBSTANCE: invention relates to the field of identifying objects in an image.

A method for automatically recognizing scenes and objects in an image, in which reference images are formed and stored and a stream of input images is processed to find an object of interest in them using the reference images, characterized in that key points and the regions around them are selected on the reference image, after which the input image is searched for the corresponding key points of the reference image. Descriptors of the key points are then created. For each key point of the reference image, the corresponding key point in the current image is determined by maximum proximity of the descriptors and, if the proximity measure is less than a predetermined threshold, that key point in the current image is stored. When the number of corresponding key points in the current image exceeds half the number of key points in the reference image, the scene object in the image is considered recognized.

EFFECT: reducing the time and resources required to carry out image processing, as well as increasing the likelihood of detecting objects in the image.

1 cl, 20 dwg

Description

The invention relates to computer technology, in particular to methods for identifying objects in an image.

A method and a device for recognizing images of objects are known (see Russian Federation patent No. 2361273, Int. Cl. G06K 9/62, published 10.07.2009). The method is as follows. Reference images are stored as a three-dimensional vector model, and for each model a set of affine-transformation parameters is fixed, taking into account the complexity of the model's shape: the rotation angles about the X, Y, and Z axes, and the scale. A three-dimensional vector model of the reference object is obtained by geometric construction and, by changing its position in space (rotation, reflection, scaling), the above parameters are obtained; they are stored and later used during recognition to recreate the corresponding view of the reference object. The flat image is represented as a two-dimensional array whose elements are values from 0 to 255 (gray levels). The parameter set additionally includes the aspect ratio of the object's bounding container and an encoded representation of the object that makes it possible to determine its position inside the bounding container. The bounding container is the minimum rectangular region of the plane into which the image of the object fits. Encoding is performed by dividing the bounding container into 25 identical regions and determining whether part of the object is present in each of them, which yields a 25-bit binary code for the given view of the object; the code is obtained by enumerating the label values of the regions (a label records whether part of the image lies in the region) from left to right and from top to bottom.

The recognizer receives an image represented as an array of grayscale pixels, i.e. each element of the array has a value from 0 to 255, and the dimensions of the array depend on the image sampling parameters. Recognition proceeds as follows. The bounding container of the input object image is determined and then encoded as described above. Based on the aspect ratio of the bounding container and the resulting code, a set of parameters is selected from the reference database, and the vector model of the reference object is transformed (rotated and scaled) according to those parameters. A flat image of the reference model is then constructed and compared with the input image by a perceptron-type neural network. The comparison analyzes the gray levels of each discrete image region pixel by pixel: the absolute difference is computed for each pair of pixels of the input image and of the resulting projection of the reference vector model, and it is compared with a threshold value. The resulting data are fed to the input of the perceptron and, depending on the value of its activation function, a decision is made on the similarity between the projection of the reference vector model and the input image.
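The 25-region encoding of the bounding container described above can be sketched as follows. This is only an illustration under stated assumptions: the 25 regions are taken to be a 5×5 grid, the object is given as a binary mask (nonzero means "object"), and a region's bit is 1 when it contains at least one object pixel. The function names are hypothetical and do not come from the cited patent.

```python
def bounding_box(mask):
    """Return (top, left, bottom, right) of the nonzero pixels; bottom/right are exclusive."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        raise ValueError("mask contains no object pixels")
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1


def occupancy_code(mask, grid=5):
    """Build a grid*grid bit string, scanning cells left to right, top to bottom.

    Assumption: a cell's bit is '1' if any object pixel falls inside it.
    Cells that end up empty because the box is smaller than the grid give '0'.
    """
    top, left, bottom, right = bounding_box(mask)
    height, width = bottom - top, right - left
    bits = []
    for gy in range(grid):
        y0 = top + gy * height // grid
        y1 = top + (gy + 1) * height // grid
        for gx in range(grid):
            x0 = left + gx * width // grid
            x1 = left + (gx + 1) * width // grid
            occupied = any(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
            bits.append("1" if occupied else "0")
    return "".join(bits)
```

For a 5×5 mask with ones on the main diagonal, each cell is a single pixel and the code simply reads the mask row by row.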

This method involves rather complex processing of the object image, which reduces recognition accuracy because of daily and seasonal changes in the brightness of the extracted objects, and it also requires a very large amount of computing time.

A method for obtaining an image of an object is known (see Russian Federation patent No. 2243591, Int. Cl. G06T 5/50, published 27.12.2004). It includes preprocessing the signals of a time sequence of images of a scene in which an object may appear, storing reference signals that characterize the scene image, computing the cross-correlation function of the reference and current images, measuring the parameters of the cross-correlation function, obtaining complete information about the coordinates and movements of the object at each moment of time, and measuring the parameters of the object and the scene. The distinguishing features of the method are as follows. The images of the sequence are divided into fragments, and the cross-correlation function of the reference image and the subsequent current images is computed. Signals are then obtained that correspond, for example, to the mean and the maximum amplitude of the cross-correlation signal, and they are compared, for example by subtracting the mean amplitude from the maximum amplitude. The resulting difference signal is compared with a specified threshold value, and control signals are generated that perform fragment-wise filtering of the time sequence of current images: for fragments whose difference signals exceed the threshold, control signals that block the image signals are generated, and for fragments whose difference signals are less than or equal to the threshold, control signals that pass the signals of the corresponding fragments are generated.
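The fragment-wise decision rule of this known method can be illustrated with a short sketch. It assumes that the cross-correlation values of each fragment are already available as a list of numbers; the function names are hypothetical and the code is not taken from the cited patent.

```python
def peak_minus_mean(correlation_values):
    """Difference between the maximum and the mean of a fragment's cross-correlation values."""
    peak = max(correlation_values)
    mean = sum(correlation_values) / len(correlation_values)
    return peak - mean


def fragment_pass_mask(fragment_correlations, threshold):
    """Return one boolean per fragment: True means the fragment is passed,
    False means it is blocked (its difference signal exceeds the threshold)."""
    return [peak_minus_mean(values) <= threshold for values in fragment_correlations]
```

A fragment whose difference equals the threshold is passed, matching the "less than or equal to" wording above.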

The image fragments may be identical in shape and equal in area, each containing K pixels, where 2 ≤ K ≤ N and N is the number of pixels in the image, or they may differ in shape and area, each fragment containing K1 pixels, where K1 = 2, … nj1 and nj1 ≤ N/2.

Either the reference image or the current image may be divided into fragments.

In addition, the signals belonging to the extracted object image are recorded as the reference image.

When the cross-correlation signals are subtracted, the cross-correlation signals of the corresponding fragments of subsequent frames are subtracted from the cross-correlation signals of the fragments of the previous frame.

In this method the image is represented as a video signal, for example a television signal.

In the known method of patent No. 2243591, as in the previous analogue, images of the scene are processed as a whole, without extracting contours or localizing objects in the image. The correlation processing used in the method therefore depends heavily on illumination and weather conditions, which increases the probability of false detection of objects.

This method was chosen as the prototype.

The problem to be solved is that processing images of the scene as a whole and localizing objects in them requires considerable time and resources, and that the processing depends heavily on illumination and weather conditions, which increase the probability of false detection of objects in the image.

The technical result of the proposed invention is a reduction in the time and resources required for image processing and an increase in the probability of detecting objects in the image.

The technical result is achieved in the proposed method for automatically recognizing scenes and objects in an image, in which reference images are formed and stored and a stream of input images is processed to find an object of interest in them using the reference images. According to the invention, key points and the regions around them (the neighborhoods of the key points) are selected on the reference image. The sizes of the neighborhoods are chosen within the limits needed to compute brightness gradients; these are places where the brightness gradient changes most strongly, for example line edges, small circles, sharp changes in illumination, and corners. The input image is then searched for the key points corresponding to those of the reference image. Key points are selected using the Hessian matrix by computing its determinants, called Hessians, which reach an extremum at points of maximum change in the brightness gradient; the Hessians for each key point are computed using Haar filters of different scales. Descriptors of the key points are then created as numbers that represent the fluctuations of the brightness gradient around the key point. These numbers are invariant to scale and rotation, and the gradient fluctuations in the neighborhood of a key point are computed relative to the direction of the gradient over the entire neighborhood. Next, for each key point of the reference image, the corresponding key point in the current image is determined by maximum proximity of the descriptors and, if the proximity measure is less than a specified threshold, that key point in the current image is stored. The threshold is determined experimentally from an estimate of the mean of the sums of descriptor differences over all key points of the reference and current images. When the number of corresponding key points in the current image exceeds half the number of key points in the reference image, the scene object in the image is considered recognized.
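The matching and decision steps of the method can be sketched roughly as follows. The sketch assumes that descriptors are plain tuples of numbers and that "proximity" is measured as the sum of absolute component differences; the text only speaks of sums of descriptor differences, so the exact measure is an assumption, and all function names are hypothetical.

```python
def descriptor_distance(a, b):
    """Sum of absolute component differences (assumed proximity measure; smaller is closer)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_keypoints(reference, current, threshold):
    """For every reference descriptor find the closest current descriptor.

    The pair is stored only if its distance is below the threshold.
    Returns a list of (reference_index, current_index) pairs.
    """
    matches = []
    if not current:
        return matches
    for i, ref in enumerate(reference):
        j, dist = min(
            ((j, descriptor_distance(ref, cur)) for j, cur in enumerate(current)),
            key=lambda pair: pair[1],
        )
        if dist < threshold:
            matches.append((i, j))
    return matches


def is_recognized(reference, current, threshold):
    """The object counts as recognized when more than half of the reference key points are matched."""
    return len(match_keypoints(reference, current, threshold)) > len(reference) / 2
```

With four reference key points, two matches are not enough, since the count must exceed half; three matches are.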

Selecting key points in the image together with the regions around them, where the brightness gradient changes most strongly, provides information about local brightness changes in the analyzed image. Processing the key points with Hessian matrices and Haar filters, then creating key point descriptors that are invariant to scale and rotation, and comparing the maximum descriptor proximity with a specified threshold together ensure a high probability of correctly detecting objects of interest once half of the reference image's key points have been found in the current image. Moreover, the number of key points is much smaller than the number of pixels in the whole analyzed image, which considerably speeds up processing of the input image stream and reduces the amount of data to be processed.

In addition, the determinants of the Hessian matrix (the Hessians) used to select key points in the image are built from derivatives and depend only on differences in brightness, not on its absolute level, so they are invariant to a shift in image brightness. A change in the illumination level of the sample therefore does not affect key point detection; that is, the dependence on illumination and weather conditions and the probability of false detection of objects are considerably reduced.
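The invariance to a brightness shift can be checked directly. As a brief sketch, write the image brightness as I(x, y) and add a constant offset c; both symbols are introduced here for illustration only and do not appear in the original text:

```latex
\[
I'(x,y) = I(x,y) + c
\quad\Longrightarrow\quad
\frac{\partial^2 I'}{\partial x^2} = \frac{\partial^2 I}{\partial x^2},\qquad
\frac{\partial^2 I'}{\partial y^2} = \frac{\partial^2 I}{\partial y^2},\qquad
\frac{\partial^2 I'}{\partial x\,\partial y} = \frac{\partial^2 I}{\partial x\,\partial y}
\]
\[
\det H(I') =
\frac{\partial^2 I'}{\partial x^2}\,\frac{\partial^2 I'}{\partial y^2}
- \left(\frac{\partial^2 I'}{\partial x\,\partial y}\right)^{2}
= \det H(I)
\]
```

The constant c disappears under differentiation, so the Hessian and its extrema stay where they were. The argument covers only an additive shift, which is the case stated above.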

The proposed method is illustrated by the drawings. Fig. 1 shows the key points of the image on the reference (right) and on the scene (left). Although the scene has a different scale and viewing angle and is partially obscured by another object, the key points are identified quite accurately.

Fig. 2 shows the key points of an image of a building found using the Hessian matrix. The diameter of each circle indicates the scale of the key point, and the straight line indicates the direction of the brightness gradient.

Fig. 3 shows the extrema of the Hessian.

Fig. 4 gives an example of key point recognition: the ends of a line segment recognized as key points using the Hessian matrix.

Fig. 5 shows the discretized filters for finding the four elements of the Hessian matrix (the fourth coincides with the third, since the Hessian matrix is symmetric). The filters have a spatial scale of 9×9 pixels. Dark areas correspond to negative filter values and light areas to positive ones.

Fig. 6 shows the filters used to find the Hessian matrix. White areas correspond to the value +1, black areas to −2 (−1 in the third filter), and gray areas to zero. The spatial scale is 9×9 pixels.

Fig. 7 illustrates how the Hessian is used in the proposed method to recognize both light points on a dark background and dark points on a light background.

Fig. 8 shows two key points of different scales at the same point of the image.

Fig. 9 shows how the full set of Fast-Hessian filter scales is divided into octaves; the first three octaves are shown. The numbers in the rectangles give the size of the Fast-Hessian filter. The logarithmic scale at the bottom shows the scales covered by the octaves.

Fig. 10 illustrates the search for a local maximum. The pixel marked with a cross is considered a local maximum if its Hessian is greater than that of every neighboring pixel at its own scale and greater than that of every neighboring pixel at the next smaller and next larger scales (26 neighbors in total).
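The 26-neighbor test described for Fig. 10 can be sketched as follows. The sketch assumes the Hessian values are stored in a nested list indexed as `responses[scale][row][column]`; this layout and the function name are hypothetical. Border positions, which lack a full set of 26 neighbors, are simply rejected here, which is a design choice of the sketch rather than something stated in the text.

```python
def is_local_maximum(responses, s, y, x):
    """Check whether responses[s][y][x] is strictly greater than all 26 neighbors
    in the 3x3x3 block spanning the adjacent scales, rows, and columns."""
    n_scales = len(responses)
    n_rows = len(responses[0])
    n_cols = len(responses[0][0])
    # Positions on the border of the scale-space volume are not tested (assumption).
    if not (0 < s < n_scales - 1 and 0 < y < n_rows - 1 and 0 < x < n_cols - 1):
        return False
    center = responses[s][y][x]
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if ds == 0 and dy == 0 and dx == 0:
                    continue  # skip the center itself
                if responses[s + ds][y + dy][x + dx] >= center:
                    return False
    return True
```

A tie with any neighbor disqualifies the pixel, in line with the "greater than" wording above.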

Fig. 11 shows the Haar filters; black areas have the value −1 and white areas +1.

Fig. 12 shows all the computed Haar wavelet gradient values dX and dY as points in the (dX, dY) space.

Fig. 13 gives an example of the gradient at an ideal edge and at a noisy edge.

Fig. 14 illustrates the computation of the descriptor components in a rectangular region of size 20s.

Fig. 15 gives examples of descriptors, showing how the descriptor behaves for different images. For uniform regions all values are close to zero. For repeating vertical stripes all values except the second are close to zero. When brightness increases along the X axis, the first two components have large values.
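The behavior described for Fig. 15 can be reproduced with a small sketch. It assumes, as an inference from that description rather than a statement in the text, that the four components are ordered as (Σ dX, Σ |dX|, Σ dY, Σ |dY|), and it replaces the Haar filter responses with simple differences between adjacent pixels. The function name is hypothetical.

```python
def descriptor_components(patch):
    """Return (sum dX, sum |dX|, sum dY, sum |dY|) for a 2-D list of brightness values.

    dX and dY are plain forward differences, used here as a stand-in
    for the Haar filter responses (assumption).
    """
    sum_dx = sum_abs_dx = sum_dy = sum_abs_dy = 0
    rows, cols = len(patch), len(patch[0])
    for y in range(rows - 1):
        for x in range(cols - 1):
            dx = patch[y][x + 1] - patch[y][x]
            dy = patch[y + 1][x] - patch[y][x]
            sum_dx += dx
            sum_abs_dx += abs(dx)
            sum_dy += dy
            sum_abs_dy += abs(dy)
    return (sum_dx, sum_abs_dx, sum_dy, sum_abs_dy)


uniform = [[5] * 5 for _ in range(5)]             # flat brightness
stripes = [[0, 1, 0, 1, 0] for _ in range(5)]     # repeating vertical stripes
ramp = [[0, 1, 2, 3, 4] for _ in range(5)]        # brightness grows along X
```

For these three patches only the second component is nonzero for the stripes, and only the first two are nonzero for the ramp, which agrees with the description of Fig. 15.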

Fig. 16 shows a reference image.

Fig. 17 shows an image of the scene.

Fig. 18 shows the key points of the reference image.

Fig. 19 shows the result of recognition by key points.

Fig. 20 is a flowchart of the algorithm implementing the proposed method.

Consider an example of carrying out the proposed method.

The description of the proposed method is prefaced with the following explanations.

Modern computers can store huge amounts of information in the form of images and video files. Several factors hinder the recognition of objects in the current image received from the camera:

1. Images have different scales. Identical objects occupy different areas in different images.

2. The object of interest may be located anywhere in the image.

3. An object that we perceive as something separate is not singled out in the image in any way and appears against a background of other objects. In addition, the image is not ideal and may be subject to all kinds of distortion and noise.

4. An image is only a two-dimensional projection of the three-dimensional world. Rotating the object or changing the viewing angle therefore drastically affects its two-dimensional projection, the image. The same object can produce a completely different picture depending on its rotation or the distance to it.

So, two images are given: one is taken to be the reference image (the sample) and the other the scene. The task is to determine whether the sample is present in the scene and to localize it.

The sample in the scene may:

a) have a different scale;

b) be rotated in the image plane;

c) be at an arbitrary place in the scene;

d) be noisy, not fully visible, or partially obscured by other objects;

e) differ from the reference sample in brightness and contrast;

f) be absent altogether.

The correlation method used in the prototype calls for obtaining the sample at different scales, rotating it through all possible angles, trying every possible position in the scene, and comparing each of these references with the scene pixel by pixel. This solution is difficult to implement in practice. Indeed, if the sample and the scene have typical dimensions, on the order of hundreds of pixels vertically and horizontally, then counting all possible references with their rotations, scales, and positions, and multiplying by the number of pixel-by-pixel (correlation) comparison operations, gives about a trillion operations to find and localize the reference in the scene.
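The order of magnitude quoted above can be reproduced with a rough count. All the numbers below (a 400×400 scene, a 100×100 sample, 60 rotation steps, 10 scale steps) are illustrative assumptions chosen to match "hundreds of pixels"; none of them come from the original text, and the function name is hypothetical.

```python
def brute_force_operations(scene_w, scene_h, sample_w, sample_h, rotations, scales):
    """Rough count of pixel comparisons for exhaustive correlation search.

    Every scene pixel is treated as a candidate position, and each candidate
    costs one comparison per sample pixel, repeated for every rotation and scale.
    """
    positions = scene_w * scene_h
    comparisons_per_candidate = sample_w * sample_h
    return positions * rotations * scales * comparisons_per_candidate


total = brute_force_operations(400, 400, 100, 100, rotations=60, scales=10)
print(f"{total:.2e}")  # 9.60e+11, i.e. close to a trillion
```

By contrast, the key point approach described next compares only a small number of descriptors.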

In addition, comparing the sample with the scene directly may give poor results because of noise, distortion, occlusion and background objects.

The claimed method proposes to select certain key points on the sample, together with small regions around them. A key point is a point with features that clearly distinguish it from the bulk of other points, for example line ends, small circles, sharp changes in illumination, corners, and so on. Assuming that the key points are always present on the sample, the search for the sample can be reduced to a search in the scene for the key points of the sample. And since key points differ strongly from the bulk of points, there are far fewer of them than points in the reference overall.

What matters is that there are not too many of them and that they are always present in the reference image. Small regions are selected around the points. The smaller the region, the less it is affected by large-scale distortions. For example, if the object as a whole is subject to perspective (the near edge of the object appears larger than the far edge), then for a small part of it perspective can be neglected and replaced by a change of scale. Similarly, a small rotation of the object about some axis can change the image of the object as a whole considerably, while small regions change only slightly. Moreover, if part of the object extends beyond the image border or is occluded, the small regions around some of the key points remain fully visible, which makes them easier to identify. And if the small regions lie entirely inside the object being sought, background objects do not affect them at all. On the other hand, the region around a key point must not be too small: very small regions carry too little information about the image and are more likely to match each other by chance.

FIG. 1 shows the key points of the image on the reference (right) and in the scene (left).

The proposed method solves two tasks: finding the key points of an image, and creating descriptors for them that are invariant to scale and rotation. This means that the description of a key point stays the same even if the sample is resized and rotated (here and below, rotation means rotation in the image plane only). The key point search itself must also be invariant, so that a rotated object in the scene has the same set of key points as the sample.

The method selects key points using the Hessian matrix. The determinant of the Hessian matrix (referred to below as the Hessian) reaches an extremum at points of maximum change in the brightness gradient. It detects blobs, corners and line ends well.
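The formula images are not reproduced in this text. For an image brightness function f(x, y), the Hessian matrix and its determinant have the standard form below; this is a reconstruction from the usual definition, with the numbering (1) and (2) taken from the description.

```latex
\begin{equation}
H(f(x, y)) =
\begin{pmatrix}
\dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y} \\[6pt]
\dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial y^2}
\end{pmatrix}
\tag{1}
\end{equation}

\begin{equation}
\det(H) = \frac{\partial^2 f}{\partial x^2}\,\frac{\partial^2 f}{\partial y^2}
- \left( \frac{\partial^2 f}{\partial x\,\partial y} \right)^{2}
\tag{2}
\end{equation}
```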

Formulas (1) and (2) above give the Hessian matrix and the Hessian (its determinant), which reaches a maximum at the point of maximum change in the brightness gradient.

The Hessian is invariant to rotation but not to scale. The method therefore uses filters of different scales to compute the Hessians. For each key point, the direction of maximum brightness change (the gradient) is computed, together with the scale, taken from the scale factor of the Hessian matrix. The gradient at a point is computed with Haar filters.

Once the key points are found, the method forms their descriptors. A descriptor is a set of 64 (or 128) numbers for each key point. These numbers reflect the fluctuations of the gradient around the key point (what is meant by fluctuation is explained below). Since a key point is a maximum of the Hessian, the neighborhood of the point is guaranteed to contain regions with different gradients. This ensures variance (distinctness) among the descriptors of different key points.

The gradient fluctuations in the neighborhood of a key point are computed relative to the direction of the gradient around the point as a whole (over the entire neighborhood of the key point). This makes the descriptor invariant to rotation. The size of the region over which the descriptor is computed is determined by the scale of the Hessian matrix, which gives scale invariance. The gradient fluctuations are also computed with a Haar filter.

The Hessian value is used to find local minima or maxima of image brightness; at these points the Hessian reaches an extremum. FIG. 3 shows that the key points (outlined with colored circles) are local extrema of image brightness. The small dots are not recognized as key points because of the threshold cutoff on the Hessian value.

FIG. 4 shows the ends of a line segment recognized as key points using the Hessian matrix.

In theory, computing the Hessian matrix reduces to computing the Laplacian of Gaussian. In essence, the elements of the Hessian matrix are computed as the convolution (sum of products) of the image pixels with the filters shown in FIG. 5.

FIG. 5 shows the discretized filters for the four elements of the Hessian matrix (the fourth coincides with the third, since the Hessian matrix is symmetric). The filters have a spatial scale of 9×9 pixels. Dark areas correspond to negative filter values, light areas to positive ones. However, the proposed method does not use the Laplacian of Gaussian in the form shown in FIG. 5. First, the discretized Laplacian of Gaussian shows a fairly large spread in the determinant values when the sample is rotated (ideally the Hessian should be invariant to rotation); the determinant "sags" especially around a rotation of 45 degrees. Second, and more importantly, the Laplacian of Gaussian filter is continuous in character: almost all of its pixels have different values, which rules out a fast computation scheme. A binarized approximation of the Laplacian of Gaussian (Fast-Hessian) is therefore used. FIG. 6 shows the binarized filters used to compute the Hessian matrix. White areas correspond to the value +1, black areas to -2 (-1 in the third filter), and gray areas to zero. The spatial scale is 9×9 pixels. This filter is more robust to rotation and can be computed efficiently. The Hessian is thus calculated as follows (3):
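Formula (3) is not reproduced in this text. Judging by the definitions that follow it (the convolutions Dxx, Dyy, Dxy and the coefficient 0.9), it presumably has the usual Fast-Hessian form, reconstructed here as an assumption:

```latex
\begin{equation}
\det(H_{\mathrm{approx}}) = D_{xx}\, D_{yy} - \left( 0.9\, D_{xy} \right)^{2}
\tag{3}
\end{equation}
```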

Here Dxx, Dyy and Dxy are the convolutions with the filters shown in FIG. 6. The coefficient 0.9 has a theoretical justification and compensates for the approximate nature of the computation.

So, to find the key points, the image points are examined and the maxima of the Hessian are determined. The method sets a threshold value for the Hessian. If the value computed for the point under examination exceeds the threshold, the point is treated as a key point candidate.

Note that since the Hessian is built from derivatives and depends only on brightness differences, not on the absolute brightness level, it is invariant to a shift in image brightness. A change in the illumination level of the sample therefore does not affect key point detection.
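The invariance to a brightness shift can be illustrated with a small sketch. It uses plain central finite differences rather than the box filters of the method, and the 5×5 test image is made-up data; the function name `hessian_det` is hypothetical.

```python
def hessian_det(img, x, y):
    """Determinant of the Hessian at (x, y) from central finite differences."""
    dxx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
    dyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    dxy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4
    return dxx * dyy - dxy * dxy

# A small bright blob on a dark background (made-up test data).
blob = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 1, 0],
    [0, 2, 8, 2, 0],
    [0, 1, 2, 1, 0],
    [0, 0, 0, 0, 0],
]
# The same image with a uniform brightness shift of +50.
brighter = [[v + 50 for v in row] for row in blob]

det_original = hessian_det(blob, 2, 2)
det_shifted = hessian_det(brighter, 2, 2)
```

The constant offset cancels in every difference, so both determinants are equal.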

In addition, the Hessian reaches a maximum both at a white blob on a black background and at a black blob on a white background. The method thus detects both dark and light image features (FIG. 7). However, the Hessian is not invariant to scale: for the same pixel, it can change when the filter scale changes. The only solution is to go through filters of different scales and apply them to the pixel in turn.

For reasons of symmetry and discretization, the size of the Fast-Hessian filter cannot be arbitrary. The permissible sizes (starting from the smallest) are 9, 15, 21, 27 and so on, in steps of 6. In practice, however, increasing the filter size by 6 each time is inefficient, because at large scales a step of 6 is too fine and the filters become redundant. For this reason (and some others), the whole set of scales is divided into so-called octaves. Each octave covers a certain range of scales and has its own characteristic filter size.

If there were only one filter per octave, the approximation would be too coarse. Moreover, it would be impossible to find local maxima of the Hessian across scales in different octaves, since one and the same point can have several local maxima of the Hessian at different scales. This is clearly seen in FIG. 8, which shows two key points of different scale at the same image location. Searching for the maximum over all Hessians at all scales would find only one of the maxima, whereas there may be several: one at one scale, another at a different scale.

Based on the above, an octave contains not one filter but four, which cover the characteristic scale of the octave well:

FIG. 9 shows the first three octaves. The numbers in the boxes give the size of the Fast-Hessian filter. The logarithmic axis at the bottom shows the scales covered by the octaves.

The filter size step is 6 in the first octave, 12 in the second, 24 in the third, and so on.
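The filter sizes per octave can be generated as in the sketch below. The step sizes (6, 12, 24) and four filters per octave come from the text; the starting size of each octave (9, 15, 27, ...) is an assumption, since FIG. 9 is not reproduced here, chosen so that size 27 occurs in the first three octaves as the description states. The function name is hypothetical.

```python
def octave_filter_sizes(n_octaves, filters_per_octave=4):
    """Return the Fast-Hessian filter sizes for each octave."""
    octaves = []
    for o in range(n_octaves):
        step = 6 * 2 ** o              # 6, 12, 24, ... as stated in the text
        start = 3 * 2 ** (o + 1) + 3   # 9, 15, 27, ... (assumed layout)
        octaves.append([start + i * step for i in range(filters_per_octave)])
    return octaves

sizes = octave_filter_sizes(3)

# List the sizes that appear for the first time in each octave.
seen = set()
new_sizes = []
for octave in sizes:
    new_sizes.append([s for s in octave if s not in seen])
    seen.update(octave)
```

Under this assumed layout, every octave after the first introduces only two sizes that were not already present in earlier octaves.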

As can be seen, the octaves overlap considerably, which makes finding local maxima more reliable. Why an octave has exactly four filters will become clear from the explanation below.

How many octaves are needed to cover key points of all scales? In theory the range of scales is unbounded, but in real images it is finite, and most key points fall in the range from 1 to 10 (according to an analysis of many images). Four octaves are enough to cover this range. One or two more octaves are added to cover large scales, so 5 to 6 octaves are used in total. In theory this is quite enough to cover all scales in a 1024×768 pixel image.

To find a local maximum of the Hessian, the so-called 3×3×3 neighbor method is used.

Its meaning is clear from FIG. 10, which illustrates the search for a local maximum. The pixel marked with a cross is considered a local maximum if its Hessian is greater than that of each of its neighbors at the same scale and of each neighbor at the scale below and the scale above (26 neighbors in total).
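The 26-neighbor test can be sketched as follows. The data layout is an assumption made for illustration: `stack[s][y][x]` holds the Hessian value at scale index `s` and pixel `(x, y)`, and the caller is expected to pass an interior point.

```python
def is_local_max(stack, s, y, x):
    """True if stack[s][y][x] exceeds all 26 neighbours in the 3x3x3 cube."""
    centre = stack[s][y][x]
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if ds == dy == dx == 0:
                    continue  # skip the centre itself
                if stack[s + ds][y + dy][x + dx] >= centre:
                    return False
    return True

# Made-up data: three scales of 3x3 Hessian values with a peak in the middle.
stack = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 5.0
found = is_local_max(stack, 1, 1, 1)
```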

It follows from this definition that an octave must contain at least three filters; otherwise it would be impossible to establish that a local maximum of the Hessian lies within the octave.

Note that the filters of an octave are not computed at every pixel. The first octave is computed at every second pixel of the image, the second at every fourth, the third at every eighth, and so on. The reasoning is that two points at a distance of 2 cannot contain more than one maximum of scale 2, 3 or higher, so there is no point in visiting every image point to find, say, a maximum of scale 3.

Doubling the pixel step from octave to octave also saves computation. Filter sizes repeat across octaves. For example, a filter of size 27 occurs in three octaves, but it is computed only for the first of them; the second and third simply reuse the results from the first. Doubling the pixel step guarantees that the points at which the Hessian is needed have already been computed by the previous octave.

Therefore, although an octave contains four filters, each octave except the first actually computes only the two sizes specific to it; the other two can always be taken from previous octaves. The first octave has to compute all four of its filters.

So, after finding the maximum Hessian by the 3×3×3 neighbor method, we have the pixel at which the maximum is reached. However, since an octave does not visit every image point, the true maximum may not coincide with this pixel but lie somewhere nearby, in neighboring pixels.

To find the true maximum, the Hessians of the 3×3×3 cube are interpolated with a quadratic function. The derivative is then computed (by finite differences over neighboring points). If it is close to zero, the point is the true maximum. If the derivative is large, the position is shifted in the direction that reduces it, and the iteration is repeated until the derivative falls below a given threshold. If the iterations move too far from the starting point, the maximum is considered false and the point is no longer treated as a key point.
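A one-dimensional simplification of a single refinement step is sketched below. The method itself works on the full 3×3×3 cube (two image coordinates plus scale) and iterates; this sketch only shows how finite differences of three neighboring samples locate the extremum of the fitted parabola. The function name and sample values are hypothetical.

```python
def refine_1d(f_prev, f_centre, f_next):
    """Offset of the extremum of a parabola through samples at -1, 0, +1."""
    first = (f_next - f_prev) / 2.0             # central first difference
    second = f_next - 2.0 * f_centre + f_prev   # second difference
    if second == 0:
        return 0.0                              # flat: nothing to refine
    return -first / second

# Samples of f(t) = -(t - 0.3)^2 at t = -1, 0, +1 (made-up test data).
offset = refine_1d(-1.69, -0.09, -0.49)
```

For this input the estimated offset is approximately 0.3, the true position of the peak. In a full implementation, a large offset would trigger a move to the neighboring sample and another iteration, as the text describes.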

For the key point descriptors described below to be invariant, the dominant orientation of the brightness changes at the key point must be determined. This notion is close to that of a gradient, but a slightly different algorithm is used to find the orientation vector.

First, point gradients are computed at the pixels neighboring the key point. The pixels considered lie within a circle of radius 6s around the key point, where s is the scale of the key point. For the first octave, points are taken from a neighborhood of radius 12.

The gradient is computed with a Haar filter. The filter size is 4s, where s is the scale of the key point. The Haar filters are shown in FIG. 11. Black areas have the value -1, white areas +1.

The Haar filters give the point value of the brightness difference along the X and Y axes respectively. Because Haar filters are rectangular, their values are easy to compute. The Haar wavelet values dX and dY for each point are multiplied by a weight and stored in an array. The weight is the value of a Gaussian centered at the key point with sigma equal to 2s. The Gaussian weighting suppresses random noise far from the key point.
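One common way to evaluate rectangular filters cheaply is an integral image (summed-area table). Its use here is an assumption; this passage only says that rectangular filters are easy to compute. The sign convention (right minus left, bottom minus top) and all function names are also assumptions, and the Gaussian weighting is omitted.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, x0, y0, x1, y1):
    """Sum of pixels with x0 <= x < x1 and y0 <= y < y1."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

def haar_xy(ii, cx, cy, size):
    """Haar responses (dX, dY) over a size x size window around (cx, cy)."""
    half = size // 2
    x0, y0, x1, y1 = cx - half, cy - half, cx + half, cy + half
    d_x = box_sum(ii, cx, y0, x1, y1) - box_sum(ii, x0, y0, cx, y1)  # right - left
    d_y = box_sum(ii, x0, cy, x1, y1) - box_sum(ii, x0, y0, x1, cy)  # bottom - top
    return d_x, d_y

# Made-up test image whose brightness grows along X only.
ramp = [[x for x in range(8)] for _ in range(8)]
d_x, d_y = haar_xy(integral_image(ramp), 4, 4, 4)
```

Each box sum takes four table lookups regardless of the filter size, which is why the filter cost does not grow with scale.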

Next, all the dX and dY values found are plotted as points on a plane, as shown in FIG. 12:

FIG. 12 shows all the gradients found as points in dX-dY space.

Next, an angular window of size π/3 (shown in gray in FIG. 12) is rotated about the origin. The window position is chosen for which the length of the sum vector of the points falling inside the window is greatest. The vector computed in this way is normalized and taken as the dominant direction in the region of the key point.
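The sliding-window search can be sketched as below. The window size π/3 is from the text; the number of window positions (72, a 5-degree step), the function name and the sample responses are assumptions. The responses are taken to be already Gaussian-weighted.

```python
import math

def dominant_orientation(responses, window=math.pi / 3, n_steps=72):
    """Unit vector of the largest summed response inside a sliding angular window."""
    best_len, best_vec = 0.0, (0.0, 0.0)
    for k in range(n_steps):
        start = 2 * math.pi * k / n_steps
        sx = sy = 0.0
        for dx, dy in responses:
            angle = math.atan2(dy, dx)
            # Keep points whose angle lies in [start, start + window), with wrap-around.
            if (angle - start) % (2 * math.pi) < window:
                sx += dx
                sy += dy
        length = math.hypot(sx, sy)
        if length > best_len:
            best_len, best_vec = length, (sx / length, sy / length)
    return best_vec

# Made-up responses: three near the +X direction plus one noise point.
responses = [(1.0, 0.1), (1.0, -0.1), (0.9, 0.0), (-0.2, 0.3)]
ox, oy = dominant_orientation(responses)
```

In this example the noise point never falls into the same window as the main cluster, so it does not pull the resulting direction away from the X axis.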

The window is needed to reduce the influence of noise points. FIG. 13 shows an example of the gradient for an ideal edge and for a noisy edge:

As can be seen, noise produces additional gradients in directions that do not coincide with the direction of the main gradient. Using the window cuts off these noise points and gives a more accurate estimate of the true gradient.

A key point descriptor is an array of 64 numbers (128 in the extended version) that identifies the key point. The descriptors of the same key point on the sample and in the scene should approximately match. The descriptor is computed in such a way that it does not depend on rotation or scale.

To compute the descriptor, a square region of size 20s is formed around the key point, where s is the scale at which the key point was found (FIG. 14). For the first octave the region is 40×40 pixels. The square is oriented along the dominant direction computed for the key point. The descriptor describes the gradient in 16 squares around the key point: the region is divided into 16 smaller squares, as shown in FIG. 14. In each square a regular 5×5 grid is taken, and the gradient at each grid point is computed with a Haar filter. The Haar filter size is 2s, which is 4×4 for the first octave.

Note that the image is not rotated when the Haar filter is computed; the filter is evaluated in the ordinary image coordinates. Instead, the resulting gradient components (dX, dY) are rotated by the angle corresponding to the orientation of the square.

In total, computing the descriptor of a key point requires 25 Haar filters in each of the 16 quadrants, that is, 400 Haar filters. Given that each filter takes 6 operations, a descriptor costs at least 2400 operations.

After the 25 point gradients of a quadrant have been found, four values are computed, which are the components of the descriptor:

Two of them are simply the summed gradient over the quadrant, and the other two are the sums of the absolute values of the point gradients.
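The formula itself is not reproduced in this text. One reading, with dX_n and dY_n denoting the 25 point gradients of a quadrant, is given below. The index n and the order of the components are assumptions, chosen to agree with the description of Fig. 15, where the second value alone is large for vertical stripes and the first two are large when brightness rises along the X axis.

```latex
\left( \sum_{n=1}^{25} dX_n,\quad \sum_{n=1}^{25} \lvert dX_n \rvert,\quad \sum_{n=1}^{25} dY_n,\quad \sum_{n=1}^{25} \lvert dY_n \rvert \right)
```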

Fig. 15 shows the behavior of these values for different image regions. For uniform regions, all values are close to zero. For repeating vertical stripes, all values except the second are close to zero. When brightness increases along the X axis, the first two components are large.

Four components per quadrant and 16 quadrants give 64 descriptor components for the whole key point region. When written into the array, the descriptor values are weighted by a Gaussian centered on the key point with sigma 3.3s. This makes the descriptor more robust to noise in areas far from the key point.
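The assembly of the 64 components can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the function name `assemble_descriptor` is hypothetical, the input is taken to be two 20×20 grids of already rotated gradient samples spaced s pixels apart, the Gaussian weight is applied per sample, and the per-quadrant order (sum dX, sum |dX|, sum dY, sum |dY|) is inferred from the description of Fig. 15.

```python
import math

def assemble_descriptor(dx, dy, s=1.0):
    """Build a 64-component descriptor from 20x20 grids of gradient samples.

    dx, dy: nested lists [row][col] of rotated gradient components (assumed input).
    s: scale of the key point; samples are assumed to be s pixels apart.
    """
    sigma = 3.3 * s  # Gaussian sigma stated in the description
    desc = []
    for qr in range(4):          # 4 x 4 = 16 quadrants
        for qc in range(4):
            sum_dx = sum_adx = sum_dy = sum_ady = 0.0
            for r in range(qr * 5, qr * 5 + 5):      # 5 x 5 samples each
                for c in range(qc * 5, qc * 5 + 5):
                    # Offset of the sample from the key point (grid center).
                    x = (c - 9.5) * s
                    y = (r - 9.5) * s
                    w = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
                    gx, gy = w * dx[r][c], w * dy[r][c]
                    sum_dx += gx
                    sum_adx += abs(gx)
                    sum_dy += gy
                    sum_ady += abs(gy)
            desc.extend([sum_dx, sum_adx, sum_dy, sum_ady])
    return desc

# Brightness rising along X: dX = 1 everywhere, dY = 0.
ramp_dx = [[1.0] * 20 for _ in range(20)]
zeros = [[0.0] * 20 for _ in range(20)]
d = assemble_descriptor(ramp_dx, zeros)
print(len(d))  # 64
```

For the brightness ramp, the first two components of every quadrant are equal and positive while the last two are zero, which matches the behavior described for Fig. 15.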

In addition to the descriptor, the sign of the trace of the Hessian matrix, sign(Dxx + Dyy), is used to describe the key point. For light points on a dark background the trace is negative; for dark points on a light background it is positive. This distinguishes light spots from dark ones.

Thus, for each key point the descriptor size is 4*16 = 64.

Given the descriptors of all key points of the reference and of all key points of the scene image, the closest point (by descriptor) in the scene image is determined for each point of the reference.

PE(i)(j) - the j-th descriptor element of the i-th key point of the reference.

PI(k)(j) - the j-th descriptor element of the k-th key point of the scene.

The proximity criterion for points is the minimum, over the points of the current image, of the sum of absolute differences between the descriptor values of a reference point and those of a current-image point.

For each i-th point of the reference, the point k in the current image is sought for which the sum of absolute differences of the descriptor values is minimal.
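In the notation PE(i)(j) and PI(k)(j) introduced above, this criterion can be written as follows. The formula is a reading of the verbal definition rather than a reproduction of the original; the upper limit 64 is the descriptor size stated earlier, and D_{ik} is the symbol used for the proximity criterion in the block diagram.

```latex
D_{ik} = \sum_{j=1}^{64} \bigl\lvert PE(i)(j) - PI(k)(j) \bigr\rvert,
\qquad
\min D_{ik} = \min_{k} D_{ik}
```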

In this way, the points in the scene image that correspond to the reference image are determined. If this sum is less than a given threshold, the point is taken as corresponding to the point in the reference image. The threshold is determined experimentally from an estimate of the mean of the sums of descriptor differences over all points of the reference and current images.

The points determined in this way in the scene image correspond to the points of the reference image.

The number of corresponding points must exceed half the number of reference points. If this condition is met, the object is considered recognized in the scene image. Fig. 16 shows an example of a reference for the scene image in Fig. 17.
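The matching and decision steps described above can be sketched in Python. This is a minimal illustration, not the patented implementation: the function names `match_keypoints` and `is_recognized` are hypothetical, descriptors are plain lists of numbers, and the threshold value in the example is arbitrary, since the description states only that it is found experimentally.

```python
def match_keypoints(ref_desc, cur_desc, threshold):
    """For each reference descriptor, find the current-image descriptor with
    the smallest sum of absolute differences; keep the pair if that sum is
    below the threshold. Returns a list of (i, k) index pairs."""
    matches = []
    for i, pe in enumerate(ref_desc):
        best_k, best_d = None, float("inf")
        for k, pi in enumerate(cur_desc):
            d = sum(abs(a - b) for a, b in zip(pe, pi))
            if d < best_d:
                best_k, best_d = k, d
        if best_k is not None and best_d < threshold:
            matches.append((i, best_k))
    return matches

def is_recognized(ref_desc, cur_desc, threshold):
    """The object counts as recognized when more than half of the
    reference key points found a correspondence."""
    return len(match_keypoints(ref_desc, cur_desc, threshold)) > len(ref_desc) / 2

# Toy 2-component descriptors (real ones have 64 components).
ref = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
cur = [[0.1, 0.0], [1.0, 1.1], [9.0, 9.0]]
print(match_keypoints(ref, cur, threshold=0.5))
print(is_recognized(ref, cur, threshold=0.5))
```

The exhaustive search costs one distance evaluation per pair of reference and current key points, which is acceptable for an illustration.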

Fig. 18 shows the key points found in the reference image. Fig. 19 shows the result of recognizing the object from the key points of the scene image.

Fig. 20 shows the block diagram of the algorithm for automatic recognition of objects in a scene image. The following abbreviations are used in the block diagram:

КТ (KP) - key point.

Nt - number of KPs in the current image.

Ne - number of KPs in the reference image.

Nsp - index of the current reference KP, for which the closest KP of the current image is sought.

dei - set of descriptor components of the i-th key point of the reference image.

dti - set of descriptor components of the i-th key point of the current image.

Dik - proximity criterion between the i-th key point of the reference and the k-th key point of the current image.

minDik - minimum value of the proximity criterion between the i-th reference KP and the k-th KP of the current image.

T - recognition threshold, determined experimentally from an estimate of the mean of the sums of absolute descriptor differences over all points of the reference and current images.

Claims

A method for automatically recognizing scenes and objects in an image, in which reference images are formed and stored and a stream of input images is processed to find an object of interest in them using the reference images, characterized in that key points and areas around them are selected on the reference image, the dimensions of the areas being chosen within the limits necessary to calculate the brightness gradients, the key points being points at which the maximum change in the brightness gradient occurs and which represent, for example, edges of lines, small circles, sharp changes in illumination, or corners, after which the input image is searched for key points corresponding to those of the reference image; the key points on an image are selected using the Hessian matrix by computing its determinants, called Hessians, which reach an extremum at the points of maximum change in the brightness gradient, the Hessians for each key point being determined using Haar filters of different scales; descriptors of the key points are then created in the form of numbers that represent fluctuations of the brightness gradient around the key point and are invariant to scale and rotation, the gradient fluctuations in the neighborhood of the key point being calculated relative to the direction of the gradient over the entire neighborhood of the key point; after that, for each key point of the reference image, the corresponding key point on the current image is determined on the basis of maximum proximity of the descriptors and, if the proximity value is less than a specified threshold, the corresponding key point on the current image is stored, the specified threshold being determined experimentally from an estimate of the average value of the sums of descriptor differences over all key points of the reference and current images; and when the number of corresponding key points on the current image exceeds half of the key points on the reference image, the scene object in the image is considered recognized.