RU2771745C1 - Method for tracking in real time anatomical references of an object

Info

Publication number: RU2771745C1
Application number: RU2021130751A
Authority: RU
Inventors: Александр Владимирович Колсанов; Сергей Сергеевич Чаплыгин; Сергей Викторович Ровнов; Сергей Сергеевич Денисов; Александр Евгеньевич Кузнецов
Filing date: 2021-10-21
Publication date: 2022-05-11

Abstract

FIELD: computer technology.

SUBSTANCE: the technical solution relates to the field of computer technology. The effect is achieved by a method for real-time tracking of the anatomical landmarks of an object in which, at the stage of creating the training data set for the object recognition neural network, a uniform background and lighting of the object are set, the object is photographed, the image of the object is cut out of the original image along its contour and placed on a randomly selected background, and the neural network is then trained on the data set augmented in this way. After that, to make the tracking module stable when objects are close to one another, their proportions are estimated and a weight coefficient is used for balancing; the moving object then changes its distance from the camera, moving closer or farther away, and only after that moves in the required direction.

EFFECT: improved accuracy of real-time pose determination, with high robustness to background and proportions.

1 cl

Description

The invention relates to the field of image-based motion recognition and analysis using measuring equipment. It can be used in any area that requires highly accurate determination of a person's pose, and to monitor the motor activity of an observed subject during rehabilitation, physical therapy, or training.

The invention is used in a system for tracking marker objects and providing feedback in augmented reality mobile devices, created as part of the program of the leading research center "Platform Technology of Virtual and Augmented Reality" of Samara State Medical University of the Ministry of Health of Russia.

Patent RU No. 2530334, with a priority date of 12 January 2010, discloses visual target tracking that consists of tracking a person's pose using a depth sensor, associating a model of the person in virtual space, and changing the position of the model based on changes in the person's pose.

The main disadvantage of this analogue (and of the entire line of Microsoft patents) is the need for additional specialized equipment in the form of a depth sensor, which ultimately makes the patented approach less accessible for solving pose determination tasks.

Patent RU No. 2635226, with a priority date of 25 May 2012, discloses a method for measuring a body by comparing a static image of the body with a database, using preliminary segmentation of the image and construction of a probability map.

The disadvantage of this analogue is that it cannot track the position of body parts and therefore cannot monitor complex poses. Although the approach yields anthropometric measurements of the body, applying it to the problem of human pose determination is considerably more difficult.

The closest in technical essence is the method for markerless capture of human motion known from patent RU No. 2534892, with a priority date of 8 April 2010. The method comprises:
detecting, from the input images, using a 2D body part detector unit, the locations of possible 2D body parts;
calculating, using a 3D lower body part computing unit, the 3D lower body parts from the detected locations of the possible 2D body parts;
calculating, using a 3D upper body part computing unit, the 3D upper body parts based on a body model; and
rendering, using a body model rendering unit, the body model in accordance with the computed 3D upper body parts,
wherein the rendered model is provided to the 2D body part detector unit, the 3D lower body parts are the parts whose range of motion is greater than a reference value among the possible 2D body parts, and the 3D upper body parts are the parts whose range of motion is less than the reference value among the possible 2D body parts.

That invention describes the use of several cameras, or a stereo camera, to bind a three-dimensional model to the pose of the person being recorded, with separate analysis of the upper and lower limbs. A variant that uses additional markers and tracks them is also described. The main disadvantage of that invention is likewise the need for several cameras or a stereo camera. This requirement complicates installation and lengthens the preparation needed before recording, which reduces the overall mobility of the solution.

The technical result of the claimed method for real-time tracking of the anatomical landmarks of an object is an expanded arsenal of tools that increase the accuracy of real-time pose determination with high robustness to background and proportions.

The claimed technical result is achieved by a method for real-time tracking of the anatomical landmarks of an object, comprising:
capturing one or more original color images of the whole object using a single digital camera;
pre-processing the original images of the object;
determining the absolute coordinates of the anatomical landmarks (marker points) of the object in the original images, taking into account cases in which some landmarks are occluded by the object's body or are absent, and taking into account the nature of the object's movement by using the absolute positions of its anatomical landmarks in neighboring frames;
determining additional information about the reliability with which the absolute coordinates of the anatomical landmarks were determined in the original images;
converting the absolute coordinates of the anatomical landmarks in the original images into coordinates in a virtual three-dimensional space, again taking into account occluded or absent landmarks and the nature of the movement by using the absolute positions of the landmarks in neighboring frames;
calculating the angles between the body parts of the object that define its pose, using the edges of a three-dimensional graph whose vertices are the points in the virtual three-dimensional space with the previously calculated coordinates;
comparing the calculated angles between body parts, the absolute coordinates of the anatomical landmarks in the image, and the relative coordinates of the anatomical landmarks in the virtual three-dimensional space with predefined reference values; and
generating a feedback signal proportional to the degree to which the estimated pose of the object matches a predefined reference pose.

At the stage of creating the training data set for the object recognition neural network, a uniform background and lighting of the object are set, the object is photographed, the image of the object is cut out of the original image along its contour and placed on a randomly selected background, and the neural network is then trained on the data set augmented in this way. After that, to make the tracking module stable when objects are close to one another, their proportions are estimated and a weight coefficient is used for balancing; the moving object then changes its distance from the camera, moving closer or farther away, and only after that moves in the required direction.

To improve the efficiency of detecting two-dimensional anatomical landmarks, the background should be uniform and free of complex objects such as bookshelves, clutter, tools, pictures on the wall, windows, etc. The lighting of the object, for example a person, should be adjusted so that the person in the frame is neither too dark nor overexposed. To make the algorithm more robust to background inhomogeneities, data augmentation with different backgrounds is applied at the stage of creating the training data set. This is done as follows: the image of the object is cut out of the original image along its contour and placed on a background randomly selected from a given database. A neural network trained on a data set augmented in this way becomes robust to background changes. The module that tracks anatomical landmarks does not depend on the background, but the module that recovers two-dimensional anatomical landmarks from the image does.
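
The background augmentation step described above can be illustrated with a minimal sketch. It is not the implementation used in the invention: the image representation (nested lists of RGB tuples), the binary contour mask, and the names `paste_on_background`, `augment`, and `solid` are all assumptions introduced here for illustration, and the description does not say how the contour is obtained.

```python
import random
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[int]]  # assumed format: 1 = pixel inside the object contour, 0 = outside


def paste_on_background(subject: Image, mask: Mask, background: Image) -> Image:
    """Keep subject pixels inside the contour mask; use background pixels elsewhere."""
    if not (len(subject) == len(mask) == len(background)):
        raise ValueError("subject, mask and background must have the same height")
    result: Image = []
    for s_row, m_row, b_row in zip(subject, mask, background):
        if not (len(s_row) == len(m_row) == len(b_row)):
            raise ValueError("subject, mask and background must have the same width")
        result.append([s if m else b for s, m, b in zip(s_row, m_row, b_row)])
    return result


def augment(subject: Image, mask: Mask, backgrounds: Sequence[Image],
            rng: random.Random) -> Image:
    """Place the cut-out object on a background drawn at random from the pool."""
    return paste_on_background(subject, mask, rng.choice(backgrounds))


def solid(color: Pixel, height: int, width: int) -> Image:
    """Hypothetical helper for the demo: a single-color image."""
    return [[color] * width for _ in range(height)]


# Tiny 2x3 demo with made-up data standing in for a photograph and its contour.
subject = solid((200, 150, 120), 2, 3)
mask = [[0, 1, 0], [1, 1, 0]]
pool = [solid((0, 0, 0), 2, 3), solid((255, 255, 255), 2, 3)]
sample = augment(subject, mask, pool, random.Random(0))
```

In this sketch every masked pixel of `sample` comes from the photographed object and every other pixel comes from whichever background was drawn, so repeated calls with different random states yield training images that differ only in background.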

To make the tracking module stable when objects are close to one another, their proportions are estimated and a weight coefficient is used for balancing. The tracking module is more stable when the objects it recognizes are at different distances from the camera. To reduce the likelihood of tracking the wrong object, a moving person should first step closer to or farther from the camera and only then move in the required direction.

Thus, the invention is implemented through: additional heuristics that improve the accuracy of recognizing the key points (anatomical landmarks) of the object while ensuring high detection speed; sequential application of several artificial neural networks to correct the point coordinates; analysis of the kinematics of movements and determination of the relative depth of the points; and the use, for tracking, of a method that obtains frames containing a moving object, detects the object in the frames, determines the coordinates of the object's key points in a three-dimensional coordinate system, and determines, by means of an artificial neural network, the coordinates of the key points in the three-dimensional coordinate system based on the location and motion parameters of the object determined in the previous steps.

The hardware implementation of the invention includes an electronic computing device with at least 16 GB of RAM, a graphics processor with at least 6 GB of video memory, and a video camera with a resolution of at least 1280x720 pixels and a frame rate of at least 25 frames per second. The server process of the software implementation of the method runs on the electronic computing device, receives images from the video camera, and provides a programming interface for access from other applications and devices. The neural networks are implemented using the TensorFlow library, version 2.2.0 or later, and perform their computations in real time on the graphics processor.
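
The steps of calculating angles from the edges of the three-dimensional landmark graph and generating a feedback signal can be sketched as follows. This is only an illustration under stated assumptions: the description does not give the angle formula or the form of the feedback signal, so the dot-product angle, the linear score, the `tolerance_deg` parameter, and the landmark names are all hypothetical.

```python
import math
from typing import Dict, Tuple

Point3D = Tuple[float, float, float]


def joint_angle(a: Point3D, joint: Point3D, b: Point3D) -> float:
    """Angle in degrees at `joint` between the graph edges joint->a and joint->b."""
    u = [a[i] - joint[i] for i in range(3)]
    v = [b[i] - joint[i] for i in range(3)]
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("an edge has zero length, so the angle is undefined")
    cosine = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def pose_feedback(angles: Dict[str, float], reference: Dict[str, float],
                  tolerance_deg: float = 45.0) -> float:
    """Assumed scoring rule: 1.0 for a perfect match, falling linearly to 0.0
    once the mean absolute angle deviation reaches `tolerance_deg`."""
    deviations = [abs(angles[name] - reference[name]) for name in reference]
    mean_deviation = sum(deviations) / len(deviations)
    return max(0.0, 1.0 - mean_deviation / tolerance_deg)


# Hypothetical landmarks in the virtual 3D space (arbitrary units).
landmarks: Dict[str, Point3D] = {
    "shoulder": (0.0, 0.0, 0.0),
    "elbow": (1.0, 0.0, 0.0),
    "wrist": (1.0, 1.0, 0.0),
}
measured = {
    "elbow": joint_angle(landmarks["shoulder"], landmarks["elbow"], landmarks["wrist"]),
}
score = pose_feedback(measured, {"elbow": 90.0})
```

A linear score is only one way to make the signal proportional to the degree of match; a real system would also have to handle occluded or missing landmarks and the reliability information mentioned in the description, which this sketch omits.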

Claims

A method for real-time tracking of the anatomical landmarks of an object, comprising: capturing one or more original color images of the whole object using a single digital camera; pre-processing the original images of the object; determining the absolute coordinates of the anatomical landmarks (marker points) of the object in the original images, taking into account cases in which some landmarks are occluded by the object's body or are absent, and taking into account the nature of the object's movement by using the absolute positions of its anatomical landmarks in neighboring frames; determining additional information about the reliability with which the absolute coordinates of the anatomical landmarks were determined in the original images; converting the absolute coordinates of the anatomical landmarks in the original images into coordinates in a virtual three-dimensional space, taking into account cases in which some landmarks are occluded by the object's body or are absent, and taking into account the nature of the object's movement by using the absolute positions of its anatomical landmarks in neighboring frames; calculating the angles between the body parts of the object that define its pose, using the edges of a three-dimensional graph whose vertices are points in the virtual three-dimensional space with the previously calculated coordinates; comparing the calculated angles between the body parts of the object, the absolute coordinates of the anatomical landmarks in the image, and the relative coordinates of the anatomical landmarks in the virtual three-dimensional space with predefined reference values; and generating a feedback signal proportional to the degree to which the estimated pose of the object matches a predefined reference pose, characterized in that, at the stage of creating the training data set for the object recognition neural network, a uniform background and lighting of the object are set, the object is photographed, the image of the object is cut out of the original image along its contour and placed on a randomly selected background, and the neural network is then trained on the data set augmented in this way, after which, to make the tracking module stable when objects are close to one another, their proportions are estimated and a weight coefficient is used for balancing, and the moving object then changes its distance from the camera, moving closer or farther away, after which it moves in the required direction.