RU2801426C1

RU2801426C1 - Method and system for real-time recognition and analysis of user movements

Info

Publication number: RU2801426C1
Application number: RU2022124564A
Authority: RU
Inventors: Эмиль Юрьевич Большаков
Original assignee: Эмиль Юрьевич Большаков
Filing date: 2022-09-18
Publication date: 2023-08-08

Abstract

FIELD: computer technology.

SUBSTANCE: a video stream is captured and passed to an open-source neural network, which returns the three-dimensional positions of the key points of the user's pose in each submitted frame; these poses make up the exercise movement being evaluated. The data obtained are recorded frame by frame into an associative array, one entry per user pose, and the array is written to a file. Next, the key point positions of each user pose are evaluated in turn, in real time, against the poses of a reference movement obtained from a recording of a professional's movements. To do this, the set of key points of the reference poses is positioned according to the user's real-time position and angle of rotation relative to the video stream source, and the positions of these key points are then compared frame by frame at the specific stage of the movement (exercise) performed by the user.

EFFECT: improved accuracy of recording and recognizing the pose of the user being evaluated.

13 cl, 4 dwg

Description

The group of claimed inventions relates to computer technology, in particular to computer vision and to tools for distance learning, correspondence learning, practical skills training (demonstration), simulator-based training, etc.

A solution is known from patent CN 108597578. It comprises a person, a video stream source, a computing device and an algorithm for recognizing a person's pose in two-dimensional space, and includes a method for estimating the pose in two-dimensional space based on a matrix of key points of the human body. Its disadvantages are inaccurate pose estimation, since it relies on key point positions in only two dimensions, and the inability to record reference movements from a person.

A solution is known from patent EP 2815341. It comprises a person (user), a fitness device, a computing device and an algorithm for evaluating human movements, and includes a method for evaluating a person's movement relative to the fitness device described in that patent. Its disadvantages are the absence of computer vision technology and the impossibility of evaluating human movements without a specialized fitness device.

A solution is known from patent CN 104021573. It comprises a person, a computing device, inertial sensors and an algorithm for evaluating human movements, and includes a method for evaluating movements based on the angles between key points of the human body, obtained from inertial sensors placed on the body. Its disadvantages are the inability to obtain the spatial coordinates of the key points of the human body and the absence of an algorithm for teaching movements based on this method.

The closest analogue (prototype) of the claimed technical solutions is international publication WO 201261804 A1. In essence, that system evaluates an exercise movement in two-dimensional space based on the angles between a person's key points, determined empirically, using two or more video stream sources and, in some cases, a specialized fitness bracelet and sneakers equipped with sensors.

The prototype WO 201261804 A1 has the following disadvantages, which are significant in comparison with the claimed technical solution:

- the use of two or more video stream sources: the prototype description states that the computer can use a single image to determine a person's position in space, but not the spatial position of individual body parts. It goes on to state that the computer can process two or more images taken simultaneously from different angles, and only then can it determine the spatial positions of individual body parts;

- evaluation of images and of the person's pose in two-dimensional space: the prototype description states that the computer can divide the image into sections and evaluate them. An example of such sections is shown in Figure 14 of the international publication; these sections are represented in two-dimensional (2D) space. In Figures 15, 16A and 16B of that publication, the evaluation is based on the positions of points of the corresponding body parts in two-dimensional space. The possibility of adding a third dimension and evaluating a set of angles/points in three dimensions (3D) is neither considered nor mentioned in the prototype, nor is it illustrated in the figures;

- absence of a reference image/movement: the prototype description gives no information on how data on the correctness of an exercise movement, from a sporting standpoint, are obtained. Nor does it state where the exact angle values in degrees and their ranges come from. It is logical to conclude that they are obtained empirically, that is, the information is relatively subjective and does not lend itself to systematization or to an adequate assessment of the sport. The same applies to the "stages of the exercise". The prototype has no recording of a reference movement and no corresponding algorithm;

- insufficient positioning accuracy: Figure 5 of the above international publication shows a human body depicted in two planes, front view and side view. For each plane there are 12 evaluation sectors, i.e. 12 points located in one plane. Given the capabilities described above, at any one moment the prototype system can evaluate only one plane, that is, a pose of 12 sectors (points) in two-dimensional (2D) space. The claimed solution evaluates 15 or more points.

Disclosure of invention

The problem addressed by the group of claimed inventions is to create a method and a system, including at least one of the claimed information processing units, that make it possible, using a computer and/or mobile device and a camera and/or video recording/video stream (optionally transmitted over an Internet connection), to record and reproduce the key points of a user's movements and to compare, in real time, the pose of the user being evaluated with an existing recording of the poses of a professional's reference movement, while providing the speed and accuracy required for professional assessment, so that recommendations for correcting the movement can be issued in real time.

Implementing the group of claimed inventions achieves the following technical results, which are common to all solutions in the group:

- full or partial automation of the process of recording the user's movement;

- reduction of the time needed to prepare for recording the user's movements, since the person being recorded does not need to put on specialized suits/sensors and the room does not need to be equipped with a set of sensors/cameras;

- improved accuracy of recording and recognizing the pose of the user being evaluated, because the pose is evaluated from data on the positions of the key points of the user's body, including in three dimensions;

- saving of the poses of the movement of the user being evaluated in a universal text format, for subsequent reproduction by graphic interpreters;

- full and/or partial automation of the real-time evaluation of a movement performed by the user being evaluated, against a recording of the reference movement of the exercise performed by a professional;

- adequate quality of professional evaluation of the movement of the user being evaluated, in real time, against a recording of the reference movement of the exercise;

- recognition and evaluation of movements whose video stream comes from low-power devices in open spaces, for example outdoors, thanks to the client-server variant of the system, which transmits the video stream, including over an Internet connection;

- full and/or partial automation of the process of reproducing from a file the movement of the user being evaluated.

The following features of the inventions in the group contribute to achieving the claimed technical results.

The method for real-time recognition and analysis of the movements of a user being evaluated is characterized in that a system is first created that includes a video stream source, a user, a computer with persistent data storage (computer memory), an Internet connection, an image display device, an open-source neural network for obtaining the three-dimensional positions of the key points of the user's pose in a submitted frame of the video stream, a module for processing interaction with the neural network, a module for processing the transfer of poses to the graphics space, a module for recording to a file the poses of the reference movement of an exercise performed by a professional, a module for reproducing the poses of the reference movement from a file, and a module for real-time three-dimensional evaluation of the movements of the user being evaluated against the array of poses of the reference movement. Using this system, the video stream is captured and passed to the open-source neural network to obtain, in three dimensions, the positions of the key points of the pose in the submitted frame; these poses make up the user's exercise movement being evaluated. The data obtained are then recorded frame by frame into an associative array, one entry per user pose, and the array is written to a file. Next, the key point positions of each pose of the user being evaluated are assessed in turn, in real time, against the poses of the professional's reference movement, which are obtained from a recording of a professional's movements in the area of interest. For this, the set of key points of the reference poses is positioned according to the real-time position and angle of rotation of the user relative to the video stream source, and the positions of the corresponding key points are then compared frame by frame at the specific stage of the movement being performed by the user. Finally, the results of evaluating the movements of the exercise performed by the user are presented via the image display device by means of sound signals and/or voice synthesis and/or graphical display.
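The positioning and comparison step described above can be illustrated with a minimal Python sketch. It is not the patented algorithm: it assumes that the y axis is vertical, that the user's rotation relative to the camera is a single known yaw angle, that the user's position is given as one 3D offset, and that a deviation is flagged with a fixed distance tolerance. The names `place_reference`, `compare_pose` and `tolerance` are hypothetical.

```python
import math


def place_reference(reference, user_origin, yaw_rad):
    """Rotate reference key points about the vertical (y) axis by yaw_rad,
    then translate them to the user's position (assumed convention)."""
    cos_a, sin_a = math.cos(yaw_rad), math.sin(yaw_rad)
    ox, oy, oz = user_origin
    placed = []
    for x, y, z in reference:
        rx = cos_a * x + sin_a * z    # rotated x
        rz = -sin_a * x + cos_a * z   # rotated z
        placed.append((rx + ox, y + oy, rz + oz))
    return placed


def compare_pose(user, placed_reference, tolerance):
    """Return the indices of key points whose Euclidean distance from the
    placed reference exceeds the tolerance (hypothetical criterion)."""
    return [
        i
        for i, (u, r) in enumerate(zip(user, placed_reference))
        if math.dist(u, r) > tolerance
    ]


if __name__ == "__main__":
    reference = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    placed = place_reference(reference, (2.0, 0.0, 3.0), math.pi / 2)
    user = [(2.0, 0.0, 2.0), (2.0, 1.0, 3.5)]
    print(compare_pose(user, placed, tolerance=0.1))
```

Running the comparison once per frame, against the reference pose for the current stage of the exercise, corresponds to the frame-by-frame evaluation described in the text.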

The system for real-time recognition and analysis of user movements, claimed as part of the group of inventions, is likewise aimed at achieving the technical results stated above.

The essence of the system for real-time recognition and analysis of human movements is as follows. The system includes a video stream source, a user, a computer with persistent data storage (computer memory), a network connection, an image display device, a server computer with an open-source neural network and a module for processing interaction with the neural network, a module for processing the transfer of poses to the graphics space, a module for recording to a file the poses of a professional's reference movements, a module for reproducing the poses of a professional's reference movements from a file, and a module for real-time evaluation of the movements of the user being evaluated against the array of poses of the professional's reference movement (Fig. 1).

List of figures

The implementation of the invention is illustrated by the following figures.

Fig. 1 shows the main steps of the method and the operating principle of the system for real-time recognition and analysis of user movements.

Fig. 2 illustrates an example of the structure of the key points and the positions of the vectors of a single user pose.

Fig. 3 is a schematic representation of the positions a person must take when a movement is recorded.

Fig. 4 schematically shows an example of the operation of the algorithm that fits the data of a recorded pose of the professional's reference movement to the data of the pose of the user's movement being evaluated in real time.

The method for recognizing and analyzing user movements, and the operating principle of the system itself for real-time recognition and analysis of the movements of a user, in particular a person, are illustrated in Fig. 1 and are as follows. First, a system 100 for recognizing and analyzing user movements is created. It includes a user 101; a video stream source 102; a server computer 103 containing an open-source neural network 104 for obtaining the positions of the key points of the user's pose in a submitted frame of the video stream, and a module 105 for processing interaction with the neural network; a network connection 106, for example the Internet, over which the video signal is transmitted to the server computer 103; a computer 107 with persistent data storage (computer memory); a module 108 for processing the transfer of poses to the graphics space; a module 109 for recording to a file the poses of the reference movement of an exercise performed by a professional; a module 110 for reproducing the poses of the reference movement from a file; a module 111 for real-time evaluation of the movements of the user being evaluated against the array of poses of the reference movement; and an image display device 112.

The user 101 may be not only a person but also another living being or, for example, a robotic humanoid. The video stream source 102 may be a camera, including a webcam, a video stream received from the Internet, or a pre-recorded video from a camera. The computer 107 may be a personal computer, laptop, terminal, portable device, smartphone, etc. that includes persistent data storage (computer memory) for storing and accessing the files of poses of reference movements. The server computer 103 may be a personal computer, laptop, terminal, portable device or smartphone, or equally a supercomputer, cloud service or dedicated server. The image display device 112 may be a display, monitor, television, the screen of a mobile device, terminal or laptop, a projector, etc.

Затем осуществляют запуск системы 100. Объект - пользователь 101 осуществляет двигательную активность, например, физические упражнения перед камерой 102. Далее осуществляют захват и передачу видеопотока в нейронную сеть с открытым исходным кодом 104 для получения данных о положении ключевых точек позы, составляющей оцениваемое движение упражнения пользователя, в трех измерениях в поданном кадре видеопотока. Видеопоток с камеры 102 со скоростью примерно 30 изображений в секунду, которая зависит от мощности ЭВМ 107 и/или скорости сетевого соединения 106 поступает на модуль обработки передачи позы в графическое пространство 108, который в свою очередь запускает работу модуля обработки взаимодействия с нейронной сетью 105, где осуществляется преобразование изображения в формат «numpy array» - подходящий для восприятия формат, и далее это изображение в формате «numpy array» отправляется в качестве входных данных в нейронную сеть с открытым исходным кодом 104. Модуль обработки передачи позы в графическое пространство 108 и модуль обработки взаимодействия с нейронной сетью 105 также могут взаимодействовать, например, через «websocket» по интернет-соединению. Модуль обработки взаимодействия с нейронной сетью 105 заранее должен быть активирован на компьютере-сервере (host PC) 103, так как модуль обработки передачи позы в графическое пространство 108 не запускает модуль обработки взаимодействия с нейросетью 105. Далее осуществляют последующую запись кадр за кадром полученных данных по очереди в ассоциативный массив для каждой позы пользователя и запись такого массива в файл в трех измерениях в поданном кадре видеопотока. В качестве выходных данных от нейронной сети 104 модуль обработки взаимодействия с нейросетью 105 получает массив - набор значений о положении в трехмерном пространстве 15 ключевых точек (200-215) позы пользователя 101, например, человека (см. Фиг. 2). 
Поза, полученная от нейронной сети 104, структурируется и преобразуется в строку в формате «JSON» и отправляет такую строку (выходные данные) через websocket-соединение в ответ на каждый кадр, присланный через сеть Интернет 106 модулем обработки передачи позы в графическое пространство 108. Одна поза представляет собой положение относительно центра отсчета координат (0,0,0) 15-ти ключевых точек (200-215) позы человека по трем координатам (x,y,z), обеспечивая распознавание позы в трехмерном формате (см. Фиг. 2). Таким образом, одна поза представляет собой 45 записей-переменных в формате «JSON». Данные с видеопотока - изображение, ранее поданное на вход нейронной сети 104 в формате «numpy array», кодируется в массив байт, который, преобразуется в строку в формате «base64string» и отправляет такую строку на компьютер-сервер 103, посредством заранее активированного на нем модуля обработки взаимодействия с нейронной сетью 105 с помощью функции «websocket.send». Структура общей строки, которая создается в Python-процессе, то есть каждый распознаваемый кадр выглядит следующим образом: "JSON | base64strmg", или "{dot0: {х:0.12, у:0.45, z:-0.31}, dot1:{х:0.4,…} | 4sddfg43dh7ignf7gqwdposc3pjn…". Далее эта строка отправляется в оперативную память ЭВМ с помощью функции «stdout.buffer.write», и далее буфер оперативной памяти очищается. Модуль обработки передачи позы в графическое пространство 108 считывает данные из оперативной памяти ЭВМ по мере подачи их запущенным модулем обработки взаимодействия с нейронной сетью 105. Передача данных через оперативную память ЭВМ значительно быстрее передачи данных с помощью записи/считывания данных из файла, что ускоряет процесс передачи точек в графическую часть и облегчает нагрузку на процессор ЭВМ.Then the system 100 is launched. The user object 101 performs physical activity, for example, physical exercises in front of the camera 102. 
Next, the video stream is captured and transmitted to the open source neural network 104 to obtain data on the position of the key points of the pose that constitutes the estimated movement of the user's exercise , in three dimensions in the given frame of the video stream. The video stream from the camera 102 at a rate of approximately 30 images per second, which depends on the power of the computer 107 and/or the speed of the network connection 106, enters the processing module for transmitting poses to the graphics space 108, which in turn starts the processing module for processing interaction with the neural network 105, where the image is converted into a "numpy array" format - a perceptually appropriate format, and then this image in the "numpy array" format is sent as input to the open source neural network 104. The pose transfer processing module to the graphics space 108 and the module neural network interaction processing 105 may also communicate, for example, via a "websocket" over an internet connection. The neural network interaction processing module 105 must be activated in advance on the server computer (host PC) 103, since the pose transfer processing module to the graphics space 108 does not start the neural network interaction processing module 105. Next, the received data is recorded frame by frame according to queuing into an associative array for each user pose and writing such an array to a file in three dimensions in the given frame of the video stream. As an output from the neural network 104, the neural network interaction processing module 105 receives an array - a set of values about the position in three-dimensional space of 15 key points (200-215) of the posture of the user 101, for example, a person (see Fig. 2). 
The pose received from the neural network 104 is structured and converted into a string in "JSON" format, and this string (the output data) is sent via a websocket connection in response to each frame sent over the Internet 106 by the module for processing the transfer of poses to the graphics space 108. One pose represents the positions, relative to the coordinate origin (0,0,0), of 15 key points (200-215) of the person's pose in three coordinates (x,y,z), which provides pose recognition in three dimensions (see Fig. 2). Thus, one pose amounts to 45 variable entries in "JSON" format. The data from the video stream, that is, the image previously fed to the input of the neural network 104 in "numpy array" format, is encoded into an array of bytes, which is converted into a string in "base64string" format; this string is sent to the server computer 103 through the module for processing interaction with the neural network 105 activated on it in advance, using the "websocket.send" function. The combined string created in the Python process for each recognized frame has the structure "JSON | base64string", for example "{dot0: {x:0.12, y:0.45, z:-0.31}, dot1:{x:0.4,…} | 4sddfg43dh7ignf7gqwdposc3pjn…". This string is then sent to the main memory of the computer using the "stdout.buffer.write" function, after which the memory buffer is cleared. The module for processing the transfer of poses to the graphics space 108 reads the data from the main memory of the computer as it is supplied by the running module for processing interaction with the neural network 105. Transferring data through the main memory of the computer is considerably faster than writing it to and reading it from a file, which speeds up the transfer of points to the graphics part and reduces the load on the computer's processor.
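
The construction of the combined per-frame string described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the helper names `build_frame_message`, `write_frame` and `make_demo_pose` are hypothetical, the exact delimiter `" | "` and the one-frame-per-line framing are assumptions, and valid JSON (quoted keys) is used in place of the abbreviated notation shown in the text.

```python
import base64
import json
import sys

SEPARATOR = " | "  # assumed delimiter; the text shows "JSON | base64string"


def build_frame_message(pose: dict, image_bytes: bytes) -> str:
    """Join the JSON pose and the base64-encoded image into one string."""
    pose_json = json.dumps(pose)
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    return pose_json + SEPARATOR + image_b64


def write_frame(message: str, stream=None) -> None:
    """Write one frame to a binary stream (stdout by default) and flush it.

    The trailing newline is an assumed way to mark the end of a frame.
    """
    out = stream if stream is not None else sys.stdout.buffer
    out.write(message.encode("utf-8") + b"\n")
    out.flush()


def make_demo_pose() -> dict:
    """Placeholder pose: 15 key points with 3 coordinates each (45 values)."""
    return {f"dot{i}": {"x": 0.0, "y": 0.0, "z": 0.0} for i in range(15)}
```

A reader process attached to the writer's standard output can then consume one frame per line without any intermediate file, which matches the in-memory transfer the description prefers over file I/O.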

The module for recording a professional's reference exercise movements to a file 109 remains in tracking mode until the user or developer needs to record the movement of a new exercise. The module for evaluating exercise movements in real time in three-dimensional (3D) format against the reference recording 111 starts working as soon as a person selects a menu item, for example, "Do fitness", and the system 100 begins to evaluate the performance of the exercises.

The module for processing the transfer of poses to the graphics space 108 also deserializes the data after receiving the string from the module for processing interaction with the neural network 105: it extracts the values from the JSON part of the string and writes them to variables, and then normalizes them by multiplying some of them by coefficients (-1; 1.5; -1.5), so that the points are correctly arranged in the modules that process (evaluate and/or record) movements and the key points are correctly mirrored relative to the user in the graphics component. The string data in "base64string" format is decoded, converted back into an image, and displayed in the graphics component. In this way the video stream and the points are displayed in the graphics space (Fig. 2). The graphics component is handled, for example, by the Unity3d engine. Further, in the module for recording a professional's reference exercise movements to a file 109, the module for reproducing recorded reference movements from a file 110, and the module for evaluating exercise movements in real time in three-dimensional format against the reference recording 111, one more point 216 and two vectors 217 and 218 are calculated and added, based on the positions of the points obtained from the neural network (Fig. 2).
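
The deserialization and normalization step can be illustrated with a short Python sketch. It is only an assumption-laden example: the function names are hypothetical, the input is assumed to be valid JSON, and the mapping of the coefficients (-1; 1.5; -1.5) to particular axes is invented for illustration, since the description does not state which values are multiplied by which coefficient.

```python
import base64
import json

# Hypothetical per-axis scale factors. The text lists the coefficients
# (-1; 1.5; -1.5) but does not say which coordinate each one applies to.
AXIS_SCALE = {"x": -1.0, "y": 1.5, "z": -1.5}


def parse_frame_message(message: str):
    """Split "JSON | base64string" into a pose dict and raw image bytes."""
    # The base64 alphabet never contains "|", so the last "|" is the delimiter.
    pose_json, _, image_b64 = message.rpartition("|")
    pose = json.loads(pose_json)
    image_bytes = base64.b64decode(image_b64.strip())
    return pose, image_bytes


def normalize_pose(pose: dict, scale: dict = AXIS_SCALE) -> dict:
    """Multiply each coordinate by the factor assumed for its axis."""
    return {
        name: {axis: float(value) * scale[axis] for axis, value in point.items()}
        for name, point in pose.items()
    }
```

A negative factor flips the sign of an axis, which is one plausible way to obtain the mirrored display mentioned in the description.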

The calculated point 216 is located midway between the right and left hip points and serves to determine the position of the center of the pelvis. The first vector 217 is located at the point corresponding to the right hip 203, is directed toward the left hip 202, and serves to determine the angle, in degrees, of the rotation of the body in space relative to the camera 102. The second vector 218 is located at the point corresponding to the right shoulder 213 and serves to determine the angle, in degrees, of the rotation of the "bust" in space relative to the camera 102.
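
The derived point and the hip vector lend themselves to a small worked example. The Python sketch below is illustrative only: the function names are hypothetical, points are plain (x, y, z) tuples, and the axis convention (x across the image, z along the camera axis) is an assumption that the description does not confirm. The direction of the shoulder vector 218 is not stated in the text, so only the hip vector is shown.

```python
import math


def midpoint(a, b):
    """Point halfway between two 3D points, e.g. the pelvis center 216."""
    return tuple((p + q) / 2.0 for p, q in zip(a, b))


def vector(origin, target):
    """Vector from origin to target, e.g. right hip 203 to left hip 202."""
    return tuple(t - o for o, t in zip(origin, target))


def yaw_degrees(v):
    """Angle of v in the horizontal x-z plane, in degrees.

    Assumes x runs across the image and z points along the camera axis,
    so a vector parallel to the image plane gives 0 degrees.
    """
    return math.degrees(math.atan2(v[2], v[0]))
```

With these assumptions, a hip vector of (1, 0, 1) corresponds to a body turned by about 45 degrees relative to the camera.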

The module for recording a professional's reference exercise movements to a file 109 receives the position values of 15 or more points in real time, which makes it possible to set the required length of the array of poses. When the isRecord variable is activated, it writes the 15 or more points (200-215) of the person's pose to the array of poses in each new frame received from the camera 102. When recording ends, the script converts the array of poses into a JSON format string "{pose0: {dot0: {x: '0.46', y: '0.2', z: '0.65'}, …}, pose1:{dot0:{… }, … }, … }" and writes it to a text file. One exercise is recorded per recording pass, and one file stores the set of reference pose data for one exercise. As shown in Fig. 3, one movement (exercise) is recorded as follows: after the "record" button is pressed, the professional performs one repetition of the exercise in each of 3 positions in front of the camera: at 45 degrees with the right side to the camera (Fig. 3(a)), facing the camera (Fig. 3(b)), and at 45 degrees with the left side to the camera (Fig. 3(c)). In the figure, the left shoulder is labeled 302a, 302b, 302c and the right shoulder is labeled 303a, 303b, 303c, respectively. The recording is then stopped with the "stop recording" button, and the array of movement poses is written to a file in the memory of the computer 107. After the exercise movements have been recorded, the frame numbers in the recording (the array of poses) that mark the beginning and end of the exercise repetition in each position of the professional in front of the camera 102 (Fig. 3 (a, b, c)) must be specified in the recording module. The module composes an array from them, converts it into a string in the format "{0,12,36,50,67,89,106}" and saves it to a text file in the memory of the computer 107. Thus, by means of the recording module, two text files are obtained per exercise: one with the movement recorded in 3 positions relative to the camera 102, and one with the frame numbers of the beginning and end of the movement in each of these 3 positions (example: movel.txt and movel_mp.txt).

The module for reproducing recorded reference movements from a file 110 takes the text file with the recorded array of poses in JSON format from the memory of the computer 107, deserializes the JSON string, and writes the result into an empty array of the same structure. The second text file, with the numbers of the start and end frames of the poses in the 3 positions, is taken and deserialized in the same way. Then, when a variable is activated (a button is pressed), this code assigns the positions of the points from the read file to the physical points on the screen, in turn, frame by frame. The position of a pose in the array is its frame number at recording time. In this way the movement is reproduced.
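
The two-file layout for one exercise can be sketched as below. This is a hedged Python illustration rather than the actual modules 109 and 110: `save_reference` and `load_reference` are hypothetical names, the pose file is written as valid JSON (quoted keys), and the frame-number file uses the brace-delimited format "{0,12,36,50,67,89,106}" quoted in the description, which is parsed by hand because it is not JSON.

```python
import json


def save_reference(poses, frame_marks, pose_path, marks_path):
    """Write the pose array and the start/end frame numbers to two text files."""
    # Keys pose0, pose1, ... follow the naming shown in the description.
    pose_dict = {f"pose{i}": pose for i, pose in enumerate(poses)}
    with open(pose_path, "w", encoding="utf-8") as f:
        f.write(json.dumps(pose_dict))
    with open(marks_path, "w", encoding="utf-8") as f:
        f.write("{" + ",".join(str(n) for n in frame_marks) + "}")


def load_reference(pose_path, marks_path):
    """Read both files back; the list index of a pose is its frame number."""
    with open(pose_path, encoding="utf-8") as f:
        pose_dict = json.load(f)
    poses = [pose_dict[f"pose{i}"] for i in range(len(pose_dict))]
    with open(marks_path, encoding="utf-8") as f:
        text = f.read().strip().strip("{}")
    frame_marks = [int(n) for n in text.split(",")] if text else []
    return poses, frame_marks
```

Playback then amounts to iterating over the loaded list and assigning each pose's point positions to the on-screen points, one frame at a time.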

The module for evaluating exercise movements in real time in 3D format against the reference recording 111 obtains data on the person's key points in real time in three-dimensional space 401 and also takes the previously recorded array of poses 402, using the module for reproducing recorded reference movements from a file 110 (Fig. 4). Next, the evaluation module 111 estimates the position of the person 101 in space, estimates the person's rotation relative to the camera 102, and substitutes the recording of the given movement 403, 404 that corresponds to the person's angle of rotation (if the person stands at roughly 45 degrees with the right side to the camera, it substitutes the reference recorded at 45 degrees with the same side to the camera and adjusts it to the more precise actual angle of rotation). Then point 216 and the two vectors 217 and 218 are calculated both for the reproduced array and for the pose recognized in real time, and the positions of the corresponding points and vectors of the reference and of the pose at the current stage of the movement are compared, taking into account a variable tolerance, which sets the permissible difference between the movement of the person 101 and the reference movement in order to allow for differences in height and body proportions. Based on this comparison, the algorithm shows the erroneous positions of the limbs of the user 101 on the display 112 by displaying the corresponding points in red.
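
The selection of a reference view and the tolerance-based comparison can be sketched as follows. This Python example is a simplified assumption, not the evaluation module 111 itself: the names `REFERENCE_VIEWS`, `pick_view` and `find_wrong_points` are hypothetical, the sign convention for the angles is invented, poses are dictionaries mapping point names to (x, y, z) tuples, and the comparison uses plain Euclidean distance per key point with a single tolerance value. The adjustment of the reference to the exact measured angle is omitted.

```python
import math

# Assumed sign convention: negative yaw means the right side is toward the camera.
REFERENCE_VIEWS = {"right_45": -45.0, "front": 0.0, "left_45": 45.0}


def pick_view(yaw_deg, views=REFERENCE_VIEWS):
    """Name of the recorded view whose angle is closest to the measured yaw."""
    return min(views, key=lambda name: abs(views[name] - yaw_deg))


def find_wrong_points(user_pose, reference_pose, tolerance):
    """Names of key points farther than `tolerance` from the reference."""
    wrong = []
    for name, ref_point in reference_pose.items():
        if name not in user_pose:
            continue  # skip points that were not detected in this frame
        if math.dist(user_pose[name], ref_point) > tolerance:
            wrong.append(name)
    return wrong
```

The returned names are the points a display layer would color red; a larger tolerance accommodates users whose height and proportions differ more from the professional's.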

Thus, practical implementation of the system and method described above makes it possible to record the movements of a person, a humanoid robotic device, or an image of a person, and to reproduce and evaluate the user's movements in real time (in 3D format) against a previously recorded reference movement of a professional.

Claims

1. A method for recognizing and analyzing user movements in real time, characterized by the fact that

- first, a system is created that includes a video stream source, a user, a computer containing a permanent data storage (computer memory), a network connection, an image display device, a neural network for obtaining data on the three-dimensional positions of the key points of the user's pose in the given frame of the video stream, a module for processing interaction with the neural network, a module for processing the transfer of poses to the graphics space, a module for recording a professional's reference exercise movements to a file, a module for reproducing recorded reference movements from a file, and a module for evaluating user movements in real time in three-dimensional format in comparison with an array of poses of a professional's reference movement;

- after that, the video stream is captured and transmitted to the neural network to obtain data on the three-dimensional positions of the key points of the pose that makes up the evaluated movement of the user's exercise in the submitted frame of the video stream, after which the received data is recorded frame by frame, in turn, into an associative array for each user pose and that array is written to a file, wherein the transmission of the video stream to the neural network, the subsequent frame-by-frame recording of the received data into an associative array, and the writing of this array to the file are carried out by encoding the image into an array of bytes, which is converted into a string in base64string format, and by forming an associative array of the key points of the user's pose received from the neural network, and the structure of the combined string, which is created in the module for processing the transfer of poses to the graphics space, has the format “JSON | base64string”;

- when a reference array of poses of a professional's movement is written to a file, the position values of 15 (fifteen) or more key points of the pose of one reference movement of the professional are obtained in real time in each new frame, the reference array consisting of a set of associative arrays of these key points of the professional's pose in each new frame received from the source of the video stream, which makes it possible to set the required length of the array of poses, after which the array of poses is converted into a JSON string “{pose0: {dot0: {x: '0.46', y: '0.2', z: '0.65'}, …}, pose1: {dot0: {…}, …}, …}” and this string is written to a text file, such that one movement is recorded in one recording stage and one file stores the reference array of poses of one movement;

- moreover, one reference movement is recorded as follows: first, the recording is activated and the professional performs one repetition of the reference exercise in front of the source of the video stream, for example, a camera, in each of three positions: at 45 degrees with the right side to the source of the video stream, directly facing the source of the video stream, and at 45 degrees with the left side to the source of the video stream, while the professional's left and right shoulders are indicated on the image display device; after that, the recording is stopped and the array of poses of the reference movement is saved to a file in the computer's memory; after that, the frames in the resulting recording are numbered, creating an array of indexes of the start and end poses of the movement repetition in each position of the professional in front of the source of the video stream, and this array is converted into a string in the format “{0,12,36,50,67,89,106}” and saved to a text file in the computer's memory, as a result of which two text files are obtained for one movement: a file with the reference array of poses, containing a set of associative arrays of the key points of the poses of the movement repetition in three positions relative to the source of the video stream, and a file with the array of indexes of the start and end poses of the reference movement for each of the three positions of the professional relative to the source of the video stream; to reproduce the movement from the file with the reference array of poses, the text file with the recorded reference array of poses in JSON format is first read from the computer's memory, the JSON string is deserialized and installed in the system as the reference array of poses, then the second text file, containing the string with the array of indexes of the start and end poses in three positions, is read in the same way and its data is likewise deserialized and installed in the system, after which the positions of the key points from the array of poses are assigned to the points on the display device of the computer, in turn, frame by frame, such that the index of a pose in the array of poses corresponds to its frame number at recording time;

- in addition, when an associative array of a pose that makes up the professional's reference exercise movement is written to a file, when such an array is played back from a file, and when the exercise movements are evaluated in three-dimensional format in real time against the recorded reference movement of the professional, one more point and two vectors are calculated and added for the reference movement of the professional and for the evaluated movement of the user, based on the data on the positions of the key points received from the neural network, wherein the calculated and added point is located midway between the points of the person's right and left hips and serves to determine the position of the center of the person's pelvis; the first calculated and added vector is located at the key point corresponding to the right hip, is directed toward the left hip, and serves to determine the angle, in degrees, of the rotation of the person's body in space relative to the source of the video stream; and the second calculated and added vector is located at the key point corresponding to the right shoulder and serves to determine the angle, in degrees, of the rotation of the user's body in space relative to the source of the video stream;

- further, the positions of the key points are evaluated in turn for each pose of the user in real time against the data of the reference exercise movements obtained from the recording of the professional's movements, by positioning the set of key points of the poses of the reference movement in accordance with the user's real-time position and angle of rotation relative to the source of the video stream, followed by a frame-by-frame comparison of the positions of the corresponding key points of the user's movement and of the professional's reference movement at a specific stage of the movement;

- then the results of the evaluation of the exercise movements performed by the user are presented via the image display device, by means of sound signals and/or voice synthesis and/or graphic display;

- at the same time, interaction with the neural network is carried out by obtaining the data on the key points of the user's pose from the string in JSON format and writing it to variables, and then normalizing the data by multiplying those values that require correction of the displayed proportions of the recognized key points of the user's movements (depending on the error of the neural network, the quality of the video stream, and differences in the height and body proportions of the compared subjects) by the coefficients of the dynamic permissible error (-1; 1.5; -1.5), for the correct arrangement of the key points in the system modules that process the user's movement data and for the correct mirroring of the key points relative to the user in the graphics component, while the received base64string is decoded, converted back into an image, and displayed on the image display device.

2. The method according to claim 1, characterized in that the image stream from the source of the video stream using a computer is transmitted via an Internet connection to a server computer that contains a running neural network and feeds into it the image stream received via an Internet connection, and the server computer sends an associative array of user poses as a string in JSON format back to the computer that sends the image stream to the server computer.

3. The method according to claim 1, characterized in that the presentation of the system's results through sound signals, voice synthesis and graphic display includes, among other things, recommendations for correcting the execution of the movement, the number of times the evaluated subject has repeated the movement, and an indication of the correct/incorrect positions of the key points of the evaluated subject's pose compared with the reference array of poses at the given stage of the movement.

4. The method according to claim 1, characterized in that the image display device displays the erroneous positions of the user's limbs by marking the corresponding key points with a color different from the rest of the correct positions, for example, in red.

5. A system for real-time recognition and analysis of user movements, including a video stream source, a user, a computer containing a permanent data storage, a network connection, an image display device, an open source neural network, a module for processing interaction with the neural network, a module for processing the transfer of poses to the graphics space, a module for recording a professional's reference exercise movements to a file, a module for reproducing recorded reference movements from a file, and a module for evaluating user movements in real time in comparison with a reference array of poses, wherein the module for recording reference exercise movements to a file records one reference movement of the professional per recording stage and makes it possible to set the required length of the array of poses by obtaining, from the start of recording, the position values of 15 (fifteen) or more key points of the reference movement in real time in each new frame received from the source of the video stream, after which the module converts the array of poses into a JSON format string “{pose0: {dot0: {x: '0.46', y: '0.2', z: '0.65'}, …}, pose1: {dot0: {…}, …}, …}” and writes this string to a text file at the end of recording, such that one movement is recorded in one recording stage and one file stores the reference array of poses of one movement.

6. The system according to claim 5, characterized in that the module for reproducing recorded reference movements from a file deserializes from the file the reference array of poses of the movement recorded from the professional, and deserializes the array of indexes of the beginning and end of the professional's reference movement in each of the 3 positions relative to the video stream source.

7. The system according to claim 5, characterized in that the module for evaluating user movements in real time in comparison with the reference array of poses of the professional's movement substitutes the professional's reference movement reproduced from the file according to the user's current position relative to the source of the video stream, and evaluates and displays the erroneous/correct positions of the key points of the user's pose during the movement.

8. The system according to claim 5, characterized in that the module for processing interaction with the neural network receives from the neural network an associative array with data on the positions, for example, in three-dimensional space, of 15 (fifteen) key points of the user, and structures and converts this array into a string in JSON format.

9. The system according to claim 5, characterized in that, for example, a camera, a webcam, a video stream from the Internet, a pre-recorded video from a camera can be used as a source of a video stream.

10. The system according to claim 5, characterized in that, for example, a personal computer, a laptop, a tablet, a terminal, a portable device, a smartphone can be used as a computer.

11. The system according to claim 5, characterized in that, for example, a personal computer, a server computer, a supercomputer, a cloud service, a dedicated server, a laptop, a terminal, a tablet, a portable device, a mobile device, or a smartphone can be used as a server computer.

12. The system according to claim 5, characterized in that, for example, a display, monitor, TV, screen of a mobile device, terminal, laptop, tablet, portable device, projector can be used as an image display device.

13. The system according to claim 5, characterized in that the movements of, for example, a professional in the field of interest, a person, a robotic humanoid device can be used as a reference movement.