RU2618389C2

RU2618389C2 - Method for contactless controlling mouse cursor

Info

Publication number: RU2618389C2
Application number: RU2015124282A
Authority: RU
Inventors: Алексей Анатольевич Карпов; Андрей Леонидович Ронжин
Priority date: 2015-06-22
Filing date: 2015-06-22
Publication date: 2017-05-03
Also published as: RU2015124282A

Abstract

FIELD: physics.

SUBSTANCE: a method for contactless control of a mouse cursor is proposed. According to the method, the location of a region of interest is determined on a convex shape. The convex shape is a person's head, and the region of interest is the area of the face between the eyebrows and the lower lip. A digitized image of the convex shape is obtained with a video camera. After the stored brightness pattern of the region of interest has been saved as a matrix, the locations of five reference points are determined within that stored pattern, and a plurality of digitized video images is then registered. The two-dimensional coordinates of the reference points are determined in each of the video images by comparing the matrix of the stored brightness pattern of the region of interest with the matrix of the registered brightness pattern of the region of interest, and these coordinates are then used to control the mouse cursor.

EFFECT: increased robustness of tracking the movement of the user's head, achieved by raising the throughput of work with the system and reducing the rate of object selection errors.

3 cl, 1 tbl, 7 dwg

Description

The invention relates to the field of artificial intelligence, specifically to contactless human-machine interaction. It can be used to control a computer or other technical devices operated with a mouse-type manipulator (hereinafter, the mouse), in particular devices intended for people with impaired motor functions of the hands. For such people there are assistive technologies for contactless interaction with a computer that track meaningful movements (gestures) of the head or other parts of the body.

To track the movements of the user's head, various hardware solutions can be used in which the user wears a special device on the head (a helmet, virtual reality glasses, or a special structure with reflective markers). For example, the American company NaturalPoint (www.naturalpoint.com/smartnav) produces SmartNav devices that implement the functions of a contactless mouse. The system consists of an infrared transceiver and several reflective markers that must be attached to the user's face or to a special cap. Another American company, InterSense (www.intersense.com), produces InterTrax hardware trackers for virtual reality helmets. Such a device contains a microminiature gyroscope that tracks the position and orientation of the head in three-dimensional space.

In addition, special devices with LEDs (and batteries) that are tracked by an infrared video camera can be used for this task, for example the KAU-09-1 kit for assistive computer control (http://www.fatum-spb.ru/razrabotki-dlya-invalidov.html), as can devices with colored reference (control) target points mounted on a special helmet worn on the head. Another analogue is the Shlemomysh hardware system (Krichevets, A. Shlemomysh // Computerra, No. 434, 2002. - pp. 48-51. - Available at: www.computerra.ru/offline/2002/434/16588), which uses a special target on a helmet worn on the user's head. The reference points on such devices are usually tracked with an infrared or optical video camera. However, both users and psychophysiologists report that people are unwilling to use head-worn or body-worn hardware for human-machine interaction. Such devices significantly reduce the naturalness of interaction and the user's mobility because of their wires, cables, and batteries for autonomous operation, their overall bulk, and the technical difficulty of calibrating and installing them. Moreover, people without hands cannot put such a device on their own head, so they need outside help in any case.

There are also cases in which, as a result of illness, a person's neck is paralyzed in addition to the hands, so that head gestures cannot be used to control the mouse cursor (pointer) on the computer screen. An eye-tracking system can be used to address this problem. Such analogues (Russian Federation patent 2522848 dated 20.07.2014; the Eyegaze System (http://www.eyegaze.com) from LC Technologies; Tinto Garcia-Moreno, F. Eye Gaze Tracking System Visual Mouse Application Development // Technical Report, Ecole Nationale Superiere de Physique de Strasbourg (ENSPS) and School of Computer Science, Queen's University Belfast, 2001. - 77 p.) allow the user to point at target objects or select menu items of the computer's graphical interface by gaze. Their use is complicated by the need for expensive high-speed, high-definition digital video cameras (with high optical resolution), since the eye region is small and difficult to recognize. There are also variants in which the video camera is placed directly in front of the eyes on a special helmet worn by the user (http://neurobotics.ru/products/eye_tracking). However, cognitive studies show that gaze tracking is considerably worse for cursor control than tracking head movements or gestures in terms of throughput, emotional load on the user, usability, and similar measures.

Other analogues are known (Russian Federation patent 2401629 dated 20.10.2010; Russian Federation patent 2542369 dated 20.02.2015; Agranovsky, A.V. Hardware and software tools for designing virtual acoustic objects and scenes for blind users of personal computers / A.V. Agranovsky, G.E. Evreinov, A.S. Yashkin // Proceedings of the IX International Conference and Exhibition "Information Technologies in Education". - Moscow, 1999), in which the mouse is controlled with the feet instead of the hands, with a manipulator placed in the oral cavity, or with a special tactile manipulator that operates by shifting the centre of mass of the human body. The common drawbacks of these analogues are low throughput, poor usability, and a high emotional load.

The closest in technical essence to the claimed method, and the one selected as the prototype, is a method for tracking the location of a moving three-dimensional convex shape (surface) with a video camera (patent US 6925122 B2 dated 02.08.2005), comprising the following steps:

step a: the location of a region of interest on the convex shape is determined; the location is selected from the group consisting of the point on the convex shape closest to the video camera and the point on the convex shape closest to a fixed point in space; the location is able to move over the convex shape as the convex shape changes its position and orientation in space;

step b: a digitized video image of the convex shape in the vicinity of the region of interest is saved; the size of the vicinity is determined by the surface area of the convex shape that has constant spherical curvature; the digitized video image has a brightness pattern (template), called the stored brightness pattern of the region of interest, which is saved as a matrix;

step c: the location of a reference point is determined, selected from the group consisting of the centre of the stored brightness pattern of the region of interest and a location within the stored brightness pattern of the region of interest;

step d: a plurality of digitized video images is registered; each of these video images contains a video image of the convex shape, called the registered brightness pattern of the region of interest; the registered brightness pattern is brought to the same size as the stored brightness pattern of the region of interest and, for each of the video images, is registered as a matrix;

step e: the matrix of the stored brightness pattern of the region of interest is compared with the matrix of the registered brightness pattern of the region of interest for each of the video images, by pixel-by-pixel comparison or correlation analysis, in order to determine the two-dimensional coordinates of the reference point in each of the video images to within one pixel; and

step g: the two-dimensional coordinates of the reference point in each of the video images are used as the information needed to control the computer;

wherein the convex shape is the shape of the tip of the nose on a face, and step e additionally comprises defining, in each of the video images, a search window within which the comparison is performed, the search window being selected from the group consisting of:

a) a square region with a side length from a quarter of the width of the face to the full width of the face, centred on the location of the tip of the nose in the previous video image of the plurality of video images, if that location is known;

b) a rectangular region determined using automated face detection technology;

c) the entire image area of the video frame.

The drawback of the prototype method is the low robustness of tracking the movement of the user's head, which results from analysing only the position of the tip of the nose in the video images.
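
The comparison in step e of the prototype, restricted to search-window option (a), can be illustrated with a short Python sketch. It is not taken from the patent: the function names `square_search_window` and `match_template`, the `scale` parameter, and the choice of zero-mean normalized cross-correlation as the "correlation analysis" are assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def square_search_window(prev_xy, face_width, frame_shape, scale=0.5):
    """Option (a): square window centred on the previous location.

    The side length is scale * face_width, with scale between 0.25 and 1.0
    (a quarter of the face width up to the full face width).
    Returns (x0, y0, x1, y1) with exclusive upper bounds, clipped to the frame.
    """
    if not 0.25 <= scale <= 1.0:
        raise ValueError("scale must lie between 0.25 and 1.0")
    height, width = frame_shape
    half = int(round(scale * face_width / 2))
    x, y = prev_xy
    return (max(0, x - half), max(0, y - half),
            min(width, x + half + 1), min(height, y + half + 1))


def match_template(frame, template, window):
    """Slide the stored brightness matrix over the window, one pixel at a time.

    Returns ((x, y), score): the centre of the best-matching patch and its
    zero-mean normalized cross-correlation score. The location is None if
    the template does not fit inside the window.
    """
    x0, y0, x1, y1 = window
    th, tw = template.shape
    t = template.astype(float) - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best_score, best_xy = -np.inf, None
    for y in range(y0, y1 - th + 1):
        for x in range(x0, x1 - tw + 1):
            patch = frame[y:y + th, x:x + tw].astype(float)
            p = patch - patch.mean()
            denom = t_norm * np.sqrt((p ** 2).sum())
            if denom == 0:
                continue  # flat patch or flat template: correlation undefined
            score = (t * p).sum() / denom
            if score > best_score:
                best_score, best_xy = score, (x + tw // 2, y + th // 2)
    return best_xy, best_score
```

A tracker built this way would call `match_template` once per frame and pass the returned centre to `square_search_window` for the next frame; options (b) and (c) would only change how `window` is produced.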

Robustness is customarily assessed with the methodology of the international standard ISO 9241-9:2000 "Requirements for non-keyboard input devices", which is based on experiments and laws developed in the mid-20th century by the American cognitive psychologist Paul Morris Fitts and later extended by other researchers [Soukoreff, R.W. Towards a standard for pointing device evaluation, perspectives on 27 years of Fitts' law research in HCI / R.W. Soukoreff, I.S. MacKenzie // Int. Journal of Human Computer Studies, Vol. 61, No. 6, 2004. - pp. 751-789].

The methodology is as follows. Using the pointing device provided, users must select as quickly as possible a set of target objects that appear one after another on the screen in a circular layout. The order of the targets is chosen so that the user successively selects the objects located farthest from each other, moving the pointer in different directions [Schapira, E. Experimental evaluation of vision and speech based multimodal interfaces / E. Schapira, R. Sharma // In Proc. Workshop on Perceptive User Interfaces PUI, USA, 2001. - pp. 1-9]. The index of difficulty ID of the task, measured in bits, is calculated according to the Shannon formula [Carbini, S. Evaluation of contact-less multimodal pointing devices / S. Carbini, J.E. Viallet // In Proc. 2-nd IASTED International Conference on Human-Computer Interaction, Chamonix, France, 2006. - pp. 226-231]:

$$ID = \log_2\left(\frac{D}{W} + 1\right)$$

where D is the distance between the centres of the targets (the diameter of the circle) and W is the diameter of a circular target in screen pixels. According to Fitts's law, the movement time MT between targets depends linearly on the index of difficulty ID of the task. However, the coordinates of the point at which a target is actually selected depend on both the actual distance between the points and the actual diameter of the targets (the smaller the target, the harder it is to hit its centre). The effective index of difficulty is therefore expressed as:

$$ID_e = \log_2\left(\frac{D_e}{W_e} + 1\right)$$

where D_e is the actual distance between the points at which the targets were clicked and W_e is the effective target diameter, which is customarily calculated from the entropy of a normal distribution:

$$W_e = 4.133\,\sigma$$

where σ is the standard deviation of the coordinates of the selection point projected onto the axis that connects the centres of the start and end targets.

The resulting ID_e values differ from the ID values and reflect more accurately how well the user performed the test task. According to the Fitts methodology, the main measure of robustness is the throughput TP of work with the system, which reflects the trade-off between the movement (task completion) time MT and the accuracy of target selection:

$$TP = \frac{ID_e}{MT}$$
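
The quantities used in this methodology can be computed with a few lines of Python. The sketch below assumes the standard forms ID = log2(D/W + 1), W_e = 4.133·σ and TP = ID_e/MT from the Fitts literature; the function names are hypothetical, and the use of the sample standard deviation for σ is an assumption, since the text does not say which estimator is used.

```python
import math
import statistics


def index_of_difficulty(d, w):
    """Shannon formulation of the index of difficulty, in bits."""
    return math.log2(d / w + 1)


def effective_width(projected_coords):
    """Effective target diameter W_e = 4.133 * sigma.

    projected_coords are the selection points projected onto the axis that
    joins the start and end targets (sample standard deviation assumed).
    """
    return 4.133 * statistics.stdev(projected_coords)


def throughput(d_e, projected_coords, mt):
    """Throughput TP = ID_e / MT, in bits per second when mt is in seconds."""
    w_e = effective_width(projected_coords)
    id_e = index_of_difficulty(d_e, w_e)
    return id_e / mt
```

For instance, targets 300 pixels apart with a diameter of 100 pixels give `index_of_difficulty(300, 100)`, which is log2(4) = 2 bits.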

The objective of the invention is to develop a method for contactless control of a mouse cursor that increases the robustness of tracking the movement of the user's head by raising the throughput of work with the system and reducing the rate of object selection errors.

In the claimed method, this objective is achieved as follows. The method for contactless control of a mouse cursor comprises: determining the location of a region of interest on a convex shape; saving a digitized video image of the convex shape in the vicinity of the region of interest, the digitized video image having a brightness pattern, called the stored brightness pattern of the region of interest, which is saved as a matrix; registering a plurality of digitized video images, each of which contains a video image of the convex shape, called the registered brightness pattern of the region of interest, which is brought to the same size as the stored brightness pattern of the region of interest and, for each of the video images, is registered as a matrix; and comparing the matrix of the stored brightness pattern of the region of interest with the matrix of the registered brightness pattern of the region of interest for each of the video images by pixel-by-pixel comparison or correlation analysis. The search window within which the comparison is performed in each of the video images is selected from the group consisting of: a square region with a side length from a quarter of the face width to the full face width, centred on the location of the tip of the nose in the previous video image of the plurality of video images, if that location is known; a rectangular region determined using automated face detection technology; or the entire image area of the video frame. In addition, the convex shape is taken to be a person's head, and the region of interest is the area of the face between the eyebrows and the lower lip.

Before the location of the region of interest on the convex shape is determined, a digitized image of the convex shape is obtained with a video camera. After the stored brightness pattern of the region of interest has been saved as a matrix, the locations of five reference points are determined within the stored brightness pattern, and a plurality of digitized video images is then registered. From the comparison of the matrix of the stored brightness pattern of the region of interest with the matrix of the registered brightness pattern of the region of interest, the two-dimensional coordinates of the reference points are determined in each of the video images. The two-dimensional coordinates of the reference points are then used to control the mouse cursor.

The reference points on the face are the centre of the upper lip, the tip of the nose, the point between the eyes on the bridge of the nose, the pupil of the right eye, and the pupil of the left eye of the user.

In addition, if one of the reference points is lost, it is restored from the two-dimensional coordinates of the two points that remain in the rectangular region uniting the triples of reference points.
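
One way such a recovery could work is sketched below in Python. The text here does not give the reconstruction formula, so this is only an illustration under an explicit assumption: the head motion between the stored pattern and the current frame is approximated by a 2D similarity transform (translation, rotation, uniform scale), which is fully determined by the two surviving points of a triple. The function name `recover_point` and its parameters are hypothetical.

```python
def recover_point(ref_a, ref_b, ref_lost, cur_a, cur_b):
    """Estimate the current position of a lost reference point.

    ref_a, ref_b, ref_lost: (x, y) positions of the three points of a triple
        in the stored brightness pattern.
    cur_a, cur_b: (x, y) positions of the two surviving points in the
        current video image.
    Assumes (not stated in the source) that the triple moves as a rigid
    figure up to uniform scale, i.e. by a 2D similarity transform.
    """
    # Represent points as complex numbers so that rotation and uniform
    # scaling reduce to a single complex multiplication.
    a, b, p = complex(*ref_a), complex(*ref_b), complex(*ref_lost)
    qa, qb = complex(*cur_a), complex(*cur_b)
    if a == b:
        raise ValueError("the two surviving reference points must differ")
    s = (qb - qa) / (b - a)   # rotation and scale of the transform
    q = qa + (p - a) * s      # apply the transform to the lost point
    return q.real, q.imag
```

For example, if the bridge of the nose and the centre of the upper lip are still tracked, the tip of the nose can be re-seeded at the returned position and then refined by the usual brightness-pattern comparison.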

The new set of essential features makes it possible to achieve the stated technical result through:

- the use of five natural reference points on the face, which form two perpendicular lines;

- the ability to adjust the positions of the tracked reference points to account for differences in the facial proportions of different people;

- the restoration of reference points in the region of interest if one of them is lost.

The analysis of the prior art established that there are no analogues characterized by a set of features identical to all the features of the claimed method for contactless control of a mouse cursor. Therefore, the claimed invention meets the patentability condition of "novelty".

A search for known solutions in this and related fields, aimed at identifying features that coincide with those distinguishing the claimed object from the prototype, showed that these features do not follow explicitly from the prior art. Nor does the prior art disclose that the transformations provided for by the essential features of the claimed invention are known to contribute to the stated technical result. The claimed invention therefore satisfies the patentability condition of "inventive step".

The claimed invention is illustrated by the following drawings:

- Fig. 1, a flowchart of the sequence of actions implementing the proposed method;

- Fig. 2, the system of five reference points on a person's face;

- Fig. 3, a graph of the user's head movement speed across video frames during contactless selection of targets on the screen;

- Fig. 4, the layout and order of targets on the screen for experiments following the Fitts methodology (a) and the cursor trajectory while the task is performed with head gestures (b);

- Fig. 5, the relationship between the actual difficulty IDe and the theoretical difficulty ID of the task;

- Fig. 6, the results of analyzing the movement time MT from one target to the next while users perform the test task;

- Fig. 7, the results of analyzing the performance TP under the Fitts methodology when the test scenario is performed with the developed method.

The claimed method is implemented as follows (Fig. 1).

In block 101, a digitized image of a convex shape is obtained with a video camera; the convex shape is the head of a person (the user) with impaired motor functions of the hands who controls a computer or other technical devices.

In block 102, the location of the region of interest on the convex shape is determined; the region of interest is the area of the person's face between the eyebrows and the lower lip.

The region of interest can be located, for example, with the AdaBoost method [Vezhnevets, A. Boosting: Strengthening Simple Classifiers / A. Vezhnevets, V. Vezhnevets // Computer Graphics and Multimedia. Issue 4(2), 2006. - Available at: http://cgm.computergraphics.ru/content/view/112] based on the Viola-Jones algorithm [Viola, P. Rapid Object Detection using a Boosted Cascade of Simple Features / P. Viola, M. Jones // In Proc. 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition CVPR-2001, 2001. - pp. 511-515]. The image is scanned with a window of a given size, and a pyramid of copies of the objects is built. The pyramid is analyzed by pre-trained Haar cascades, which find the image regions that match the given visual model [Lienhart, R. An Extended Set of Haar-like Features for Rapid Object Detection / R. Lienhart, J. Maydt // In Proc. IEEE International Conference on Image Processing ICIP'2002, Rochester, New York, USA, 2002. - pp. 900-903]. The face detection method finds rectangular regions in the video frames that contain a human face with high probability. The size of such a region is constrained to be at least 220×250 pixels (at a video frame resolution of 640×480 pixels), so that only a single face located sufficiently close to the camera is captured; this constraint also speeds up processing of the video stream. These video processing methods are implemented in the OpenCV computer vision library [Bradski, G. Learning OpenCV / G. Bradski, A. Kaehler // O'Reilly Publisher, 2008. - 571 p.] and are used in modified form in this method.
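The size constraint on detected face regions can be illustrated with a short sketch. It assumes that a face detector has already returned candidate rectangles; the `pick_face` helper, the tuple layout, and the rule of keeping the largest qualifying rectangle are illustrative assumptions, not part of the described method. Only the 220×250 pixel minimum comes from the text.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels

MIN_FACE_W = 220  # minimum width quoted in the text for 640x480 frames
MIN_FACE_H = 250  # minimum height quoted in the text for 640x480 frames


def pick_face(candidates: List[Rect],
              min_w: int = MIN_FACE_W,
              min_h: int = MIN_FACE_H) -> Optional[Rect]:
    """Return the largest candidate that meets the minimum size, or None.

    Keeping the largest rectangle is an assumption made for this sketch.
    """
    large_enough = [r for r in candidates if r[2] >= min_w and r[3] >= min_h]
    if not large_enough:
        return None
    return max(large_enough, key=lambda r: r[2] * r[3])


# Hypothetical detector output: a distant face, a near face, a false positive.
detections = [(40, 60, 90, 100), (200, 110, 240, 270), (500, 20, 30, 30)]
print(pick_face(detections))
```

Rejecting small rectangles early means that later stages only ever process one face that is close to the camera, which is the stated purpose of the constraint.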

In block 103, a digitized video image of the convex shape in the vicinity of the region of interest is stored. This digitized video image has a brightness pattern, referred to as the stored brightness pattern of the region of interest; in block 104, this stored brightness pattern is saved as a matrix.

In block 105, the locations of the five reference points (Fig. 2) are determined within the stored brightness pattern of the region of interest.

The natural reference points of a person's face are the center of the upper lip, the tip of the nose, the point between the eyes on the bridge of the nose, the pupil of the right eye, and the pupil of the left eye. These points form two perpendicular lines: a vertical one (points 1-3) and a horizontal one (points 3-5). The facial proportions of different people are similar but not identical, so the method allows the positions of the tracked points to be adjusted by changing the corresponding parameters. In addition, experiments showed that for people with light-colored eyes the two pupil points are not reliable for tracking, so they can be excluded when the reference points are located.

In block 106, a plurality of digitized video images is registered; each of them contains a video image of the convex shape, referred to as the registered brightness pattern of the region of interest.

The user's head movements are tracked with a method based on the basic Lucas-Kanade algorithm [Lucas, B.D. An Iterative Image Registration Technique with an Application to Stereo Vision / B.D. Lucas, T. Kanade // IJCAI, 1981. - pp. 674-679] and its later pyramidal modification [Bouguet, J.-Y. Pyramidal Implementation of the Lucas-Kanade Feature Tracker Description of the algorithm // Intel Corporation Microprocessor Research Labs, 2000] for analyzing optical flow (i.e., the pattern of apparent motion of objects, surfaces, or edges in a scene that results from the observer moving relative to the scene or, conversely, the scene moving relative to the observer).

In block 107, the registered brightness pattern of the region of interest is scaled to the same size as the stored brightness pattern of the region of interest, and in block 108 it is registered as a matrix for each of the plurality of video images.

In block 109, the matrix of the stored brightness pattern of the region of interest is compared with the matrix of the registered brightness pattern of the region of interest for each of the plurality of video images, using pixel-by-pixel comparison or correlation analysis. The search window within which the comparison is performed in each video image is selected from the group consisting of:

- a search window that is a square region with a side length between one quarter of the face width and the full face width, centered on the location of the tip of the nose in the previous video image of the plurality, if that location is known;

- a search window that is a rectangular region determined using automated face detection technology;

- a search window that is the entire image area of the video frame.
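The three search-window options listed above can be sketched as a simple fallback chain. The order of preference (previous nose position, then detected face rectangle, then the whole frame), the `side_factor` parameter, and all function names are assumptions made for illustration; the text only states that the window is selected from this group.

```python
from typing import Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def clip(rect: Rect, frame_w: int, frame_h: int) -> Rect:
    """Clip a rectangle to the frame boundaries."""
    x, y, w, h = rect
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(frame_w, x + w), min(frame_h, y + h)
    return (x0, y0, max(0, x1 - x0), max(0, y1 - y0))


def search_window(frame_w: int, frame_h: int,
                  face_w: Optional[int] = None,
                  prev_nose: Optional[Tuple[int, int]] = None,
                  face_rect: Optional[Rect] = None,
                  side_factor: float = 0.5) -> Rect:
    """Choose a search window using the three options from the text."""
    if prev_nose is not None and face_w:
        # Square centered on the previous nose tip; the side is kept
        # between a quarter of the face width and the full face width.
        factor = min(1.0, max(0.25, side_factor))
        side = int(round(face_w * factor))
        cx, cy = prev_nose
        return clip((cx - side // 2, cy - side // 2, side, side),
                    frame_w, frame_h)
    if face_rect is not None:
        # Rectangle reported by an automated face detector.
        return clip(face_rect, frame_w, frame_h)
    # Last resort: search the entire frame.
    return (0, 0, frame_w, frame_h)


print(search_window(640, 480, face_w=240, prev_nose=(320, 240)))
```

A smaller window reduces the number of comparisons per frame, while the fallbacks keep tracking possible when the previous nose position is unknown.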

Based on the comparison of the matrix of the stored brightness pattern of the region of interest with the matrix of the registered brightness pattern of the region of interest, the two-dimensional coordinates of the reference points are determined in block 110 in each of the plurality of video images. The displacement of the two-dimensional coordinates of these reference points between successive video frames is converted (block 111) into synchronous movements of the mouse cursor on the screen.
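As an illustration of comparing brightness matrices by correlation analysis, the following sketch slides a stored pattern over a search window and keeps the position with the highest zero-mean normalized correlation. This is a generic template-matching example under stated assumptions, not the patented procedure: the function names, the plain nested-list matrices, and the choice of normalized correlation as the similarity measure are all illustrative.

```python
from math import sqrt
from typing import Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def ncc(a: Matrix, b: Matrix) -> float:
    """Zero-mean normalized correlation of two equal-size matrices."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = sqrt(sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb))
    return num / den if den else 0.0  # flat patches carry no information


def best_match(frame: Matrix, template: Matrix,
               window: Tuple[int, int, int, int]) -> Tuple[int, int, float]:
    """Return (x, y, score) of the best top-left corner inside the window."""
    wx, wy, ww, wh = window
    th, tw = len(template), len(template[0])
    best = (wx, wy, -2.0)
    for y in range(wy, wy + wh - th + 1):
        for x in range(wx, wx + ww - tw + 1):
            patch = [row[x:x + tw] for row in frame[y:y + th]]
            score = ncc(template, patch)
            if score > best[2]:
                best = (x, y, score)
    return best


# Toy 6x6 frame with a 2x2 brightness pattern placed at column 3, row 2.
template = [[10, 200], [50, 90]]
frame = [[0] * 6 for _ in range(6)]
for dy, row in enumerate(template):
    for dx, value in enumerate(row):
        frame[2 + dy][3 + dx] = value

x, y, score = best_match(frame, template, (0, 0, 6, 6))
print(x, y)
```

The returned position is the top-left corner of the matched pattern; the reference-point coordinates would then follow from their known offsets inside the stored pattern.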

To calculate the current position of the mouse cursor on the screen, M = (M^X, M^Y), a linear combination of the coordinate changes of reference points 1-3 (for the abscissa M^X of the cursor) and points 3-5 (for the ordinate M^Y) between adjacent video frames is used:

[formula image not reproduced]

where C_i is the i-th reference point in the current frame, compared against the same point in the previous frame of the video stream, and K_P is the mouse cursor speed coefficient.

Thus, the cursor moves in proportion to the displacement of three points between adjacent frames of the video stream. Points 4-5 are not taken into account when forming the X coordinate of the cursor, because when the head turns their displacement is nonlinear (different for each point) in the Cartesian coordinate system. Likewise, points 1-2 are excluded when forming the Y coordinate of the cursor. The cursor therefore moves on the screen in proportion to the displacement of the tracked facial points, scaled by the given cursor speed coefficient K_P.
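The cursor update described above can be sketched as follows. Since the exact formula is not available in this text, the sketch assumes the simplest linear combination: the mean displacement of points 1-3 drives X, the mean displacement of points 3-5 drives Y, and both are scaled by K_P. The equal weights, the clamping to the screen, the point numbering taken from the order in which the points are listed, and the function name are all assumptions.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]

X_POINTS = (1, 2, 3)  # vertical line: upper lip, nose tip, bridge of nose
Y_POINTS = (3, 4, 5)  # horizontal line: bridge of nose and the two pupils


def update_cursor(cursor: Point,
                  current: Dict[int, Point],
                  previous: Dict[int, Point],
                  k_p: float,
                  screen: Tuple[int, int] = (1280, 800)) -> Point:
    """Move the cursor by K_P times the mean displacement of the points."""
    dx = sum(current[i][0] - previous[i][0] for i in X_POINTS) / len(X_POINTS)
    dy = sum(current[i][1] - previous[i][1] for i in Y_POINTS) / len(Y_POINTS)
    # Clamp to the screen so the cursor never leaves the visible area.
    x = min(max(cursor[0] + k_p * dx, 0.0), screen[0] - 1)
    y = min(max(cursor[1] + k_p * dy, 0.0), screen[1] - 1)
    return (x, y)


previous = {1: (100, 200), 2: (100, 170), 3: (100, 140),
            4: (80, 140), 5: (120, 140)}
# Hypothetical next frame: the whole face shifts 2 px right and 1 px up.
current = {i: (px + 2, py - 1) for i, (px, py) in previous.items()}
print(update_cursor((640.0, 400.0), current, previous, k_p=3.0))
```

Depending on whether the camera image is mirrored, the sign of the horizontal displacement may need to be inverted; the text does not specify this.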

The developed method also takes the speed of the user's head movement into account. If the user needs to move the cursor a considerable distance (for example, from one corner of the screen to another), the user moves the head fairly quickly and a large speed coefficient K₁ is applied (at least 3 units, depending on the screen resolution). If the user wants to select an object on the screen, the user makes small head movements and a small multiplication coefficient K₂ is applied (no more than 3 units, depending on the screen resolution). This process can be represented by the following formula:

$$K_P = \begin{cases} K_1, & V_H > T_H \\ K_2, & V_H \le T_H \end{cases}$$

where K_P is the mouse cursor speed coefficient; V_H is the speed of the user's head movement in the video frames; T_H is the maximum threshold value of low head movement speed (tuned according to the screen resolution in use, the user's distance from the video camera, and the user's ergonomic preferences).

Thus, the method for contactless control of the mouse cursor applies several adaptive cursor speed values depending on the speed of the user's head movement.
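The speed-dependent choice of coefficient can be expressed in a few lines. The default T_H of 3.0 follows the value given in the description; the concrete values of K₁ and K₂ below are placeholders chosen only to satisfy "at least 3" and "no more than 3", and treating a speed exactly equal to T_H as slow is an assumption.

```python
def speed_coefficient(v_h: float,
                      t_h: float = 3.0,
                      k1: float = 6.0,
                      k2: float = 1.5) -> float:
    """Return the cursor speed coefficient K_P for head speed v_h.

    k1 and k2 are illustrative placeholders, not values from the source.
    """
    return k1 if v_h > t_h else k2


# Head speeds in pixels per frame: a fast sweep, then fine positioning.
for v_h in (12.0, 3.0, 1.0):
    print(v_h, speed_coefficient(v_h))
```

Using two gains gives fast travel across the screen while keeping precise control near a target.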

Experiments were carried out to select the optimal value of the speed threshold T_H. Head movement speed was measured while a user worked with the intelligent system installed on a laptop with a 15'' (about 37 cm) 16:9 display at a resolution of 1280×800 pixels. The user's task was to select small round targets of varying size that appeared one after another in different parts of the screen and to confirm a click on each target. The system processed video at an average of about 15 frames per second, which is sufficient for real-time operation without delays or jerks and with smooth cursor movement. The graph in Fig. 3 shows the speed (pixels/frame) of the operator's head movement in the 2D coordinates of a camera frame with a resolution of 640×480 pixels. The head movement speed in the video stream is calculated by the formula:

[formula image not reproduced]

where C_i is the i-th reference point in the current frame, compared against the same point in the previous frame of the video stream.
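One way to obtain a head speed in pixels per frame from the reference points is sketched below. Because the original formula is not available in this text, the sketch assumes the mean Euclidean displacement of the tracked points between two consecutive frames; this choice and the function name are assumptions, not the formula from the source.

```python
from math import hypot
from typing import Dict, Tuple

Point = Tuple[float, float]


def head_speed(current: Dict[int, Point], previous: Dict[int, Point]) -> float:
    """Mean displacement (pixels per frame) of points present in both frames."""
    shared = current.keys() & previous.keys()
    if not shared:
        return 0.0
    total = sum(hypot(current[i][0] - previous[i][0],
                      current[i][1] - previous[i][1]) for i in shared)
    return total / len(shared)


previous = {1: (100, 200), 2: (100, 170), 3: (100, 140),
            4: (80, 140), 5: (120, 140)}
# Hypothetical next frame: every point moves 3 px right and 4 px down.
current = {i: (px + 3, py + 4) for i, (px, py) in previous.items()}
print(head_speed(current, previous))
```

Averaging over only the points present in both frames keeps the estimate usable when a point, such as a pupil, has been excluded from tracking.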

Fig. 3 shows that when the user moves the mouse cursor from one on-screen object to another, the head movements are fairly fast (usually at least 10 screen pixels in the time between two video frames), whereas when the user tries to hit the required target precisely, the head movements are careful and have a noticeably smaller amplitude (usually 1-3 screen pixels in the time between two adjacent video frames). The speed threshold constant T_H is therefore set to 3.0, but it can be adapted depending on the video processing speed, the frame rate, the user's distance from the screen, and the user's individual ergonomic preferences.

In addition, if one of the reference points is lost, the proposed method restores it from the two-dimensional coordinates of the two points remaining in the rectangular region that groups a triple of reference points. For example, if point 2 (Fig. 2, b) leaves the rectangular zone formed by points 1 and 3, its correct position is restored as a linear combination of the coordinates of the other two points. Two rectangular working regions (vertical and horizontal) are defined for point 3, so this point is the most reliable in the method (Fig. 2, c).
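The restoration of a lost point can be sketched for point 2 (nose tip) and its neighbors, points 1 and 3. The weight `alpha` of the linear combination and the tolerance `margin` around the rectangle are hypothetical parameters introduced for this example; the text does not give their values.

```python
from typing import Tuple

Point = Tuple[float, float]


def inside_box(p: Point, a: Point, b: Point, margin: float) -> bool:
    """True if p lies in the rectangle spanned by a and b, grown by margin."""
    lo_x, hi_x = min(a[0], b[0]) - margin, max(a[0], b[0]) + margin
    lo_y, hi_y = min(a[1], b[1]) - margin, max(a[1], b[1]) + margin
    return lo_x <= p[0] <= hi_x and lo_y <= p[1] <= hi_y


def restore_point(p: Point, a: Point, b: Point,
                  alpha: float = 0.5, margin: float = 10.0) -> Point:
    """Keep p if it is still plausible, else rebuild it from a and b."""
    if inside_box(p, a, b, margin):
        return p
    # Linear combination of the two remaining points (alpha is assumed).
    return (alpha * a[0] + (1 - alpha) * b[0],
            alpha * a[1] + (1 - alpha) * b[1])


lip, bridge = (100.0, 200.0), (102.0, 140.0)
print(restore_point((101.0, 170.0), lip, bridge))  # still inside: unchanged
print(restore_point((140.0, 170.0), lip, bridge))  # drifted away: rebuilt
```

In practice `alpha` could be calibrated per user from the facial proportions observed when the reference points were first located.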

The method for contactless control of the mouse cursor can be implemented with known devices. For example, the digitized image of the convex shape can be obtained with a web camera with a resolution of 640×480 pixels and a frame rate of up to 25 frames per second.

A buffer device, which can be implemented with a RAM array, is used to store (register) the digitized video images. RAM circuits are known and described, for example, in the book by V.N. Veniaminov, O.N. Lebedev, and A.I. Miroshnichenko, "Microcircuits and Their Application" (Moscow: Radio i Svyaz, 1989, p. 146). In particular, the RAM can be implemented on K565-series chips.

The devices that store the matrix of the stored brightness pattern of the region of interest and the matrix of the registered brightness pattern of the region of interest can be implemented with read-only memory (ROM). ROM circuits are known and described, for example, in the book by V.N. Veniaminov, O.N. Lebedev, and A.I. Miroshnichenko, "Microcircuits and Their Application" (Moscow: Radio i Svyaz, 1989, p. 156). In particular, the ROM can be implemented on K555-series chips.

Blocks 102, 105, 107, 109, and 110 can be implemented on convergent computing devices. Circuits of such devices are known and described, for example, in the book by E. Ifeachor and B. Jervis, "Digital Signal Processing: A Practical Approach" (Moscow: Williams Publishing House, 2004, p. 850). In particular, such a circuit can be implemented with PDSP16112A complex multipliers (Mitel) and PDSP16318A complex accumulators (Mitel).

A mouse is an information input device. Input devices are described in the book: Avdeev, V.A. Peripheral Devices: Interfaces, Circuitry, Programming. - Moscow: DMK Press, 2009, 848 p.: ill. - pp. 414-433.

A monitor is an information output device intended for displaying graphical objects and the mouse cursor. Output devices are described in the book: Avdeev, V.A. Peripheral Devices: Interfaces, Circuitry, Programming. - Moscow: DMK Press, 2009, 848 p.: ill. - pp. 451-526.

The claimed method for contactless control of the mouse cursor increases the robustness of tracking the user's head movement by improving the performance of work with the system.

To demonstrate that the stated technical result is achieved, experiments were carried out using purpose-built software that allows the values of D and W in expression (1) to be set arbitrarily and computes the test results. The software asks the user to select, one after another, 16 targets that appear on the computer monitor (Fig. 4, a). Fig. 4, b shows a real example of the cursor trajectory during contactless execution of the task with head gestures, obtained with the developed method.

Six potential users with different levels of computer experience took part in the experiments. They were given tasks with 16 round targets that appeared in turn at different points of the screen on a circle of a given diameter D. Each user performed series of 10 tests in which the target diameter W was varied discretely within 32-128 pixels and the distance D between targets within 96-650 pixels (at a standard screen resolution of 1280×1024), so that ID ranged from 1.32 to 4.4 bits. In total, the users completed 360 tests, each taking from 30 seconds to 2 minutes.
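The quoted range of ID can be checked numerically. Expression (1) is not shown in this part of the description, so the sketch assumes the Shannon formulation of Fitts's index of difficulty, ID = log2(D/W + 1), which reproduces the stated bounds for D and W values inside the stated ranges. The specific (D, W) pairs and the throughput helper, which uses the common definition TP = IDe / MT, are illustrative assumptions.

```python
from math import log2


def index_of_difficulty(d: float, w: float) -> float:
    """Shannon formulation of Fitts's index of difficulty, in bits."""
    return log2(d / w + 1)


def throughput(id_e: float, mt_seconds: float) -> float:
    """Throughput in bits per second (assumed definition: IDe / MT)."""
    return id_e / mt_seconds


# Pairs chosen from within the ranges quoted in the text (assumed pairs).
print(round(index_of_difficulty(96, 64), 2))   # easiest quoted case
print(round(index_of_difficulty(650, 32), 2))  # hardest quoted case
```

With D = 96 and W = 64 the index is about 1.32 bits, and with D = 650 and W = 32 it is about 4.41 bits, matching the range of 1.32 to 4.4 bits reported above.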

The graph in Fig. 5 shows the experimentally obtained relationship, averaged over all users, between IDe (actual difficulty) and ID (theoretically calculated difficulty) for different values of D and W. The graph lies above the dashed line (the expected difficulty of the task), which means that the task proved somewhat harder than theoretically expected (if the graph lay below the dashed line, the task offered to the testers would be easier than its calculated difficulty).

In experiments following the Fitts methodology, the movement time MT between two targets is a linear function of the task's index of difficulty ID. For each test, the time between consecutive target clicks was measured, along with the number of selection errors (failures to land inside the target). Fig. 6 presents a statistical analysis of the MT values obtained for all testers. The upper and lower edges of the box correspond to the 75th and 25th percentiles (upper and lower quartiles) of all MT values, respectively. The upper and lower horizontal ticks on the vertical line correspond to the 90th and 10th percentiles (upper and lower deciles) of all MT values, respectively. The bar inside the box marks the median MT, i.e., about 2.5 seconds between speech "confirmations" of a target.

FIG. 7 presents a statistical analysis of the throughput (TP) values obtained for the contactless multimodal interface using Fitts's methodology while users performed the assigned test task. The figure shows the median, the upper and lower quartiles, and the upper and lower deciles of the obtained TP values.
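The box-plot statistics described for FIG. 6 and FIG. 7 (median, quartiles and deciles) can be computed as in the following sketch. It assumes linear interpolation between order statistics, which is one of several common percentile conventions; the convention used for the figures is not stated. The function names are hypothetical.

```python
def percentile(values: list[float], p: float) -> float:
    """p-th percentile (0..100) using linear interpolation between sorted values."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def box_summary(values: list[float]) -> dict[str, float]:
    """Lower/upper deciles, lower/upper quartiles and the median of a sample."""
    return {
        "p10": percentile(values, 10),
        "p25": percentile(values, 25),
        "median": percentile(values, 50),
        "p75": percentile(values, 75),
        "p90": percentile(values, 90),
    }


# Hypothetical sample, used only to show the call.
summary = box_summary([float(v) for v in range(101)])
```

The same function applies to either quantity: pass the MT samples to obtain the statistics of FIG. 6, or the TP samples for FIG. 7.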

The developed method and the prototype method were also tested on the same test task. The testing was carried out by several volunteer testers who had little experience of working with a personal computer.

Table 1 presents the experimental results and a comparison of the two methods on three quantitative indicators:

1) the average movement time MT between two targets;

2) the percentage of target selection errors (the cursor missing the target);

3) the overall throughput TP.

The table shows that the best results in terms of throughput and target selection errors were obtained with the developed method, which indicates that the claimed method of contactless mouse cursor control achieves its stated aim.
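The three indicators used in the comparison can be derived from per-trial logs as in the sketch below. It assumes throughput is computed as TP = IDe / MT (bits per second), as in ISO 9241-9 style evaluations; the exact formula is not restated in this section. The `Trial` record, the function name and the sample values are hypothetical and do not reproduce Table 1.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    movement_time_s: float  # time between two consecutive target selections
    hit: bool               # True if the cursor was inside the target


def summarize(trials: list[Trial], effective_id_bits: float) -> tuple[float, float, float]:
    """Return (mean MT in s, selection error rate in %, throughput TP in bit/s)."""
    mean_mt = sum(t.movement_time_s for t in trials) / len(trials)
    error_pct = 100.0 * sum(1 for t in trials if not t.hit) / len(trials)
    throughput = effective_id_bits / mean_mt  # assumed definition TP = IDe / MT
    return mean_mt, error_pct, throughput


# Hypothetical log of four selections with an effective index of 2.5 bits.
trials = [Trial(2.0, True), Trial(3.0, False), Trial(2.5, True), Trial(2.5, True)]
mean_mt, error_pct, tp = summarize(trials, effective_id_bits=2.5)
```

Running the same summary on the logs of each method gives directly comparable figures for the three indicators.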

In the experiments, the prototype method and the developed method were run on an HP laptop with a multi-core Intel Core i5 2.5 GHz processor, 3 GB of RAM and a 15-inch screen. A Logitech QuickCam for Notebooks Pro USB webcam, providing a resolution of 640×480 pixels at 25 frames per second, was used as additional hardware. Clearly, using a professional digital video camera would give better target selection accuracy and, consequently, greater robustness in tracking the movement of the user's head.

Claims

1. A method for contactless control of a mouse cursor, which consists in determining the location of a region of interest on a convex shape; storing a digitized video image of said convex shape in the vicinity of said region of interest, said digitized video image having a brightness pattern referred to as the stored brightness pattern of the region of interest, said stored brightness pattern of the region of interest being stored in the form of a matrix; registering a plurality of digitized video images, each of which contains a video image of said convex shape, referred to as the registered brightness pattern of the region of interest, said registered brightness pattern of the region of interest being brought to the same size as the stored brightness pattern of the region of interest and recorded in the form of a matrix for each of the plurality of video images; and comparing the matrix of the stored brightness pattern of the region of interest with the matrix of the registered brightness pattern of the region of interest for each of said plurality of video images on the basis of pixel-by-pixel comparison or correlation analysis, wherein the search windows within which the comparison is performed in each of said plurality of video images are selected from the group consisting of: a search window that is a square region with a side length ranging from a quarter of the face width to the face width and centered on the location of the tip of the nose in the previous video image of said plurality of video images, if that location is known; a search window that is a rectangular region determined using automated face detection technology; or a search window that is the entire image area of the video frame; characterized in that the convex shape is a person's head and the region of interest is the area of the person's face between the eyebrows and the lower lip; before the location of the region of interest on the convex shape is determined, a digitized image of said convex shape is obtained using a video camera; after said stored brightness pattern of the region of interest has been stored in the form of a matrix, the location of five reference points within said stored brightness pattern of the region of interest is determined, and the plurality of digitized video images is then registered; the two-dimensional coordinates of said reference points in each of said plurality of video images are determined by comparing the matrix of the stored brightness pattern of the region of interest with the matrix of the registered brightness pattern of the region of interest; and the two-dimensional coordinates of said reference points are then used to control the mouse cursor.

2. The method according to claim 1, characterized in that the reference points are natural points of the person's face: the center of the upper lip, the tip of the nose, the point on the nose between the eyes, the pupil of the right eye and the pupil of the left eye of the user.

3. The method according to claim 2, characterized in that, if one of the reference points is lost, it is restored from the two-dimensional coordinates of the two points remaining in the rectangular region that groups the reference points into triples.