CN109670432B - Action recognition method and device


Info

Publication number: CN109670432B (granted 2023-08-04); earlier published as CN109670432A (2019-04-23)
Application number: CN201811522373.XA
Authority: CN (China)
Filing/priority date: 2018-12-13
Inventor: 金涛 (Jin Tao)
Assignee (original and current): Beijing Xiaomi Mobile Software Co Ltd
Original language: Chinese (zh)
Legal status: Active
Prior art keywords: pixel, user, image, distance, finger

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language


Abstract

The present disclosure relates to an action recognition method and apparatus. The method includes: collecting multiple frames of user images, where the user images contain a user's finger and the user's eyes; determining, from the user images, a first pixel distance over which the finger slides and a second pixel distance between the user's eyes; calculating a relative sliding distance of the finger from the first pixel distance and the second pixel distance; and, if the relative distance is greater than a set threshold, determining that the user's finger has made a sliding motion over the first pixel distance. Normalizing by the inter-eye distance avoids action recognition errors caused by how far the user stands from the intelligent terminal and improves the reliability of the recognition result.

Description

Action recognition method and device
Technical Field
The present disclosure relates to the technical field of remote control, and in particular to an action recognition method and device.
Background
With the continuous development of remote control technology, intelligent terminals with somatosensory (motion-sensing) remote control functions are becoming increasingly common. In the related art, such terminals are generally implemented in one of two ways: with a remote control handle or with a camera. A remote control handle can sense motion using a built-in gyroscope and acceleration sensor, while a camera can use image recognition technology to identify specific movements of the human body. However, the handle-based approach still ties the user to a physical controller, so the human-machine interaction is neither simple nor natural; the camera-based approach can only recognize movements of the palm and arm, with limited recognition accuracy, and at present no intelligent terminal offers a finger-based remote control function.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides a method and apparatus for motion recognition.
According to a first aspect of embodiments of the present disclosure, there is provided an action recognition method, the method comprising:
collecting multiple frames of user images, wherein the user images comprise a user finger and the user's eyes;
determining, according to the user images, a first pixel distance of finger sliding and a second pixel distance between the user's eyes;
calculating a relative distance of finger sliding according to the first pixel distance and the second pixel distance;
and if the relative distance is greater than a set threshold, determining that the user finger has made a sliding motion over the first pixel distance.
Optionally, the determining, according to the user images, of the first pixel distance of finger sliding, an indication direction of finger sliding, and the second pixel distance between the user's eyes includes:
acquiring a first image and a second image from the user image;
determining a first pixel location of a user's finger in the first image and a second pixel location in the second image;
if the first pixel position is different from the second pixel position, determining a pixel distance between the first pixel position and the second pixel position as the first pixel distance;
determining a first pixel pitch of the user's eyes in the first image and a second pixel pitch in the second image;
and if the first pixel pitch is the same as the second pixel pitch, determining the first pixel pitch or the second pixel pitch as the second pixel distance.
Optionally, the first image is an i-th frame image captured by a camera, the second image is an (i+n)-th frame image captured by the camera, and the value of n depends on the frame rate of the camera;
and the determining, if the relative distance is greater than the set threshold, that the user finger has made a sliding motion over the first pixel distance includes:
if the relative distance is greater than the set threshold, determining the direction from the first pixel position to the second pixel position as the indication direction of finger sliding, and determining that the user finger has made a sliding operation in the indication direction.
Optionally, the method further comprises:
and if the first pixel pitch is different from the second pixel pitch, acquiring another first image and/or another second image from the user images again.
Optionally, the calculating the relative distance of finger sliding according to the first pixel distance and the second pixel distance includes:
calculating a ratio between the first pixel distance and the second pixel distance;
the ratio is determined as the relative distance.
Optionally, the method further comprises:
setting the set threshold, wherein the value of the set threshold depends on a measured average finger sliding speed, the average distance between a user's eyes, and the frame rate of the camera that collects the user images.
Optionally, after the determining that the user finger has made the sliding motion in the indication direction of the finger sliding, the method further includes:
generating a finger sliding event, wherein the finger sliding event indicates that the user finger has made a sliding motion in the indication direction of the finger sliding;
and triggering a corresponding finger remote control function according to the finger sliding event.
According to a second aspect of embodiments of the present disclosure, there is provided an action recognition apparatus, the apparatus comprising:
the acquisition module is configured to acquire a plurality of frames of user images, wherein the user images comprise user fingers and user eyes;
a first determining module configured to determine a first pixel distance of finger sliding and a second pixel distance between eyes of a user from the user image;
a calculation module configured to calculate a relative distance of finger sliding from the first pixel distance and the second pixel distance;
and the second determining module is configured to determine that the user finger performs a sliding action on the first pixel distance if the relative distance is greater than a set threshold.
Optionally, the first determining module includes:
a first acquisition sub-module configured to acquire a first image and a second image from the user image;
a first determination sub-module configured to determine a first pixel location of a user's finger in the first image and a second pixel location in the second image;
a second determination sub-module configured to determine a pixel distance between the first pixel location and the second pixel location as the first pixel distance if the first pixel location and the second pixel location are different;
a third determination sub-module configured to determine a first pixel pitch of both eyes of a user in the first image and a second pixel pitch in the second image;
and a fourth determination sub-module configured to determine the first pixel pitch or the second pixel pitch as the second pixel distance if the first pixel pitch is the same as the second pixel pitch.
Optionally, the first image is an ith frame image shot by a camera, the second image is an (i+n) th frame image shot by the camera, and the value of n depends on the frame rate of the camera; the second determining module includes:
and a fifth determining sub-module configured to determine a direction from the first pixel position to the second pixel position as an indication direction of finger sliding if the relative distance is greater than a set threshold, and determine that a sliding operation is made by a user's finger in the indication direction.
Optionally, the first determining module further includes:
and a second acquisition sub-module configured to acquire another first image and/or another second image from the user images again if the first pixel pitch is different from the second pixel pitch.
Optionally, the computing module includes:
a calculation sub-module configured to calculate a ratio between the first pixel distance and the second pixel distance;
a sixth determination submodule configured to determine the ratio as the relative distance.
Optionally, the apparatus further comprises:
the setting module is configured to set the set threshold, wherein the value of the set threshold depends on a measured average finger sliding speed, the average distance between a user's eyes, and the frame rate of the camera that collects the user images.
Optionally, the apparatus further comprises:
a generating module configured to generate, after the second determining module determines that the user finger has made a sliding motion in the indication direction of the finger sliding, a finger sliding event indicating that the user finger has made the sliding motion in that indication direction;
and the finger remote control module is configured to trigger a corresponding finger remote control function according to the finger sliding event.
According to a third aspect of embodiments of the present disclosure, there is provided an action recognition apparatus, the apparatus comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
collecting multiple frames of user images, wherein the user images comprise a user finger and the user's eyes;
determining, according to the user images, a first pixel distance of finger sliding and a second pixel distance between the user's eyes;
calculating a relative distance of finger sliding according to the first pixel distance and the second pixel distance;
and if the relative distance is greater than a set threshold, determining that the user finger has made a sliding motion over the first pixel distance.
The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:
according to the intelligent terminal, multiple frames of user images can be acquired, the user images comprise user fingers and user eyes, the first pixel distance of finger sliding and the second pixel distance between the user eyes are determined according to the user images, the relative distance of finger sliding is calculated according to the first pixel distance and the second pixel distance, if the relative distance of finger sliding is larger than a set threshold value, the user fingers are determined to make sliding actions on the first pixel distance, and therefore action recognition errors caused by the fact that the user is far from or near from the intelligent terminal are avoided, and reliability of action recognition results is improved.
The intelligent terminal can acquire a first image and a second image from the user images, determine a first pixel position of the user finger in the first image and a second pixel position in the second image, and, if the two positions differ, determine the pixel distance between them as the first pixel distance. It can likewise determine a first pixel pitch of the user's eyes in the first image and a second pixel pitch in the second image and, if the two pitches are the same, determine either pitch as the second pixel distance. Action recognition is thereby performed from the first image and the second image, improving its accuracy.
In the embodiments of the present disclosure, the intelligent terminal can calculate the ratio between the first pixel distance and the second pixel distance and determine that ratio as the relative distance of finger sliding. Because the relative distance is expressed with the pixel distance between the user's eyes as the unit distance, differences in pixel distance caused by the user's distance from the intelligent terminal are avoided, improving the flexibility of action recognition.
In the embodiments of the present disclosure, after determining that the user finger has made a sliding motion in the indication direction, the intelligent terminal can generate a finger sliding event and trigger the corresponding finger remote control function according to that event, improving the practicality of action recognition.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flowchart of a method of action recognition, according to an exemplary embodiment of the present disclosure;
FIG. 2 is an application scenario diagram of a method of action recognition according to an exemplary embodiment of the present disclosure;
FIG. 3 is a flowchart of another method of action recognition, according to an exemplary embodiment of the present disclosure;
FIG. 4 is a flowchart of another method of action recognition, according to an exemplary embodiment of the present disclosure;
FIG. 5 is a flowchart of another method of action recognition, according to an exemplary embodiment of the present disclosure;
FIG. 6 is a block diagram of an action recognition device according to an exemplary embodiment of the present disclosure;
FIG. 7 is a block diagram of another action recognition device according to an exemplary embodiment of the present disclosure;
FIG. 8 is a block diagram of another action recognition device according to an exemplary embodiment of the present disclosure;
FIG. 9 is a block diagram of another action recognition device according to an exemplary embodiment of the present disclosure;
FIG. 10 is a block diagram of another action recognition device according to an exemplary embodiment of the present disclosure;
FIG. 11 is a block diagram of another action recognition device according to an exemplary embodiment of the present disclosure;
FIG. 12 is a block diagram of another action recognition device according to an exemplary embodiment of the present disclosure;
FIG. 13 is a schematic structural diagram of an action recognition device according to an exemplary embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
As shown in fig. 1, fig. 1 is a flowchart of an action recognition method according to an exemplary embodiment of the present disclosure, and fig. 2 is a scene diagram of an action recognition method according to an exemplary embodiment. The method can be used in an intelligent terminal with an action recognition function, such as a smart phone, smart television, smart speaker, or smart game console. As shown in fig. 1, the action recognition method may include the following steps 110-140:
in step 110, a plurality of frames of user images are acquired, the user images including user fingers and user eyes.
In the embodiments of the present disclosure, the intelligent terminal can collect multiple frames of user images in real time through a camera, and the user images can be ordered in time. Since recognizing a finger sliding motion involves the user's eyes, each user image needs to contain not only the user's finger but also both of the user's eyes.
In step 120, a first pixel distance for finger swipe and a second pixel distance between the eyes of the user are determined from the user image.
In the embodiments of the present disclosure, after determining the first pixel distance of finger sliding, the intelligent terminal does not judge directly from the first pixel distance whether the user finger has made a sliding motion. It also determines the second pixel distance between the user's eyes and then judges from both distances together, which improves the accuracy of detecting the sliding motion.
In addition, the intelligent terminal can determine the first pixel distance and the second pixel distance from different user images, for example a first image and a second image acquired from the collected user images.
In step 130, a relative distance of finger swipe is calculated from the first pixel distance and the second pixel distance.
In the embodiments of the present disclosure, the same spatial sliding distance of a finger maps to different pixel distances in the user image depending on how far the user is from the intelligent terminal.
To avoid judgment errors caused by this distance, the present disclosure calculates the relative distance of finger sliding from the first pixel distance and the second pixel distance and judges whether the user finger has made a sliding motion based on that relative distance; the judgment then remains accurate regardless of how far the user is from the intelligent terminal.
In step 140, if the relative distance of finger sliding is greater than the set threshold, it is determined that the user finger has made a sliding motion over the first pixel distance.
In the embodiments of the present disclosure, the set threshold is an empirical value set in advance, and it may be the same or different across usage scenarios.
In one embodiment, the action recognition method may further include, prior to performing step 140:
(1-1) setting the set threshold, wherein the value of the set threshold depends on a measured average finger sliding speed, the average distance between a user's eyes, and the frame rate of the camera that collects the user images.
The measured average finger sliding speed is a preset empirical value, and the average distance between a user's eyes is likewise a preset empirical value. The frame rate of the camera is the number of image frames the camera collects per second, and it varies with the camera's specifications.
For example, the set threshold X may be calculated as shown in formula (1):
X = (0.6 ÷ 0.065) ÷ P × n    formula (1)
where 0.6 is the measured average finger sliding speed (in meters per second), 0.065 is the average distance between a user's eyes (in meters), P is the frame rate of the camera, and n is the frame difference between the two images used in step 120 to determine the first pixel distance and the second pixel distance.
For example, the different images may comprise an i-th frame image and an (i+n)-th frame image, where n is the frame difference; the value of n depends on the frame rate of the camera and may range from 1 to 100.
If n is 1, the resulting X is the threshold for finger sliding between two adjacent frames;
if n is greater than 1, the resulting X is the threshold for finger sliding between frames that are n apart.
Since the finger does not move at a uniform speed throughout the slide, the sliding distance here may be measured across multiple frames (n > 1) to improve the accuracy of action recognition.
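To make formula (1) concrete, the following Python sketch computes the set threshold X; the function name and signature are illustrative assumptions, while the default constants are the empirical values quoted above.

def swipe_threshold(frame_rate_p: float, frame_diff_n: int,
                    avg_finger_speed: float = 0.6,    # measured average finger sliding speed, assumed m/s
                    avg_eye_distance: float = 0.065   # average inter-eye distance, assumed m
                    ) -> float:
    """Set threshold X = (0.6 / 0.065) / P * n from formula (1).

    X is expressed in inter-eye-distance units per n-frame interval, so it
    can be compared directly with the relative sliding distance of step 130.
    """
    return (avg_finger_speed / avg_eye_distance) / frame_rate_p * frame_diff_n

# For a 30 fps camera comparing frames n = 3 apart:
# swipe_threshold(30, 3) -> approximately 0.923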
In an exemplary scenario, as shown in fig. 2, a user stands in front of a smart television. The smart television collects multiple frames of user images through a camera, each containing the user's finger and eyes; determines from them the first pixel distance of finger sliding and the second pixel distance between the user's eyes; calculates the relative distance of finger sliding from the two; and, if that relative distance is greater than the set threshold, determines that the user finger has made a sliding motion over the first pixel distance.
As can be seen from the above embodiment, multiple frames of user images containing the user finger and the user's eyes are collected; the first pixel distance of finger sliding and the second pixel distance between the user's eyes are determined from those images; the relative distance of finger sliding is calculated from the two; and if that relative distance is greater than the set threshold, it is determined that the user finger has made a sliding motion over the first pixel distance. Action recognition errors caused by the user's distance from the intelligent terminal are thereby avoided, and the reliability of the recognition result is improved.
As shown in fig. 3, fig. 3 is a flowchart of another action recognition method according to an exemplary embodiment of the present disclosure. The method can be used in an intelligent terminal with an action recognition function, such as a smart phone, smart television, smart speaker, or smart game console, and is based on the method of fig. 1; when step 120 is executed, as shown in fig. 3, it may include the following steps 310-350:
in step 310, a first image and a second image are acquired from a user image.
In the embodiment of the present disclosure, the first image and the second image are two images for determining a first pixel distance of finger sliding, an indication direction of finger sliding, and a second pixel distance between both eyes of a user.
In an embodiment, in step 310, the first image may be an i-th frame image captured by the camera, the second image may be an i+n-th frame image captured by the camera, and the value of n depends on the frame rate of the camera.
That is, the corresponding n may be the same or different for different camera frame rates. The value of n may range from 1 to 100.
In step 320, a first pixel location of the user's finger in the first image and a second pixel location in the second image are determined.
In the embodiments of the present disclosure, the user finger in the first image can be identified and located through image recognition technology to obtain the first pixel position; in the same way, the user finger in the second image can be identified and located to obtain the second pixel position.
In step 330, if the first pixel position is different from the second pixel position, the pixel distance between the first pixel position and the second pixel position is determined as the first pixel distance.
In the embodiment of the disclosure, when determining the first pixel distance, the pixel distance between the first pixel position and the second pixel position may be calculated first, and then the pixel distance may be determined as the first pixel distance.
For example, if the first pixel position is {w(x1), h(y1)} and the second pixel position is {w(x2), h(y2)}, the pixel distance between the two can be calculated according to the Pythagorean theorem.
As another example, if the first pixel position is {w(x1), h(y1)} and the second pixel position is {w(x1), h(y2)}, the pixel distance between the two is |h(y1) - h(y2)|.
As another example, if the first pixel position is {w(x1), h(y1)} and the second pixel position is {w(x2), h(y1)}, the pixel distance between the two is |w(x1) - w(x2)|.
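A minimal Python sketch covering the three examples above, assuming pixel positions are given as (x, y) tuples; the helper name is an illustrative assumption rather than terminology from the patent.

import math

def pixel_distance(p1: tuple[float, float], p2: tuple[float, float]) -> float:
    """Pythagorean (Euclidean) distance between two pixel positions."""
    dx = p1[0] - p2[0]
    dy = p1[1] - p2[1]
    return math.hypot(dx, dy)  # reduces to |dy| or |dx| when only one axis changes

# A purely vertical slide from (120, 80) to (120, 200):
# pixel_distance((120, 80), (120, 200)) == 120.0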
In step 340, a first pixel pitch of both eyes of the user in the first image and a second pixel pitch in the second image are determined.
In the embodiments of the present disclosure, the user's eyes in the first image can be identified and located through image recognition technology to obtain the first pixel pitch; in the same way, the user's eyes in the second image can be identified and located to obtain the second pixel pitch.
In step 350, if the first pixel pitch is the same as the second pixel pitch, the first pixel pitch or the second pixel pitch is determined as the second pixel distance.
In an embodiment, as shown in fig. 3, the action recognition method may further include the following step 360:
in step 360, if the first pixel pitch is different from the second pixel pitch, another first image and/or another second image is retrieved from the user image. Next, the above-described step 320 may be performed again for another first image and/or another second image that is re-acquired.
In an embodiment, if in step 310 the first image is the i-th frame image captured by the camera and the second image is the (i+n)-th frame image captured by the camera, with the value of n depending on the frame rate of the camera, then executing step 140 may include:
(2-1) if the relative distance is greater than the set threshold, determining the direction from the first pixel position to the second pixel position as the indication direction of finger sliding, and determining that the user finger has made a sliding operation in that indication direction.
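Combining steps 130 and 140 with (2-1), a possible Python sketch of the final decision is given below. Mapping the position delta to one of four directions is one plausible reading of "the direction from the first pixel position to the second pixel position"; every name and the four-way mapping are illustrative assumptions.

import math

def detect_swipe(pos1: tuple[float, float], pos2: tuple[float, float],
                 eye_distance_px: float, threshold: float) -> str | None:
    """Return the indicated swipe direction, or None if no slide is detected.

    Relative distance = finger pixel distance / inter-eye pixel distance, so
    the decision does not depend on how far the user is from the camera.
    """
    dx = pos2[0] - pos1[0]
    dy = pos2[1] - pos1[1]
    relative = math.hypot(dx, dy) / eye_distance_px
    if relative <= threshold:
        return None  # movement too small to count as a sliding motion
    if abs(dx) >= abs(dy):                # dominant axis picks the direction
        return "right" if dx > 0 else "left"
    return "down" if dy > 0 else "up"     # image y grows downward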
As can be seen from the above embodiments, a first image and a second image can be acquired from the user images; a first pixel position of the user finger in the first image and a second pixel position in the second image are determined, and if the two positions differ, the pixel distance between them is determined as the first pixel distance; a first pixel pitch of the user's eyes in the first image and a second pixel pitch in the second image are likewise determined, and if the two pitches are the same, either pitch is determined as the second pixel distance. Action recognition is thereby performed from the first image and the second image, improving its accuracy.
As shown in fig. 4, fig. 4 is a flowchart of another action recognition method according to an exemplary embodiment of the present disclosure. The method can be used in an intelligent terminal with an action recognition function, such as a smart phone, smart television, smart speaker, or smart game console, and is based on the method of fig. 1; when step 130 is executed, as shown in fig. 4, it may include the following steps 410-420:
In step 410, a ratio between the first pixel distance and the second pixel distance is calculated.
In the embodiments of the present disclosure, the relative distance of finger sliding may be calculated by taking the second pixel distance, i.e. the pixel distance between the user's eyes, as the unit distance.
In step 420, the calculated ratio is determined as the relative distance of finger sliding.
As can be seen from the above embodiments, the ratio between the first pixel distance and the second pixel distance can be calculated and the calculated ratio determined as the relative distance of finger sliding. Because the relative distance is expressed with the pixel distance between the user's eyes as the unit distance, differences in pixel distance caused by the user's distance from the intelligent terminal are avoided, improving the flexibility of action recognition.
As shown in fig. 5, fig. 5 is a flowchart of another action recognition method according to an exemplary embodiment of the present disclosure. The method can be used in an intelligent terminal with an action recognition function, such as a smart phone, smart television, smart speaker, or smart game console, and is based on the method of fig. 1; after step 140 is performed, as shown in fig. 5, the method may include the following steps 510-520:
In step 510, a finger sliding event is generated, which indicates that the user finger has made a sliding motion in the indication direction of the finger sliding.
In the embodiment of the disclosure, if the intelligent terminal supports the finger remote control function, after the finger sliding event is identified, the corresponding finger remote control function may be triggered according to the finger sliding event.
In step 520, a corresponding finger remote control function is triggered based on the finger swipe event.
As can be seen from this embodiment, after it is determined that the user finger has made a sliding motion in the indication direction, a finger sliding event can be generated and the corresponding finger remote control function triggered according to that event, improving the practicality of action recognition.
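A Python sketch of steps 510 and 520, under the assumption that the terminal represents the finger sliding event as a plain record and dispatches it to registered handlers; the event shape, the handler table, and the example remote control actions are all illustrative assumptions.

from typing import Callable

# Assumed mapping from swipe direction to a remote control action.
HANDLERS: dict[str, Callable[[], None]] = {
    "left":  lambda: print("previous channel"),
    "right": lambda: print("next channel"),
    "up":    lambda: print("volume up"),
    "down":  lambda: print("volume down"),
}

def on_swipe(direction: str) -> None:
    """Step 510: generate the finger sliding event; step 520: trigger the function."""
    event = {"type": "finger_slide", "direction": direction}
    handler = HANDLERS.get(event["direction"])
    if handler is not None:
        handler()  # trigger the corresponding finger remote control function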
Corresponding to the foregoing embodiments of the method for motion recognition, the present disclosure also provides embodiments of the device for motion recognition.
As shown in fig. 6, fig. 6 is a block diagram of an action recognition apparatus according to an exemplary embodiment of the present disclosure. The apparatus can be used in an intelligent terminal with an action recognition function, such as a smart phone, smart television, smart speaker, or smart game console, and is configured to perform the action recognition method shown in fig. 1. The apparatus may include:
An acquisition module 61 configured to acquire a plurality of frames of user images including user fingers and user eyes;
a first determining module 62 configured to determine a first pixel distance of finger swipe and a second pixel distance between the eyes of the user from the user image;
a calculation module 63 configured to calculate a relative distance of finger sliding from the first pixel distance and the second pixel distance;
the second determining module 64 is configured to determine that the user finger has performed a sliding motion on the first pixel distance if the relative distance is greater than a set threshold.
As can be seen from the above embodiment, multiple frames of user images containing the user finger and the user's eyes are collected; the first pixel distance of finger sliding and the second pixel distance between the user's eyes are determined from those images; the relative distance of finger sliding is calculated from the two; and if that relative distance is greater than the set threshold, it is determined that the user finger has made a sliding motion over the first pixel distance. Action recognition errors caused by the user's distance from the intelligent terminal are thereby avoided, and the reliability of the recognition result is improved.
In an embodiment, based on the apparatus shown in fig. 6, as shown in fig. 7, the first determining module 62 may include:
a first acquisition sub-module 71 configured to acquire a first image and a second image from the user image;
a first determination sub-module 72 configured to determine a first pixel location of a user's finger in the first image and a second pixel location in the second image;
a second determination sub-module 73 configured to determine a pixel distance between the first pixel position and the second pixel position as the first pixel distance if the first pixel position is different from the second pixel position;
a third determination sub-module 74 configured to determine a first pixel pitch of both eyes of a user in the first image and a second pixel pitch in the second image;
a fourth determination sub-module 75 configured to determine the first pixel pitch or the second pixel pitch as the second pixel distance if the first pixel pitch is the same as the second pixel pitch.
As can be seen from the above embodiments, a first image and a second image can be acquired from the user images; a first pixel position of the user finger in the first image and a second pixel position in the second image are determined, and if the two positions differ, the pixel distance between them is determined as the first pixel distance; a first pixel pitch of the user's eyes in the first image and a second pixel pitch in the second image are likewise determined, and if the two pitches are the same, either pitch is determined as the second pixel distance. Action recognition is thereby performed from the first image and the second image, improving its accuracy.
In an embodiment, based on the device shown in fig. 7, the first image is an i-th frame image captured by a camera, the second image is an (i+n)-th frame image captured by the camera, and the value of n depends on the frame rate of the camera; as shown in fig. 8, the second determining module 64 may include:
a fifth determining sub-module 81 configured to determine a direction from the first pixel position to the second pixel position as an indication direction of finger sliding and determine that a sliding operation is made by a user's finger in the indication direction if the relative distance is greater than a set threshold.
In an embodiment, based on the apparatus shown in fig. 7, as shown in fig. 9, the first determining module 62 may further include:
a second acquisition sub-module 91 configured to acquire another first image and/or another second image from the user images again if the first pixel pitch is different from the second pixel pitch.
In an embodiment, based on the apparatus shown in fig. 6, as shown in fig. 10, the calculating module 63 may include:
a calculation sub-module 101 configured to calculate a ratio between the first pixel distance and the second pixel distance;
a sixth determination submodule 102 configured to determine the ratio as the relative distance.
As can be seen from this embodiment, the ratio between the first pixel distance and the second pixel distance can be calculated and determined as the relative distance of finger sliding. Because the relative distance is expressed with the pixel distance between the user's eyes as the unit distance, differences in pixel distance caused by the user's distance from the intelligent terminal are avoided, improving the flexibility of action recognition.
In an embodiment, based on the apparatus shown in fig. 6, as shown in fig. 11, the apparatus may further include:
the setting module 111 is configured to set the set threshold, and the value of the set threshold depends on the measured average value of the finger sliding speeds, the average distance between the eyes of the user and the frame rate of the camera for acquiring the image of the user.
In an embodiment, based on the apparatus shown in fig. 6, as shown in fig. 12, the apparatus may further include:
a generating module 121 configured to generate a finger sliding event for characterizing that a user finger has made a sliding motion in the indicated direction of the finger sliding after the second determining module 64 determines that the user finger has made a sliding motion in the indicated direction of the finger sliding;
The finger remote control module 122 is configured to trigger a corresponding finger remote control function according to the finger sliding event.
As can be seen from this embodiment, after it is determined that the user finger has made a sliding motion in the indication direction, a finger sliding event can be generated and the corresponding finger remote control function triggered according to that event, improving the practicality of action recognition.
The implementation of the functions and roles of each unit in the above apparatus is described in detail in the implementation of the corresponding steps in the above method and will not be repeated here.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant points. The apparatus embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solution of the present disclosure. Those of ordinary skill in the art can understand and implement it without creative effort.
Corresponding to fig. 6, the present disclosure also provides another action recognition device, the device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
collecting multiple frames of user images, wherein the user images comprise a user finger and the user's eyes;
determining, according to the user images, a first pixel distance of finger sliding and a second pixel distance between the user's eyes;
calculating a relative distance of finger sliding according to the first pixel distance and the second pixel distance;
and if the relative distance is greater than a set threshold, determining that the user finger has made a sliding motion over the first pixel distance.
As shown in fig. 13, fig. 13 is a schematic structural diagram of an action recognition device 1300 according to an exemplary embodiment of the present disclosure. For example, the apparatus 1300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, or the like.
Referring to fig. 13, apparatus 1300 may include one or more of the following components: a processing component 1302, a memory 1304, a power component 1306, a multimedia component 1308, an audio component 1310, an input/output (I/O) interface 1312, a sensor component 1314, and a communication component 1316.
The processing component 1302 generally controls overall operation of the apparatus 1300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 1302 may include one or more processors 1320 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 1302 can include one or more modules that facilitate interactions between the processing component 1302 and other components. For example, the processing component 1302 may include a multimedia module to facilitate interaction between the multimedia component 1308 and the processing component 1302.
The memory 1304 is configured to store various types of data to support operations at the apparatus 1300. Examples of such data include instructions for any application or method operating on the apparatus 1300, contact data, phonebook data, messages, pictures, videos, and the like. The memory 1304 may be implemented by any type or combination of volatile or nonvolatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply assembly 1306 provides power to the various components of the device 1300. The power supply components 1306 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 1300.
The multimedia component 1308 includes a screen that provides an output interface between the device 1300 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or slide action but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1308 includes a front-facing camera and/or a rear-facing camera. When the apparatus 1300 is in an operation mode, such as a photographing mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 1310 is configured to output and/or input audio signals. For example, the audio component 1310 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 1300 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 1304 or transmitted via the communication component 1316. In some embodiments, the audio component 1310 also includes a speaker for outputting audio signals.
The I/O interface 1312 provides an interface between the processing component 1302 and peripheral interface modules, which may be a keyboard, click wheel, buttons, etc. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 1314 includes one or more sensors for providing status assessments of various aspects of the apparatus 1300. For example, the sensor assembly 1314 may detect the on/off state of the device 1300 and the relative positioning of components, such as the display and keypad of the device 1300; it may also detect a change in position of the device 1300 or of a component of the device 1300, the presence or absence of user contact with the device 1300, the orientation or acceleration/deceleration of the device 1300, and temperature changes of the device 1300. The sensor assembly 1314 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor assembly 1314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, a microwave sensor, or a temperature sensor.
The communication component 1316 is configured to facilitate wired or wireless communication between the apparatus 1300 and other devices. The apparatus 1300 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 1316 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1316 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 1300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, such as memory 1304, including instructions executable by processor 1320 of apparatus 1300 to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, random Access Memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (13)

1. A method of motion recognition, the method comprising:
collecting multi-frame user images, wherein the user images comprise user fingers and user eyes;
determining a first pixel distance for finger sliding and a second pixel distance between eyes of a user according to the user image;
calculating the relative distance of finger sliding according to the first pixel distance and the second pixel distance;
if the relative distance is larger than a set threshold value, determining that the user finger makes a sliding motion on the first pixel distance;
the determining a first pixel distance of finger sliding and a second pixel distance between eyes of a user according to the user image comprises the following steps:
acquiring a first image and a second image from the user image;
determining the first pixel distance according to a first pixel position of a finger of a user in the first image and a second pixel position in the second image;
determining the second pixel distance according to a first pixel pitch between the eyes of a user in the first image and a second pixel pitch between the eyes of the user in the second image;
the calculating the relative distance of finger sliding according to the first pixel distance and the second pixel distance comprises the following steps:
calculating a ratio between the first pixel distance and the second pixel distance;
the ratio is determined as the relative distance.
2. The method of claim 1, wherein the determining the first pixel distance from a first pixel location of a user's finger in the first image and a second pixel location in the second image comprises:
if the first pixel position is different from the second pixel position, determining a pixel distance between the first pixel position and the second pixel position as the first pixel distance;
the determining the second pixel distance according to the first pixel pitch of the eyes of the user in the first image and the second pixel pitch in the second image comprises the following steps:
and if the first pixel pitch is the same as the second pixel pitch, determining the first pixel pitch or the second pixel pitch as the second pixel distance.
3. The method according to claim 2, wherein the first image is an i-th frame image captured by a camera, the second image is an i+n-th frame image captured by the camera, and the value of n depends on the frame rate of the camera;
and if the relative distance is greater than the set threshold, determining that the user finger performs a sliding motion on the first pixel distance, including:
and if the relative distance is larger than the set threshold value, determining the direction from the first pixel position to the second pixel position as an indication direction of finger sliding, and determining that the user finger performs sliding operation in the indication direction.
4. The method according to claim 2, wherein the method further comprises:
and if the first pixel pitch is different from the second pixel pitch, acquiring another first image and/or another second image from the user image again.
5. The method according to claim 1, wherein the method further comprises:
setting the set threshold, wherein the value of the set threshold depends on the measured average value of the finger sliding speeds, the average distance between the eyes of the user and the frame rate of a camera for acquiring the image of the user.
6. The method of claim 1, wherein after the determining that the user finger has made a sliding motion in the indication direction of the finger sliding, the method further comprises:
generating a finger sliding event, wherein the finger sliding event is used for representing that a user finger makes a sliding action in the indication direction of the finger sliding;
triggering a corresponding finger remote control function according to the finger sliding event.
7. An action recognition device, the device comprising:
the acquisition module is configured to acquire a plurality of frames of user images, wherein the user images comprise user fingers and user eyes;
a first determining module configured to determine a first pixel distance of finger sliding and a second pixel distance between eyes of a user from the user image;
a calculation module configured to calculate a relative distance of finger sliding from the first pixel distance and the second pixel distance;
the second determining module is configured to determine that the user finger performs a sliding action on the first pixel distance if the relative distance is greater than a set threshold;
the first determining module includes:
a first acquisition sub-module configured to acquire a first image and a second image from the user image;
a first determination sub-module configured to determine the first pixel distance from a first pixel location of a user's finger in the first image and a second pixel location in the second image;
a second determination sub-module configured to determine the second pixel distance from a first pixel pitch of both eyes of a user in the first image and a second pixel pitch in the second image;
the computing module includes:
a calculation sub-module configured to calculate a ratio between the first pixel distance and the second pixel distance;
a sixth determination submodule configured to determine the ratio as the relative distance.
8. The apparatus of claim 7, wherein the first determination submodule is specifically configured to:
determining a first pixel location of a user's finger in the first image and a second pixel location in the second image;
if the first pixel position is different from the second pixel position, determining a pixel distance between the first pixel position and the second pixel position as the first pixel distance;
the second determination submodule is specifically configured to:
determining a first pixel pitch of both eyes of a user in the first image and a second pixel pitch in the second image;
and if the first pixel pitch is the same as the second pixel pitch, determining the first pixel pitch or the second pixel pitch as the second pixel distance.
9. The apparatus of claim 8, wherein the first image is an i-th frame image captured by a camera, the second image is an i+n-th frame image captured by the camera, and the value of n is dependent on the frame rate of the camera; the second determining module includes:
and a fifth determining sub-module configured to determine a direction from the first pixel position to the second pixel position as an indication direction of finger sliding if the relative distance is greater than a set threshold, and determine that a sliding operation is made by a user's finger in the indication direction.
10. The apparatus of claim 8, wherein the first determining module further comprises:
and the second acquisition sub-module is configured to acquire another first image and/or another second image from the user image again if the first pixel pitch is different from the second pixel pitch.
11. The apparatus of claim 7, wherein the apparatus further comprises:
the setting module is configured to set the set threshold, and the value of the set threshold depends on the measured average value of the finger sliding speeds, the average distance between the eyes of the user and the frame rate of a camera for acquiring the user image.
12. The apparatus of claim 7, wherein the apparatus further comprises:
a generating module configured to generate a finger sliding event for characterizing that a user finger has made a sliding motion in the indicated direction of the finger sliding after the second determining module determines that the user finger has made a sliding motion in the indicated direction of the finger sliding;
and the finger remote control module is configured to trigger a corresponding finger remote control function according to the finger sliding event.
13. An action recognition device, the device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
collecting multi-frame user images, wherein the user images comprise user fingers and user eyes;
determining a first pixel distance for finger sliding and a second pixel distance between eyes of a user according to the user image;
calculating the relative distance of finger sliding according to the first pixel distance and the second pixel distance;
if the relative distance is larger than a set threshold value, determining that the user finger makes a sliding motion on the first pixel distance;
the determining a first pixel distance of finger sliding and a second pixel distance between eyes of a user according to the user image comprises the following steps:
acquiring a first image and a second image from the user image;
determining the first pixel distance according to a first pixel position of a finger of a user in the first image and a second pixel position in the second image;
determining the second pixel distance according to a first pixel pitch between the eyes of a user in the first image and a second pixel pitch between the eyes of the user in the second image;
the calculating the relative distance of finger sliding according to the first pixel distance and the second pixel distance comprises the following steps:
calculating a ratio between the first pixel distance and the second pixel distance;
the ratio is determined as the relative distance.

Patent Citations (3)

* Cited by examiner, † Cited by third party

WO2017067481A1 * (priority 2015-10-23, published 2017-04-27, Nubia Technology Co., Ltd.): Method and mobile terminal for processing image
CN106648063A * (priority 2016-10-19, published 2017-05-10, Beijing Xiaomi Mobile Software Co., Ltd.): Gesture recognition method and device
CN107800868A * (priority 2017-09-21, published 2018-03-13, Vivo Mobile Communication Co., Ltd.): Image display method and mobile terminal

Non-Patent Citations (1)

Xiao Zhiyong et al., "Human-computer interaction based on gaze tracking and gesture recognition" (基于视线跟踪和手势识别的人机交互), Computer Engineering (《计算机工程》), No. 15, 2009-08-05, full text *



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant