WO2019200785A1 - Fast hand tracking method, device, terminal and storage medium - Google Patents

Fast hand tracking method, device, terminal and storage medium

Info

Publication number
WO2019200785A1
WO2019200785A1 (PCT/CN2018/100227)
Authority
WO
WIPO (PCT)
Prior art keywords
calibration frame
calibration
frame
standard calibration
human hand
Prior art date
Application number
PCT/CN2018/100227
Other languages
English (en)
Chinese (zh)
Inventor
阮晓雯
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2019200785A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/66Analysis of geometric attributes of image moments or centre of gravity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Definitions

  • the present application relates to the field of hand tracking technology, and in particular, to a fast hand tracking method, device, terminal, and storage medium.
  • As an important means of natural interaction, gestures have significant research value and broad application prospects.
  • The first and most important step in gesture recognition and hand tracking is to segment the hand area from the image.
  • The quality of hand segmentation directly affects the subsequent gesture recognition and gesture tracking effects.
  • In many cases, the collected pictures contain the whole human body. Since such pictures have a large amount of background and the hand region is only a small part of the picture, how to detect the hand among a large number of background areas and segment it quickly and accurately is a problem worth studying.
  • a first aspect of the present application provides a fast hand tracking method, the method comprising:
  • Tracking the hand image with a continuously adaptive mean shift (CamShift) operator, wherein tracking the hand image with the continuously adaptive mean shift operator specifically includes:
  • a second aspect of the present application provides a fast hand tracking device, the device comprising:
  • a display module configured to display, on the display interface, a video that is collected by the imaging device and includes a human hand region
  • a calibration module configured to receive a calibration frame that is calibrated by the user on the video that includes the human hand region
  • a segmentation module configured to extract a gradient direction histogram feature of the area calibrated by the calibration frame, and to segment that area according to the gradient direction histogram feature to obtain a hand image
  • a tracking module configured to track the hand image with a continuously adaptive mean shift operator.
  • a third aspect of the present application provides a terminal comprising a processor and a memory, the processor implementing the fast hand tracking method when executing computer readable instructions stored in a memory.
  • a fourth aspect of the present application provides a non-volatile readable storage medium having stored thereon computer readable instructions that, when executed by a processor, implement the fast hand tracking method.
  • The fast hand tracking method, device, terminal and storage medium described in the present application first perform a rough calibration of the hand region to obtain a calibration frame, and then extract the HOG feature within the area calibrated by the calibration frame.
  • According to the HOG feature, the hand region is accurately segmented from the area calibrated by the calibration frame, which reduces the area over which the HOG feature is extracted and effectively shortens the extraction time, so the hand region can be segmented and tracked quickly;
  • obtaining the depth information of the video containing the hand further ensures the clarity of the hand contour; especially for hand region tracking against a complex background, the gain in tracking efficiency is particularly remarkable.
  • FIG. 1 is a flowchart of a fast hand tracking method according to Embodiment 1 of the present application.
  • FIG. 2 is a flowchart of a fast hand tracking method according to Embodiment 2 of the present application.
  • FIG. 3 is a structural diagram of a fast hand tracking device according to Embodiment 3 of the present application.
  • FIG. 4 is a structural diagram of a fast hand tracking device according to Embodiment 4 of the present application.
  • FIG. 5 is a schematic diagram of a terminal provided in Embodiment 5 of the present application.
  • the fast hand tracking method of the embodiment of the present application is applied to one or more terminals.
  • the fast hand tracking method can also be applied to a hardware environment composed of a terminal and a server connected to the terminal through a network.
  • the fast hand tracking method of the embodiment of the present application may be executed by a server or by a terminal; or may be performed by a server and a terminal together.
  • The fast hand tracking function provided by the method of the present application may be directly integrated on the terminal, or a client for implementing the method of the present application may be installed on the terminal.
  • The method provided by the present application may also run on a server in the form of a software development kit (SDK); an interface for the fast hand tracking function is provided in the form of the SDK, and the terminal or another device implements hand tracking through the provided interface.
  • FIG. 1 is a flowchart of a fast hand tracking method according to Embodiment 1 of the present application.
  • the terminal provides a display interface, and the display interface is used to synchronously display a video that is collected by the imaging device and includes a human hand region.
  • the imaging device is a 2D camera.
  • The hand information of interest is calibrated by adding a calibration frame on the display interface.
  • The user can touch the display interface with a finger, a stylus or any other suitable object, preferably with a finger, to add a calibration frame on the display interface.
  • The specific process of extracting the histogram of oriented gradients (HOG) feature of the area calibrated by the calibration frame includes:
  • A first-order differential template is used to calculate the horizontal and vertical gradients of each pixel of the area calibrated by the calibration frame; the gradient magnitude and gradient direction of that area are then calculated from the horizontal and vertical gradients.
  • the gradient information of each pixel of the calibration area of the calibration frame is calculated by taking a one-dimensional center [1, 0, -1] template as an example.
  • The area marked by the calibration frame is denoted I(x, y), and the gradients of its pixel points in the horizontal and vertical directions are calculated as in formula (1-1): G_h(x, y) = I(x+1, y) − I(x−1, y), G_v(x, y) = I(x, y+1) − I(x, y−1) (1-1)
  • G_h(x, y) and G_v(x, y) represent the gradient values of the pixel point (x, y) in the horizontal and vertical directions, respectively; the gradient magnitude and gradient direction then follow as M(x, y) = sqrt(G_h(x, y)^2 + G_v(x, y)^2), θ(x, y) = arctan(G_v(x, y) / G_h(x, y)) (1-2)
  • M(x, y) and θ(x, y) represent the gradient magnitude and gradient direction of the pixel point (x, y), respectively.
  • In general, an unsigned range can be used, that is, the sign of the gradient direction angle is ignored; the unsigned gradient direction can be expressed by formula (1-3): θ(x, y) ← θ(x, y) mod 180° (1-3)
  • The gradient direction of each pixel of the area calibrated by the calibration frame is thus limited to 0 to 180 degrees.
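For illustration only, formulas (1-1) to (1-3) can be evaluated over the whole calibrated area in a few lines of NumPy. This is a hedged sketch, not the patent's code; the function name is an assumption:

```python
import numpy as np

# Sketch of formulas (1-1)-(1-3): gradients with the one-dimensional
# centered [1, 0, -1] template, then magnitude and unsigned direction.
def gradient_field(I):
    I = I.astype(np.float32)
    gh = np.zeros_like(I)
    gv = np.zeros_like(I)
    gh[:, 1:-1] = I[:, 2:] - I[:, :-2]    # horizontal gradient G_h, formula (1-1)
    gv[1:-1, :] = I[2:, :] - I[:-2, :]    # vertical gradient G_v, formula (1-1)
    mag = np.hypot(gh, gv)                # gradient magnitude M, formula (1-2)
    ang = np.degrees(np.arctan2(gv, gh))  # gradient direction theta, formula (1-2)
    return mag, np.mod(ang, 180.0)        # unsigned direction in [0, 180), formula (1-3)
```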
  • the size of the cell unit is 8*8 pixels, and adjacent cell units do not overlap.
  • The area calibrated by the calibration frame can be divided into 105 blocks, each block comprising 4 cell units and each cell unit comprising 64 pixel points.
  • Dividing the cell units in a non-overlapping manner in this embodiment makes it possible to calculate the gradient direction histogram in each block faster.
  • The gradient direction of each pixel of each cell unit is first divided into 9 bins (9 directional channels); the 9 bins form the horizontal axis of the gradient histogram and are [0°, 20°], [20°, 40°], [40°, 60°], [60°, 80°], [80°, 100°], [100°, 120°], [120°, 140°], [140°, 160°], [160°, 180°]. The gradient magnitudes of the pixels falling into each bin are then accumulated as the vertical axis of the gradient histogram.
  • The gradient histogram of each block can be normalized using a normalization function, such as the L2 norm or the L1 norm.
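Putting the steps above together, the following is a minimal illustrative sketch using OpenCV (an assumed implementation choice; the patent does not prescribe OpenCV, and the function and variable names are invented). With the calibrated area resized to a 64×128 window, 8×8-pixel non-overlapping cells, 2×2-cell blocks and 9 unsigned bins, the usual one-cell block stride yields exactly the 7 × 15 = 105 blocks mentioned above:

```python
import cv2

# Hedged sketch: extract the HOG feature of the user-calibrated area with the
# parameters described in the text (8x8 cells, 2x2-cell blocks, 9 bins).
def hog_of_calibration_area(frame, box):
    x, y, w, h = box                              # calibration frame, pixel coords
    roi = frame[y:y + h, x:x + w]                 # only the calibrated area
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)  # grayscale pre-processing
    gray = cv2.resize(gray, (64, 128))            # fixed 64x128 window -> 105 blocks
    hog = cv2.HOGDescriptor((64, 128),            # window size
                            (16, 16),             # block size: 2x2 cells
                            (8, 8),               # block stride: one cell
                            (8, 8),               # cell size
                            9)                    # orientation bins
    return hog.compute(gray)                      # block-normalized histograms (L2 variant)
```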
  • The Continuously Adaptive Mean Shift (CamShift) algorithm is a color-information-based method that can track a target of a specific color: it automatically adjusts the size and position of the search window to locate and track the size and center of the target, and takes the result of the previous frame (i.e., the search window size and centroid) as the size and centroid of the search window for the target in the next frame of the image.
  • Tracking the hand image with the continuously adaptive mean shift operator specifically includes:
  • The zeroth moment of the current search window is calculated according to formula (1-4), and the first moments according to formula (1-5): M00 = Σ_x Σ_y I(x, y) (1-4); M10 = Σ_x Σ_y x·I(x, y), M01 = Σ_x Σ_y y·I(x, y) (1-5), where I(x, y) is the pixel value at (x, y) within the search window. The centroid of the search window is then (x_c, y_c) = (M10/M00, M01/M00).
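The loop below is a hedged sketch of this tracking step using OpenCV's built-in CamShift (an assumed implementation choice; all names are illustrative). The hue-histogram back-projection plays the role of I(x, y) in formulas (1-4) and (1-5), and each frame's converged window seeds the next:

```python
import cv2
import numpy as np

# Hedged sketch: CamShift tracking of the segmented hand image.
def track_hand(video_source, init_window, hand_roi_bgr):
    cap = cv2.VideoCapture(video_source)
    hsv_roi = cv2.cvtColor(hand_roi_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])  # hue histogram
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    window = init_window                # (x, y, w, h) from the calibration frame
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        # CamShift recomputes the window centroid and size from the moments of
        # the back-projection; the converged result seeds the next frame.
        rot_box, window = cv2.CamShift(backproj, window, criteria)
        pts = cv2.boxPoints(rot_box).astype(np.int32)
        cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
        cv2.imshow('hand tracking', frame)
        if cv2.waitKey(30) & 0xFF == 27:  # Esc to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```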
  • In the fast hand tracking method described in the present application, the user calibrates the hand information of interest in the video containing the human hand region, and the HOG feature of the area calibrated by the calibration frame is then extracted.
  • The hand region is segmented from the area calibrated by the calibration frame according to the HOG feature; therefore, only the HOG feature within the calibrated area needs to be calculated.
  • the present application can reduce the area of the region in which the HOG feature is extracted by receiving the calibration frame of the user calibration. Thereby, the time for extracting the HOG feature is effectively shortened, and thus the hand region can be quickly separated from the video containing the human hand region.
  • Firstly, the calculated HOG feature preserves the geometric and optical characteristics of the hand region; secondly, the cell-unit-based processing allows the relationships between the pixel points in the hand region to be well characterized; finally, the normalization step partially offsets the influence of illumination changes, ensuring the clarity of the extracted hand region and an accurate segmentation of the hand area.
  • FIG. 2 is a flowchart of a fast hand tracking method according to Embodiment 2 of the present application.
  • 201: Display a video containing the human hand region collected by the imaging device on the display interface, and display a preset standard calibration frame in a preset display manner.
  • the terminal provides a display interface
  • the display interface is used to synchronously display a video that is collected by the imaging device and includes a human hand region, and the display interface also displays a standard calibration frame.
  • the imaging device is a 3D depth camera, and the 3D depth camera is different from the 2D camera in that the 3D depth camera can simultaneously capture grayscale image information of the scene and 3-dimensional information including depth information.
  • When the video containing the human hand region is acquired by the 3D depth camera, it is synchronously displayed on the display interface of the terminal.
  • the preset standard calibration frame is provided for the user to perform calibration on the displayed video containing the human hand region to obtain the hand information of interest.
  • the preset display manner includes one or a combination of the following:
  • The display instruction corresponds to a display operation input by the user; the display operation input by the user includes, but is not limited to, clicking an arbitrary position of the display interface, touching an arbitrary position of the display interface for more than a first preset time period (for example, 1 second), or issuing a first preset voice command (for example, "calibration frame").
  • the terminal determines that the display instruction is received, and displays the preset standard calibration frame.
  • The hidden instruction corresponds to a hidden operation input by the user; the hidden operation input by the user includes, but is not limited to, clicking an arbitrary position of the display interface, touching an arbitrary position of the display interface for more than a second preset time period (for example, 2 seconds), or issuing a second preset voice command (for example, "exit").
  • the terminal determines that a hidden command is received, and the preset standard calibration frame is hidden.
  • the hidden instruction may be the same as or different from the display instruction.
  • the first preset time period may be the same as or different from the second preset time period.
  • Preferably, the first preset time period is shorter than the second preset time period: setting a shorter first preset time period allows the preset standard calibration frame to be displayed quickly, while setting a longer second preset time period prevents the standard calibration frame from being hidden by an unconscious user action or an operation error.
  • Displaying the preset standard calibration frame when the display instruction is received enables the hand region of interest to be calibrated while the display interface shows the video containing the human hand region. When no display instruction is received, the preset standard calibration frame is not displayed, and when the hidden instruction is received it is hidden, so that the displayed video containing the human hand region is not occluded by the preset standard calibration frame for a long time, which could cause important information to be missed or give the user visual discomfort when viewing the video containing the human hand region.
  • After the preset standard calibration frame is displayed, if the user inputs no further operation for more than a third preset time period, the preset standard calibration frame is automatically hidden, preventing an unconsciously triggered display instruction from leaving the preset standard calibration frame displayed for a long time.
  • the preset standard calibration frame is automatically hidden, which also helps to enhance the user's interactive experience.
  • the preset standard calibration frame may be a circle, an ellipse, a rectangle, a square, or the like.
  • The hand information of interest is calibrated by adding a standard calibration frame on the display interface.
  • Receiving the standard calibration frame calibrated by the user on the video containing the human hand region covers the following two cases:
  • First case: receiving a rough calibration frame drawn by the user on the video containing the human hand region; matching, by a fuzzy matching method, a preset standard calibration frame corresponding to the rough calibration frame; and calibrating the video containing the human hand region with the matched standard calibration frame and displaying it, wherein the geometric center of the rough calibration frame is the same as the geometric center of the matched standard calibration frame.
  • In general, the shape of a calibration frame drawn by the user's finger on the display interface is neither regular nor standard; for example, a circular calibration frame drawn by the user is usually not very accurate. The terminal therefore receives the user's drawing and matches the shape of the corresponding standard calibration frame according to the approximate shape of the rough calibration frame. Matching the corresponding standard calibration frame by the fuzzy matching method facilitates the subsequent cropping of the area calibrated by the calibration frame.
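The patent does not specify the fuzzy matching algorithm. One plausible sketch (all names illustrative) compares the drawn stroke against each standard shape with Hu-moment shape matching and re-centres the winner on the rough frame's geometric centre:

```python
import cv2
import numpy as np

# Hedged sketch: pick the standard calibration frame whose contour best
# matches the user's rough stroke, then shift it onto the stroke's centre.
def match_standard_frame(rough_contour, standard_contours):
    best = min(standard_contours,
               key=lambda c: cv2.matchShapes(rough_contour, c,
                                             cv2.CONTOURS_MATCH_I1, 0.0))
    rm = cv2.moments(rough_contour)                     # rough frame's moments
    rx, ry = rm['m10'] / rm['m00'], rm['m01'] / rm['m00']
    bm = cv2.moments(best)
    bx, by = bm['m10'] / bm['m00'], bm['m01'] / bm['m00']
    offset = np.array([int(rx - bx), int(ry - by)], dtype=best.dtype)
    return best + offset                                # same geometric centre
```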
  • Second case: directly receiving the standard calibration frame selected by the user, calibrating the video containing the human hand region according to the standard calibration frame, and displaying the calibrated standard calibration frame.
  • For example, the user inputs a display operation to trigger the display instruction, whereupon a plurality of preset standard calibration frames are displayed; the user touches a standard calibration frame, and after detecting the touch signal on that standard calibration frame, the terminal determines that it is selected.
  • the user moves the selected standard calibration frame and drags it onto the video containing the human hand area, and the terminal displays the dragged standard calibration frame on the video containing the human hand area.
  • The step 202 may further include: upon receiving an instruction to zoom in, zoom out, move, or delete, zooming in, zooming out, moving, or deleting the displayed standard calibration frame accordingly.
  • the pre-processing may include a combination of one or more of the following: grayscale processing, correction processing.
  • Grayscale processing converts the image of the area calibrated by the standard calibration frame into a grayscale image. Since color information has little effect on the extraction of the gradient direction histogram feature, this conversion does not affect the subsequently calculated gradient information of each pixel of the area calibrated by the standard calibration frame, while it reduces the amount of computation for that gradient information.
  • The correction processing may use gamma correction: since local surface exposure contributes strongly to the texture intensity of the image, gamma-corrected images effectively reduce local shadow and illumination changes.
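A hedged sketch of this pre-processing (the gamma value is an assumption; the patent does not fix one, and square-root compression is a common choice for HOG):

```python
import cv2
import numpy as np

# Hedged sketch: grayscale conversion followed by gamma correction.
def preprocess_calibrated_area(roi_bgr, gamma=0.5):
    gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)        # grayscale processing
    corrected = (gray.astype(np.float32) / 255.0) ** gamma  # gamma correction
    return (corrected * 255.0).astype(np.uint8)
```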
  • the step 204 described in this embodiment is the same as the step 103 described in the first embodiment, and details are not described herein again.
  • the step 205 described in this embodiment is the same as the step 104 described in the first embodiment, and details are not described herein again.
  • The method further includes: acquiring the depth information of the video containing the human hand region corresponding to the area calibrated by the calibration frame, and normalizing the hand image according to the depth information.
  • the depth information is obtained from the 3D depth camera.
  • The specific process of normalizing the hand image according to the depth information is as follows: the size of the hand image obtained by segmenting the area of the first standard calibration frame is recorded as the standard size S1, and the depth information corresponding to the area calibrated the first time is recorded as the standard depth H1; the size of the hand image obtained by segmenting the area of the current standard calibration frame is recorded as S2, and the depth information corresponding to the area calibrated by the current frame is recorded as H2.
  • The hand image obtained by segmenting the area of the current calibration frame is then normalized to the size S2*(H2/H1).
  • Normalizing the size of the hand image gives the finally extracted HOG feature representations a uniform measurement criterion, that is, the same dimension, which improves the accuracy of hand tracking.
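A hedged sketch of the S2*(H2/H1) normalization above (function and parameter names are assumptions):

```python
import cv2

# Hedged sketch: rescale the current hand image by H2/H1 so its size matches
# what it would measure at the standard depth H1 of the first calibration.
def normalize_hand_image(hand_img, h2, h1):
    scale = h2 / h1
    height, width = hand_img.shape[:2]
    new_size = (max(1, int(width * scale)), max(1, int(height * scale)))
    return cv2.resize(hand_img, new_size)
```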
  • The fast hand tracking method described in the present application provides two ways of calibrating the video containing the human hand region with standard calibration frames, so that the calibration frame calibrated by the user is a standard calibration frame; the shape of the segmented hand region is therefore standard, and hand tracking based on the standard calibration frame performs better.
  • The fast hand tracking method described in the present application can be applied to the tracking of a single hand as well as to the tracking of multiple hands.
  • When multiple hands are tracked, parallel tracking is used; in essence this is a set of single-hand tracking processes and is not described in detail here. Any method that uses the idea of this application for hand tracking shall be included in the scope of this application.
  • FIG. 3 is a functional block diagram of a preferred embodiment of the fast hand tracking device of the present application.
  • the fast hand tracking device 30 operates in a terminal.
  • the fast hand tracking device 30 can include a plurality of functional modules comprised of program code segments.
  • The program code of each program segment in the fast hand tracking device 30 can be stored in a memory and executed by at least one processor to perform the tracking of the hand region (see FIG. 1 and its associated description).
  • the fast hand tracking device 30 of the terminal can be divided into multiple functional modules according to the functions performed by the terminal.
  • the function module may include: a display module 301, a calibration module 302, a segmentation module 303, and a tracking module 304.
  • the display module 301 is configured to display, on the display interface, a video that is collected by the imaging device and includes a human hand region.
  • the terminal provides a display interface, and the display interface is used to synchronously display a video that is collected by the imaging device and includes a human hand region.
  • the imaging device is a 2D camera.
  • the calibration module 302 is configured to receive a calibration frame that is calibrated by the user on the video including the human hand region.
  • The hand information of interest is calibrated by adding a calibration frame on the display interface.
  • The user can touch the display interface with a finger, a stylus or any other suitable object, preferably with a finger, to add a calibration frame on the display interface.
  • The segmentation module 303 is configured to extract a gradient direction histogram feature of the area calibrated by the calibration frame, and to segment that area according to the gradient direction histogram feature to obtain a hand image.
  • The segmentation module 303 extracts the histogram of oriented gradients (HOG) feature of the area calibrated by the calibration frame, which specifically includes:
  • A first-order differential template is used to calculate the horizontal and vertical gradients of each pixel of the area calibrated by the calibration frame; the gradient magnitude and gradient direction of that area are then calculated from the horizontal and vertical gradients.
  • the gradient information of each pixel of the calibration area of the calibration frame is calculated by taking a one-dimensional center [1, 0, -1] template as an example.
  • The area marked by the calibration frame is denoted I(x, y), and the gradients of its pixel points in the horizontal and vertical directions are calculated as in formula (1-1) above.
  • G_h(x, y) and G_v(x, y) represent the gradient values of the pixel point (x, y) in the horizontal and vertical directions, respectively.
  • M(x, y) and θ(x, y) represent the gradient magnitude and gradient direction of the pixel point (x, y), respectively, as given by formula (1-2) above.
  • In general, an unsigned range can be used, that is, the sign of the gradient direction angle is ignored; the unsigned gradient direction is given by formula (1-3) above.
  • The gradient direction of each pixel of the area calibrated by the calibration frame is thus limited to 0 to 180 degrees.
  • the size of the cell unit is 8*8 pixels, and adjacent cell units do not overlap.
  • The area calibrated by the calibration frame can be divided into 105 blocks, each block comprising 4 cell units and each cell unit comprising 64 pixel points.
  • Dividing the cell units in a non-overlapping manner in this embodiment makes it possible to calculate the gradient direction histogram in each block faster.
  • The gradient direction of each pixel of each cell unit is first divided into 9 bins (9 directional channels); the 9 bins form the horizontal axis of the gradient histogram and are [0°, 20°], [20°, 40°], [40°, 60°], [60°, 80°], [80°, 100°], [100°, 120°], [120°, 140°], [140°, 160°], [160°, 180°]. The gradient magnitudes of the pixels falling into each bin are then accumulated as the vertical axis of the gradient histogram.
  • The gradient histogram of each block can be normalized using a normalization function, such as the L2 norm or the L1 norm.
  • The tracking module 304 is configured to track the hand image with a continuously adaptive mean shift operator.
  • The Continuously Adaptive Mean Shift (CamShift) algorithm is a color-information-based method that can track a target of a specific color: it automatically adjusts the size and position of the search window to locate and track the size and center of the target, and takes the result of the previous frame (i.e., the search window size and centroid) as the size and centroid of the search window for the target in the next frame of the image.
  • Tracking the hand image with the continuously adaptive mean shift operator specifically includes:
  • The zeroth moment of the current search window is calculated according to formula (1-4) above, and the first moments according to formula (1-5) above.
  • With the fast hand tracking device 30 described in the present application, the user calibrates the hand information of interest in the video containing the human hand region; the HOG feature of the area calibrated by the calibration frame is then extracted, and the hand region is segmented from that area according to the HOG feature. Therefore, only the HOG feature within the calibrated area needs to be calculated.
  • the present application can reduce the area of the region in which the HOG feature is extracted by receiving the calibration frame of the user calibration. Thereby, the time for extracting the HOG feature is effectively shortened, and thus the hand region can be quickly separated from the video containing the human hand region.
  • Firstly, the calculated HOG feature preserves the geometric and optical characteristics of the hand region; secondly, the cell-unit-based processing allows the relationships between the pixel points in the hand region to be well characterized; finally, the normalization step partially offsets the influence of illumination changes, ensuring the clarity of the extracted hand region and an accurate segmentation of the hand area.
  • FIG. 4 is a functional block diagram of a preferred embodiment of the fast hand tracking device of the present application.
  • the fast hand tracking device 40 operates in a terminal.
  • the fast hand tracking device 40 can include a plurality of functional modules comprised of program code segments.
  • the program code for each of the program segments in the fast hand tracking device 40 can be stored in a memory and executed by at least one processor to perform (see Figure 2 and its associated description) tracking of the hand region.
  • the fast hand tracking device of the terminal may be divided into multiple functional modules according to the functions performed by the terminal.
  • the function module may include: a display module 401, a calibration module 402, a pre-processing module 403, a segmentation module 404, a tracking module 405, and a normalization module 406.
  • the display module 401 includes: a first display sub-module 4010 and a second display sub-module 4012.
  • The first display sub-module 4010 is configured to display, on the display interface, a video that is collected by the imaging device and includes a human hand region.
  • The second display sub-module 4012 is configured to display a preset standard calibration frame in a preset display manner.
  • the terminal provides a display interface
  • the display interface is used to synchronously display a video that is collected by the imaging device and includes a human hand region, and the display interface also displays a standard calibration frame.
  • the imaging device is a 3D depth camera, and the 3D depth camera is different from the 2D camera in that the 3D depth camera can simultaneously capture grayscale image information of the scene and 3-dimensional information including depth information.
  • When the video containing the human hand region is acquired by the 3D depth camera, it is synchronously displayed on the display interface of the terminal.
  • the preset standard calibration frame is provided for the user to perform calibration on the displayed video containing the human hand region to obtain the hand information of interest.
  • the preset display manner includes one or a combination of the following:
  • The display instruction corresponds to a display operation input by the user; the display operation input by the user includes, but is not limited to, clicking an arbitrary position of the display interface, touching an arbitrary position of the display interface for more than a first preset time period (for example, 1 second), or issuing a first preset voice command (for example, "calibration frame").
  • the terminal determines that the display instruction is received, and displays the preset standard calibration frame.
  • The hidden instruction corresponds to a hidden operation input by the user; the hidden operation input by the user includes, but is not limited to, clicking an arbitrary position of the display interface, touching an arbitrary position of the display interface for more than a second preset time period (for example, 2 seconds), or issuing a second preset voice command (for example, "exit").
  • the terminal determines that a hidden command is received, and the preset standard calibration frame is hidden.
  • the hidden instruction may be the same as or different from the display instruction.
  • the first preset time period may be the same as or different from the second preset time period.
  • Preferably, the first preset time period is shorter than the second preset time period: setting a shorter first preset time period allows the preset standard calibration frame to be displayed quickly, while setting a longer second preset time period prevents the standard calibration frame from being hidden by an unconscious user action or an operation error.
  • Displaying the preset standard calibration frame when the display instruction is received enables the hand region of interest to be calibrated while the display interface shows the video containing the human hand region. When no display instruction is received, the preset standard calibration frame is not displayed, and when the hidden instruction is received it is hidden, so that the displayed video containing the human hand region is not occluded by the preset standard calibration frame for a long time, which could cause important information to be missed or give the user visual discomfort when viewing the video containing the human hand region.
  • After the preset standard calibration frame is displayed, if the user inputs no further operation for more than a third preset time period, the preset standard calibration frame is automatically hidden, preventing an unconsciously triggered display instruction from leaving the preset standard calibration frame displayed for a long time.
  • the preset standard calibration frame is automatically hidden, which also helps to enhance the user's interactive experience.
  • the preset standard calibration frame may be a circle, an ellipse, a rectangle, a square, or the like.
  • the calibration module 402 is configured to receive a standard calibration frame that is calibrated by the user on the video including the human hand region.
  • The hand information of interest is calibrated by adding a standard calibration frame on the display interface.
  • The calibration module 402 further includes a first calibration sub-module 4020, a second calibration sub-module 4022, and a third calibration sub-module 4024.
  • The first calibration sub-module 4020 is configured to receive a rough calibration frame drawn by the user on the video containing the human hand region; match, by a fuzzy matching method, a preset standard calibration frame corresponding to the rough calibration frame; and calibrate the video containing the human hand region according to the matched standard calibration frame and display it, wherein the geometric center of the rough calibration frame is the same as the geometric center of the matched standard calibration frame.
  • In general, the shape of a calibration frame drawn by the user's finger on the display interface is neither regular nor standard; for example, a circular calibration frame drawn by the user is usually not very accurate. The terminal therefore receives the user's drawing and matches the shape of the corresponding standard calibration frame according to the approximate shape of the rough calibration frame. Matching the corresponding standard calibration frame by the fuzzy matching method facilitates the subsequent cropping of the area calibrated by the calibration frame.
  • The second calibration sub-module 4022 is configured to directly receive a standard calibration frame selected by the user, calibrate the video containing the human hand region according to the standard calibration frame, and display the calibrated standard calibration frame.
  • For example, the user inputs a display operation to trigger the display instruction, whereupon a plurality of preset standard calibration frames are displayed; the user touches a standard calibration frame, and after detecting the touch signal on that standard calibration frame, the terminal determines that it is selected.
  • the user moves the selected standard calibration frame and drags it onto the video containing the human hand area, and the terminal displays the dragged standard calibration frame on the video containing the human hand area.
  • The third calibration sub-module 4024 is configured to zoom in, zoom out, move, or delete the displayed standard calibration frame upon receiving an instruction to zoom in, zoom out, move, or delete.
  • the pre-processing module 403 is configured to pre-process the area of the standard calibration frame.
  • the pre-processing may include a combination of one or more of the following: grayscale processing, correction processing.
  • Grayscale processing converts the image of the area calibrated by the standard calibration frame into a grayscale image. Since color information has little effect on the extraction of the gradient direction histogram feature, this conversion does not affect the subsequently calculated gradient information of each pixel of the area calibrated by the standard calibration frame, while it reduces the amount of computation for that gradient information.
  • The correction processing may use gamma correction: since local surface exposure contributes strongly to the texture intensity of the image, gamma-corrected images effectively reduce local shadow and illumination changes.
  • A segmentation module 404 configured to extract a gradient direction histogram feature of the pre-processed area calibrated by the standard calibration frame, and to segment that area according to the gradient direction histogram feature to obtain a hand image.
  • The tracking module 405 is configured to track the hand image with a continuously adaptive mean shift operator.
  • The fast hand tracking device 40 further includes a normalization module 406 configured to acquire the depth information of the video containing the human hand region corresponding to the area calibrated by the calibration frame, and to normalize the hand image according to the depth information.
  • the depth information is obtained from the 3D depth camera.
  • The specific process of normalizing the hand image according to the depth information is as follows: the size of the hand image obtained by segmenting the area of the first standard calibration frame is recorded as the standard size S1, and the depth information corresponding to the area calibrated the first time is recorded as the standard depth H1; the size of the hand image obtained by segmenting the area of the current standard calibration frame is recorded as S2, and the depth information corresponding to the area calibrated by the current frame is recorded as H2.
  • The hand image obtained by segmenting the area of the current calibration frame is then normalized to the size S2*(H2/H1).
  • Normalizing the size of the hand image gives the finally extracted HOG feature representations a uniform measurement criterion, that is, the same dimension, which improves the accuracy of hand tracking.
  • The fast hand tracking device 40 described in the present application provides two ways of calibrating the video containing the human hand region with standard calibration frames, so that the calibration frame calibrated by the user is a standard calibration frame; the shape of the segmented hand region is therefore standard, and hand tracking based on the standard calibration frame performs better.
  • The fast hand tracking devices 30 and 40 described in the present application can be applied to the tracking of a single hand as well as to the tracking of multiple hands.
  • When multiple hands are tracked, parallel tracking is used; in essence this is a set of single-hand tracking processes and is not described in detail here. Any device that uses the idea of this application for hand tracking shall be included in the scope of this application.
  • FIG. 5 is a schematic diagram of a terminal according to Embodiment 5 of the present application.
  • The terminal 5 includes a memory 51, at least one processor 52, computer readable instructions 53 stored in the memory 51 and executable on the at least one processor 52, at least one communication bus 54, and an imaging device 55.
  • the at least one processor 52 implements the steps in the fast hand tracking method embodiment when the computer readable instructions 53 are executed, such as steps 101 to 104 shown in FIG. 1 or steps 201 to 205 shown in FIG. 2.
  • the at least one processor 52 implements the functions of the modules/units in the above-described apparatus embodiments when the computer readable instructions 53 are executed, such as the modules 301 to 304 in FIG. 3 or the modules 401 to 406 in FIG.
  • The computer readable instructions 53 may be partitioned into one or more modules/units, the one or more modules/units being stored in the memory 51 and executed by the at least one processor 52 to complete the present application.
  • The one or more modules/units may be a series of computer readable instruction segments capable of performing particular functions; the instruction segments are used to describe the execution of the computer readable instructions 53 in the terminal 5.
  • The computer readable instructions 53 may be divided into the display module 301, the calibration module 302, the segmentation module 303, and the tracking module 304 in FIG. 3, or into the display module 401, the calibration module 402, the pre-processing module 403, the segmentation module 404, the tracking module 405, and the normalization module 406 in FIG. 4.
  • the display module 401 includes a first display sub-module 4010 and a second display sub-module 4012.
  • The calibration module 402 includes a first calibration sub-module 4020, a second calibration sub-module 4022, and a third calibration sub-module 4024. For their specific functions, see Embodiments 1 and 2 and their corresponding descriptions.
  • the imaging device 55 includes a 2D camera, a 3D depth camera, etc., and the imaging device 55 may be mounted on the terminal 5 or may be separated from the terminal 5 as an independent component.
  • The terminal 5 can be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. Those skilled in the art can understand that FIG. 5 is only an example of the terminal 5 and does not constitute a limitation of the terminal 5; the terminal may include more or fewer components than illustrated, combine some components, or have different components.
  • the terminal 5 may further include an input/output device, a network access device, a bus, and the like.
  • The at least one processor 52 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and the like.
  • the processor 52 may be a microprocessor or the processor 52 may be any conventional processor or the like.
  • The processor 52 is the control center of the terminal 5 and connects the various parts of the entire terminal 5 using various interfaces and lines.
  • The memory 51 can be used to store the computer readable instructions 53 and/or modules/units; the processor 52 implements the various functions of the terminal 5 by running or executing the computer readable instructions and/or modules/units stored in the memory 51 and by calling the data stored in the memory 51.
  • the memory 51 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function (such as a sound playing function, an image playing function, etc.); and the storage data area may be Data (such as audio data, phone book, etc.) created according to the use of the terminal 5 is stored.
  • The memory 51 may include a high-speed random access memory, and may also include a non-volatile memory such as a hard disk, a memory, a plug-in hard disk, a smart memory card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage device.
  • The modules/units integrated by the terminal 5, if implemented in the form of software functional units and sold or used as stand-alone products, can be stored in a non-volatile readable storage medium. Based on such understanding, all or part of the processes in the foregoing method embodiments of the present application may also be implemented by computer-readable instructions stored in a non-volatile readable storage medium; when executed by a processor, the computer readable instructions implement the steps of the various method embodiments described above. The computer readable instructions comprise computer readable instruction code, which may be in the form of source code, object code, an executable file, or some intermediate form.
  • The non-volatile readable medium may include any entity or device capable of carrying the computer readable instruction code: a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, or a software distribution medium.
  • The contents of the non-volatile readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, non-volatile readable media do not include electrical carrier signals and telecommunication signals.
  • the functional units in the various embodiments of the present application may be integrated in the same processing unit, or each unit may exist physically separately, or two or more units may be integrated in the same unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software function modules.
  • The term "comprising" does not exclude other elements, and the singular does not exclude the plural.
  • A plurality of units or devices recited in the system claims may also be implemented by a single unit or device through software or hardware.
  • The words first, second, etc. are used to denote names and do not denote any particular order.

Abstract

The present invention relates to a fast hand tracking method comprising: displaying, on a display interface, a video containing a hand region acquired by an imaging device; receiving a bounding box calibrated by a user on the video containing the hand region; extracting a histogram of oriented gradients (HOG) feature of the region calibrated by the bounding box, and segmenting, according to the HOG feature, the region calibrated by the bounding box to obtain a hand image; and tracking the hand image by means of a continuously adaptive mean shift operator. The present invention further relates to a fast hand tracking device, a terminal, and a storage medium. The present invention enables fast extraction of a HOG feature within a user-calibrated bounding box and accurately segments the hand region according to the HOG feature, thereby achieving a better tracking result.
PCT/CN2018/100227 2018-04-18 2018-08-13 Fast hand tracking method, device, terminal and storage medium WO2019200785A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810349972.X 2018-04-18
CN201810349972.XA CN108682021B (zh) Fast hand tracking method, device, terminal and storage medium

Publications (1)

Publication Number Publication Date
WO2019200785A1 true WO2019200785A1 (fr) 2019-10-24

Family

ID=63801123

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/100227 WO2019200785A1 (fr) 2018-04-18 2018-08-13 Fast hand tracking method, device, terminal and storage medium

Country Status (2)

Country Link
CN (1) CN108682021B (fr)
WO (1) WO2019200785A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886928B (zh) * 2019-01-24 2023-07-14 平安科技(深圳)有限公司 Target cell labeling method and device, storage medium and terminal device
SG10201913029SA (en) * 2019-12-23 2021-04-29 Sensetime Int Pte Ltd Target tracking method and apparatus, electronic device, and storage medium
CN115701873A (zh) * 2021-07-19 2023-02-14 北京字跳网络技术有限公司 Image matching method, apparatus, device and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
WO2015139750A1 (fr) * 2014-03-20 2015-09-24 Telecom Italia S.P.A. Système et procédé de capture de mouvements
CN105825524A (zh) * 2016-03-10 2016-08-03 浙江生辉照明有限公司 目标跟踪方法和装置
CN105957107A (zh) * 2016-04-27 2016-09-21 北京博瑞空间科技发展有限公司 行人检测与跟踪方法及装置

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US8963829B2 (en) * 2009-10-07 2015-02-24 Microsoft Corporation Methods and systems for determining and tracking extremities of a target
CN103390168A (zh) * 2013-07-18 2013-11-13 重庆邮电大学 Intelligent wheelchair dynamic gesture recognition method based on Kinect depth information
CN105678809A (zh) * 2016-01-12 2016-06-15 湖南优象科技有限公司 Handheld automatic follow-shot device and target tracking method thereof
CN106157308A (zh) * 2016-06-30 2016-11-23 北京大学 Rectangular target detection method
US20180047173A1 (en) * 2016-08-12 2018-02-15 Qualcomm Incorporated Methods and systems of performing content-adaptive object tracking in video analytics
CN107240117B (zh) * 2017-05-16 2020-05-15 上海体育学院 Method and device for tracking a moving target in video

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
WO2015139750A1 (fr) * 2014-03-20 2015-09-24 Telecom Italia S.P.A. Système et procédé de capture de mouvements
CN105825524A (zh) * 2016-03-10 2016-08-03 浙江生辉照明有限公司 目标跟踪方法和装置
CN105957107A (zh) * 2016-04-27 2016-09-21 北京博瑞空间科技发展有限公司 行人检测与跟踪方法及装置

Non-Patent Citations (1)

Title
LAN, TIANTIAN ET AL.: "Study on Hand Gesture Recognition Used for Air Conditioning Control", CHINESE MASTER'S THESES FULL-TEXT DATABASE, INFORMATION SCIENCE & TECHNOLOGY, 1 May 2015 (2015-05-01), pages 1 - 70, XP055645970 *

Also Published As

Publication number Publication date
CN108682021B (zh) 2021-03-05
CN108682021A (zh) 2018-10-19

Similar Documents

Publication Publication Date Title
CN109493350B (zh) Portrait segmentation method and device
CN107516319B (zh) High-precision simple interactive matting method, storage device and terminal
Xiao et al. Fast image dehazing using guided joint bilateral filter
EP2864933B1 (fr) Method, apparatus and computer program product for human face feature extraction
EP3104332B1 (fr) Digital image manipulation
US8437570B2 (en) Geodesic image and video processing
CN109934065B (zh) Method and device for gesture recognition
US10277806B2 (en) Automatic image composition
US9639943B1 (en) Scanning of a handheld object for 3-dimensional reconstruction
US20110211749A1 (en) System And Method For Processing Video Using Depth Sensor Information
WO2019071976A1 (fr) Salient object detection method for panoramic images based on region fusion and an eye-movement model
WO2020134818A1 (fr) Image processing method and related product
WO2013086255A1 (fr) Motion-aligned distance calculations for image comparisons
US20230334235A1 (en) Detecting occlusion of digital ink
WO2018082308A1 (fr) Image processing method and terminal
WO2019200785A1 (fr) Fast hand tracking method, device, terminal and storage medium
CN111402170A (zh) Image enhancement method and device, terminal, and computer-readable storage medium
CN114255337A (zh) Document image correction method and device, electronic device, and storage medium
KR20220153667A (ko) Feature extraction method and apparatus, electronic device, storage medium, and computer program
CN107578053B (zh) Contour extraction method and device, computer device, and readable storage medium
US9171357B2 (en) Method, apparatus and computer-readable recording medium for refocusing photographed image
CN108960012B (zh) Feature point detection method and device, and electronic device
CN108647605B (zh) Human eye gaze point extraction method combining global color and local structural features
KR20190044761A (ko) Image processing apparatus and method
WO2022021287A1 (fr) Data augmentation method and training method for instance segmentation model, and related apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18914953

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09.12.2020)