WO2019200785A1 - Fast hand tracking method, device, terminal, and storage medium - Google Patents


Info

Publication number
WO2019200785A1
Authority
WO
WIPO (PCT)
Prior art keywords
calibration frame
calibration
frame
standard calibration
human hand
Prior art date
Application number
PCT/CN2018/100227
Other languages
French (fr)
Chinese (zh)
Inventor
阮晓雯
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019200785A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/60: Analysis of geometric attributes
    • G06T 7/66: Analysis of geometric attributes of image moments or centre of gravity
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/10: Image acquisition modality
    • G06T 2207/10016: Video; Image sequence
    • G06T 2207/10028: Range image; Depth image; 3D point clouds
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20092: Interactive image processing based on input by user
    • G06T 2207/20104: Interactive definition of region of interest [ROI]
    • G06T 2207/30: Subject of image; Context of image processing
    • G06T 2207/30196: Human being; Person
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/42: Higher-level, semantic clustering, classification or understanding of sport video content
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06V 40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language

Definitions

  • the present application relates to the field of hand tracking technology, and in particular, to a fast hand tracking method, device, terminal, and storage medium.
  • as an important means of natural interaction, gestures have important research value and broad application prospects.
  • the first and most important step in gesture recognition and hand tracking is to segment the hand region from the image.
  • the quality of hand segmentation directly affects the subsequent gesture recognition and gesture tracking results.
  • in human-robot interaction, when the video capture device mounted on the robot is at some distance from the human body, the captured pictures contain the whole human body. Since such pictures have a large amount of background and the hand region is only a small part of the picture, how to detect the hand among a large background area and segment it quickly and accurately is a problem worth studying.
  • a first aspect of the present application provides a fast hand tracking method, the method comprising:
  • tracking the hand image with a continuously adaptive mean shift (CamShift) operator, wherein the tracking of the hand image with the CamShift operator specifically includes:
  • a second aspect of the present application provides a fast hand tracking device, the device comprising:
  • a display module configured to display, on the display interface, a video that is collected by the imaging device and includes a human hand region
  • a calibration module configured to receive a calibration frame that is calibrated by the user on the video that includes the human hand region
  • a segmentation module configured to extract a gradient direction histogram feature of the area calibrated by the calibration frame, and to segment that area according to the gradient direction histogram feature to obtain a hand image
  • a tracking module configured to track the hand image with a continuously adaptive mean shift (CamShift) operator.
  • a third aspect of the present application provides a terminal comprising a processor and a memory, the processor implementing the fast hand tracking method when executing computer readable instructions stored in a memory.
  • a fourth aspect of the present application provides a non-volatile readable storage medium having stored thereon computer readable instructions that, when executed by a processor, implement the fast hand tracking method.
  • the fast hand tracking method, device, terminal, and storage medium described in the present application first perform a rough calibration of the hand region to obtain a calibration frame, and then extract the HOG feature within the area calibrated by that frame;
  • the hand region is accurately segmented from the area calibrated by the calibration frame according to the HOG feature, which reduces the area over which the HOG feature is extracted and effectively shortens the extraction time, so the hand region can be segmented and tracked quickly;
  • acquiring the depth information of the video containing the hand further ensures the clarity of the hand contour; the gain in tracking efficiency is especially notable for hand region tracking against complex backgrounds.
  • FIG. 1 is a flowchart of a fast hand tracking method according to Embodiment 1 of the present application.
  • FIG. 2 is a flowchart of a fast hand tracking method according to Embodiment 2 of the present application.
  • FIG. 3 is a structural diagram of a fast hand tracking device according to Embodiment 3 of the present application.
  • FIG. 4 is a structural diagram of a fast hand tracking device according to Embodiment 4 of the present application.
  • FIG. 5 is a schematic diagram of a terminal provided in Embodiment 5 of the present application.
  • the fast hand tracking method of the embodiment of the present application is applied to one or more terminals.
  • the fast hand tracking method can also be applied to a hardware environment composed of a terminal and a server connected to the terminal through a network.
  • the fast hand tracking method of the embodiment of the present application may be executed by a server or by a terminal; or may be performed by a server and a terminal together.
  • the fast hand tracking function provided by the method of the present application may be directly integrated on the terminal, or the client for implementing the method of the present application may be installed.
  • the method provided by the present application can also run on a server in the form of a software development kit (SDK), exposing the fast hand tracking function as an SDK interface; a terminal or another device can then perform hand tracking through the provided interface.
  • FIG. 1 is a flowchart of a fast hand tracking method according to Embodiment 1 of the present application.
  • the terminal provides a display interface, and the display interface is used to synchronously display a video that is collected by the imaging device and includes a human hand region.
  • the imaging device is a 2D camera.
  • the hand information of interest is calibrated by adding a calibration frame on the display interface.
  • the user can touch the display interface with a finger, a stylus, or any other suitable object, preferably a finger, to add a calibration frame to the display interface.
  • the specific process of extracting the Histogram of Oriented Gradients (HOG) feature of the area calibrated by the calibration frame includes:
  • a first-order differential template is used to calculate the horizontal and vertical gradients of each pixel of the area calibrated by the calibration frame; the gradient magnitude and gradient direction of that area are then calculated from the horizontal and vertical gradients.
  • the gradient information of each pixel of the area calibrated by the calibration frame is calculated here taking the one-dimensional centered [1, 0, -1] template as an example.
  • the area calibrated by the calibration frame is denoted I(x, y); the gradients of a pixel point in the horizontal and vertical directions are computed as in formula (1-1): G_h(x, y) = I(x+1, y) - I(x-1, y), G_v(x, y) = I(x, y+1) - I(x, y-1).
  • G_h(x, y) and G_v(x, y) represent the gradient values of the pixel point (x, y) in the horizontal and vertical directions, respectively; the gradient magnitude and gradient direction follow as in formula (1-2): M(x, y) = sqrt(G_h(x, y)^2 + G_v(x, y)^2), θ(x, y) = arctan(G_v(x, y) / G_h(x, y)).
  • M(x, y) and θ(x, y) represent the gradient magnitude and gradient direction of the pixel point (x, y), respectively.
  • an unsigned range is generally used, that is, the sign of the gradient direction angle is ignored; the unsigned gradient direction can be expressed by formula (1-3): θ(x, y) = θ(x, y) + 180° if θ(x, y) < 0.
  • the gradient direction of each pixel of the area calibrated by the calibration frame is thus limited to 0 to 180 degrees.
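  • As an illustration of the gradient step above, the following minimal sketch (not code from the application; the function name and NumPy-based implementation are assumptions) applies the [1, 0, -1] template and folds the direction into the unsigned 0 to 180 degree range; interior pixels get the centered difference, while border pixels are left at zero, one common convention:

```python
import numpy as np

def gradients_unsigned(patch):
    """Per-pixel gradient magnitude and unsigned direction (0-180 deg)
    of a grayscale patch I(x, y) using the 1-D centered [1, 0, -1] template."""
    img = patch.astype(np.float64)
    gh = np.zeros_like(img)  # horizontal gradient G_h(x, y)
    gv = np.zeros_like(img)  # vertical gradient G_v(x, y)
    gh[:, 1:-1] = img[:, 2:] - img[:, :-2]   # I(x+1, y) - I(x-1, y)
    gv[1:-1, :] = img[2:, :] - img[:-2, :]   # I(x, y+1) - I(x, y-1)
    mag = np.sqrt(gh ** 2 + gv ** 2)         # M(x, y), formula (1-2)
    theta = np.degrees(np.arctan2(gv, gh))   # signed direction in (-180, 180]
    theta = np.mod(theta, 180.0)             # fold to the unsigned range [0, 180)
    return mag, theta
```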
  • the size of the cell unit is 8*8 pixels, and adjacent cell units do not overlap.
  • the area calibrated by the calibration frame can thus be divided into 105 blocks, each block comprising 4 cell units and each cell unit comprising 64 pixel points.
  • Dividing the cell units in a non-overlapping manner in this embodiment makes it possible to calculate the gradient direction histogram in each block faster.
  • the gradient direction of each pixel of each cell unit is first divided into 9 bins (9 direction channels), which form the horizontal axis of the gradient histogram: [0°, 20°], [20°, 40°], [40°, 60°], [60°, 80°], [80°, 100°], [100°, 120°], [120°, 140°], [140°, 160°], [160°, 180°]; the gradient magnitudes of the pixels falling into each bin are then accumulated to form the vertical axis of the gradient histogram.
  • the gradient histogram of each block can be normalized using a normalization function such as the L2 norm or the L1 norm.
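  • Continuing the sketch above (again an illustrative assumption, not the application's code), the 8x8 non-overlapping cells, 9-bin cell histograms, 4-cell blocks, and per-block L2 normalization combine as follows; with blocks of 2x2 cells stepped one cell at a time over a 64x128 calibration area, this yields 7*15 = 105 blocks, matching the count above:

```python
def hog_descriptor(mag, theta, cell=8, bins=9):
    """Build a HOG descriptor: 8x8-pixel non-overlapping cells, 9 direction
    bins over [0, 180), blocks of 2x2 cells, L2-normalized per block."""
    h, w = mag.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    bin_idx = np.minimum((theta // (180 / bins)).astype(int), bins - 1)
    for i in range(ch):
        for j in range(cw):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            b = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
            for k in range(bins):  # accumulate magnitudes per direction channel
                hist[i, j, k] = m[b == k].sum()
    blocks = []
    for i in range(ch - 1):        # 2x2-cell blocks, one-cell stride
        for j in range(cw - 1):
            v = hist[i:i+2, j:j+2].ravel()
            blocks.append(v / (np.linalg.norm(v) + 1e-6))  # L2 normalization
    return np.concatenate(blocks)
```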
  • the Continuously Adaptive Mean Shift (CamShift) algorithm is a method based on color information: it tracks the target by its characteristic color, automatically adjusting the size and position of the search window to locate the size and center of the tracked target, and takes the result of the previous frame (i.e., the search window size and centroid) as the initial size and centroid of the search window for the target in the next frame.
  • the tracking of the hand image with the continuously adaptive mean shift (CamShift) operator specifically includes:
  • the zeroth moment of the current search window is calculated according to equation (1-4), M00 = Σ_i Σ_j I(i, j), and the first moments of the current search window are calculated according to equation (1-5), M10 = Σ_i Σ_j i·I(i, j) and M01 = Σ_i Σ_j j·I(i, j).
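  • For concreteness, here is a hedged sketch of this loop built on OpenCV's stock CamShift implementation (the video path and the initial window seed are placeholders; the application's own operator may differ in details such as the termination criteria):

```python
import cv2
import numpy as np

def track_hand(video_path, x, y, w, h):
    """Track a hand with CamShift: build a hue histogram of the segmented
    hand ROI, back-project it, and let CamShift move/resize the window."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    roi = frame[y:y+h, x:x+w]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)  # HSV space, hue component
    hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    window = (x, y, w, h)
    # stop after 10 iterations or when the centroid moves less than 1 px
    crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        # the previous frame's window size/centroid seeds this frame's search
        box, window = cv2.CamShift(back, window, crit)
        cv2.polylines(frame, [np.intp(cv2.boxPoints(box))], True, (0, 255, 0), 2)
        cv2.imshow('tracking', frame)
        if cv2.waitKey(30) == 27:  # Esc to quit
            break
    cap.release()
```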
  • in the fast hand tracking method described in the present application, the user first calibrates the hand information of interest in the video containing the human hand region, and the HOG feature of the area calibrated by the calibration frame is then extracted.
  • the hand region is segmented from the area calibrated by the calibration frame according to the HOG feature, so only the HOG feature within that area needs to be computed.
  • the present application reduces the area over which the HOG feature is extracted by receiving the user-calibrated calibration frame, thereby effectively shortening the time for extracting the HOG feature, so the hand region can be quickly separated from the video containing the human hand region.
  • first, the calculated HOG feature preserves the geometric and optical characteristics of the hand region; secondly, processing by cell units characterizes the relationships between pixel points in the hand area well; finally, the normalization step partially offsets the influence of illumination changes, ensuring the clarity of the extracted hand region and an accurate segmentation of the hand area.
  • FIG. 2 is a flowchart of a fast hand tracking method according to Embodiment 2 of the present application.
  • 201: Display a video containing the human hand region collected by the imaging device on the display interface, and display a preset standard calibration frame in a preset display manner.
  • the terminal provides a display interface
  • the display interface is used to synchronously display a video that is collected by the imaging device and includes a human hand region, and the display interface also displays a standard calibration frame.
  • the imaging device is a 3D depth camera, and the 3D depth camera is different from the 2D camera in that the 3D depth camera can simultaneously capture grayscale image information of the scene and 3-dimensional information including depth information.
  • the video containing the human hand region is acquired by the 3D depth camera, the video including the human hand region is synchronously displayed on the display interface of the terminal.
  • the preset standard calibration frame is provided for the user to perform calibration on the displayed video containing the human hand region to obtain the hand information of interest.
  • the preset display manner includes one or a combination of the following:
  • the display instruction corresponds to a display operation input by the user; the display operation includes, but is not limited to, clicking an arbitrary position of the display interface, touching an arbitrary position of the display interface for more than a first preset time period (for example, 1 second), or issuing a first preset voice command (for example, "calibration box").
  • the terminal determines that the display instruction is received, and displays the preset standard calibration frame.
  • the hidden instruction corresponds to a hidden operation input by the user; the hidden operation includes, but is not limited to, clicking an arbitrary position of the display interface, touching an arbitrary position of the display interface for more than a second preset time period (for example, 2 seconds), or issuing a second preset voice command (for example, "exit").
  • the terminal determines that a hidden command is received, and the preset standard calibration frame is hidden.
  • the hidden instruction may be the same as or different from the display instruction.
  • the first preset time period may be the same as or different from the second preset time period.
  • preferably, the first preset time period is shorter than the second preset time period: setting a shorter first preset time period lets the preset standard calibration frame be displayed quickly, while setting a longer second preset time period avoids accidentally hiding the displayed standard calibration frame through an unconscious touch or an operation error.
  • displaying the preset standard calibration frame upon receiving the display instruction enables the hand region of interest to be calibrated on the display interface while the video containing the human hand region is displayed; conversely, not displaying the frame until the display instruction is received, and hiding it upon the hidden instruction, prevents the displayed video containing the human hand region from being occluded by the preset standard calibration frame for a long time, which could cause important information to be missed or give the user visual discomfort when viewing the video.
  • after the preset standard calibration frame is displayed, if the user inputs no further operation for more than a third preset time period, the preset standard calibration frame is automatically hidden; this prevents the frame from remaining displayed for a long time after the user triggers the display instruction unconsciously.
  • the preset standard calibration frame is automatically hidden, which also helps to enhance the user's interactive experience.
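  • The show/hide timing described above can be summarized in a small sketch (all threshold values, class and method names are assumptions for illustration; the application only requires that the first period be shorter than the second):

```python
import time

SHOW_TOUCH_S = 1.0   # first preset time period (assumed 1 second)
HIDE_TOUCH_S = 2.0   # second preset time period (assumed 2 seconds)
AUTO_HIDE_S = 5.0    # third preset time period (assumed value)

class CalibrationFrameUI:
    """Show/hide a standard calibration frame from touch-duration commands."""
    def __init__(self):
        self.visible = False
        self.last_input = time.monotonic()

    def on_touch(self, duration_s):
        self.last_input = time.monotonic()
        if not self.visible and duration_s >= SHOW_TOUCH_S:
            self.visible = True       # display instruction received
        elif self.visible and duration_s >= HIDE_TOUCH_S:
            self.visible = False      # hidden instruction received

    def tick(self):
        # auto-hide after the third preset period with no user input
        if self.visible and time.monotonic() - self.last_input > AUTO_HIDE_S:
            self.visible = False
```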
  • the preset standard calibration frame may be a circle, an ellipse, a rectangle, a square, or the like.
  • the hand information of interest is calibrated by adding a standard calibration frame on the display interface.
  • receiving the standard calibration frame calibrated by the user on the video containing the human hand region covers the following two situations:
  • in the first case, a rough calibration frame drawn by the user in the video containing the human hand region is received; a preset standard calibration frame corresponding to the rough calibration frame is matched by a fuzzy matching method; and the matched standard calibration frame is calibrated onto the video containing the human hand region and displayed, wherein the geometric center of the rough calibration frame is the same as the geometric center of the matched standard calibration frame.
  • the shape of a calibration frame drawn by the user on the display interface with a finger is generally not standard; for example, a hand-drawn circular calibration frame is not very accurate. The terminal therefore receives the user's rough drawing and matches the shape of the corresponding standard calibration frame according to the approximate shape of the rough calibration frame.
  • matching the corresponding standard calibration frame by the fuzzy matching method facilitates the subsequent segmentation of the area calibrated by the calibration frame.
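  • The application does not fix a particular fuzzy matching algorithm; one plausible sketch (function and variable names are hypothetical) compares the drawn stroke with each preset standard shape by Hu-moment contour similarity and reuses the drawn frame's geometric center, as required above:

```python
import cv2
import numpy as np

def match_standard_frame(stroke_pts, standard_contours):
    """Match a hand-drawn rough calibration stroke to the closest preset
    standard frame shape via Hu-moment similarity (one plausible fuzzy
    matching criterion; the application does not specify the method)."""
    drawn = np.asarray(stroke_pts, dtype=np.int32).reshape(-1, 1, 2)
    scores = [cv2.matchShapes(drawn, c, cv2.CONTOURS_MATCH_I1, 0.0)
              for c in standard_contours]
    best = int(np.argmin(scores))
    # keep the drawn frame's geometric center for the matched standard frame
    m = cv2.moments(drawn)
    if m['m00'] != 0:
        center = (m['m10'] / m['m00'], m['m01'] / m['m00'])
    else:
        center = tuple(drawn.reshape(-1, 2).mean(axis=0))  # degenerate stroke
    return best, center
```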
  • in the second case, the standard calibration frame selected by the user is received directly, and calibration is performed on the video containing the human hand region according to that standard calibration frame, which is then displayed.
  • the user inputs a display operation to trigger the display instruction, whereupon a plurality of preset standard calibration frames are displayed; when the user touches a standard calibration frame and the terminal detects the touch signal on it, the terminal determines that this standard calibration frame is selected.
  • the user moves the selected standard calibration frame and drags it onto the video containing the human hand area, and the terminal displays the dragged standard calibration frame on the video containing the human hand area.
  • the step 202 may further include: zooming in on, zooming out of, moving, or deleting the displayed standard calibration frame when a corresponding zoom-in, zoom-out, move, or delete instruction is received.
  • the pre-processing may include a combination of one or more of the following: grayscale processing, correction processing.
  • the grayscale processing refers to converting the image of the area calibrated by the standard calibration frame into a grayscale image; because color information has little effect on extracting the gradient direction histogram feature, this conversion does not affect the subsequently calculated gradient information of each pixel of that area, while reducing the amount of computation for that gradient information.
  • the correction process may use gamma correction; since local surface exposure contributes heavily to the texture intensity of the image, gamma-corrected images can effectively reduce local shadow and illumination changes.
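  • A minimal sketch of this preprocessing step (the gamma value is an assumed example; the application does not specify one):

```python
import cv2
import numpy as np

def preprocess_roi(roi_bgr, gamma=0.5):
    """Grayscale then gamma-correct the calibrated area before HOG
    extraction; gamma=0.5 (square-root compression) is an assumed value."""
    gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)
    norm = gray.astype(np.float64) / 255.0
    corrected = np.power(norm, gamma)        # gamma correction
    return (corrected * 255.0).astype(np.uint8)
```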
  • the step 204 described in this embodiment is the same as the step 103 described in the first embodiment, and details are not described herein again.
  • the step 205 described in this embodiment is the same as the step 104 described in the first embodiment, and details are not described herein again.
  • the method further includes: acquiring the depth information, in the video containing the human hand region, that corresponds to the area calibrated by the calibration frame, and normalizing the hand image according to the depth information.
  • the depth information is obtained from the 3D depth camera.
  • the specific process of normalizing the hand image according to the depth information is as follows: the size of the hand image obtained by segmenting the area of the first standard calibration frame is recorded as the standard size S1, and the depth information corresponding to the area calibrated by the first calibration frame is recorded as the standard depth H1; the size of the hand image obtained by segmenting the area of the current standard calibration frame is recorded as S2, and the depth information corresponding to the area calibrated by the current calibration frame is recorded as H2.
  • the hand image obtained by the region division of the current calibration frame is normalized to S2*(H2/H1).
  • the size of the hand image is normalized so that the finally extracted HOG feature representations share a uniform measurement criterion, that is, have the same dimension, which improves the accuracy of hand tracking.
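  • As a sketch of this normalization rule (names are illustrative; S2*(H2/H1) is read here as rescaling the current crop by the depth ratio H2/H1, so hands farther from the camera are scaled up to the standard distance):

```python
import cv2

def normalize_hand_size(hand_img, depth_h2, std_depth_h1):
    """Rescale the current hand crop so its size becomes S2 * (H2 / H1),
    giving the HOG features a uniform dimension across depths."""
    scale = depth_h2 / std_depth_h1
    h, w = hand_img.shape[:2]
    new_size = (max(1, round(w * scale)), max(1, round(h * scale)))
    return cv2.resize(hand_img, new_size, interpolation=cv2.INTER_LINEAR)
```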
  • the fast hand tracking method described in the present application provides two ways of calibrating the video containing the human hand region with a standard calibration frame, so that the frame calibrated by the user is a standard calibration frame; the hand region obtained by segmentation then has a standard shape, and hand tracking based on the segmented standard region performs better.
  • the fast hand tracking method described in the present application can be applied to the tracking of a single hand as well as to the tracking of multiple hands.
  • for multiple hands, parallel tracking is used, which is in essence a set of single-hand tracking processes and is not described in detail here. Any method that applies the idea of this application to hand tracking shall fall within the scope of this application.
  • FIG. 3 is a functional block diagram of a preferred embodiment of the fast hand tracking device of the present application.
  • the fast hand tracking device 30 operates in a terminal.
  • the fast hand tracking device 30 can include a plurality of functional modules comprised of program code segments.
  • the program code of each program segment in the fast hand tracking device 30 can be stored in a memory and executed by at least one processor to perform tracking of the hand region (see FIG. 1 and its associated description).
  • the fast hand tracking device 30 of the terminal can be divided into multiple functional modules according to the functions performed by the terminal.
  • the function module may include: a display module 301, a calibration module 302, a segmentation module 303, and a tracking module 304.
  • the display module 301 is configured to display, on the display interface, a video that is collected by the imaging device and includes a human hand region.
  • the terminal provides a display interface, and the display interface is used to synchronously display a video that is collected by the imaging device and includes a human hand region.
  • the imaging device is a 2D camera.
  • the calibration module 302 is configured to receive a calibration frame that is calibrated by the user on the video including the human hand region.
  • the hand information of interest is calibrated by adding a calibration frame on the display interface.
  • the user can touch the display interface with a finger, a stylus, or any other suitable object, preferably a finger, to add a calibration frame to the display interface.
  • the segmentation module 303 is configured to extract a gradient direction histogram feature of the calibration frame calibration region, and divide the calibration frame calibration region according to the gradient direction histogram feature to obtain a hand image.
  • the segmentation module 303 extracts the Histogram of Oriented Gradients (HOG) feature of the area calibrated by the calibration frame, which specifically includes:
  • a first-order differential template is used to calculate the horizontal and vertical gradients of each pixel of the area calibrated by the calibration frame; the gradient magnitude and gradient direction of that area are then calculated from the horizontal and vertical gradients.
  • the gradient information of each pixel of the area calibrated by the calibration frame is calculated here taking the one-dimensional centered [1, 0, -1] template as an example.
  • the area calibrated by the calibration frame is denoted I(x, y); the gradients of a pixel point in the horizontal and vertical directions are computed as in formula (1-1): G_h(x, y) = I(x+1, y) - I(x-1, y), G_v(x, y) = I(x, y+1) - I(x, y-1).
  • G_h(x, y) and G_v(x, y) represent the gradient values of the pixel point (x, y) in the horizontal and vertical directions, respectively; the gradient magnitude and gradient direction follow as in formula (1-2): M(x, y) = sqrt(G_h(x, y)^2 + G_v(x, y)^2), θ(x, y) = arctan(G_v(x, y) / G_h(x, y)).
  • M(x, y) and θ(x, y) represent the gradient magnitude and gradient direction of the pixel point (x, y), respectively.
  • an unsigned range is generally used, that is, the sign of the gradient direction angle is ignored; the unsigned gradient direction can be expressed by formula (1-3): θ(x, y) = θ(x, y) + 180° if θ(x, y) < 0.
  • the gradient direction of each pixel of the area calibrated by the calibration frame is thus limited to 0 to 180 degrees.
  • the size of the cell unit is 8*8 pixels, and adjacent cell units do not overlap.
  • the area calibrated by the calibration frame can thus be divided into 105 blocks, each block comprising 4 cell units and each cell unit comprising 64 pixel points.
  • Dividing the cell units in a non-overlapping manner in this embodiment makes it possible to calculate the gradient direction histogram in each block faster.
  • the gradient direction of each pixel of each cell unit is first divided into 9 bins (9 direction channels), which form the horizontal axis of the gradient histogram: [0°, 20°], [20°, 40°], [40°, 60°], [60°, 80°], [80°, 100°], [100°, 120°], [120°, 140°], [140°, 160°], [160°, 180°]; the gradient magnitudes of the pixels falling into each bin are then accumulated to form the vertical axis of the gradient histogram.
  • the gradient histogram of each block can be normalized using a normalization function such as the L2 norm or the L1 norm.
  • the tracking module 304 is configured to track the hand image with a continuously adaptive mean shift (CamShift) operator.
  • the Continuously Adaptive Mean Shift (CamShift) algorithm is a method based on color information: it tracks the target by its characteristic color, automatically adjusting the size and position of the search window to locate the size and center of the tracked target, and takes the result of the previous frame (i.e., the search window size and centroid) as the initial size and centroid of the search window for the target in the next frame.
  • the tracking of the hand image with the continuously adaptive mean shift (CamShift) operator specifically includes:
  • the zeroth moment of the current search window is calculated according to equation (1-4), M00 = Σ_i Σ_j I(i, j), and the first moments of the current search window are calculated according to equation (1-5), M10 = Σ_i Σ_j i·I(i, j) and M01 = Σ_i Σ_j j·I(i, j).
  • with the fast hand tracking device 30 described in the present application, the user calibrates the hand information of interest in the video containing the human hand region, and the HOG feature of the area calibrated by the calibration frame is then extracted.
  • the hand region is segmented from the area calibrated by the calibration frame according to the HOG feature, so only the HOG feature within that area needs to be computed.
  • the present application reduces the area over which the HOG feature is extracted by receiving the user-calibrated calibration frame, thereby effectively shortening the time for extracting the HOG feature, so the hand region can be quickly separated from the video containing the human hand region.
  • first, the calculated HOG feature preserves the geometric and optical characteristics of the hand region; secondly, processing by cell units characterizes the relationships between pixel points in the hand area well; finally, the normalization step partially offsets the influence of illumination changes, ensuring the clarity of the extracted hand region and an accurate segmentation of the hand area.
  • FIG. 4 is a functional block diagram of a preferred embodiment of the fast hand tracking device of the present application.
  • the fast hand tracking device 40 operates in a terminal.
  • the fast hand tracking device 40 can include a plurality of functional modules comprised of program code segments.
  • the program code of each program segment in the fast hand tracking device 40 can be stored in a memory and executed by at least one processor to perform tracking of the hand region (see FIG. 2 and its associated description).
  • the fast hand tracking device of the terminal may be divided into multiple functional modules according to the functions performed by the terminal.
  • the function module may include: a display module 401, a calibration module 402, a pre-processing module 403, a segmentation module 404, a tracking module 405, and a normalization module 406.
  • the display module 401 includes: a first display sub-module 4010 and a second display sub-module 4012.
  • the first display sub-module 4010 is configured to display, on the display interface, a video that is collected by the imaging device and includes a human hand region
  • the second display sub-module 4012 is configured to display a preset standard calibration frame in a preset display manner.
  • the terminal provides a display interface
  • the display interface is used to synchronously display a video that is collected by the imaging device and includes a human hand region, and the display interface also displays a standard calibration frame.
  • the imaging device is a 3D depth camera, and the 3D depth camera is different from the 2D camera in that the 3D depth camera can simultaneously capture grayscale image information of the scene and 3-dimensional information including depth information.
  • the video containing the human hand region is acquired by the 3D depth camera, the video including the human hand region is synchronously displayed on the display interface of the terminal.
  • the preset standard calibration frame is provided for the user to perform calibration on the displayed video containing the human hand region to obtain the hand information of interest.
  • the preset display manner includes one or a combination of the following:
  • the display instruction corresponds to a display operation input by the user; the display operation includes, but is not limited to, clicking an arbitrary position of the display interface, touching an arbitrary position of the display interface for more than a first preset time period (for example, 1 second), or issuing a first preset voice command (for example, "calibration box").
  • the terminal determines that the display instruction is received, and displays the preset standard calibration frame.
  • the hidden instruction corresponds to a hidden operation input by the user; the hidden operation includes, but is not limited to, clicking an arbitrary position of the display interface, touching an arbitrary position of the display interface for more than a second preset time period (for example, 2 seconds), or issuing a second preset voice command (for example, "exit").
  • the terminal determines that a hidden command is received, and the preset standard calibration frame is hidden.
  • the hidden instruction may be the same as or different from the display instruction.
  • the first preset time period may be the same as or different from the second preset time period.
  • preferably, the first preset time period is shorter than the second preset time period: setting a shorter first preset time period lets the preset standard calibration frame be displayed quickly, while setting a longer second preset time period avoids accidentally hiding the displayed standard calibration frame through an unconscious touch or an operation error.
  • displaying the preset standard calibration frame upon receiving the display instruction enables the hand region of interest to be calibrated on the display interface while the video containing the human hand region is displayed; conversely, not displaying the frame until the display instruction is received, and hiding it upon the hidden instruction, prevents the displayed video containing the human hand region from being occluded by the preset standard calibration frame for a long time, which could cause important information to be missed or give the user visual discomfort when viewing the video.
  • after the preset standard calibration frame is displayed, if the user inputs no further operation for more than a third preset time period, the preset standard calibration frame is automatically hidden; this prevents the frame from remaining displayed for a long time after the user triggers the display instruction unconsciously.
  • the preset standard calibration frame is automatically hidden, which also helps to enhance the user's interactive experience.
  • the preset standard calibration frame may be a circle, an ellipse, a rectangle, a square, or the like.
  • the calibration module 402 is configured to receive a standard calibration frame that is calibrated by the user on the video including the human hand region.
  • the hand information of interest is calibrated by adding a standard calibration frame on the display interface.
  • the calibration module 402 further includes a first calibration sub-module 4020, a second calibration sub-module 4022, and a third calibration sub-module 4024.
  • the first calibration sub-module 4020 is configured to: receive a rough calibration frame drawn by the user in the video containing the human hand region; match a preset standard calibration frame corresponding to the rough calibration frame by a fuzzy matching method; and calibrate and display the matched standard calibration frame on the video containing the human hand region, wherein the geometric center of the rough calibration frame is the same as the geometric center of the matched standard calibration frame.
  • the shape of a calibration frame drawn by the user on the display interface with a finger is generally not standard; for example, a hand-drawn circular calibration frame is not very accurate. The terminal therefore receives the user's rough drawing and matches the shape of the corresponding standard calibration frame according to the approximate shape of the rough calibration frame.
  • matching the corresponding standard calibration frame by the fuzzy matching method facilitates the subsequent segmentation of the area calibrated by the calibration frame.
  • the second calibration sub-module 4022 is configured to directly receive a standard calibration frame selected by the user, perform calibration on the video containing the human hand region according to that standard calibration frame, and display the calibrated standard calibration frame.
  • the user inputs a display operation to trigger the display instruction, whereupon a plurality of preset standard calibration frames are displayed; when the user touches a standard calibration frame and the terminal detects the touch signal on it, the terminal determines that this standard calibration frame is selected.
  • the user moves the selected standard calibration frame and drags it onto the video containing the human hand area, and the terminal displays the dragged standard calibration frame on the video containing the human hand area.
  • the third calibration sub-module 4024 is configured to zoom in on, zoom out of, move, or delete the displayed standard calibration frame when a corresponding zoom-in, zoom-out, move, or delete instruction is received.
  • the pre-processing module 403 is configured to pre-process the area of the standard calibration frame.
  • the pre-processing may include a combination of one or more of the following: grayscale processing, correction processing.
  • the grayscale processing refers to converting the image of the area calibrated by the standard calibration frame into a grayscale image; because color information has little effect on extracting the gradient direction histogram feature, this conversion does not affect the subsequently calculated gradient information of each pixel of that area, while reducing the amount of computation for that gradient information.
  • the correction process may use gamma correction; since local surface exposure contributes heavily to the texture intensity of the image, gamma-corrected images can effectively reduce local shadow and illumination changes.
  • a segmentation module 404 is configured to extract a gradient direction histogram feature of the pre-processed area calibrated by the standard calibration frame, and to segment that area according to the gradient direction histogram feature to obtain a hand image.
  • the tracking module 405 is configured to track the hand image with a continuously adaptive mean shift (CamShift) operator.
  • the fast hand tracking device 40 further includes a normalization module 406, configured to acquire the depth information, in the video containing the human hand region, that corresponds to the area calibrated by the calibration frame, and to normalize the hand image according to the depth information.
  • the depth information is obtained from the 3D depth camera.
  • the specific process of normalizing the hand image according to the depth information is as follows: the size of the hand image obtained by segmenting the area of the first standard calibration frame is recorded as the standard size S1, and the depth information corresponding to the area calibrated by the first calibration frame is recorded as the standard depth H1; the size of the hand image obtained by segmenting the area of the current standard calibration frame is recorded as S2, and the depth information corresponding to the area calibrated by the current calibration frame is recorded as H2.
  • the hand image obtained by the region division of the current calibration frame is normalized to S2*(H2/H1).
  • the size of the hand image is normalized so that the finally extracted HOG feature representations share a uniform measurement criterion, that is, have the same dimension, which improves the accuracy of hand tracking.
  • the fast hand tracking device 40 described in the present application provides two ways of calibrating the video containing the human hand region with a standard calibration frame, so that the frame calibrated by the user is a standard calibration frame; the hand region obtained by segmentation then has a standard shape, and hand tracking based on the segmented standard region performs better.
  • the fast hand tracking devices 30 and 40 described in the present application can be applied to the tracking of a single hand as well as to the tracking of multiple hands.
  • for multiple hands, parallel tracking is used, which is in essence a set of single-hand tracking processes and is not described in detail here. Any device that applies the idea of this application to hand tracking shall fall within the scope of this application.
  • FIG. 5 is a schematic diagram of a terminal according to Embodiment 5 of the present application.
  • the terminal 5 includes a memory 51, at least one processor 52, computer readable instructions 53 stored in the memory 51 and operable on the at least one processor 52, at least one communication bus 54, and an imaging device 55.
  • the at least one processor 52 implements the steps in the fast hand tracking method embodiment when the computer readable instructions 53 are executed, such as steps 101 to 104 shown in FIG. 1 or steps 201 to 205 shown in FIG. 2.
  • the at least one processor 52 implements the functions of the modules/units in the above-described apparatus embodiments when the computer readable instructions 53 are executed, such as the modules 301 to 304 in FIG. 3 or the modules 401 to 406 in FIG.
  • the computer readable instructions 53 may be partitioned into one or more modules/units, the one or more modules/units being stored in the memory 51 and executed by the at least one processor 52 to complete the present application.
  • the one or more modules/units may be a series of computer readable instruction segments capable of performing particular functions, the instruction segments being used to describe the execution of the computer readable instructions 53 in the terminal 5.
  • the computer readable instructions 53 may be divided into the display module 301, the calibration module 302, the segmentation module 303, and the tracking module 304 in FIG. 3, or into the display module 401, the calibration module 402, the pre-processing module 403, the segmentation module 404, the tracking module 405, and the normalization module 406 in FIG. 4.
  • the display module 401 includes a first display sub-module 4010 and a second display sub-module 4012.
  • the calibration module 402 includes a first calibration sub-module 4020, a second calibration sub-module 4022, and a third calibration sub-module 4024. For their specific functions, see Embodiments 1 and 2 and their corresponding descriptions.
  • the imaging device 55 includes a 2D camera, a 3D depth camera, etc., and the imaging device 55 may be mounted on the terminal 5 or may be separated from the terminal 5 as an independent component.
  • the terminal 5 can be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. It can be understood by those skilled in the art that FIG. 5 is only an example of the terminal 5 and does not constitute a limitation of the terminal 5; the terminal may include more or fewer components than illustrated, combine some components, or have different components.
  • the terminal 5 may further include an input/output device, a network access device, a bus, and the like.
  • the at least one processor 52 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, and the like.
  • the processor 52 may be a microprocessor or the processor 52 may be any conventional processor or the like.
  • the processor 52 is the control center of the terminal 5, and connects the various parts of the entire terminal 5 with various interfaces and lines.
  • the memory 51 can be used to store the computer readable instructions 53 and/or modules/units; the processor implements various functions of the terminal 5 by running or executing the computer readable instructions and/or modules/units stored in the memory 51 and calling the data stored in the memory 51.
  • the memory 51 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function (such as a sound playing function, an image playing function, etc.); and the storage data area may be Data (such as audio data, phone book, etc.) created according to the use of the terminal 5 is stored.
  • the memory 51 may include a high-speed random access memory, and may also include a non-volatile memory such as a hard disk, a memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one disk storage device, a flash device, or another non-volatile solid state storage device.
  • the modules/units integrated in the terminal 5 can be stored in a non-volatile readable storage medium if implemented in the form of software functional units and sold or used as stand-alone products. Based on this understanding, the present application implements all or part of the processes in the foregoing method embodiments, which may also be completed through computer readable instructions; the computer readable instructions may be stored in a non-volatile readable storage medium and, when executed by a processor, implement the steps of the various method embodiments described above. The computer readable instructions comprise computer readable instruction code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like.
  • the non-volatile readable medium may include: any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium.
  • the contents of the non-volatile readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, non-volatile readable media do not include electrical carrier signals and telecommunication signals.
  • the functional units in the various embodiments of the present application may be integrated in the same processing unit, or each unit may exist physically separately, or two or more units may be integrated in the same unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software function modules.
  • the term "comprising” does not exclude other elements or the singular does not exclude the plural.
  • a plurality of units or devices recited in the system claims can also be implemented by a unit or device by software or hardware.
  • the first, second, etc. words are used to denote names and do not denote any particular order.

Abstract

A fast hand tracking method comprises: displaying, on a display interface, a video containing a hand region acquired by an imaging apparatus; receiving a bounding box marked by a user on the video containing the hand region; extracting a histogram-of-oriented-gradient (HOG) feature of a region marked by the bounding box, and segmenting, according to the HOG feature, the region marked by the bounding box, so as to obtain a hand image; and tracking the hand image by means of a continuously adaptive mean shift operator. The present application further provides a fast hand tracking device, a terminal, and a storage medium. The present application enables fast extraction of a HOG feature in a bounding box marked by a user, and accurately performs segmentation to obtain a hand region according to the HOG feature, thereby achieving a better tracking result.

Description

Fast hand tracking method, device, terminal and storage medium

This application claims priority to Chinese patent application No. 201810349972.X, entitled "Fast hand tracking method, device, terminal and storage medium" and filed with the Chinese Patent Office on April 18, 2018, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of hand tracking technology, and in particular to a fast hand tracking method, device, terminal, and storage medium.

Background

As an important means of natural interaction, gestures have important research value and broad application prospects. The first and most important step in gesture recognition and hand tracking is to segment the hand region from the image. The quality of hand segmentation directly affects the subsequent gesture recognition and gesture tracking results.

In human-robot interaction, when the video capture device mounted on the robot is at some distance from the human body, the captured pictures contain the whole human body. Since such pictures have a large amount of background and the hand region is only a small part of the picture, how to detect the hand among a large background area and segment it quickly and accurately is a problem worth studying.
Summary

In view of the above, it is necessary to provide a fast hand tracking method, device, terminal, and storage medium that can shorten the time for extracting the hand region and improve the accuracy and efficiency of hand recognition and hand tracking, with particularly good tracking efficiency for hand region tracking against complex backgrounds.

A first aspect of the present application provides a fast hand tracking method, the method comprising:

displaying, on a display interface, a video containing a human hand region collected by an imaging device;

receiving a calibration frame calibrated by a user on the video containing the human hand region;

extracting a gradient direction histogram feature of the area calibrated by the calibration frame, and segmenting that area according to the gradient direction histogram feature to obtain a hand image; and

tracking the hand image with a continuously adaptive mean shift (CamShift) operator, wherein the tracking of the hand image with the CamShift operator specifically includes:
converting the color space of the hand image to the HSV color space and separating out the hand image of the hue component; based on the hand image I(i, j) of the hue component and the centroid position and size of the initialized search window, calculating the centroid position (M10/M00, M01/M00) of the current search window and the size s = 2*sqrt(M00/256) of the current search window,

where M10 = Σ_i Σ_j i·I(i, j) and M01 = Σ_i Σ_j j·I(i, j) are the first moments of the current search window, M00 = Σ_i Σ_j I(i, j) is the zeroth moment of the current search window, and i and j are the horizontal and vertical pixel coordinates in I(i, j).
A second aspect of the present application provides a fast hand tracking device, the device comprising:

a display module, configured to display, on a display interface, a video containing a human hand region collected by an imaging device; a calibration module, configured to receive a calibration frame calibrated by a user on the video containing the human hand region; a segmentation module, configured to extract a gradient direction histogram feature of the area calibrated by the calibration frame and segment that area according to the gradient direction histogram feature to obtain a hand image; and a tracking module, configured to track the hand image with a continuously adaptive mean shift (CamShift) operator.

A third aspect of the present application provides a terminal, the terminal comprising a processor and a memory, the processor implementing the fast hand tracking method when executing computer readable instructions stored in the memory.

A fourth aspect of the present application provides a non-volatile readable storage medium having stored thereon computer readable instructions that, when executed by a processor, implement the fast hand tracking method.

According to the fast hand tracking method, device, terminal, and storage medium described in the present application, the hand region is first roughly calibrated to obtain a calibration frame, the HOG feature is then extracted within the area calibrated by that frame, and the hand region is accurately segmented from that area according to the HOG feature. This reduces the area over which the HOG feature is extracted and effectively shortens the extraction time, so the hand region can be segmented and tracked quickly. In addition, acquiring the depth information of the video containing the hand further ensures the clarity of the hand contour; the gain in tracking efficiency is especially notable for hand region tracking against complex backgrounds.
DRAWINGS
FIG. 1 is a flowchart of a fast hand tracking method according to Embodiment 1 of the present application.
FIG. 2 is a flowchart of a fast hand tracking method according to Embodiment 2 of the present application.
FIG. 3 is a structural diagram of a fast hand tracking device according to Embodiment 3 of the present application.
FIG. 4 is a structural diagram of a fast hand tracking device according to Embodiment 4 of the present application.
FIG. 5 is a schematic diagram of a terminal according to Embodiment 5 of the present application.
DETAILED DESCRIPTION
The fast hand tracking method of the embodiments of the present application is applied in one or more terminals. The fast hand tracking method may also be applied in a hardware environment composed of a terminal and a server connected to the terminal through a network. The fast hand tracking method of the embodiments of the present application may be executed by a server, by a terminal, or jointly by a server and a terminal.
For a terminal that needs the fast hand tracking method, the fast hand tracking function provided by the method of the present application may be integrated directly on the terminal, or a client implementing the method of the present application may be installed on it. Alternatively, the method provided by the present application may run on a server or similar device in the form of a software development kit (SDK); an interface to the fast hand tracking function is provided in the form of the SDK, and a terminal or other device can implement hand tracking through the provided interface.
Embodiment 1
FIG. 1 is a flowchart of a fast hand tracking method according to Embodiment 1 of the present application.
101: Display, on a display interface, a video containing a human hand region captured by an imaging device.
In this embodiment, the terminal provides a display interface for synchronously displaying the video containing the human hand region captured by the imaging device. The imaging device is a 2D camera.
102: Receive a calibration frame marked by a user on the video containing the human hand region.
In this embodiment, when the user finds hand information of interest in the video containing the human hand region displayed on the display interface, the user marks the hand information of interest by adding a calibration frame on the display interface.
The user may touch the display interface with a finger, a stylus, or any other suitable object, preferably a finger, and add a calibration frame on the display interface.
103: Extract a gradient direction histogram feature of the region marked by the calibration frame, and segment the region marked by the calibration frame according to the gradient direction histogram feature to obtain a hand image.
The process of extracting the gradient direction histogram (Histogram of Oriented Gradients, HOG) feature of the region marked by the calibration frame specifically includes the following steps (a consolidated code sketch is given after the list):
11) computing the gradient information of each pixel in the region marked by the calibration frame, the gradient information including a gradient magnitude and a gradient direction;
First-order differential templates such as the one-dimensional centered template [1,0,-1], the one-dimensional non-centered template [-1,1], the one-dimensional cubic-corrected template [1,-8,0,8,-1], or the Sobel operator may be used to compute the horizontal and vertical gradients of each pixel in the region marked by the calibration frame; the gradient magnitude and gradient direction of the region marked by the calibration frame are then computed from the horizontal and vertical gradients.
In this preferred embodiment, the one-dimensional centered template [1,0,-1] is taken as an example for computing the gradient information of each pixel in the region marked by the calibration frame. Denoting the region marked by the calibration frame as I(x,y), the horizontal and vertical gradients of a pixel are computed as in formula (1-1):

$$G_h(x,y)=I(x+1,y)-I(x-1,y),\qquad G_v(x,y)=I(x,y+1)-I(x,y-1) \qquad (1\text{-}1)$$

where $G_h(x,y)$ and $G_v(x,y)$ denote the gradient values of the pixel (x,y) in the horizontal and vertical directions, respectively.
The gradient magnitude (also called the gradient strength) and the gradient direction of the pixel (x,y) are computed as in formula (1-2):

$$M(x,y)=\sqrt{G_h(x,y)^2+G_v(x,y)^2},\qquad \theta(x,y)=\arctan\frac{G_v(x,y)}{G_h(x,y)} \qquad (1\text{-}2)$$

where $M(x,y)$ and $\theta(x,y)$ denote the gradient magnitude and the gradient direction of the pixel (x,y), respectively.
Further, the range of the gradient direction is generally limited to an unsigned range, i.e., the sign of the direction angle is ignored; the unsigned gradient direction can be expressed as in formula (1-3):

$$\theta(x,y)=\begin{cases}\theta(x,y)+180^\circ, & \theta(x,y)<0^\circ\\ \theta(x,y), & \text{otherwise}\end{cases} \qquad (1\text{-}3)$$

After the computation of formula (1-3), the gradient direction of each pixel in the region marked by the calibration frame is limited to between 0 and 180 degrees.
12) dividing the region marked by the calibration frame into a plurality of blocks, each block being divided into a plurality of cell units, each cell unit including a plurality of pixels;
In this embodiment, the size of a cell unit is 8*8 pixels, and adjacent cell units do not overlap.
For example, assuming that the region I(x,y) marked by the calibration frame is 64*128 in size, the size of each block is set to 16*16 and the size of each cell unit to 8*8; the region marked by the calibration frame can then be divided into 105 blocks (the blocks slide with a stride of one cell, i.e., 8 pixels, giving 7*15 = 105 block positions), each block including 4 cell units and each cell unit including 64 pixels.
Dividing the cell units in a non-overlapping manner in this embodiment makes the computation of the gradient direction histogram in each block faster.
13) quantizing the gradient information of the pixels in each cell unit to obtain the gradient histogram of the region marked by the calibration frame;
In this embodiment, the gradient directions of the pixels of each cell unit are first divided into 9 bins (9 direction channels), which serve as the horizontal axis of the gradient histogram: [0°,20°], [20°,40°], [40°,60°], [60°,80°], [80°,100°], [100°,120°], [120°,140°], [140°,160°], and [160°,180°]; the gradient magnitudes of the pixels falling into each bin are then accumulated to form the vertical axis of the gradient histogram.
14) normalizing the gradient histogram of each block to obtain the normalized gradient histogram of each block;
In this preferred embodiment, a normalization function, which may be the L2 norm or the L1 norm, can be used to normalize the gradient histogram of each block.
Because of local illumination changes and changes in foreground/background contrast, the gradient magnitudes of pixels vary over a very wide range. Normalization compresses illumination, shadows, and edges, making the gradient direction histogram feature vector space robust to changes in illumination, shadow, and edges.
15) concatenating the normalized gradient histograms of all blocks to obtain the final HOG feature of the region marked by the calibration frame;
16) segmenting the hand region out of the region marked by the calibration frame according to the final HOG feature.
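As a consolidated illustration of steps 11) to 16), the following Python sketch computes a HOG descriptor using the [1,0,-1] template, 9 unsigned bins per 8*8 cell, and L2-normalized 16*16 blocks sliding one cell at a time. This is a minimal sketch, not the definitive implementation of the present application; the function name, the epsilon guard in the normalization, and the float conversion are illustrative choices.

```python
import numpy as np

def hog_features(region, cell=8, block=2, bins=9):
    """HOG sketch for a grayscale region marked by the calibration frame,
    following steps 11)-16)."""
    img = region.astype(np.float64)
    gh = np.zeros_like(img)
    gv = np.zeros_like(img)
    gh[:, 1:-1] = img[:, 2:] - img[:, :-2]          # formula (1-1), horizontal
    gv[1:-1, :] = img[2:, :] - img[:-2, :]          # formula (1-1), vertical
    mag = np.hypot(gh, gv)                          # formula (1-2), magnitude
    ang = np.degrees(np.arctan2(gv, gh)) % 180.0    # formula (1-3), unsigned direction
    rows, cols = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):                           # step 13): per-cell quantization
        for c in range(cols):
            m = mag[r*cell:(r+1)*cell, c*cell:(c+1)*cell].ravel()
            a = ang[r*cell:(r+1)*cell, c*cell:(c+1)*cell].ravel()
            idx = np.minimum((a // (180.0 / bins)).astype(int), bins - 1)
            for k in range(bins):
                hist[r, c, k] = m[idx == k].sum()   # accumulate magnitudes per bin
    feats = []                                      # steps 14)-15): block normalization
    for r in range(rows - block + 1):               # 16*16 blocks, one-cell stride
        for c in range(cols - block + 1):
            v = hist[r:r+block, c:c+block].ravel()
            feats.append(v / (np.linalg.norm(v) + 1e-6))  # L2 normalization
    return np.concatenate(feats)                    # final HOG feature vector
```

On a 64*128 region this yields 7*15 = 105 blocks of 4 cells * 9 bins = 3780 values, matching the block count in the example above.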
104: Track the hand image with the continuously adaptive mean shift operator.
In this embodiment, the Continuously Adaptive Mean Shift (CamShift) algorithm is a color-information-based method that tracks a target by its characteristic color, automatically adjusts the size and position of the search window, locates the size and center of the tracked target, and uses the result of the previous frame (i.e., the search window size and centroid) as the size and centroid of the target in the next frame of the image.
Tracking the hand image with the continuously adaptive mean shift operator specifically includes the following steps (a code sketch is given after the steps):
21) converting the color space of the hand image to the HSV (Hue, Saturation, Value) color space and separating out the hand image of the hue (H) component;
22) initializing the centroid position and the size S of the search window W based on the hand image of the hue H component;
23) computing the moments of the current search window;
The zeroth-order moment of the current search window is computed according to formula (1-4), and the first-order moments according to formula (1-5):

$$M_{00}=\sum_{i}\sum_{j} I(i,j) \qquad (1\text{-}4)$$

$$M_{10}=\sum_{i}\sum_{j} i\,I(i,j),\qquad M_{01}=\sum_{i}\sum_{j} j\,I(i,j) \qquad (1\text{-}5)$$
24) computing the centroid position $(M_{10}/M_{00},\ M_{01}/M_{00})$ of the current search window from the moments of the current search window;
25) computing the size of the current search window from the moments of the current search window:

$$s = 2\sqrt{M_{00}/256}$$
The currently computed search window is compared with a preset search-window threshold. When the currently computed search window is greater than or equal to the preset search-window threshold, steps 21) to 25) above are repeated; when the currently computed search window is smaller than the preset search-window threshold, tracking ends, and the position of the centroid of the search window at that point is the current position of the tracked target.
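The following Python sketch follows steps 21) to 25) under stated assumptions: the search-window size is taken as the conventional CamShift value $2\sqrt{M_{00}/256}$ for an 8-bit hue image (the size formula appears only as an image placeholder in the publication), the window is kept square, and the iteration cap, threshold, and function name are illustrative.

```python
import cv2
import numpy as np

def track_hand(hand_bgr, window, win_thresh=4.0, max_iter=10):
    """One tracking pass per steps 21)-25): HSV conversion, hue
    separation, then iterative moment-based window updates."""
    hsv = cv2.cvtColor(hand_bgr, cv2.COLOR_BGR2HSV)      # step 21)
    hue = hsv[:, :, 0].astype(np.float64)                # I(i, j): hue component
    x, y, w, h = window                                  # step 22): initialized window
    for _ in range(max_iter):
        roi = hue[y:y + h, x:x + w]
        if roi.size == 0:
            break
        jj, ii = np.mgrid[0:roi.shape[0], 0:roi.shape[1]]
        m00 = roi.sum()                                  # formula (1-4)
        if m00 == 0:
            break
        m10 = (ii * roi).sum()                           # formula (1-5)
        m01 = (jj * roi).sum()
        cx, cy = m10 / m00, m01 / m00                    # step 24): centroid
        s = 2.0 * np.sqrt(m00 / 256.0)                   # step 25): size (assumed formula)
        x = int(np.clip(x + cx - s / 2, 0, hue.shape[1] - 1))   # recenter on centroid
        y = int(np.clip(y + cy - s / 2, 0, hue.shape[0] - 1))
        w = h = max(int(s), 1)
        if s < win_thresh:                               # window below preset threshold:
            break                                        # end tracking
    return (x, y, w, h)   # the centroid of this window is the target's current position
```

For a production tracker, OpenCV's built-in cv2.CamShift(probImage, window, criteria) performs an equivalent iteration on a back-projected probability image.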
In summary, with the fast hand tracking method described in the present application, the user marks the hand information of interest in the video containing the human hand region with a calibration frame, the HOG feature of the region marked by the calibration frame is then extracted, and the hand region is segmented out of that region according to the HOG feature. Only the HOG feature in the region marked by the calibration frame therefore needs to be computed; compared with computing it over the entire video image containing the human hand region, receiving the user-marked calibration frame reduces the area over which HOG features are extracted and effectively shortens the extraction time, so the hand region can be quickly segmented out of the video containing the human hand region.
In addition, because the gradient information of the pixels in the region marked by the calibration frame is processed with the cell unit as the processing unit, the computed HOG feature preserves the geometric and optical characteristics of the hand region. The block-and-cell computation scheme allows the relationships among the pixels of the hand region to be well characterized. Finally, the normalization step partially offsets the effects of illumination changes, which guarantees the sharpness of the extracted hand region and allows the hand region to be segmented accurately.
Embodiment 2
FIG. 2 is a flowchart of a fast hand tracking method according to Embodiment 2 of the present application.
201: Display, on a display interface, a video containing a human hand region captured by an imaging device, and at the same time display a preset standard calibration frame in a preset display manner.
In this embodiment, the terminal provides a display interface for synchronously displaying the video containing the human hand region captured by the imaging device; the display interface also displays a standard calibration frame at the same time.
The imaging device is a 3D depth camera. A 3D depth camera differs from a 2D camera in that it can simultaneously capture grayscale image information of a scene and 3-dimensional information that includes depth information. After the video containing the human hand region is captured by the 3D depth camera, the video containing the human hand region is synchronously displayed on the display interface of the terminal.
In this embodiment, the preset standard calibration frame is provided for the user to mark the displayed video containing the human hand region so as to obtain the hand information of interest.
The preset display manner includes one or a combination of the following:
1) displaying the preset standard calibration frame when a display instruction is received;
The display instruction corresponds to a display operation input by the user. The display operation input by the user includes, but is not limited to: clicking any position on the display interface, touching any position on the display interface for longer than a first preset time period (e.g., 1 second), or issuing a first preset voice command (e.g., "calibration frame").
When it is detected that the user has performed a click operation on the display interface, that a touch operation performed by the user on the display interface has lasted longer than the preset time, or that the user has issued the first preset voice command, the terminal determines that a display instruction has been received and displays the preset standard calibration frame.
2) hiding the preset standard calibration frame when a hide instruction is received;
The hide instruction corresponds to a hide operation input by the user. The hide operation input by the user includes, but is not limited to: clicking any position on the display interface, touching any position on the display interface for longer than a second preset time period (e.g., 2 seconds), or issuing a second preset voice command (e.g., "exit").
When it is detected that the user has performed a click operation on the display interface, that a touch operation performed by the user on the display interface has lasted longer than the second preset time period, or that the user has issued the second preset voice command, the terminal determines that a hide instruction has been received and hides the preset standard calibration frame.
The hide instruction may be the same as or different from the display instruction, and the first preset time period may be the same as or different from the second preset time period. Preferably, the first preset time period is shorter than the second preset time period: a shorter first preset time period allows the preset standard calibration frame to be displayed quickly, while a longer second preset time period prevents the preset standard calibration frame from being hidden because of an unconscious action or operating mistake by the user.
Displaying the preset standard calibration frame when a display instruction is received allows the user to mark the hand region of interest while the display interface is showing the video containing the human hand region. At the same time, not displaying the preset standard calibration frame when no display instruction has been received, or hiding it when a hide instruction is received, prevents the displayed video containing the human hand region from being blocked by the preset standard calibration frame for a long time, which would cause important information to be missed or cause visual discomfort to the user when viewing the video containing the human hand region.
3) automatically hiding the preset standard calibration frame when, after the preset standard calibration frame has been displayed in response to the display instruction, no further instruction is received for longer than a third preset time period.
After the preset standard calibration frame is displayed, automatically hiding it when the user no longer inputs any operation for longer than the third preset time period prevents the frame from being displayed for a long time after the user has unconsciously triggered a display instruction; automatically hiding the preset standard calibration frame also helps improve the user's interactive experience. (A sketch of this show/hide logic is given below.)
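The show/hide rules above amount to simple threshold logic. The following Python class is a minimal sketch, assuming touch-duration thresholds of 1 s, 2 s, and 5 s for the first, second, and third preset time periods; the class name, method names, and values are all illustrative, not part of the present application, and the voice-command and click paths are omitted.

```python
import time

class CalibrationFrameUI:
    """Show/hide logic for the preset standard calibration frame:
    a touch of at least t1 shows it, a touch of at least t2 (> t1)
    hides it, and it auto-hides after t3 seconds without input."""

    def __init__(self, t1=1.0, t2=2.0, t3=5.0):
        self.t1, self.t2, self.t3 = t1, t2, t3
        self.visible = False
        self.last_input = time.monotonic()

    def on_touch(self, duration):
        self.last_input = time.monotonic()
        if duration >= self.t2:
            self.visible = False       # hide instruction (second preset period)
        elif duration >= self.t1:
            self.visible = True        # display instruction (first preset period)

    def on_idle_check(self):
        # auto-hide after the third preset period without any instruction
        if self.visible and time.monotonic() - self.last_input > self.t3:
            self.visible = False
```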
In this embodiment, the preset standard calibration frame may be a circle, an ellipse, a rectangle, a square, or the like.
202: Receive a standard calibration frame marked by the user on the video containing the human hand region.
In this embodiment, when the user finds hand information of interest in the video containing the human hand region displayed on the display interface, the user marks the hand information of interest by adding a standard calibration frame on the display interface.
In this embodiment, receiving the standard calibration frame marked by the user on the video containing the human hand region covers the following two cases:
First case: receiving a rough calibration frame drawn by the user in the video containing the human hand region; matching, by fuzzy matching, the preset standard calibration frame corresponding to the rough calibration frame; and marking the video containing the human hand region according to the matched standard calibration frame and displaying the marked standard calibration frame, where the geometric center of the rough calibration frame is the same as the geometric center of the matched standard calibration frame.
In this embodiment, the shape of a calibration frame drawn by the user with a finger on the display interface is neither regular nor standard; for example, a circle drawn by the user is not very precise. After the terminal receives the shape of the approximate rough calibration frame drawn by the user, it therefore matches the shape of the corresponding preset standard calibration frame according to the approximate shape of the rough calibration frame. Matching the corresponding standard calibration frame by fuzzy matching makes it easier to subsequently crop the region marked by the frame. (A hypothetical matching heuristic is sketched below.)
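The publication names fuzzy matching but does not fix a metric. The following Python sketch is one hypothetical heuristic: it classifies the drawn stroke by how evenly its points are spread around the center of its bounding box; the function name and every threshold here are assumptions.

```python
import numpy as np

def match_standard_frame(stroke_pts):
    """Hypothetical fuzzy match of a hand-drawn stroke to a preset
    standard frame shape (circle / ellipse / square / rectangle)."""
    pts = np.asarray(stroke_pts, dtype=np.float64)
    (x0, y0), (x1, y1) = pts.min(axis=0), pts.max(axis=0)
    w, h = max(x1 - x0, 1e-6), max(y1 - y0, 1e-6)
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0     # geometric center is preserved
    # Normalized distance of stroke points from the center: nearly
    # constant for circles/ellipses, spread out for rectangles.
    r = np.hypot((pts[:, 0] - cx) / w, (pts[:, 1] - cy) / h)
    roundness = r.std() / (r.mean() + 1e-6)
    if roundness < 0.15:                           # assumed threshold
        shape = "circle" if abs(w - h) / max(w, h) < 0.2 else "ellipse"
    else:
        shape = "square" if abs(w - h) / max(w, h) < 0.2 else "rectangle"
    return shape, (cx, cy), (w, h)
```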
Second case: directly receiving a standard calibration frame selected by the user, and marking the video containing the human hand region according to the standard calibration frame and displaying the marked standard calibration frame.
In this embodiment, a display operation input by the user triggers a display instruction, so that a plurality of preset standard calibration frames are displayed. The user touches a standard calibration frame; after detecting the touch signal on the standard calibration frame, the terminal determines that this standard calibration frame has been selected. The user then moves the selected standard calibration frame and drags it onto the video containing the human hand region, and the terminal displays the dragged standard calibration frame on the video containing the human hand region.
Preferably, step 202 may further include: enlarging, shrinking, moving, or deleting the displayed standard calibration frame when an enlarge, shrink, move, or delete instruction is received.
203: Preprocess the region marked by the standard calibration frame.
In this embodiment, the preprocessing may include one or a combination of the following: grayscale processing and correction processing.
Grayscale processing refers to converting the image of the region marked by the standard calibration frame into a grayscale image. Because color information has little influence on extracting the gradient direction histogram feature, converting the image of the region marked by the standard calibration frame into grayscale does not affect the subsequent computation of the gradient information of each pixel in that region, and it reduces the amount of computation required for the gradient information of each pixel.
The correction processing may use gamma correction. Because the local surface-exposure contribution carries a large weight in the texture intensity of an image, a gamma-corrected image can effectively reduce local shadow and illumination changes. (A preprocessing sketch is given below.)
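A minimal sketch of this preprocessing step, assuming OpenCV is available; the gamma value of 0.5 and the function name are illustrative, since the publication does not fix them.

```python
import cv2
import numpy as np

def preprocess_region(region_bgr, gamma=0.5):
    """Step 203: grayscale conversion followed by gamma correction
    of the region marked by the standard calibration frame."""
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)
    # Per-pixel gamma lookup table: out = (in / 255) ** gamma * 255
    table = (((np.arange(256) / 255.0) ** gamma) * 255).astype(np.uint8)
    return cv2.LUT(gray, table)
```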
204: Extract the gradient direction histogram feature of the preprocessed region marked by the standard calibration frame, and segment the region marked by the standard calibration frame according to the gradient direction histogram feature to obtain a hand image.
Step 204 of this embodiment is the same as step 103 of Embodiment 1 and is not described again in detail here.
205: Track the hand image with the continuously adaptive mean shift operator.
Step 205 of this embodiment is the same as step 104 of Embodiment 1 and is not described again in detail here.
Further, to make full use of the depth information, after step 204 and before step 205, the method further includes: acquiring the depth information in the video containing the human hand region that corresponds to the region marked by the calibration frame, and normalizing the hand image according to the depth information.
The depth information is acquired from the 3D depth camera. The specific process of normalizing the hand image according to the depth information is as follows: the size of the hand image obtained by segmenting the region marked by the standard calibration frame for the first time is recorded as a standard size S1, and the depth-of-field information corresponding to that first marked region is recorded as a standard depth of field H1; the size of the hand image obtained by segmenting the region marked by the current standard calibration frame is recorded as S2, and the depth-of-field information corresponding to the currently marked region is recorded as H2; the hand image obtained by segmenting the currently marked region is then normalized to S2*(H2/H1).
The size of the hand image is normalized so that the finally extracted HOG feature representations share a uniform criterion, i.e., the same dimension, which improves the accuracy of hand tracking. (A sketch follows.)
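A minimal sketch of this depth-based normalization, assuming OpenCV for resizing; the reading that S2*(H2/H1) means rescaling the current hand image by the depth ratio H2/H1 follows the text above, and the function and parameter names are illustrative.

```python
import cv2

def normalize_hand_image(hand_img, depth_cur, depth_ref):
    """Rescale the current hand image (size S2) by the depth ratio
    H2/H1 so hands at different depths share one reference scale."""
    scale = depth_cur / depth_ref                  # H2 / H1
    new_w = max(1, int(round(hand_img.shape[1] * scale)))
    new_h = max(1, int(round(hand_img.shape[0] * scale)))
    return cv2.resize(hand_img, (new_w, new_h))    # new size = S2 * (H2 / H1)
```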
In summary, the fast hand tracking method described in the present application provides two kinds of standard calibration frame marking for the video containing the human hand region. This ensures that the calibration frame marked by the user is a standard calibration frame, so that the shape of the segmented hand region is standard, and hand tracking based on this standard calibration frame works better.
It should be noted that the fast hand tracking method described in the present application can be applied to tracking a single hand as well as to tracking multiple hands. Multiple hands are tracked in parallel, which in essence amounts to multiple single-hand tracking processes and is not described in detail here; any method that uses the idea of the present application for hand tracking falls within the scope of the present application.
In the above description of the steps in FIG. 1 and FIG. 2, the order of execution in the flowcharts of FIG. 1 and FIG. 2 may be changed and some steps may be omitted according to different requirements.
The functional modules and hardware structure of a terminal implementing the above fast hand tracking method are described below with reference to FIG. 3 to FIG. 5.
Embodiment 3
FIG. 3 is a functional module diagram of a preferred embodiment of the fast hand tracking device of the present application.
In some embodiments, the fast hand tracking device 30 runs in a terminal. The fast hand tracking device 30 may include a plurality of functional modules composed of program code segments. The program code of each program segment in the fast hand tracking device 30 may be stored in a memory and executed by at least one processor to perform tracking of the hand region (see FIG. 1 and the related description for details).
In this embodiment, the fast hand tracking device 30 of the terminal may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a display module 301, a calibration module 302, a segmentation module 303, and a tracking module 304.
The display module 301 is configured to display, on a display interface, a video containing a human hand region captured by an imaging device.
In this embodiment, the terminal provides a display interface for synchronously displaying the video containing the human hand region captured by the imaging device. The imaging device is a 2D camera.
The calibration module 302 is configured to receive a calibration frame marked by a user on the video containing the human hand region.
In this embodiment, when the user finds hand information of interest in the video containing the human hand region displayed on the display interface, the user marks the hand information of interest by adding a calibration frame on the display interface.
The user may touch the display interface with a finger, a stylus, or any other suitable object, preferably a finger, and add a calibration frame on the display interface.
The segmentation module 303 is configured to extract a gradient direction histogram feature of the region marked by the calibration frame and to segment the region marked by the calibration frame according to the gradient direction histogram feature to obtain a hand image.
The extraction, by the segmentation module 303, of the gradient direction histogram (Histogram of Oriented Gradients, HOG) feature of the region marked by the calibration frame specifically includes:
11) computing the gradient information of each pixel in the region marked by the calibration frame, the gradient information including a gradient magnitude and a gradient direction;
First-order differential templates such as the one-dimensional centered template [1,0,-1], the one-dimensional non-centered template [-1,1], the one-dimensional cubic-corrected template [1,-8,0,8,-1], or the Sobel operator may be used to compute the horizontal and vertical gradients of each pixel in the region marked by the calibration frame; the gradient magnitude and gradient direction of the region marked by the calibration frame are then computed from the horizontal and vertical gradients.
In this preferred embodiment, the one-dimensional centered template [1,0,-1] is taken as an example for computing the gradient information of each pixel in the region marked by the calibration frame. Denoting the region marked by the calibration frame as I(x,y), the horizontal and vertical gradients of a pixel are computed as in formula (1-1):

$$G_h(x,y)=I(x+1,y)-I(x-1,y),\qquad G_v(x,y)=I(x,y+1)-I(x,y-1) \qquad (1\text{-}1)$$

where $G_h(x,y)$ and $G_v(x,y)$ denote the gradient values of the pixel (x,y) in the horizontal and vertical directions, respectively.
The gradient magnitude (also called the gradient strength) and the gradient direction of the pixel (x,y) are computed as in formula (1-2):

$$M(x,y)=\sqrt{G_h(x,y)^2+G_v(x,y)^2},\qquad \theta(x,y)=\arctan\frac{G_v(x,y)}{G_h(x,y)} \qquad (1\text{-}2)$$

where $M(x,y)$ and $\theta(x,y)$ denote the gradient magnitude and the gradient direction of the pixel (x,y), respectively.
Further, the range of the gradient direction is generally limited to an unsigned range, i.e., the sign of the direction angle is ignored; the unsigned gradient direction can be expressed as in formula (1-3):

$$\theta(x,y)=\begin{cases}\theta(x,y)+180^\circ, & \theta(x,y)<0^\circ\\ \theta(x,y), & \text{otherwise}\end{cases} \qquad (1\text{-}3)$$

After the computation of formula (1-3), the gradient direction of each pixel in the region marked by the calibration frame is limited to between 0 and 180 degrees.
12) dividing the region marked by the calibration frame into a plurality of blocks, each block being divided into a plurality of cell units, each cell unit including a plurality of pixels;
In this embodiment, the size of a cell unit is 8*8 pixels, and adjacent cell units do not overlap.
For example, assuming that the region I(x,y) marked by the calibration frame is 64*128 in size, the size of each block is set to 16*16 and the size of each cell unit to 8*8; the region marked by the calibration frame can then be divided into 105 blocks (the blocks slide with a stride of one cell, i.e., 8 pixels, giving 7*15 = 105 block positions), each block including 4 cell units and each cell unit including 64 pixels.
Dividing the cell units in a non-overlapping manner in this embodiment makes the computation of the gradient direction histogram in each block faster.
13) quantizing the gradient information of the pixels in each cell unit to obtain the gradient histogram of the region marked by the calibration frame;
In this embodiment, the gradient directions of the pixels of each cell unit are first divided into 9 bins (9 direction channels), which serve as the horizontal axis of the gradient histogram: [0°,20°], [20°,40°], [40°,60°], [60°,80°], [80°,100°], [100°,120°], [120°,140°], [140°,160°], and [160°,180°]; the gradient magnitudes of the pixels falling into each bin are then accumulated to form the vertical axis of the gradient histogram.
14) normalizing the gradient histogram of each block to obtain the normalized gradient histogram of each block;
In this preferred embodiment, a normalization function, which may be the L2 norm or the L1 norm, can be used to normalize the gradient histogram of each block.
Because of local illumination changes and changes in foreground/background contrast, the gradient magnitudes of pixels vary over a very wide range. Normalization compresses illumination, shadows, and edges, making the gradient direction histogram feature vector space robust to changes in illumination, shadow, and edges.
15) concatenating the normalized gradient histograms of all blocks to obtain the final HOG feature of the region marked by the calibration frame;
16) segmenting the hand region out of the region marked by the calibration frame according to the final HOG feature.
The tracking module 304 is configured to track the hand image with a continuously adaptive mean shift operator.
In this embodiment, the Continuously Adaptive Mean Shift (CamShift) algorithm is a color-information-based method that tracks a target by its characteristic color, automatically adjusts the size and position of the search window, locates the size and center of the tracked target, and uses the result of the previous frame (i.e., the search window size and centroid) as the size and centroid of the target in the next frame of the image.
Tracking the hand image with the continuously adaptive mean shift operator specifically includes:
21) converting the color space of the hand image to the HSV (Hue, Saturation, Value) color space and separating out the hand image of the hue (H) component;
22) initializing the centroid position and the size S of the search window W based on the hand image of the hue H component;
23) computing the moments of the current search window;
The zeroth-order moment of the current search window is computed according to formula (1-4), and the first-order moments according to formula (1-5):

$$M_{00}=\sum_{i}\sum_{j} I(i,j) \qquad (1\text{-}4)$$

$$M_{10}=\sum_{i}\sum_{j} i\,I(i,j),\qquad M_{01}=\sum_{i}\sum_{j} j\,I(i,j) \qquad (1\text{-}5)$$
24) computing the centroid position $(M_{10}/M_{00},\ M_{01}/M_{00})$ of the current search window from the moments of the current search window;
25) computing the size of the current search window from the moments of the current search window:

$$s = 2\sqrt{M_{00}/256}$$
The currently computed search window is compared with a preset search-window threshold. When the currently computed search window is greater than or equal to the preset search-window threshold, steps 21) to 25) above are repeated; when the currently computed search window is smaller than the preset search-window threshold, tracking ends, and the position of the centroid of the search window at that point is the current position of the tracked target.
In summary, with the fast hand tracking device 30 described in the present application, the user marks the hand information of interest in the video containing the human hand region with a calibration frame, the HOG feature of the region marked by the calibration frame is then extracted, and the hand region is segmented out of that region according to the HOG feature. Only the HOG feature in the region marked by the calibration frame therefore needs to be computed; compared with computing it over the entire video image containing the human hand region, receiving the user-marked calibration frame reduces the area over which HOG features are extracted and effectively shortens the extraction time, so the hand region can be quickly segmented out of the video containing the human hand region.
In addition, because the gradient information of the pixels in the region marked by the calibration frame is processed with the cell unit as the processing unit, the computed HOG feature preserves the geometric and optical characteristics of the hand region. The block-and-cell computation scheme allows the relationships among the pixels of the hand region to be well characterized. Finally, the normalization step partially offsets the effects of illumination changes, which guarantees the sharpness of the extracted hand region and allows the hand region to be segmented accurately.
Embodiment 4
FIG. 4 is a functional module diagram of a preferred embodiment of the fast hand tracking device of the present application.
In some embodiments, the fast hand tracking device 40 runs in a terminal. The fast hand tracking device 40 may include a plurality of functional modules composed of program code segments. The program code of each program segment in the fast hand tracking device 40 may be stored in a memory and executed by at least one processor to perform tracking of the hand region (see FIG. 2 and the related description for details).
In this embodiment, the fast hand tracking device of the terminal may be divided into a plurality of functional modules according to the functions it performs. The functional modules may include: a display module 401, a calibration module 402, a preprocessing module 403, a segmentation module 404, a tracking module 405, and a normalization module 406.
The display module 401 includes a first display submodule 4010 and a second display submodule 4012. The first display submodule 4010 is configured to display, on a display interface, a video containing a human hand region captured by an imaging device, and the second display submodule 4012 is configured to display a preset standard calibration frame in a preset display manner.
In this embodiment, the terminal provides a display interface for synchronously displaying the video containing the human hand region captured by the imaging device; the display interface also displays a standard calibration frame at the same time.
The imaging device is a 3D depth camera. A 3D depth camera differs from a 2D camera in that it can simultaneously capture grayscale image information of a scene and 3-dimensional information that includes depth information. After the video containing the human hand region is captured by the 3D depth camera, the video containing the human hand region is synchronously displayed on the display interface of the terminal.
In this embodiment, the preset standard calibration frame is provided for the user to mark the displayed video containing the human hand region so as to obtain the hand information of interest.
The preset display manner includes one or a combination of the following:
1) displaying the preset standard calibration frame when a display instruction is received;
The display instruction corresponds to a display operation input by the user. The display operation input by the user includes, but is not limited to: clicking any position on the display interface, touching any position on the display interface for longer than a first preset time period (e.g., 1 second), or issuing a first preset voice command (e.g., "calibration frame").
When it is detected that the user has performed a click operation on the display interface, that a touch operation performed by the user on the display interface has lasted longer than the preset time, or that the user has issued the first preset voice command, the terminal determines that a display instruction has been received and displays the preset standard calibration frame.
2) hiding the preset standard calibration frame when a hide instruction is received;
The hide instruction corresponds to a hide operation input by the user. The hide operation input by the user includes, but is not limited to: clicking any position on the display interface, touching any position on the display interface for longer than a second preset time period (e.g., 2 seconds), or issuing a second preset voice command (e.g., "exit").
When it is detected that the user has performed a click operation on the display interface, that a touch operation performed by the user on the display interface has lasted longer than the second preset time period, or that the user has issued the second preset voice command, the terminal determines that a hide instruction has been received and hides the preset standard calibration frame.
The hide instruction may be the same as or different from the display instruction, and the first preset time period may be the same as or different from the second preset time period. Preferably, the first preset time period is shorter than the second preset time period: a shorter first preset time period allows the preset standard calibration frame to be displayed quickly, while a longer second preset time period prevents the preset standard calibration frame from being hidden because of an unconscious action or operating mistake by the user.
Displaying the preset standard calibration frame when a display instruction is received allows the user to mark the hand region of interest while the display interface is showing the video containing the human hand region. At the same time, not displaying the preset standard calibration frame when no display instruction has been received, or hiding it when a hide instruction is received, prevents the displayed video containing the human hand region from being blocked by the preset standard calibration frame for a long time, which would cause important information to be missed or cause visual discomfort to the user when viewing the video containing the human hand region.
3) automatically hiding the preset standard calibration frame when, after the preset standard calibration frame has been displayed in response to the display instruction, no further instruction is received for longer than a third preset time period.
After the preset standard calibration frame is displayed, automatically hiding it when the user no longer inputs any operation for longer than the third preset time period prevents the frame from being displayed for a long time after the user has unconsciously triggered a display instruction; automatically hiding the preset standard calibration frame also helps improve the user's interactive experience.
In this embodiment, the preset standard calibration frame may be a circle, an ellipse, a rectangle, a square, or the like.
The calibration module 402 is configured to receive a standard calibration frame marked by the user on the video containing the human hand region.
In this embodiment, when the user finds hand information of interest in the video containing the human hand region displayed on the display interface, the user marks the hand information of interest by adding a standard calibration frame on the display interface.
In this embodiment, the calibration module 402 further includes a first calibration submodule 4020, a second calibration submodule 4022, and a third calibration submodule 4024.
The first calibration submodule 4020 is configured to receive a rough calibration frame drawn by the user in the video containing the human hand region; match, by fuzzy matching, the preset standard calibration frame corresponding to the rough calibration frame; and mark the video containing the human hand region according to the matched standard calibration frame and display the marked standard calibration frame, where the geometric center of the rough calibration frame is the same as the geometric center of the matched standard calibration frame.
In this embodiment, the shape of a calibration frame drawn by the user with a finger on the display interface is neither regular nor standard; for example, a circle drawn by the user is not very precise. After the terminal receives the shape of the approximate rough calibration frame drawn by the user, it therefore matches the shape of the corresponding preset standard calibration frame according to the approximate shape of the rough calibration frame. Matching the corresponding standard calibration frame by fuzzy matching makes it easier to subsequently crop the region marked by the frame.
The second calibration submodule 4022 is configured to directly receive a standard calibration frame selected by the user, and to mark the video containing the human hand region according to the standard calibration frame and display the marked standard calibration frame.
In this embodiment, a display operation input by the user triggers a display instruction, so that a plurality of preset standard calibration frames are displayed. The user touches a standard calibration frame; after detecting the touch signal on the standard calibration frame, the terminal determines that this standard calibration frame has been selected. The user then moves the selected standard calibration frame and drags it onto the video containing the human hand region, and the terminal displays the dragged standard calibration frame on the video containing the human hand region.
The third calibration submodule 4024 is configured to enlarge, shrink, move, or delete the displayed standard calibration frame when an enlarge, shrink, move, or delete instruction is received.
预处理模块403,用于对所述标准标定框标定的区域进行预处理。The pre-processing module 403 is configured to pre-process the area of the standard calibration frame.
本实施例中,所述预处理可以包括以下一种或多种的组合:灰度化处理,校正处理。In this embodiment, the pre-processing may include a combination of one or more of the following: grayscale processing, correction processing.
所述灰度化处理是指将所述标准标定框标定的区域图像转化为灰度图像,因为色彩信息对提取梯度方向直方图特征影响不大,因而所述标准标定框标定的区域图像转化为灰度图像,既不会影响后续计算所述标准标定框标定的区域的各个像素点的梯度信息,还可以减少各个像素点的梯度信息的计算量。The grayscale processing refers to converting the image of the area calibrated by the standard calibration frame into a grayscale image, because the color information has little effect on the extraction gradient direction histogram feature, and thus the image of the calibration of the standard calibration frame is converted into The grayscale image does not affect the gradient information of each pixel of the region where the standard calibration frame is subsequently calculated, and the calculation amount of the gradient information of each pixel is also reduced.
The correction processing may use gamma correction. In an image's texture intensity, local surface exposure accounts for a large share, so a gamma-corrected image effectively reduces local shadows and illumination changes.
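For illustration, both preprocessing steps fit in a few lines of Python with OpenCV; the gamma value of 0.5 is an assumed parameter, since the application does not fix one:

```python
import cv2
import numpy as np

def preprocess_region(region_bgr, gamma=0.5):
    """Grayscale then gamma-correct the calibrated region before HOG extraction."""
    gray = cv2.cvtColor(region_bgr, cv2.COLOR_BGR2GRAY)
    # Gamma correction via a lookup table: out = 255 * (in / 255) ** gamma.
    table = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    return cv2.LUT(gray, table)
```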
The segmentation module 404 is configured to extract the histogram of oriented gradients (HOG) features of the preprocessed region calibrated by the standard calibration frame, and to segment that region according to the HOG features to obtain a hand image.
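A minimal sketch of the HOG extraction step using OpenCV's HOGDescriptor; the 64x128 window and the cell/block sizes are OpenCV defaults rather than values fixed by the application, and the rule that turns HOG features into a hand segmentation is not specified here:

```python
import cv2

def extract_hog(region_gray):
    """Compute HOG features of the preprocessed calibrated region."""
    resized = cv2.resize(region_gray, (64, 128))  # match the default window
    hog = cv2.HOGDescriptor()  # 8x8 cells, 16x16 blocks, 9 orientation bins
    return hog.compute(resized)  # 3780-dimensional feature vector
```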
The tracking module 405 is configured to track the hand image using a continuously adaptive mean-shift (CamShift) operator.
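The continuously adaptive mean-shift operator is available in OpenCV as cv2.CamShift. A sketch of the tracking loop under assumed seeding: a hue histogram is built from the segmented hand's bounding box and back-projected frame by frame, mirroring the hue-component scheme described above:

```python
import cv2

def track_hand(video, hand_bbox):
    """Track one hand with CamShift, seeded by the segmented hand's bounding box."""
    ok, frame = video.read()
    x, y, w, h = hand_bbox
    hsv_roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    # Histogram of the hue component; a fuller version would mask out
    # low-saturation pixels before building the histogram.
    hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    track_window = (x, y, w, h)
    while True:
        ok, frame = video.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        # CamShift recomputes the window centroid and size from image moments.
        rot_rect, track_window = cv2.CamShift(back, track_window, term)
        yield rot_rect
```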
Further, the fast hand tracking device 40 also includes a normalization module 406, configured to acquire the depth information, in the video containing the human hand region, that corresponds to the region calibrated by the calibration frame, and to normalize the hand image according to that depth information.
The depth information is acquired from the 3D depth camera. The normalization of the hand image according to the depth information proceeds as follows: the size of the hand image segmented from the region calibrated by the first standard calibration frame is recorded as the standard size S1, and the depth-of-field information corresponding to that first calibrated region as the standard depth H1; the size of the hand image segmented from the region calibrated by the current standard calibration frame is recorded as S2, and the depth-of-field information corresponding to the currently calibrated region as H2. The hand image segmented from the currently calibrated region is then normalized to the size S2*(H2/H1).
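This rule transcribes directly to code; a sketch, assuming H1 and H2 are depth values in the same units:

```python
import cv2

def normalize_hand_size(hand_img, h2_current, h1_standard):
    """Rescale the current hand image by H2/H1, giving the normalized size
    S2 * (H2 / H1) so hands at different depths share a common scale."""
    scale = h2_current / h1_standard
    return cv2.resize(hand_img, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_LINEAR)
```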
The size of the hand image is normalized so that the finally extracted HOG feature representations share a uniform yardstick, i.e., the same scale, which improves the accuracy of hand tracking.
In summary, the fast hand tracking device 40 described in this application provides two ways of placing a standard calibration frame on the video containing the human hand region, ensuring that the frame calibrated by the user is a standard calibration frame. The shape of the segmented hand region is therefore standard, and hand tracking based on that standard calibrated frame performs better.
It should be noted that the fast hand tracking devices 30 and 40 described in this application are applicable to tracking a single hand as well as multiple hands. Multiple hands are tracked in parallel, which is in essence several single-hand tracking processes and is not described in detail here; any device that performs hand tracking using the idea of this application falls within the scope of this application.
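Since multi-hand tracking reduces to several single-hand trackers run side by side, a minimal serial stand-in can reuse the hypothetical track_hand generator sketched above (a real implementation would share frames instead of opening the video once per hand):

```python
import cv2

def track_hands(video_path, hand_bboxes):
    """Drive one CamShift tracker per hand in lockstep."""
    trackers = [track_hand(cv2.VideoCapture(video_path), bbox)
                for bbox in hand_bboxes]
    while True:
        results = [next(t, None) for t in trackers]  # one rotated rect per hand
        if all(r is None for r in results):
            break
        yield results
```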
Embodiment 5
FIG. 5 is a schematic diagram of a terminal according to Embodiment 5 of the present application.
The terminal 5 includes: a memory 51, at least one processor 52, computer-readable instructions 53 stored in the memory 51 and executable on the at least one processor 52, at least one communication bus 54, and an imaging device 55.
When the at least one processor 52 executes the computer-readable instructions 53, the steps of the fast hand tracking method embodiments above are implemented, for example steps 101 to 104 shown in FIG. 1 or steps 201 to 205 shown in FIG. 2. Alternatively, when the at least one processor 52 executes the computer-readable instructions 53, the functions of the modules/units in the device embodiments above are implemented, for example modules 301 to 304 in FIG. 3 or modules 401 to 406 in FIG. 4.
Illustratively, the computer-readable instructions 53 may be divided into one or more modules/units, which are stored in the memory 51 and executed by the at least one processor 52 to complete this application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing particular functions, the instruction segments describing the execution of the computer-readable instructions 53 in the terminal 5. For example, the computer-readable instructions 53 may be divided into the display module 301, the calibration module 302, the segmentation module 303, and the tracking module 304 in FIG. 3, or into the display module 401, the calibration module 402, the preprocessing module 403, the segmentation module 404, the tracking module 405, and the normalization module 406 in FIG. 4. The display module 401 includes a first display submodule 4010 and a second display submodule 4012; the calibration module 402 includes a first calibration submodule 4020, a second calibration submodule 4022, and a third calibration submodule 4024. For the specific functions of each module, see Embodiments 1 and 2 and their corresponding descriptions.
The imaging device 55 includes a 2D camera, a 3D depth camera, or the like. The imaging device 55 may be mounted on the terminal 5 or may exist separately from the terminal 5 as an independent component.
The terminal 5 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. Those skilled in the art will understand that FIG. 5 is merely an example of the terminal 5 and does not constitute a limitation on it; the terminal may include more or fewer components than illustrated, combine certain components, or use different components. For example, the terminal 5 may also include input/output devices, network access devices, buses, and so on.
The at least one processor 52 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 52 may be a microprocessor or any conventional processor. The processor 52 is the control center of the terminal 5 and connects all parts of the entire terminal 5 through various interfaces and lines.
The memory 51 may be used to store the computer-readable instructions 53 and/or the modules/units. The processor 52 implements the various functions of the terminal 5 by running or executing the computer-readable instructions and/or modules/units stored in the memory 51 and by invoking the data stored in the memory 51. The memory 51 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function); the data storage area may store data created according to the use of the terminal 5 (such as audio data or a phone book). In addition, the memory 51 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
If implemented in the form of software functional units and sold or used as an independent product, the modules/units integrated in the terminal 5 may be stored in a non-volatile readable storage medium. Based on this understanding, this application implements all or part of the processes in the method embodiments above, which may also be completed by instructing the relevant hardware through computer-readable instructions. The computer-readable instructions may be stored in a non-volatile readable storage medium, and when executed by a processor, they implement the steps of each method embodiment above. The computer-readable instructions include computer-readable instruction code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The non-volatile readable medium may include: any entity or device capable of carrying the computer-readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content contained in the non-volatile readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the non-volatile readable medium does not include electrical carrier signals and telecommunication signals.
The functional units in the embodiments of this application may be integrated in the same processing unit, each unit may exist physically alone, or two or more units may be integrated in the same unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules. In addition, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in the system claims may also be implemented by one unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of this application. Although this application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of this application may be modified or equivalently replaced without departing from the spirit and scope of the technical solutions of this application.

Claims (20)

1. A fast hand tracking method, wherein the method comprises:
    displaying, on a display interface, a video containing a human hand region collected by an imaging device;
    receiving a calibration frame calibrated by a user on the video containing the human hand region;
    extracting histogram of oriented gradients (HOG) features of the region calibrated by the calibration frame, and segmenting the region calibrated by the calibration frame according to the HOG features to obtain a hand image; and
    tracking the hand image using a continuously adaptive mean-shift (CamShift) operator, wherein tracking the hand image using the continuously adaptive mean-shift operator specifically comprises:
    converting the color space of the hand image to the HSV color space and separating out the hand image of the hue component; and, based on the hand image I(i,j) of the hue component and the centroid position and size of an initialized search box, calculating the centroid position (M10/M00, M01/M00) of the current search window and the size of the current search window
    s = 2 * sqrt(M00 / 256),
    where
    M10 = Σi Σj i·I(i,j) and M01 = Σi Σj j·I(i,j)
    are the first-order moments of the current search window,
    M00 = Σi Σj I(i,j)
    is the zeroth-order moment of the current search window, i is the pixel coordinate of I(i,j) in the horizontal direction, and j is the pixel coordinate of I(i,j) in the vertical direction.
2. The method of claim 1, wherein displaying, on the display interface, the video containing the human hand region collected by the imaging device further comprises:
    displaying a preset standard calibration frame in a preset display manner, the preset display manner comprising one or a combination of the following:
    displaying the preset standard calibration frame when a display instruction is received;
    hiding the preset standard calibration frame when a hiding instruction is received; and
    automatically hiding the preset standard calibration frame when, after the display instruction has been received and the preset standard calibration frame displayed, no further instruction is received for longer than a preset time period.
3. The method of claim 2, wherein receiving the calibration frame calibrated by the user on the video containing the human hand region comprises:
    receiving a standard calibration frame calibrated by the user on the video containing the human hand region, comprising:
    receiving a rough calibration frame drawn by the user in the video containing the human hand region;
    matching, by fuzzy matching, a preset standard calibration frame corresponding to the rough calibration frame; and
    calibrating the video containing the human hand region according to the matched standard calibration frame and displaying the calibrated standard calibration frame, wherein the geometric center of the rough calibration frame is the same as the geometric center of the matched standard calibration frame.
4. The method of claim 2, wherein receiving the calibration frame calibrated by the user on the video containing the human hand region comprises:
    receiving a standard calibration frame calibrated by the user on the video containing the human hand region, comprising:
    directly receiving a standard calibration frame selected by the user, calibrating the video containing the human hand region according to the standard calibration frame, and displaying the calibrated standard calibration frame.
5. The method of claim 3 or 4, wherein receiving the standard calibration frame calibrated by the user on the video containing the human hand region further comprises:
    enlarging, shrinking, moving, or deleting the displayed standard calibration frame when an enlarging, shrinking, moving, or deleting instruction is received.
6. The method of claim 5, wherein the method further comprises:
    preprocessing the region calibrated by the standard calibration frame, the preprocessing comprising one or a combination of the following: grayscale processing and correction processing.
7. The method of claim 6, wherein the method further comprises:
    acquiring depth information, in the video containing the human hand region, corresponding to the region calibrated by the calibration frame, and normalizing the hand image according to the depth information, the normalization being: S2*(H2/H1), where S1 is the size of the hand image segmented from the region calibrated by the first standard calibration frame, H1 is the depth-of-field information corresponding to the region calibrated the first time, S2 is the size of the hand image segmented from the region calibrated by the current standard calibration frame, and H2 is the depth-of-field information corresponding to the region calibrated by the current calibration frame.
8. A fast hand tracking device, wherein the device comprises:
    a display module, configured to display, on a display interface, a video containing a human hand region collected by an imaging device;
    a calibration module, configured to receive a calibration frame calibrated by a user on the video containing the human hand region;
    a segmentation module, configured to extract histogram of oriented gradients (HOG) features of the region calibrated by the calibration frame, and to segment the region calibrated by the calibration frame according to the HOG features to obtain a hand image; and
    a tracking module, configured to track the hand image using a continuously adaptive mean-shift (CamShift) operator, wherein tracking the hand image using the continuously adaptive mean-shift operator specifically comprises:
    converting the color space of the hand image to the HSV color space and separating out the hand image of the hue component; and, based on the hand image I(i,j) of the hue component and the centroid position and size of an initialized search box, calculating the centroid position (M10/M00, M01/M00) of the current search window and the size of the current search window
    s = 2 * sqrt(M00 / 256),
    where
    M10 = Σi Σj i·I(i,j) and M01 = Σi Σj j·I(i,j)
    are the first-order moments of the current search window,
    M00 = Σi Σj I(i,j)
    is the zeroth-order moment of the current search window, i is the pixel coordinate of I(i,j) in the horizontal direction, and j is the pixel coordinate of I(i,j) in the vertical direction.
9. A terminal, wherein the terminal comprises a processor and a memory, and the processor is configured to execute computer-readable instructions stored in the memory to implement the following steps:
    displaying, on a display interface, a video containing a human hand region collected by an imaging device;
    receiving a calibration frame calibrated by a user on the video containing the human hand region;
    extracting histogram of oriented gradients (HOG) features of the region calibrated by the calibration frame, and segmenting the region calibrated by the calibration frame according to the HOG features to obtain a hand image; and
    tracking the hand image using a continuously adaptive mean-shift (CamShift) operator, wherein tracking the hand image using the continuously adaptive mean-shift operator specifically comprises:
    converting the color space of the hand image to the HSV color space and separating out the hand image of the hue component; and, based on the hand image I(i,j) of the hue component and the centroid position and size of an initialized search box, calculating the centroid position (M10/M00, M01/M00) of the current search window and the size of the current search window
    s = 2 * sqrt(M00 / 256),
    where
    M10 = Σi Σj i·I(i,j) and M01 = Σi Σj j·I(i,j)
    are the first-order moments of the current search window,
    M00 = Σi Σj I(i,j)
    is the zeroth-order moment of the current search window, i is the pixel coordinate of I(i,j) in the horizontal direction, and j is the pixel coordinate of I(i,j) in the vertical direction.
10. The terminal of claim 9, wherein displaying, on the display interface, the video containing the human hand region collected by the imaging device further comprises:
    displaying a preset standard calibration frame in a preset display manner, the preset display manner comprising one or a combination of the following:
    displaying the preset standard calibration frame when a display instruction is received;
    hiding the preset standard calibration frame when a hiding instruction is received; and
    automatically hiding the preset standard calibration frame when, after the display instruction has been received and the preset standard calibration frame displayed, no further instruction is received for longer than a preset time period.
11. The terminal of claim 10, wherein receiving the calibration frame calibrated by the user on the video containing the human hand region comprises:
    receiving a standard calibration frame calibrated by the user on the video containing the human hand region, comprising:
    receiving a rough calibration frame drawn by the user in the video containing the human hand region;
    matching, by fuzzy matching, a preset standard calibration frame corresponding to the rough calibration frame; and
    calibrating the video containing the human hand region according to the matched standard calibration frame and displaying the calibrated standard calibration frame, wherein the geometric center of the rough calibration frame is the same as the geometric center of the matched standard calibration frame.
12. The terminal of claim 10, wherein receiving the calibration frame calibrated by the user on the video containing the human hand region comprises:
    receiving a standard calibration frame calibrated by the user on the video containing the human hand region, comprising:
    directly receiving a standard calibration frame selected by the user, calibrating the video containing the human hand region according to the standard calibration frame, and displaying the calibrated standard calibration frame.
13. The terminal of claim 10, wherein the processor is further configured to execute the computer-readable instructions to implement the following steps:
    preprocessing the region calibrated by the standard calibration frame, the preprocessing comprising one or a combination of the following: grayscale processing and correction processing.
14. The terminal of claim 13, wherein the processor is further configured to execute the computer-readable instructions to implement the following steps:
    acquiring depth information, in the video containing the human hand region, corresponding to the region calibrated by the calibration frame, and normalizing the hand image according to the depth information, the normalization being: S2*(H2/H1), where S1 is the size of the hand image segmented from the region calibrated by the first standard calibration frame, H1 is the depth-of-field information corresponding to the region calibrated the first time, S2 is the size of the hand image segmented from the region calibrated by the current standard calibration frame, and H2 is the depth-of-field information corresponding to the region calibrated by the current calibration frame.
15. A non-volatile readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, implement the following steps:
    displaying, on a display interface, a video containing a human hand region collected by an imaging device;
    receiving a calibration frame calibrated by a user on the video containing the human hand region;
    extracting histogram of oriented gradients (HOG) features of the region calibrated by the calibration frame, and segmenting the region calibrated by the calibration frame according to the HOG features to obtain a hand image; and
    tracking the hand image using a continuously adaptive mean-shift (CamShift) operator, wherein tracking the hand image using the continuously adaptive mean-shift operator specifically comprises:
    converting the color space of the hand image to the HSV color space and separating out the hand image of the hue component; and, based on the hand image I(i,j) of the hue component and the centroid position and size of an initialized search box, calculating the centroid position (M10/M00, M01/M00) of the current search window and the size of the current search window
    s = 2 * sqrt(M00 / 256),
    where
    M10 = Σi Σj i·I(i,j) and M01 = Σi Σj j·I(i,j)
    are the first-order moments of the current search window,
    M00 = Σi Σj I(i,j)
    is the zeroth-order moment of the current search window, i is the pixel coordinate of I(i,j) in the horizontal direction, and j is the pixel coordinate of I(i,j) in the vertical direction.
16. The storage medium of claim 15, wherein displaying, on the display interface, the video containing the human hand region collected by the imaging device further comprises:
    displaying a preset standard calibration frame in a preset display manner, the preset display manner comprising one or a combination of the following:
    displaying the preset standard calibration frame when a display instruction is received;
    hiding the preset standard calibration frame when a hiding instruction is received; and
    automatically hiding the preset standard calibration frame when, after the display instruction has been received and the preset standard calibration frame displayed, no further instruction is received for longer than a preset time period.
17. The storage medium of claim 16, wherein receiving the calibration frame calibrated by the user on the video containing the human hand region comprises:
    receiving a standard calibration frame calibrated by the user on the video containing the human hand region, comprising:
    receiving a rough calibration frame drawn by the user in the video containing the human hand region;
    matching, by fuzzy matching, a preset standard calibration frame corresponding to the rough calibration frame; and
    calibrating the video containing the human hand region according to the matched standard calibration frame and displaying the calibrated standard calibration frame, wherein the geometric center of the rough calibration frame is the same as the geometric center of the matched standard calibration frame.
18. The storage medium of claim 16, wherein receiving the calibration frame calibrated by the user on the video containing the human hand region comprises:
    receiving a standard calibration frame calibrated by the user on the video containing the human hand region, comprising:
    directly receiving a standard calibration frame selected by the user, calibrating the video containing the human hand region according to the standard calibration frame, and displaying the calibrated standard calibration frame.
19. The storage medium of claim 16, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    preprocessing the region calibrated by the standard calibration frame, the preprocessing comprising one or a combination of the following: grayscale processing and correction processing.
20. The storage medium of claim 19, wherein the computer-readable instructions, when executed by the processor, further implement the following steps:
    acquiring depth information, in the video containing the human hand region, corresponding to the region calibrated by the calibration frame, and normalizing the hand image according to the depth information, the normalization being: S2*(H2/H1), where S1 is the size of the hand image segmented from the region calibrated by the first standard calibration frame, H1 is the depth-of-field information corresponding to the region calibrated the first time, S2 is the size of the hand image segmented from the region calibrated by the current standard calibration frame, and H2 is the depth-of-field information corresponding to the region calibrated by the current calibration frame.
PCT/CN2018/100227 2018-04-18 2018-08-13 Fast hand tracking method, device, terminal, and storage medium WO2019200785A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810349972.XA CN108682021B (en) 2018-04-18 2018-04-18 Rapid hand tracking method, device, terminal and storage medium
CN201810349972.X 2018-04-18

Publications (1)

Publication Number Publication Date
WO2019200785A1 true WO2019200785A1 (en) 2019-10-24

Family

ID=63801123

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/100227 WO2019200785A1 (en) 2018-04-18 2018-08-13 Fast hand tracking method, device, terminal, and storage medium

Country Status (2)

Country Link
CN (1) CN108682021B (en)
WO (1) WO2019200785A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886928B (en) * 2019-01-24 2023-07-14 平安科技(深圳)有限公司 Target cell marking method, device, storage medium and terminal equipment
SG10201913029SA (en) * 2019-12-23 2021-04-29 Sensetime Int Pte Ltd Target tracking method and apparatus, electronic device, and storage medium
CN115701873A (en) * 2021-07-19 2023-02-14 北京字跳网络技术有限公司 Image matching method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015139750A1 (en) * 2014-03-20 2015-09-24 Telecom Italia S.P.A. System and method for motion capture
CN105825524A (en) * 2016-03-10 2016-08-03 浙江生辉照明有限公司 Target tracking method and apparatus
CN105957107A (en) * 2016-04-27 2016-09-21 北京博瑞空间科技发展有限公司 Pedestrian detecting and tracking method and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8963829B2 (en) * 2009-10-07 2015-02-24 Microsoft Corporation Methods and systems for determining and tracking extremities of a target
CN103390168A (en) * 2013-07-18 2013-11-13 重庆邮电大学 Intelligent wheelchair dynamic gesture recognition method based on Kinect depth information
CN105678809A (en) * 2016-01-12 2016-06-15 湖南优象科技有限公司 Handheld automatic follow shot device and target tracking method thereof
CN106157308A (en) * 2016-06-30 2016-11-23 北京大学 Rectangular target object detecting method
US20180047173A1 (en) * 2016-08-12 2018-02-15 Qualcomm Incorporated Methods and systems of performing content-adaptive object tracking in video analytics
CN107240117B (en) * 2017-05-16 2020-05-15 上海体育学院 Method and device for tracking moving object in video

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015139750A1 (en) * 2014-03-20 2015-09-24 Telecom Italia S.P.A. System and method for motion capture
CN105825524A (en) * 2016-03-10 2016-08-03 浙江生辉照明有限公司 Target tracking method and apparatus
CN105957107A (en) * 2016-04-27 2016-09-21 北京博瑞空间科技发展有限公司 Pedestrian detecting and tracking method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LAN, TIANTIAN ET AL.: "Study on Hand Gesture Recognition Used for Air Conditioning Control", CHINESE MASTER'S THESES FULL-TEXT DATABASE, INFORMATION SCIENCE & TECHNOLOGY, 1 May 2015 (2015-05-01), pages 1 - 70, XP055645970 *

Also Published As

Publication number Publication date
CN108682021A (en) 2018-10-19
CN108682021B (en) 2021-03-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18914953

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09.12.2020)