CN111144207A - Human body detection and tracking method based on multi-mode information perception

Human body detection and tracking method based on multi-mode information perception

Info

Publication number
CN111144207A
CN111144207A (application CN201911146615.4A; granted as CN111144207B)
Authority
CN
China
Prior art keywords
tracking
head
depth
color
image
Prior art date
Legal status
Granted
Application number
CN201911146615.4A
Other languages
Chinese (zh)
Other versions
CN111144207B (en)
Inventor
周波
黄文超
甘亚辉
房芳
钱堃
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN201911146615.4A
Publication of CN111144207A
Application granted
Publication of CN111144207B
Active legal status
Anticipated expiration

Classifications

    • G06V 40/103: Human or animal bodies; static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F 18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Pattern recognition; matching criteria, e.g. proximity measures
    • G06F 18/2411: Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06T 5/40: Image enhancement or restoration using histogram techniques
    • G06T 5/70: Image enhancement or restoration; denoising; smoothing
    • G06T 7/50: Image analysis; depth or shape recovery
    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 7/90: Image analysis; determination of colour characteristics
    • G06T 2207/20032: Indexing scheme for image analysis; median filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human body detection and tracking method based on multi-mode information perception, which comprises the following steps: calibrating a color camera and a depth camera and performing data filtering; detecting the body and the head of a person in the color image and the depth image respectively, and fusing the two detection results according to the spatial proportion between head and body; tracking the body and the head in the color image and the depth image respectively with a kernelized correlation filter (KCF) tracking algorithm, and establishing a model of the tracked object; and perfecting the tracking mechanism with the tracked-object model and the spatial constraint of the head-to-body ratio. Based on multi-modal information perception, the disclosed method overcomes the shortcomings of purely vision-based target detection and tracking, has wide application in the field of indoor service robots, and supports functions such as human-robot interaction and user following.

Description

Human body detection and tracking method based on multi-mode information perception
Technical Field
The invention belongs to the field of indoor service robot applications, and particularly relates to a human body detection and tracking method based on multi-mode information perception, in particular to a method for long-term robust detection and tracking in unstructured indoor environments and scenes with changing illumination.
Background
With the development of computer vision technology and the rise of artificial intelligence, intelligent service robots, especially indoor mobile service robots, are finding ever wider application. In an indoor environment, a robot needs to perceive a complex, unstructured scene and interact with humans, and visual information alone is not sufficient to cope with dynamic changes in ambient lighting. The RGB-D camera is a new type of vision sensor that simultaneously provides high-resolution color and depth images, making it an excellent tool for human-robot interaction. An efficient method is therefore needed that fully exploits this multimodal information for detection and tracking.
Most existing target detection and tracking methods adopt a camera-plus-laser solution. A two-dimensional laser directly acquires the geometric information of the environment with high precision and fast processing. However, the amount of usable information is small: only simple shape features can be extracted, and these are easily confused with similar objects in the environment. Camera-based detection and tracking methods can be further divided into methods based on hand-crafted features and methods based on deep learning. Methods based on hand-crafted features extract predefined features, train a classifier, and detect with a sliding window; their computational cost is controllable and the extracted features have a clear meaning, but their accuracy is insufficient. Methods based on deep learning achieve higher precision, but their computational load is large and they cannot run in real time on an ordinary computing platform.
In summary, the conventional target detection and tracking methods above have the following problems: 1) it is difficult to reach a satisfactory balance between accuracy and real-time performance; 2) using only a single information source, color or depth, cannot achieve detection and tracking in complex environments; 3) they lack analysis and handling of short-term algorithm failure, so tracking is easily lost and robustness is poor.
Disclosure of Invention
The purpose of the invention is as follows: in order to overcome the defects of the prior art, a human body detection and tracking method based on multi-mode information perception is provided, so as to solve the problem of real-time and robust human body detection and tracking in complex environments.
The technical scheme is as follows: in order to achieve the above purpose, the invention adopts the following technical scheme:
A human body detection and tracking method based on multi-modal information perception, which comprises the following steps:
(1) calibrating a color camera and a depth camera and performing data filtering, aligning the color image and the depth image through calibration, and then performing filtering respectively;
(2) human body detection based on multi-modal information perception: detecting a body in the color image, detecting a head in the depth image, and fusing according to the spatial proportion information;
(3) human body tracking based on multi-modal information perception: respectively tracking the body and the head in the color image and the depth image with a kernelized correlation filter (KCF) target tracking algorithm, and establishing a tracked-object model to check the tracking result;
(4) perfecting the tracking mechanism with the tracked-object model and the spatial constraint of the head-to-body ratio: if a single tracker fails during tracking, tracking stability is maintained according to the spatial position constraint between head and body.
Further, the step (1) comprises the following steps:
(11) a color camera and a depth camera are respectively used for shooting a plurality of checkerboard pictures at different angles and different distances, so that each position of an image can be covered;
(12) detecting and matching corner points in different images, and calculating the internal and external parameter matrices of the color camera from the matched corner-point pairs, wherein the internal parameter matrix of the color camera is:

$$K = \begin{bmatrix} f_x & s & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}$$

where $f_x$ and $f_y$ are the focal lengths, $(x_0, y_0)$ are the principal-point coordinates relative to the imaging plane, and $s$ is the axis skew parameter;

the external parameter matrix of the color camera is:

$$[\,R \mid t\,]$$

where $R$ is a 3×3 rotation matrix and $t$ is a 3×1 translation vector, both relative to the world coordinate system;
(13) mapping the depth values onto the color image;
let $P$ be a spatial point, $p_{rgb}$ and $p_{ir}$ its coordinates in the color image and the depth image, $K_{rgb}$ and $K_{ir}$ the internal parameter matrices of the color and depth cameras, and $R_{ir}$ and $t_{ir}$ the external parameters of the depth camera with the color camera as the reference frame; the depth value at point $P$ is mapped to color-image coordinates by:

$$p_{rgb} = K_{rgb} R_{ir} K_{ir}^{-1} p_{ir} + t_{ir}$$
(14) simultaneously shrinking the registered color image and the depth image, removing high-frequency noise in the color image with Gaussian filtering, and removing missing-depth points in the depth image with median filtering.
Further, the step (2) comprises the following steps:
(21) scanning the color image with a sliding window, extracting the HOG features within the window, and judging with a trained SVM classifier whether the window contains a human body, thereby obtaining all windows in the color image that may contain a human body;
(22) likewise using a sliding window in the depth image, extracting the Haar features within the window, and classifying with an Adaboost classifier whether the window is a human head, thereby obtaining all windows in the depth image that may contain a human head;
(23) fusion detection according to the spatial proportion information: fusing the results of step (21) and step (22) according to the head-to-body ratio to obtain a detection result fusing the multi-modal information.
Further, the step (21) includes the steps of:
(211) scanning the color image with a multi-scale sliding window: first enlarging and reducing the original color image by a preset ratio to obtain color images at multiple scales, then sliding a fixed-size window over each color image and checking whether the window contains a human body;
(212) extracting the HOG features in the window, wherein the HOG features are extracted by the following steps:
(2121) graying and gray level normalization;
firstly, graying the whole color image, and then normalizing;
(2122) calculating the gradient of each pixel in the color image;
the gradients $G_x, G_y$ of the color image in the x and y directions at $(x, y)$ are:

$$G_x(x,y) = I(x+1,y) - I(x-1,y)$$

$$G_y(x,y) = I(x,y+1) - I(x,y-1)$$

the gradient magnitude is then

$$G(x,y) = \sqrt{G_x(x,y)^2 + G_y(x,y)^2}$$

and its direction is

$$\theta(x,y) = \arctan\left(\frac{G_y(x,y)}{G_x(x,y)}\right);$$
(2123) dividing the pixels into 8×8 cells, counting the gradient information of all pixels within a cell, and representing the result by a histogram over gradient directions;
(2124) dividing the image into 16×16 blocks and contrast-normalizing the gradient histograms within each block;
(2125) setting a detection window of size 64×128 and generating its feature vector by concatenating the feature vectors of all blocks within the window for subsequent classification;
zooming the original color image to form an image pyramid, sliding the detection window over the color image at the current scale, classifying each position with the trained SVM classifier, and judging whether a human body is present at that position; finally applying non-maximum suppression to the obtained results to eliminate multiple detection windows on the same target.
Further, the step (22) includes the steps of:
(221) scanning the depth image with a multi-scale sliding window: first enlarging and reducing the original depth image by a preset ratio to obtain depth images at multiple scales, then sliding a fixed-size window over each depth image and checking whether the window contains a human head;
(222) extracting Haar features in the window;
the Haar features are simple rectangular-block features of three types, namely edge features, linear features and diagonal features; the feature value of each rectangular region is the sum of the pixels in the white region minus the sum of the pixels in the black region;
(223) adaboost classification, which is to train a classifier by using an AdaBoost algorithm;
the AdaBoost algorithm boosts a weak learner with sufficient data so as to generate a high-precision strong learner; a weak classifier $h_j(x)$ is given by:

$$h_j(x) = \begin{cases} 1, & p_j f_j(x) < p_j \theta_j \\ 0, & \text{otherwise} \end{cases}$$

where $f_j$ is a feature, $\theta_j$ is a threshold, $p_j$ controls the direction of the inequality, and $x$ is a 24×24 image sub-window; training is run N rounds to obtain N weak classifiers, with a normalized weight, which is a probability distribution, maintained over the samples in the n-th round; for each feature $j$ a classifier $h_j$ using only that single feature is trained, the classifier $h_n$ with the lowest error is selected, and the weights are updated, yielding a strong classifier;
classifying the Haar feature vectors in the detection windows obtained in step (222) with the Adaboost classifier, which gives a probability score for the presence of a human head in each detection window;
(224) synthesizing the human head detection results in the depth image: performing non-maximum suppression according to the probability score of each window to obtain the human head detection result in the depth image.
Further, the step (23) includes the steps of:
(231) acquiring the head and body detection results: obtaining the body frames in the color image from step (21) and the head frames in the depth image from step (22), then traversing the set of body frames and performing the following operations on each body frame;
(232) judging whether the body frame contains a head frame; if not, deleting the body frame and returning to step (231); if so, performing step (233);
(233) judging whether the number of head frames within the body frame is 1; if so, associating the body frame and the head frame to form a multi-modal combined human body detection; if the body frame contains more than one head frame, selecting the optimal head frame according to the head frames' positions and respective confidences, and then associating the optimal head frame with the current body frame.
Further, the step (3) comprises the following steps:
(31) establishing models of the tracked object in the color image and the depth image;
in the color image the model is a color histogram, and in the depth image it is a depth template picture; the color histogram is extracted as follows: first convert RGB to the HSV color space, where H is hue, S is saturation and V is brightness; then extract the H channel according to the following formula and count the distribution of H values within the window to form the color histogram; the depth template picture is extracted by cropping the head bounding box before tracking starts and scaling it to a standard size, which serves as the template picture for head tracking in the depth image;
R′=R/255,G′=G/255,B′=B/255
Cmax=max(R′,G′,B′),Cmin=min(R′,G′,B′)
Δ=Cmax-Cmin
$$H = \begin{cases} 0^\circ, & \Delta = 0 \\ 60^\circ \times \left(\dfrac{G'-B'}{\Delta} \bmod 6\right), & C_{max} = R' \\ 60^\circ \times \left(\dfrac{B'-R'}{\Delta} + 2\right), & C_{max} = G' \\ 60^\circ \times \left(\dfrac{R'-G'}{\Delta} + 4\right), & C_{max} = B' \end{cases}$$
(32) tracking the body in the color image and the head in the depth image simultaneously with the kernelized correlation filter (KCF) algorithm, as follows: pixel values around the tracked object are extracted with a circulant matrix as training samples, a discriminant function is trained by ridge regression, and the samples are transformed to a kernel space by a kernel transformation to handle samples that are not linearly separable;
(33) matching and updating the object model during tracking; the matching method computes the normalized correlation coefficient between the tracked object and the initial model:

$$d(H_1,H_2) = \frac{\sum_i H_1(i)\,H_2(i)}{\sqrt{\sum_i H_1(i)^2 \,\sum_i H_2(i)^2}}$$

$$R(T,I) = \frac{\sum_{x,y} T(x,y)\,I(x,y)}{\sqrt{\sum_{x,y} T(x,y)^2 \,\sum_{x,y} I(x,y)^2}}$$

where $d$ is the normalized correlation coefficient of the color histograms $H_1$ and $H_2$, and $R$ is the normalized correlation coefficient of the depth template pictures $T$ and $I$; both values lie in the range [0,1], where a larger value indicates a better match and 0 the worst match; if the matching value is greater than 0.9, i.e. the algorithm's confidence in the tracking result is high, the model is updated by weighting: the initial model has weight $1-w$ and the current model of the tracked object weight $w$, where $w = 0.5 \times d$ or $w = 0.5 \times R$.
Further, the step (4) comprises the following steps:
(41) judging tracking validity during tracking according to the normalized correlation coefficients given in step (3): first judging whether head tracking is valid, i.e. whether the normalized correlation coefficient $R$ of the depth template pictures $T$ and $I$ is greater than 0.5; if so, going to step (42), otherwise going to step (43);
(42) judging whether body tracking is valid, i.e. whether the normalized correlation coefficient $d$ of the color histograms $H_1$ and $H_2$ is greater than 0.5; if so, the body tracking result in the current color image and the head tracking result in the depth image are both valid and the normal tracking process continues; otherwise going to step (44);
(43) judging whether body tracking is valid, i.e. whether the normalized correlation coefficient $d$ of the color histograms $H_1$ and $H_2$ is greater than 0.5; if so, head tracking in the depth image has failed while body tracking in the color image is still valid, so the head position is estimated according to the spatial position constraint between head and body, head-model matching is performed continuously, and head tracking is recovered once a match succeeds; otherwise going to step (45);
(44) in this case head tracking in the depth image is valid while body tracking in the color image has failed; the approximate body position is estimated from the head position according to the spatial position constraint between head and body, matching of the body color histogram is performed continuously, and body tracking is recovered once a match succeeds;
(45) at this point both head tracking and body tracking have failed, indicating that the tracked object is absent from the color image and the depth image due to occlusion or fast motion; in this case the tracking algorithm stops and a warning must be given to the user so that an appropriate response can be made.
Beneficial effects: compared with the prior art, the invention, based on the multi-modal information acquired by an RGB-D camera, effectively solves the problem of real-time and robust human detection and tracking in complex environments. Using multi-modal information for human detection and tracking improves the algorithm's adaptability to different ambient lighting conditions compared with using color or depth information alone; fusing the detection results of the color image and the depth image with the spatial proportion information raises recall, lowers the false-detection rate, and improves accuracy; combining the tracking results on the color image and the depth image makes comprehensive use of the model features of the tracked object, so results can be verified and recovered during tracking, giving the overall algorithm higher robustness. The method is simple and efficient, supports functions of the indoor service robot such as human-robot interaction and user following, and has a wide application range and good economic benefit.
Drawings
FIG. 1 is a general flow chart of the algorithm;
FIG. 2 is a flow chart of the color image human detection in step (2) of the present invention;
FIG. 3 is a flowchart of the human head detection of the depth image in step (2) of the present invention;
FIG. 4 is a flow chart of fusion detection according to head-to-body ratio in step (2) of the present invention;
FIG. 5 is a flow chart of step (3) of the present invention;
FIG. 6 is a flow chart of step (4) of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and a specific embodiment.
Fig. 1 is a general flowchart of a human body detection and tracking method based on multi-modal information perception, specifically including the following steps:
(1) calibrating a color camera and a depth camera and carrying out data filtering processing;
Firstly, an RGB-D camera is used to collect color and depth data of the surrounding environment; it comprises a color camera, which acquires color images containing the three color values R, G, B, and a depth camera, which acquires depth images containing distance (D) values. Secondly, because there is a positional offset between the color camera and the depth camera inside the RGB-D camera, the cameras are calibrated to obtain the internal and external parameter matrices of both, so that the depth value and the color value of each point correspond one to one. Finally, the color image and the depth image are filtered separately to remove bright spots and noise.
The depth camera measures depth by having an infrared speckle emitter project infrared beams; the beams are reflected back to the depth camera after hitting an obstacle, and the distance is then computed from the geometric relationship of the returned speckles. The depth camera is effectively an ordinary camera fitted with a filter so that it images only infrared light, and it can be calibrated simply by illuminating an object with an infrared light source. The color camera is calibrated with the checkerboard method: the camera to be calibrated shoots several checkerboard pictures from different viewpoints, corner detection is used to match the different pictures, and the camera's internal and external parameter matrices are solved from the resulting equations. The specific steps for calibrating the color camera are:
(11) the RGB-D camera is held in place using a tripod and then a checkerboard picture is taken using a color camera at multiple angles and distances, ensuring that every position of the image can be covered.
(12) Detecting and matching corner points in the different images, and calculating the internal and external parameter matrices of the camera from the matched corner-point pairs. The internal parameter matrix is shown as formula (1), where $f_x$ and $f_y$ are the focal lengths along the x and y axes, $(x_0, y_0)$ are the principal-point coordinates relative to the imaging plane, and $s$ is the axis skew parameter, equal to 0 in the ideal case; the external parameter matrix is shown as formula (2), where $R$ is a 3×3 rotation matrix and $t$ is a 3×1 translation vector, both relative to the world coordinate system;

$$K = \begin{bmatrix} f_x & s & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \end{bmatrix} \quad (1)$$

$$[\,R \mid t\,] \quad (2)$$
(13) The depth values in the depth image are mapped onto the color image. Let $P$ be a spatial point, $p_{rgb}$ and $p_{ir}$ its coordinates in the color image and the depth image, $K_{rgb}$ and $K_{ir}$ the internal parameter matrices of the color and depth cameras, and $R_{ir}$ and $t_{ir}$ the external parameters of the depth camera with the color camera as the reference frame; the depth value at point $P$ can then be mapped to color-image coordinates by formula (3);

$$p_{rgb} = K_{rgb} R_{ir} K_{ir}^{-1} p_{ir} + t_{ir} \quad (3)$$
(14) Both the registered color image and the depth image are scaled to 480×270; Gaussian filtering removes high-frequency noise in the color image and median filtering removes missing-depth points in the depth image.
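To make this registration concrete, the following Python sketch implements the back-projection and re-projection that formula (3) abbreviates, under the standard pinhole model; the function name, the use of NumPy, and the assumption that the depth map holds metric depth values are ours, not the patent's.

```python
import numpy as np

def register_depth_to_color(depth, K_ir, K_rgb, R_ir, t_ir):
    """Project every depth pixel into the color image (cf. formula (3)).

    depth: HxW array of metric depth values from the depth camera;
    K_ir, K_rgb: 3x3 intrinsics; R_ir: 3x3 rotation; t_ir: 3-vector.
    Returns an HxW depth map aligned with the color image.
    """
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([us.ravel(), vs.ravel(), np.ones(h * w)])  # 3xN homogeneous
    # back-project each pixel to a 3-D point in the depth-camera frame
    pts_ir = (np.linalg.inv(K_ir) @ pix) * depth.ravel()
    # move the points into the color-camera frame
    pts_rgb = R_ir @ pts_ir + np.asarray(t_ir, dtype=float).reshape(3, 1)
    # project into the color image and round to integer pixel coordinates
    proj = K_rgb @ pts_rgb
    u = np.round(proj[0] / proj[2]).astype(int)
    v = np.round(proj[1] / proj[2]).astype(int)
    out = np.zeros((h, w))
    ok = (proj[2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out[v[ok], u[ok]] = pts_rgb[2, ok]  # registered depth in the color frame
    return out
```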
(2) Human body detection based on multi-modal information perception: firstly, the color image is scanned with a sliding window, HOG (Histogram of Oriented Gradients) features are extracted within the window, and a trained SVM (Support Vector Machine) classifier judges whether the window contains a human body, yielding all windows in the color image that may contain a human body; secondly, a sliding window is likewise used in the depth image, Haar features within the window are extracted, and an Adaboost classifier classifies whether the window is a human head, yielding all windows in the depth image that may contain a human head; finally, fusion detection according to the spatial proportion information fuses the two detection results according to the person's head-to-body ratio (about 1:7) to obtain a detection result fusing the multi-modal information;
(21) Firstly, human body detection based on HOG features and an SVM classifier is carried out in the color image. The flow chart of this operation is shown in fig. 2, and the specific steps are:
(211) the multi-scale sliding window scans the color image. Firstly, enlarging and reducing an original color image according to the proportion of 1.05 (generally in the interval of 1.01-1.5) to obtain color images with multiple scales; then, a sliding window of a fixed size (64 × 128) is slid over each color image to check whether a human body is included in the window.
(212) HOG features within the window are extracted. The extraction steps of the HOG features are as follows:
(2121) Graying and gray-level normalization. Since HOG features mainly describe edge and gradient structure, color information contributes little; to reduce the influence of dim illumination, the whole color image is first grayed and then normalized.
(2122) The gradient of each pixel in the color image is calculated. The gradients $G_x, G_y$ in the x and y directions at $(x, y)$ are given by formulas (4) and (5); the gradient magnitude at that position is

$$G(x,y) = \sqrt{G_x(x,y)^2 + G_y(x,y)^2}$$

and its direction is

$$\theta(x,y) = \arctan\left(\frac{G_y(x,y)}{G_x(x,y)}\right)$$
$$G_x(x,y) = I(x+1,y) - I(x-1,y) \quad (4)$$

$$G_y(x,y) = I(x,y+1) - I(x,y-1) \quad (5)$$
(2123) The pixels are divided into 8×8 cells, the gradient information of all pixels in a cell is accumulated, and the result is represented by a histogram over gradient directions. The orientation channels of the histogram are evenly distributed over 0°-180° (unsigned gradient) or 0°-360° (signed gradient). To reduce aliasing, the votes are bilinearly interpolated between neighboring channels in both orientation and position.
(2124) The image is divided into 16×16 blocks and the gradient histograms are contrast-normalized within each block.
(2125) A detection window of size 64×128 is set, and its feature vector is generated by concatenating the feature vectors of all blocks inside the window for subsequent classification.
(213) SVM classification. The feature vectors of the detection windows obtained in step (212) are classified with the SVM classifier, which gives a probability score (range 0-1) that a human body is present in the window.
(214) Human body detection in the color image. The classification results of all detection windows are combined, and non-maximum suppression is performed according to each window's probability score to obtain the human body detection result in the color image.
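As a concrete counterpart of steps (211)-(214), the sketch below uses OpenCV's HOGDescriptor, whose defaults match the 64×128 window, 8×8 cells and 16×16 blocks described above, together with the library's pre-trained pedestrian SVM; the patent trains its own classifier, and the input file name is a placeholder.

```python
import cv2

hog = cv2.HOGDescriptor()  # defaults: 64x128 window, 16x16 blocks, 8x8 cells
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("frame_color.png")           # placeholder input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # step (2121): graying
# scale=1.05 builds the image pyramid of step (211); overlapping hits are
# grouped internally, playing the role of the suppression in step (214)
boxes, weights = hog.detectMultiScale(gray, winStride=(8, 8), scale=1.05)
for (x, y, w, h) in boxes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```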
(22) Secondly, human head detection based on Haar features and an Adaboost cascade classifier is adopted in the depth image. A flow chart of this operation is shown in fig. 3. The method comprises the following specific steps:
(221) the multi-scale sliding window scans the depth image. Firstly, enlarging and reducing an original depth image according to the proportion of 1.05 (generally in the interval of 1.01-1.5) to obtain depth images with multiple scales; then, a sliding window of a fixed size (30 × 30) is slid over each depth image to check whether the window contains a human head.
(222) Haar features within the window are extracted. Haar features are simple rectangular-block features of three types, namely edge features, linear features and diagonal features; the feature value of each rectangular region is the sum of the pixels in the white region minus the sum of the pixels in the black region. Feature computation can be accelerated with an integral image;
(223) Adaboost classification. The classifier is trained with the AdaBoost algorithm, which boosts a weak learner with sufficient data so as to generate a high-precision strong learner. A weak classifier $h_j(x)$ is shown in formula (6), where $f_j$ is a feature, $\theta_j$ is a threshold, $p_j$ controls the direction of the inequality, and $x$ is a 24×24 depth-image sub-window. Training is run N rounds to obtain N weak classifiers, with a normalized weight, which is a probability distribution, maintained over the samples in the n-th round. For each feature $j$ a classifier $h_j$ using only that single feature is trained, the classifier $h_n$ with the lowest error is selected, and the weights are updated, yielding a strong classifier;

$$h_j(x) = \begin{cases} 1, & p_j f_j(x) < p_j \theta_j \\ 0, & \text{otherwise} \end{cases} \quad (6)$$
The classifiers are then combined into a more complex classifier with a cascaded structure. A cascade classifier is a combination of a series of strong classifiers; the classifier at each level is thresholded to minimize false negatives so that most targets pass through while non-target regions are rejected. Front-end classifiers use few features and compute quickly, back-end classifiers use many features, and very few windows ever reach the back end, so the overall computation is very fast. The Haar feature vectors of the detection windows obtained in step (222) are classified with the Adaboost classifier, which gives a probability score (range 0-1) that a human head is present in the window.
(224) Human head detection in the depth image. The classification results of all detection windows are combined, and non-maximum suppression is performed according to each window's probability score to obtain the human head detection result in the depth image.
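Steps (221)-(224) map directly onto OpenCV's cascade-classifier pipeline. The sketch below assumes a cascade trained on depth images of human heads; the XML file name is hypothetical (OpenCV ships only face cascades), and the depth image is assumed to be 8-bit.

```python
import cv2

# hypothetical cascade file trained on depth images of human heads
cascade = cv2.CascadeClassifier("haar_head_depth.xml")

depth8 = cv2.imread("frame_depth.png", cv2.IMREAD_GRAYSCALE)  # 8-bit depth map
heads = cascade.detectMultiScale(
    depth8,
    scaleFactor=1.05,   # pyramid ratio of step (221)
    minNeighbors=3,     # grouping threshold that rejects isolated hits
    minSize=(30, 30),   # the fixed 30x30 sliding-window size
)
```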
(23) According to the fusion detection of the spatial proportion information, the two detection results are fused according to the head-body ratio of the person to obtain the detection result of the fused multi-mode information, and a flow chart is shown in fig. 4 and specifically comprises the following steps:
(231) Acquiring the head and body detection results. The body frames in the color image are obtained from step (21) and the head frames in the depth image from step (22); the set of body frames is traversed and the following operations are performed on each body frame;
(232) judging whether the body frame contains a head frame; if not, the body frame is deleted; if so, the next step is executed;
(233) judging whether the number of head frames within the body frame is 1; if so, the body frame and the head frame are associated to form a multi-modal combined human body detection; if the body frame contains more than one head frame, the optimal head frame is selected according to the head frames' positions and respective confidences and then associated with the current body frame;
The body detected in the color image alone and the head detected in the depth image alone may suffer false detections (a non-target region detected as a target) or missed detections (a target not detected). To make the detection result more reliable, the RGB-D information, i.e. the body frames in the color image and the head frames in the depth image, must be fused. By tuning parameters, the independent detection stage detects as many targets as possible, i.e. reduces missed detections; in the fusion stage, body frames and head frames are screened according to the head-to-body proportion of most normal people (about 1:7), and the final requirement that each body frame contain exactly one head frame eliminates most false detections, greatly reducing the false-detection probability and improving accuracy.
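The screening of steps (231)-(233) reduces to a few lines. The following sketch is under our own assumptions: boxes are (x, y, w, h) tuples, and a simple upper-third containment test stands in for the 1:7 head-to-body constraint, which the patent does not spell out beyond the ratio.

```python
def fuse_detections(body_boxes, head_boxes, head_scores):
    """Keep each body frame only with its single best head frame."""
    def head_in_body(head, body):
        hx, hy, hw, hh = head
        bx, by, bw, bh = body
        # head frame inside the body frame, within its upper third
        return (bx <= hx and hx + hw <= bx + bw and
                by <= hy and hy + hh <= by + bh / 3.0)

    fused = []
    for body in body_boxes:
        inside = [(h, s) for h, s in zip(head_boxes, head_scores)
                  if head_in_body(h, body)]
        if not inside:
            continue                     # step (232): discard the body frame
        best_head, _ = max(inside, key=lambda hs: hs[1])
        fused.append((body, best_head))  # step (233): associate the best head
    return fused
```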
(3) Human body tracking based on multi-modal information perception: firstly, models of the tracked object are initialized in the color image and the depth image respectively; secondly, the body and the head are tracked in the color image and the depth image respectively with the kernelized correlation filter algorithm; finally, if confidence is high during tracking, the tracked-object model is updated to adapt to changes of the tracked object. The flow chart of this process is shown in fig. 5, and the specific steps are:
(31) Models of the tracked object are established in the color image and the depth image. In the color image the model is a color histogram; in the depth image it is a depth template picture. The color histogram is extracted as follows: RGB color is first converted to the HSV color space, where H is hue, S is saturation and V is brightness; the H channel is then extracted according to formula (7), and the distribution of H values within the window is counted to form the color histogram. The depth template picture is extracted by cropping the head bounding box before tracking starts and scaling it to a standard size, which serves as the template picture for head tracking in the depth image;
R′=R/255,G′=G/255,B′=B/255
Cmax=max(R′,G′,B′),Cmin=min(R′,G′,B′)
Δ=Cmax-Cmin
$$H = \begin{cases} 0^\circ, & \Delta = 0 \\ 60^\circ \times \left(\dfrac{G'-B'}{\Delta} \bmod 6\right), & C_{max} = R' \\ 60^\circ \times \left(\dfrac{B'-R'}{\Delta} + 2\right), & C_{max} = G' \\ 60^\circ \times \left(\dfrac{R'-G'}{\Delta} + 4\right), & C_{max} = B' \end{cases} \quad (7)$$
(32) The body is tracked in the color image and the head in the depth image simultaneously with the KCF (Kernelized Correlation Filter) algorithm, as follows: pixel values around the tracked object are extracted with a circulant matrix as training samples, a discriminant function is trained by ridge regression, and the samples are transformed to a kernel space by a kernel transformation to handle samples that are not linearly separable. In this computation the circulant sample matrix can be diagonalized by the discrete Fourier transform, so matrix operations, in particular matrix inversion, can be replaced by element-wise products of vectors, which greatly improves the computation speed.
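Step (32) can be sketched with the KCF implementation shipped in OpenCV's contrib module: two independent trackers, one per modality. File names and boxes are placeholders; on recent OpenCV builds the constructor may live under cv2.legacy.TrackerKCF_create instead.

```python
import cv2

color_frame = cv2.imread("color_000.png")  # placeholder frame pair
depth_gray = cv2.imread("depth_000.png", cv2.IMREAD_GRAYSCALE)
# KCF's default color-names features expect 3 channels, so the 8-bit
# depth map is replicated to BGR before tracking
depth_frame = cv2.cvtColor(depth_gray, cv2.COLOR_GRAY2BGR)

body_tracker = cv2.TrackerKCF_create()     # requires opencv-contrib-python
head_tracker = cv2.TrackerKCF_create()
body_tracker.init(color_frame, (100, 50, 64, 128))  # (x, y, w, h) from detection
head_tracker.init(depth_frame, (115, 55, 30, 30))

# per frame pair: update each tracker on its own modality
next_color = cv2.imread("color_001.png")
next_depth = cv2.cvtColor(cv2.imread("depth_001.png", cv2.IMREAD_GRAYSCALE),
                          cv2.COLOR_GRAY2BGR)
ok_body, body_box = body_tracker.update(next_color)
ok_head, head_box = head_tracker.update(next_depth)
```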
(33) The model of the tracked object is matched and updated during tracking. The matching method computes the normalized correlation coefficient between the tracked object and the initial tracked-object model, as shown in formulas (8) and (9), where $d$ is the normalized correlation coefficient of the color histograms $H_1$ and $H_2$, and $R$ that of the depth template pictures $T$ and $I$. Both values lie in the range [0,1]; a larger value indicates a better match and 0 the worst match. If the matching value is greater than 0.9, i.e. the algorithm's confidence in the tracking result is high, the tracked-object model is updated by weighting: the initial model has weight $1-w$ and the current model weight $w$, where $w = 0.5 \times d$ or $w = 0.5 \times R$.
$$d(H_1,H_2) = \frac{\sum_i H_1(i)\,H_2(i)}{\sqrt{\sum_i H_1(i)^2 \,\sum_i H_2(i)^2}} \quad (8)$$

$$R(T,I) = \frac{\sum_{x,y} T(x,y)\,I(x,y)}{\sqrt{\sum_{x,y} T(x,y)^2 \,\sum_{x,y} I(x,y)^2}} \quad (9)$$
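Formulas (8) and (9) and the weighted update of step (33) have close counterparts in OpenCV. The sketch below is ours: cv2.HISTCMP_CORREL is OpenCV's mean-subtracted correlation and so only approximates formula (8), while TM_CCORR_NORMED has exactly the form of formula (9); the patch crops are placeholders for the regions returned by the trackers.

```python
import cv2

def body_match(patch_bgr, ref_hist):
    """d of formula (8): hue-histogram correlation with the initial model."""
    hsv = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0], None, [32], [0, 180])  # H channel only
    cv2.normalize(hist, hist)
    return cv2.compareHist(ref_hist, hist, cv2.HISTCMP_CORREL), hist

def head_match(patch_depth, template):
    """R of formula (9): normalized cross-correlation with the template."""
    patch = cv2.resize(patch_depth, (template.shape[1], template.shape[0]))
    return float(cv2.matchTemplate(patch, template, cv2.TM_CCORR_NORMED)[0, 0])

# initial body model from the first tracked crop (placeholder coordinates)
first = cv2.imread("color_000.png")
hsv0 = cv2.cvtColor(first[50:178, 100:164], cv2.COLOR_BGR2HSV)
ref_hist = cv2.calcHist([hsv0], [0], None, [32], [0, 180])
cv2.normalize(ref_hist, ref_hist)

# weighted update of step (33), applied only under high confidence
cur = cv2.imread("color_001.png")
d, cur_hist = body_match(cur[52:180, 102:166], ref_hist)
if d > 0.9:
    w = 0.5 * d
    ref_hist = (1 - w) * ref_hist + w * cur_hist  # initial model keeps 1 - w
```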
(4) The tracking mechanism is perfected with the tracked-object model and the spatial constraint of the head-to-body ratio: firstly, model features of the tracked object are continuously extracted during tracking and matched against the initial tracked-object model to judge whether tracking is valid; secondly, if one tracker fails while the other remains valid, the still-valid tracking result is used for a short time, the position of the lost object is searched within a range specified by the head-to-body spatial constraint, and tracking is recovered in time; finally, if both trackers fail, the algorithm must stop and a warning is given to the user. The flow chart of this step is shown in fig. 6, and the specific steps are:
(41) Tracking validity is judged during tracking according to the normalized correlation coefficients given in step (33). First it is judged whether head tracking is valid, i.e. whether the value $R$ in formula (9) is greater than 0.5; if so, go to (42), otherwise go to (43);
(42) It is judged whether body tracking is valid, i.e. whether the value $d$ in formula (8) is greater than 0.5; if so, body tracking in the current color image and head tracking in the depth image are both valid and the normal tracking process continues; otherwise go to (44);
(43) It is judged whether body tracking is valid, i.e. whether the value $d$ in formula (8) is greater than 0.5; if so, head tracking in the depth image has failed while body tracking in the color image is still valid, so the head position is estimated from the body position according to the spatial position constraint between head and body, head-model matching is performed continuously, and head tracking is recovered once a match succeeds; otherwise go to (45);
(44) In this case head tracking in the depth image is valid while body tracking in the color image has failed; the approximate body position is estimated from the head position according to the spatial position constraint between head and body, matching of the body color histogram is performed continuously, and body tracking is recovered once a match succeeds;
(45) At this point both head tracking and body tracking have failed, indicating that the tracked object is absent from the color image and the depth image due to occlusion or fast motion; in this case the tracking algorithm stops and a warning must be given to the user so that an appropriate response can be made.
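The branching of steps (41)-(45) amounts to a small decision function over the two coefficients; the sketch below (names ours) returns the action to take on each frame.

```python
def tracking_state(R, d, thresh=0.5):
    """Decision logic of steps (41)-(45).

    R: head coefficient, formula (9); d: body coefficient, formula (8).
    """
    head_ok, body_ok = R > thresh, d > thresh
    if head_ok and body_ok:
        return "track_both"              # (42): normal tracking continues
    if body_ok:
        return "recover_head_from_body"  # (43): estimate head from body
    if head_ok:
        return "recover_body_from_head"  # (44): estimate body from head
    return "stop_and_warn"               # (45): both lost, warn the user
```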

Claims (8)

1. A human body detection and tracking method based on multi-modal information perception, characterized in that it comprises the following steps:
(1) calibrating a color camera and a depth camera and performing data filtering, aligning the color image and the depth image through calibration, and then performing filtering respectively;
(2) human body detection based on multi-modal information perception: detecting a body in the color image, detecting a head in the depth image, and fusing according to the spatial proportion information;
(3) human body tracking based on multi-modal information perception: respectively tracking the body and the head in the color image and the depth image with a kernelized correlation filter (KCF) target tracking algorithm, and establishing a tracked-object model to check the tracking result;
(4) perfecting the tracking mechanism with the tracked-object model and the spatial constraint of the head-to-body ratio: if a single tracker fails during tracking, tracking stability is maintained according to the spatial position constraint between head and body.
2. The human body detection and tracking method based on multi-modal information perception according to claim 1, wherein the step (1) comprises the following steps:
(11) a color camera and a depth camera are respectively used for shooting a plurality of checkerboard pictures at different angles and different distances, so that each position of an image can be covered;
(12) detecting and matching corner points in different images, and calculating the internal and external parameter matrices of the color camera from the matched corner-point pairs, wherein the internal parameter matrix of the color camera is:

$$K = \begin{bmatrix} f_x & s & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \end{bmatrix}$$

where $f_x$ and $f_y$ are the focal lengths, $(x_0, y_0)$ are the principal-point coordinates relative to the imaging plane, and $s$ is the axis skew parameter;

the external parameter matrix of the color camera is:

$$[\,R \mid t\,]$$

where $R$ is a 3×3 rotation matrix and $t$ is a 3×1 translation vector, both relative to the world coordinate system;
(13) mapping the depth values onto the color image;
let $P$ be a spatial point, $p_{rgb}$ and $p_{ir}$ its coordinates in the color image and the depth image, $K_{rgb}$ and $K_{ir}$ the internal parameter matrices of the color and depth cameras, and $R_{ir}$ and $t_{ir}$ the external parameters of the depth camera with the color camera as the reference frame; the depth value at point $P$ is mapped to color-image coordinates by:

$$p_{rgb} = K_{rgb} R_{ir} K_{ir}^{-1} p_{ir} + t_{ir}$$
(14) simultaneously shrinking the registered color image and the depth image, removing high-frequency noise in the color image with Gaussian filtering, and removing missing-depth points in the depth image with median filtering.
3. The human body detection and tracking method based on multi-modal information perception according to claim 1, wherein the step (2) comprises the steps of:
(21) scanning the color image with a sliding window, extracting the HOG features within the window, and judging with a trained SVM classifier whether the window contains a human body, thereby obtaining all windows in the color image that may contain a human body;
(22) likewise using a sliding window in the depth image, extracting the Haar features within the window, and classifying with an Adaboost classifier whether the window is a human head, thereby obtaining all windows in the depth image that may contain a human head;
(23) fusion detection according to the spatial proportion information: fusing the results of step (21) and step (22) according to the head-to-body ratio to obtain a detection result fusing the multi-modal information.
4. The human body detection and tracking method based on multi-modal information perception according to claim 3, wherein the step (21) comprises the steps of:
(211) scanning the color image with a multi-scale sliding window: first enlarging and reducing the original color image by a preset ratio to obtain color images at multiple scales, then sliding a fixed-size window over each color image and checking whether the window contains a human body;
(212) extracting the HOG features in the window, wherein the HOG features are extracted by the following steps:
(2121) graying and gray level normalization;
firstly, graying the whole color image, and then normalizing;
(2122) calculating the gradient of each pixel in the color image;
the gradients $G_x, G_y$ of the color image in the x and y directions at $(x, y)$ are:

$$G_x(x,y) = I(x+1,y) - I(x-1,y)$$

$$G_y(x,y) = I(x,y+1) - I(x,y-1)$$

the gradient magnitude is then

$$G(x,y) = \sqrt{G_x(x,y)^2 + G_y(x,y)^2}$$

and its direction is

$$\theta(x,y) = \arctan\left(\frac{G_y(x,y)}{G_x(x,y)}\right);$$
(2123) dividing the pixels into 8×8 cells, counting the gradient information of all pixels within a cell, and representing the result by a histogram over gradient directions;
(2124) dividing the image into 16×16 blocks and contrast-normalizing the gradient histograms within each block;
(2125) setting a detection window of size 64×128 and generating its feature vector by concatenating the feature vectors of all blocks within the window for subsequent classification;
zooming the original color image to form an image pyramid, sliding the detection window over the color image at the current scale, classifying each position with the trained SVM classifier, and judging whether a human body is present at that position; finally applying non-maximum suppression to the obtained results to eliminate multiple detection windows on the same target.
5. The method for detecting and tracking human body based on multi-modal information perception according to claim 3, wherein the step (22) comprises the steps of:
(221) scanning the depth image with a multi-scale sliding window: first enlarging and reducing the original depth image by a preset ratio to obtain depth images at multiple scales, then sliding a fixed-size window over each depth image and checking whether the window contains a human head;
(222) extracting Haar features in the window;
the Haar features are simple rectangular-block features of three types, namely edge features, linear features and diagonal features; the feature value of each rectangular region is the sum of the pixels in the white region minus the sum of the pixels in the black region;
(223) adaboost classification, which is to train a classifier by using an AdaBoost algorithm;
the AdaBoost algorithm boosts a weak learner with sufficient data so as to generate a high-precision strong learner; a weak classifier $h_j(x)$ is given by:

$$h_j(x) = \begin{cases} 1, & p_j f_j(x) < p_j \theta_j \\ 0, & \text{otherwise} \end{cases}$$

where $f_j$ is a feature, $\theta_j$ is a threshold, $p_j$ controls the direction of the inequality, and $x$ is a 24×24 image sub-window; training is run N rounds to obtain N weak classifiers, with a normalized weight, which is a probability distribution, maintained over the samples in the n-th round; for each feature $j$ a classifier $h_j$ using only that single feature is trained, the classifier $h_n$ with the lowest error is selected, and the weights are updated, yielding a strong classifier;
classifying the Haar feature vectors in the detection windows obtained in step (222) with the Adaboost classifier, which gives a probability score for the presence of a human head in each detection window;
(224) synthesizing the human head detection results in the depth image: performing non-maximum suppression according to the probability score of each window to obtain the human head detection result in the depth image.
6. The human body detection and tracking method based on multi-modal information perception according to claim 3, wherein the step (23) comprises the steps of:
(231) acquiring the head and body detection results: obtaining the body frames in the color image from step (21) and the head frames in the depth image from step (22), then traversing the set of body frames and performing the following operations on each body frame;
(232) judging whether the body frame contains a head frame; if not, deleting the body frame and returning to step (231); if so, performing step (233);
(233) judging whether the number of head frames within the body frame is 1; if so, associating the body frame and the head frame to form a multi-modal combined human body detection; if the body frame contains more than one head frame, selecting the optimal head frame according to the head frames' positions and respective confidences, and then associating the optimal head frame with the current body frame.
7. The human body detection and tracking method based on multi-modal information perception according to claim 1, wherein the step (3) comprises the following steps:
(31) establishing models of the tracked object in the color image and the depth image;
in the color image the model is a color histogram, and in the depth image it is a depth template picture; the color histogram is extracted as follows: first convert RGB to the HSV color space, where H is hue, S is saturation and V is brightness; then extract the H channel according to the following formula and count the distribution of H values within the window to form the color histogram; the depth template picture is extracted by cropping the head bounding box before tracking starts and scaling it to a standard size, which serves as the template picture for head tracking in the depth image;
R′=R/255,G′=G/255,B′=B/255
Cmax=max(R′,G′,B′),Cmin=min(R′,G′,B′)
Δ=Cmax-Cmin
$$H = \begin{cases} 0^\circ, & \Delta = 0 \\ 60^\circ \times \left(\dfrac{G'-B'}{\Delta} \bmod 6\right), & C_{max} = R' \\ 60^\circ \times \left(\dfrac{B'-R'}{\Delta} + 2\right), & C_{max} = G' \\ 60^\circ \times \left(\dfrac{R'-G'}{\Delta} + 4\right), & C_{max} = B' \end{cases}$$
(32) tracking the body in the color image and the head in the depth image simultaneously with the kernelized correlation filter (KCF) algorithm, as follows: pixel values around the tracked object are extracted with a circulant matrix as training samples, a discriminant function is trained by ridge regression, and the samples are transformed to a kernel space by a kernel transformation to handle samples that are not linearly separable;
(33) matching and updating the object model during tracking; the matching method computes the normalized correlation coefficient between the tracked object and the initial model:

$$d(H_1,H_2) = \frac{\sum_i H_1(i)\,H_2(i)}{\sqrt{\sum_i H_1(i)^2 \,\sum_i H_2(i)^2}}$$

$$R(T,I) = \frac{\sum_{x,y} T(x,y)\,I(x,y)}{\sqrt{\sum_{x,y} T(x,y)^2 \,\sum_{x,y} I(x,y)^2}}$$

where $d$ is the normalized correlation coefficient of the color histograms $H_1$ and $H_2$, and $R$ is the normalized correlation coefficient of the depth template pictures $T$ and $I$; both values lie in the range [0,1], where a larger value indicates a better match and 0 the worst match; if the matching value is greater than 0.9, i.e. the algorithm's confidence in the tracking result is high, the model is updated by weighting: the initial model has weight $1-w$ and the current model of the tracked object weight $w$, where $w = 0.5 \times d$ or $w = 0.5 \times R$.
8. The human body detection and tracking method based on multi-modal information perception according to claim 1, wherein the step (4) comprises the steps of:
(41) judging tracking validity during tracking according to the normalized correlation coefficients given in step (3): first judging whether head tracking is valid, i.e. whether the normalized correlation coefficient $R$ of the depth template pictures $T$ and $I$ is greater than 0.5; if so, going to step (42), otherwise going to step (43);
(42) judging whether body tracking is valid, i.e. whether the normalized correlation coefficient $d$ of the color histograms $H_1$ and $H_2$ is greater than 0.5; if so, the body tracking result in the current color image and the head tracking result in the depth image are both valid and the normal tracking process continues; otherwise going to step (44);
(43) judging whether body tracking is valid, i.e. whether the normalized correlation coefficient $d$ of the color histograms $H_1$ and $H_2$ is greater than 0.5; if so, head tracking in the depth image has failed while body tracking in the color image is still valid, so the head position is estimated according to the spatial position constraint between head and body, head-model matching is performed continuously, and head tracking is recovered once a match succeeds; otherwise going to step (45);
(44) in this case head tracking in the depth image is valid while body tracking in the color image has failed; the approximate body position is estimated from the head position according to the spatial position constraint between head and body, matching of the body color histogram is performed continuously, and body tracking is recovered once a match succeeds;
(45) at this point both head tracking and body tracking have failed, indicating that the tracked object is absent from the color image and the depth image due to occlusion or fast motion; in this case the tracking algorithm stops and a warning must be given to the user so that an appropriate response can be made.
CN201911146615.4A 2019-11-21 2019-11-21 Human body detection and tracking method based on multi-mode information perception Active CN111144207B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911146615.4A CN111144207B (en) 2019-11-21 2019-11-21 Human body detection and tracking method based on multi-mode information perception

Publications (2)

Publication Number Publication Date
CN111144207A (en) 2020-05-12
CN111144207B CN111144207B (en) 2023-07-07

Family

ID=70517199

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911146615.4A Active CN111144207B (en) 2019-11-21 2019-11-21 Human body detection and tracking method based on multi-mode information perception

Country Status (1)

Country Link
CN (1) CN111144207B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102136139A (en) * 2010-01-22 2011-07-27 Samsung Electronics Co., Ltd. Target attitude analyzing device and target attitude analyzing method thereof
CN102800126A (en) * 2012-07-04 2012-11-28 Zhejiang University Method for recovering real-time three-dimensional body posture based on multimodal fusion
US20160180195A1 (en) * 2013-09-06 2016-06-23 Toyota Jidosha Kabushiki Kaisha Augmenting Layer-Based Object Detection With Deep Convolutional Neural Networks
CN106503615A (en) * 2016-09-20 2017-03-15 Beijing University of Technology Indoor human body detection, tracking and identification system based on multiple sensors
CN107093182A (en) * 2017-03-23 2017-08-25 Southeast University Human body height estimation method based on feature inflection points
CN107197384A (en) * 2017-05-27 2017-09-22 Beijing Guangnian Wuxian Technology Co., Ltd. Multi-modal interaction method and system for a virtual robot applied to a live video streaming platform

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111667518A (en) * 2020-06-24 2020-09-15 Beijing Baidu Netcom Science and Technology Co., Ltd. Display method and device of face image, electronic equipment and storage medium
CN111667518B (en) * 2020-06-24 2023-10-31 Beijing Baidu Netcom Science and Technology Co., Ltd. Face image display method and device, electronic equipment and storage medium
CN111968087A (en) * 2020-08-13 2020-11-20 Agricultural Information Institute, Chinese Academy of Agricultural Sciences Plant disease area detection method
CN111968087B (en) * 2020-08-13 2023-11-07 Agricultural Information Institute, Chinese Academy of Agricultural Sciences Plant disease area detection method
CN112150448A (en) * 2020-09-28 2020-12-29 Hangzhou Hikvision Digital Technology Co., Ltd. Image processing method, device and equipment and storage medium
CN112150448B (en) * 2020-09-28 2023-09-26 Hangzhou Hikvision Digital Technology Co., Ltd. Image processing method, device and equipment and storage medium
TWI798663B (en) * 2021-03-22 2023-04-11 伍碩科技股份有限公司 Depth image compensation method and system
WO2022228019A1 (en) * 2021-04-25 2022-11-03 UBTECH Robotics Corp., Ltd. Moving target following method, robot, and computer-readable storage medium
CN113393401A (en) * 2021-06-24 2021-09-14 ShanghaiTech University Object detection hardware accelerators, systems, methods, apparatus, and media
CN113393401B (en) * 2021-06-24 2023-09-05 ShanghaiTech University Object detection hardware accelerator, system, method, apparatus and medium
CN116580828A (en) * 2023-05-16 2023-08-11 深圳弗瑞奇科技有限公司 Visual monitoring method for full-automatic induction identification of cat health
CN116580828B (en) * 2023-05-16 2024-04-02 深圳弗瑞奇科技有限公司 Visual monitoring method for full-automatic induction identification of cat health

Similar Documents

Publication Publication Date Title
CN111144207B (en) Human body detection and tracking method based on multi-mode information perception
US11699293B2 (en) Neural network image processing apparatus
JP6305171B2 (en) How to detect objects in a scene
TWI383325B (en) Face expressions identification
Chang et al. Tracking Multiple People Under Occlusion Using Multiple Cameras.
CN111967498A (en) Night target detection and tracking method based on millimeter wave radar and vision fusion
CN108416291B (en) Face detection and recognition method, device and system
CN111965636A (en) Night target detection method based on millimeter wave radar and vision fusion
KR20110064117A (en) Method for determining frontal pose of face
CN113850865A (en) Human body posture positioning method and system based on binocular vision and storage medium
CN112784712B (en) Missing child early warning implementation method and device based on real-time monitoring
CN109886195B (en) Skin identification method based on near-infrared monochromatic gray-scale image of depth camera
CN114399675A (en) Target detection method and device based on machine vision and laser radar fusion
CN110909561A (en) Eye state detection system and operation method thereof
CN108021926A (en) A kind of vehicle scratch detection method and system based on panoramic looking-around system
Fang et al. Laser stripe image denoising using convolutional autoencoder
CN115375991A (en) Strong/weak illumination and fog environment self-adaptive target detection method
CN112749664A (en) Gesture recognition method, device, equipment, system and storage medium
CN109410272B (en) Transformer nut recognition and positioning device and method
CN113723432B (en) Intelligent identification and positioning tracking method and system based on deep learning
CN108694348B (en) Tracking registration method and device based on natural features
Hadi et al. Fusion of thermal and depth images for occlusion handling for human detection from mobile robot
EP3469517A1 (en) Curvature-based face detector
Zarkasi et al. Implementation color filtering and Harris corner method on pattern recognition system
CN111950549A (en) Sea surface obstacle detection method based on fusion of sea antennas and visual saliency

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant