CN109166136B - Target object following method of mobile robot based on monocular vision sensor - Google Patents

Target object following method of mobile robot based on monocular vision sensor

Info

Publication number
CN109166136B
CN109166136B
Authority
CN
China
Prior art keywords
target object
target
matrix
image
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810980715.6A
Other languages
Chinese (zh)
Other versions
CN109166136A (en)
Inventor
刘希龙
张茗奕
庞磊
曹志强
徐德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201810980715.6A priority Critical patent/CN109166136B/en
Publication of CN109166136A publication Critical patent/CN109166136A/en
Application granted granted Critical
Publication of CN109166136B publication Critical patent/CN109166136B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/223Analysis of motion using block-matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of computer vision recognition, and particularly provides a target object following method of a mobile robot based on a monocular vision sensor, aiming to solve the problems that existing mobile robots have poor robustness when following a target object and can hardly guarantee the following quality for target objects such as pedestrians. To this end, the method provided by the invention comprises the steps that the mobile robot acquires an image of the target object according to the target area of the target object; a feature matrix of the image of the target object is obtained; the center point of the target object is determined by using a target tracking algorithm and the feature matrix; and the area where the target object is located is determined according to the center point and the outline frame of the target object and taken as a first area, it is judged whether the proportion of the area of the first area in the image of the target object is greater than a set threshold, and a corresponding operation is executed according to the judgment result. Based on these steps, the method provided by the invention has good real-time performance and robustness, and can realize effective following of the target object.

Description

Target object following method of mobile robot based on monocular vision sensor
Technical Field
The invention relates to the field of computer vision, in particular to the field of mobile robots, and particularly relates to a target object following method of a mobile robot based on a monocular vision sensor.
Background
Intelligent mobile robots that follow a moving target are widely applied in fields such as home service, assistance for the elderly and disabled, scene monitoring, and intelligent vehicles. The following of a target object by a mobile robot involves the fields of computer vision, motion control, pattern recognition and the like, and has broad application prospects.
Unlike TLD (Tracking-Learning-Detection), Struck, and other tracking methods, the following of a target object by a mobile robot requires high robustness. Generally, a mobile robot senses its environment using a laser sensor or a vision sensor. Although the laser sensor can obtain accurate distance information, its robustness in identifying and following a target object is insufficient, and it is difficult to solve the problem of retrieving the target after it is lost. The vision sensor can provide richer environmental information; vision sensors include binocular vision sensors and monocular vision sensors. Compared with target object following based on a binocular vision sensor, a target object following algorithm based on a monocular vision sensor has a shorter processing time, which is beneficial to guaranteeing real-time performance. At present, methods such as color segmentation and particle filtering are mostly used for target object following with a monocular vision sensor, but due to interferences such as a complex background and the target object entering and exiting the field of view, the robustness of these methods is poor, and it is difficult to realize effective following of target objects such as pedestrians.
Disclosure of Invention
In order to solve the above problems in the prior art, namely that the existing mobile robot is poor in robustness when following a target object and can hardly follow target objects such as pedestrians, the invention provides a target object following method of a mobile robot based on a monocular vision sensor, which improves robustness against a complex background and enables the mobile robot to follow target objects such as pedestrians in different environments. The target object following method of the mobile robot based on the monocular vision sensor comprises the following steps:
step 101: the mobile robot acquires an image of the target object according to the target area of the target object;
step 102: preprocessing the image of the target object, and obtaining a feature matrix of the image of the target object by utilizing feature transformation;
step 103: determining a response matrix of the characteristic matrix by using a target tracking algorithm, and recording the maximum values of all elements in the response matrix; taking the position of the maximum value corresponding to the element in the response matrix as a displacement coordinate of the target object relative to a pre-acquired initial target area, and determining the position of the target object as a central point of the target object by combining the central position of the initial target area;
step 104: determining an area where the target object is located according to the central point and the outline frame of the target object, and taking the area as a first area; judging whether the proportion of the area of the first region in the image of the target object is larger than a set threshold value, if so, executing a step 105, and if not, executing a step 106;
step 105: determining that the first region is a target region of the target object;
step 106: and constructing a detection area by taking the position of the central point as a center, detecting the target object again in the detection area by using a target detector, taking the area where the detected target object is located as a second area, and taking the second area as the target area of the target object if the proportion of the area of the second area to the area of the image in the image of the target object is greater than the set threshold value.
In a preferred embodiment of the above method, the step of "preprocessing the image of the target object and obtaining the feature matrix of the image of the target object by using feature transformation" includes:
performing convolution calculation on the image in the initial target area of the image of the target object to obtain a middle image block of the image of the target object;
performing feature transformation on the intermediate image block to obtain a feature matrix of the image of the target object;
the feature transformation comprises FHOG feature transformation and RGB feature transformation, and the feature matrix is a multi-channel feature matrix.
In a preferred embodiment of the foregoing method, when the feature transformation is FHOG feature transformation, the step of performing feature transformation on the intermediate image block to obtain a feature matrix of the image of the target object includes:
and performing characteristic transformation on the intermediate image block according to a method shown as the following formula:
calculating to obtain FHGG characteristic matrix F of the intermediate image blocki FHOGThe rows and columns of the FHOG feature matrix are determined by the following formula:
Figure BDA0001778452640000031
wherein Hi FHOGIs the row, W, of the FHOG feature matrix at time ii FHOGIs the column of the FHOG feature matrix at time i, Hi tar’Is the length of the initial target region at time i, Wi tar’Is the width of the initial target area at time i, cell is the parameter of the descriptor of the FHOG feature matrix, ncellRefers to the size of the cell.
In a preferred embodiment of the above method, when the feature transformation is RGB feature transformation, the step of performing feature transformation on the intermediate image block to obtain a feature matrix of the image of the target object includes:
performing RGB feature transformation on the intermediate image block to obtain a three-channel feature matrix; performing Gaussian down-sampling on the three-channel feature matrix to obtain an RGB feature matrix, wherein the number of rows of the RGB feature matrix is H_i^FHOG and the number of columns is W_i^FHOG;
And splicing the FHOG characteristic matrix and the RGB characteristic matrix by a channel splicing method to obtain a multi-channel characteristic matrix of the image of the target object.
In a preferred technical solution of the above method, the step of determining the response matrix of the feature matrix by using a target tracking algorithm includes:
generating an initial target characteristic template and an initial parameter matrix of the characteristic matrix in a Fourier domain by using a KCF tracker;
and updating the target characteristic template and the parameter matrix of the target tracking algorithm by using the initial target characteristic template and the initial parameter matrix through the following formulas:
M_i = (1 − α) · M_{i−1} + α · M_i'
T_i = (1 − β) · T_{i−1} + β · T_i'
wherein M_{i−1} is the target feature template at time i−1, T_{i−1} is the parameter matrix at time i−1, M_i is the target feature template at time i, T_i is the parameter matrix at time i, M_i' is the initial target feature template at time i, T_i' is the initial parameter matrix at time i, and α and β are scale factors.
According to the updated target feature template and the parameter matrix, calculating the response matrix by the following formula:
P_i = Γ^{−1}(M_i ⊙ T_i)
wherein ⊙ is the matrix dot-product (element-wise multiplication) operator, P_i is the response matrix at time i, and Γ^{−1} is the inverse Fourier transform function.
In a preferred technical solution of the above method, the step of determining the position of the target object as a center point of the target object includes:
recording the maximum value of all elements in the response matrix;
taking the positions of the elements in the response matrix corresponding to the maximum values of all the elements as displacement coordinates of the target object relative to the previous frame image in the current frame image;
obtaining the position of the target object in the current image frame according to the central position of the initial target area, and determining the position of the central point of the target object in the current image frame by the following formula:
(u_i^tar, v_i^tar) = (u_i^tar', v_i^tar') + n_cell × (u_i, v_i)
wherein (u_i, v_i) represents the displacement coordinates of the target object in the image at time i (the current frame) relative to the previous frame image, (u_i^tar', v_i^tar') represents the center position of the initial target region at time i, (u_i^tar, v_i^tar) represents the position of the target object in the current image frame (i.e., at time i), and n_cell refers to the size of the FHOG feature parameter cell.
In a preferred embodiment of the above method, the step of determining the area where the target object is located according to the central point and the outline frame of the target object includes:
obtaining the optimal scale of a target object in the image by using a method of adjacent scale sampling and least square fitting;
and determining the length and the width of the target region according to the optimal scale and the length and the width of the initial target region after determining scale transformation, thereby determining the target region.
In a preferred embodiment of the above method, the step of "obtaining an optimal scale of the target object in the image by a method of adjacent scale sampling and least square fitting" includes:
carrying out scale transformation on an intermediate image block of the image of the target object to obtain a first image block and a second image block;
respectively obtaining a first multi-channel feature matrix and a second multi-channel feature matrix corresponding to the first image block and the second image block by utilizing the feature transformation;
respectively calculating a first response matrix and a second response matrix of the first multichannel characteristic matrix and the second multichannel characteristic matrix by using the target tracking algorithm, and respectively recording the maximum values of elements in the response matrices;
and performing least square fitting according to the first response matrix, the second response matrix and the maximum value of the elements in the response matrix, and determining the optimal scale of the target object in the image of the target object.
In a preferred embodiment of the above method, the step of "constructing a detection area centered on the position of the central point, detecting the target object again by using a target detector in the detection area, and setting an area where the target object is located as a second area" includes:
constructing a detection area by taking the position of the central point as a center;
carrying out target object detection on the image blocks in the detection area through a pre-trained SVM model to determine a target frame set;
calculating a multi-channel feature matrix of the image block corresponding to the target frame by using the feature extraction method;
calculating a response matrix of each multi-channel feature matrix according to a KCF tracker and the multi-channel feature matrix, and recording the maximum value of elements in the response matrix;
taking a target frame which contains the central point and contains a maximum element value as an effective target frame, marking the coordinate of the central point of the effective target frame, and taking the corresponding position of the maximum value in the response matrix as the displacement coordinate of the target object relative to the central point of the target frame;
and according to the position of the central point of the target frame, taking the position of the target object as the central position of the effective target object, and constructing a second target area by taking the central position of the target object as the center.
In a preferred embodiment of the above method, the method further comprises:
when no effective target frame exists in the target frame set, controlling the mobile robot to rotate towards the direction that the target disappears in the visual field according to the following method:
V_i^left = k_p1 × (u_last^tar − u_0^tar)
V_i^right = −k_p1 × (u_last^tar − u_0^tar)
wherein k_p1 denotes the proportionality coefficient, V_i^left denotes the left wheel speed of the mobile robot at time i, V_i^right denotes the right wheel speed of the mobile robot at time i, (u_0^tar, v_0^tar) denotes the preset position coordinates of the target object, and (u_last^tar, v_last^tar) denotes the actual position coordinates of the target object at the moment before it leaves the field of view of the robot.
The target object following method of the mobile robot based on the monocular vision sensor of the invention can achieve effective following of the target object. The invention has good real-time performance, can still realize following of target objects such as pedestrians under interferences such as illumination change and target occlusion, and has good robustness.
Drawings
Fig. 1 is a flowchart of an embodiment of a target object following method of a mobile robot based on a monocular vision sensor according to the present application.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 is a flowchart of an embodiment of a target object following method of a mobile robot based on a monocular vision sensor according to the present application. The monocular vision sensor is arranged on the mobile robot, and the direction of the optical axis of the monocular vision sensor is consistent with the positive direction of the mobile robot. The target object following method of the mobile robot based on the monocular vision sensor comprises the following steps:
step 101, the mobile robot acquires an image of the target object according to the target area of the target object.
In the present embodiment, the monocular vision sensor acquires the image I_i^scr at the current time (i.e., the i-th time), where i = 1, 2, 3, …. The target region Φ_{i−1}^tar determined at time i−1 is used as the initial target region Φ_i^tar' at time i: (u_i^tar', v_i^tar') = (u_{i−1}^tar, v_{i−1}^tar), H_i^tar' = H_{i−1}^tar, W_i^tar' = W_{i−1}^tar, wherein the target region Φ_{i−1}^tar is a rectangular area, (u_{i−1}^tar, v_{i−1}^tar) is the center coordinate of Φ_{i−1}^tar, H_{i−1}^tar and W_{i−1}^tar are respectively the length and width of Φ_{i−1}^tar, (u_i^tar', v_i^tar') is the center position of the initial target region Φ_i^tar', and H_i^tar' and W_i^tar' are respectively its length and width. In particular, before starting to follow the target object, the mobile robot collects an image containing the target object through the monocular vision sensor, and the target region Φ_0^tar is manually selected according to the position of the target object in the image; Φ_0^tar is used as the initial target region Φ_1^tar' at the initial time: (u_1^tar', v_1^tar') = (u_0^tar, v_0^tar), H_1^tar' = H_0^tar, W_1^tar' = W_0^tar, wherein (u_0^tar, v_0^tar) is the center position of Φ_0^tar, H_0^tar and W_0^tar are respectively the length and width of Φ_0^tar, (u_1^tar', v_1^tar') is the center position of the initial target region Φ_1^tar', and H_1^tar' and W_1^tar' are respectively its length and width.
The initial target region at the initial time is manually selected, and the initial target region at any other time (i.e., the i-th time, i = 2, 3, 4, …) is the target region of the target object obtained at the previous time (i.e., the (i−1)-th time).
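A minimal sketch of this frame-to-frame handover in Python (the Region container and the function names are illustrative, not taken from the patent):

```python
from dataclasses import dataclass

@dataclass
class Region:
    u: float  # center column in the image
    v: float  # center row in the image
    H: float  # length of the rectangular region
    W: float  # width of the rectangular region

def initial_region(prev_target: Region) -> Region:
    # Phi_i^tar' := Phi_{i-1}^tar: the target region found at time i-1
    # seeds the tracker at time i; at i = 1 it is the manually drawn box.
    return Region(prev_target.u, prev_target.v, prev_target.H, prev_target.W)
```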
And 102, preprocessing the image of the target object, and obtaining a feature matrix of the image of the target object by using feature transformation.
In the present embodiment, the image of the target object acquired in step 101 is subjected to preprocessing. The preprocessing can be image smoothing processing, mean processing, geometric transformation, normalization processing and the like. And performing characteristic transformation on the preprocessed image to obtain a characteristic matrix of the image after the characteristic transformation.
In some optional implementations of this embodiment, the preprocessing of the image may be a convolution calculation performed on the image in the initial target region, yielding an intermediate image block B_i^blur of the image. Feature transformation is performed on the intermediate image block to obtain a multi-channel feature matrix F_i^tar. The feature transformation may be FHOG feature transformation and RGB feature transformation. The FHOG (Felzenszwalb Histogram of Oriented Gradients) feature is a feature descriptor used for object detection in computer vision and image processing. The RGB feature is a feature descriptor based on the three color channels red (R), green (G), and blue (B).
In some optional implementations of this embodiment, FHOG feature transformation is performed on the intermediate image block to extract the FHOG feature matrix F_i^FHOG of the image, wherein the rows and columns of the FHOG feature matrix are determined by the following formula:
H_i^FHOG = floor(H_i^tar' / n_cell), W_i^FHOG = floor(W_i^tar' / n_cell)
wherein H_i^FHOG is the number of rows of the FHOG feature matrix, W_i^FHOG is the number of columns of the FHOG feature matrix, H_i^tar' is the length of the initial target region, W_i^tar' is the width of the initial target region, cell is the descriptor parameter of the FHOG feature, and n_cell refers to the size of the cell.
In some preferred embodiments, the size n_cell of the feature descriptor parameter is 4. The number of orientation bins of the histogram in the FHOG feature is set to 9, the cell size is 4 × 4 pixels, and 3 × 3 cells constitute one block. The number of channels of the feature vector is 31, comprising 9 direction-insensitive features, 18 direction-sensitive features, and 4 texture features.
In some optional implementations of this embodiment, the feature matrix of the image may also be the feature matrix F_i^RGB obtained by RGB feature transformation, specifically: RGB feature transformation is performed on the intermediate image block, extracting a three-channel feature matrix whose rows and columns are respectively H_i^tar' and W_i^tar'; Gaussian down-sampling is performed on the three-channel feature matrix to obtain an intermediate feature matrix whose rows and columns are respectively H_i^FHOG and W_i^FHOG; the three-channel RGB feature matrix and the FHOG feature matrix are then spliced into the multi-channel feature matrix F_i^tar in a channel-splicing manner, as sketched below.
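A sketch of this feature construction, assuming OpenCV and numpy; cell_hog is a crude per-cell orientation histogram standing in for the 31-channel FHOG extractor (OpenCV has no built-in FHOG), so it only illustrates the data layout and channel splice, not the patent's exact features:

```python
import cv2
import numpy as np

def cell_hog(gray, n_cell=4, n_bins=9):
    # Simplified stand-in for FHOG: accumulate gradient magnitude into
    # per-cell orientation histograms (real FHOG adds normalization,
    # direction-insensitive and texture channels, 31 in total).
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)            # ang in [0, 2*pi)
    h, w = gray.shape[0] // n_cell, gray.shape[1] // n_cell
    feat = np.zeros((h, w, n_bins), np.float32)
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    for r in range(h * n_cell):
        for c in range(w * n_cell):
            feat[r // n_cell, c // n_cell, bins[r, c]] += mag[r, c]
    return feat

def feature_matrix(image_bgr, n_cell=4):
    # Step 102 sketch: blur (intermediate block B_i^blur), HOG-like
    # features, RGB features down-sampled to the cell grid, channel splice.
    blurred = cv2.GaussianBlur(image_bgr, (3, 3), 0)
    gray = cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)
    f_hog = cell_hog(gray, n_cell)
    h, w = f_hog.shape[:2]
    f_rgb = cv2.resize(blurred, (w, h), interpolation=cv2.INTER_AREA)
    f_rgb = f_rgb.astype(np.float32) / 255.0
    return np.concatenate([f_hog, f_rgb], axis=2)  # multi-channel F_i^tar
```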
103, determining a response matrix of the characteristic matrix by using a target tracking algorithm, and recording the maximum values of all elements in the response matrix; and determining the position of the target object as the central point of the target object by taking the position of the maximum value corresponding to the element in the response matrix as the displacement coordinate of the target object relative to the initial target area and combining the central position of the initial target area.
In this embodiment, the feature matrix may be imported into a preset tracking algorithm model, so as to determine the position of the target object. The tracking algorithm may be a KCF tracking algorithm. Specifically, the feature matrix is imported into the KCF tracking algorithm model, and a response matrix of the feature matrix is determined; recording the maximum value of all elements in the response matrix, taking the position of the maximum value corresponding to the elements in the response matrix as the displacement coordinate of the target object relative to the target object in the initial target area, and combining the central position of the initial target area to determine the position of the target object as the current central point of the target object.
In some optional implementations of this embodiment, determining the response matrix of the feature matrix by using the target tracking algorithm includes: according to the multi-channel feature matrix F_i^tar, generating an initial target feature template M_i' and an initial parameter matrix T_i' of the multi-channel feature matrix in the Fourier domain by using the KCF tracking algorithm; and updating the target feature template M_i and the parameter matrix T_i in the KCF tracking algorithm by using the initial target feature template and the initial parameter matrix through the following formulas:
M_i = (1 − α) · M_{i−1} + α · M_i'
T_i = (1 − β) · T_{i−1} + β · T_i'
wherein M_{i−1} is the target feature template at time i−1, T_{i−1} is the parameter matrix at time i−1, M_i is the target feature template at time i, T_i is the parameter matrix at time i, M_i' is the initial target feature template at time i, T_i' is the initial parameter matrix at time i, and α and β are scale factors.
According to the updated target feature template and the parameter matrix, calculating the response matrix by the following formula:
P_i = Γ^{−1}(M_i ⊙ T_i)
wherein ⊙ is the matrix dot-product (element-wise multiplication) operator, P_i is the response matrix, and Γ^{−1} is the inverse Fourier transform function.
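A minimal numpy sketch of the update and response computation, assuming M and T are 2-D Fourier-domain arrays (per-channel summation is omitted) and using illustrative default values for the scale factors α and β, which the patent does not fix:

```python
import numpy as np

def update_templates(M_prev, T_prev, M_init, T_init, alpha=0.02, beta=0.02):
    # Linear-interpolation update of the Fourier-domain target feature
    # template and parameter matrix; alpha and beta are the scale factors.
    M = (1 - alpha) * M_prev + alpha * M_init
    T = (1 - beta) * T_prev + beta * T_init
    return M, T

def response_matrix(M, T):
    # P_i = Gamma^{-1}(M_i (.) T_i): element-wise product in the Fourier
    # domain followed by an inverse FFT; the real part is the response map.
    return np.real(np.fft.ifft2(M * T))
```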
As an example, the maximum value o_i of all elements in the above response matrix is recorded; the position of the element corresponding to o_i in the response matrix is taken as the displacement coordinate (u_i, v_i) of the target object in the current frame image relative to the previous frame image, and combined with the center position (u_i^tar', v_i^tar') of the initial target region, the position (u_i^tar, v_i^tar) of the target object in the current image frame is obtained. That is, the position of the center point of the target object in the current image frame is determined as:
(u_i^tar, v_i^tar) = (u_i^tar', v_i^tar') + n_cell × (u_i, v_i)
wherein n_cell refers to the size of the FHOG feature parameter cell.
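A sketch of this localization step; the wrap-around handling for displacements past half the window reflects the circular correlation underlying KCF and is an assumption of this sketch:

```python
import numpy as np

def locate_center(P, init_center, n_cell=4):
    # Displacement (u_i, v_i): grid position of the maximum response
    # element; shifts beyond half the window wrap around because the
    # KCF response comes from a circular correlation.
    h, w = P.shape
    row, col = np.unravel_index(np.argmax(P), P.shape)
    du = col if col <= w // 2 else col - w
    dv = row if row <= h // 2 else row - h
    u0, v0 = init_center
    # (u_i^tar, v_i^tar) = (u_i^tar', v_i^tar') + n_cell * (u_i, v_i)
    return u0 + n_cell * du, v0 + n_cell * dv
```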
And 104, determining a region where the target object is located according to the central point and the outline frame of the target object to serve as a first region, judging whether the proportion of the area of the first region in the image of the target object is greater than a set threshold value, if so, executing step 105, and if not, executing step 106.
And 105, determining the first area as a target area of the target object.
In this embodiment, a graphic frame is constructed from the position of the center point of the target object determined in step 103 and the shape of the tracked target object, and the region where the target object is located is determined as the first region; it is then judged whether the ratio of the portion of the first region lying within the image to the total area of the first region is greater than a set threshold. As an example, when the target object is a pedestrian, the outline frame of the target object may be a rectangular frame constructed according to the outline of the human body, and the rectangular region constructed from the center position and the rectangular frame is taken as the first region. The proportional relation can be set according to actual conditions, but should be at least greater than one half. If the ratio is greater than the set threshold, the target object can be considered to be within the image, and the first region is therefore determined to be the target region of the target object.
In some optional implementation manners of this embodiment, the determining a target area of the target object according to the central point and the outline frame of the target object includes: obtaining the optimal scale of a target object in the image by using a method of adjacent scale sampling and least square fitting; and determining the length and width of the target area according to the optimal scale and the length and width of the initial target area after determining scale transformation by taking the length and width of the initial target area as the initial length and width, thereby determining the target area.
The method for obtaining the optimal scale of the target object in the image through adjacent scale sampling and least square fitting comprises the following steps: carrying out scale transformation on the intermediate image block to obtain a first image block and a second image block, and obtaining a first multi-channel feature matrix and a second multi-channel feature matrix corresponding to the first image block and the second image block by utilizing feature transformation; respectively calculating a first response matrix and a second response matrix of the first multichannel characteristic matrix and the second multichannel characteristic matrix by utilizing the KCF tracking algorithm, and respectively recording the maximum values of elements in the response matrices; and performing least square fitting according to the first response matrix, the second response matrix and the maximum value of the elements in the response matrix to determine the optimal scale of the target object in the upper image.
As an example, first, the intermediate image block B_i^blur is scaled by factors of 0.95 and 1.05 respectively, obtaining image blocks B_i−^blur and B_i+^blur. Secondly, FHOG feature transformation and RGB feature transformation are performed on the transformed image blocks to obtain the multi-channel feature matrices F_i−^tar, F_i^tar, F_i+^tar corresponding to the scales 0.95, 1, and 1.05. Thirdly, the response matrices P_i−, P_i, and P_i+ corresponding to the multi-channel feature matrices F_i−^tar, F_i^tar, and F_i+^tar are obtained by using the KCF tracking algorithm, and the maximum element values o_i−, o_i, o_i+ of the respective response matrices are recorded. Then, a least-squares fit of the function p_i = a_i·s_i² + b_i·s_i + c_i is performed over the three points (0.95, o_i−), (1.00, o_i), (1.05, o_i+), where s_i is the scale, p_i is the maximum element value of the response matrix at scale s_i, and a_i, b_i, c_i are respectively the quadratic coefficient, linear coefficient, and constant term of the function. Finally, the fit yields estimates A_i, B_i, C_i of the parameters a_i, b_i, c_i, and S_i = −B_i/(2A_i) is recorded as the optimal scale of the target object in the current image frame.
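A sketch of this fit; with three points and three unknowns the least-squares fit reduces to an exact linear solve, and the degenerate-case fallback is an assumption of this sketch:

```python
import numpy as np

def optimal_scale(o_minus, o_mid, o_plus, scales=(0.95, 1.00, 1.05)):
    # Fit p = a*s^2 + b*s + c through the three (scale, peak response)
    # samples and return the vertex S_i = -B_i / (2 A_i).
    A = np.array([[s * s, s, 1.0] for s in scales])
    y = np.array([o_minus, o_mid, o_plus], dtype=float)
    a, b, _ = np.linalg.solve(A, y)
    if a >= 0:               # parabola has no interior maximum
        return 1.0           # fall back to the current scale
    return -b / (2.0 * a)
```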
Combining the optimal scale with the initial target region, the length and width of the target region are obtained as H_i^tar = S_i × H_i^tar' and W_i^tar = S_i × W_i^tar' respectively; the center position of the target region is the center position of the target object determined in step 103 above.
And 106, constructing a detection area by taking the central position as the center, detecting the target object again in the detection area, taking the area where the detected target object is located as a second area, and taking the second area as the target area of the target object if the ratio of the area of the second area to the image area in the image is greater than the set threshold value.
In this embodiment, if the ratio of the portion of the first region lying within the image to the total area of the first region is smaller than the set threshold, that is, part of the first region lies outside the image and the out-of-image portion is too large, it is considered that the target region determined from the image cannot correctly track the target object. In this case, a detection area is constructed centered on the center position, an initial target region is constructed within the detection area by using a target object detection method, the data image of the target object is acquired again according to the constructed initial target region, and the newly determined region where the target object is located is taken as the second region. If the ratio of the portion of the second region lying within the image to the total area of the second region is greater than the set threshold, the second region can be used as the target region of the target object.
In some optional implementation manners of this embodiment, constructing a detection area with the central point as a center, and determining a second target area in the detection area includes:
constructing a detection area by taking the position of the central point as a center, wherein the area of the detection area is larger than that of the initial target area; detecting a target object by using an SVM (support vector machine) model for the image block in the detection area, and determining a target frame set; calculating a multi-channel feature matrix of the image block corresponding to each target frame by using the feature extraction method; sending the multi-channel feature matrix into the KCF tracker, calculating to obtain a response matrix of each multi-channel feature matrix, and recording the maximum value of elements in the response matrix; determining a target frame containing the central position and the maximum element value as an effective target frame; marking the coordinate of the center point of the effective target frame, taking the position of the maximum value corresponding to the response matrix as the displacement coordinate of the target object relative to the center point of the target frame, and determining the position of the target object as the center position of the effective target object by combining the position of the center point of the target frame; and constructing a second target area by taking the target position as the center.
In some optional implementation manners of this embodiment, the method further includes: when no effective target frame exists in the target frame set, the robot is controlled to rotate towards the direction that the target disappears in the visual field, so that the monocular vision sensor arranged on the robot can acquire the image containing the target object again as soon as possible:
V_i^left = k_p1 × (u_last^tar − u_0^tar)
V_i^right = −k_p1 × (u_last^tar − u_0^tar)
wherein k_p1 is the proportionality coefficient, V_i^left is the left wheel speed, V_i^right is the right wheel speed, (u_0^tar, v_0^tar) are the preset position coordinates of the target object, and (u_last^tar, v_last^tar) are the position coordinates of the target object at the moment immediately before it left the field of view.
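A sketch of this search rotation; the sign convention (which wheel speeds up for a given image offset) matches the reconstruction above and is an assumption of this sketch:

```python
def search_rotation(u_last, u_preset, k_p1=0.1):
    # Rotate in place toward the image side where the target was last
    # seen, so the sensor re-acquires the target as soon as possible.
    turn = k_p1 * (u_last - u_preset)
    return turn, -turn  # (V_left, V_right)
```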
As an example, when the ratio is smaller than the set threshold, the image of the target object needs to be re-acquired until the area where the target object is located can be accurately located. The method specifically comprises the following steps:
Taking the center position (u_last^tar, v_last^tar) of the last detected target region as the center, a square area with side length L is constructed, and the part of the square area lying within the current image is recorded as the region to be searched Φ_s.
Pedestrian detection is performed on the image blocks in the region to be searched Φ_s, wherein potential pedestrians can be detected by using the SVM module in the OpenCV 3.0 open-source library, obtaining a set of rectangular human-body target frames Ω_box. For each rectangular human-body target frame R_m in Ω_box, m = 1, 2, …, M, where M is the number of elements in the set Ω_box: if the point (u_last^tar, v_last^tar) falls within the rectangular human-body target frame R_m, the target frame is retained; otherwise the target frame R_m is considered invalid, i.e., an invalid target frame. The set of all valid rectangular human-body target frames in Ω_box is denoted Ω_boxe.
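A sketch of this detect-and-filter step; OpenCV's HOG + linear-SVM people detector is used as a stand-in, since the patent only specifies "the SVM module of OpenCV 3.0" without naming the exact model:

```python
import cv2

def detect_valid_frames(search_img, last_center):
    # Detect pedestrian candidates in the search region, then keep only
    # the frames R_m that contain the last known target center point.
    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    boxes, _ = hog.detectMultiScale(search_img, winStride=(8, 8))
    u, v = last_center
    return [(x, y, w, h) for (x, y, w, h) in boxes
            if x <= u <= x + w and y <= v <= y + h]
```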
For each element of the set Ω_boxe, the image coordinates of the center point are (u_j^box, v_j^box), where j = 1, 2, …, N and N is the number of elements in the set Ω_boxe. For the center point (u_j^box, v_j^box) of each valid rectangular human-body target frame, a target position (u_j^body, v_j^body) near the shoulder is acquired according to the proportions of the human body:
(u_j^body, v_j^body) = (u_j^box + γ × W_j^box, v_j^box + β × H_j^box)
wherein W_j^box and H_j^box are respectively the width and height of the j-th valid target frame, and γ and β are scale factors.
The points (u_j^body, v_j^body) (j = 1, 2, …, N) of all valid rectangular human-body target frames form a point set Ω_body. For each point in the point set Ω_body, an image block I_j^block with length H_last and width W_last is constructed centered on that point; the image block I_j^block is smoothed, i.e., convolution with a Gaussian operator is applied to I_j^block, obtaining the corresponding image block B_j^body, j = 1, 2, …, N. Feature transformation is performed on the j-th image block, the corresponding response matrix is computed by the KCF tracking algorithm, the displacement coordinates of the target object are determined from the maximum element value of the response matrix, the center position of the target object in the current image is calculated from the displacement coordinates and the initial center position, and the region where the target object is located is constructed from this center position and recorded as the second region. If the ratio of the portion of the second region constructed from the image block that lies within the image to the total area of the second region reaches the set threshold, the second region is determined to be the target region of the target object; otherwise the robot continues to be controlled to rotate, and the next frame image is acquired for continued detection until the target region of the target object is determined.
In some optional implementations of the present application, the method further includes controlling the motion of the robot, and the controlling the motion of the robot may be controlling the speed of left and right wheels of the robot. The control of the robot includes: the central position of the target object is combined with the current optimal scale of the target object, and the movement control of the monocular vision mobile robot is realized through the following formula:
V_i^left = k_d + k_p2 × S_i^diff + k_p3 × (u_i^tar − u_0^tar)
V_i^right = k_d + k_p2 × S_i^diff − k_p3 × (u_i^tar − u_0^tar)
wherein k_d, k_p2, and k_p3 are proportionality coefficients, V_i^left is the left wheel speed, V_i^right is the right wheel speed, (u_0^tar, v_0^tar) are the preset position coordinates of the target object, (u_i^tar, v_i^tar) are the position coordinates of the target object at the current time, and S_i^diff is the difference between the optimal scale S_i at the current time i and the scale 1 of the initial time, i.e., S_i^diff = S_i − 1.
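A sketch of this motion control; the additive combination of the scale (distance) term and the bearing (horizontal offset) term mirrors the reconstructed formulas above and is an assumption of this sketch, with defaults taken from the preferred example below:

```python
def follow_speeds(center_u, scale, u_preset=320,
                  k_d=20.0, k_p2=-140.0, k_p3=0.12):
    # Differential-drive law: the scale change S_i^diff modulates the
    # forward speed (target closer -> slow down), while the horizontal
    # offset of the target center steers the robot toward it.
    s_diff = scale - 1.0
    forward = k_d + k_p2 * s_diff
    turn = k_p3 * (center_u - u_preset)
    return forward + turn, forward - turn  # (V_left, V_right)
```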
In a preferred example, the monocular vision sensor is a Sony FCB-EX11DP, capable of obtaining a color image with a resolution of 640 × 480 pixels, with n_cell = 4, γ = 0, β = −0.47, k_p1 = 0.1, k_p2 = −140, k_p3 = 0.12, and k_d = 20.
The method provided in the above embodiment of the present application acquires an image of a tracked object according to a position indicated by an initial target area, and determines an area where the target object is located as a target area from the acquired image, so as to track the target object.
Alternatively, the steps of a method described in this application in connection with the above embodiments may be implemented by hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (6)

1. A target object following method of a mobile robot based on a monocular vision sensor is characterized by comprising the following steps:
step 101: the mobile robot acquires an image of the target object according to the target area of the target object;
step 102: preprocessing the image of the target object, and obtaining a feature matrix of the image of the target object by utilizing feature transformation; the feature matrix is a multi-channel feature matrix;
step 103: determining a response matrix of the characteristic matrix by using a target tracking algorithm, and recording the maximum values of all elements in the response matrix; taking the position of the maximum value corresponding to the element in the response matrix as a displacement coordinate of the target object relative to a pre-acquired initial target area, and determining the position of the target object as a central point of the target object by combining the central position of the initial target area, wherein the step of determining the position of the target object comprises the following steps:
recording the maximum value of all elements in the response matrix;
taking the positions of the elements in the response matrix corresponding to the maximum values of all the elements as displacement coordinates of the target object relative to the previous frame image in the current frame image;
obtaining the position of the target object in the current image frame according to the central position of the initial target area, and determining the position of the central point of the target object in the current image frame by the following formula:
(u_i^tar, v_i^tar) = (u_i^tar', v_i^tar') + n_cell × (u_i, v_i)
wherein (u_i, v_i) represents the displacement coordinates of the target object at time i in the current frame image relative to the previous frame image, (u_i^tar', v_i^tar') represents the center position of the initial target region at time i, (u_i^tar, v_i^tar) represents the position of the target object at time i in the current image frame, and n_cell represents the size of the FHOG feature parameter cell;
step 104: determining an area where the target object is located according to the central point and the outline frame of the target object, and taking the area as a first area; judging whether the proportion of the area of the first region in the image of the target object is larger than a set threshold value, if so, executing a step 105, and if not, executing a step 106;
step 105: determining that the first region is a target region of the target object;
step 106: constructing a detection area by taking the position of the central point as a center, detecting the target object in the detection area again, taking the area where the target object is located obtained by detection as a second area, and taking the second area as the target area of the target object if the proportion of the area of the second area to the area of the image in the image of the target object is greater than the set threshold value;
the step of "constructing a detection area with the position of the central point as a center, detecting the target object again in the detection area, and taking the area where the target object is located obtained by detection as a second area" includes:
constructing a detection area by taking the position of the central point as a center;
carrying out target object detection on the image blocks in the detection area through a pre-constructed SVM model to determine a target frame set;
calculating the multi-channel feature matrix of the image block corresponding to the target frame by using a feature extraction method;
calculating a response matrix of the multi-channel feature matrix according to a KCF tracker and the multi-channel feature matrix, and recording the maximum value of elements in the response matrix;
taking a target frame which contains the central point and contains a maximum element value as an effective target frame, marking the coordinate of the central point of the effective target frame, and taking the corresponding position of the maximum value in the response matrix as the displacement coordinate of the target object relative to the central point of the target frame;
according to the position of the central point of the target frame, taking the position of a target object as the central position of an effective target object, and constructing a second target area by taking the central position of the target object as the center;
the method further comprises the following steps:
when no effective target frame exists in the target frame set, controlling the mobile robot to rotate towards the direction that the target disappears in the visual field according to the following method:
V_i^left = k_p1 × (u_last^tar − u_0^tar)
V_i^right = −k_p1 × (u_last^tar − u_0^tar)
wherein k_p1 denotes the proportionality coefficient, V_i^left denotes the left wheel speed of the mobile robot at time i, V_i^right denotes the right wheel speed of the mobile robot at time i, (u_0^tar, v_0^tar) denotes the preset position coordinates of the target object, and (u_last^tar, v_last^tar) denotes the actual position coordinates of the target object at the moment before it leaves the field of view.
2. The target object following method of claim 1, wherein the step of preprocessing the image of the target object and obtaining the feature matrix of the image of the target object by using feature transformation comprises:
performing convolution calculation on the image of the target object to obtain a middle image block of the image of the target object;
performing feature transformation on the intermediate image block to obtain a feature matrix of the image of the target object;
wherein the feature transformation comprises FHOG feature transformation and RGB feature transformation.
3. The target object following method of claim 2, wherein when the feature matrix is the multi-channel feature matrix, the step of performing feature transformation on the intermediate image block to obtain the feature matrix of the image of the target object comprises:
calculating the FHOG feature matrix F_i^FHOG of the intermediate image block, wherein the rows and columns of the FHOG feature matrix are determined by the following formula:
H_i^FHOG = floor(H_i^tar' / n_cell), W_i^FHOG = floor(W_i^tar' / n_cell)
wherein H_i^FHOG is the number of rows of the FHOG feature matrix at time i, W_i^FHOG is the number of columns of the FHOG feature matrix at time i, H_i^tar' is the length of the initial target region at time i, W_i^tar' is the width of the initial target region at time i, cell is the descriptor parameter of the FHOG feature, and n_cell refers to the size of the cell;
performing RGB feature transformation on the intermediate image block to obtain a three-channel feature matrix;
carrying out Gaussian down-sampling on the three-channel feature matrix to obtain an RGB feature matrix;
and splicing the FHOG characteristic matrix and the RGB characteristic matrix by a channel splicing method to obtain the multi-channel characteristic matrix of the image of the target object.
4. The target object following method of claim 1, wherein the step of determining the response matrix of the feature matrix using a target tracking algorithm comprises:
generating an initial target characteristic template and an initial parameter matrix of the multi-channel characteristic matrix in a Fourier domain by utilizing a KCF tracking algorithm;
and updating the target characteristic template and the parameter matrix of the target tracking algorithm by using the initial target characteristic template and the initial parameter matrix through the following formulas:
M_i = (1 − α) · M_{i−1} + α · M_i'
T_i = (1 − β) · T_{i−1} + β · T_i'
wherein M_{i−1} is the target feature template at time i−1, T_{i−1} is the parameter matrix at time i−1, M_i is the target feature template at time i, T_i is the parameter matrix at time i, M_i' is the initial target feature template at time i, T_i' is the initial parameter matrix at time i, and α and β are scale factors;
according to the updated target feature template and the parameter matrix, calculating the response matrix by the following formula:
P_i = Γ^{−1}(M_i ⊙ T_i)
wherein ⊙ is the matrix dot-product (element-wise multiplication) operator, P_i is the response matrix at time i, and Γ^{−1} is the inverse Fourier transform function.
5. The target object following method based on monocular vision sensor of mobile robot of claim 2, wherein the step of determining the area where the target object is located according to the center point and the outline frame of the target object comprises:
obtaining the optimal scale of a target object in the image by using a method of adjacent scale sampling and least square fitting;
and determining the length and the width of the target region according to the optimal scale and the length and the width of the initial target region after determining scale transformation, thereby determining the target region.
6. The target object following method based on monocular vision sensor of mobile robot as recited in claim 5, wherein the step of "obtaining the optimal scale of the target object in the image by the method of adjacent scale sampling and least square fitting" comprises:
carrying out scale transformation on the intermediate image block to obtain a first image block and a second image block;
obtaining a first multi-channel feature matrix and a second multi-channel feature matrix of the first image block and the second image block by using feature transformation;
respectively calculating a first response matrix and a second response matrix of the first multichannel characteristic matrix and the second multichannel characteristic matrix by using the target tracking algorithm, and respectively recording the maximum values of elements in the response matrices;
and performing least square fitting according to the first response matrix, the second response matrix and the maximum value of the elements in the response matrix, and determining the optimal scale of the target object in the image of the target object.
CN201810980715.6A 2018-08-27 2018-08-27 Target object following method of mobile robot based on monocular vision sensor Active CN109166136B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810980715.6A CN109166136B (en) 2018-08-27 2018-08-27 Target object following method of mobile robot based on monocular vision sensor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810980715.6A CN109166136B (en) 2018-08-27 2018-08-27 Target object following method of mobile robot based on monocular vision sensor

Publications (2)

Publication Number Publication Date
CN109166136A CN109166136A (en) 2019-01-08
CN109166136B true CN109166136B (en) 2022-05-03

Family

ID=64896710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810980715.6A Active CN109166136B (en) 2018-08-27 2018-08-27 Target object following method of mobile robot based on monocular vision sensor

Country Status (1)

Country Link
CN (1) CN109166136B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705644B (en) * 2019-10-08 2022-11-18 西安米克斯智能技术有限公司 Method for coding azimuth relation between targets
CN110866486B (en) * 2019-11-12 2022-06-10 Oppo广东移动通信有限公司 Subject detection method and apparatus, electronic device, and computer-readable storage medium
CN110927767A (en) * 2019-11-28 2020-03-27 合肥工业大学 Following system for special crowds

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875425A (en) * 2017-01-22 2017-06-20 北京飞搜科技有限公司 A kind of multi-target tracking system and implementation method based on deep learning
CN107492114A (en) * 2017-06-12 2017-12-19 杭州电子科技大学 The heavy detecting method used when monocular is long during the tracking failure of visual tracking method
CN107564034A (en) * 2017-07-27 2018-01-09 华南理工大学 The pedestrian detection and tracking of multiple target in a kind of monitor video

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10467571B2 (en) * 2016-07-10 2019-11-05 Asim Kumar Datta Robotic conductor of business operations software

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106875425A (en) * 2017-01-22 2017-06-20 北京飞搜科技有限公司 A kind of multi-target tracking system and implementation method based on deep learning
CN107492114A (en) * 2017-06-12 2017-12-19 杭州电子科技大学 The heavy detecting method used when monocular is long during the tracking failure of visual tracking method
CN107564034A (en) * 2017-07-27 2018-01-09 华南理工大学 The pedestrian detection and tracking of multiple target in a kind of monitor video

Also Published As

Publication number Publication date
CN109166136A (en) 2019-01-08

Similar Documents

Publication Publication Date Title
CN108416791B (en) Binocular vision-based parallel mechanism moving platform pose monitoring and tracking method
CN107330376B (en) Lane line identification method and system
JP6095018B2 (en) Detection and tracking of moving objects
CN109034017B (en) Head pose estimation method and machine readable storage medium
CN109166136B (en) Target object following method of mobile robot based on monocular vision sensor
US11205276B2 (en) Object tracking method, object tracking device, electronic device and storage medium
CN110717445B (en) Front vehicle distance tracking system and method for automatic driving
CN108362205B (en) Space distance measuring method based on fringe projection
CN112927303B (en) Lane line-based automatic driving vehicle-mounted camera pose estimation method and system
Ji et al. RGB-D SLAM using vanishing point and door plate information in corridor environment
CN111144207A (en) Human body detection and tracking method based on multi-mode information perception
CN114331986A (en) Dam crack identification and measurement method based on unmanned aerial vehicle vision
CN111178193A (en) Lane line detection method, lane line detection device and computer-readable storage medium
CN112947419A (en) Obstacle avoidance method, device and equipment
CN106709432B (en) Human head detection counting method based on binocular stereo vision
CN114782529A (en) High-precision positioning method and system for line grabbing point of live working robot and storage medium
CN113689365B (en) Target tracking and positioning method based on Azure Kinect
CN112613565B (en) Anti-occlusion tracking method based on multi-feature fusion and adaptive learning rate updating
CN113256731A (en) Target detection method and device based on monocular vision
CN117333846A (en) Detection method and system based on sensor fusion and incremental learning in severe weather
CN108388854A (en) A kind of localization method based on improvement FAST-SURF algorithms
Pandey et al. Analysis of road lane detection using computer vision
Punagin et al. Analysis of lane detection techniques on structured roads using OpenCV
CN116052120A (en) Excavator night object detection method based on image enhancement and multi-sensor fusion
CN115902977A (en) Transformer substation robot double-positioning method and system based on vision and GPS

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant