CN109166136B

CN109166136B - Target object following method of mobile robot based on monocular vision sensor

Info

Publication number: CN109166136B
Application number: CN201810980715.6A
Authority: CN
Inventors: 刘希龙; 张茗奕; 庞磊; 曹志强; 徐德
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2018-08-27
Filing date: 2018-08-27
Publication date: 2022-05-03
Anticipated expiration: 2038-08-27
Also published as: CN109166136A

Abstract

The invention relates to the field of computer vision recognition, and particularly provides a target object following method of a mobile robot based on a monocular vision sensor, aiming at solving the problems that the existing mobile robot has poor robustness in following a target object and is difficult to ensure the following quality of target objects such as pedestrians. For this purpose, the method provided by the invention comprises the steps that the mobile robot acquires an image of a target object according to a target area of the target object; obtaining a characteristic matrix of the image of the target object; determining the central point of the target object by using a target tracking algorithm and a characteristic matrix; and determining the area where the target object is located according to the central point and the outline frame of the target object, taking the area as a first area, judging whether the proportion of the area of the first area in the image of the target object is greater than a set threshold value, and executing corresponding operation according to the judgment result. Based on the steps, the method provided by the invention has good real-time performance and robustness, and can realize effective following of the target object.

Description

Target object following method of mobile robot based on monocular vision sensor

Technical Field

The invention relates to the field of computer vision, in particular to the field of mobile robots, and particularly relates to a target object following method of a mobile robot based on a monocular vision sensor.

Background

The intelligent mobile robot following the moving target is widely applied to the fields of home service, old and disabled assisting, scene monitoring, intelligent vehicles and the like. The following of the mobile robot to the target object relates to the fields of computer vision, motion control, mode recognition and the like, and has wide application prospect.

Unlike TLD (Tracking-Learning-Detection), strruck, and other methods, the following of the target object by the mobile robot requires high robustness. Generally, a mobile robot senses using a laser sensor or a vision sensor. Although the laser sensor can obtain accurate distance information, the robustness in the aspects of identification and following of a target object is insufficient, and the problem of retrieving the target after the target is lost is difficult to solve. The vision sensor can provide more abundant environmental information, and the vision sensor includes binocular vision sensor and monocular vision sensor. Compared with target object following based on a binocular vision sensor, the target object following algorithm based on the monocular vision sensor is short in processing time and beneficial to guaranteeing real-time performance. At present, methods such as color segmentation and particle filtering are mostly used for target object following of a monocular vision sensor, but due to the existence of interferences such as a complex background, a target object entering and exiting a visual field and the like, the robustness of the methods is poor, and effective following of the target object such as a pedestrian is difficult to realize.

Disclosure of Invention

In order to solve the problems in the prior art, namely the problems that the existing mobile robot is poor in robustness when following a target object and cannot follow the target object such as a pedestrian easily, the invention provides a monocular vision sensor-based target object following method for the mobile robot, which improves robustness on a complex background and ensures that the mobile robot follows the target object such as the pedestrian in different environments. The target object following method of the mobile robot based on the monocular vision sensor comprises the following steps:

step 101: the mobile robot acquires an image of the target object according to the target area of the target object;

step 102: preprocessing the image of the target object, and obtaining a feature matrix of the image of the target object by utilizing feature transformation;

step 103: determining a response matrix of the characteristic matrix by using a target tracking algorithm, and recording the maximum values of all elements in the response matrix; taking the position of the maximum value corresponding to the element in the response matrix as a displacement coordinate of the target object relative to a pre-acquired initial target area, and determining the position of the target object as a central point of the target object by combining the central position of the initial target area;

step 104: determining an area where the target object is located according to the central point and the outline frame of the target object, and taking the area as a first area; judging whether the proportion of the area of the first region in the image of the target object is larger than a set threshold value, if so, executing a step 105, and if not, executing a step 106;

step 105: determining that the first region is a target region of the target object;

step 106: and constructing a detection area by taking the position of the central point as a center, detecting the target object again in the detection area by using a target detector, taking the area where the detected target object is located as a second area, and taking the second area as the target area of the target object if the proportion of the area of the second area to the area of the image in the image of the target object is greater than the set threshold value.

In a preferred embodiment of the above method, the step of "preprocessing the image of the target object and obtaining the feature matrix of the image of the target object by using feature transformation" includes:

performing convolution calculation on the image in the initial target area of the image of the target object to obtain a middle image block of the image of the target object;

performing feature transformation on the intermediate image block to obtain a feature matrix of the image of the target object;

the feature transformation comprises FHOG feature transformation and RGB feature transformation, and the feature matrix is a multi-channel feature matrix.

In a preferred embodiment of the foregoing method, when the feature transformation is FHOG feature transformation, the step of performing feature transformation on the intermediate image block to obtain a feature matrix of the image of the target object includes:

and performing characteristic transformation on the intermediate image block according to a method shown as the following formula:

calculating to obtain FHGG characteristic matrix F of the intermediate image block_i ^FHOGThe rows and columns of the FHOG feature matrix are determined by the following formula:

wherein H_i ^FHOGIs the row, W, of the FHOG feature matrix at time i_i ^FHOGIs the column of the FHOG feature matrix at time i, H_i ^tar’Is the length of the initial target region at time i, W_i ^tar’Is the width of the initial target area at time i, cell is the parameter of the descriptor of the FHOG feature matrix, n_cellRefers to the size of the cell.

In a preferred embodiment of the above method, when the feature transformation is RGB feature transformation, the step of performing feature transformation on the intermediate image block to obtain a feature matrix of the image of the target object includes:

performing RGB feature transformation on the intermediate image block to obtain a three-channel feature matrix; performing Gaussian down-sampling on the three-channel feature matrix to obtain an RGB feature matrix, wherein the number of rows of the RGB feature matrix is H_i ^FHOGThe number of rows is W_i ^FHOG；

And splicing the FHOG characteristic matrix and the RGB characteristic matrix by a channel splicing method to obtain a multi-channel characteristic matrix of the image of the target object.

In a preferred technical solution of the above method, the step of determining the response matrix of the feature matrix by using a target tracking algorithm includes:

generating an initial target characteristic template and an initial parameter matrix of the characteristic matrix in a Fourier domain by using a KCF tracker;

and updating the target characteristic template and the parameter matrix of the target tracking algorithm by using the initial target characteristic template and the initial parameter matrix through the following formulas:

wherein M is_i-1Is a target feature template at time i-1, T_i-1Is a parameter matrix at the i-1 th time, M_iIs a target feature template at time i, T_iIs a parameter matrix at the ith time, M_i' is the initial target feature template at time i, T_i' is the initial parameter matrix at the ith time, and alpha and beta are scale factors.

According to the updated target feature template and the parameter matrix, calculating the response matrix by the following formula:

P_i＝Γ^-1(M_i☉T_i)

wherein ☉ is a matrix dot product operator, P_iIs the response matrix at time i, Γ^-1Is an inverse fourier transform function.

In a preferred technical solution of the above method, the step of determining the position of the target object as a center point of the target object includes:

recording the maximum value of all elements in the response matrix;

taking the positions of the elements in the response matrix corresponding to the maximum values of all the elements as displacement coordinates of the target object relative to the previous frame image in the current frame image;

obtaining the position of the target object in the current image frame according to the central position of the initial target area, and determining the position of the central point of the target object in the current image frame by the following formula:

(u_i ^tar,v_i ^tar)＝(u_i ^tar’,v_i ^tar’)+n_cell×(u_i,v_i)

wherein (u)_i,v_i) (u) represents the coordinates of the displacement of the target object in the image at the i-th time instant, which is the current frame, with respect to the previous frame image_i ^tar’,v_i ^tar’) (u) represents the center position of the initial target region at the i-th time_i ^tar,v_i ^tar) Representing the position of said target object in the current image frame, i.e. at the i-th instant, n_cellThe size of the FHOG characteristic parameter cell is indicated.

In a preferred embodiment of the above method, the step of determining the area where the target object is located according to the central point and the outline frame of the target object includes:

obtaining the optimal scale of a target object in the image by using a method of adjacent scale sampling and least square fitting;

and determining the length and the width of the target region according to the optimal scale and the length and the width of the initial target region after determining scale transformation, thereby determining the target region.

In a preferred embodiment of the above method, the step of "obtaining an optimal scale of the target object in the image by a method of adjacent scale sampling and least square fitting" includes:

carrying out scale transformation on an intermediate image block of the image of the target object to obtain a first image block and a second image block;

respectively obtaining a first multi-channel feature matrix and a second multi-channel feature matrix corresponding to the first image block and the second image block by utilizing the feature transformation;

respectively calculating a first response matrix and a second response matrix of the first multichannel characteristic matrix and the second multichannel characteristic matrix by using the target tracking algorithm, and respectively recording the maximum values of elements in the response matrices;

and performing least square fitting according to the first response matrix, the second response matrix and the maximum value of the elements in the response matrix, and determining the optimal scale of the target object in the image of the target object.

In a preferred embodiment of the above method, the step of "constructing a detection area centered on the position of the central point, detecting the target object again by using a target detector in the detection area, and setting an area where the target object is located as a second area" includes:

constructing a detection area by taking the position of the central point as a center;

carrying out target object detection on the image blocks in the detection area through a pre-trained SVM model to determine a target frame set;

calculating a multi-channel feature matrix of the image block corresponding to the target frame by using the feature extraction method;

calculating a response matrix of each multi-channel feature matrix according to a KCF tracker and the multi-channel feature matrix, and recording the maximum value of elements in the response matrix;

taking a target frame which contains the central point and contains a maximum element value as an effective target frame, marking the coordinate of the central point of the effective target frame, and taking the corresponding position of the maximum value in the response matrix as the displacement coordinate of the target object relative to the central point of the target frame;

and according to the position of the central point of the target frame, taking the position of the target object as the central position of the effective target object, and constructing a second target area by taking the central position of the target object as the center.

In a preferred embodiment of the above method, the method further comprises:

when no effective target frame exists in the target frame set, controlling the mobile robot to rotate towards the direction that the target disappears in the visual field according to the following method:

wherein k is_p1Denotes the proportionality coefficient, V_i ^leftIndicating the left wheel speed, V, of the mobile robot at time i_i ^rightRepresents the speed of the right wheel at the i-th moment of the mobile robot, (u)₀ ^tar,v₀ ^tar) (ii) preset position coordinates representing the target object, (u)_last ^tar,v_last ^tar) Representing the reality of the target object at a moment in time before it leaves the field of view of the robotThe position coordinates.

The target object following method of the mobile robot based on the monocular vision sensor can achieve effective following of the target object. The invention has good real-time performance, can still realize the following of the mobile robot to the target objects such as pedestrians and the like under the interference of illumination change, target shielding and the like, and has better robustness.

Drawings

Fig. 1 is a flowchart of an embodiment of a target object following method of a mobile robot based on a monocular vision sensor according to the present application.

Detailed Description

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

Fig. 1 is a flowchart of an embodiment of a target object following method of a mobile robot based on a monocular vision sensor according to the present application. The monocular vision sensor is arranged on the mobile robot, and the direction of the optical axis of the monocular vision sensor is consistent with the positive direction of the mobile robot. The target object following method of the mobile robot based on the monocular vision sensor comprises the following steps:

step 101, the mobile robot acquires an image of the target object according to the target area of the target object.

In the present embodiment, the monocular vision sensor acquires the image I at the current time (i.e., the ith time)_i ^scrWhere i is 1,2,3, …, the target area Φ determined at the time of i-1_i-1 ^tarInitial target region phi as the ith time_i ^tar’，(u_i ^tar’,v_i ^tar’)＝(u_i-1 ^tar,v_i-1 ^tar)，H_i ^tar’＝H_i-1 ^tar，W_i ^tar’＝W_i-1 ^tarWherein the target region phi_i-1 ^tarIs a rectangular area, (u)_i-1 ^tar,v_i-1 ^tar) Is a target region phi_i-1 ^tarCentral coordinate of (H)_i-1 ^tarAnd W_i-1 ^tarRespectively target region phi_i-1 ^tarLength and width of (u)_i ^tar’,v_i ^tar’) Is an initial target region phi_i ^tar’Central position of (H)_i ^tar’And W_i ^tar’Respectively initial target region phi_i ^tar’Length and width. In particular, the mobile robot collects an image containing a target object through a monocular vision sensor before starting to follow the target object, and manually selects a target region phi according to the position of the target object in the image₀ ^tarAnd the target region phi₀ ^tarInitial target region phi as an initial time₁ ^tar’，(u₁ ^tar’,v₁ ^tar’)＝(u₀ ^tar,v₀ ^tar)；H₁ ^tar’＝H₀ ^tar，W₁ ^tar’＝W₀ ^tarWherein (u)₀ ^tar,v₀ ^tar) Is a target region phi₀ ^tarCentral position of (H)₀ ^tarAnd W₀ ^tarRespectively target region phi₀ ^tarLength and width of (u)₁ ^tar’,v₁ ^tar’) Is an initial target region phi₁ ^tar’Central position of (H)₁ ^tar’And W₁ ^tar’Respectively initial target region phi₁ ^tar’Length and width.

The initial target region at the initial time is manually selected, and the initial target region at another time (i.e., the ith time, i is 2,3,4, …) is the target region of the target object obtained at the previous time (i.e., the ith-1 time).

And 102, preprocessing the image of the target object, and obtaining a feature matrix of the image of the target object by using feature transformation.

In the present embodiment, the image of the target object acquired in step 101 is subjected to preprocessing. The preprocessing can be image smoothing processing, mean processing, geometric transformation, normalization processing and the like. And performing characteristic transformation on the preprocessed image to obtain a characteristic matrix of the image after the characteristic transformation.

In some optional implementation manners of this embodiment, the preprocessing the image may be performing convolution calculation on an image in an initial target region in the image to obtain an intermediate image block B of the image_i ^blur. Performing feature transformation on the intermediate image block to obtain a multi-channel feature matrix F_i ^tar. The feature transformation may be FHOG feature transformation and RGB feature transformation. The FHOG (felzenzwald of aided graphics) feature is a feature descriptor used for object detection in computer vision and image processing. The RGB feature is obtained from a feature descriptor based on three color channels of red (R), green (G) and blue (B).

In some optional implementation manners of this embodiment, an FHOG feature transformation is performed on the intermediate image block, and an FHOG feature matrix F of the image is extracted_i ^FHOGThe rows and columns of the FHOG feature matrix are determined by the following formula:

wherein H_i ^FHOGIs a row of the FHOG feature matrix, W_i ^FHOGIs the column of the FHOG feature matrix, H_i ^tar’Is the length of the initial target region, W_i ^tar’For the width of the initial target area, cell is the parameter of the descriptor of the FHOG feature matrix, n_cellRefers to the size of the cell.

In some preferred embodiments, the scale of the feature descriptor parametersCun n_cell4. The direction (bins) of the histogram in the FHOG feature is set to 9, the cell size is 4 × 4 pixels, and 3 × 3 cells constitute one block. The number of channels of the feature vector is 31, and the direction insensitive feature comprises 9 direction insensitive features, 18 direction sensitive features and 4 texture features.

In some optional implementation manners of this embodiment, the feature matrix of the image obtained by using the feature transformation may also be a feature matrix F of the image obtained by using RGB feature transformation_i ^RGBThe method specifically comprises the following steps: RGB feature transformation is carried out on the intermediate image block, and one row and one column are extracted to be H respectively_i ^tar’And W_i ^tar’The three-channel feature matrix of (a); gaussian down sampling is carried out on the three-channel feature matrix to obtain rows and columns which are respectively H_i ^FHOGAnd W_i ^FHOGThe intermediate feature matrix of (2); splicing the three-channel RGB feature matrix and the FHGO feature matrix into a multi-channel feature matrix F according to a channel splicing mode_i ^tar。

103, determining a response matrix of the characteristic matrix by using a target tracking algorithm, and recording the maximum values of all elements in the response matrix; and determining the position of the target object as the central point of the target object by taking the position of the maximum value corresponding to the element in the response matrix as the displacement coordinate of the target object relative to the initial target area and combining the central position of the initial target area.

In this embodiment, the feature matrix may be imported into a preset tracking algorithm model, so as to determine the position of the target object. The tracking algorithm may be a KCF tracking algorithm. Specifically, the feature matrix is imported into the KCF tracking algorithm model, and a response matrix of the feature matrix is determined; recording the maximum value of all elements in the response matrix, taking the position of the maximum value corresponding to the elements in the response matrix as the displacement coordinate of the target object relative to the target object in the initial target area, and combining the central position of the initial target area to determine the position of the target object as the current central point of the target object.

In some optional implementations of this embodiment, the determining the response matrix of the feature matrix by using the target tracking algorithm includes: according to the multi-channel feature matrix F_i ^tarGenerating an initial target feature template M of the multi-channel feature matrix in the Fourier domain by utilizing the KCF tracking algorithm_i' and initial parameter matrix T_i'; updating the target characteristic template M in the KCF tracking algorithm by using the initial target characteristic template and the initial parameter matrix through the following formula and the characteristic matrix_iAnd a parameter matrix T_i：

Wherein M is_i-1Is a target feature template at time i-1, T_i-1Is a parameter matrix at the i-1 th time, M_iIs a target feature template at time i, T_iIs a parameter matrix at the ith time, M_i' is the initial target feature template at time i, T_i' is the initial parameter matrix at the ith time, and alpha and beta are scale factors

P_i＝Γ^-1(M_i☉T_i)

wherein ☉ is a matrix dot product operator, P_iBeing response matrices, Γ^-1Is an inverse fourier transform function.

As an example, the maximum value o of all elements in the above response matrix is recorded_i(ii) a Will o_iThe position of the corresponding element in the response matrix is used as the displacement coordinate (u) of the target object relative to the previous frame image in the current frame image_i,v_i) Combining the center position (u) of the initial target region_i ^tar’,v_i ^tar’) Obtaining the position (u) of the target object in the current image frame_i ^tar,v_i ^tar). Namely, the position of the center point of the target object of the current image frame is determined as follows:

(u_i ^tar,v_i ^tar)＝(u_i ^tar’,v_i ^tar’)+n_cell×(u_i,v_i)

wherein n is_cellRefers to the size of the FHOG characteristic parameter cell.

And 104, determining a region where the target object is located according to the central point and the outline frame of the target object to serve as a first region, judging whether the proportion of the area of the first region in the image of the target object is greater than a set threshold value, if so, executing step 105, and if not, executing step 106.

And 105, determining the first area as a target area of the target object.

In this embodiment, a graphic frame is constructed by using the position of the center point of the target object determined in the step 103 and the shape of the tracked target object, and a region where the target object is located is determined as a first region; and judging whether the ratio of the area of the first region in the image to the area of the first region is larger than a set threshold value. As an example, when the target object is a pedestrian. The outline frame of the target object may be a rectangular frame configured according to the outline of the human body. And setting a rectangular area where the target object constructed by the central position and the rectangular frame is located as a first area. The proportional relation can be set according to actual conditions, but is at least more than one-half. If the ratio is larger than the set threshold value, the target object can be considered to be in the image, and therefore the first area is determined to be the target area of the target object.

In some optional implementation manners of this embodiment, the determining a target area of the target object according to the central point and the outline frame of the target object includes: obtaining the optimal scale of a target object in the image by using a method of adjacent scale sampling and least square fitting; and determining the length and width of the target area according to the optimal scale and the length and width of the initial target area after determining scale transformation by taking the length and width of the initial target area as the initial length and width, thereby determining the target area.

The method for obtaining the optimal scale of the target object in the image through adjacent scale sampling and least square fitting comprises the following steps: carrying out scale transformation on the intermediate image block to obtain a first image block and a second image block, and obtaining a first multi-channel feature matrix and a second multi-channel feature matrix corresponding to the first image block and the second image block by utilizing feature transformation; respectively calculating a first response matrix and a second response matrix of the first multichannel characteristic matrix and the second multichannel characteristic matrix by utilizing the KCF tracking algorithm, and respectively recording the maximum values of elements in the response matrices; and performing least square fitting according to the first response matrix, the second response matrix and the maximum value of the elements in the response matrix to determine the optimal scale of the target object in the upper image.

As an example, first, the intermediate image block B is subjected to_i ^blurRespectively carrying out 0.95 times and 1.05 times of scale transformation to obtain an image block B_i- ^blurAnd B_i+ ^blur. Secondly, FHOG characteristic transformation and RGB characteristic transformation are carried out on the transformed image block to obtain a multi-channel characteristic matrix F with the corresponding scales of 0.95, 1 and 1.05_i- ^tar，F_i ^tar，F_i+ ^tar. Thirdly, the multi-channel feature matrix F is obtained by utilizing the KCF tracking algorithm_i- ^tar、F_i ^tarAnd F_i+ ^tarCorresponding response matrix P_i-、P_iAnd P_i+The maximum values o of the elements in the response matrices are recorded separately_i-、o_i、o_i+. Then, the function p is used_i＝a_is_i ²+b_is_i+c_iTo (0.95, o)_i-)、(1.00,o_i)、(1.05,o_i+) Three points are subjected to a least squares fit, where s_iIs a scale of p_iIs scale s_iMaximum value of an element in the lower response matrix, a_i、b_i、c_iRespectively, the quadratic term coefficient, the first order coefficient, and the constant term of the function. Finally, fitting to obtain a parameter a_i、b_i、c_iEstimated value A of_i、B_i、C_iWill S_i＝-B_i/(2A_i) Is recorded as the optimal dimension of the target object in the current image frame.

Combining the optimal scale and the initial target area to obtain the length and the width of the target area respectively as H_i ^tar＝S_i×H_i ^tar’，W_i ^tar＝S_i×W_i ^tar’The center position of the target area is the center position of the target object determined in the above step 203.

And 106, constructing a detection area by taking the central position as the center, detecting the target object again in the detection area, taking the area where the detected target object is located as a second area, and taking the second area as the target area of the target object if the ratio of the area of the second area to the image area in the image is greater than the set threshold value.

In this embodiment, if the ratio of the image area of the first region in the image to the region area of the first region is smaller than a set threshold, that is, a partial region of the first region is not in the image, and the area of the portion that is not in the image exceeds the set threshold, it is considered that the target region of the target object determined using the image cannot correctly track the target object. At this time, a detection area is constructed by taking the central position as the center, an initial target area is constructed in the detection area by using a target object detection method, a data image of the target object is acquired again according to the constructed initial target area, and the area where the target object is newly determined is a second area. If the ratio of the area of the second region to the area of the image in the image is greater than the set threshold, the second region can be used as the target region of the target object.

In some optional implementation manners of this embodiment, constructing a detection area with the central point as a center, and determining a second target area in the detection area includes:

constructing a detection area by taking the position of the central point as a center, wherein the area of the detection area is larger than that of the initial target area; detecting a target object by using an SVM (support vector machine) model for the image block in the detection area, and determining a target frame set; calculating a multi-channel feature matrix of the image block corresponding to each target frame by using the feature extraction method; sending the multi-channel feature matrix into the KCF tracker, calculating to obtain a response matrix of each multi-channel feature matrix, and recording the maximum value of elements in the response matrix; determining a target frame containing the central position and the maximum element value as an effective target frame; marking the coordinate of the center point of the effective target frame, taking the position of the maximum value corresponding to the response matrix as the displacement coordinate of the target object relative to the center point of the target frame, and determining the position of the target object as the center position of the effective target object by combining the position of the center point of the target frame; and constructing a second target area by taking the target position as the center.

In some optional implementation manners of this embodiment, the method further includes: when no effective target frame exists in the target frame set, the robot is controlled to rotate towards the direction that the target disappears in the visual field, so that the monocular vision sensor arranged on the robot can acquire the image containing the target object again as soon as possible:

wherein k is_p1Is a proportionality coefficient, V_i ^leftAt left wheel speed, V_i ^rightFor right wheel speed, (u)₀ ^tar,v₀ ^tar) Position coordinates preset for the target object, (u)_last ^tar,v_last ^tar) The position coordinates of the target object at the time immediately before it leaves the field of view.

As an example, when the ratio is smaller than the set threshold, the image of the target object needs to be re-acquired until the area where the target object is located can be accurately located. The method specifically comprises the following steps:

with the central position (u) of the target area detected last_last ^tar,v_last ^tar) Constructing a square area with the side length of L as the center, and recording the part of the square area in the current image as a region phi to be searched_s。

Region to be searched phi_sThe image blocks in the image blocks are used for pedestrian detection, wherein potential pedestrians can be detected by using SVM modules in Opencv3.0 open source library to obtain a rectangular human body target frame set omega_box. For omega_boxEach rectangular human body target frame R in_mM is 1,2, …, M is the set omega_boxNumber of middle elements, if point (u)_last ^tar,v_last ^tar) Falling on a rectangular human body target frame R_mIf not, the target frame is reserved, otherwise, the target frame R is considered_mInvalid, namely, an invalid target frame; omega_boxThe set of all effective rectangular human body target frames is marked as omega_boxe。

For the set omega_boxeOf (d), the image coordinate of the center point is (u)_j ^box,v_j ^box) Where j is 1,2, …, and N is the set Ω_boxeThe number of elements; for each valid rectangular human target frame center point (u)_j ^box,v_j ^box) Acquiring a target position (u) near the shoulder according to the proportion of the human body_j ^body,v_j ^body)：

Wherein, gamma and beta are scale factors.

All valid points (u) of the rectangular human body target frame are put together_j ^body,v_j ^body) (j ═ 1,2, …, N) constitutes a point set Ω_body(ii) a For a set of points Ω_bodyIn each point, the length and the width of each point are respectively H by taking the point as a center to construct_last、W_lastObtaining image block I_j ^blockFor image block I_j ^blockSmoothing, i.e. using Gaussian operator for image block I_j ^blockPerforming convolution calculation to obtain corresponding image block B_j ^bodyJ is 1,2, …, N. And performing characteristic transformation on the jth image block, calculating by using a KCF tracking algorithm to obtain a corresponding response matrix, determining the displacement coordinate of the target object by using the maximum value of elements in the response matrix, calculating the central position of the target object in the current image according to the displacement coordinate and the initial central position, and constructing the area where the target object is located by using the central position and marking as a second area. And if the ratio of the area of the second region constructed by the image block in the image to the area of the second region reaches a set threshold, determining that the second region is a target region of the target object, otherwise, continuously controlling the robot to rotate, and acquiring the next frame of image to continuously detect until the target region of the target object is determined.

In some optional implementations of the present application, the method further includes controlling the motion of the robot, and the controlling the motion of the robot may be controlling the speed of left and right wheels of the robot. The control of the robot includes: the central position of the target object is combined with the current optimal scale of the target object, and the movement control of the monocular vision mobile robot is realized through the following formula:

specifically, k is_d、k_p2、k_p3Is a proportionality coefficient, V_i ^leftAt left wheel speed, V_i ^rightFor right wheel speed, (u)₀ ^tar,v₀ ^tar) Is the coordinates of the target object at the preset position, (u)₀ ^tar,v₀ ^tar) Position coordinates preset for the target object, (u)_i ^tar,v_i ^tar) Is the current time position coordinate of the target object. S_i ^diffIs the optimal dimension S of the current time i_iDifference from the scale 1 of the initial time, i.e. S_i ^diff＝S_i-1。

In a preferred example, the monocular vision sensor is model Sony FCB-EX11DP, and is capable of obtaining a color image with a resolution of 640 pixels × 480 pixels, cell 4, γ 0, β -0.47, k_p1＝0.1，k_p2＝-140，k_p3＝0.12，k_d＝20。

The method provided in the above embodiment of the present application acquires an image of a tracked object according to a position indicated by an initial target area, and determines an area where the target object is located as a target area from the acquired image, so as to track the target object.

Alternatively, the steps of a method described in this application in connection with the above embodiments may be implemented by hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims

1. A target object following method of a mobile robot based on a monocular vision sensor is characterized by comprising the following steps:

step 102: preprocessing the image of the target object, and obtaining a feature matrix of the image of the target object by utilizing feature transformation; the feature matrix is a multi-channel feature matrix;

step 103: determining a response matrix of the characteristic matrix by using a target tracking algorithm, and recording the maximum values of all elements in the response matrix; taking the position of the maximum value corresponding to the element in the response matrix as a displacement coordinate of the target object relative to a pre-acquired initial target area, and determining the position of the target object as a central point of the target object by combining the central position of the initial target area, wherein the step of determining the position of the target object comprises the following steps:

recording the maximum value of all elements in the response matrix;

(u_i ^tar,v_i ^tar)＝(u_i ^tar’,v_i ^tar’)+n_cell×(u_i,v_i)

wherein (u)_i,v_i) (u) a displacement coordinate representing the i-th time instant of the target object in the current frame image relative to the previous frame image_i ^tar’,v_i ^tar’) (u) represents the center position of the initial target region at the i-th time_i ^tar,v_i ^tar) Representing the position of the target object at the i-th instant in the current image frame, n_cellRepresenting the size of the FHOG characteristic parameter cell;

step 106: constructing a detection area by taking the position of the central point as a center, detecting the target object in the detection area again, taking the area where the target object is located obtained by detection as a second area, and taking the second area as the target area of the target object if the proportion of the area of the second area to the area of the image in the image of the target object is greater than the set threshold value;

the step of "constructing a detection area with the position of the central point as a center, detecting the target object again in the detection area, and taking the area where the target object is located obtained by detection as a second area" includes:

carrying out target object detection on the image blocks in the detection area through a pre-constructed SVM model to determine a target frame set;

calculating the multi-channel feature matrix of the image block corresponding to the target frame by using a feature extraction method;

calculating a response matrix of the multi-channel feature matrix according to a KCF tracker and the multi-channel feature matrix, and recording the maximum value of elements in the response matrix;

according to the position of the central point of the target frame, taking the position of a target object as the central position of an effective target object, and constructing a second target area by taking the central position of the target object as the center;

the method further comprises the following steps:

wherein k is_p1Denotes the proportionality coefficient, V_i ^leftIndicating the left wheel speed, V, of the mobile robot at time i_i ^rightRepresents the speed of the right wheel at the i-th moment of the mobile robot, (u)₀ ^tar,v₀ ^tar) (ii) position coordinates representing the target object preset, (u)_last ^tar,v_last ^tar) Representing the actual position coordinates of the target object at the moment before it has left the field of view.

2. The target object following method of claim 1, wherein the step of preprocessing the image of the target object and obtaining the feature matrix of the image of the target object by using feature transformation comprises:

performing convolution calculation on the image of the target object to obtain a middle image block of the image of the target object;

wherein the feature transformation comprises FGOG feature transformation and RGB feature transformation.

3. The target object following method of claim 2, wherein when the feature matrix is the multi-channel feature matrix, the step of performing feature transformation on the intermediate image block to obtain the feature matrix of the image of the target object comprises:

wherein H_i ^FHOGIs the row, W, of the FHOG feature matrix at time i_i ^FHOGIs the column of the FHOG feature matrix at time i, H_i ^tar’Is the length of the initial target region at time i, W_i ^tar’Is the width of the initial target area at time i, cell is the parameter of the descriptor of the FHOG feature matrix, n_cellRefers to the size of the cell;

performing RGB feature transformation on the intermediate image block to obtain a three-channel feature matrix;

carrying out Gaussian down-sampling on the three-channel feature matrix to obtain an RGB feature matrix;

and splicing the FHOG characteristic matrix and the RGB characteristic matrix by a channel splicing method to obtain the multi-channel characteristic matrix of the image of the target object.

4. The target object following method of claim 1, wherein the step of determining the response matrix of the feature matrix using a target tracking algorithm comprises:

generating an initial target characteristic template and an initial parameter matrix of the multi-channel characteristic matrix in a Fourier domain by utilizing a KCF tracking algorithm;

wherein M is_i-1Is a target feature template at time i-1, T_i-1Is a parameter matrix at the i-1 th time, M_iIs a target feature template at time i, T_iIs a parameter matrix at the ith time, M_i' is the initial target feature template at time i, T_i' is an initial parameter matrix at the ith moment, and alpha and beta are scale factors;

P_i＝Γ^-1(M_i☉T_i)

5. The target object following method based on monocular vision sensor of mobile robot of claim 2, wherein the step of determining the area where the target object is located according to the center point and the outline frame of the target object comprises:

6. The target object following method based on monocular vision sensor of mobile robot as recited in claim 5, wherein the step of "obtaining the optimal scale of the target object in the image by the method of adjacent scale sampling and least square fitting" comprises:

carrying out scale transformation on the intermediate image block to obtain a first image block and a second image block;

obtaining a first multi-channel feature matrix and a second multi-channel feature matrix of the first image block and the second image block by using feature transformation;