CN111325770A - RGBD camera-based target following method, system and device - Google Patents

Info

Publication number
CN111325770A
Authority
CN
China
Prior art keywords
target
region
interest
distance
camera
Prior art date
Legal status
Granted
Application number
CN202010090067.4A
Other languages
Chinese (zh)
Other versions
CN111325770B (en)
Inventor
陈艳红
崔晓光
张吉祥
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202010090067.4A
Publication of CN111325770A
Application granted
Publication of CN111325770B
Legal status: Active

Classifications

    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06F 18/23: Clustering techniques
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 20/10: Terrestrial scenes
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06T 2207/10016: Video; image sequence
    • G06T 2207/10028: Range image; depth image; 3D point clouds
    • G06T 2207/30196: Human being; person


Abstract

The invention belongs to the field of robots and relates to a target following method, system and device based on an RGBD camera, aiming to solve the poor real-time performance and accuracy of existing target following methods. The method comprises: acquiring a depth image; extracting regions of interest from the depth image; screening the regions of interest that contain a human body; calculating the distance between each screened region of interest and the target followed in the previous frame, and removing a region of interest if this distance exceeds a set target moving radius; calculating the similarity between the color histogram of each remaining region of interest and the pre-stored color histogram of the target to be followed, treating a region as a possible candidate if its similarity exceeds a set threshold, and taking the candidate with the maximum similarity as the target region; and calculating the moving speed of the mobile device from the coordinates of the center point of the target region and controlling the device to move toward the target. The invention improves the real-time performance and accuracy of target following.

Description

RGBD camera-based target following method, system and device
Technical Field
The invention belongs to the field of robots, and particularly relates to a target following method, system and device based on an RGBD camera.
Background
Robot products built on intelligent technology benefit people's daily life and work. In settings where travelers carry large luggage, such as the waiting halls of high-speed railway stations and airports, an intelligent mobile device that carries luggage and walks autonomously can reduce the fatigue and inconvenience of travel; an intelligent following robot can provide this service.
The core problem an intelligent following robot must solve belongs to the field of target tracking. Unlike traditional surveillance-style target tracking, a following robot must not only track the target region in the image in real time but also know the physical position of the target in order to guide its own motion. Existing vision-based tracking methods first detect the position of the target on the image and then solve for its physical position by other means. Because the target moves dynamically and its area on the image changes constantly, a common solution is to select a number of candidate rectangular boxes of different sizes and positions near the target's location in the previous frame and compare them with the target model to determine the new target position. This selection of candidate boxes is tentative and blind: too many candidate boxes reduce the real-time performance of the algorithm; too few may fail to cover the target; boxes that are too large bring in too much environment information and cause model drift; and boxes that are too small cover too little of the target region, making the model fragile.
The invention aims to provide a target following method, system and device based on an RGBD (red, green, blue, depth) camera, so as to solve the above problems in existing robot following applications.
Disclosure of Invention
In order to solve the above problems in the prior art, that is, the poor following real-time performance and accuracy caused by the blind and tentative acquisition and update of candidate rectangular boxes in existing vision-based target following (tracking) methods, a first aspect of the present invention provides an RGBD camera-based target following method, comprising:
step S100, acquiring a depth image of the surrounding environment as an input image through an RGBD camera arranged on a mobile device;
step S200, extracting regions of interest from the input image, and constructing a region-of-interest set as a first set;
step S300, screening the regions of interest in the first set that contain a human body, and constructing a second set;
step S400, calculating the distance between the center point coordinates of each region of interest in the second set and the coordinates of the target followed in the previous frame, and removing the corresponding region of interest if the distance is larger than the set target moving radius, to obtain a third set;
step S500, calculating the similarity between the color histogram of each region of interest in the third set and the pre-stored color histogram of the target to be followed; if the similarity is greater than a set threshold, taking the corresponding region as a possible candidate region; taking the possible candidate region with the maximum similarity as the target region, and updating the target coordinates and the corresponding color histogram;
step S600, calculating the moving speed of the mobile device by a dual-threshold method according to the coordinates of the center point of the target region, and controlling the mobile device to move toward the target; after moving, jump to step S100 until the following task is finished.
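The per-frame selection in steps S300-S500 can be condensed into a minimal Python sketch. The ROI dictionaries, their field names and the `similarity` callback are illustrative stand-ins for the patent's data structures, not its actual implementation:

```python
import math

def select_target(rois, prev_target, target_hist, move_radius,
                  sim_threshold, similarity):
    """One S300-S500 style selection pass over candidate regions of interest.

    rois: dicts with 'center' (x, z) in metres, 'is_human' flag and 'hist'.
    Returns the chosen target ROI, or None if the target is lost this frame.
    """
    humans = [r for r in rois if r["is_human"]]                       # S300
    near = [r for r in humans                                         # S400
            if math.dist(r["center"], prev_target) <= move_radius]
    scored = [(similarity(r["hist"], target_hist), r) for r in near]  # S500
    scored = [(s, r) for s, r in scored if s > sim_threshold]
    return max(scored, key=lambda sr: sr[0])[1] if scored else None
```

On each frame the chosen ROI's center point would then feed the S600 speed computation, and its histogram would replace the stored target model.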
In some preferred embodiments, before the mobile device starts to follow the target to be followed, the method further comprises the step of acquiring the target to be followed:
step A100, acquiring a first region of interest set based on the method in the steps S100-S200;
step A200, calculating the width, height, center point coordinates and distance average value of each region of interest in the first region-of-interest set, and screening the regions by comparing their width and height with set human height and shoulder-width ranges, to obtain a second region-of-interest set; the distance average value is the average depth distance of the pixel points in the region of interest;
and step A300, if the second region-of-interest set is empty, executing step A100; otherwise, taking as the target to be followed the target corresponding to the region of interest in the second set that has the minimum distance average value and whose center point lies within a set threshold of the vertical center line of the camera's imaging plane.
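The A300 selection rule can be sketched as follows, assuming each ROI carries its mean depth and camera-frame center; the field names are illustrative:

```python
def lock_target(rois, center_tolerance):
    """A300 sketch: among the human-sized ROIs, keep those whose center lies
    within `center_tolerance` metres of the camera's vertical center line
    (x = 0 in the camera frame), then pick the one nearest the camera."""
    centred = [r for r in rois if abs(r["center"][0]) <= center_tolerance]
    if not centred:
        return None          # second set effectively empty: go back to A100
    return min(centred, key=lambda r: r["aver_depth"])
```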
In some preferred embodiments, the step S200 of "extracting a region of interest of the input image" includes:
clustering the depth distances of all pixel points in the input image by a preset clustering method, binarizing the input image according to the clustering result, and extracting regions of interest;
the preset clustering method comprises the following steps:
obtaining the depth distance corresponding to each pixel point in the input image, and constructing a depth distance set D = {d_1, d_2, ..., d_N}, where d_i denotes the depth distance of the i-th pixel point;
traversing the depth distance set D: take d_1 as the initial cluster center s_d1 and initialize the cluster center set S = {s_d1}. If d_i lies within the set radius R_d of some cluster center s_dj in S, assign d_i to that cluster and update the cluster center; if the distance from d_i to every cluster center in S is greater than R_d, take d_i as a new cluster center s_di and update S = {s_d1, ..., s_di}; repeat until the set D has been traversed. Here i, j denote subscripts.
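The traversal above amounts to a leader-style one-dimensional clustering; a hedged sketch, with each cluster center maintained as the running mean of its members:

```python
def cluster_depths(depths, radius):
    """Cluster 1-D depth values: each value joins the first cluster whose
    center lies within `radius` (updating that center to the running mean
    of its members); otherwise it seeds a new cluster center.

    Returns (centers, labels) with one label per input depth."""
    centers, counts, labels = [], [], []
    for d in depths:
        for j, c in enumerate(centers):
            if abs(d - c) <= radius:
                counts[j] += 1
                centers[j] += (d - c) / counts[j]   # running mean of members
                labels.append(j)
                break
        else:
            centers.append(d)        # d becomes a new cluster center
            counts.append(1)
            labels.append(len(centers) - 1)
    return centers, labels
```

Note that, as in the patent's description, the result depends on traversal order: a depth value is assigned to the first existing center within the radius.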
In some preferred embodiments, the width and height of the region of interest are calculated as:
width = rect_width * averDepth / fx
height = rect_ytop * averDepth / fy
where width and height are the width and height of the region of interest, rect_width is the pixel width of the region's circumscribed rectangle, rect_ytop is the row coordinate of the top-left corner of the circumscribed rectangle, averDepth is the distance average value, and fx and fy are the scale factors of the visible light camera along the horizontal and vertical axes.
In some preferred embodiments, the center point coordinates are the physical coordinates of the region of interest in the camera coordinate system, calculated as:
x = (u - cx) * averDepth / fx
y = (v - cy) * averDepth / fy
z = averDepth
where (u, v) are the image coordinates of the center of the circumscribed rectangle, (cx, cy) is the principal point of the RGBD camera's visible light camera, and (x, y, z) are the center point coordinates.
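Assuming the standard pinhole camera model implied by the listed variables, the center-point computation can be sketched as:

```python
def roi_center_3d(u, v, aver_depth, fx, fy, cx, cy):
    """Back-project the rectangle center (u, v) into the camera frame,
    using the ROI's mean depth as z (standard pinhole camera model)."""
    x = (u - cx) * aver_depth / fx
    y = (v - cy) * aver_depth / fy
    return x, y, aver_depth
```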
In some preferred embodiments, the moving speed includes a linear velocity and an angular velocity. The linear velocity is calculated as:
v = v_max * (z_target - R_stop) / dist_range
v = v < v_max ? v : v_max
v = v > 0 ? v : 0
dist_range = R_start - R_stop
The angular velocity is calculated as:
ω = ω_max * (x_target - sign(x_target) * x_stop) / x_range
ω = |ω| < ω_max ? ω : sign(ω) * ω_max
where v is the linear velocity, v_max the maximum linear velocity, dist_range the interval length over which the linear velocity increases from 0 to v_max, R_start and R_stop the linear-motion start and stop distance thresholds, (x_target, z_target) the target position updated from the center point coordinates of the target region, ω the angular velocity, ω_max the maximum angular velocity, x_range the interval length over which the angular velocity increases from 0 to ω_max (x_range = x_start - x_stop), and x_start and x_stop the rotation start and stop distance thresholds.
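Under this reading of the dual-threshold law, one possible sketch is as follows; the numeric defaults and the dead-zone handling below x_stop are illustrative assumptions, not values or details taken from the patent:

```python
def dual_threshold_speed(x_t, z_t, v_max=0.8, r_start=1.5, r_stop=0.9,
                         w_max=0.6, x_start=0.35, x_stop=0.1):
    """Hedged sketch of the dual-threshold speed computation: linear velocity
    ramps from 0 at z = R_stop up to v_max at z = R_start; angular velocity
    ramps from 0 at |x| = x_stop up to w_max at |x| = x_start, both clamped."""
    dist_range = r_start - r_stop
    v = v_max * (z_t - r_stop) / dist_range
    v = min(max(v, 0.0), v_max)                 # v in [0, v_max], never reverse
    if abs(x_t) <= x_stop:
        w = 0.0                                 # inside the rotation dead zone
    else:
        x_range = x_start - x_stop
        sign = 1.0 if x_t > 0 else -1.0
        w = w_max * (x_t - sign * x_stop) / x_range
        w = max(min(w, w_max), -w_max)          # |w| <= w_max
    return v, w
```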
In some preferred embodiments, step S600 is followed by step S700:
if the second set is empty, the third set is empty, and/or the number of possible candidate regions is 0, counting the duration for which the target has been lost; if this duration is greater than a set duration, stopping the following task, otherwise jumping to step S100.
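Step S700's loss timeout can be sketched as a small bookkeeping helper; the use of a monotonic clock and the injectable `now` parameter are assumptions of this sketch:

```python
import time

class LossTimer:
    """Track how long the target has been continuously lost (step S700):
    once the lost duration exceeds `timeout` seconds, the following task
    should stop; any successful detection resets the timer."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.lost_since = None

    def update(self, target_found, now=None):
        """Return True when following should stop, False to keep looping."""
        now = time.monotonic() if now is None else now
        if target_found:
            self.lost_since = None
            return False
        if self.lost_since is None:
            self.lost_since = now
        return (now - self.lost_since) > self.timeout
```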
A second aspect of the invention provides an RGBD camera-based target following system, comprising an acquisition module, a region-of-interest extraction module, a first screening module, a second screening module, a target region acquisition module and a device movement control module;
the acquisition module is configured to acquire a depth image of a surrounding environment as an input image through an RGBD camera arranged on the mobile device;
the region-of-interest extracting module is configured to extract a region of interest of the input image and construct a region-of-interest set as a first set;
the first screening module is configured to screen the region of interest containing the human body in the first set and construct a second set;
the second screening module is configured to calculate a distance between a center point coordinate of each region of interest in the second set and a coordinate of a target to be followed in the previous frame, and if the distance is larger than a set target moving radius, remove the corresponding region of interest to obtain a third set;
the target region acquisition module is configured to calculate the similarity between the color histogram of each region of interest in the third set and the pre-stored color histogram of the target to be followed; if the similarity is greater than a set threshold, the corresponding region is taken as a possible candidate region; the possible candidate region with the maximum similarity is taken as the target region, and the target coordinates and the corresponding color histogram are updated;
the device movement control module is configured to calculate the moving speed of the mobile device by a dual-threshold method according to the coordinates of the center point of the target region and to control the mobile device to move toward the target; after moving, control returns to the acquisition module until the following task is finished.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being loaded and executed by a processor to implement the RGBD camera-based object following method described above.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the RGBD camera based object following method described above.
The invention has the beneficial effects that:
the invention improves the real-time performance and accuracy of target following. The depth distance of each pixel in the depth image acquired by the RGBD camera is clustered, the depth image is binarized according to a clustering result, and the region of interest is extracted. And (3) the depth data is used in the target tracking process, and the image segmentation is realized by using the depth data to obtain rectangular frames of all non-adhesive objects in the space where the camera is located.
Meanwhile, candidate-box screening conditions are further designed based on the limited range of human walking motion, and the resulting candidate boxes are compared with the target model to achieve target following. Compared with blind, tentative candidate-box acquisition, the candidate boxes obtained by this method are more accurate and effective, and the method is robust to common tracking difficulties such as fast target motion, target deformation and target scale change.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.
FIG. 1 is a schematic flow chart of an RGBD camera-based target following method according to an embodiment of the present invention;
FIG. 2 is a block diagram of an RGBD camera based target following system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a hardware structure of an RGBD camera-based target-following mobile device according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an RGBD camera coordinate system according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a visible light image, a depth data mapping image and a depth data point cloud obtained by an RGBD camera according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of RGBD camera based target locking, following according to an embodiment of the present invention;
FIG. 7 is a schematic illustration of a target-lock modality of an embodiment of the invention;
FIG. 8 is a schematic diagram of clustering of depth data according to one embodiment of the invention;
FIG. 9 is a schematic diagram of image segmentation based on depth data according to an embodiment of the present invention;
FIG. 10 is a schematic illustration of a travel speed calculation according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of an occluded target in the following according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
In the following embodiments, the RGBD camera based target following mobile device is described first, and then the RGBD camera based target following method applied to the device is described.
1. Target following mobile device based on RGBD camera
The RGBD camera-based target following moving device of the present invention, as shown in fig. 3, includes: the system comprises an RGBD camera, an onboard processor, a display screen, a loudspeaker, a differential two-wheel driver and communication equipment (such as APP equipment) connected with a target following device;
the RGBD camera is used for acquiring a depth image, the depth image comprises a visible light image and corresponding depth data, the visible light image is a R, G, B three-channel color image, the depth data is the depth distance between the surface of each object in a three-dimensional space where the camera is located and an imaging plane of the camera, the width/height of the visible light image is colorW/colorH, the depth data is the depth distance between the surface of the object corresponding to each pixel point on the visible light image and the imaging plane of the camera, the visible light image and the depth data are used for initially locking and tracking a human body target, the RGBD camera is mounted in the middle of the upper portion of a front panel of the mobile device and is at a certain height from the ground, and in the embodiment of the invention, the ground distance height of the camera is preferably set to;
the system comprises an onboard processor, a following program, a following state issuing program and a following program, wherein the following program is used for calculating a motion control speed, the motion control speed is the control speed of a differential two-wheel driver and specifically comprises a linear speed and an angular speed (v, omega) (a forward speed and a turning speed), the following state specifically comprises ① starting following service ②, target locking success ③, target blocking ④, target losing ② 0, target locking ② 1 and ending following service, a state ① reminds a user to enter a specified area range right in front of the device for locking the target, in the specific embodiment of the invention, the area range is 50 centimeters to 1 meter right in front of the device, the state ② informs the user that the user can start normal walking, the state ③ reminds the user of blocking between the device and the user, the state ④ reminds the user of losing the target, the target needs to be locked again, the state ⑤ informs the user of waiting for the device to lock the target again, and the state ⑥ informs the user of ending the;
the display screen is used for displaying an operation interface, the operation interface comprises a two-dimensional code and a clickable button, the screen is a touch screen and can be manually clicked, and meanwhile, a user can scan the two-dimensional code on the screen through App equipment and start following service on the App equipment;
the loudspeaker is used for broadcasting the following state issued by the onboard processor and broadcasting different voice contents according to different following states, for example, in the specific embodiment of the invention, the state ① broadcasts 'start following service, please stand in the range of 50 cm to 1 m in front of me and wait for two seconds of me', and corresponding voice contents are broadcasted in other states on the basis of comfortable user experience;
the following device can move forwards, rotate left and rotate right and can not move backwards in the following process, the motion control speed calculated by the onboard processor is the mass center speed of the differential two-wheel driver, the mass center speed is converted into the respective speeds of the two wheels to follow the differential kinematics model, and the details are omitted;
app equipment, the user can start following the service through the two-dimensional code on the App equipment scanning screen, shows operation interface on App equipment, and operation interface contains the button that can click, and in addition App equipment can vibrate when appearing following the target and lose and remind, and App equipment carries out the information interaction through wireless communication and airborne processor.
The basic operation flow of the mobile device is as follows: when the user starts the following service by clicking a screen button or scanning the two-dimensional code with the App device, the onboard processor receives the control instruction to enter following mode and starts the following process. The loudspeaker instructs the user to stand within the specified range in front of the camera, and the device enters target-locking mode. After the target is locked successfully, the color model and target position are initialized and the device enters following-motion mode; during following, the onboard processor publishes the following state in real time. If the target is lost while following, the onboard processor triggers a voice broadcast on the loudspeaker and a vibration reminder on the App device, informing the user to stand again within the specified range in front of the camera; the device switches back to target-locking mode, and after the target is re-acquired it re-enters following-motion mode and continues following, until the user clicks a button on the screen or the App device to end the following service.
2. RGBD camera-based target following method
The invention discloses an RGBD camera-based target following method, which comprises the following steps:
step S100, acquiring a depth image of the surrounding environment as an input image through an RGBD camera arranged on a mobile device;
step S200, extracting regions of interest from the input image, and constructing a region-of-interest set as a first set;
step S300, screening the regions of interest in the first set that contain a human body, and constructing a second set;
step S400, calculating the distance between the center point coordinates of each region of interest in the second set and the coordinates of the target followed in the previous frame, and removing the corresponding region of interest if the distance is larger than the set target moving radius, to obtain a third set;
step S500, calculating the similarity between the color histogram of each region of interest in the third set and the pre-stored color histogram of the target to be followed; if the similarity is greater than a set threshold, taking the corresponding region as a possible candidate region; taking the possible candidate region with the maximum similarity as the target region, and updating the target coordinates and the corresponding color histogram;
step S600, calculating the moving speed of the mobile device by a dual-threshold method according to the coordinates of the center point of the target region, and controlling the mobile device to move toward the target; after moving, jump to step S100 until the following task is finished.
In order to more clearly describe the RGBD camera-based object following method of the present invention, the following describes steps in an embodiment of the method in detail with reference to the drawings.
In step S100, a depth image of a surrounding environment is acquired as an input image by an RGBD camera provided on a mobile device.
Fig. 1 is a schematic flow chart of the RGBD camera-based target following method of the present invention, and the specific processing is detailed in the following steps.
In this embodiment, before the mobile device starts to follow, the target is first locked, that is, the target to be followed is acquired. The steps of acquiring the target to be followed are shown in FIG. 6 and FIG. 7; the specific processing procedure is as follows:
step A100, obtaining a depth image, and extracting a region of interest set as a first region of interest set.
When the user starts the following service by clicking a screen button or scanning the two-dimensional code with the App device, the user is reminded to stand within a preset range in front of the RGBD camera, and a depth image of the surrounding environment is acquired by the RGBD camera arranged on the mobile device. The depth image comprises a visible light image and depth data, the depth data being the depth distance from the object surface corresponding to each pixel of the visible light image to the camera's imaging plane. A depth distance set D = {d_1, d_2, ..., d_N} is constructed, with N = colorW × colorH, where d_i denotes the depth distance of the i-th pixel. As shown in the camera coordinate system of FIG. 4, the depth distance refers specifically to the z-axis coordinate, and the length of the depth data set equals the number of pixels of the visible light image. The final purpose of this step is to segment suspected persons out of the image. The depth data is the distance between object surfaces and the camera's imaging plane; the mobile device runs on flat indoor ground with the camera's line of sight horizontal, and a pedestrian normally walks upright, so the depth distance set can be clustered with a radius R_d (in this embodiment, R_d is preferably 40 cm), yielding a cluster center set S = {s_d1, s_d2, ..., s_dn}, where n denotes the number of cluster centers. The clustering method of the invention is similar to the mean-shift clustering algorithm; the clustered depth data is shown in FIG. 8. The specific method is as follows:
Traverse the depth distance set D, taking d1 as the initial cluster center sd1 and initializing the cluster center set S = {sd1}. If di falls within radius Rd of any cluster center sdj in S, it is assigned to that cluster, and the cluster center is updated as sdj = (Σdk + di)/(m + 1), dk ∈ sdj, where Σdk is the sum of the elements already in cluster sdj and m is the number of existing elements. If the distances from di to all cluster centers in S are greater than Rd, di is taken as a new cluster center sdi and S is updated to S = {sd1, …, sdi}. This continues until the set D is traversed; here i, j and k represent subscripts.
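The clustering procedure above can be sketched in Python (a minimal illustration of the mean-shift-like scheme, not the patent's source code; the function name `cluster_depths` and the one-dimensional list input are assumptions for the sketch):

```python
def cluster_depths(depths, radius=0.4):
    # Incremental 1-D clustering of depth distances (meters): each depth
    # joins the first cluster center within `radius`, and that center is
    # updated to the running mean of its members, (sum + d) / (m + 1);
    # otherwise the depth seeds a new cluster center.
    centers, counts, labels = [], [], []
    for d in depths:
        for j, c in enumerate(centers):
            if abs(d - c) <= radius:
                centers[j] = (c * counts[j] + d) / (counts[j] + 1)
                counts[j] += 1
                labels.append(j)
                break
        else:
            centers.append(d)
            counts.append(1)
            labels.append(len(centers) - 1)
    return centers, labels
```

With the preferred radius of 40 cm, depths around a walking person collapse into one cluster while the background forms separate clusters.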
Traverse S: for each depth cluster center, traverse the depth data (which corresponds one-to-one to the pixels of the visible light image), mark the pixels whose depth lies within radius Rd of the cluster center as white and the rest as black, thus obtaining a binarized image, as shown in fig. 5. Regions of interest are extracted from the binarized image, and a region-of-interest set SImg = {simg1, simg2, …, simgm} is constructed, where m denotes the number of regions. Fig. 9 shows the region of interest obtained from the image marked within radius Rd of a certain depth cluster center after clustering one frame of depth data in the embodiment of the present invention.
Step A200, calculating the width, height, center point coordinates and distance average value of each region of interest in the first region of interest set, and screening the regions of interest by comparing their width and height against set human shoulder-width and height ranges to obtain a second region of interest set; the distance average value is the average of the depth distances of the pixel points in the region of interest.
In this embodiment, the distance average value averDepth (the average depth of the pixel set from the camera), the width of the region, the height of the region, and the center point coordinates (the physical coordinates (x, y, z) of the center point of the region's circumscribed rectangle in the camera coordinate system) are computed for each region of interest, and the first region of interest set is screened according to the general height range and shoulder width range of the human body to obtain a region of interest set meeting the anthropometric conditions, which is used as the second region of interest set.
The distance average value calculation method for each region of interest is shown in formula (1):
averDepth = (1/n) * Σ di, i = 1, …, n    (1)

wherein n is the number of pixel points in the region of interest and di is the depth distance of the i-th pixel point.
the calculation method of the physical coordinates of the center point of the circumscribed rectangle of the region of interest in the camera coordinate system is shown in formula (2):
x = (u - cx) * averDepth / fx
y = (v - cy) * averDepth / fy
z = averDepth    (2)
wherein (u, v) are the coordinates of the center of the circumscribed rectangle of the region of interest on the image, fx and fy are the scale factors of the visible light camera along the horizontal and vertical axes, and (cx, cy) is the principal point of the visible light camera of the RGBD camera.
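Formula (2) is the standard pinhole back-projection; assuming that reading, a sketch (the function name and argument layout are illustrative, not from the patent):

```python
def pixel_to_camera(u, v, aver_depth, fx, fy, cx, cy):
    # Back-project the circumscribed-rectangle center (u, v), at the
    # region's average depth, into camera coordinates (x, y, z): the
    # average depth gives z directly, and x, y follow from the pinhole
    # model using the intrinsics fx, fy, cx, cy.
    x = (u - cx) * aver_depth / fx
    y = (v - cy) * aver_depth / fy
    return x, y, aver_depth
```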
The calculation method of the width and the height of the region of interest is shown in the formulas (3) and (4):
width = rect_width * averDepth / fx    (3)

height = rect_ytop * averDepth / fy    (4)

wherein rect_width is the pixel width of the circumscribed rectangle of the region of interest, and rect_ytop is the row coordinate of the top-left corner pixel of that circumscribed rectangle.
In the embodiment of the present invention, the height range of the human body is preferably set to (1.1 m to 1.9 m), and the shoulder width range is preferably set to (15 cm to 60 cm).
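Under the preferred ranges above, the anthropometric gate of step A200 can be sketched as follows (an illustrative helper, not the patent's implementation; the function name and the example camera intrinsics are assumptions):

```python
def passes_body_check(rect_width_px, rect_ytop_px, aver_depth, fx, fy,
                      height_range=(1.1, 1.9), shoulder_range=(0.15, 0.60)):
    # Formulas (3) and (4): convert the circumscribed rectangle's pixel
    # width and top-row coordinate to metric size at the region's average
    # depth, then gate against the preferred shoulder-width and height
    # ranges (meters) given in the text.
    width = rect_width_px * aver_depth / fx     # formula (3)
    height = rect_ytop_px * aver_depth / fy     # formula (4)
    return (shoulder_range[0] <= width <= shoulder_range[1] and
            height_range[0] <= height <= height_range[1])
```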
Step A300, if the second region of interest set is empty, step A100 is executed again; otherwise, the target corresponding to the region of interest in the second set that has the smallest distance average value and whose center point coordinate lies within a set threshold range of the vertical center line of the camera imaging plane is taken as the target to be followed; the threshold is preferably set to ±25 cm.
From the obtained second region of interest set, the region of interest with the smallest distance average value (average depth from the camera) and whose center point coordinate lies within the threshold range of the vertical center line of the camera imaging plane (that is, the center of the region's circumscribed rectangle is within a certain range to the left or right of that center line) is selected as the target to be followed.
If the screened region of interest set is empty, target locking for the current frame is considered to have failed and the image is acquired again; otherwise, target locking for the current frame is considered successful.
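The selection rule of step A300 can be sketched as follows (each region of interest is represented as a dict with an `aver_depth` and a rectangle-center `x` in meters; these field names are illustrative assumptions):

```python
def pick_lock_target(regions, center_tol=0.25):
    # Keep regions whose rectangle center lies within +/- center_tol of
    # the vertical center line of the camera imaging plane, then lock
    # onto the nearest one (smallest average depth). Returning None
    # means locking failed for this frame and a new image is acquired.
    centered = [r for r in regions if abs(r['x']) <= center_tol]
    if not centered:
        return None
    return min(centered, key=lambda r: r['aver_depth'])
```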
In the invention, in order to further improve the success rate of target locking, multi-frame depth images are continuously acquired for confirming the target to be followed. The specific process is as follows:
When target locking has remained successful for more than N seconds, the target locking mode ends; the user is informed through loudspeaker broadcast that target locking has succeeded and the following service can start, and the user is reminded to walk normally. In the embodiment of the invention, N is set to 2.
Each frame in which target locking succeeds yields one image region of interest, so a locking-success duration greater than N seconds yields about N × fps image regions of interest, where fps is the frame rate of the image; in the embodiment of the invention, fps is 15 frames per second. The region of interest with the largest contour area is selected as the final following target; the target color model (color histogram) is initialized with the image region covered by this region of interest on the visible light image, the target position is initialized with the physical coordinates of the center point of the region's circumscribed rectangle, and the following movement mode is entered.
Step S200, extracting the interested region of the input image, and constructing an interested region set as a first set.
In this embodiment, the specific processing procedure of this step is as in step a100 described above. The detailed description is not repeated here.
Step S300, screening the regions of interest containing the human body in the first set, and constructing a second set.
In this embodiment, the specific processing procedure of this step is as in steps A200-A300 above: the second set is obtained by comparing the width and height with the set human height and shoulder width; if the second set is empty, the method jumps to step S700.
Step S400, calculating the distance between the center point coordinate of each interested area in the second set and the coordinate of the target to be followed in the previous frame, and if the distance is larger than the set target moving radius, removing the corresponding interested area to obtain a third set.
In this embodiment, the second set is traversed, and the distance between the physical coordinates of each region of interest and the position of the target to be followed in the previous frame, that is, the moving distance D′move of the target, is calculated as shown in formula (5):
D′move = √((x′ - xtarget)² + (z′ - ztarget)²)    (5)
wherein (x′, z′) are the physical coordinates of the current region of interest and (xtarget, ztarget) is the stored target position of the previous frame. Since the mobile device runs indoors on level ground, only the two-dimensional coordinates x and z are needed.
A target moving radius Rmove is set by counting the maximum walking speed of a person, and the regions of interest are screened according to the distance between the physical coordinates of each region of interest and the target position in the previous frame: if D′move is less than Rmove, the region of interest is retained; otherwise it is deleted. If the set of regions of interest within the target moving range is not empty, target screening continues; otherwise, the method jumps to step S700.
According to statistics, the maximum walking speed of a person is about 1.2 meters per second under normal conditions. Because the image data is processed in real time and the frame rate in the embodiment of the invention is 15 frames per second, the distance a person can move between the previous frame and the current frame is about 0.08 meters. On this basis the regions of interest can be screened according to the target movement range; to be conservative, Rmove is preferably set to 20 cm in the embodiment of the invention.
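The screening of formula (5) amounts to a planar distance test (an illustrative helper; the tuple layout `(x, z)` and the function name are assumptions):

```python
import math

def within_move_radius(roi_xz, prev_target_xz, r_move=0.20):
    # Formula (5): planar (x, z) distance between a candidate region and
    # the previous frame's target position; regions farther than the
    # target moving radius r_move (preferred 20 cm) are discarded.
    d_move = math.hypot(roi_xz[0] - prev_target_xz[0],
                        roi_xz[1] - prev_target_xz[1])
    return d_move <= r_move
```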
Step S500, calculating a similarity between the color histogram corresponding to each region of interest in the third set and a pre-stored color histogram of the target to be followed, if the similarity is greater than a set threshold, taking the color histogram as a possible candidate region, taking the possible candidate region with the maximum similarity as a target region, and updating the target coordinates and the color histogram corresponding to the target region.
In this embodiment, according to the regions of interest screened in step S400, the target color model (color histogram) of each region of interest is obtained and compared for similarity with the color histogram of the target to be followed that was obtained at target locking. If the similarity is greater than a set threshold threshcolor, the corresponding region of interest is taken as a possible candidate region, and the candidate region with the maximum similarity is selected as the new target, that is, the region of the target to be followed in the current frame image. If there is no possible candidate region, the method jumps to step S700.
Because the embodiment of the present invention defines the target color model as a three-channel statistical histogram (color histogram) over R, G, B of the visible light image within the region of interest, the comparison value range is (0, 1), and the more similar two color histograms are, the closer the comparison value is to 1. In the embodiment of the present invention, the threshold threshcolor is preferably set to 0.3.
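The patent does not name its histogram comparison metric; histogram intersection is one common choice that matches the stated value range (close to 1 for similar histograms), so the sketch below is an assumption on that point:

```python
import numpy as np

def rgb_histogram(pixels, bins=16):
    # Three-channel statistical histogram of an (N, 3) array of R, G, B
    # values in [0, 255], concatenated and normalized to sum to 1.
    hists = [np.histogram(pixels[:, c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def histogram_similarity(h1, h2):
    # Histogram intersection: 1.0 for identical normalized histograms,
    # approaching 0 for histograms with no overlapping bins.
    return float(np.minimum(h1, h2).sum())
```

A candidate would then pass when `histogram_similarity(...)` exceeds the preferred threshold of 0.3.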
The position of the target to be followed in the current frame image and the corresponding color histogram are acquired for updating; the physical coordinates (x, y, z) of the center point of the region's circumscribed rectangle in the camera coordinate system are used to update the target position (xtarget, ztarget).
Step S600, calculating the moving speed of the mobile equipment by a double-threshold method according to the coordinates of the central point of the target area, and controlling the mobile equipment to move towards the target; after the movement, the step S100 is skipped until the follow task is finished.
In this embodiment, the moving speed is calculated with a dual-threshold method. With dual-threshold control, motion switching happens over a range interval rather than at a single threshold, which improves motion robustness and reduces motion oscillation. The moving speed is divided into a linear velocity and an angular velocity; the linear velocity is calculated as shown in formulas (6), (7), (8) and (9):
v = vmax * (ztarget - Rstop) / distrange    (6)
v = v < vmax ? v : vmax    (7)

v = v > 0 ? v : 0    (8)
motion starts when ztarget > Rstart and stops when ztarget ≤ Rstop; between the two thresholds the previous state is kept    (9)
wherein v is the linear velocity, vmax is the maximum linear velocity, distrange is the interval length required for the linear velocity to increase from 0 to vmax, and Rstart and Rstop are the straight-line start distance threshold and straight-line stop distance threshold.
The angular velocity is calculated by the following formulas (10) and (11):
ω = sign(xtarget) * ωmax * (|xtarget| - xstop) / xrange    (10)
ω = |ω| < ωmax ? ω : sign(ω) * ωmax    (11)
wherein ω is the angular velocity, ωmax is the maximum angular velocity, xstart and xstop are the rotation start distance threshold and rotation stop distance threshold, and xrange is the interval length required for the angular velocity to increase from 0 to ωmax.
As shown in fig. 10, in the present embodiment, vmax, ωmax, distrange and xrange are determined from the motion characteristics of the differential two-wheel drive; Rstart and Rstop are determined according to the safe distance the mobile device needs to maintain from the pedestrian; and xstart and xstop are determined from the RGBD camera field of view. Preferably, vmax is set to 1 meter per second, ωmax to 0.6 radians per second, distrange to 0.5 meters, xrange to 0.4 meters, Rstart to 0.6 meters, Rstop to 0.5 meters, xstart to 0.2 meters, and xstop to 0.1 meters.
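A minimal sketch of the dual-threshold speed law, assuming a linear ramp between the stop threshold and the maximum speed (formulas (6)-(11) are reproduced as images in the source, so the exact ramp shape here is an assumption consistent with the variable definitions and preferred thresholds):

```python
def dual_threshold_speed(err, start_th, stop_th, ramp, v_max, was_moving):
    # Generic dual-threshold law: motion starts only once `err` exceeds
    # start_th and stops only once it falls to stop_th or below
    # (hysteresis over a range interval, not a single threshold), with
    # the speed ramping linearly from 0 to v_max over `ramp`.
    moving = err > (stop_th if was_moving else start_th)
    if not moving:
        return 0.0, False
    v = v_max * (err - stop_th) / ramp
    return min(max(v, 0.0), v_max), True
```

For the linear velocity, `err` would be ztarget with the preferred thresholds Rstart = 0.6 m and Rstop = 0.5 m; for the angular velocity, the same law would be applied to |xtarget| with the rotation thresholds, and the sign of xtarget applied to the returned magnitude.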
During movement, if the following target is occluded, the device stops moving, and the user is reminded by voice broadcast that occlusion has occurred and asked to slow down. The method for judging that the target is occluded is shown in fig. 11: the depth data point cloud is projected onto the ZX plane to obtain a temporary map of the local environment of the device; since the coordinates of the mobile device on the temporary map are known, it is judged whether any other obstacle lies on the line connecting the device and the target center. If an obstacle is present, the device stops moving, and the voice broadcast reminds the user of the occlusion and asks the user to slow down.
Step S700, counting the time length of target loss, if the time length is greater than the set time length, ending the following task, otherwise, skipping to step S100.
In this embodiment, if the region of interest set screened in any of steps S300 to S500 is empty, the target is considered lost in the current frame. If the target was lost in the previous frame and is also lost in the current frame, this is called continuous target loss; when the continuous target loss lasts longer than M seconds, the following target is considered lost and the system switches back to the target locking mode.
In practice, the following target can be lost when the user suddenly accelerates, turns sharply, or is occluded by other objects. The human body following method cannot absolutely guarantee that the target is never lost, but it can detect the loss in time and give a timely voice broadcast reminder; with the user's cooperation, the target is re-acquired after loss and the following service continues normally. In addition, target loss basically does not occur when the user walks normally (without fast walking or running), turns normally, and the density of obstacles in the environment is sparse (taking as reference that the space allows the mobile device to rotate in place; the turning radius of the mobile device in the embodiment of the invention is 0.5 meters).
When the following target is lost, the onboard processor broadcasts through the loudspeaker that following the target has failed and asks the user to stand again in the designated position area in front of the mobile device; at the same time, the App device vibrates to remind the user to re-lock the target within the designated area in front of the mobile device, and reminds the user to walk normally and continue in following mode after the target is successfully locked. The RGBD camera-based human body following method switches between the target locking mode and the following movement mode according to the actual situation until the user clicks a button on the App device or the screen to end the following service.
An RGBD camera-based target following system according to a second embodiment of the present invention, as shown in fig. 2, includes: the system comprises an acquisition module 100, an interesting region extraction module 200, a first screening module 300, a second screening module 400, an object region acquisition module 500 and a control equipment moving module 600;
the acquiring module 100 is configured to acquire a depth image of a surrounding environment as an input image through an RGBD camera disposed on a mobile device;
the region-of-interest extracting module 200 is configured to extract a region of interest of the input image, and construct a region-of-interest set as a first set;
the first screening module 300 is configured to screen regions of interest including human bodies in the first set, and construct a second set;
the second screening module 400 is configured to calculate a distance between a center point coordinate of each region of interest in the second set and a coordinate of the target to be followed in the previous frame, and if the distance is greater than a set target movement radius, remove the corresponding region of interest to obtain a third set;
the target region obtaining module 500 is configured to calculate similarity between a color histogram corresponding to each region of interest in the third set and a color histogram of a pre-stored target to be followed, if the similarity is greater than a set threshold, the color histogram is used as a possible candidate region, and the possible candidate region with the maximum similarity is used as a target region;
the control device moving module 600 is configured to calculate the moving speed of the mobile device by a dual-threshold method according to the coordinates of the central point of the target area, and to control the mobile device to move toward the target; after moving, the process jumps back to the acquisition module 100 until the following task is finished.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.
It should be noted that, the RGBD camera-based target following system provided in the foregoing embodiment is only illustrated by the division of the functional modules, and in practical applications, the functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device according to a third embodiment of the present invention stores therein a plurality of programs adapted to be loaded by a processor and to implement the above-described RGBD camera-based object following method.
A processing apparatus according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the RGBD camera based object following method described above.
It can be clearly understood by those skilled in the art that, for convenience and brevity, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method examples, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative modules, method steps, and modules described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules, method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. An RGBD camera-based target following method is characterized by comprising the following steps:
step S100, acquiring a depth image of a surrounding environment as an input image through an RGBD (red, green and blue) camera arranged on mobile equipment;
step S200, extracting the region of interest of the input image, and constructing a region of interest set as a first set;
s300, screening the interested regions containing the human body in the first set, and constructing a second set;
step S400, calculating the distance between the center point coordinate of each interested area in the second set and the coordinate of the target to be followed in the previous frame, and removing the corresponding interested area if the distance is larger than the set target moving radius to obtain a third set;
step S500, calculating the similarity between the color histogram corresponding to each interested area in the third set and the color histogram of the pre-stored target to be followed, if the similarity is greater than a set threshold, taking the color histogram as a possible candidate area, taking the possible candidate area with the maximum similarity as a target area, and updating the target coordinates and the corresponding color histogram;
step S600, calculating the moving speed of the mobile equipment by a double-threshold method according to the coordinates of the central point of the target area, and controlling the mobile equipment to move towards the target; after the movement, the step S100 is skipped until the follow task is finished.
2. The RGBD-camera-based target following method according to claim 1, further comprising the step of acquiring the target to be followed before the mobile device starts following the target to be followed:
step A100, acquiring a first region of interest set based on the method in the steps S100-S200;
step A200, calculating the width, height, center point coordinates and distance average values of each interested area in the first interested area set, and screening each interested area according to the comparison of the width and height with the height and shoulder width ranges of a set human body to obtain a second interested area set; the distance average value is the average value of the depth distance of each pixel point in the region of interest;
and step A300, if the second interested area set is empty, executing the step A100, otherwise, taking the target corresponding to the interested area with the minimum distance average value and the distance between the center point coordinate and the vertical center line of the imaging plane of the camera within a set threshold range in the second set as the target to be followed.
3. The RGBD-camera-based target following method according to claim 1, wherein the step S200 of "extracting the region of interest of the input image" comprises:
clustering depth distances corresponding to all pixel points in the input image by a preset clustering method, carrying out binarization processing on the input image according to a clustering result, and extracting an interested region;
the preset clustering method comprises the following steps:
obtaining the depth distance corresponding to each pixel point in the input image, and constructing a depth distance set D = {d1, d2, …, dN}, where di denotes the depth distance of the i-th pixel point;
traversing the depth distance set D, taking d1 as the initial cluster center sd1 and initializing the cluster center set S = {sd1}; if di falls within the set radius Rd of any cluster center sdj in S, classifying it into that cluster and updating the cluster center; if the distances from di to all cluster centers in S are greater than Rd, taking di as a new cluster center sdi and updating S = {sd1, …, sdi}; until the set D is traversed; wherein i and j represent subscripts.
4. The RGBD camera-based target following method according to claim 1, wherein the width and height of the region of interest are calculated as follows:
width = rect_width * averDepth / fx

height = rect_ytop * averDepth / fy

wherein width and height are the width and height of the region of interest, rect_width is the pixel width of the circumscribed rectangle of the region of interest, rect_ytop is the row coordinate of the top-left corner pixel of the circumscribed rectangle, averDepth is the distance average value, and fx and fy are the scale factors of the visible light camera along the horizontal and vertical axes.
5. The RGBD camera-based target following method according to claim 4, wherein the center point coordinates are physical coordinates of the region of interest in a camera coordinate system, and the calculation method is as follows:
x = (u - cx) * averDepth / fx
y = (v - cy) * averDepth / fy
z = averDepth
wherein, (u, v) is coordinates of the center of the circumscribed rectangle of the region of interest on the image, (cx, cy) is a principal point of the visible light camera of the RGBD camera, and (x, y, z) is coordinates of the center point.
6. The RGBD camera-based object following method according to claim 5, wherein the moving speed comprises a linear speed and an angular speed; the linear velocity calculation method comprises the following steps:
v = vmax * (ztarget - Rstop) / distrange

v = v < vmax ? v : vmax

v = v > 0 ? v : 0

motion starts when ztarget > Rstart and stops when ztarget ≤ Rstop; between the two thresholds the previous state is kept
the angular velocity is calculated by the following method:
ω = sign(xtarget) * ωmax * (|xtarget| - xstop) / xrange

ω = |ω| < ωmax ? ω : sign(ω) * ωmax
wherein v is the linear velocity, vmax is the maximum linear velocity, distrange is the interval length required for the linear velocity to increase from 0 to vmax, Rstart and Rstop are the straight-line start and stop distance thresholds, xtarget and ztarget are the target position updated according to the center point coordinates of the target area, ω is the angular velocity, ωmax is the maximum angular velocity, xstart and xstop are the rotation start and stop distance thresholds, and xrange is the interval length required for the angular velocity to increase from 0 to ωmax.
7. The RGBD camera based target following method according to any of the claims 1-6, wherein step S600 is followed by step S700,
if the second set is empty and/or the third set is empty and/or the number of the possible candidate areas is 0, counting the time length of the target loss, if the time length is greater than the set time length, stopping the following task, otherwise, skipping to the step S100.
8. An RGBD camera-based target following system, comprising: the system comprises an acquisition module, an interesting region extraction module, a first screening module, a second screening module, a target region acquisition module and a control equipment moving module;
the acquisition module is configured to acquire a depth image of a surrounding environment as an input image through an RGBD camera arranged on the mobile device;
the region-of-interest extracting module is configured to extract a region of interest of the input image and construct a region-of-interest set as a first set;
the first screening module is configured to screen the region of interest containing the human body in the first set and construct a second set;
the second screening module is configured to calculate a distance between a center point coordinate of each region of interest in the second set and a coordinate of a target to be followed in the previous frame, and if the distance is larger than a set target moving radius, remove the corresponding region of interest to obtain a third set;
the target area obtaining module is configured to calculate similarity between a color histogram corresponding to each region of interest in the third set and a color histogram of a pre-stored target to be followed, if the similarity is greater than a set threshold, the color histogram is used as a possible candidate area, the possible candidate area with the maximum similarity is used as a target area, and target coordinates and the color histogram corresponding to the target coordinate are updated;
the control equipment moving module is configured to calculate the moving speed of the mobile equipment through a double-threshold method according to the coordinates of the central point of the target area and control the mobile equipment to move towards the target; and after moving, skipping the acquisition module until the follow-up task is finished.
9. A storage device having stored therein a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the RGBD camera based target following method as claimed in any one of claims 1-7.
10. A processing device comprising a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; characterised in that the program is adapted to be loaded and executed by a processor to implement an RGBD camera based object following method as claimed in any of claims 1-7.
CN202010090067.4A 2020-02-13 2020-02-13 RGBD camera-based target following method, system and device Active CN111325770B (en)

Publications (2)

Publication Number Publication Date
CN111325770A true CN111325770A (en) 2020-06-23
CN111325770B CN111325770B (en) 2023-12-22

Family

ID=71168808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010090067.4A Active CN111325770B (en) 2020-02-13 2020-02-13 RGBD camera-based target following method, system and device

Country Status (1)

Country Link
CN (1) CN111325770B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268480A (en) * 2013-05-30 2013-08-28 重庆大学 System and method for visual tracking
US20130266175A1 (en) * 2012-04-09 2013-10-10 GM Global Technology Operations LLC Road structure detection and tracking
US20150206004A1 (en) * 2014-01-20 2015-07-23 Ricoh Company, Ltd. Object tracking method and device
CN104899590A (en) * 2015-05-21 2015-09-09 深圳大学 Visual target tracking method and system for unmanned aerial vehicle
CN105825524A (en) * 2016-03-10 2016-08-03 浙江生辉照明有限公司 Target tracking method and apparatus
CN107170011A (en) * 2017-04-24 2017-09-15 杭州司兰木科技有限公司 A kind of robot vision tracking and system
US20190057244A1 (en) * 2017-08-18 2019-02-21 Autel Robotics Co., Ltd. Method for determining target through intelligent following of unmanned aerial vehicle, unmanned aerial vehicle and remote control
US20190114501A1 (en) * 2017-10-16 2019-04-18 Wistron Corporation Target tracking method and system adaptable to multi-target tracking
CN109949375A (en) * 2019-02-02 2019-06-28 浙江工业大学 A kind of mobile robot method for tracking target based on depth map area-of-interest
CN110378218A (en) * 2019-06-13 2019-10-25 大亚湾核电运营管理有限责任公司 A kind of image processing method, device and terminal device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YU Tiejun; ZOU Zhen: "Object Detection and Tracking Method for Image Sequences Based on Color Difference Histograms", Fujian Computer, no. 10, pages 119 - 122 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115588099A (en) * 2022-11-02 2023-01-10 北京鹰之眼智能健康科技有限公司 Region-of-interest display method, electronic device and storage medium
CN115690400A (en) * 2022-11-02 2023-02-03 北京鹰之眼智能健康科技有限公司 Infrared image display method
CN115588099B (en) * 2022-11-02 2023-05-30 北京鹰之眼智能健康科技有限公司 Region of interest display method, electronic device and storage medium
CN115690400B (en) * 2022-11-02 2024-01-23 北京鹰之眼智能健康科技有限公司 Infrared image display method

Also Published As

Publication number Publication date
CN111325770B (en) 2023-12-22

Similar Documents

Publication Publication Date Title
Zhang et al. Vehicle tracking and speed estimation from roadside lidar
US11915502B2 (en) Systems and methods for depth map sampling
CN112567201B (en) Distance measuring method and device
CN111780771B (en) Positioning method, positioning device, electronic equipment and computer readable storage medium
CN102313547B (en) Vision navigation method of mobile robot based on hand-drawn outline semantic map
Siegemund et al. Curb reconstruction using conditional random fields
CN113421289B (en) High-precision vehicle track data extraction method for overcoming unmanned aerial vehicle shooting disturbance
US20130169822A1 (en) Camera calibration using feature identification
CN113064135B (en) Method and device for detecting obstacle in 3D radar point cloud continuous frame data
CN111552764B (en) Parking space detection method, device, system, robot and storage medium
CN111325770B (en) RGBD camera-based target following method, system and device
CN111213153A (en) Target object motion state detection method, device and storage medium
WO2023179027A1 (en) Road obstacle detection method and apparatus, and device and storage medium
CN117576652A (en) Road object identification method and device, storage medium and electronic equipment
CN113804182B (en) Grid map creation method based on information fusion
US11496674B2 (en) Camera placement guidance
JP2004294421A (en) Method and device for self-position measurement
Salah et al. Summarizing large scale 3D mesh for urban navigation
CN117949942B (en) Target tracking method and system based on fusion of radar data and video data
CN117152719B (en) Weeding obstacle detection method, weeding obstacle detection equipment, weeding obstacle detection storage medium and weeding obstacle detection device
US20230020776A1 (en) Flexible multi-channel fusion perception
KR102540636B1 (en) Method for create map included direction information and computer program recorded on record-medium for executing method therefor
JP2993610B2 (en) Image processing method
CN117152210A (en) Image dynamic tracking method and related device based on dynamic observation field angle
JP2993611B2 (en) Image processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant