CN111325770B - RGBD camera-based target following method, system and device - Google Patents


Info

Publication number: CN111325770B (granted publication of application CN202010090067.4A)
Authority: CN (China)
Prior art keywords: target, region, interest, distance, following
Legal status: Active (granted)
Application number: CN202010090067.4A
Other languages: Chinese (zh)
Other versions: CN111325770A
Inventors: 陈艳红 (Chen Yanhong), 崔晓光 (Cui Xiaoguang), 张吉祥 (Zhang Jixiang)
Current Assignee: Institute of Automation of Chinese Academy of Science
Original Assignee: Institute of Automation of Chinese Academy of Science
Application filed by the Institute of Automation of Chinese Academy of Science; priority to CN202010090067.4A

Classifications

    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06F 18/23: Pattern recognition; clustering techniques
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 20/10: Scenes; terrestrial scenes
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06T 2207/10016: Video; image sequence
    • G06T 2207/10028: Range image; depth image; 3D point clouds
    • G06T 2207/30196: Human being; person


Abstract

The invention belongs to the field of robots, and in particular relates to a target following method, system and device based on an RGBD camera, aiming to solve the poor real-time performance and accuracy of existing target following methods. The method comprises the following steps: acquiring a depth image; extracting regions of interest from the depth image; screening the regions of interest that contain a human body; calculating the distance between each screened region of interest and the target followed in the previous frame, and removing the region if the distance is larger than a set target movement radius; calculating the similarity between the color histogram of each remaining region of interest and a pre-stored color histogram of the target to be followed, treating a region as a possible candidate if its similarity exceeds a set threshold, and taking the possible candidate region with the largest similarity as the target region; and calculating the moving speed of the mobile device from the center point coordinates of the target region and controlling the device to move towards the target. The invention improves the real-time performance and accuracy of target following.

Description

RGBD camera-based target following method, system and device
Technical Field
The invention belongs to the field of robots, and particularly relates to a target following method, a target following system and a target following device based on an RGBD camera.
Background
Intelligent robot products bring beneficial effects to people's daily life and production. In places where passengers travel with large pieces of luggage, such as the waiting halls of high-speed railway stations and airports, a robot that carries the luggage and walks automatically can relieve the fatigue and inconvenience of travel; an intelligent following robot can provide this service.
The core problem an intelligent following robot must solve belongs to the field of target tracking. Unlike traditional surveillance-type target tracking, besides tracking the target region on the image in real time, a following robot also needs to know the physical position of the target in order to guide its own movement. Existing vision-based tracking methods first detect the position of the target on the image and then resolve its physical position by other means. Since the target moves dynamically, its region on the image changes constantly; a common solution is to select a number of candidate rectangular boxes of different sizes and positions near the target of the previous frame and compare them with the target model to determine the new target position. This selection of candidate boxes is heuristic and blind: too many candidate boxes reduce the real-time performance of the algorithm, while too few may fail to select the target; a box that is too large brings in too much environment information and causes model drift, while one that is too small fails to cover the whole target region and makes the model inflexible.
The invention aims to provide a target following method, a target following system and a target following device based on an RGBD camera, which are used for solving the problem in the existing robot following application.
Disclosure of Invention
In order to solve the above problem in the prior art, namely that existing vision-based target following (tracking) methods blindly acquire and update candidate rectangular boxes during following, which causes poor real-time performance and accuracy, the first aspect of the present invention provides a target following method based on an RGBD camera, which includes:
step S100, acquiring a depth image of the surrounding environment as an input image by an RGBD camera arranged on the mobile device;
step S200, extracting a region of interest of the input image, and constructing a region of interest set as a first set;
step S300, screening the region of interest containing the human body in the first set, and constructing a second set;
step S400, calculating the distance between the coordinates of the central point of each region of interest in the second set and the coordinates of the target to be followed in the previous frame, and removing the corresponding region of interest if the distance is larger than the set target movement radius to obtain a third set;
Step S500, calculating the similarity between the color histogram corresponding to each region of interest in the third set and the pre-stored color histogram of the target to be followed, if the similarity is greater than a set threshold, using the color histogram as a possible candidate region, using the possible candidate region with the greatest similarity as a target region, and updating the target coordinates and the corresponding color histogram thereof;
step S600, calculating the moving speed of the mobile device according to the center point coordinates of the target area by a double-threshold method, and controlling the mobile device to move towards the target; after the movement, jump to step S100 until the following task ends.
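For illustration, the motion-radius screening of step S400 can be sketched as follows (function and variable names, units, and the 15 Hz and 0.5 m figures are assumptions for the example, not values from the patent):

```python
import math

def screen_by_motion_radius(rois, prev_target, radius):
    """Step S400 sketch: keep only regions of interest whose center
    lies within `radius` of the previous frame's target position.
    Centers are (x, z) ground-plane coordinates in meters."""
    kept = []
    for center in rois:
        if math.hypot(center[0] - prev_target[0],
                      center[1] - prev_target[1]) <= radius:
            kept.append(center)
    return kept

# At a 15 Hz frame rate a walking person moves only about 0.1 m
# between frames, so a radius of a few tens of centimeters is plausible.
third_set = screen_by_motion_radius(
    rois=[(0.1, 2.0), (1.5, 3.0)], prev_target=(0.0, 2.0), radius=0.5)
```

The second region lies about 1.8 m from the previous target and is therefore removed.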
In some preferred embodiments, before the mobile device starts to follow the target to be followed, the method further comprises the step of acquiring the target to be followed:
step A100, acquiring a first region of interest set based on the methods in the steps S100-S200;
step A200, calculating the coordinates of the width, height, center point and distance average value of each region of interest in the first region of interest set, and screening each region of interest according to the comparison of the width and height with the set height and shoulder width ranges of the human body to obtain a second region of interest set; the distance average value is an average value of depth distances of all pixel points in the region of interest;
Step A300, if the second region of interest set is empty, execute step A100; otherwise, take as the target to be followed the target corresponding to the region of interest in the second region of interest set whose center point coordinate lies within a set threshold range of the vertical center line of the camera's imaging plane.
In some preferred embodiments, the method of "extracting the region of interest of the input image" in step S200 is:
clustering depth distances corresponding to all pixel points in the input image by a preset clustering method, and performing binarization processing on the input image according to a clustering result to extract a region of interest;
the preset clustering method comprises the following steps:

obtain the depth distance of every pixel point in the input image and construct a depth distance set D = {d_1, d_2, …, d_N}, where d_i is the depth distance of pixel i;

traverse the depth distance set D, taking d_1 as the initial cluster center s_d1 and initializing the cluster center set S = {s_d1}. If d_i lies within radius R_d of any cluster center s_dj in S, assign it to that cluster and update the cluster center; if the distance from d_i to every cluster center in S is larger than R_d, add d_i as a new cluster center s_di and update S = {s_d1, …, s_di}. Repeat until D has been traversed; here i, j are indices.
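The clustering above can be sketched in a few lines (names and the sample depth values are illustrative only):

```python
def cluster_depths(depths, radius):
    """One-pass clustering of depth values (meters): assign each value
    to the first cluster center within `radius`, updating that center
    as a running mean; otherwise open a new cluster."""
    centers = []   # cluster centers s_d1, s_d2, ...
    counts = []    # number of elements per cluster
    for d in depths:
        for j, c in enumerate(centers):
            if abs(d - c) <= radius:
                counts[j] += 1
                centers[j] = c + (d - c) / counts[j]  # running mean update
                break
        else:
            centers.append(d)  # new cluster center
            counts.append(1)
    return centers

# Two depth groups, near 1 m and near 3 m, with R_d = 0.4 m:
centers = cluster_depths([1.00, 1.05, 0.98, 3.00, 3.02], radius=0.4)
```

The two returned centers approximate the means of the two depth groups, matching the update rule described in the text.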
In some preferred embodiments, the width and height of the region of interest are calculated as follows:

width = rect_width × averDepth / fx
height = rect_ytop × averDepth / fy

where width and height are the width and height of the region of interest, rect_width is the pixel width of the rectangle circumscribing the region of interest, rect_ytop is the pixel row coordinate of the upper-left corner of the circumscribing rectangle, averDepth is the distance average, and fx, fy are the scale factors of the visible-light camera along the horizontal and vertical axes.
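A minimal sketch of this width/height computation (fx = fy = 500 px is an assumed focal length for the example, not a value from the patent):

```python
def roi_physical_size(rect_width_px, rect_ytop_px, aver_depth, fx, fy):
    """Physical width and height (meters) of a region of interest from
    its circumscribing rectangle, following the patent's formulas; note
    that the height formula uses the top-row pixel coordinate rect_ytop,
    not the pixel height of the rectangle."""
    width = rect_width_px * aver_depth / fx
    height = rect_ytop_px * aver_depth / fy
    return width, height

w, h = roi_physical_size(rect_width_px=100, rect_ytop_px=200,
                         aver_depth=2.0, fx=500.0, fy=500.0)
```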
In some preferred embodiments, the center point coordinates are the physical coordinates of the region of interest in the camera coordinate system, calculated as follows:

x = (u − cx) × z / fx
y = (v − cy) × z / fy
z = averDepth

where (u, v) are the coordinates of the rectangle center on the image, (cx, cy) is the principal point of the visible-light camera of the RGBD camera, and (x, y, z) are the center point coordinates.
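A sketch of this back-projection, assuming the standard pinhole model (the principal point (320, 240) and fx = fy = 500 px are assumed sample values):

```python
def back_project(u, v, z, fx, fy, cx, cy):
    """Back-project the rectangle center (u, v) at depth z into camera
    coordinates (x, y, z) under the standard pinhole camera model."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return x, y, z

x, y, z = back_project(u=420, v=240, z=2.0, fx=500.0, fy=500.0,
                       cx=320.0, cy=240.0)
```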
In some preferred embodiments, the movement speed includes a linear speed and an angular speed. The linear speed is saturated as follows:

v = v < v_max ? v : v_max
v = v > 0 ? v : 0

The angular speed is saturated as follows:

ω = |ω| < ω_max ? ω : sign(ω) × ω_max

where v is the linear speed, v_max the maximum linear speed, dist_range the interval length over which the linear speed increases from 0 to v_max, R_start and R_stop the straight-line start and stop distance thresholds, and x_target, z_target the target position updated from the center point coordinates of the target area; ω is the angular speed, ω_max the maximum angular speed, x_start and x_stop the rotation start and stop distance thresholds, and x_range the interval length over which the angular speed increases from 0 to ω_max.
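The saturation step of the double-threshold speeds can be sketched as follows (function names and the sample limits are assumptions for the example):

```python
def clamp_speeds(v, omega, v_max, omega_max):
    """Saturate the raw double-threshold speeds: the linear speed is
    clipped to [0, v_max] and the angular speed to [-omega_max,
    omega_max], mirroring the ternary expressions in the text."""
    v = v if v < v_max else v_max
    v = v if v > 0 else 0.0
    omega = (omega if abs(omega) < omega_max
             else (omega_max if omega > 0 else -omega_max))
    return v, omega

# An over-limit forward speed and an over-limit right turn get clipped:
v, w = clamp_speeds(1.2, -0.9, v_max=0.8, omega_max=0.6)
```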
In some preferred embodiments, step S600 is followed by step S700,
if the second set is empty, the third set is empty, or the number of possible candidate regions is 0, count the duration for which the target has been lost; if this duration is longer than a set duration, stop the following task, otherwise jump to step S100.
A second aspect of the invention provides a target following system based on an RGBD camera, which comprises an acquisition module, a region of interest extraction module, a first screening module, a second screening module, a target region acquisition module and a control equipment movement module;
the acquisition module is configured to acquire a depth image of the surrounding environment through an RGBD camera arranged on the mobile device as an input image;
the interesting region extraction module is configured to extract interesting regions of the input image, and construct an interesting region set as a first set;
The first screening module is configured to screen the region of interest containing the human body in the first set, and construct a second set;
the second screening module is configured to calculate the distance between the coordinates of the central point of each region of interest in the second set and the coordinates of the target to be followed in the previous frame, and if the distance is greater than the set target movement radius, remove the corresponding region of interest to obtain a third set;
the target area obtaining module is configured to calculate the similarity between the color histogram corresponding to each region of interest in the third set and the pre-stored color histogram of the target to be followed, take a region of interest as a possible candidate region if its similarity is greater than a set threshold, take the possible candidate region with the greatest similarity as the target region, and update the target coordinates and the corresponding color histogram;
the control equipment moving module is configured to calculate the moving speed of the mobile equipment according to the center point coordinates of the target area by a double-threshold method and control the mobile equipment to move towards the target; and after the movement, skipping to the acquisition module until the following task is finished.
In a third aspect of the present invention, a storage device is provided in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-described RGBD camera-based target following method.
In a fourth aspect of the present invention, a processing device is provided, including a processor and a storage device; a processor adapted to execute each program; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the above-described RGBD camera based target following method.
The invention has the beneficial effects that:
the invention improves the real-time performance and accuracy of target following. According to the method, the depth distances of all pixels in the depth image obtained by the RGBD camera are clustered, the depth image is binarized according to the clustering result, and the region of interest is extracted. And using the depth data in a target tracking process, and using the depth data to realize image segmentation to obtain rectangular frames of all non-adhesion objects in the space where the camera is located.
Meanwhile, the invention further designs candidate-box screening conditions based on the limited range of human walking movement, and the obtained candidate rectangular boxes are compared with the target model to realize target following. Compared with blind, tentative acquisition of candidate boxes, the candidate boxes obtained by the method are more accurate and effective, and the method is robust to common tracking problems such as rapid target movement, target deformation, and target scale change.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings.
FIG. 1 is a flow chart of an RGBD camera-based target following method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a frame of an RGBD camera-based target following system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the hardware architecture of an RGBD camera-based target-following mobile device according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an RGBD camera coordinate system of an embodiment of the present invention;
FIG. 5 is a schematic diagram of a visible light image, a depth data map image, and a depth data point cloud acquired by an RGBD camera according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of targeting, following, of an RGBD-based camera according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a target locking modality of an embodiment of the invention;
FIG. 8 is a schematic representation of clustering of depth data according to one embodiment of the invention;
FIG. 9 is a schematic representation of depth data based image segmentation in accordance with one embodiment of the present invention;
FIG. 10 is a schematic diagram of a movement velocity solution according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of an object being occluded in the following process in accordance with one embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other.
In the following embodiments, an RGBD camera-based target following mobile device is described first, and then an RGBD camera-based target following method applied to the device is described.
1. RGBD camera-based target following mobile device
The RGBD camera-based target following mobile device of the present invention, as shown in FIG. 3, includes: an RGBD camera, an on-board processor, a display screen, a loudspeaker, a differential two-wheel driver, and communication equipment (such as an App device) connected with the target follower;
the RGBD camera is used for acquiring a depth image. The depth image comprises a visible light image and corresponding depth data: the visible light image is an R, G, B three-channel color image of width colorW and height colorH, and the depth data give, for each pixel point of the visible light image, the depth distance between the corresponding object surface in the three-dimensional space where the camera is located and the imaging plane of the camera. The visible light image and the depth data are used for initial locking and tracking of the human body target. The RGBD camera is arranged at the middle of the upper part of the front panel of the mobile device, at a certain height from the ground; in the embodiment of the invention, the height of the camera above the ground is preferably set to 1 meter;
the on-board processor is used for starting the following process and processing the visible light image and depth data acquired by the RGBD camera; running the following program; solving the motion control speed, i.e. the control speed of the differential two-wheel driver, specifically the linear speed and angular speed (v, ω) (forward speed and turning speed); and issuing the following state. The following states are: (1) following service started; (2) target locking successful; (3) target occluded; (4) target lost; (5) target re-locked; (6) following service ended. State (1) reminds the user to stand within a specified area in front of the device so that the target can be locked; in the specific embodiment of the invention, this area is 50 cm to 1 m in front of the device. State (2) informs the user that normal walking can begin. State (3) reminds the user that there is an occlusion between the device and the user and that the speed will be reduced. State (4) reminds the user that the target is lost and needs to be locked again. State (5) informs the user to wait while the device locks the target again. State (6) informs the user that the following service has ended;
the display screen is used for displaying an operation interface comprising a two-dimensional code and clickable buttons; the screen is a touch screen and can be operated by hand, and the user can also scan the two-dimensional code on the screen with the App device and start the following service from the App device;
the loudspeaker is used for broadcasting the following states issued by the on-board processor, with different voice content for different states; for example, in the specific embodiment of the invention, state (1) broadcasts "the following service has started, please stand 50 cm to 1 m in front of me for two seconds", and the other states broadcast corresponding voice content designed for a comfortable user experience;
the differential two-wheel driver is used for driving the mobile device to move. The driving wheels adopt a differential kinematic model, so the device can move forward and turn left and right; during following it moves forward, left and right but never backward. The motion control speed calculated by the on-board processor is the centroid speed of the differential two-wheel driver; converting the centroid speed into the respective speeds of the two wheels follows the differential kinematic model and is not repeated here;
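The centroid-to-wheel-speed conversion mentioned above can be sketched under the standard differential kinematic model (the 0.4 m wheel base is an assumed chassis parameter, not a value from the patent):

```python
def centroid_to_wheel_speeds(v, omega, wheel_base):
    """Standard differential-drive conversion from centroid linear and
    angular speed (v, omega) to left and right wheel speeds."""
    v_left = v - omega * wheel_base / 2.0
    v_right = v + omega * wheel_base / 2.0
    return v_left, v_right

# Moving forward at 0.5 m/s while turning left at 0.5 rad/s:
vl, vr = centroid_to_wheel_speeds(v=0.5, omega=0.5, wheel_base=0.4)
```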
and the App device: the user can scan the two-dimensional code on the screen through the App device and start the following service; the App device displays an operation interface comprising clickable buttons, vibrates to remind the user when the followed target is lost, and exchanges information with the on-board processor through wireless communication.
The basic operation flow of the mobile device is as follows: when the user starts the following service by clicking a screen button or by scanning the two-dimensional code on the screen with the App device, the on-board processor receives the control instruction for starting the following mode, starts the following process, instructs the user through loudspeaker voice broadcast to stand within the specified range in front of the camera, and enters the target locking mode. After the target is successfully locked, the color model and target position are initialized, the device enters the following movement mode, and the following state is issued in real time during the human body following process.
2. RGBD camera-based target following method
The target following method based on RGBD camera of the invention comprises the following steps:
step S100, acquiring a depth image of the surrounding environment as an input image by an RGBD camera arranged on the mobile device;
Step S200, extracting a region of interest of the input image, and constructing a region of interest set as a first set;
step S300, screening the region of interest containing the human body in the first set, and constructing a second set;
step S400, calculating the distance between the coordinates of the central point of each region of interest in the second set and the coordinates of the target to be followed in the previous frame, and removing the corresponding region of interest if the distance is larger than the set target movement radius to obtain a third set;
step S500, calculating the similarity between the color histogram corresponding to each region of interest in the third set and the pre-stored color histogram of the target to be followed, if the similarity is greater than a set threshold, using the color histogram as a possible candidate region, using the possible candidate region with the greatest similarity as a target region, and updating the target coordinates and the corresponding color histogram thereof;
step S600, calculating the moving speed of the mobile device according to the center point coordinates of the target area by a double-threshold method, and controlling the mobile device to move towards the target; after the movement, jump to step S100 until the following task ends.
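Step S500 compares color histograms; the patent does not name its similarity measure, so the sketch below uses one common choice, the Bhattacharyya coefficient (1.0 means identical distributions):

```python
import numpy as np

def hist_similarity(h1, h2):
    """Bhattacharyya coefficient between two color histograms; the
    histograms are normalized internally, and 1.0 means identical."""
    p = h1 / h1.sum()
    q = h2 / h2.sum()
    return float(np.sum(np.sqrt(p * q)))

a = np.array([4.0, 3.0, 2.0, 1.0])       # a toy 4-bin histogram
sim_same = hist_similarity(a, a)          # identical histograms
sim_diff = hist_similarity(a, a[::-1].copy())  # reversed bins
```

A candidate would be kept when its similarity exceeds the set threshold, and the candidate with the largest similarity becomes the target region.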
In order to more clearly describe the object following method based on the RGBD camera of the present invention, each step in one embodiment of the method of the present invention is described in detail below with reference to the accompanying drawings.
In step S100, a depth image of the surrounding environment is acquired as an input image by an RGBD camera provided on the mobile device.
Fig. 1 is a schematic flow chart of the target following method based on the RGBD camera according to the present invention, and a specific process is described in detail in the following steps.
In this embodiment, before the mobile device starts to follow, the target is locked, that is, the target to be followed is acquired. In other embodiments the target to be followed may be given (predefined) manually, or acquired automatically by a robot-like intelligent device, as the actual situation requires. The steps of acquiring the target to be followed are shown in fig. 6 and 7; the specific processing procedure is as follows:
step A100, acquiring a depth image, and extracting a region of interest set as a first region of interest set.
When the user starts the following service by clicking a screen button or scanning the two-dimensional code on the screen with the App device, the user is reminded to stand within the predetermined range in front of the RGBD camera, and a depth image of the surrounding environment is acquired through the RGBD camera provided on the mobile device. The depth image comprises a visible light image and depth data, the depth data being the depth distance between the object surface corresponding to each pixel point of the visible light image and the imaging plane of the camera. A depth distance set D = {d_1, d_2, …, d_N} is constructed, with N = colorW × colorH, where d_i is the depth distance of pixel i. As shown in the camera coordinate system of fig. 4, the depth distance refers specifically to the z-axis coordinate, and the length of the depth data set equals the number of pixels of the visible light image. The final objective of this step is to segment the set of suspected human targets in the image. Since the depth data are the distances of object surfaces from the camera imaging plane, the mobile device runs on flat indoor ground, the line of sight of the camera is horizontal, and pedestrians normally walk upright, the depth distance set can be clustered with radius R_d; in this embodiment R_d is preferably 40 cm. Clustering the depth distance set based on the set distance radius yields a cluster center set S = {s_d1, s_d2, …, s_dn}. The clustering method in the present invention is similar to the mean-shift clustering algorithm; the clustered depth data are shown in fig. 8. The specific method is as follows:
Traverse the depth distance set D, taking d_1 as the initial cluster center s_d1 and initializing the cluster center set S = {s_d1}. If d_i lies within radius R_d of some cluster center s_dj in S, assign it to that cluster and update the cluster center as s_dj = (Σ d_k + d_i) / (m + 1), d_k ∈ s_dj, where Σ d_k is the sum of the elements already in cluster s_dj and m is their number. If the distance from d_i to every cluster center in S is larger than R_d, add d_i as a new cluster center s_di and update S = {s_d1, …, s_di}. Repeat until D has been traversed; here i, j, k are indices.
Traversing S: for each depth cluster center, the depth data (which corresponds one-to-one to the pixels of the visible light image) is traversed, pixels whose depth lies within the radius R_d of the cluster center are marked white, and the rest are marked black, yielding a binarized image, as shown in fig. 5. Regions of interest are then extracted from the binarized image, constructing a region of interest set S_Img = {s_img1, s_img2, …, s_imgm}, where m denotes the number of regions. FIG. 9 shows, for one frame of clustered depth data according to an embodiment of the present invention, the regions of interest obtained by marking the pixels within radius R_d of a depth cluster center.
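The per-cluster marking step can be sketched with NumPy (a hedged illustration; the function name and array layout are assumptions, and extracting the rectangular regions of interest from the resulting mask would follow, e.g. via connected-component analysis):

```python
import numpy as np

def mark_cluster(depth_map, center, radius=0.4):
    """Binarize the depth image for one cluster center: pixels whose
    depth lies within `radius` of the center become white (255),
    everything else black (0), as in the marking step above."""
    mask = np.abs(depth_map - center) <= radius
    return np.where(mask, 255, 0).astype(np.uint8)
```

Running this once per cluster center yields one binary image per suspected target layer, from which the circumscribed rectangles can be read off.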
Step A200, calculating the width, height, center point coordinates and distance average value of each region of interest in the first region of interest set, and screening each region of interest by comparing its width and height with the set human height and shoulder width ranges to obtain a second region of interest set; the distance average value is the average of the depth distances of all pixels in the region of interest.
In this embodiment, for each region of interest the distance average (the average depth of its pixel set from the camera), the region width, the region height, and the center point coordinates (x, y, z) (the physical coordinates of the center of the region's circumscribed rectangle in the camera coordinate system) are computed, and the first region of interest set is screened against the general human height range and shoulder width range to obtain the set of regions satisfying the ergonomic conditions, which serves as the second region of interest set.
The distance average value of each region of interest is calculated as shown in formula (1):

averDepth = (1/n) * Σ d_i, i = 1, …, n (1)

where n is the number of pixels in the region of interest and d_i is the depth distance of the i-th pixel.
The physical coordinates of the center point of the circumscribed rectangle of the region of interest in the camera coordinate system are calculated as shown in formula (2):

x = (u - cx) * averDepth / fx, y = (v - cy) * averDepth / fy, z = averDepth (2)
where (u, v) are the image coordinates of the center of the circumscribed rectangle of the region of interest, fx and fy are the scale factors of the visible light camera along the horizontal and vertical axes, and (cx, cy) is the principal point of the visible light camera of the RGBD camera.
The width and height of the region of interest are calculated as shown in formulas (3) and (4):

width = rect_width * averDepth / fx (3)

height = rect_ytop * averDepth / fy (4)

where rect_width is the pixel width of the region of interest's circumscribed rectangle and rect_ytop is the pixel row coordinate of the top-left corner of the circumscribed rectangle.
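Formulas (2), (3), and (4) can be combined into one helper. This is a sketch in which formula (2) is taken to be the standard pinhole back-projection implied by the definitions of fx, fy, cx, cy; the rectangle layout and function name are assumptions:

```python
def region_geometry(rect, aver_depth, fx, fy, cx, cy):
    """Physical size and center of a region's circumscribed rectangle.
    `rect` = (u_left, v_top, pixel_width, pixel_height) in image pixels;
    `aver_depth` is the region's distance average averDepth."""
    u_left, v_top, pw, ph = rect
    width = pw * aver_depth / fx        # formula (3)
    height = v_top * aver_depth / fy    # formula (4): scales the top-left row coord, per the text
    u, v = u_left + pw / 2.0, v_top + ph / 2.0   # rectangle center in pixels
    x = (u - cx) * aver_depth / fx      # assumed pinhole back-projection, formula (2)
    y = (v - cy) * aver_depth / fy
    z = aver_depth
    return width, height, (x, y, z)
```

The returned (x, y, z) are then screened against the human height and shoulder-width ranges in step A200.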
In the embodiment of the present invention, the human height range is preferably set to 1.1 m to 1.9 m, and the shoulder width range is preferably set to 15 cm to 60 cm.
Step A300, if the second region of interest set is empty, step A100 is executed; otherwise, the target corresponding to the region of interest in the second set whose center point lies within a set threshold range of the vertical center line of the camera's imaging plane is taken as the target to be followed, where the threshold is preferably set to ±25 centimeters.
From the obtained second region of interest set, the region of interest with the smallest distance average (average depth from the camera) whose center point coordinate lies within the threshold range (i.e., the center of the region's circumscribed rectangle falls within a certain range around the vertical center line of the camera's imaging plane) is selected as the target to be followed.
If the screened region of interest set is empty, target locking for the current frame is considered to have failed and a new image is acquired; otherwise, target locking for the current frame is considered successful.
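The lock-on rule of step A300 can be sketched as follows (the dict layout and function name are illustrative; returning None mirrors the "locking failed, acquire a new image" branch):

```python
def pick_target(regions, x_threshold=0.25):
    """Among regions whose center x lies within ±x_threshold (meters)
    of the camera's vertical center line, pick the one closest to the
    camera (smallest distance average). Each region is a dict with
    'center' = (x, y, z) and 'aver_depth'."""
    candidates = [r for r in regions if abs(r["center"][0]) <= x_threshold]
    if not candidates:
        return None  # target locking failed for this frame
    return min(candidates, key=lambda r: r["aver_depth"])
```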
In the invention, in order to further improve the success rate of target locking, multiple frames of depth images are continuously acquired to confirm the target to be followed. The specific process is as follows:
the above processing is applied to each frame of real-time visible light image and depth data acquired by the camera. If the previous frame was locked successfully and the current frame is also locked successfully, this is called continuous target locking; when the target has been continuously locked for more than N seconds, the locking mode ends, the user is notified "target locked successfully, following service started" via loudspeaker broadcast, and the user is reminded to walk normally.
Each successfully locked frame yields an image region of interest, so continuous locking for more than N seconds yields approximately N × fps regions of interest, where fps is the frame rate of the image; in the present embodiment fps is 15 frames per second. The region of interest with the largest contour area is selected as the final following target, the target color model (color histogram) is initialized with the image region it covers on the visible light image, the target position is initialized with the physical coordinates of the center of its circumscribed rectangle, and the following movement mode is entered. In the embodiment of the present invention, the target color model is defined as the statistical histogram of the three channels R, G, B of the visible light image within the region of interest.
Step S200, extracting a region of interest of the input image, and constructing a region of interest set as a first set.
In this embodiment, the specific processing procedure of this step is as in step a100 described above. And will not be described in detail herein.
Step S300, the region of interest containing the human body in the first set is screened, and a second set is constructed.
In this embodiment, the specific processing procedure of this step is as in steps A200-A300: the second set is obtained by comparing the width and height with the set human height and shoulder width ranges; if the second set is empty, processing jumps to step S700.
Step S400, calculating the distance between the coordinates of the central point of each region of interest in the second set and the coordinates of the target to be followed in the previous frame, and removing the corresponding region of interest if the distance is larger than the set target movement radius, so as to obtain a third set.
In this embodiment, the second set is traversed, and the distance between the physical coordinates of each region of interest and the position of the target to be followed in the previous frame, i.e. the moving distance D'_move of the target, is calculated as shown in formula (5):

D'_move = sqrt((x' - x_target)² + (z' - z_target)²) (5)

where (x', z') are the physical coordinates of the current region of interest and (x_target, z_target) is the stored target position of the previous frame; since the mobile device runs on flat indoor ground, only the two-dimensional coordinates x and z are needed.
Based on the distance between the physical coordinates of each region of interest and the previous frame's target position, a target movement radius R_move is set by measuring the maximum walking speed of a person, and each region of interest is screened: if D'_move < R_move the region of interest is kept, otherwise it is deleted. If the set of regions of interest within the target movement range is not empty, target screening continues; otherwise processing jumps to step S700.
According to statistics, the maximum walking speed of a person is about 1.2 m/s under normal conditions. Because the image data is processed in real time and the frame rate is 15 frames/s in the embodiment of the invention, the distance a person can move between the previous frame and the current frame is about 0.08 m. Based on this, the regions of interest can be screened by the target movement range; to be conservative, R_move is preferably set to 20 cm in the embodiment of the invention.
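Formula (5) and the movement-radius gate can be sketched together (function name and dict layout are assumptions):

```python
import math

def screen_by_motion(regions, target_xz, r_move=0.2):
    """Keep only regions whose center has moved less than `r_move`
    (20 cm here) from the previous target position; on flat indoor
    ground only the x and z coordinates matter."""
    x_t, z_t = target_xz
    kept = []
    for r in regions:
        x, _, z = r["center"]
        d_move = math.hypot(x - x_t, z - z_t)  # formula (5)
        if d_move < r_move:
            kept.append(r)
    return kept
```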
Step S500, calculating the similarity between the color histogram corresponding to each region of interest in the third set and the pre-stored color histogram of the target to be followed; if the similarity is greater than a set threshold, the corresponding region of interest is taken as a possible candidate region, the possible candidate region with the greatest similarity is taken as the target region, and the target coordinates and the corresponding color histogram are updated.
In this embodiment, for the regions of interest screened in step S400, the target color model (color histogram) of each region of interest is computed and compared for similarity with the color histogram of the target to be followed obtained during target locking. If the similarity is greater than a set threshold thresh_color, the corresponding region of interest is taken as a possible candidate region, and the candidate region with the maximum similarity is selected as the new target, i.e., the region of the target to be followed in this frame. If there is no possible candidate region, processing jumps to step S700.
Because the embodiment of the invention defines the target color model as the statistical histogram (color histogram) of the three channels R, G, B of the visible light image within the region of interest, the comparison value lies in the range (0, 1), and the more similar two color histograms are, the closer the comparison value is to 1. In an embodiment of the invention, the threshold thresh_color is preferably set to 0.3.
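The color model comparison can be sketched as below. The patent defines the model as the R, G, B statistical histogram but does not name its comparison metric; histogram intersection is used here as one common choice whose value lies in (0, 1] for normalized histograms, so both metric and names are assumptions:

```python
import numpy as np

def rgb_histogram(image, bins=16):
    """Concatenated R, G, B histogram of an (H, W, 3) uint8 image
    region, normalized so the whole vector sums to 1."""
    hists = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def histogram_similarity(h1, h2):
    """Histogram intersection of two normalized histograms:
    1.0 for identical distributions, 0.0 for disjoint ones."""
    return float(np.minimum(h1, h2).sum())
```

A candidate region would be accepted when `histogram_similarity(...) > thresh_color` (0.3 in this embodiment).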
The position of the target to be followed in this frame and the corresponding color histogram are acquired and updated: the physical coordinates (x, y, z) of the center of the region's circumscribed rectangle in the camera coordinate system are written to the target position (x_target, z_target).
Step S600, calculating the moving speed of the mobile equipment according to the center point coordinates of the target area by a double-threshold method, and controlling the mobile equipment to move towards the target; after the movement, step S100 is skipped until the following task ends.
In this embodiment, the moving speed is calculated by a dual-threshold method. With dual-threshold control, the motion switching value can lie within a range interval instead of at a single threshold, which benefits motion robustness and reduces motion oscillation. The moving speed is divided into a linear speed and an angular speed; the linear speed is calculated as shown in formulas (6), (7), (8), and (9):
v = (v < v_max) ? v : v_max (7)
v = (v > 0) ? v : 0 (8)
where v is the linear velocity, v_max is the maximum linear velocity, dist_range is the interval length required to increase the linear velocity from 0 to v_max, and R_start and R_stop are the straight-line start and stop distance thresholds.
The angular velocity is calculated as shown in the formulas (10) and (11):
ω = (|ω| < ω_max) ? ω : sign(ω) * ω_max (11)
where ω is the angular velocity, ω_max is the maximum angular velocity, x_range is the interval length required to increase the angular velocity from 0 to ω_max, and x_start and x_stop are the rotation start and stop distance thresholds.
In the present embodiment, as shown in FIG. 10, v_max, ω_max, dist_range and x_range are determined according to the motion performance of the differential two-wheel drive, R_start and R_stop are determined according to the safety distance to be kept between the mobile device and the pedestrian, and x_start and x_stop are determined from the RGBD camera's field of view. Here v_max is preferably set to 1 m/s, ω_max to 0.6 rad/s, dist_range to 0.5 m, x_range to 0.4 m, R_start to 0.6 m, R_stop to 0.5 m, x_start to 0.2 m, and x_stop to 0.1 m.
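Since formulas (6) and (9) are not reproduced in the text, the following is only a plausible sketch of the dual-threshold linear-speed control: the linear ramp over dist_range is an assumed form, while the clamps correspond to formulas (7) and (8) and the start/stop hysteresis follows the R_start/R_stop description:

```python
def linear_velocity(dist, moving, v_max=1.0, dist_range=0.5,
                    r_start=0.6, r_stop=0.5):
    """Dual-threshold linear speed with hysteresis: motion starts once
    dist > r_start and stops once dist < r_stop; between the two
    thresholds the previous state `moving` is kept, avoiding oscillation
    around a single switching point."""
    if dist > r_start:
        moving = True
    elif dist < r_stop:
        moving = False
    if not moving:
        return 0.0, moving
    v = (dist - r_stop) * v_max / dist_range  # assumed ramp over dist_range
    v = v if v < v_max else v_max             # formula (7)
    v = v if v > 0 else 0.0                   # formula (8)
    return v, moving
```

The angular velocity would follow the same pattern with x_start, x_stop, and x_range applied to the target's lateral offset, with formula (11) clamping |ω| to ω_max.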
During movement, if the following target is occluded, movement stops and the user is reminded via voice broadcast that occlusion has occurred and asked to slow down. The method for judging that the target is occluded is shown in fig. 11: the depth data point cloud is projected onto the ZX plane to obtain a temporary map of the local environment, in which the coordinates of the mobile device are known; whether any other obstacle lies on the line between the device and the target center is then checked, and if an obstacle exists, the device stops moving while the user is reminded of the occlusion via voice broadcast and asked to slow down.
And S700, counting the lost time length of the target, ending the following task if the time length is longer than the set time length, otherwise, jumping to the step S100.
In this embodiment, if the set of regions of interest is empty after the screening in any of steps S300-S500, the target is considered lost in that frame. If the target was lost in the previous frame and is also lost in the current frame, this is called continuous target loss; when the target has been continuously lost for more than M seconds, the target is considered lost and the target locking mode is re-entered, where M is preferably set to 2 seconds in the embodiment of the present invention.
In practice, a user may suddenly walk quickly, accelerate, turn sharply, or be occluded by other objects, causing the following target to be lost. The human body following method of the invention cannot absolutely guarantee that the following target is never lost, but it can detect the loss in time and prompt the user via voice broadcast; with the user's cooperation, the following service can still proceed normally after the target is lost. In addition, it can be determined that when the user walks normally (neither standing still nor jogging), turns normally, and obstacles in the environment are sparse (taking as reference that the space allows the mobile device to rotate in place; the turning radius of the mobile device in the embodiment of the invention is 0.5 m), target loss will essentially not occur.
When the following target is lost, the onboard processor broadcasts "target following failed, please stand again in the designated area in front of the mobile device" through a loudspeaker while the App device vibrates as a reminder, informing the user to stand in the designated area in front of the mobile device to re-lock the target; after the target is locked successfully, the user is reminded to walk normally and the following mode continues. The RGBD camera-based human body following method switches between the target locking mode and the following movement mode according to the actual situation until the user ends the following service by clicking the App device or a button on the screen.
An object following system based on an RGBD camera according to a second embodiment of the present invention, as shown in fig. 2, includes: an acquisition module 100, a region of interest extraction module 200, a first screening module 300, a second screening module 400, a target region acquisition module 500, and a control device movement module 600;
the acquiring module 100 is configured to acquire a depth image of the surrounding environment as an input image through an RGBD camera provided on the mobile device;
the region of interest extracting module 200 is configured to extract a region of interest of the input image, and construct a set of regions of interest as a first set;
the first screening module 300 is configured to screen the regions of interest in the first set that contain a human body, and construct a second set;
the second filtering module 400 is configured to calculate a distance between a coordinate of a center point of each region of interest in the second set and a coordinate of a target to be followed in a previous frame, and if the distance is greater than a set target movement radius, remove the corresponding region of interest to obtain a third set;
the target region acquisition module 500 is configured to calculate the similarity between the color histogram corresponding to each region of interest in the third set and the pre-stored color histogram of the target to be followed; if the similarity is greater than a set threshold, the corresponding region of interest is taken as a possible candidate region, and the possible candidate region with the greatest similarity is taken as the target region;
the control device moving module 600 is configured to calculate a moving speed of the mobile device according to the center point coordinates of the target area by a dual-threshold method, and control the mobile device to move towards the target; after the movement, the acquisition module 100 is skipped until the following task ends.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working processes and related descriptions of the above-described system may refer to corresponding processes in the foregoing method embodiments, which are not repeated herein.
It should be noted that, in the RGBD camera-based object following system provided in the foregoing embodiment, only the division of the foregoing functional modules is illustrated, in practical application, the foregoing functional allocation may be performed by different functional modules, that is, the modules or steps in the foregoing embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into a plurality of sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps related to the embodiments of the present invention are merely for distinguishing the respective modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device of a third embodiment of the present invention stores therein a plurality of programs adapted to be loaded by a processor and to implement the above-described RGBD camera-based target following method.
A processing device according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute each program; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the above-described RGBD camera based target following method.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above are not repeated here; reference may be made to the corresponding processes in the foregoing method examples.
Those of skill in the art will appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the programs corresponding to the software modules and method steps may be stored in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, removable disk, CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application, but such implementation is not intended to be limiting.
The terms "first," "second," and the like, are used for distinguishing between similar objects and not for describing a particular sequential or chronological order.
The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus/device that comprises a list of elements includes not only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus/device.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will fall within the scope of the present invention.

Claims (9)

1. A target following method based on an RGBD camera, the method comprising:
step S100, acquiring a depth image of the surrounding environment as an input image by an RGBD camera arranged on the mobile device;
Step S200, extracting a region of interest of the input image, and constructing a region of interest set as a first set;
the method for extracting the region of interest of the input image comprises the following steps:
clustering depth distances corresponding to all pixel points in the input image by a preset clustering method, and performing binarization processing on the input image according to a clustering result to extract a region of interest;
the preset clustering method comprises the following steps:
obtaining the depth distances corresponding to all pixel points in the input image, and constructing a depth distance set D = {d_1, d_2, …, d_N}, where d_N represents the depth distance of each pixel point;
traversing the depth distance set D: taking d_1 as the initial cluster center s_d1 and initializing the cluster center set S = {s_d1}; if d_i lies within the set radius R_d of any cluster center s_dj in S, it is classified into that cluster and the cluster center is updated at the same time; if the distance from d_i to all cluster centers in S is larger than R_d, then d_i is added as a new cluster center s_di and the set is updated to S = {s_d1, …, s_di}, until the set D is traversed; wherein i, j represent subscripts;
step S300, screening the region of interest containing the human body in the first set, and constructing a second set;
step S400, calculating the distance between the coordinates of the central point of each region of interest in the second set and the coordinates of the target to be followed in the previous frame, and removing the corresponding region of interest if the distance is larger than the set target movement radius to obtain a third set;
Step S500, calculating the similarity between the color histogram corresponding to each region of interest in the third set and the pre-stored color histogram of the target to be followed; if the similarity is greater than a set threshold, taking the corresponding region of interest as a possible candidate region, taking the possible candidate region with the greatest similarity as the target region, and updating the target coordinates and the corresponding color histogram thereof;
step S600, calculating the moving speed of the mobile equipment according to the center point coordinates of the target area by a double-threshold method, and controlling the mobile equipment to move towards the target; after the movement, step S100 is skipped until the following task ends.
2. The RGBD camera based target following method according to claim 1, further comprising the step of acquiring the target to be followed before the mobile device starts to follow the target to be followed:
step A100, acquiring a first region of interest set based on the methods in the steps S100-S200;
step A200, calculating the coordinates of the width, height, center point and distance average value of each region of interest in the first region of interest set, and screening each region of interest according to the comparison of the width and height with the set height and shoulder width ranges of the human body to obtain a second region of interest set; the distance average value is an average value of depth distances of all pixel points in the region of interest;
And step A300, if the second region of interest set is empty, executing step A100, otherwise, taking a target corresponding to the region of interest, in which the distance between the center point coordinate and the imaging plane vertical center line of the camera is within a set threshold range, in the second set, as a target to be followed.
3. The RGBD camera based target following method according to claim 1, wherein the width and height of the region of interest are calculated by:
width = rect_width * averDepth / fx

height = rect_ytop * averDepth / fy

wherein width and height are the width and height of the region of interest, rect_width is the pixel width of the region of interest's circumscribed rectangle, rect_ytop is the pixel row coordinate of the top-left corner of the circumscribed rectangle, averDepth is the distance average value, and fx, fy are the scale factors of the visible light camera in the horizontal-axis and vertical-axis directions.
4. The RGBD camera based target following method of claim 3, wherein the center point coordinates are physical coordinates of the region of interest in a camera coordinate system, and the calculating method is as follows:
x = (u - cx) * averDepth / fx, y = (v - cy) * averDepth / fy, z = averDepth

where (u, v) are the coordinates of the center of the circumscribed rectangle of the region of interest on the image, (cx, cy) is the principal point of the visible light camera of the RGBD camera, and (x, y, z) are the center point coordinates.
5. The RGBD camera based target following method of claim 4, wherein the moving speed includes a linear speed and an angular speed; the linear velocity is calculated by the following steps:
v = (v < v_max) ? v : v_max
v = (v > 0) ? v : 0
the angular velocity is calculated by the following steps:
ω = (|ω| < ω_max) ? ω : sign(ω) * ω_max
wherein v is the linear velocity, v_max is the maximum linear velocity, dist_range is the interval length required to increase the linear velocity from 0 to v_max, R_start and R_stop are the straight-line start and stop distance thresholds, (x_target, z_target) is the target position updated according to the center point coordinates of the target region, ω is the angular velocity, ω_max is the maximum angular velocity, x_range is the interval length required to increase the angular velocity from 0 to ω_max, and x_start and x_stop are the rotation start and stop distance thresholds.
6. The RGBD camera based target following method of any of claims 1 to 5, further comprising step S700 after step S600,
if the second set is empty and/or the third set is empty and/or the number of possible candidate areas is 0, counting the duration of losing the target, if the duration is longer than the set duration, stopping the following task, otherwise, jumping to the step S100.
7. An RGBD camera-based target following system, the system comprising: an acquisition module, a region of interest extraction module, a first screening module, a second screening module, a target region acquisition module, and a control device movement module;
The acquisition module is configured to acquire a depth image of the surrounding environment through an RGBD camera arranged on the mobile device as an input image;
the region of interest extraction module is configured to extract the regions of interest of the input image, and construct a region of interest set as a first set;
the method for extracting the region of interest of the input image comprises the following steps:
clustering depth distances corresponding to all pixel points in the input image by a preset clustering method, and performing binarization processing on the input image according to a clustering result to extract a region of interest;
the preset clustering method comprises the following steps:
obtaining the depth distances corresponding to all pixel points in the input image, and constructing a depth distance set D = {d_1, d_2, …, d_N}, where d_N represents the depth distance of each pixel point;
traversing the depth distance set D: taking d_1 as the initial cluster center s_d1 and initializing the cluster center set S = {s_d1}; if d_i lies within the set radius R_d of any cluster center s_dj in S, it is classified into that cluster and the cluster center is updated at the same time; if the distance from d_i to all cluster centers in S is larger than R_d, then d_i is added as a new cluster center s_di and the set is updated to S = {s_d1, …, s_di}, until the set D is traversed; wherein i, j represent subscripts;
the first screening module is configured to screen the regions of interest in the first set that contain a human body, and construct a second set;
the second screening module is configured to calculate the distance between the coordinates of the central point of each region of interest in the second set and the coordinates of the target to be followed in the previous frame, and if the distance is greater than the set target movement radius, remove the corresponding region of interest to obtain a third set;
the target region acquisition module is configured to calculate the similarity between the color histogram corresponding to each region of interest in the third set and the pre-stored color histogram of the target to be followed; if the similarity is greater than a set threshold, the corresponding region of interest is taken as a possible candidate region, the possible candidate region with the greatest similarity is taken as the target region, and the target coordinates and the corresponding color histogram thereof are updated;
the control equipment moving module is configured to calculate the moving speed of the mobile equipment according to the center point coordinates of the target area by a double-threshold method and control the mobile equipment to move towards the target; and after the movement, skipping to the acquisition module until the following task is finished.
8. A storage device in which a plurality of programs are stored, characterized in that the programs are adapted to be loaded and executed by a processor to implement the RGBD camera based target following method of any of claims 1-6.
9. A processing device, comprising a processor and a storage device; a processor adapted to execute each program; a storage device adapted to store a plurality of programs; characterized in that the program is adapted to be loaded and executed by a processor to implement the RGBD camera based target following method of any of claims 1-6.
CN202010090067.4A 2020-02-13 2020-02-13 RGBD camera-based target following method, system and device Active CN111325770B (en)


Publications (2)

Publication Number Publication Date
CN111325770A CN111325770A (en) 2020-06-23
CN111325770B true CN111325770B (en) 2023-12-22

Family

ID=71168808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010090067.4A Active CN111325770B (en) 2020-02-13 2020-02-13 RGBD camera-based target following method, system and device

Country Status (1)

Country Link
CN (1) CN111325770B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115588099B (en) * 2022-11-02 2023-05-30 北京鹰之眼智能健康科技有限公司 Region of interest display method, electronic device and storage medium
CN115690400B (en) * 2022-11-02 2024-01-23 北京鹰之眼智能健康科技有限公司 Infrared image display method

Citations (6)

Publication number Priority date Publication date Assignee Title
CN103268480A (en) * 2013-05-30 2013-08-28 重庆大学 System and method for visual tracking
CN104899590A (en) * 2015-05-21 2015-09-09 深圳大学 Visual target tracking method and system for unmanned aerial vehicle
CN105825524A (en) * 2016-03-10 2016-08-03 浙江生辉照明有限公司 Target tracking method and apparatus
CN107170011A (en) * 2017-04-24 2017-09-15 杭州司兰木科技有限公司 A kind of robot vision tracking and system
CN109949375A (en) * 2019-02-02 2019-06-28 浙江工业大学 A kind of mobile robot method for tracking target based on depth map area-of-interest
CN110378218A (en) * 2019-06-13 2019-10-25 大亚湾核电运营管理有限责任公司 A kind of image processing method, device and terminal device

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US8750567B2 (en) * 2012-04-09 2014-06-10 GM Global Technology Operations LLC Road structure detection and tracking
CN104794733B (en) * 2014-01-20 2018-05-08 株式会社理光 Method for tracing object and device
US10740607B2 (en) * 2017-08-18 2020-08-11 Autel Robotics Co., Ltd. Method for determining target through intelligent following of unmanned aerial vehicle, unmanned aerial vehicle and remote control
TWI631516B (en) * 2017-10-16 2018-08-01 緯創資通股份有限公司 Target tracking method and system adaptable to multi-target tracking

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN103268480A (en) * 2013-05-30 2013-08-28 重庆大学 System and method for visual tracking
CN104899590A (en) * 2015-05-21 2015-09-09 深圳大学 Visual target tracking method and system for unmanned aerial vehicle
CN105825524A (en) * 2016-03-10 2016-08-03 浙江生辉照明有限公司 Target tracking method and apparatus
CN107170011A (en) * 2017-04-24 2017-09-15 杭州司兰木科技有限公司 A kind of robot vision tracking and system
CN109949375A (en) * 2019-02-02 2019-06-28 浙江工业大学 A kind of mobile robot method for tracking target based on depth map area-of-interest
CN110378218A (en) * 2019-06-13 2019-10-25 大亚湾核电运营管理有限责任公司 A kind of image processing method, device and terminal device

Non-Patent Citations (1)

Title
Image sequence target detection and tracking method based on color difference histograms; Yu Tiejun; Zou Zhen; Fujian Computer (Issue 10); 119-122 *

Also Published As

Publication number Publication date
CN111325770A (en) 2020-06-23

Similar Documents

Publication Publication Date Title
Yenikaya et al. Keeping the vehicle on the road: A survey on on-road lane detection systems
US20200293058A1 (en) Data processing method, apparatus and terminal
CN108052624B (en) Point cloud data processing method and device and computer readable storage medium
CN108419446B (en) System and method for laser depth map sampling
Kooij et al. Context-based pedestrian path prediction
Li et al. Springrobot: A prototype autonomous vehicle and its algorithms for lane detection
Guo et al. Robust road detection and tracking in challenging scenarios based on Markov random fields with unsupervised learning
CN105740782B (en) A kind of driver's lane-change course quantization method based on monocular vision
US10163256B2 (en) Method and system for generating a three-dimensional model
JP6650657B2 (en) Method and system for tracking moving objects in video using fingerprints
CN111325770B (en) RGBD camera-based target following method, system and device
CN113064135B (en) Method and device for detecting obstacle in 3D radar point cloud continuous frame data
CN104517275A (en) Object detection method and system
CN115049700A (en) Target detection method and device
CN111213153A (en) Target object motion state detection method, device and storage medium
CN111814752A (en) Indoor positioning implementation method, server, intelligent mobile device and storage medium
CN110751336B (en) Obstacle avoidance method and obstacle avoidance device of unmanned carrier and unmanned carrier
CN114454875A (en) Urban road automatic parking method and system based on reinforcement learning
CN114419098A (en) Moving target trajectory prediction method and device based on visual transformation
Roberts Attentive visual tracking and trajectory estimation for dynamic scene segmentation
CN110717918A (en) Pedestrian detection method and device
CN113191297A (en) Pavement identification method and device and electronic equipment
WO2023179027A1 (en) Road obstacle detection method and apparatus, and device and storage medium
Krasner et al. Automatic parking identification and vehicle guidance with road awareness
Rasmussen et al. Appearance contrast for fast, robust trail-following

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant