WO2017133453A1 - Method and system for tracking moving body - Google Patents

Method and system for tracking moving body

Info

Publication number
WO2017133453A1
Authority
WO
WIPO (PCT)
Prior art keywords
human body
person
sound
camera
tracking
Application number
PCT/CN2017/071510
Other languages
French (fr)
Chinese (zh)
Inventor
王玉亮
薛林
王晓刚
乔涛
Original Assignee
北京进化者机器人科技有限公司
Application filed by 北京进化者机器人科技有限公司
Publication of WO2017133453A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/48: Matching video sequences
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Manipulator (AREA)

Abstract

A method and system for tracking a moving human body, the method comprising the following steps: receiving an audio signal; receiving a video signal; calculating the relative distance between the sound source and the system and the angle between the sound source and the system's forward direction, and determining whether the sound source lies within the imaging range of the camera; if the robot is stationary, applying a three-frame difference algorithm to obtain the motion region of the current frame, detecting whether a human body is present in the motion region, and issuing a motion command according to the size and position of the followed person in the current video frame; if the robot is moving, predicting the motion region of the followed person in the current frame, performing human body recognition on the predicted region, and issuing a motion command according to the size and position of the followed person in the current video frame. The method and system effectively detect the position of the followed person within the field of view of the camera on an autonomous mobile robot platform, alleviating tracking failure when the tracked person is partly occluded, effectively tracking the followed person's movements, and reducing cost.

Description

Method and system for tracking a moving human body
This application claims priority to Chinese Patent Application No. 201610073052.0, filed on February 2, 2016 and entitled "Method and system for tracking a moving human body", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of automation technologies, and in particular to a method and system for tracking a moving human body.
Background
In recent years, robotics, as a high technology, has gradually penetrated every aspect of our lives; from production workshops to hospitals, the role played by robots is immeasurable. Traditional industrial robots are suited to structured environments and repetitive tasks, whereas modern robots are expected to work alongside humans in the same unstructured spaces and environments, completing non-deterministic tasks online and in real time. Contemporary robotics research has moved beyond fixed-point operation in structured environments toward autonomous operation in unstructured environments such as aerospace, interstellar exploration, military reconnaissance, underwater and underground pipelines, disease examination and treatment, and disaster relief. Traditional robots are multi-input, single-end-effector systems, while modern robots are multi-input, multi-end-effector systems; and traditional robots fall far short of humans in dexterous manipulation, online perception, understanding of human behavior and abstract commands, and cognitive and decision-making ability, so they cannot communicate with people efficiently. Future robots will work for humans in known or unknown environments that humans cannot reach or can reach only with difficulty, and many of these capabilities rest on the robot's ability to recognize and follow a human body. Therefore, to meet people's growing needs and improve human-robot interaction, human body recognition and following is a key problem that urgently needs to be solved.
Research on person following mainly involves three aspects: detection of the followed person, tracking of the followed person, and obstacle avoidance by the robot while following. Many organizations worldwide study human body recognition and following for robots. Some control systems for human-following mobile robots are based on RGB-D sensors (such as Kinect, Xtion, and 奥比中光); in addition, specially equipped rooms are widely used to identify the target person, in which an intelligent environment detector senses the robot's surroundings to recognize a human and follow the human body stably. The University of Tokyo built a system of multiple laser range sensors that recognizes human legs and tracks pedestrians, or uses three laser range sensors to detect a person's legs, upper body, and head, respectively, to track the human body; these devices, however, are fixed in place.
The above methods all have drawbacks in practical applications. RGB-D sensors have the following disadvantages: 1) the target person must not be occluded; 2) they are not well suited to mobile platforms; 3) a specially equipped room is, moreover, expensive and limits the robot's range of motion. Although a laser range sensor covers a wide measurement angle, if it is used to recognize a person's legs, the robot has difficulty determining which two feet belong to the target person, and the approach also fails for a woman wearing a skirt.
Summary of the Invention
The present invention provides a method and system for tracking a moving human body. By combining sound source localization, the frame difference method, and human body detection, the method can effectively detect the position of the followed person within the field of view of the camera on an autonomous mobile robot platform; it then applies vision-based moving object tracking methods such as optical flow, particle filtering, and Kalman filtering to track the followed person's motion, which alleviates, to a certain extent, the problem of the followed person being occluded and ensures that the robot keeps tracking the target; the robot follows the person by controlling the motion of the autonomous mobile robot platform; and an ordinary camera is used as the following sensor, which reduces system cost and effectively avoids the high cost of other sensors.
The technical solution of the present invention provides a method for tracking a moving human body, comprising the following steps:
S101: The system collects the sound signal and the times at which the sound reaches each sensor position, and sends this information to the central controller;
S102: Determine whether the sound is a "follow" command; if not, return to S101;
S103: Calculate the relative distance between the sound source and the system, and the angle β between the sound source and the forward direction of the system;
S104: Determine whether the sound source lies within the imaging range of the camera; if so, go to S106;
S105: The pan/tilt head rotates by the angle β;
S106: The central controller turns on the color camera for video capture;
S107: Analyze three consecutive frames using the three-frame difference method to obtain the motion region of the current frame;
S108: Determine whether the motion region meets the requirements; if it is smaller than the lower threshold, go to S101; if it is larger than the upper threshold, go to S106;
S109: Extract the qualifying motion region from the current video frame;
S110: The human body detector determines, using a human body detection classifier obtained by offline training, whether a human body is detected; if not, go to S101;
S111: Obtain the active region of the tracked person;
S112: Determine whether the angle of the person's active region relative to the system matches the angle of the sound source relative to the system; if it is less than the threshold, go to S101;
S113: Determine the active region of the tracked person;
S114: Extract body features of the currently tracked person and train the target human body recognizer, the body features including but not limited to color, texture, edge contour, and size;
S115: Issue a motion command according to the size and position of the followed person in the current video frame;
S116: Determine whether a "stop following" command has been received; if so, go to S124;
S117: The color camera captures video;
S118: Predict the active region of the tracked person in the current frame;
S119: Determine whether the prediction succeeded; if the prediction failed, go to S122;
S120: Use the target human body recognizer to perform human body recognition on the predicted active region of the tracked person;
S121: Determine whether the human body recognition succeeded; if so, go to S123;
S122: The robot stops moving, and the process goes to S106;
S123: Extract the body features of the tracked person, update the human body recognizer, and go to S115;
S124: End.
Further, the autonomous mobile robot detects sound signals through sound sensors;
four sound sensors are evenly distributed around the periphery of the autonomous mobile robot, and one sound sensor is located on top of the pan/tilt head;
the five arrayed sound sensors are fixedly mounted and do not move with the pan/tilt head.
Further, the angle β is the angle between the sound source and the forward direction of the system;
β is positive in the clockwise direction and negative in the counterclockwise direction.
Further, in step S104, determining whether the sound source lies within the imaging range of the camera and, if so, going to S106 further comprises:
the horizontal field of view of the color camera is α;
if |β| < α/2-θ, the sound source lies within the imaging range of the camera;
if |β| >= α/2-θ, the sound source lies outside the imaging range of the camera;
θ is a threshold that ensures the sound source falls completely within the field of view of the color camera.
Further, in step S110, the human body detector determining whether a human body is detected according to the feature values of the object further comprises:
training a human body model offline, using HOG and Haar features or a DPM model with an SVM or AdaBoost learning method, to generate a human body detection classifier;
the human body detection classifier determines whether a human body is detected.
Further, in step S115, issuing a motion command according to the size and position of the followed person in the current video frame further comprises:
the change in size of the motion region in the current video frame corresponds to the distance of the tracked person from the camera;
the change of position within the current video frame corresponds to the change of the tracked person's azimuth relative to the forward direction of the system;
the direction of motion of the followed person is determined from the change in size of the motion region and the change of position within the current video frame.
Further, in step S118, predicting the active region of the tracked person in the current frame further comprises:
the prediction is made from the body features of the tracked person extracted in the previous frame;
the position prediction methods include single tracking algorithms and a fusion algorithm;
the single tracking algorithms include the optical flow method, the particle filter tracking algorithm, and the Kalman filter tracking algorithm;
the fusion algorithm combines multiple tracking algorithms to improve tracking effectiveness.
The technical solution of the present invention further provides a moving human body tracking system, comprising a central controller unit, a sound sensor unit, a camera unit, a motion unit, and a pan/tilt head, wherein
the central controller unit analyzes sound signals, processes video information, controls the rotation of the pan/tilt head, calculates the position of the autonomous mobile robot and the trajectory of the followed person, and issues control commands to the motion unit;
the sound sensor unit receives sound signals and sends sound information to the central controller unit;
the camera unit obtains image information of the environment in which the autonomous mobile robot platform is located and sends image signals to the central controller unit;
the motion unit receives control commands and executes the corresponding motion;
the pan/tilt head rotates according to commands from the central control unit to adjust the camera's shooting angle.
Further, the autonomous mobile robot is equipped with five sound sensors;
four sound sensors are evenly distributed around the periphery of the autonomous mobile robot, and one sound sensor is located on top of the pan/tilt head;
the five arrayed sound sensors are fixedly mounted and do not move with the pan/tilt head.
Further, the camera unit and the central controller unit are located on the pan/tilt head of the autonomous mobile robot;
the pan/tilt head can rotate freely through 360 degrees to keep the camera at a suitable angle;
the pan/tilt head is located on top of the autonomous mobile robot.
The technical solution of the present invention combines sound source localization, the frame difference method, and human body detection to effectively detect the position of the followed person within the field of view of the camera on the autonomous mobile robot platform; it then applies vision-based moving object tracking methods such as optical flow and particle filtering to track the followed person's motion, which alleviates, to a certain extent, the problem of the followed person being occluded and ensures that the robot keeps tracking the target; the robot follows the person by controlling the motion of the autonomous mobile robot platform; and an ordinary camera is used as the following sensor, which reduces system cost and effectively avoids the high cost of other sensors.
Other features and advantages of the present invention will be set forth in the description that follows and will in part be apparent from the description or be learned by practicing the invention. The objectives and other advantages of the invention may be realized and attained by the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and embodiments.
Brief Description of the Drawings
The accompanying drawings are provided for a further understanding of the invention and constitute a part of the specification; together with the embodiments of the invention, they serve to explain the invention and do not limit it. In the drawings:
FIG. 1 is a flowchart of the method for tracking a moving human body in Embodiment 1 of the present invention;
FIG. 2 is a structural diagram of the moving human body tracking system in Embodiment 1 of the present invention.
Detailed Description
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here are intended only to illustrate and explain the present invention, not to limit it.
FIG. 1 is a flowchart of the method for tracking a moving human body in Embodiment 1 of the present invention. As shown in FIG. 1, the process includes the following steps:
S101: The system collects the sound signal and the times at which the sound reaches each sensor position, and sends this information to the central controller.
The autonomous mobile robot detects sound signals through sound sensors;
four sound sensors are evenly distributed around the periphery of the autonomous mobile robot, and one sound sensor is located on top of the pan/tilt head;
the five arrayed sound sensors are fixedly mounted and do not move with the pan/tilt head.
S102: Determine whether the sound is a "follow" command; if not, return to S101.
The central control unit receives the signal from the sound sensors and identifies whether it is a "follow" command.
S103: Calculate the relative distance between the sound source and the system, and the angle β between the sound source and the forward direction of the system.
The central control unit receives the signals from the sound sensors and calculates the relative distance between the sound source and the system and the angle β between the sound source and the forward direction of the system;
β is positive in the clockwise direction and negative in the counterclockwise direction.
Step S104: Determine whether the sound source lies within the imaging range of the camera; if so, go to step S106.
The horizontal field of view of the color camera is α;
if |β| < α/2-θ, the sound source lies within the imaging range of the camera;
if |β| >= α/2-θ, the sound source lies outside the imaging range of the camera;
θ is a threshold that ensures the sound source falls completely within the field of view of the color camera.
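To make the geometry concrete, the sketch below estimates the bearing β of a sound source from the arrival-time difference between a single pair of microphones (a far-field approximation; the patent does not detail its five-sensor localization) and then applies the |β| < α/2-θ test of step S104. The microphone spacing, speed of sound, and example values are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C (assumed)

def source_angle(dt: float, mic_spacing: float) -> float:
    """Estimate the source bearing from the arrival-time difference `dt`
    (seconds) between two microphones `mic_spacing` metres apart.
    Far-field approximation: beta = arcsin(c * dt / d), in degrees,
    positive clockwise to match the patent's sign convention."""
    ratio = max(-1.0, min(1.0, SPEED_OF_SOUND * dt / mic_spacing))
    return math.degrees(math.asin(ratio))

def in_camera_view(beta: float, alpha: float, theta: float) -> bool:
    """Step S104 test: the source is inside the camera's horizontal field
    of view alpha (degrees) when |beta| < alpha/2 - theta, where theta is
    a margin that keeps the source fully inside the frame."""
    return abs(beta) < alpha / 2.0 - theta

# Example: 0.5 ms delay across mics 0.3 m apart, 60-degree FOV, 5-degree margin.
beta = source_angle(0.0005, 0.3)
print(round(beta, 1), in_camera_view(beta, 60.0, 5.0))  # ~34.9 False -> rotate by beta
```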
S105: The pan/tilt head rotates by the angle β.
S106: The central controller turns on the color camera for video capture.
S107: Analyze three consecutive frames using the three-frame difference method to obtain the motion region of the current frame.
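A minimal sketch of this step, assuming an OpenCV implementation (the patent prescribes the three-frame difference method itself but no library or threshold values):

```python
import cv2

def three_frame_motion_region(f1, f2, f3, thresh: int = 25):
    """Three-frame difference: binarise |f2 - f1| and |f3 - f2|, AND them so
    that only pixels moving in both intervals survive, and return the
    bounding box (x, y, w, h) of the largest contour as the motion region
    of the current (middle) frame, or None if nothing moved."""
    g1, g2, g3 = (cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in (f1, f2, f3))
    d12 = cv2.threshold(cv2.absdiff(g2, g1), thresh, 255, cv2.THRESH_BINARY)[1]
    d23 = cv2.threshold(cv2.absdiff(g3, g2), thresh, 255, cv2.THRESH_BINARY)[1]
    motion = cv2.bitwise_and(d12, d23)
    contours, _ = cv2.findContours(motion, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    return cv2.boundingRect(max(contours, key=cv2.contourArea))
```

ANDing the two binarised differences suppresses the ghost region that a simple two-frame difference leaves behind a moving person.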
S108: Determine whether the motion region meets the requirements; if it is smaller than the lower threshold, go to S101; if it is larger than the upper threshold, go to S106.
S109: Extract the qualifying motion region from the current video frame.
S110: The human body detector determines, using a human body detection classifier obtained by offline training, whether a human body is detected; if not, go to S101.
A human body model is trained offline, using HOG and Haar features or a DPM model with an SVM or AdaBoost learning method, to generate a human body detection classifier;
the human body detection classifier determines whether a human body is detected.
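For illustration, the sketch below runs a HOG-plus-linear-SVM people detector over a frame. It uses OpenCV's pre-trained detector as a stand-in for the patent's offline-trained classifier, so the model and scan parameters are assumptions rather than the patent's own:

```python
import cv2

# OpenCV ships a HOG descriptor with a pre-trained linear-SVM people detector.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_people(frame):
    """Return a list of (x, y, w, h) boxes where a human body is detected."""
    boxes, _weights = hog.detectMultiScale(frame, winStride=(8, 8),
                                           padding=(8, 8), scale=1.05)
    return list(boxes)
```

In the method, detection would be applied to the motion region extracted in S109 rather than to the whole frame.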
S111: Obtain the active region of the tracked person.
S112: Determine whether the angle of the person's active region relative to the system matches the angle of the sound source relative to the system; if it is less than the threshold, go to S101.
S113: Determine the active region of the tracked person.
S114: Extract body features of the currently tracked person and train the human body tracker, the body features including but not limited to color, texture, edge contour, and size.
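As one concrete instance of such a feature, the sketch below extracts a normalised HSV colour histogram from the person's region and compares it with a stored template. This descriptor and threshold are illustrative choices; the patent names colour, texture, edge contour, and size but prescribes no particular representation.

```python
import cv2

def color_feature(frame, box):
    """One possible body feature for S114: a normalised hue-saturation
    histogram of the region box = (x, y, w, h)."""
    x, y, w, h = box
    roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([roi], [0, 1], None, [30, 32], [0, 180, 0, 256])
    cv2.normalize(hist, hist, 0, 1, cv2.NORM_MINMAX)
    return hist

def same_person(hist_a, hist_b, min_corr: float = 0.7) -> bool:
    """An S120-style check: histogram correlation above an assumed
    threshold counts as a successful recognition."""
    return cv2.compareHist(hist_a, hist_b, cv2.HISTCMP_CORREL) >= min_corr
```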
S115: Issue a motion command according to the size and position of the followed person in the current video frame.
The change in size of the motion region in the current video frame corresponds to the distance of the tracked person from the camera;
the change of position within the current video frame corresponds to the change of the tracked person's azimuth relative to the forward direction of the system;
the direction of motion of the followed person is determined from the change in size of the motion region and the change of position within the current video frame.
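A minimal sketch of this mapping, assuming simple relative thresholds on box area and horizontal drift (the command names and tolerance values are illustrative; the patent defines no concrete numbers):

```python
def action_command(prev_box, cur_box, frame_width,
                   size_tol: float = 0.15, pos_tol: float = 0.10):
    """Derive motion commands from the change of the person's box
    (x, y, w, h) between two frames: area encodes distance, horizontal
    centre drift encodes the change of azimuth."""
    (_, _, pw, ph), (_, _, cw, ch) = prev_box, cur_box
    cmds = []
    if cw * ch < pw * ph * (1 - size_tol):
        cmds.append("forward")        # person receding: close the gap
    elif cw * ch > pw * ph * (1 + size_tol):
        cmds.append("stop")           # person approaching: hold position
    drift = (cur_box[0] + cw / 2) - (prev_box[0] + pw / 2)
    if drift > pos_tol * frame_width:
        cmds.append("turn_right")     # azimuth drifted clockwise
    elif drift < -pos_tol * frame_width:
        cmds.append("turn_left")
    return cmds or ["keep_course"]

print(action_command((100, 80, 60, 120), (130, 80, 48, 96), 640))  # ['forward']
```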
S116: Determine whether a "stop following" command has been received; if so, go to S124.
S117: The color camera captures video.
S118: Predict the active region of the tracked person in the current frame.
The prediction is made from the body features of the tracked person extracted in the previous frame;
the position prediction methods include single tracking algorithms and a fusion algorithm;
the single tracking algorithms include the optical flow method, the particle filter tracking algorithm, and the Kalman filter tracking algorithm;
the optical flow method estimates a velocity field of positions from the gray-level changes of the image sequence over time (t) and space (x, y);
the particle filter tracking algorithm first extracts features from the followed person's motion region in the current video frame, then approximates the feature probability density function with a set of random samples propagated through the state space, replacing the integral operation with the sample feature mean to obtain the minimum-variance state distribution, i.e., the position of the followed person in the next video frame;
the fusion algorithm combines multiple tracking algorithms to improve tracking effectiveness.
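Of the single tracking algorithms named here, the optical flow method is the simplest to sketch. The example below predicts the tracked person's region with pyramidal Lucas-Kanade optical flow in OpenCV; it is one possible realization of step S118, with assumed feature and window parameters:

```python
import cv2
import numpy as np

LK_PARAMS = dict(winSize=(21, 21), maxLevel=3,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,
                           30, 0.01))

def predict_region_lk(prev_gray, cur_gray, box):
    """Track corner features inside the previous box with pyramidal
    Lucas-Kanade flow and shift the box by their median displacement;
    returns the predicted (x, y, w, h) or None if prediction failed."""
    x, y, w, h = box
    mask = np.zeros_like(prev_gray)
    mask[y:y + h, x:x + w] = 255
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50, qualityLevel=0.01,
                                  minDistance=5, mask=mask)
    if pts is None:
        return None
    new_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray,
                                                     pts, None, **LK_PARAMS)
    ok = status.flatten() == 1
    if not ok.any():
        return None
    dx, dy = np.median(new_pts[ok] - pts[ok], axis=0).ravel()
    return (int(x + dx), int(y + dy), w, h)
```

A None return corresponds to the "prediction failed" branch of S119, after which the robot stops and falls back to re-detection (S122 and S106).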
S119: Determine whether the prediction succeeded; if the prediction failed, go to S122.
S120: Use the target human body recognizer to perform human body recognition on the predicted active region of the tracked person.
S121: Determine whether the human body recognition succeeded; if so, go to S123.
S122: The robot stops moving, and the process goes to S106.
S123: Extract the body features of the tracked person, update the human body recognizer, and go to S115.
S124: End.
To implement the above tracking process, this embodiment further provides a moving human body tracking system. FIG. 2 is a structural diagram of the moving human body tracking system in Embodiment 1 of the present invention. As shown in FIG. 2, the system includes a central controller unit 201, a sound sensor unit 202, a camera unit 203, a motion unit 204, and a pan/tilt head 205, wherein
the central controller unit analyzes sound signals, processes video information, controls the rotation of the pan/tilt head, calculates the position of the autonomous mobile robot and the trajectory of the followed person, and issues control commands to the motion unit;
the sound sensor unit receives sound signals and sends sound information to the central controller unit;
the camera unit obtains image information of the environment in which the autonomous mobile robot platform is located and sends image signals to the central controller unit;
the motion unit receives control commands and executes the corresponding motion;
the pan/tilt head rotates according to commands from the central control unit to adjust the camera's shooting angle.
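The division of labour among the five units can be sketched as a structural skeleton (behaviour is stubbed; all class and method names and the example values are assumptions, not definitions from the patent):

```python
class SoundSensorUnit:
    """Receives sound and forwards arrival-time information (stubbed)."""
    def arrival_times(self):
        return [0.0, 0.0002, 0.0004, 0.0002, 0.0001]  # seconds, five sensors

class CameraUnit:
    """Captures frames of the robot's surroundings (stubbed)."""
    def capture(self):
        return None

class MotionUnit:
    """Receives control commands and executes the motion."""
    def execute(self, command):
        print("motion:", command)

class PanTilt:
    """Rotates on command to adjust the camera's shooting angle."""
    def rotate(self, degrees):
        print(f"pan/tilt: rotate {degrees:+.1f} deg")

class CentralController:
    """Analyzes sound, processes video, and drives the other units,
    in the roles FIG. 2 assigns to unit 201."""
    def __init__(self, sound, camera, motion, pan_tilt):
        self.sound, self.camera = sound, camera
        self.motion, self.pan_tilt = motion, pan_tilt

    def step(self):
        _times = self.sound.arrival_times()  # sound information in
        _frame = self.camera.capture()       # image information in
        self.pan_tilt.rotate(15.0)           # e.g. turn toward the source
        self.motion.execute("forward")       # control command out

CentralController(SoundSensorUnit(), CameraUnit(), MotionUnit(), PanTilt()).step()
```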
Further, the autonomous mobile robot is equipped with five sound sensors;
four sound sensors are evenly distributed around the periphery of the autonomous mobile robot, and one sound sensor is located on top of the pan/tilt head;
the five arrayed sound sensors are fixedly mounted and do not move with the pan/tilt head.
Further, the camera unit and the central controller unit are located on the pan/tilt head of the autonomous mobile robot;
the pan/tilt head can rotate freely through 360 degrees to keep the camera at a suitable angle;
the pan/tilt head is located on top of the autonomous mobile robot.
The technical solution of the present invention combines sound source localization, the frame difference method, and human body detection to effectively detect the position of the followed person within the field of view of the camera on the autonomous mobile robot platform; it then applies vision-based moving object tracking methods such as optical flow and particle filtering to track the followed person's motion, which alleviates, to a certain extent, the problem of the followed person being occluded and ensures that the robot keeps tracking the target; the robot follows the person by controlling the motion of the autonomous mobile robot platform; and an ordinary camera is used as the following sensor, which reduces system cost and effectively avoids the high cost of other sensors.
Those skilled in the art will appreciate that embodiments of the present invention may be provided as a method, a system, or an automation device product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the invention may take the form of an electronic device product implemented on one or more automation devices.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by electronic components and computer program instructions. These electronic components, computer program instructions, or electronic device components may be provided to a general-purpose electronic device, a special-purpose electronic device, an auxiliary electronic device, or another type of electronic device to produce an automation device machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These electronic components, computer program instructions, or electronic device components may also be stored in a readable memory of an automation device capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the readable memory of the automation device produce an article of manufacture including instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These electronic components, computer program instructions, or electronic device components may also be loaded onto an automation device or other programmable data processing device, such that a series of operational steps are performed on the automation device or other programmable device to produce automated processing, whereby the instructions executed on the automation device or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Obviously, those skilled in the art can make various modifications and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass them.

Claims (10)

  1. A method for tracking a moving human body, characterized by comprising the following steps:
    S101: the system collects the sound signal and the times at which the sound reaches each sensor position, and sends this information to the central controller;
    S102: determining whether the sound is a "follow" command; if not, returning to S101;
    S103: calculating the relative distance between the sound source and the system, and the angle β between the sound source and the forward direction of the system;
    S104: determining whether the sound source lies within the imaging range of the camera; if so, going to S106;
    S105: the pan/tilt head rotating by the angle β;
    S106: the central controller turning on the color camera for video capture;
    S107: analyzing three consecutive frames using the three-frame difference method to obtain the motion region of the current frame;
    S108: determining whether the motion region meets the requirements; if it is smaller than the lower threshold, going to S101; if it is larger than the upper threshold, going to S106;
    S109: extracting the qualifying motion region from the current video frame;
    S110: a human body detector determining, using a human body detection classifier obtained by offline training, whether a human body is detected; if not, going to S101;
    S111: obtaining the active region of the tracked person;
    S112: determining whether the angle of the person's active region relative to the system matches the angle of the sound source relative to the system; if it is less than the threshold, going to S101;
    S113: determining the active region of the tracked person;
    S114: extracting body features of the currently tracked person and training a target human body recognizer, the body features including but not limited to color, texture, edge contour, and size;
    S115: issuing a motion command according to the size and position of the followed person in the current video frame;
    S116: determining whether a "stop following" command has been received; if so, going to S124;
    S117: the color camera capturing video;
    S118: predicting the active region of the tracked person in the current frame;
    S119: determining whether the prediction succeeded; if the prediction failed, going to S122;
    S120: using the target human body recognizer to perform human body recognition on the predicted active region of the tracked person;
    S121: determining whether the human body recognition succeeded; if so, going to S123;
    S122: the robot stopping and going to S106;
    S123: extracting the body features of the tracked person, updating the human body recognizer, and going to S115;
    S124: end.
  2. The method according to claim 1, characterized by further comprising:
    the autonomous mobile robot detects sound signals through sound sensors;
    four sound sensors are evenly distributed around the periphery of the autonomous mobile robot, and one sound sensor is located on top of the pan/tilt head;
    the five arrayed sound sensors are fixedly mounted and do not move with the pan/tilt head.
  3. The method according to claim 1, characterized by further comprising:
    the angle β is the angle between the sound source and the forward direction of the system;
    β is positive in the clockwise direction and negative in the counterclockwise direction.
  4. The method according to claim 1, characterized in that, in step S104, determining whether the sound source lies within the imaging range of the camera and, if so, going to S106 further comprises:
    the horizontal field of view of the color camera is α;
    if |β| < α/2-θ, the sound source lies within the imaging range of the camera;
    if |β| >= α/2-θ, the sound source lies outside the imaging range of the camera;
    θ is a threshold that ensures the sound source falls completely within the field of view of the color camera.
  5. The method according to claim 1, wherein in step S110, the human body detector determining from the feature values of the object whether a human body is detected further comprises:
    training a human body model offline, using HOG and Haar features or a DPM model with an SVM or AdaBoost learning method, to generate a human body detection classifier;
    the human body detection classifier determining whether a human body is detected.
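As one concrete instance of such a classifier, the sketch below substitutes OpenCV's pretrained HOG-plus-linear-SVM pedestrian detector for the offline-trained model described in the claim; the 0.4 confidence cut-off is an arbitrary illustrative value.

    import cv2

    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    def detect_humans(frame_bgr):
        # Sliding-window HOG detection at several scales; keep only the
        # boxes whose SVM confidence clears the (illustrative) threshold.
        boxes, weights = hog.detectMultiScale(frame_bgr, winStride=(8, 8),
                                              padding=(8, 8), scale=1.05)
        return [tuple(map(int, b)) for b, w in zip(boxes, weights) if w > 0.4]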
  6. The method according to claim 1, wherein in step S115, issuing an action command according to the size and position of the followed person in the current video frame further comprises:
    the change in size of the motion region in the current video frame corresponding to the distance of the tracked person from the camera;
    the change in position within the current video frame corresponding to the change in the tracked person's azimuth angle relative to the positive direction of the system;
    determining the direction of motion of the followed person from the change in size of the motion region and the change in position within the current video frame.
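A minimal sketch of this mapping, assuming a bounding box (x, y, w, h) in pixels; the reference height, deadband, and unit gains are illustrative choices, not values from the patent.

    def action_command(box, frame_width, ref_height, deadband=0.05):
        x, y, w, h = box
        # Horizontal drift of the box centre tracks the person's azimuth:
        # steer so the person returns to the middle of the frame.
        offset = (x + w / 2.0) / frame_width - 0.5
        turn = 0.0 if abs(offset) < deadband else -offset
        # A growing box means the person is closer than the following
        # distance, so back off; a shrinking box means advance.
        scale = ref_height / float(h)
        speed = 0.0 if abs(scale - 1.0) < deadband else scale - 1.0
        return speed, turn  # signed forward speed and turn rate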
  7. The method according to claim 1, wherein in step S118, predicting the activity area of the tracked person in the current frame further comprises:
    predicting on the basis of the tracked person's human body features extracted from the previous frame;
    the position prediction methods including single tracking algorithms and fusion algorithms;
    the single tracking algorithms including the optical flow method, the particle filter tracking algorithm, and the Kalman filter tracking algorithm;
    a fusion algorithm combining multiple tracking algorithms to improve the effectiveness of the tracking.
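Of the single tracking algorithms listed, the Kalman filter is the simplest to sketch. Below, a constant-velocity filter over the bounding-box centre built on OpenCV's cv2.KalmanFilter; the noise covariances are illustrative guesses, not values from the patent.

    import numpy as np
    import cv2

    kf = cv2.KalmanFilter(4, 2)  # state (x, y, vx, vy), measurement (x, y)
    kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1
    kf.errorCovPost = np.eye(4, dtype=np.float32)

    def predicted_centre(measured_xy=None):
        # S118: predict where the activity area is centred in this frame;
        # when recognition succeeds (S121/S123), feed the measured centre
        # back in to correct the filter.
        pred = kf.predict()
        if measured_xy is not None:
            kf.correct(np.array(measured_xy, np.float32).reshape(2, 1))
        return float(pred[0]), float(pred[1])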
  8. A moving human body tracking system, comprising a central controller unit, a sound sensor unit, a camera unit, a motion unit, and a pan-tilt head, wherein:
    the central controller unit is configured to analyze sound signals, process video information, control the rotation of the pan-tilt head, calculate the position of the autonomous mobile robot and the motion trajectory of the followed person, and issue control commands to the motion unit;
    the sound sensor unit is configured to receive sound signals and send sound information to the central controller unit;
    the camera unit is configured to obtain image information of the environment in which the autonomous mobile robot platform is located and to send image signals to the central controller unit;
    the motion unit is configured to receive control commands and execute the corresponding motion;
    the pan-tilt head rotates according to commands from the central controller unit to adjust the shooting angle of the camera.
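One possible decomposition of these units into code is sketched below; all class names, method signatures, and the trivial stub bodies are assumptions for illustration, not interfaces defined by the patent.

    class SoundSensorUnit:
        def sample(self):
            """Return sound information for the central controller (stub)."""
            return None

    class CameraUnit:
        def grab(self):
            """Return an image of the robot's surroundings (stub)."""
            return None

    class MotionUnit:
        def execute(self, command):
            """Carry out a motion command from the central controller (stub)."""
            print("executing", command)

    class PanTilt:
        def rotate(self, angle_deg):
            """Rotate to adjust the camera's shooting angle (stub)."""
            print("rotating to", angle_deg)

    class CentralController:
        """Analyses sound, processes video, steers the pan-tilt head, and
        commands the motion unit, mirroring the claim-8 division of labour."""
        def __init__(self, sound, camera, motion, pantilt):
            self.sound, self.camera = sound, camera
            self.motion, self.pantilt = motion, pantilt

        def step(self):
            audio = self.sound.sample()   # receive sound information
            image = self.camera.grab()    # receive image signals
            # ... locate the source and track the person here, then:
            self.pantilt.rotate(0.0)
            self.motion.execute("follow")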
  9. The system according to claim 8, further comprising:
    the autonomous mobile robot being equipped with five sound sensors;
    four sound sensors being evenly distributed around the periphery of the autonomous mobile robot, and one sound sensor being located on top of the pan-tilt head;
    all five arrayed sound sensors being fixedly mounted and not moving with the pan-tilt head.
  10. The system according to claim 8, further comprising:
    the camera unit and the central controller unit being located on the pan-tilt head of the autonomous mobile robot;
    the pan-tilt head being freely rotatable through 360 degrees, ensuring that the camera is kept at a suitable angle;
    the pan-tilt head being located on top of the autonomous mobile robot.
PCT/CN2017/071510 2016-02-02 2017-01-18 Method and system for tracking moving body WO2017133453A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610073052.0 2016-02-02
CN201610073052.0A CN105760824B (en) 2016-02-02 2016-02-02 A moving human body tracking method and system

Publications (1)

Publication Number Publication Date
WO2017133453A1 true WO2017133453A1 (en) 2017-08-10

Family

ID=56329903

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/071510 WO2017133453A1 (en) 2016-02-02 2017-01-18 Method and system for tracking moving body

Country Status (2)

Country Link
CN (1) CN105760824B (en)
WO (1) WO2017133453A1 (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760824B (en) * 2016-02-02 2019-02-01 北京进化者机器人科技有限公司 A moving human body tracking method and system
CN106228576A (en) * 2016-07-27 2016-12-14 潘燕 For processing the system of image for target following
CN106296731A (en) * 2016-07-27 2017-01-04 潘燕 A kind of target vehicle video frequency following system under complex scene
CN106295523A (en) * 2016-08-01 2017-01-04 马平 A kind of public arena based on SVM Pedestrian flow detection method
CN106886746B (en) * 2016-12-27 2020-07-28 浙江宇视科技有限公司 Identification method and back-end server
CN106934380A (en) * 2017-03-19 2017-07-07 北京工业大学 A kind of indoor pedestrian detection and tracking based on HOG and MeanShift algorithms
CN107816985B (en) * 2017-10-31 2021-03-05 南京阿凡达机器人科技有限公司 Human body detection device and method
CN109992008A (en) * 2017-12-29 2019-07-09 深圳市优必选科技有限公司 Target following method and device for robot
CN108737362B (en) * 2018-03-21 2021-09-14 北京猎户星空科技有限公司 Registration method, device, equipment and storage medium
CN108762309B (en) * 2018-05-03 2021-05-18 浙江工业大学 Human body target following method based on hypothesis Kalman filtering
CN110653812B (en) * 2018-06-29 2021-06-04 深圳市优必选科技有限公司 Interaction method of robot, robot and device with storage function
CN111050271B (en) * 2018-10-12 2021-01-29 北京微播视界科技有限公司 Method and apparatus for processing audio signal
CN109460031A (en) * 2018-11-28 2019-03-12 科大智能机器人技术有限公司 A kind of system for tracking of the automatic tractor based on human bioequivalence
CN110309759A (en) * 2019-06-26 2019-10-08 深圳市微纳集成电路与系统应用研究院 Light source control method based on human body image identification
CN110297472A (en) * 2019-06-28 2019-10-01 上海商汤智能科技有限公司 Apparatus control method, terminal, controlled plant, electronic equipment and storage medium
CN110457884A (en) * 2019-08-06 2019-11-15 北京云迹科技有限公司 Target follower method, device, robot and read/write memory medium
CN111650558B (en) * 2020-04-24 2023-10-10 平安科技(深圳)有限公司 Method, device and computer equipment for positioning sound source user
CN111580049B (en) * 2020-05-20 2023-07-14 陕西金蝌蚪智能科技有限公司 Dynamic target sound source tracking and monitoring method and terminal equipment
CN112261365A (en) * 2020-10-19 2021-01-22 西北工业大学 Self-contained underwater acousto-optic monitoring and recording device and recording method
CN112487869B (en) * 2020-11-06 2024-08-23 深圳优地科技有限公司 Robot intersection passing method and device and intelligent equipment
CN113238552A (en) * 2021-04-28 2021-08-10 深圳优地科技有限公司 Robot, robot movement method, robot movement device and computer-readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102348068A (en) * 2011-08-03 2012-02-08 东北大学 Head gesture control-based following remote visual system
CN103984315A (en) * 2014-05-15 2014-08-13 成都百威讯科技有限责任公司 Domestic multifunctional intelligent robot
US20150015674A1 (en) * 2010-10-08 2015-01-15 SoliDDD Corp. Three-Dimensional Video Production System
CN104299351A (en) * 2014-10-22 2015-01-21 常州大学 Intelligent early warning and fire extinguishing robot
CN105094136A (en) * 2015-09-14 2015-11-25 桂林电子科技大学 Adaptive microphone array sound positioning rescue robot and using method thereof
CN105760824A (en) * 2016-02-02 2016-07-13 北京进化者机器人科技有限公司 Moving body tracking method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184214B (en) * 2015-07-20 2019-02-01 北京进化者机器人科技有限公司 A kind of human body localization method and system based on auditory localization and Face datection
CN105234940A (en) * 2015-10-23 2016-01-13 上海思依暄机器人科技有限公司 Robot and control method thereof

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108872999B (en) * 2018-04-28 2022-05-17 苏州赛腾精密电子股份有限公司 Object identification method, device, identification equipment and storage medium
CN108872999A (en) * 2018-04-28 2018-11-23 苏州赛腾精密电子股份有限公司 A kind of object identification method, device, identification equipment and storage medium
CN109711246A (en) * 2018-09-30 2019-05-03 鲁东大学 A kind of dynamic object recognition methods, computer installation and readable storage medium storing program for executing
CN109318243A (en) * 2018-12-11 2019-02-12 珠海市微半导体有限公司 A kind of audio source tracking system, method and the clean robot of vision robot
CN109318243B (en) * 2018-12-11 2023-07-07 珠海一微半导体股份有限公司 Sound source tracking system and method of vision robot and cleaning robot
CN111028267A (en) * 2019-12-25 2020-04-17 郑州大学 Monocular vision following system and following method for mobile robot
CN111028267B (en) * 2019-12-25 2023-04-28 郑州大学 Monocular vision following system and method for mobile robot
CN111127799A (en) * 2020-01-20 2020-05-08 南通围界盾智能科技有限公司 Tracking alarm detector and tracking method of detector
CN112578909A (en) * 2020-12-15 2021-03-30 北京百度网讯科技有限公司 Equipment interaction method and device
CN112578909B (en) * 2020-12-15 2024-05-31 北京百度网讯科技有限公司 Method and device for equipment interaction
CN112530267A (en) * 2020-12-17 2021-03-19 河北工业大学 Intelligent mechanical arm teaching method based on computer vision and application
CN113516481A (en) * 2021-08-20 2021-10-19 支付宝(杭州)信息技术有限公司 Method and device for confirming brushing intention and brushing equipment
CN113516481B (en) * 2021-08-20 2024-05-14 支付宝(杭州)信息技术有限公司 Face brushing willingness confirmation method and device and face brushing equipment
CN113984763B (en) * 2021-10-28 2024-03-26 内蒙古大学 Insect repellent efficacy experimental device and method based on visual recognition
CN113984763A (en) * 2021-10-28 2022-01-28 内蒙古大学 Visual identification-based insect repellent pesticide effect experimental device and method
CN114972436B (en) * 2022-06-13 2024-02-23 西安交通大学 Motion abrasive particle detection tracking method and system based on time-space domain combined information
CN114972436A (en) * 2022-06-13 2022-08-30 西安交通大学 Method and system for detecting and tracking moving abrasive particles based on time-space domain joint information
CN116501892A (en) * 2023-05-06 2023-07-28 广州番禺职业技术学院 Training knowledge graph construction method based on automatic following system of Internet of things
CN116501892B (en) * 2023-05-06 2024-03-29 广州番禺职业技术学院 Training knowledge graph construction method based on automatic following system of Internet of things
CN118655767A (en) * 2024-08-19 2024-09-17 安徽大学 Sound source information guiding mobile robot tracking control method

Also Published As

Publication number Publication date
CN105760824A (en) 2016-07-13
CN105760824B (en) 2019-02-01

Similar Documents

Publication Publication Date Title
WO2017133453A1 (en) Method and system for tracking moving body
US9875648B2 (en) Methods and systems for reducing false alarms in a robotic device by sensor fusion
Linder et al. On multi-modal people tracking from mobile platforms in very crowded and dynamic environments
US10402984B2 (en) Monitoring
Leigh et al. Person tracking and following with 2d laser scanners
Yuan et al. Multisensor information fusion for people tracking with a mobile robot: A particle filtering approach
WO2018111920A1 (en) System and method for semantic simultaneous localization and mapping of static and dynamic objects
Vadakkepat et al. Improved particle filter in sensor fusion for tracking randomly moving object
Derry et al. Automated doorway detection for assistive shared-control wheelchairs
CN108724178B (en) Method and device for autonomous following of specific person, robot, device and storage medium
Monajjemi et al. UAV, do you see me? Establishing mutual attention between an uninstrumented human and an outdoor UAV in flight
Xiao et al. Human tracking from single RGB-D camera using online learning
Wu et al. Vision-based target detection and tracking system for a quadcopter
Ajmera et al. Autonomous UAV-based target search, tracking and following using reinforcement learning and YOLOFlow
Vidal et al. Slam solution based on particle filter with outliers filtering in dynamic environments
Pérez-Cutiño et al. Event-based human intrusion detection in UAS using deep learning
Ciliberto et al. A heteroscedastic approach to independent motion detection for actuated visual sensors
Chaudhary et al. Controlling a swarm of unmanned aerial vehicles using full-body k-nearest neighbor based action classifier
Macesanu et al. A time-delay control approach for a stereo vision based human-machine interaction system
Ishikoori et al. Semantic position recognition and visual landmark detection with invariant for human effect
Nasti et al. Obstacle avoidance during robot navigation in dynamic environment using fuzzy controller
Rizvi et al. Human detection and localization in indoor disaster environments using uavs
Das et al. Vision based object tracking by mobile robot
Tarmizi et al. Latest trend in person following robot control algorithm: A review
Zhang et al. Research on Visual Image Processing of Mobile Robot Based on OpenCV

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 17746773; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the EP bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06/12/2018))
122 Ep: PCT application non-entry in European phase (Ref document number: 17746773; Country of ref document: EP; Kind code of ref document: A1)