CN117685980A - Multi-camera positioning method, system and medium for indoor robot - Google Patents

Multi-camera positioning method, system and medium for indoor robot

Info

Publication number
CN117685980A
CN117685980A
Authority
CN
China
Prior art keywords
coordinate system
target
image
robot
key point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311713952.3A
Other languages
Chinese (zh)
Inventor
章城骏
刘茴香
唐华锦
袁孟雯
吴迅冬
杨博
潘纲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Zhejiang Lab
Original Assignee
Zhejiang University ZJU
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Zhejiang Lab filed Critical Zhejiang University ZJU
Priority to CN202311713952.3A priority Critical patent/CN117685980A/en
Publication of CN117685980A publication Critical patent/CN117685980A/en
Pending legal-status Critical Current


Abstract

The invention relates to a multi-camera positioning method, system and medium for an indoor robot. The method comprises the following steps: acquiring images captured by an image acquisition device and processing them according to the scene requirements; annotating and connecting the contour and key points of the target robot in the processed target image, extracting a target frame from the annotated key point information with a key point detection model, and outputting the positions of the robot and its key points in the current image from the target frame; calculating the position of the robot in the world coordinate system from the target frame and key point positions through coordinate-system conversion; calculating the yaw angle with Euler angles from the key point positions in the world coordinate system and outputting the head orientation of the robot; and predicting the key point positions and head orientation with Kalman filtering, thereby achieving accurate positioning of multiple indoor mobile robots. Compared with the prior art, the method offers high target-recognition accuracy, good real-time performance, and reliability.

Description

Multi-camera positioning method, system and medium for indoor robot
Technical Field
The invention relates to the technical field of robot positioning, in particular to an indoor robot multi-camera positioning method, system and medium based on key point detection.
Background
Robots are automated devices with capabilities such as sensing, decision making, and actuation. With the rapid development of intelligent manufacturing and information technology, robots are widely used in industry, medical care, the military, the service sector, and other fields. An indoor mobile robot is an autonomous robot that performs specific tasks in an indoor environment, and accurate positioning is an important prerequisite for the mobile robot to perceive its surroundings and navigate accurately.
Robot positioning refers to the process of determining the position and pose of a robot in an unknown environment using various technologies and algorithms. Existing indoor robot positioning techniques mainly include lidar, ultrasonic ranging, wireless-signal positioning, inertial navigation, and visual positioning.
Among these methods, lidar positioning obtains high-precision distance information from a laser sensor and offers good real-time performance and stability, but deployment is costly, the sensor is sensitive to ambient light, and the data lack semantic information. Ultrasonic ranging is suitable for short-range positioning and obstacle avoidance, with strong penetration and low cost, but sound-wave propagation and reflection prevent it from accurately describing the position of the target object. Wireless-signal positioning can update data automatically with little equipment and cost and has a long communication range, but in complex environments it is easily disturbed by other signals. Inertial navigation computes the position and attitude of an object from the acceleration and angular-velocity data of an inertial measurement unit (IMU); it has good real-time performance and does not depend on the external environment, but since the computation is based on time integration it is accurate and stable only over short periods, and the navigation error grows as time accumulates. Vision-based positioning uses a camera as the sensor and localizes the target with image-processing techniques; it is accurate but easily affected by illumination, texture ambiguity, occlusion, and the like.
Disclosure of Invention
The invention aims to provide a multi-camera positioning method, system and medium for an indoor robot based on key point detection, which use key point information to position the indoor robot accurately, facilitate subsequent navigation, and offer good real-time performance and reliability.
The aim of the invention can be achieved by the following technical scheme:
An indoor robot multi-camera positioning method based on key point detection comprises the following steps:
S1, acquiring images captured by an image acquisition device arranged on an indoor robot platform and processing the images according to the scene requirements;
S2, marking and connecting the outline and the key points of the target robot according to the processed target image, extracting a target frame by using the marked key point information based on a key point detection model, and outputting the positions of the robot and the key points in the current image according to the target frame;
S3, according to the position information of the target frame and the key points, calculating the position information of the target frame and the key points of the robot under the world coordinate system through conversion of the coordinate system;
S4, calculating a yaw angle by using Euler angles based on the position information of the key points of the robot in the world coordinate system, and outputting the head orientation of the robot;
S5, predicting the key point positions and the head orientation by Kalman filtering, thereby achieving accurate positioning of multiple indoor mobile robots.
The image acquisition device consists of one or more vertically mounted, top-view depth cameras; the depth cameras are independent, do not interfere with each other, and cooperate with one another.
Step S1 specifically comprises: from the respective view-angle images acquired by the depth cameras of the image acquisition device, determining, according to the specific application scene, whether to use a single-view image or an image stitched from multiple views.
Step S2 comprises the following steps:
S21, annotating key points on the robot in the processed image and connecting them; during annotation the key points are connected in order: center point, front-left vertex of the target, front-right vertex of the target, and back to the center point;
S22, applying data augmentation to the images so as to adapt to different illumination scenes;
S23, using a key point detection model based on a convolutional neural network: image features are extracted by the convolutional backbone, the position and size of the target frame are determined in the output feature map, and, for each pixel in the target frame, the key point positions of the target are predicted according to the number and type of key points;
S24, outputting the target frame and key point positions after training the key point detection model; the model extracts the target frame based on the annotation information and applies adaptive offsets relative to the target frame to obtain the robot key point positions; in the training stage the loss function is computed from the annotation information and the model parameters are updated by back-propagation;
S25, filtering the output of the key point detection model by confidence and applying non-maximum suppression to obtain the final target frame and key point positions.
The position information output by the key point detection model is a coordinate vector in the pixel coordinate system UV, in units of pixels; the cameras are rigidly mounted, so for each camera i there is no relative motion between the cameras.
The coordinate-system conversion in step S3 involves the world coordinate system $X_W Y_W Z_W$, the camera coordinate system $X_C Y_C Z_C$, the image coordinate system $XY$ and the pixel coordinate system $UV$. Let the pixel coordinates of a target-frame corner point or key point obtained in step S2 from the $i$-th camera be $(u, v)$, and let the corresponding point in the image coordinate system $X_i Y_i$ be $(x, y)$.
The relationship between the image coordinate system and the pixel coordinate system is
$$u = \frac{x}{dx} + x_{0i}, \qquad v = \frac{y}{dy} + y_{0i}$$
where $(x_{0i}, y_{0i})$ are the coordinates of the origin $(0, 0)$ of the image coordinate system mapped onto the pixel coordinate plane, and $dx$ and $dy$ are the physical size of a single pixel mapped onto the image coordinate plane; the conversion from the pixel coordinate system $U_i V_i$ to the image coordinate system $X_i Y_i$ is therefore
$$x = (u - x_{0i})\,dx, \qquad y = (v - y_{0i})\,dy.$$
The conversion from the image coordinate system $X_i Y_i$ to the camera coordinate system $X_{Ci} Y_{Ci} Z_{Ci}$ is a 2D-to-3D process. Denote the target center point in the camera coordinate system $X_{Ci} Y_{Ci} Z_{Ci}$ as $(x_c, y_c, z_c)$. From the projection ratio between the target center point $(x, y)$ in the image coordinate system $X_i Y_i$ and $(x_c, y_c, z_c)$ one obtains
$$\frac{x}{x_c} = \frac{y}{y_c} = \frac{f}{z_c},$$
and, combined with the pixel-coordinate relation above, the conversion from the image coordinate system $X_i Y_i$ to the camera coordinate system $X_{Ci} Y_{Ci} Z_{Ci}$ is
$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f/dx & 0 & x_{0i} \\ 0 & f/dy & y_{0i} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = K_i \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix},$$
where $f$ is the camera focal length, $z_c$ is the depth measured by the camera, and $K_i$ is the intrinsic (internal reference) matrix of the $i$-th camera, obtained by Zhang Zhengyou's checkerboard calibration;
the conversion from the camera coordinate system $X_{Ci} Y_{Ci} Z_{Ci}$ to the world coordinate system $X_W Y_W Z_W$ is described by a rotation matrix and a translation vector:
$$\begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} = R_i \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} + T_i,$$
where $R_i$ is the rotation matrix and $T_i$ is the translation vector, which together represent the extrinsic matrix of the $i$-th camera;
through the above conversion relations, the robot position information captured by all cameras in their pixel coordinate systems is converted into the same world coordinate system (an inertial or global coordinate system), so that the position information of the single image or the stitched image is expressed in one common coordinate system.
The key point position information comprises the coordinates of the target center point and of the front-left and front-right vertices of the target.
Step S4 specifically comprises: calculating the target head orientation from the key point positions in the world coordinate system, where the midpoint of the two front vertices of the target is taken as a new center point, and the direction from the target center point toward this new center point is the target head orientation; the yaw angle of the robot is calculated using Euler angles, and since the robot lies in a plane of constant $Z_W$ in the world coordinate system, the yaw angle is the rotation angle of the robot about the z-axis.
Step S5 specifically comprises: initializing a Kalman filter, which comprises a state vector, a state estimate, a covariance matrix and a measurement matrix; at the current time, predicting the target position at the next time using the state transition matrix and the covariance matrix; taking the target position output by the key point detection model as the actual observation; and, in the update step, fusing the predicted value with the observation to obtain the accurate position at the next time, thereby achieving real-time positioning.
An indoor robot multi-camera positioning system based on keypoint detection, comprising:
a plurality of indoor mobile robots;
the image acquisition device is used for acquiring image information;
a computing device comprising a memory, a processor, and a program stored in the memory, the processor implementing the method as described above when executing the program.
A storage medium having stored thereon a program which when executed performs a method as described above.
Compared with the prior art, the invention has the following beneficial effects:
1) The image acquisition device of the invention captures images vertically (top-down) with a plurality of depth cameras and uses a single image or stitched images depending on the application scene, so that a larger field of view can be obtained; there is no relative motion between the cameras, and they do not interfere with each other and cooperate with one another.
2) The indoor robot positioning method adopts a neural-network technique based on key point detection, which can accurately acquire the position information of the target robot, and then obtains the position of the target robot in the system's global coordinate system through coordinate-system conversion.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of key points of a target robot in one embodiment;
FIG. 3 is a schematic diagram of coordinate system conversion in one embodiment;
FIG. 4 is a diagram of a direction resolution structure in one embodiment;
FIG. 5 is a flow chart of Kalman filtering prediction in one embodiment.
Detailed Description
The invention will now be described in detail with reference to the drawings and specific examples. The present embodiment is implemented on the premise of the technical scheme of the present invention, and a detailed implementation manner and a specific operation process are given, but the protection scope of the present invention is not limited to the following examples.
Example 1
The embodiment provides an indoor robot multi-camera positioning method based on key point detection, as shown in fig. 1, comprising the following steps:
S1, acquiring images captured by an image acquisition device arranged on an indoor robot platform and processing the images according to the scene requirements.
In this embodiment, an image acquisition device is mounted on the indoor robot platform. The device consists of one or more vertically mounted, top-view depth cameras. After the cameras capture their respective view-angle images, either a single-view image or a stitched image (obtained by splicing multiple view images) is used depending on the specific application scene, so that a larger field of view is obtained; the depth cameras are independent, do not interfere with each other, and cooperate with one another.
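As an illustration of this step only, the following sketch (not part of the patent; the camera layout, non-overlapping adjacent views, and the use of OpenCV/NumPy are assumptions) selects either a single view or a naive horizontal splice of several top-view frames:

```python
# Hypothetical helper for step S1: choose a single view or splice adjacent top-view
# frames. A real system would use calibrated homographies for overlapping views.
import numpy as np
import cv2

def build_scene_image(frames, use_stitched):
    """frames: list of BGR images from adjacent, non-overlapping top-view depth cameras."""
    if not use_stitched or len(frames) == 1:
        return frames[0]                                    # single-view image is enough
    h = min(f.shape[0] for f in frames)                     # normalize heights first
    resized = [cv2.resize(f, (int(f.shape[1] * h / f.shape[0]), h)) for f in frames]
    return np.hstack(resized)                               # naive splice of adjacent views
```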
And S2, marking and connecting the outline and the key points of the target robot according to the processed target image, extracting a target frame by using marked key point information based on a key point detection model, and outputting the positions of the robot and the key points thereof in the current image according to the target frame.
Specifically, step S2 comprises the following steps:
S21, annotating key points on the robot in the processed image and connecting them; during annotation the key points are connected in order: center point, front-left vertex of the target, front-right vertex of the target, and back to the center point;
S22, applying data augmentation to the images so as to adapt to different illumination scenes;
S23, using a key point detection model based on a convolutional neural network: image features are extracted by the convolutional backbone, the position and size of the target frame are determined in the output feature map, and, for each pixel in the target frame, the key point positions of the target are predicted according to the number and type of key points (a minimal network sketch is given after these steps);
S24, outputting the target frame and key point positions after training the key point detection model; the model extracts the target frame based on the annotation information and applies adaptive offsets relative to the target frame to obtain the robot key point positions; in the training stage the loss function is computed from the annotation information and the model parameters are updated by back-propagation;
S25, filtering the output of the key point detection model by confidence and applying non-maximum suppression to obtain the final target frame and key point positions.
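The following sketch illustrates the kind of detector described in steps S23-S24; it is an assumption rather than the patent's network, and the backbone depth, channel sizes, three-key-point head and loss choice are all illustrative:

```python
# Hypothetical minimal detector in the spirit of S23-S24: a convolutional backbone,
# a box/confidence head and a per-location keypoint-offset head.
import torch
import torch.nn as nn

class KeypointDetector(nn.Module):
    def __init__(self, num_keypoints=3):
        super().__init__()
        self.backbone = nn.Sequential(                        # feature extraction
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.box_head = nn.Conv2d(64, 4 + 1, 1)               # box (cx, cy, w, h) + confidence
        self.kpt_head = nn.Conv2d(64, 2 * num_keypoints, 1)   # (u, v) offsets per key point

    def forward(self, images):
        feats = self.backbone(images)
        return self.box_head(feats), self.kpt_head(feats)

# Training (sketch): regression losses against the annotated boxes and key points,
# with parameters updated by back-propagation, e.g. via torch.optim.SGD.
```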
Specifically, in step S25, among the key-point detection results at the current time, only detections with a confidence of at least 0.4 are retained, and non-maximum suppression is applied to them, yielding the final result: the positions of the four corner points of the target frame and of the three target key points (the coordinates of the target center point and of the two front vertices).
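A minimal sketch of this filtering step follows; the 0.4 confidence threshold comes from the text, while the IoU threshold and the detection data structure are assumptions:

```python
# Hypothetical confidence filtering + IoU-based non-maximum suppression for step S25.

def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def filter_and_nms(detections, conf_thr=0.4, iou_thr=0.5):
    """detections: list of dicts {'box': (x1, y1, x2, y2), 'conf': float, 'keypoints': [...]}."""
    kept = []
    for det in sorted((d for d in detections if d['conf'] >= conf_thr),
                      key=lambda d: d['conf'], reverse=True):
        if all(iou(det['box'], k['box']) < iou_thr for k in kept):
            kept.append(det)                                  # keep highest-confidence boxes
    return kept
```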
Fig. 2 is a schematic diagram of each key point of the indoor robot, which shows the positions of the robot and its key points in the current RGB image output by the key point detection network.
And S3, calculating the position information of the target frame and the key point of the robot under the world coordinate system through conversion of the coordinate system according to the position information of the target frame and the key point.
The position information output by the key point detection model is a coordinate vector in the pixel coordinate system UV, in units of pixels. The positions of the target-frame corner points and key points of the robot in the world coordinate system can be calculated through coordinate-system conversion, which involves the world coordinate system $X_W Y_W Z_W$, the camera coordinate system $X_C Y_C Z_C$ (the cameras are rigidly mounted, so for each camera i there is no relative motion between the cameras), the image coordinate system $XY$ and the pixel coordinate system $UV$.
As shown in fig. 3, this embodiment takes the target center point $(u, v)$ in the $i$-th camera as an example; its corresponding point in the image coordinate system $X_i Y_i$ is denoted $(x, y)$, and the other points are converted in the same way.
The relationship between the image coordinate system and the pixel coordinate system is
$$u = \frac{x}{dx} + x_{0i}, \qquad v = \frac{y}{dy} + y_{0i}$$
where $(x_{0i}, y_{0i})$ are the coordinates of the origin $(0, 0)$ of the image coordinate system mapped onto the pixel coordinate plane, and $dx$ and $dy$ are the physical size of a single pixel mapped onto the image coordinate plane; the conversion from the pixel coordinate system $U_i V_i$ to the image coordinate system $X_i Y_i$ is therefore
$$x = (u - x_{0i})\,dx, \qquad y = (v - y_{0i})\,dy.$$
The conversion from the image coordinate system $X_i Y_i$ to the camera coordinate system $X_{Ci} Y_{Ci} Z_{Ci}$ is a 2D-to-3D process. Denote the target center point in the camera coordinate system $X_{Ci} Y_{Ci} Z_{Ci}$ as $(x_c, y_c, z_c)$. From the projection ratio between the target center point $(x, y)$ in the image coordinate system $X_i Y_i$ and $(x_c, y_c, z_c)$ one obtains
$$\frac{x}{x_c} = \frac{y}{y_c} = \frac{f}{z_c},$$
and, combined with the pixel-coordinate relation above, the conversion from the image coordinate system $X_i Y_i$ to the camera coordinate system $X_{Ci} Y_{Ci} Z_{Ci}$ is
$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f/dx & 0 & x_{0i} \\ 0 & f/dy & y_{0i} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = K_i \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix},$$
where $f$ is the camera focal length, $z_c$ is the depth measured by the camera, and $K_i$ is the intrinsic (internal reference) matrix of the $i$-th camera, obtained by Zhang Zhengyou's checkerboard calibration.
The conversion from the camera coordinate system $X_{Ci} Y_{Ci} Z_{Ci}$ to the world coordinate system $X_W Y_W Z_W$ is described by a rotation matrix and a translation vector:
$$\begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} = R_i \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} + T_i,$$
where $R_i$ is the rotation matrix and $T_i$ is the translation vector, which together represent the extrinsic matrix of the $i$-th camera.
Through the above conversion relations, the robot position information captured by all cameras in their pixel coordinate systems is converted into the same world coordinate system (an inertial or global coordinate system), so that the position information of the single image or the stitched image is expressed in one common coordinate system.
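The following sketch condenses the conversion chain above into code; it assumes the intrinsics $K_i$ from checkerboard calibration, the extrinsics $R_i$, $T_i$ mapping the camera frame into the world frame, and the depth $z_c$ from the depth camera:

```python
# Hypothetical helper for step S3: back-project a pixel (u, v) of camera i into the
# world coordinate system using z_c [u, v, 1]^T = K_i [x_c, y_c, z_c]^T and
# [x_w, y_w, z_w]^T = R_i [x_c, y_c, z_c]^T + T_i.
import numpy as np

def pixel_to_world(u, v, z_c, K_i, R_i, T_i):
    """u, v: pixel coordinates; z_c: depth of the point in the camera frame;
    K_i: 3x3 intrinsic matrix; R_i: 3x3 rotation matrix; T_i: translation 3-vector."""
    uv1 = np.array([u, v, 1.0])
    p_cam = z_c * (np.linalg.inv(K_i) @ uv1)                 # camera-frame coordinates
    return R_i @ p_cam + T_i                                 # world-frame coordinates
```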
And S4, calculating a yaw angle by using Euler angles based on the position information of the key points of the robot in the world coordinate system, and outputting the head orientation of the robot.
Specifically, the target head orientation is calculated from the key point positions in the world coordinate system. As shown in fig. 4, the head-orientation analysis diagram of the indoor robot positioning system, the key point position information comprises the target center point and the two front vertices; the midpoint of the two front vertices of the target is taken as a new center point, and the direction from the target center point toward this new center point is the target head orientation. The yaw angle of the robot is calculated using Euler angles: since the robot lies in a plane of constant $Z_W$ in the world coordinate system, the yaw angle is the rotation angle of the robot about the z-axis.
The calculation of Euler angles is a common technique for a person skilled in the art and is not described in detail here.
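A minimal sketch of this step is given below (not from the patent); the angle convention, counter-clockwise about the z-axis with zero along $+X_W$, is an assumption:

```python
# Hypothetical helper for step S4: head orientation and yaw from the three key points
# (center point and the two front vertices) in world coordinates.
import numpy as np

def head_orientation_and_yaw(center, front_left, front_right):
    """Inputs are (x, y) or (x, y, z) world coordinates; only x and y are used."""
    c = np.asarray(center, dtype=float)[:2]
    new_center = (np.asarray(front_left, dtype=float)[:2] +
                  np.asarray(front_right, dtype=float)[:2]) / 2.0   # midpoint of front vertices
    heading = new_center - c                       # vector from center toward the head
    yaw = np.arctan2(heading[1], heading[0])       # rotation about the z-axis, in radians
    return heading, yaw
```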
S5, predicting the key point positions and the head orientation by Kalman filtering, thereby achieving accurate positioning of multiple indoor mobile robots.
Specifically, a Kalman filter is initialized, which comprises a state vector, a state estimate, a covariance matrix and a measurement matrix; at the current time, the target position at the next time is predicted using the state transition matrix and the covariance matrix; the target position output by the key point detection model is taken as the actual observation; and, in the update step, the predicted value is fused with the observation to obtain the accurate position at the next time, achieving real-time positioning.
In a preferred embodiment, as shown in fig. 5, the key point positions and the head orientation are continuously predicted with a Kalman filter to accurately position multiple indoor mobile robots, as follows. First, it is detected whether a target is present; if so, the algorithm outputs the target information at the current time, including the target-frame position, the key-point positions on the target and the head orientation, which form the current state vector, and the Kalman filter is initialized with a state vector, a state estimate, a covariance matrix and a measurement matrix. In the prediction step, the target position at the next time is predicted from the state transition matrix and the covariance matrix; the target position output by the key point detection model is taken as the actual observation, and the residual and covariance between the predicted value and the observation are computed. In the update step, the predicted state is fused with the observation: the Kalman gain is computed, the state estimate and the state covariance matrix for the next time are updated with the Kalman gain, and the target information at the next time (target-frame position, key-point positions and head orientation) is obtained from the updated state estimate. The prediction and update steps are repeated so that the state estimation iterates continuously, achieving accurate positioning of the target robot.
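The following sketch shows one way such a filter could look; a constant-velocity motion model and the noise values are assumptions, and a full implementation would track the target frame, all key points and the head orientation together:

```python
# Hypothetical Kalman filter for step S5: track one key point in the world frame.
# State = [x, y, vx, vy]; the measurement is the (x, y) position from the detector.
import numpy as np

class KeypointKalmanFilter:
    def __init__(self, x0, y0, dt=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])               # state estimate
        self.P = np.eye(4) * 10.0                            # state covariance
        self.F = np.array([[1, 0, dt, 0],                    # state transition matrix
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],                     # measurement matrix
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.01                            # process noise (assumed)
        self.R = np.eye(2) * 0.5                             # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                                    # predicted position

    def update(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x     # residual
        S = self.H @ self.P @ self.H.T + self.R              # residual covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)             # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]                                    # corrected position
```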
Example 2
The embodiment provides an indoor robot multi-camera positioning system based on key point detection, which comprises:
a plurality of indoor mobile robots;
an image acquisition device equipped with one or more vertical-looking-down depth cameras for acquiring image information;
a computing device comprising a memory, a processor, and a program stored in the memory, the processor implementing the method as described in embodiment 1 above when executing the program.
Example 3
The present embodiment provides a storage medium having stored thereon a program which, when executed, implements the method described in embodiment 1 above.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in the same piece or pieces of software and/or hardware when implementing the present invention.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM) and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments of the present invention are described in a progressive manner, and the same and similar parts of the embodiments are all referred to each other, and each embodiment is mainly described in the differences from the other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing describes in detail preferred embodiments of the present invention. It should be understood that numerous modifications and variations can be made in accordance with the concepts of the invention by one of ordinary skill in the art without undue burden. Therefore, all technical solutions which can be obtained by logic analysis, reasoning or limited experiments based on the prior art by a person skilled in the art according to the inventive concept shall be within the scope of protection defined by the claims.

Claims (10)

1. A multi-camera positioning method for an indoor robot based on key point detection, characterized by comprising the following steps:
S1, acquiring images captured by an image acquisition device arranged on an indoor robot platform and processing the images according to the scene requirements;
S2, marking and connecting the outline and the key points of the target robot according to the processed target image, extracting a target frame by using the marked key point information based on a key point detection model, and outputting the positions of the robot and the key points in the current image according to the target frame;
S3, according to the position information of the target frame and the key points, calculating the position information of the target frame and the key points of the robot under the world coordinate system through conversion of the coordinate system;
S4, calculating a yaw angle by using Euler angles based on the position information of the key points of the robot in the world coordinate system, and outputting the head orientation of the robot;
S5, predicting the key point positions and the head orientation by Kalman filtering, thereby achieving accurate positioning of multiple indoor mobile robots.
2. The method for positioning multiple cameras of an indoor robot based on key point detection according to claim 1, wherein the image acquisition device is one or more vertical top view depth cameras, and the depth cameras are independent, non-interfering and co-operate with each other.
3. The method for positioning the multiple cameras of the indoor robot based on the key point detection according to claim 2, wherein step S1 specifically comprises: from the respective view-angle images acquired by the depth cameras of the image acquisition device, determining, according to the specific application scene, whether to use a single-view image or an image stitched from multiple views.
4. The method for positioning the multiple cameras of the indoor robot based on the key point detection according to claim 1, wherein step S2 comprises the following steps:
S21, annotating key points on the robot in the processed image and connecting them; during annotation the key points are connected in order: center point, front-left vertex of the target, front-right vertex of the target, and back to the center point;
S22, applying data augmentation to the images so as to adapt to different illumination scenes;
S23, using a key point detection model based on a convolutional neural network: image features are extracted by the convolutional backbone, the position and size of the target frame are determined in the output feature map, and, for each pixel in the target frame, the key point positions of the target are predicted according to the number and type of key points;
S24, outputting the target frame and key point positions after training the key point detection model; the model extracts the target frame based on the annotation information and applies adaptive offsets relative to the target frame to obtain the robot key point positions; in the training stage the loss function is computed from the annotation information and the model parameters are updated by back-propagation;
S25, filtering the output of the key point detection model by confidence and applying non-maximum suppression to obtain the final target frame and key point positions.
5. The method for positioning multiple cameras of an indoor robot based on key point detection according to claim 1, wherein the coordinate-system conversion in step S3 involves the world coordinate system $X_W Y_W Z_W$, the camera coordinate system $X_C Y_C Z_C$, the image coordinate system $XY$ and the pixel coordinate system $UV$; the pixel coordinates of a target-frame corner point or key point obtained in step S2 from the $i$-th camera are denoted $(u, v)$, and the corresponding coordinates in the image coordinate system $X_i Y_i$ are denoted $(x, y)$;
the relationship between the image coordinate system and the pixel coordinate system is
$$u = \frac{x}{dx} + x_{0i}, \qquad v = \frac{y}{dy} + y_{0i}$$
where $(x_{0i}, y_{0i})$ are the coordinates of the origin $(0, 0)$ of the image coordinate system mapped onto the pixel coordinate plane, and $dx$ and $dy$ are the physical size of a single pixel mapped onto the image coordinate plane; the conversion from the pixel coordinate system $U_i V_i$ to the image coordinate system $X_i Y_i$ is therefore
$$x = (u - x_{0i})\,dx, \qquad y = (v - y_{0i})\,dy;$$
the conversion from the image coordinate system $X_i Y_i$ to the camera coordinate system $X_{Ci} Y_{Ci} Z_{Ci}$ is a 2D-to-3D process; the target center point in the camera coordinate system $X_{Ci} Y_{Ci} Z_{Ci}$ is denoted $(x_c, y_c, z_c)$, and from the projection ratio between the target center point $(x, y)$ in the image coordinate system $X_i Y_i$ and $(x_c, y_c, z_c)$ one obtains
$$\frac{x}{x_c} = \frac{y}{y_c} = \frac{f}{z_c},$$
so that the conversion from the image coordinate system $X_i Y_i$ to the camera coordinate system $X_{Ci} Y_{Ci} Z_{Ci}$ is
$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f/dx & 0 & x_{0i} \\ 0 & f/dy & y_{0i} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = K_i \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix},$$
where $f$ is the camera focal length, $z_c$ is the depth measured by the camera, and $K_i$ is the intrinsic (internal reference) matrix of the $i$-th camera, obtained by Zhang Zhengyou's checkerboard calibration;
the conversion from the camera coordinate system $X_{Ci} Y_{Ci} Z_{Ci}$ to the world coordinate system $X_W Y_W Z_W$ is described by a rotation matrix and a translation vector:
$$\begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix} = R_i \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} + T_i,$$
where $R_i$ is the rotation matrix and $T_i$ is the translation vector, which together represent the extrinsic matrix of the $i$-th camera;
through the above conversion relations, the robot position information captured by all cameras in their pixel coordinate systems is converted into the same world coordinate system (an inertial or global coordinate system), so that the position information of the single image or the stitched image is expressed in one common coordinate system.
6. The indoor robot multi-camera positioning method based on key point detection according to claim 1, wherein the key point position information comprises the coordinates of the target center point and of the front-left and front-right vertices of the target.
7. The method for positioning the multiple cameras of the indoor robot based on the key point detection according to claim 1, wherein step S4 specifically comprises: calculating the target head orientation from the key point positions in the world coordinate system, where the midpoint of the two front vertices of the target is taken as a new center point, and the direction from the target center point toward this new center point is the target head orientation; the yaw angle of the robot is calculated using Euler angles, and since the robot lies in a plane of constant $Z_W$ in the world coordinate system, the yaw angle is the rotation angle of the robot about the z-axis.
8. The method for positioning the multiple cameras of the indoor robot based on the key point detection according to claim 1, wherein step S5 specifically comprises: initializing a Kalman filter, which comprises a state vector, a state estimate, a covariance matrix and a measurement matrix; at the current time, predicting the target position at the next time using the state transition matrix and the covariance matrix; taking the target position output by the key point detection model as the actual observation; and, in the update step, fusing the predicted value with the observation to obtain the accurate position at the next time, thereby achieving real-time positioning.
9. A multi-camera positioning system for an indoor robot based on key point detection, characterized by comprising:
a plurality of indoor mobile robots;
the image acquisition device is used for acquiring image information;
a computing device comprising a memory, a processor, and a program stored in the memory, which, when executed by the processor, implements the method of any one of claims 1 to 8.
10. A storage medium having a program stored thereon, wherein the program, when executed, implements the method of any of claims 1-8.
CN202311713952.3A 2023-12-13 2023-12-13 Multi-camera positioning method, system and medium for indoor robot Pending CN117685980A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311713952.3A CN117685980A (en) 2023-12-13 2023-12-13 Multi-camera positioning method, system and medium for indoor robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311713952.3A CN117685980A (en) 2023-12-13 2023-12-13 Multi-camera positioning method, system and medium for indoor robot

Publications (1)

Publication Number Publication Date
CN117685980A true CN117685980A (en) 2024-03-12

Family

ID=90128055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311713952.3A Pending CN117685980A (en) 2023-12-13 2023-12-13 Multi-camera positioning method, system and medium for indoor robot

Country Status (1)

Country Link
CN (1) CN117685980A (en)

Similar Documents

Publication Publication Date Title
CN108717710B (en) Positioning method, device and system in indoor environment
CN110243360B (en) Method for constructing and positioning map of robot in motion area
CN109345588B (en) Tag-based six-degree-of-freedom attitude estimation method
CN110893617B (en) Obstacle detection method and device and storage device
Panahandeh et al. Vision-aided inertial navigation based on ground plane feature detection
CN109993793B (en) Visual positioning method and device
CN111445531B (en) Multi-view camera navigation method, device, equipment and storage medium
CN111964680B (en) Real-time positioning method of inspection robot
Jung et al. Object detection and tracking-based camera calibration for normalized human height estimation
CN113052907B (en) Positioning method of mobile robot in dynamic environment
CN114217303A (en) Target positioning and tracking method and device, underwater robot and storage medium
CN113252066B (en) Calibration method and device for parameters of odometer equipment, storage medium and electronic device
CN117132649A (en) Ship video positioning method and device for artificial intelligent Beidou satellite navigation fusion
CN112405526A (en) Robot positioning method and device, equipment and storage medium
CN117685980A (en) Multi-camera positioning method, system and medium for indoor robot
CN113916223B (en) Positioning method and device, equipment and storage medium
CN115272482A (en) Camera external reference calibration method and storage medium
CN115131756A (en) Target detection method and device
Šuľaj et al. Examples of real-time UAV data processing with cloud computing
JP4546155B2 (en) Image processing method, image processing apparatus, and image processing program
CN113609985B (en) Object pose detection method, detection device, robot and storable medium
KR102106890B1 (en) Mini Integrated-control device
KR102106889B1 (en) Mini Integrated-control device
Kuriakose et al. Distance estimation methods for smartphone-based navigation support systems
CN117649619B (en) Unmanned aerial vehicle visual navigation positioning recovery method, system, device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination