CN113609985A - Object pose detection method, detection device, robot and storage medium - Google Patents
- Publication number
- CN113609985A CN113609985A CN202110895680.8A CN202110895680A CN113609985A CN 113609985 A CN113609985 A CN 113609985A CN 202110895680 A CN202110895680 A CN 202110895680A CN 113609985 A CN113609985 A CN 113609985A
- Authority
- CN
- China
- Prior art keywords
- pose
- target object
- candidate
- sensor
- data frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
The invention provides an object pose detection method, which comprises the following steps: using deep learning to train recognition of the target object to be identified, obtaining a recognition network for the target object; calibrating the camera and the sensor according to the detection data frame of the sensor and the image data frame of the camera, obtaining a calibration result; identifying the target object, the image range of the target object within the image, and a plurality of characteristic parts according to the image data frame and the recognition network; extracting, according to the calibration result, the sensor point cloud data within the image range and obtaining the local point clouds corresponding to the characteristic parts; and obtaining the central positions of the characteristic parts from the local point clouds and deriving the pose of the object from those central positions. The method thus provides accurate information for planning the robot's movement route and improves the robot's mobility and efficiency.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an object pose detection method, a detection device, a robot, and a storage medium.
Background
When an autonomously moving robot travels through its environment, objects detected by the sensors may go unrecognized: parts of moving instruments, carts, and similar objects are treated merely as obstacles, and the object as a whole is never understood. As a result, when an obstacle is partially occluded by a person or a cart, the robot may plan an unsuitable route, move poorly, or even collide. Because a robot's movement route is relatively fixed, the objects it encounters can largely be recognized by training with methods such as deep learning; however, recognizing only the target obstacle object without knowing its pose still prevents good movement performance.
Disclosure of Invention
The invention aims to provide an object pose detection method, a detection device, a robot, and a computer storage medium that solve the problem of detecting the pose of a target object while the robot is moving, provide accurate information for planning the robot's movement route, and improve the robot's mobility and efficiency.
The technical scheme provided by the invention is as follows:
an object pose detection method includes:
using deep learning to train recognition of the target object to be identified on the data frames detected by a sensor, to obtain a recognition network for the target object;
calibrating the camera and the sensor according to the detection data frame of the sensor and the image data frame of the camera, to obtain a calibration result;
identifying the target object, the image range of the target object within the image, and a plurality of characteristic parts according to the image data frame and the recognition network;
extracting, according to the calibration result, the sensor point cloud data within the image range, and obtaining the local point clouds corresponding to the characteristic parts;
and obtaining the central positions of the plurality of characteristic parts from the local point clouds, and obtaining the pose of the object from the central positions.
Preferably, the sensor is a depth sensor, including but not limited to an RGBD sensor, a lidar, or a solid-state lidar.
Further, obtaining the central positions of the plurality of characteristic parts from the local point clouds and obtaining the pose of the object from the central positions specifically includes:
calculating, from the sensor point cloud data and each local point cloud, the central position of the local point cloud;
and calculating the pose of the whole target object according to the central positions of the several local point clouds and the positions of the characteristic parts relative to the center of the target object.
Optionally, calculating the pose of the whole object according to the central positions of the several local point clouds and the positions of the characteristic parts relative to the center of the target object specifically includes:
determining candidate positions of a first characteristic part and a second characteristic part among the plurality of characteristic parts according to their respective central positions and the distance between them;
calculating candidate poses of the target object according to the candidate positions of the characteristic parts on the target object;
and obtaining the overall pose of the target object according to the candidate poses of the target object.
Optionally, obtaining the overall pose of the target object according to the candidate poses of the target object specifically includes:
the candidate poses of the target object comprising a first candidate pose and a second candidate pose, obtained respectively from a first candidate position and a second candidate position of the characteristic part on the target object;
obtaining a first detection data frame and a second detection data frame of the sensor at the first candidate pose and the second candidate pose, respectively;
and determining whether each of the first detection data frame and the second detection data frame coincides with the sensor point cloud data of the target object, and if so, taking the candidate pose corresponding to the coinciding detection data frame as the pose of the target object.
In order to achieve the object of the present invention, an embodiment of the present invention further provides an object pose detection apparatus, including:
the first identification module is used for training, by deep learning on the data frames detected by a sensor, recognition of the target object to be identified, and for obtaining a recognition network for the target object;
the calibration module is used for calibrating the camera and the sensor according to the detection data frame of the sensor and the image data frame of the camera to obtain a calibration result;
the second identification module is used for identifying the target object, the image range of the target object in the image and a plurality of characteristic parts according to the image data frame and the identification network;
the point cloud data calculation module is used for extracting sensor point cloud data in an image range according to the calibration result and acquiring local point clouds corresponding to the characteristic parts;
and the pose acquisition module is used for acquiring the central positions of a plurality of characteristic parts according to the local point clouds and acquiring the pose of the object by using the central positions.
Further, the pose acquisition module specifically includes:
the central position calculating unit is used for calculating the sensor point cloud data and the local point cloud to obtain the central position of the local point cloud;
and the pose calculation unit is used for calculating the pose of the whole target object according to the central positions of the local point clouds and the positions of the characteristic parts relative to the center of the target object.
Optionally, the pose calculation unit specifically includes:
a candidate position determining subunit, configured to determine candidate positions of the characteristic part on the target object according to the respective central positions of a first characteristic part and a second characteristic part among the plurality of characteristic parts and the distance between the first characteristic part and the second characteristic part;
a candidate pose calculation subunit, configured to calculate candidate poses of the target object according to the candidate positions of the characteristic parts on the target object;
and an acquisition subunit, configured to obtain the overall pose of the target object according to the candidate poses of the target object.
In order to achieve the object of the present invention, an embodiment of the present invention further provides a robot, which includes a processor and a memory, where the processor is coupled to the memory, and the memory is used for storing a program; the processor is configured to execute the program in the memory to cause the robot to perform the method of object pose detection as described above.
To achieve the object of the present invention, an embodiment of the present invention further provides a computer-readable storage medium storing instructions which, when run on a computer, enable the computer to perform any of the object pose detection methods described above.
According to the invention, the pose of the whole target object is obtained by using the depth sensor and camera data frames to identify the target object and its characteristic parts, so that the robot can plan a reasonable movement route while moving, improving the robot's movement efficiency.
Drawings
The above features, technical features, advantages, and implementations of the object pose detection method, detection device, robot, and storage medium will be further explained in the following detailed description of preferred embodiments, in a clearly understandable manner, with reference to the accompanying drawings.
Fig. 1 is a flowchart of an object pose detection method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an object pose detection apparatus according to an embodiment of the present invention;
fig. 3 is a schematic view of another object pose detection apparatus provided by the embodiment of the present invention;
fig. 4 is a schematic view of another object pose detection apparatus provided by the embodiment of the present invention;
fig. 5 is a schematic diagram of an autonomous mobile robot according to an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description refers to the accompanying drawings. The drawings described below are obviously only some examples of the invention; a person skilled in the art can derive other drawings and embodiments from them without inventive effort.
For the sake of simplicity, the drawings only schematically show the parts relevant to the present invention; they do not represent the actual structure of a product. In addition, to keep the drawings concise and understandable, components having the same structure or function are in some drawings only schematically depicted, or only one of them is labeled. In this document, "one" means not only "only one" but also covers the case of "more than one".
In developing autonomous movement for intelligent devices, the inventor found that such devices must use sensors to detect their environment. The intelligent device may be an autonomously moving robot, an autonomous vehicle, or another self-navigating device; such devices commonly employ a depth sensor such as a laser radar to detect target objects and a camera to identify objects in the robot's motion environment. In the prior art, however, detection or recognition treats the target merely as an object: if only part of the object is detected, the whole object is not obtained, and the obstacle significance of the object cannot be accurately determined.
In order to accurately acquire the pose of a target object, the embodiment of the invention provides an object pose detection method.
Referring to fig. 1, an object pose detection method according to an embodiment of the present invention includes:
s1, training and identifying a target object to be identified according to a data frame detected by a sensor by utilizing deep learning to obtain an identification network of the target object;
the embodiment of the invention firstly uses a mature deep learning method to train the object to be recognized and the characteristic part of the object, such as a cart and wheels on the cart;
the deep learning of the embodiment of the invention does not limit the network used, for example, the full convolution network technology can be transplanted to the three-dimensional distance scanning data detection task. Specifically, the scene is set as a detection task based on the range data of the Velodyne64E lidar. Data is presented in a 2D point map and a single 2D end-to-end full convolution network is used to predict both target confidence and bounding box. The complete 3D bounding box can also be predicted using a 2D convolutional network by designed bounding box coding.
Alternatively, to eliminate manual feature engineering on 3D point clouds, VoxelNet can be used: a generic 3D detection network that unifies feature extraction and bounding-box prediction into a single-stage, end-to-end trainable deep network.
Alternatively, a grid-map environment representation, which is well suited to sensor fusion, free-space estimation, and machine learning methods, can be used, with a deep CNN performing the detection and classification of targets. As input to the CNN, the 3D range-sensor information is efficiently encoded in a multi-layer grid map: the range-sensor measurements are converted into a multi-layer grid map that is fed to the detection and classification network. The CNN infers rotated 3D bounding boxes and semantic categories simultaneously, and these boxes are projected into the camera image for visual verification.
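As a rough illustration of such a grid-map encoding (the two-layer definition, cell size, and function name below are assumptions for this sketch, not the encoding used by the cited work), 3D range points can be binned into a multi-layer 2D grid that a CNN can consume:

```python
import numpy as np

def points_to_grid_map(points, cell_size=0.1, grid_dim=8):
    """Bin 3D sensor points into a multi-layer 2D grid map.

    Layer 0 holds the point count per cell, layer 1 the maximum
    height (z) seen in that cell -- an illustrative two-layer encoding.
    The grid is centered on the sensor origin.
    """
    grid = np.zeros((2, grid_dim, grid_dim), dtype=np.float32)
    half = grid_dim * cell_size / 2.0
    for x, y, z in points:
        if -half <= x < half and -half <= y < half:
            i = int((x + half) / cell_size)
            j = int((y + half) / cell_size)
            grid[0, i, j] += 1.0
            grid[1, i, j] = max(grid[1, i, j], z)
    return grid
```

The resulting `(layers, H, W)` tensor plays the role of a multi-channel image, so a standard 2D CNN can be applied to it directly.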
S2, calibrating the camera and the sensor according to the detection data frame of the sensor and the image data frame of the camera to obtain a calibration result;
Calibration finds the spatial transformation from the sensor to the camera; conversion between the two coordinate systems requires a rotation matrix R and a translation vector T. This prepares for the subsequent fusion of the sensor and camera data.
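A minimal sketch of how such a calibration result is applied (the extrinsics R and T and the pinhole intrinsics fx, fy, cx, cy below are illustrative assumed values, not output of the patent's calibration step):

```python
import numpy as np

# Assumed extrinsics: R and T map a point from the sensor coordinate
# system into the camera coordinate system.
R = np.eye(3)                      # sensor and camera axes aligned (assumption)
T = np.array([0.0, -0.1, 0.05])    # illustrative offset in meters

def sensor_to_camera(p_sensor):
    """Rigid transform between coordinate systems: p_cam = R @ p_sensor + T."""
    return R @ np.asarray(p_sensor, dtype=float) + T

def project_to_pixel(p_cam, fx=500.0, fy=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of a camera-frame point with assumed intrinsics."""
    x, y, z = p_cam
    return fx * x / z + cx, fy * y / z + cy
```

Projecting each sensor point to a pixel in this way is what lets the later steps test whether a point falls inside a recognized image region.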
S3, identifying the target object, the image range of the target object in the image and a plurality of characteristic parts according to the image data frame and the identification network;
the target object in the embodiment of the present invention refers to an image for performing pose detection, and the target object may include a person, an instrument device, a cart, and the like. In the embodiment of the present invention, an image of a target object may be acquired first, for example, the target image may be selected from stored image data, or a transmitted target image may also be received from another device, or the target image may also be captured directly by an image capturing device, which is only an exemplary illustration of acquiring the target image, and the embodiment of the present invention is not limited thereto.
After the image of the target object is obtained, the target object in it may be identified, either by an image recognition algorithm or by a trained machine learning network model; the machine learning network model may be a neural network model, a deep learning neural network model, or the like, which the embodiment of the present invention does not limit.
For example, a patient bed and its wheels are identified by deep learning, and the point clouds of the bed and the wheels are obtained using the calibration relation between the camera and the sensor. The overall coordinates and pose of the bed can then be calculated from the relative distance relationships among its four wheels and the coordinates of the identified wheel point clouds.
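A simplified planar version of that final calculation, assuming all four wheel centers are known and given in a fixed order (the ordering convention and function name are illustrative assumptions):

```python
import numpy as np

def bed_pose_from_wheels(wheels):
    """Estimate the bed's planar pose (cx, cy, yaw) from its four wheel
    centers, assumed ordered front-left, front-right, rear-left,
    rear-right.
    """
    wheels = np.asarray(wheels, dtype=float)
    cx, cy = wheels.mean(axis=0)           # bed center = mean of the wheels
    front_mid = wheels[:2].mean(axis=0)    # midpoint of the front axle
    rear_mid = wheels[2:].mean(axis=0)     # midpoint of the rear axle
    dx, dy = front_mid - rear_mid          # long axis of the bed
    return cx, cy, np.arctan2(dy, dx)
```

The yaw angle follows from the direction of the bed's long axis, which is why knowing the relative layout of the wheels, and not just their positions, matters.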
S4, extracting, according to the calibration result, the sensor point cloud data within the image range, and obtaining the local point clouds corresponding to the characteristic parts;
and S5, obtaining the central positions of the plurality of characteristic parts from the local point clouds, and obtaining the pose of the object from the central positions.
Preferably, the sensor is a depth sensor, including but not limited to an RGBD sensor, a lidar, or a solid-state lidar.
Further, obtaining the central positions of the plurality of characteristic parts from the local point clouds and obtaining the pose of the object from the central positions specifically includes:
calculating, from the sensor point cloud data and each local point cloud, the central position of the local point cloud;
and calculating the pose of the whole target object according to the central positions of the several local point clouds and the positions of the characteristic parts relative to the center of the target object.
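The central-position step can be sketched as follows (a minimal version; using a boolean mask to stand in for "points whose projection falls inside the part's image region" is an assumption of this sketch):

```python
import numpy as np

def local_cloud_center(cloud, in_part):
    """Return the centroid of the local point cloud of one characteristic
    part.

    cloud:   (N, 3) sensor point cloud
    in_part: length-N boolean mask selecting the points belonging to the
             part (e.g. points projecting into its image region)
    """
    local = np.asarray(cloud, dtype=float)[np.asarray(in_part, dtype=bool)]
    return local.mean(axis=0)
```

The centroid is a simple, outlier-tolerant stand-in for the part's center; a real system might refine it with clustering or model fitting.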
Optionally, calculating the pose of the whole object according to the central positions of the several local point clouds and the positions of the characteristic parts relative to the center of the target object specifically includes:
determining candidate positions of a first characteristic part and a second characteristic part among the plurality of characteristic parts according to their respective central positions and the distance between them;
calculating candidate poses of the target object according to the candidate positions of the characteristic parts on the target object;
and obtaining the overall pose of the target object according to the candidate poses of the target object.
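One way to make the candidate-position step concrete for a four-wheeled object (all dimensions, the tolerance, and the 2D simplification below are illustrative assumptions): classify the observed wheel pair by its separation, then place the candidate object centers accordingly.

```python
import numpy as np

def candidate_centers(w1, w2, track=0.6, wheelbase=1.2, tol=0.05):
    """Classify two observed wheel centers by their distance and return
    the candidate 2D positions of the object's center.
    """
    w1, w2 = np.asarray(w1, dtype=float), np.asarray(w2, dtype=float)
    d = float(np.linalg.norm(w2 - w1))
    mid = (w1 + w2) / 2.0
    u = (w2 - w1) / d                  # unit vector along the wheel pair
    n = np.array([-u[1], u[0]])        # perpendicular unit vector
    if abs(d - track) < tol:           # same axle: offset by half the wheelbase
        return [mid + n * wheelbase / 2, mid - n * wheelbase / 2]
    if abs(d - wheelbase) < tol:       # same side: offset by half the track
        return [mid + n * track / 2, mid - n * track / 2]
    if abs(d - np.hypot(track, wheelbase)) < tol:
        return [mid]                   # diagonal pair: center is the midpoint
    return []                          # matches no known wheel pair
```

This is the role the distances L1, L2, L3 play in the description: the measured separation selects which wheel pair was seen, and the ambiguity that remains produces the multiple candidate poses.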
Optionally, obtaining the overall pose of the target object according to the candidate poses of the target object specifically includes:
the candidate poses of the target object comprising a first candidate pose and a second candidate pose, obtained respectively from a first candidate position and a second candidate position of the characteristic part on the target object;
obtaining a first detection data frame and a second detection data frame of the sensor at the first candidate pose and the second candidate pose, respectively;
and determining whether each of the first detection data frame and the second detection data frame coincides with the sensor point cloud data of the target object, and if so, taking the candidate pose corresponding to the coinciding detection data frame as the pose of the target object.
For example, knowing the central positions W1 and W2 of two wheels on the cart, and using the fact that the cart has four wheels, the candidate poses P1 and P2 of the whole cart can be calculated (because the cart has four wheels, the distance between any two of them generally takes one of three values, L1, L2, or L3, so W1 and W2 reveal which pair of wheels may have been observed);
assuming the cart is at candidate pose P1 and P2 in turn, it is checked whether the depth data returned by the sensor conflicts with the object at that pose: for example, which parts of the object should be detected at P1 but are not actually detected. This yields two scores, S1 and S2, and the candidate pose with the highest score is selected as the pose of the object.
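The scoring step in this example can be sketched as follows (the matching radius, 2D point layout, and function names are assumptions of this sketch; the description only requires checking whether expected detections coincide with the actual sensor data):

```python
def pose_score(expected, observed, radius=0.1):
    """Fraction of the points that should be detected at a candidate pose
    that are actually matched by an observed sensor point."""
    def matched(p):
        return any(abs(p[0] - q[0]) <= radius and abs(p[1] - q[1]) <= radius
                   for q in observed)
    return sum(matched(p) for p in expected) / len(expected)

def best_pose(candidates, observed):
    """candidates: list of (pose_label, expected_points); return the pose
    whose expected detections best coincide with the observed data."""
    return max(candidates, key=lambda c: pose_score(c[1], observed))[0]
```

With two candidates this reproduces the S1-versus-S2 comparison: the pose whose predicted detections best coincide with the point cloud wins.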
In order to achieve the object of the present invention, as shown in fig. 2, an embodiment of the present invention further provides an object pose detection apparatus 100, including:
the first identification module 11 is configured to train, by deep learning on the data frames detected by a sensor, recognition of the target object to be identified, and to obtain a recognition network for the target object;
the calibration module 12 is configured to calibrate the camera and the sensor according to the detection data frame of the sensor and the image data frame of the camera, so as to obtain a calibration result;
the second identification module 13 is configured to identify the target object, an image range of the target object in the image, and a plurality of feature parts according to the image data frame and the identification network;
the point cloud data calculation module 14 is configured to extract sensor point cloud data within an image range according to the calibration result, and acquire a local point cloud corresponding to the characteristic part;
and the pose acquisition module 15 is configured to acquire central positions of a plurality of feature parts according to the local point cloud, and acquire a pose of the object by using the central positions.
Further, as shown in fig. 3, the pose acquisition module specifically includes:
a central position calculating unit 151, configured to calculate sensor point cloud data and a local point cloud to obtain a central position of the local point cloud;
and a pose calculation unit 152, configured to calculate the pose of the whole target object according to the central positions of the several local point clouds and the positions of the characteristic parts relative to the center of the target object.
Optionally, as shown in fig. 4, the pose calculation unit 152 specifically includes:
a candidate position determining subunit 1521, configured to determine candidate positions of the characteristic part on the target object according to the respective central positions of a first characteristic part and a second characteristic part among the plurality of characteristic parts and the distance between the first characteristic part and the second characteristic part;
a candidate pose calculation subunit 1522, configured to calculate candidate poses of the target object according to the candidate positions of the characteristic parts on the target object;
and an obtaining subunit 1523, configured to obtain the overall pose of the target object according to the candidate poses of the target object.
The candidate poses of the target object comprise a first candidate pose and a second candidate pose, obtained respectively from a first candidate position and a second candidate position of the characteristic part on the target object;
obtaining a first detection data frame and a second detection data frame of the sensor at the first candidate pose and the second candidate pose, respectively;
and determining whether each of the first detection data frame and the second detection data frame coincides with the sensor point cloud data of the target object, and if so, taking the candidate pose corresponding to the coinciding detection data frame as the pose of the target object.
For example, knowing the central positions W1 and W2 of two wheels on the cart, and using the fact that the cart has four wheels, the candidate poses P1 and P2 of the whole cart can be calculated (because the cart has four wheels, the distance between any two of them generally takes one of three values, L1, L2, or L3, so W1 and W2 reveal which pair of wheels may have been observed);
assuming the cart is at candidate pose P1 and P2 in turn, it is checked whether the depth data returned by the sensor conflicts with the object at that pose: for example, which parts of the object should be detected at P1 but are not actually detected. This yields two scores, S1 and S2, and the candidate pose with the highest score is selected as the pose of the object.
In order to achieve the object of the present invention, an embodiment of the present invention further provides a robot, which includes a processor and a memory, where the processor is coupled to the memory, and the memory is used for storing a program; the processor is configured to execute the program in the memory to cause the robot to perform the method of object pose detection as described above.
To achieve the object of the present invention, an embodiment of the present invention further provides a computer-readable storage medium storing instructions which, when run on a computer, enable the computer to perform any of the object pose detection methods described above.
According to the invention, the pose of the whole target object is obtained by using the depth sensor and camera data frames to identify the target object and its characteristic parts, so that the robot can plan a reasonable movement route while moving, improving the robot's movement efficiency.
It should be noted that the embodiments of the pose detection apparatus and the embodiments of the pose detection method described above are based on the same inventive concept and achieve the same technical effect; other details of the apparatus embodiments can therefore be found in the description of the method embodiments above.
It should also be noted that the division of the detection device into modules or units is only a division by logical function; in an actual implementation, they may be wholly or partially integrated into one physical entity or kept physically separate. The units may all be implemented as software invoked by a processor, all as hardware, or partly as processor-invoked software and partly as hardware.
For example, the functions of the above modules or units may be stored in a memory in the form of program code that a processor schedules to implement the functions of the units; the processor may be a general-purpose processor, such as a central processing unit (CPU), or another processor capable of invoking programs. Alternatively, the units may be one or more integrated circuits configured to implement the above methods, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). The two approaches can also be combined, with some functions implemented as processor-scheduled code and others as hardware integrated circuits. When all of the above functions are integrated together, they can be implemented as a system-on-a-chip (SoC).
The detection device provided by the embodiments of the application may specifically be a chip, the chip comprising a processing unit, which may for example be a processor, and a communication unit, which may for example be an input/output interface, a pin, or a circuit. The processing unit may execute the computer-executable instructions stored by the storage unit, so that the chip within the detection device performs the steps performed by the detection device in the embodiments illustrated above, or so that the chip within the execution device performs the steps performed by the detection device in the foregoing embodiment shown in fig. 2.
Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
To achieve the object of the present invention, as shown in fig. 5, an embodiment of the present invention further provides a robot 180. The robot 180 includes a processor 1803 and a memory 1804, the processor 1803 being coupled to the memory 1804, wherein:
the memory 1804 is configured to store a program;
the processor 1803 is configured to execute the program in the memory, so that the robot performs the object pose detection method described above.
Referring to fig. 5, the method disclosed in the embodiment of the present invention corresponding to fig. 1 may be applied to an autonomous mobile robot 180, where the robot 180 includes a processor 1803, and the processor 1803 may be an integrated circuit chip having signal processing capability. During implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 1803 or by instructions in the form of software. The processor 1803 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may also be an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1803 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments corresponding to fig. 1 of the present application.
A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in the embodiments of the present application may be embodied directly as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software module may reside in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or a register. The storage medium is located in the memory 1804; the processor 1803 reads the information in the memory 1804 and completes the steps of the above method in combination with its hardware.
The receiver 1801 may be used to receive entered numeric or character information and to generate signal inputs related to the settings and function control of the robot 180. The transmitter 1802 may be used to output numeric or character information through a first interface; the transmitter 1802 may further be used to send instructions to disk groups through the first interface to modify data in the disk groups; the transmitter 1802 may also include a display device such as a display screen.
An embodiment of the present invention also provides a computer-readable storage medium in which a program for performing signal processing is stored; when the program runs on a computer, it causes the computer to perform the steps of the object pose detection method described in the foregoing embodiments, or causes the computer to perform the steps performed by the detection apparatus described in the foregoing embodiment shown in fig. 2.
It should be noted that the above-described apparatus embodiments are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided in the present application, the connection relationships between modules indicate communication connections between them, which may be specifically implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, and certainly also by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. In general, any function performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structure used to realize the same function can take many forms, such as an analog circuit, a digital circuit, or a dedicated circuit. For the present application, however, a software implementation is usually preferable. Based on such understanding, the technical solutions of the present application, or the part of them contributing over the prior art, may be embodied substantially in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, and includes several instructions for causing a computer device (which may be a personal computer or a network device) to execute the methods of the embodiments of the present application.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, which may be any available medium accessible to a computer, or a data storage device, such as a training device or a data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)), among others.
It should be noted that the above embodiments can be freely combined as needed. The foregoing describes only preferred embodiments of the present invention; it should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.
Claims (10)
1. An object pose detection method, characterized by comprising:
training identification of a target object to be identified on data frames detected by a sensor by means of deep learning, to obtain an identification network of the target object;
calibrating the camera and the sensor according to the detection data frame of the sensor and the image data frame of the camera to obtain a calibration result;
identifying the target object, the image range of the target object in the image, and a plurality of characteristic parts according to the image data frame and the identification network;
extracting, according to the calibration result, sensor point cloud data within the image range, and acquiring local point clouds corresponding to the characteristic parts;
and acquiring the central positions of the plurality of characteristic parts according to the local point clouds, and acquiring the pose of the target object by using the central positions.
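The claims leave the center and pose computation unspecified. As a non-authoritative sketch: the center of each characteristic part may be taken as the centroid of its local point cloud, and the object pose recovered by rigidly aligning the known model-frame positions of the parts to the observed centers. The Kabsch/SVD alignment below is an assumed choice for illustration, not a formula stated in the patent:

```python
import numpy as np

def feature_centers(local_clouds):
    """Center of each characteristic part: the centroid of its local point cloud."""
    return np.array([np.mean(c, axis=0) for c in local_clouds])

def pose_from_centers(model_centers, observed_centers):
    """Rigid transform (R, t) mapping model-frame feature centers onto the
    observed centers, via the Kabsch/SVD method (assumed, not from the patent)."""
    mc = model_centers.mean(axis=0)
    oc = observed_centers.mean(axis=0)
    # cross-covariance of the centered point sets
    H = (model_centers - mc).T @ (observed_centers - oc)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0] * (model_centers.shape[1] - 1) + [d])
    R = Vt.T @ D @ U.T          # proper rotation (det +1)
    t = oc - R @ mc             # translation completing the pose
    return R, t
```

With at least three non-collinear feature centers, the returned (R, t) satisfies observed ≈ R · model + t, which is one way to realize "acquiring the pose of the target object by using the central positions".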
2. The object pose detection method according to claim 1, wherein the sensor is a depth sensor, including but not limited to an RGBD sensor, a lidar, or a solid-state lidar.
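The extraction of sensor point cloud data within the image range (claim 1) can be realized, for example, by projecting each depth-sensor point into the image using the calibration result and keeping the points that fall inside the detected bounding box. A minimal sketch under a pinhole-camera assumption; the intrinsic matrix `K`, extrinsics `(R, t)`, and `bbox` are hypothetical inputs standing in for the patent's calibration result:

```python
import numpy as np

def points_in_image_range(points, K, R, t, bbox):
    """Project depth-sensor points into the image and keep those inside
    the detected bounding box (illustrative, pinhole model assumed).

    points : (N, 3) cloud in the sensor frame
    K      : 3x3 camera intrinsic matrix
    R, t   : sensor-to-camera extrinsics from the calibration result
    bbox   : (u_min, v_min, u_max, v_max) image range of the target
    """
    cam = points @ R.T + t                  # sensor frame -> camera frame
    in_front = cam[:, 2] > 0                # drop points behind the camera
    uv = cam[in_front] @ K.T
    uv = uv[:, :2] / uv[:, 2:3]             # perspective division to pixels
    u_min, v_min, u_max, v_max = bbox
    mask = ((uv[:, 0] >= u_min) & (uv[:, 0] <= u_max) &
            (uv[:, 1] >= v_min) & (uv[:, 1] <= v_max))
    return points[in_front][mask]
```

The same routine, applied per characteristic part with that part's bounding box, yields the local point clouds used in the subsequent steps.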
3. The object pose detection method according to claim 1, wherein the obtaining of the center positions of the plurality of feature parts according to the local point cloud and the obtaining of the pose of the object using the center positions specifically comprise:
calculating the central position of each local point cloud from the sensor point cloud data and the local point clouds;
and calculating the pose of the whole target object according to the central positions of the local point clouds and the positions of the characteristic parts relative to the center of the target object.
4. The object pose detection method according to claim 3, wherein the calculating of the pose of the whole target object according to the central positions of the local point clouds and the positions of the characteristic parts relative to the center of the target object specifically comprises:
determining candidate positions of the characteristic parts in the target object according to the respective central positions of a first characteristic part and a second characteristic part among the plurality of characteristic parts and the distance between the first characteristic part and the second characteristic part;
calculating candidate poses of the target object according to the candidate positions of the characteristic parts in the target object;
and obtaining the whole pose of the target object according to the candidate pose of the target object.
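Claim 4 does not give the geometry behind the candidate positions. One illustrative planar construction, assuming the object stands on the ground plane: the centers of the first and second characteristic parts define a baseline, and the object center lies at a known model offset `h` on one of the baseline's two sides, so exactly two mirrored candidates arise (the function and parameter names here are hypothetical):

```python
import numpy as np

def planar_candidates(p1, p2, h):
    """Two candidate object poses from the centers of two characteristic parts.

    p1, p2 : 2-D centers of the first and second characteristic parts
    h      : assumed model distance from the p1-p2 baseline to the object
             center (a model parameter, not specified in the patent)

    Two feature centers fix the object only up to a reflection across the
    baseline, hence the two mirrored candidates.
    """
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    mid = (p1 + p2) / 2.0
    v = p2 - p1
    yaw = np.arctan2(v[1], v[0])                      # heading of the baseline
    n = np.array([-v[1], v[0]]) / np.linalg.norm(v)   # unit normal to baseline
    return [(mid + h * n, yaw), (mid - h * n, yaw)]
```

Each returned (center, yaw) pair is one candidate pose of the target object; claim 5 then selects between the two.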
5. The object pose detection method according to claim 4, wherein obtaining the entire pose of the target object based on the candidate poses of the target object specifically comprises:
the candidate poses of the target object comprise a first candidate pose and a second candidate pose, the first candidate pose and the second candidate pose being obtained respectively from a first candidate position and a second candidate position of the characteristic parts in the target object;
obtaining a first detection data frame and a second detection data frame of the sensor at the first candidate pose and the second candidate pose respectively;
and respectively judging whether the first detection data frame and the second detection data frame overlap with the sensor point cloud data of the target object, and if so, taking the candidate pose corresponding to the overlapping detection data frame as the pose of the target object.
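Claim 5 resolves the two-candidate ambiguity by checking which candidate's predicted detection data frame coincides with the observed sensor point cloud. A simplified stand-in for that coincidence test: score each candidate by the fraction of model points, transformed by the candidate pose, that land near an observed point, and keep the best-scoring pose (the brute-force nearest-neighbour check below is illustrative only, not the patent's exact comparison):

```python
import numpy as np

def select_candidate(candidates, model_pts, observed_pts, tol=0.05):
    """Pick the candidate pose whose predicted sensor view best coincides
    with the observed point cloud (simplified stand-in for claim 5).

    candidates   : list of (R, t) candidate poses
    model_pts    : (M, d) points of the object model
    observed_pts : (N, d) sensor point cloud of the target object
    """
    best, best_frac = None, -1.0
    for R, t in candidates:
        pred = model_pts @ np.asarray(R).T + np.asarray(t)
        # fraction of predicted points within tol of some observed point;
        # 1.0 means the predicted and observed frames fully coincide
        d = np.linalg.norm(pred[:, None, :] - observed_pts[None, :, :], axis=2)
        frac = float(np.mean(d.min(axis=1) < tol))
        if frac > best_frac:
            best, best_frac = (R, t), frac
    return best, best_frac
```

In practice a spatial index (e.g. a k-d tree) would replace the dense distance matrix, but the selection logic is the same.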
6. An object pose detection apparatus characterized by comprising:
the first identification module is used for training identification of a target object to be identified on data frames detected by a sensor by means of deep learning, to obtain an identification network of the target object;
the calibration module is used for calibrating the camera and the sensor according to the detection data frame of the sensor and the image data frame of the camera to obtain a calibration result;
the second identification module is used for identifying the target object, the image range of the target object in the image, and a plurality of characteristic parts according to the image data frame and the identification network;
the point cloud data calculation module is used for extracting, according to the calibration result, sensor point cloud data within the image range, and acquiring local point clouds corresponding to the characteristic parts;
and the pose acquisition module is used for acquiring the central positions of the plurality of characteristic parts according to the local point clouds, and acquiring the pose of the target object by using the central positions.
7. The object pose detection apparatus according to claim 6, wherein the pose acquisition module specifically includes:
the central position calculating unit is used for calculating the central position of each local point cloud from the sensor point cloud data and the local point clouds;
and the pose calculation unit is used for calculating the pose of the whole target object according to the central positions of the local point clouds and the positions of the characteristic parts relative to the center of the target object.
8. The object pose detecting apparatus according to claim 7, wherein the pose calculating unit specifically includes:
a candidate position determining subunit, configured to determine candidate positions of the characteristic parts in the target object according to the respective central positions of a first characteristic part and a second characteristic part among the plurality of characteristic parts and the distance between the first characteristic part and the second characteristic part;
the candidate pose calculation subunit is used for calculating candidate poses of the target object according to the candidate positions of the characteristic parts in the target object;
and the acquisition subunit is used for acquiring the whole pose of the target object according to the candidate pose of the target object.
9. A robot, characterized in that the robot comprises a processor and a memory, the processor being coupled with the memory,
the memory is used for storing programs;
the processor is configured to execute the program in the memory to cause the robot to perform the method of any one of claims 1-5.
10. A computer storage medium, comprising a program which, when run on a computer, causes the computer to perform the method of any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110895680.8A CN113609985B (en) | 2021-08-05 | 2021-08-05 | Object pose detection method, detection device, robot and storable medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110895680.8A CN113609985B (en) | 2021-08-05 | 2021-08-05 | Object pose detection method, detection device, robot and storable medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113609985A true CN113609985A (en) | 2021-11-05 |
CN113609985B CN113609985B (en) | 2024-02-23 |
Family
ID=78307027
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110895680.8A Active CN113609985B (en) | 2021-08-05 | 2021-08-05 | Object pose detection method, detection device, robot and storable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113609985B (en) |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140300637A1 (en) * | 2013-04-05 | 2014-10-09 | Nokia Corporation | Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system |
US20180341836A1 (en) * | 2017-05-24 | 2018-11-29 | General Electric Company | Neural network point cloud generation system |
US20180365503A1 (en) * | 2017-06-16 | 2018-12-20 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and Apparatus of Obtaining Obstacle Information, Device and Computer Storage Medium |
CN109448034A (en) * | 2018-10-24 | 2019-03-08 | 华侨大学 | A kind of part pose acquisition methods based on geometric primitive |
CN110363816A (en) * | 2019-06-25 | 2019-10-22 | 广东工业大学 | A kind of mobile robot environment semanteme based on deep learning builds drawing method |
CN110533722A (en) * | 2019-08-30 | 2019-12-03 | 的卢技术有限公司 | A kind of the robot fast relocation method and system of view-based access control model dictionary |
CN110579215A (en) * | 2019-10-22 | 2019-12-17 | 上海木木机器人技术有限公司 | positioning method based on environmental feature description, mobile robot and storage medium |
CN111368852A (en) * | 2018-12-26 | 2020-07-03 | 沈阳新松机器人自动化股份有限公司 | Article identification and pre-sorting system and method based on deep learning and robot |
WO2020155616A1 (en) * | 2019-01-29 | 2020-08-06 | 浙江省北大信息技术高等研究院 | Digital retina-based photographing device positioning method |
CN111563442A (en) * | 2020-04-29 | 2020-08-21 | 上海交通大学 | Slam method and system for fusing point cloud and camera image data based on laser radar |
WO2020259248A1 (en) * | 2019-06-28 | 2020-12-30 | Oppo广东移动通信有限公司 | Depth information-based pose determination method and device, medium, and electronic apparatus |
US20210063577A1 (en) * | 2019-08-26 | 2021-03-04 | Ubtech Robotics Corp Ltd | Robot relocalization method and apparatus and robot using the same |
US20210116914A1 (en) * | 2019-10-16 | 2021-04-22 | Yuan Ren | Method and system for localization of an autonomous vehicle in real time |
CN112731358A (en) * | 2021-01-08 | 2021-04-30 | 奥特酷智能科技(南京)有限公司 | Multi-laser-radar external parameter online calibration method |
CN112784873A (en) * | 2020-12-25 | 2021-05-11 | 华为技术有限公司 | Semantic map construction method and equipment |
CN112967347A (en) * | 2021-03-30 | 2021-06-15 | 深圳市优必选科技股份有限公司 | Pose calibration method and device, robot and computer readable storage medium |
US20210181745A1 (en) * | 2019-12-17 | 2021-06-17 | Motional Ad Llc | Automated object annotation using fused camera/lidar data points |
CN113034575A (en) * | 2021-01-27 | 2021-06-25 | 深圳市华汉伟业科技有限公司 | Model construction method, pose estimation method and object picking device |
Non-Patent Citations (3)
Title |
---|
李秀智; 李家豪; 张祥银; 彭小彬: "Deep-learning-based detection method for optimal robot grasping pose", Chinese Journal of Scientific Instrument (仪器仪表学报), no. 05, pages 108-117 *
王宪伦; 张海洲; 安立雄: "Object pose estimation based on image semantic segmentation", Machine Building & Automation (机械制造与自动化), no. 02, pages 216-220 *
黄海卫; 孔令成; 谭治英: "Design of a real-time target recognition and localization system for an indoor service robot", Computer Engineering and Design (计算机工程与设计), no. 08, pages 2228-2232 *
Also Published As
Publication number | Publication date |
---|---|
CN113609985B (en) | 2024-02-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11915099B2 (en) | Information processing method, information processing apparatus, and recording medium for selecting sensing data serving as learning data | |
JP6794436B2 (en) | Systems and methods for unobstructed area detection | |
Kanwal et al. | A navigation system for the visually impaired: a fusion of vision and depth sensor | |
WO2022012158A1 (en) | Target determination method and target determination device | |
Ushani et al. | A learning approach for real-time temporal scene flow estimation from lidar data | |
CN111587437A (en) | Activity recognition method using video pipe | |
JP7042905B2 (en) | Methods and devices for generating inverse sensor models, as well as methods for detecting obstacles | |
JP6853156B2 (en) | Posture estimation system, posture estimation device, and distance image camera | |
Vidas et al. | Real-time mobile 3D temperature mapping | |
KR101628155B1 (en) | Method for detecting and tracking unidentified multiple dynamic object in real time using Connected Component Labeling | |
US20220051425A1 (en) | Scale-aware monocular localization and mapping | |
CN112200129A (en) | Three-dimensional target detection method and device based on deep learning and terminal equipment | |
JPWO2017051480A1 (en) | Image processing apparatus and image processing method | |
Liang et al. | Image-based positioning of mobile devices in indoor environments | |
CN113405557B (en) | Path planning method and related device, electronic equipment and storage medium | |
Baig et al. | A robust motion detection technique for dynamic environment monitoring: A framework for grid-based monitoring of the dynamic environment | |
JP2017526083A (en) | Positioning and mapping apparatus and method | |
Ishihara et al. | Deep radio-visual localization | |
CN114859938A (en) | Robot, dynamic obstacle state estimation method and device and computer equipment | |
Kumar Rath et al. | Real‐time moving object detection and removal from 3D pointcloud data for humanoid navigation in dense GPS‐denied environments | |
CN113158779A (en) | Walking method and device and computer storage medium | |
CN113609985B (en) | Object pose detection method, detection device, robot and storable medium | |
WO2022083529A1 (en) | Data processing method and apparatus | |
Girão et al. | Real-time multi-view grid map-based spatial representation for mixed reality applications | |
Zhang et al. | 3D car-detection based on a Mobile Deep Sensor Fusion Model and real-scene applications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||