CN113609985B - Object pose detection method, detection device, robot and storable medium - Google Patents


Info

Publication number
CN113609985B
CN113609985B (application CN202110895680.8A)
Authority
CN
China
Prior art keywords
pose
target object
candidate
sensor
data frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110895680.8A
Other languages
Chinese (zh)
Other versions
CN113609985A (en)
Inventor
张干
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Noah Robot Technology Shanghai Co ltd
Original Assignee
Noah Robot Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Noah Robot Technology Shanghai Co ltd filed Critical Noah Robot Technology Shanghai Co ltd
Priority to CN202110895680.8A priority Critical patent/CN113609985B/en
Publication of CN113609985A publication Critical patent/CN113609985A/en
Application granted granted Critical
Publication of CN113609985B publication Critical patent/CN113609985B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an object pose detection method comprising the following steps: training, by deep learning, a recognition network for the target object to be detected and recognized; calibrating the camera and the sensor according to a detection data frame of the sensor and an image data frame of the camera to obtain a calibration result; recognizing, according to the image data frame and the recognition network, the target object, an image range of the target object in the image, and a plurality of feature parts; extracting, according to the calibration result, the sensor point cloud data within the image range to obtain the local point clouds corresponding to the feature parts; and obtaining the center positions of the plurality of feature parts from the local point clouds, and using these center positions to obtain the pose of the object. Accurate information is thereby provided for planning the robot's movement route, and the robot's movement capability and efficiency are improved.

Description

Object pose detection method, detection device, robot and storable medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an object pose detection method, a detection device, a robot and a storable medium.
Background
When an autonomously moving robot travels in its motion environment, objects detected by its sensors may not be recognized: part of an obstacle such as a moving instrument, a piece of equipment or a cart is treated as the obstacle, while the extent of the whole object remains unknown. As a result, when the robot's view is blocked by a person or a cart, it may plan an unsuitable route, move poorly, or even collide. Because the robot's route is relatively fixed, the detected objects can largely be recognized by training with deep learning and similar methods; however, if only the target obstacle is recognized and its pose remains unknown, good motion performance still cannot be achieved.
Disclosure of Invention
The invention aims to provide an object pose detection method, a detection device, a robot and a computer storage medium that solve the problem of detecting the pose of a detected target object while the robot is moving, provide accurate information for planning the robot's movement route, and improve the robot's movement capability and efficiency.
The technical scheme provided by the invention is as follows:
an object pose detection method, comprising:
training, by deep learning, a recognition network for a target object to be recognized from data frames detected by a sensor, and obtaining the recognition network of the target object;
calibrating the camera and the sensor according to a detection data frame of the sensor and an image data frame of the camera to obtain a calibration result;
recognizing, according to the image data frame and the recognition network, the target object, an image range of the target object in the image, and a plurality of feature parts;
extracting, according to the calibration result, the sensor point cloud data within the image range, and obtaining the local point clouds corresponding to the feature parts;
and obtaining, according to the local point clouds, the center positions of the plurality of feature parts, and using the center positions to obtain the pose of the object.
Preferably, the sensor is a depth sensor, including but not limited to an RGBD sensor, a lidar or a solid state lidar.
Further, obtaining the center positions of the feature parts according to the local point clouds and obtaining the pose of the object by using the center positions specifically comprises:
computing the sensor point cloud data and each local point cloud to obtain the center position of the local point cloud;
and calculating the pose of the whole target object according to the center positions of the local point clouds and the positions of the feature parts relative to the center of the target object.
Optionally, calculating the pose of the whole object according to the center positions of the local point clouds and the positions of the feature parts relative to the center of the target object specifically comprises:
determining candidate positions of the feature parts on the target object according to the respective center positions of a first feature part and a second feature part among the plurality of feature parts and the distance between the first feature part and the second feature part;
calculating a candidate pose of the target object according to the candidate positions of the feature parts on the target object;
and obtaining the whole pose of the target object according to the candidate pose of the target object.
Optionally, obtaining the whole pose of the target object according to the candidate pose of the target object specifically comprises:
the candidate pose of the target object comprises a first candidate pose and a second candidate pose, which are obtained from a first candidate position and a second candidate position of the feature parts on the target object, respectively;
obtaining, respectively, a first detection data frame and a second detection data frame of the sensor for the first candidate pose and the second candidate pose;
and judging, respectively, whether the first detection data frame and the second detection data frame overlap the sensor point cloud data of the target object, and if so, taking the candidate pose corresponding to the overlapping detection data frame as the pose of the target object.
In order to achieve the object of the present invention, an embodiment of the present invention further provides an apparatus for detecting a pose of an object, the apparatus including:
a first recognition module, configured to train, by deep learning, a recognition network for a target object to be recognized from data frames detected by the sensor, and to obtain the recognition network of the target object;
a calibration module, configured to calibrate the camera and the sensor according to a detection data frame of the sensor and an image data frame of the camera to obtain a calibration result;
a second recognition module, configured to recognize, according to the image data frame and the recognition network, the target object, the image range of the target object in the image, and a plurality of feature parts;
a point cloud data calculation module, configured to extract, according to the calibration result, the sensor point cloud data within the image range and to obtain the local point clouds corresponding to the feature parts;
and a pose acquisition module, configured to obtain, according to the local point clouds, the center positions of the plurality of feature parts and to obtain the pose of the object by using the center positions.
Further, the pose acquisition module specifically comprises:
a center position calculating unit, configured to compute the sensor point cloud data and each local point cloud to obtain the center position of the local point cloud;
and a pose calculating unit, configured to calculate the pose of the whole target object according to the center positions of the local point clouds and the positions of the feature parts relative to the center of the target object.
Optionally, the pose calculating unit specifically comprises:
a candidate position determining subunit, configured to determine candidate positions of the feature parts on the target object according to the respective center positions of a first feature part and a second feature part among the plurality of feature parts and the distance between the first feature part and the second feature part;
a candidate pose calculating subunit, configured to calculate a candidate pose of the target object according to the candidate positions of the feature parts on the target object;
and an obtaining subunit, configured to obtain the whole pose of the target object according to the candidate pose of the target object.
In order to achieve the object of the invention, an embodiment of the present invention also provides a robot including a processor and a memory, the processor being coupled to the memory, the memory being for storing a program; the processor is configured to execute the program in the memory, so that the robot performs the method for detecting the pose of the object as described above.
In order to achieve the object of the present invention, an embodiment of the present invention further provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform any of the above-described object pose detection methods.
According to the invention, the data frames of the depth sensor and the camera are used to recognize the target object and its feature parts, and the pose of the whole target object is obtained from those feature parts, so that the robot can plan a reasonable movement route during motion and its movement efficiency is improved.
Drawings
The above features, technical features, advantages and implementations of the object pose detection method and apparatus will be further described in the following description of the preferred embodiments with reference to the accompanying drawings in a clearly understandable manner.
Fig. 1 is a flowchart of an object pose detection method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an object pose detection device according to an embodiment of the present invention;
fig. 3 is a schematic diagram of another object pose detection apparatus according to an embodiment of the present invention;
fig. 4 is a schematic diagram of another object pose detection apparatus according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an autonomous mobile robot according to an embodiment of the present invention.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will explain the specific embodiments of the present invention with reference to the accompanying drawings. It is evident that the drawings in the following description are only examples of the invention, from which other drawings and other embodiments can be obtained by a person skilled in the art without inventive effort.
For the sake of simplicity of the drawing, the parts relevant to the present invention are shown only schematically in the figures, which do not represent the actual structure thereof as a product. Additionally, in order to facilitate a concise understanding of the drawings, components having the same structure or function in some of the drawings are only schematically depicted, or only one of them is labeled. Herein, "a" means not only "only this one" but also "more than one" case.
During the development of autonomously moving intelligent devices, the inventor found that sensors must be used to detect the environment in order to achieve autonomous movement. The intelligent device may be an autonomously moving robot, an autonomously driving automobile, or another autonomously walking device. Such devices generally use depth sensors such as lidar to detect target objects and a camera to identify objects in the motion environment of the robot. In the prior art, however, detection or recognition only finds the object: if only part of the object is detected, the degree to which it obstructs the robot cannot be accurately judged, and even if the whole object is obtained, not knowing its pose (for example, the direction in which the object is heading) still affects the robot's path planning. A method is therefore needed to accurately detect the pose of the target object, so that unobstructed or obstacle-avoiding movement can be achieved.
In order to accurately acquire the pose of a target object, the embodiment of the invention provides an object pose detection method.
Referring to fig. 1, an object pose detection method according to an embodiment of the present invention includes:
S1, training, by deep learning, a recognition network for a target object to be recognized from data frames detected by a sensor, and obtaining the recognition network of the target object;
first, the object to be recognized and the feature parts on the object, such as a cart and its wheels, are trained using a mature deep learning method;
the embodiment of the invention does not limit the network used for deep learning. For example, fully convolutional network techniques can be transferred to the task of detecting objects in 3D range scan data. Specifically, the scenario is set up as a detection task on range data from a Velodyne 64E lidar: the data are presented as a 2D point map, and a single 2D end-to-end fully convolutional network predicts target confidence and bounding boxes simultaneously. With a suitably designed box encoding, a complete 3D box can also be predicted by the 2D convolutional network.
Alternatively, to eliminate manual feature engineering on the 3D point cloud, VoxelNet, a generic 3D detection network, can be used; it unifies feature extraction and box prediction in a single-stage, end-to-end trainable deep network.
The method is also suitable for sensor fusion, free-space estimation and machine-learning approaches based on a grid-map representation of the environment, which mainly use a deep CNN to detect and classify targets. As input to the CNN, the 3D range sensor information is efficiently encoded in a multi-layer grid map. The range sensor measurements are converted into this multi-layer grid map and fed to the target detection and classification network; the CNN simultaneously infers rotated 3D bounding boxes and their semantic categories, and the inference output is a list of rotated bounding boxes with associated semantic categories. These boxes are projected into the camera image for visual verification.
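As a non-limiting illustration of such a grid-map encoding, the following sketch converts a point cloud into a multi-layer 2D grid that can be stacked as CNN input channels; the layer choice (point count, maximum height, mean intensity), the ranges and the cell size are assumptions made for the example and are not features of the invention:

import numpy as np

def encode_multilayer_grid(points, intensities, x_range=(0.0, 40.0),
                           y_range=(-20.0, 20.0), cell=0.1):
    # Encode a 3D point cloud (N x 3) and per-point intensities (N,) into a
    # multi-layer 2D grid map: point count, maximum height, mean intensity.
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    count = np.zeros((nx, ny), dtype=np.float32)
    max_z = np.full((nx, ny), -np.inf, dtype=np.float32)
    sum_i = np.zeros((nx, ny), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    for i, j, z, r in zip(ix[valid], iy[valid], points[valid, 2], intensities[valid]):
        count[i, j] += 1.0
        max_z[i, j] = max(max_z[i, j], z)
        sum_i[i, j] += r
    mean_i = np.divide(sum_i, count, out=np.zeros_like(sum_i), where=count > 0)
    max_z[count == 0] = 0.0
    return np.stack([count, max_z, mean_i], axis=0)  # shape (3, nx, ny)

The resulting (3, nx, ny) array plays the role of the multi-layer grid map described above and can be fed to a 2D CNN that regresses rotated bounding boxes and semantic categories.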
S2, calibrating the camera and the sensor according to the detection data frame of the sensor and the image data frame of the camera to obtain a calibration result;
the spatial transformation from the sensor to the camera is found through calibration; conversion between the different coordinate systems requires a rotation matrix R and a translation T, which prepares for the subsequent fusion of the sensor and camera data.
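A minimal sketch of how such a calibration result is used is given below: the rotation R, the translation T and the camera intrinsic matrix K are assumed to be known from calibration, and the sensor points are transformed into the camera frame and projected onto the image plane (variable names are illustrative only):

import numpy as np

def project_points_to_image(points_sensor, R, T, K):
    # Transform sensor-frame points (N x 3) into the camera frame using the
    # calibration result (R, T), then project them with the intrinsics K.
    pts_cam = (R @ points_sensor.T).T + T        # sensor frame -> camera frame
    in_front = pts_cam[:, 2] > 0                 # keep points in front of the camera
    pts_cam = pts_cam[in_front]
    uvw = (K @ pts_cam.T).T                      # perspective projection
    uv = uvw[:, :2] / uvw[:, 2:3]                # pixel coordinates (u, v)
    return uv, pts_cam, in_front

This projection is what later allows the sensor points falling inside a recognized image range to be extracted.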
S3, recognizing, according to the image data frame and the recognition network, the target object, an image range of the target object in the image, and a plurality of feature parts;
the target object in the embodiment of the present invention refers to an image for performing pose detection, and may include a person, an instrument device, a cart, and the like. In the embodiment of the present invention, the image of the target object may be acquired first, for example, the target image may be selected from the stored image data, or the transmitted target image may be received from another device, or the target image may be directly captured by the image capturing device, which is merely illustrative of acquiring the target image, and the embodiment of the present invention is not limited thereto.
After the image of the target object is acquired, the target object in the image may be recognized. The target object in the target image may be recognized by an image recognition algorithm, or by a trained machine learning network model, where the machine learning network model may be a neural network model, a deep learning neural network model, or the like; this is not limited in the embodiment of the present invention.
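Since the embodiment does not fix a particular recognition network, the following sketch merely illustrates running an off-the-shelf, pretrained detector on a camera image data frame; the detector, the file name and the score threshold are assumptions made for the example and stand in for the trained recognition network of step S1 (torchvision 0.13 or later is assumed for the weights argument):

import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = Image.open("frame.png").convert("RGB")   # camera image data frame (hypothetical file)
with torch.no_grad():
    prediction = model([to_tensor(image)])[0]    # dict with 'boxes', 'labels', 'scores'

keep = prediction["scores"] > 0.5                # keep confident detections
boxes = prediction["boxes"][keep]                # (N, 4) boxes: (u_min, v_min, u_max, v_max)
labels = prediction["labels"][keep]              # class of each box, e.g. cart body or wheel

Each retained box gives the image range of the target object or of one of its feature parts.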
For example, a hospital bed and its wheels are recognized by deep learning, and the point clouds of the bed and the wheels are obtained using the calibrated relationship between the camera and the sensor. The overall coordinates and pose of the bed can then be calculated from the relative distances among the bed's four wheels and the recognized coordinates of the wheel point clouds.
S4, extracting, according to the calibration result, the sensor point cloud data within the image range, and obtaining the local point clouds corresponding to the feature parts;
S5, obtaining, according to the local point clouds, the center positions of the plurality of feature parts, and using the center positions to obtain the pose of the object.
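A minimal sketch of steps S4 and S5 is given below. It assumes that uv contains the pixel coordinates of the projected sensor points (aligned one-to-one with points_sensor, see the projection sketch above) and that boxes maps each recognized feature-part name to its 2D bounding box; these names are illustrative rather than prescribed:

import numpy as np

def feature_part_centers(points_sensor, uv, boxes):
    # For each recognized feature part, collect the sensor points whose image
    # projection falls inside its bounding box (the local point cloud) and
    # return the centroid of that local point cloud as the part's center.
    centers = {}
    for name, (u0, v0, u1, v1) in boxes.items():
        mask = (uv[:, 0] >= u0) & (uv[:, 0] <= u1) & \
               (uv[:, 1] >= v0) & (uv[:, 1] <= v1)
        local_cloud = points_sensor[mask]
        if len(local_cloud) > 0:
            centers[name] = local_cloud.mean(axis=0)   # center position of the part
    return centers

The returned center positions are the inputs from which the pose of the whole object is computed.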
Preferably, the sensor is a depth sensor, including but not limited to an RGBD sensor, a lidar or a solid state lidar.
Further, obtaining the center positions of the feature parts according to the local point clouds and obtaining the pose of the object by using the center positions specifically comprises:
computing the sensor point cloud data and each local point cloud to obtain the center position of the local point cloud;
and calculating the pose of the whole target object according to the center positions of the local point clouds and the positions of the feature parts relative to the center of the target object.
Optionally, calculating the pose of the whole object according to the center positions of the local point clouds and the positions of the feature parts relative to the center of the target object specifically comprises:
determining candidate positions of the feature parts on the target object according to the respective center positions of a first feature part and a second feature part among the plurality of feature parts and the distance between the first feature part and the second feature part;
calculating a candidate pose of the target object according to the candidate positions of the feature parts on the target object;
and obtaining the whole pose of the target object according to the candidate pose of the target object.
Optionally, obtaining the whole pose of the target object according to the candidate pose of the target object specifically comprises:
the candidate pose of the target object comprises a first candidate pose and a second candidate pose, which are obtained from a first candidate position and a second candidate position of the feature parts on the target object, respectively;
obtaining, respectively, a first detection data frame and a second detection data frame of the sensor for the first candidate pose and the second candidate pose;
and judging, respectively, whether the first detection data frame and the second detection data frame overlap the sensor point cloud data of the target object, and if so, taking the candidate pose corresponding to the overlapping detection data frame as the pose of the target object.
For example, knowing the center positions W1 and W2 of two wheels of a cart and combining this with the fact that the cart has four wheels, the candidate poses P1 and P2 of the whole cart can be calculated (since the cart has four wheels, the distance between two wheels typically takes one of three values L1, L2 and L3, so from W1 and W2 it can be deduced which pair of wheels they might be);
assuming the cart is at candidate pose P1 and P2 respectively, check whether the depth data returned by the sensor are consistent with the object being at that pose: for example, when the object is at P1, which parts of the object should be detected but are not actually detected. This yields two scores S1 and S2, and the candidate pose with the highest score is selected as the pose of the object.
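A minimal sketch of this example is given below, working in the 2D ground plane. The wheel layout of the cart in its own frame, the matching tolerance and the scoring rule (a model point counts as detected if an observed point lies close enough to it) are assumptions made for the illustration:

import numpy as np

def candidate_poses_from_wheels(w1, w2, wheel_layout, tol=0.05):
    # Given the measured centers w1, w2 of two wheels (sensor frame, x-y) and
    # the known wheel positions in the cart frame, return candidate poses
    # (x, y, yaw) of the cart whose wheel spacing matches |w1 - w2|.
    w1, w2 = np.asarray(w1, float), np.asarray(w2, float)
    d = np.linalg.norm(w2 - w1)
    candidates = []
    for i in range(len(wheel_layout)):
        for j in range(len(wheel_layout)):
            if i == j:
                continue
            a, b = np.asarray(wheel_layout[i], float), np.asarray(wheel_layout[j], float)
            if abs(np.linalg.norm(b - a) - d) > tol:
                continue                          # spacing (L1/L2/L3) does not match
            # yaw that rotates the layout segment a->b onto the measured segment w1->w2
            yaw = np.arctan2(*(w2 - w1)[::-1]) - np.arctan2(*(b - a)[::-1])
            R = np.array([[np.cos(yaw), -np.sin(yaw)],
                          [np.sin(yaw),  np.cos(yaw)]])
            center = w1 - R @ a                   # cart center in the sensor frame
            candidates.append((center[0], center[1], yaw))
    return candidates

def score_candidate(pose, model_points, observed_points, match_dist=0.10):
    # Place the cart's model points (cart frame, x-y) at the candidate pose and
    # count how many of them are actually supported by observed sensor points.
    x, y, yaw = pose
    R = np.array([[np.cos(yaw), -np.sin(yaw)],
                  [np.sin(yaw),  np.cos(yaw)]])
    expected = model_points @ R.T + np.array([x, y])
    hits = sum(np.min(np.linalg.norm(observed_points - p, axis=1)) < match_dist
               for p in expected)
    return hits / max(len(expected), 1)

For a rectangular cart the wheel_layout could, for example, be the four corners (±Lx/2, ±Ly/2); the two scores S1 and S2 of the example then correspond to score_candidate evaluated at the candidate poses P1 and P2, and the candidate with the higher score is kept.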
In order to achieve the object of the present invention, as shown in fig. 2, an embodiment of the present invention further provides an apparatus 100 for detecting a pose of an object, the apparatus including:
the first recognition module 11 is configured to train, by deep learning, a recognition network for a target object to be recognized from data frames detected by the sensor, so as to obtain the recognition network of the target object;
the calibration module 12 is configured to calibrate the camera and the sensor according to the detection data frame of the sensor and the image data frame of the camera to obtain a calibration result;
the second recognition module 13 is configured to recognize, according to the image data frame and the recognition network, the target object, the image range of the target object in the image, and a plurality of feature parts;
the point cloud data calculation module 14 is configured to extract, according to the calibration result, the sensor point cloud data within the image range, and to obtain the local point clouds corresponding to the feature parts;
and the pose acquisition module 15 is configured to obtain, according to the local point clouds, the center positions of the plurality of feature parts, and to obtain the pose of the object by using the center positions.
Further, as shown in fig. 3, the pose acquisition module specifically comprises:
a center position calculating unit 151, configured to compute the sensor point cloud data and each local point cloud to obtain the center position of the local point cloud;
and a pose calculating unit 152, configured to calculate the pose of the whole target object according to the center positions of the local point clouds and the positions of the feature parts relative to the center of the target object.
Optionally, as shown in fig. 4, the pose calculating unit 152 specifically comprises:
a candidate position determining subunit 1521, configured to determine candidate positions of the feature parts on the target object according to the respective center positions of a first feature part and a second feature part among the plurality of feature parts and the distance between the first feature part and the second feature part;
a candidate pose calculating subunit 1522, configured to calculate a candidate pose of the target object according to the candidate positions of the feature parts on the target object;
and an obtaining subunit 1523, configured to obtain the whole pose of the target object according to the candidate pose of the target object.
The candidate pose of the target object comprises a first candidate pose and a second candidate pose, which are obtained from a first candidate position and a second candidate position of the feature parts on the target object, respectively;
a first detection data frame and a second detection data frame of the sensor are obtained for the first candidate pose and the second candidate pose, respectively;
and it is judged, respectively, whether the first detection data frame and the second detection data frame overlap the sensor point cloud data of the target object; if so, the candidate pose corresponding to the overlapping detection data frame is taken as the pose of the target object.
For example, knowing the center positions W1 and W2 of two wheels of a cart and combining this with the fact that the cart has four wheels, the candidate poses P1 and P2 of the whole cart can be calculated (since the cart has four wheels, the distance between two wheels typically takes one of three values L1, L2 and L3, so from W1 and W2 it can be deduced which pair of wheels they might be);
assuming the cart is at candidate pose P1 and P2 respectively, check whether the depth data returned by the sensor are consistent with the object being at that pose: for example, when the object is at P1, which parts of the object should be detected but are not actually detected. This yields two scores S1 and S2, and the candidate pose with the highest score is selected as the pose of the object.
In order to achieve the object of the invention, an embodiment of the present invention also provides a robot including a processor and a memory, the processor being coupled to the memory, the memory being for storing a program; the processor is configured to execute the program in the memory, so that the robot performs the method for detecting the pose of the object as described above.
In order to achieve the object of the present invention, an embodiment of the present invention further provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform any of the above-described object pose detection methods.
According to the invention, the data frames of the depth sensor and the camera are used to recognize the target object and its feature parts, and the pose of the whole target object is obtained from those feature parts, so that the robot can plan a reasonable movement route during motion and its movement efficiency is improved.
It should be noted that the embodiment of the pose detection apparatus provided by the present invention and the embodiment of the pose detection method provided by the present invention are based on the same inventive concept and can achieve the same technical effects; for further details of the pose detection apparatus embodiment, reference may therefore be made to the foregoing description of the pose detection method embodiment.
It should be noted that the above division of the detection device into modules or units is merely a division by logical function; in an actual implementation they may be fully or partially integrated into one physical entity, or may be physically separate. These units may all be implemented in software invoked by a processor, all in hardware, or partly in software invoked by a processor and partly in hardware.
For example, the functions of the above modules or units may be stored in a memory in the form of program code that the processor schedules and executes to realize the functions of the above units. The processor may be a general purpose processor, such as a central processing unit (Central Processing Unit, CPU), or another processor capable of invoking a program. As another example, each of the above units may be one or more integrated circuits configured to implement the above methods, for example one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). The two approaches may also be combined, with some functions realized by code scheduled on a processor and others by hardware integrated circuits. When the above functions are integrated together, they may be implemented in the form of a system-on-a-chip (SOC).
The detection device provided in the embodiment of the application may specifically be a chip, the chip comprising a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, pins or circuitry. The processing unit may execute the computer-executable instructions stored in the storage unit, so that the chip within the detection device performs the steps performed by the detection device described in the above embodiment, for example the embodiment shown in fig. 2.
Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the detection device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (random access memory, RAM).
In order to achieve the object of the invention, as shown in fig. 5, an embodiment of the present invention further provides a robot 180, the robot 180 including a processor 1803 and a memory 1804, the processor 1803 being coupled to the memory 1804, wherein,
the memory 1804 is used for storing programs;
the processor 1803 is configured to execute the program in the memory, so that the robot performs the method for detecting the pose of the object as described above.
Referring to fig. 5, the method disclosed in the embodiment of the present invention corresponding to the embodiment of fig. 1 may be applied to an autonomous mobile robot 180. The robot 180 includes a processor 1803, which may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be performed by integrated hardware logic circuitry in the processor 1803 or by instructions in the form of software. The processor 1803 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and may further include an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic devices, or discrete hardware components. The processor 1803 may implement or perform the methods, steps and logic blocks disclosed in the embodiment corresponding to fig. 1 of the present application.
A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly as being performed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory 1804, and the processor 1803 reads the information in the memory 1804 and, in combination with its hardware, performs the steps of the method described above.
The receiver 1801 may be used to receive input numeric or character information and to generate signal inputs related to the relevant setup and control of the robot 180. The transmitter 1802 is operable to output numeric or character information via a first interface; the transmitter 1802 is further operable to send instructions to the disk stack via the first interface to modify data in the disk stack; the transmitter 1802 may also include a display device such as a display screen.
In an embodiment of the present invention, there is also provided a computer-readable storage medium having stored therein a program for signal processing which, when executed on a computer, causes the computer to perform the steps of the object pose detection method described in the foregoing embodiments, or to perform the steps performed by the detection apparatus described in the foregoing embodiment shown in fig. 2.
It should be further noted that the above-described apparatus embodiments are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the application, the connection relationship between modules indicates that they have a communication connection, which may be specifically implemented as one or more communication buses or signal lines.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary general purpose hardware, or of course may be implemented by dedicated hardware including application specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components and the like. Generally, functions performed by computer programs can be easily implemented by corresponding hardware, and specific hardware structures for implementing the same functions can be varied, such as analog circuits, digital circuits, or dedicated circuits. However, a software program implementation is a preferred embodiment in many cases for the present application. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a readable storage medium, such as a floppy disk, a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk or an optical disk of a computer, etc., including several instructions for causing a computer device (which may be a personal computer, or a network device, etc.) to execute the method described in the embodiments of the present application.
In the above embodiments, the solution may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, which may be any available medium that a computer can access, or a data storage device, such as a training device or a data center, that integrates one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
It should be noted that the above embodiments can be freely combined as needed. The foregoing is merely a preferred embodiment of the present invention; modifications and adaptations may be made by those skilled in the art without departing from the principles of the present invention, and such modifications are intended to fall within the scope of the present invention.

Claims (5)

1. An object pose detection method, characterized by comprising the following steps:
training, by deep learning, a recognition network for a target object to be recognized from data frames detected by a depth sensor, and obtaining the recognition network of the target object;
calibrating a camera and the sensor according to a detection data frame of the sensor and an image data frame of the camera, the calibration finding the spatial transformation from the sensor to the camera and yielding a calibration result;
recognizing, according to the image data frame and the recognition network, the target object, an image range of the target object in an image, and a plurality of feature parts;
extracting, according to the calibration result, sensor point cloud data within the image range, and obtaining local point clouds corresponding to the feature parts;
obtaining, according to the local point clouds, center positions of the plurality of feature parts, and obtaining the pose of the object by using the center positions, which specifically comprises: computing the sensor point cloud data and each local point cloud to obtain the center position of the local point cloud;
and calculating the pose of the whole target object according to the center positions of the local point clouds and the positions of the feature parts relative to the center of the target object, wherein calculating the pose of the whole object according to the center positions of the local point clouds and the positions of the feature parts relative to the center of the target object specifically comprises:
determining candidate positions of the feature parts on the target object according to the respective center positions of a first feature part and a second feature part among the plurality of feature parts and the distance between the first feature part and the second feature part;
calculating a candidate pose of the target object according to the candidate positions of the feature parts on the target object;
obtaining the whole pose of the target object according to the candidate pose of the target object, wherein obtaining the whole pose of the target object according to the candidate pose of the target object specifically comprises:
the candidate pose of the target object comprising a first candidate pose and a second candidate pose, the first candidate pose and the second candidate pose being obtained from a first candidate position and a second candidate position of the feature parts on the target object, respectively;
obtaining, respectively, a first detection data frame and a second detection data frame of the sensor for the first candidate pose and the second candidate pose;
and judging, respectively, whether the first detection data frame and the second detection data frame overlap the sensor point cloud data of the target object, and if so, taking the candidate pose corresponding to the overlapping detection data frame as the pose of the target object.
2. The object pose detection method according to claim 1, wherein the depth sensor includes, but is not limited to, an RGBD sensor or a lidar.
3. An apparatus for detecting a pose of an object, the apparatus comprising:
a first recognition module, configured to train, by deep learning, a recognition network for a target object to be recognized from data frames detected by a depth sensor, and to obtain the recognition network of the target object;
a calibration module, configured to calibrate the camera and the sensor according to a detection data frame of the sensor and an image data frame of the camera, the calibration finding the spatial transformation from the sensor to the camera and yielding a calibration result;
a second recognition module, configured to recognize, according to the image data frame and the recognition network, the target object, an image range of the target object in an image, and a plurality of feature parts;
a point cloud data calculation module, configured to extract, according to the calibration result, sensor point cloud data within the image range and to obtain local point clouds corresponding to the feature parts;
a pose acquisition module, configured to obtain, according to the local point clouds, center positions of the plurality of feature parts and to obtain the pose of the object by using the center positions, wherein the pose acquisition module specifically comprises:
a center position calculating unit, configured to compute the sensor point cloud data and each local point cloud to obtain the center position of the local point cloud;
a pose calculating unit, configured to calculate the pose of the whole target object according to the center positions of the plurality of local point clouds and the positions of the feature parts relative to the center of the target object, wherein the pose calculating unit specifically comprises:
a candidate position determining subunit, configured to determine candidate positions of the feature parts on the target object according to the respective center positions of a first feature part and a second feature part among the plurality of feature parts and the distance between the first feature part and the second feature part;
a candidate pose calculating subunit, configured to calculate a candidate pose of the target object according to the candidate positions of the feature parts on the target object;
an obtaining subunit, configured to obtain the whole pose of the target object according to the candidate pose of the target object, wherein obtaining the whole pose of the target object according to the candidate pose of the target object specifically comprises:
the candidate pose of the target object comprising a first candidate pose and a second candidate pose, the first candidate pose and the second candidate pose being obtained from a first candidate position and a second candidate position of the feature parts on the target object, respectively;
obtaining, respectively, a first detection data frame and a second detection data frame of the sensor for the first candidate pose and the second candidate pose;
and judging, respectively, whether the first detection data frame and the second detection data frame overlap the sensor point cloud data of the target object, and if so, taking the candidate pose corresponding to the overlapping detection data frame as the pose of the target object.
4. A robot comprising a processor and a memory, said processor being coupled to said memory, characterized in that,
the memory is used for storing programs;
the processor for executing a program in the memory, causing the robot to perform the method of any one of claims 1-2.
5. A computer storage medium comprising a program which, when run on a computer, causes the computer to perform the method of any of claims 1-2.
CN202110895680.8A 2021-08-05 2021-08-05 Object pose detection method, detection device, robot and storable medium Active CN113609985B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110895680.8A CN113609985B (en) 2021-08-05 2021-08-05 Object pose detection method, detection device, robot and storable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110895680.8A CN113609985B (en) 2021-08-05 2021-08-05 Object pose detection method, detection device, robot and storable medium

Publications (2)

Publication Number Publication Date
CN113609985A CN113609985A (en) 2021-11-05
CN113609985B true CN113609985B (en) 2024-02-23

Family

ID=78307027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110895680.8A Active CN113609985B (en) 2021-08-05 2021-08-05 Object pose detection method, detection device, robot and storable medium

Country Status (1)

Country Link
CN (1) CN113609985B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109448034A (en) * 2018-10-24 2019-03-08 华侨大学 A kind of part pose acquisition methods based on geometric primitive
CN110363816A (en) * 2019-06-25 2019-10-22 广东工业大学 A kind of mobile robot environment semanteme based on deep learning builds drawing method
CN110533722A (en) * 2019-08-30 2019-12-03 的卢技术有限公司 A kind of the robot fast relocation method and system of view-based access control model dictionary
CN110579215A (en) * 2019-10-22 2019-12-17 上海木木机器人技术有限公司 positioning method based on environmental feature description, mobile robot and storage medium
CN111368852A (en) * 2018-12-26 2020-07-03 沈阳新松机器人自动化股份有限公司 Article identification and pre-sorting system and method based on deep learning and robot
WO2020155616A1 (en) * 2019-01-29 2020-08-06 浙江省北大信息技术高等研究院 Digital retina-based photographing device positioning method
CN111563442A (en) * 2020-04-29 2020-08-21 上海交通大学 Slam method and system for fusing point cloud and camera image data based on laser radar
WO2020259248A1 (en) * 2019-06-28 2020-12-30 Oppo广东移动通信有限公司 Depth information-based pose determination method and device, medium, and electronic apparatus
CN112731358A (en) * 2021-01-08 2021-04-30 奥特酷智能科技(南京)有限公司 Multi-laser-radar external parameter online calibration method
CN112784873A (en) * 2020-12-25 2021-05-11 华为技术有限公司 Semantic map construction method and equipment
CN112967347A (en) * 2021-03-30 2021-06-15 深圳市优必选科技股份有限公司 Pose calibration method and device, robot and computer readable storage medium
CN113034575A (en) * 2021-01-27 2021-06-25 深圳市华汉伟业科技有限公司 Model construction method, pose estimation method and object picking device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9558559B2 (en) * 2013-04-05 2017-01-31 Nokia Technologies Oy Method and apparatus for determining camera location information and/or camera pose information according to a global coordinate system
US10262243B2 (en) * 2017-05-24 2019-04-16 General Electric Company Neural network point cloud generation system
CN109145680B (en) * 2017-06-16 2022-05-27 阿波罗智能技术(北京)有限公司 Method, device and equipment for acquiring obstacle information and computer storage medium
CN110307838B (en) * 2019-08-26 2019-12-10 深圳市优必选科技股份有限公司 Robot repositioning method and device, computer-readable storage medium and robot
US11262759B2 (en) * 2019-10-16 2022-03-01 Huawei Technologies Co., Ltd. Method and system for localization of an autonomous vehicle in real time
US11940804B2 (en) * 2019-12-17 2024-03-26 Motional Ad Llc Automated object annotation using fused camera/LiDAR data points

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109448034A (en) * 2018-10-24 2019-03-08 华侨大学 A kind of part pose acquisition methods based on geometric primitive
CN111368852A (en) * 2018-12-26 2020-07-03 沈阳新松机器人自动化股份有限公司 Article identification and pre-sorting system and method based on deep learning and robot
WO2020155616A1 (en) * 2019-01-29 2020-08-06 浙江省北大信息技术高等研究院 Digital retina-based photographing device positioning method
CN110363816A (en) * 2019-06-25 2019-10-22 广东工业大学 A kind of mobile robot environment semanteme based on deep learning builds drawing method
WO2020259248A1 (en) * 2019-06-28 2020-12-30 Oppo广东移动通信有限公司 Depth information-based pose determination method and device, medium, and electronic apparatus
CN110533722A (en) * 2019-08-30 2019-12-03 的卢技术有限公司 A kind of the robot fast relocation method and system of view-based access control model dictionary
CN110579215A (en) * 2019-10-22 2019-12-17 上海木木机器人技术有限公司 positioning method based on environmental feature description, mobile robot and storage medium
CN111563442A (en) * 2020-04-29 2020-08-21 上海交通大学 Slam method and system for fusing point cloud and camera image data based on laser radar
CN112784873A (en) * 2020-12-25 2021-05-11 华为技术有限公司 Semantic map construction method and equipment
CN112731358A (en) * 2021-01-08 2021-04-30 奥特酷智能科技(南京)有限公司 Multi-laser-radar external parameter online calibration method
CN113034575A (en) * 2021-01-27 2021-06-25 深圳市华汉伟业科技有限公司 Model construction method, pose estimation method and object picking device
CN112967347A (en) * 2021-03-30 2021-06-15 深圳市优必选科技股份有限公司 Pose calibration method and device, robot and computer readable storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Object pose estimation based on image semantic segmentation; 王宪伦; 张海洲; 安立雄; 机械制造与自动化 (02); pp. 216-220 *
Deep-learning-based optimal grasping pose detection method for robots; 李秀智; 李家豪; 张祥银; 彭小彬; 仪器仪表学报 (05); pp. 108-117 *
Design of a real-time target recognition and localization system for an indoor service robot; 黄海卫; 孔令成; 谭治英; 计算机工程与设计 (08); pp. 2228-2232 *

Also Published As

Publication number Publication date
CN113609985A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
US11216971B2 (en) Three-dimensional bounding box from two-dimensional image and point cloud data
EP3627180B1 (en) Sensor calibration method and device, computer device, medium, and vehicle
JP6794436B2 (en) Systems and methods for unobstructed area detection
CN111079619B (en) Method and apparatus for detecting target object in image
WO2022012158A1 (en) Target determination method and target determination device
CN110799989A (en) Obstacle detection method, equipment, movable platform and storage medium
CN111931764A (en) Target detection method, target detection framework and related equipment
Liang et al. Image-based positioning of mobile devices in indoor environments
CN112907625B (en) Target following method and system applied to quadruped bionic robot
WO2024087962A1 (en) Truck bed orientation recognition system and method, and electronic device and storage medium
EP3703008A1 (en) Object detection and 3d box fitting
Shi et al. An improved lightweight deep neural network with knowledge distillation for local feature extraction and visual localization using images and LiDAR point clouds
CN113781519A (en) Target tracking method and target tracking device
US11308324B2 (en) Object detecting system for detecting object by using hierarchical pyramid and object detecting method thereof
Ishihara et al. Deep radio-visual localization
CN115147333A (en) Target detection method and device
CN116563376A (en) LIDAR-IMU tight coupling semantic SLAM method based on deep learning and related device
CN116310673A (en) Three-dimensional target detection method based on fusion of point cloud and image features
CN115424233A (en) Target detection method and target detection device based on information fusion
US20220164595A1 (en) Method, electronic device and storage medium for vehicle localization
Ponnaganti et al. Deep learning for lidar-based autonomous vehicles in smart cities
CN114972492A (en) Position and pose determination method and device based on aerial view and computer storage medium
CN113609985B (en) Object pose detection method, detection device, robot and storable medium
CN115131756A (en) Target detection method and device
CN114384486A (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant