CN111823277A - Object grabbing platform and method based on machine vision - Google Patents


Info

Publication number
CN111823277A
Authority
CN
China
Prior art keywords
grabbing
module
coordinate system
dimensional coordinates
coordinate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010722155.1A
Other languages
Chinese (zh)
Inventor
王曰英
吴春强
彭艳
张丹
谢少荣
罗均
蒲华燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology filed Critical University of Shanghai for Science and Technology
Priority to CN202010722155.1A
Publication of CN111823277A
Legal status: Pending

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J19/00 Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
    • B25J19/02 Sensing devices
    • B25J19/04 Viewing devices
    • B25J13/00 Controls for manipulators
    • B25J13/003 Controls for manipulators by means of an audio-responsive input
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning

Abstract

The invention discloses an object grabbing platform and method based on machine vision, wherein the grabbing platform comprises: a vision system, a target detection module, a coordinate conversion and storage module, a sound pickup, a semantic recognition module, a path planning module and a robotic arm system; the vision system and the target detection module perform environment recognition, the sound pickup and the semantic recognition module acquire the user's voice command, the path planning module performs path planning, and the robotic arm system grabs the object according to the path planning result; through the cooperative work of these modules, the object grabbing function in home service is realized.

Description

Object grabbing platform and method based on machine vision
Technical Field
The invention relates to the technical field of service robots, in particular to an object grabbing platform and method based on machine vision.
Background
The service robot is an important branch of the robotics field. With the development of society, the accelerating pace of life and work, and the aging of the population, a huge service robot market is being incubated. A clear distinction from conventional industrial robots is that service robots operate in unordered, unstructured environments. An industrial robot needs only a working pattern planned in advance to repeat specified actions, but the working environment of a service robot changes frequently, demanding stronger cognitive ability and execution capability and therefore placing higher requirements on the robot's intelligence and adaptability. How to provide a service robot capable of completing complex tasks in a home environment has become a technical problem to be solved urgently.
Disclosure of Invention
The invention aims to provide an object grabbing platform and method based on machine vision, so as to provide a service robot capable of completing complex tasks in a home environment.
In order to achieve the purpose, the invention provides the following scheme:
an object grabbing platform based on machine vision, the grabbing platform comprising:
a vision system, a target detection module, a coordinate conversion and storage module, a sound pickup, a semantic recognition module, a path planning module and a robotic arm system;
the vision system is connected with the target detection module and is used for acquiring RGB information of objects within the grabbing range of the robotic arm system and depth information between the vision system and the objects, and for sending the RGB information and depth information of all the objects to the target detection module;
the target detection module is connected with the coordinate conversion and storage module and is used for classifying the RGB information of each object, determining the name of each object, and determining the three-dimensional coordinates of each object in the camera coordinate system according to the depth information of each object, and for sending the name of each object and its three-dimensional coordinates in the camera coordinate system to the coordinate conversion and storage module;
the coordinate conversion and storage module is used for converting the three-dimensional coordinates of each object in the camera coordinate system into three-dimensional coordinates in the robotic arm coordinate system, and for correspondingly storing the name of each object and its three-dimensional coordinates in the robotic arm coordinate system;
the sound pickup is connected with the semantic recognition module and is used for acquiring the grabbing-demand voice information uttered by a user and sending it to the semantic recognition module;
the semantic recognition module is respectively connected with the coordinate conversion and storage module and the path planning module, and is used for recognizing the grabbing-demand voice information, obtaining the name of the object the user needs to grab, acquiring from the coordinate conversion and storage module the three-dimensional coordinates, in the robotic arm coordinate system, of the object the user needs to grab, and sending those three-dimensional coordinates to the path planning module;
the path planning module is connected with the robotic arm system and is used for performing path planning according to the three-dimensional coordinates, in the robotic arm coordinate system, of the object the user needs to grab, obtaining an object grabbing path, and sending the object grabbing path to the robotic arm system;
the robotic arm system is used for grabbing the object the user needs to grab according to the object grabbing path.
Optionally, the vision system includes a depth camera and a stand, the depth camera including a centrally located RGB camera and infrared cameras evenly distributed around the RGB camera.
Optionally, the robotic arm system comprises an arm base, a robotic arm and an end gripper;
the robotic arm is arranged on the arm base; the end gripper is arranged at the end of the robotic arm.
Optionally, the target detection module includes an object classification submodule and a coordinate determination submodule;
the object classification submodule is used for classifying the RGB information of each object by adopting a convolutional-neural-network-based target detection algorithm and determining the name of each object;
the coordinate determination submodule is used for determining the three-dimensional coordinates of each object in the camera coordinate system according to the depth information of each object.
Optionally, the coordinate conversion and storage module includes:
a coordinate conversion submodule for converting the three-dimensional coordinates of each object in the camera coordinate system into three-dimensional coordinates in the robotic arm coordinate system according to the calibrated intrinsic and extrinsic parameters of the depth camera.
Optionally, the path planning module includes:
a path planning submodule for planning a path by adopting an improved RRT algorithm according to the three-dimensional coordinates, in the robotic arm coordinate system, of the object the user needs to grab, so as to obtain the object grabbing path.
Optionally, the target detection module, the coordinate conversion and storage module and the semantic recognition module are all integrated in a ROS system.
An object grabbing method based on machine vision is applied to the above grabbing platform and comprises the following steps:
acquiring RGB information of objects within the grabbing range of the robotic arm system and depth information between the depth camera and the objects;
classifying the RGB information of each object, and determining the name of each object;
determining the three-dimensional coordinates of each object in the camera coordinate system according to the depth information of each object;
converting the three-dimensional coordinates of each object in the camera coordinate system into three-dimensional coordinates in the robotic arm coordinate system;
correspondingly storing the name of each object and its three-dimensional coordinates in the robotic arm coordinate system;
acquiring the grabbing-demand voice information uttered by a user;
recognizing the grabbing-demand voice information to obtain the name of the object the user needs to grab;
acquiring the three-dimensional coordinates, in the robotic arm coordinate system, of the object the user needs to grab according to the name of that object and the stored correspondence between object names and three-dimensional coordinates in the robotic arm coordinate system;
performing path planning according to the three-dimensional coordinates, in the robotic arm coordinate system, of the object the user needs to grab, to obtain an object grabbing path;
and grabbing the object the user needs to grab according to the object grabbing path.
Optionally, the classifying the RGB information of each object and determining the name of each object specifically includes:
classifying the RGB information of each object by adopting a convolutional-neural-network-based target detection algorithm, and determining the name of each object.
Optionally, the converting the three-dimensional coordinates of each object in the camera coordinate system into three-dimensional coordinates in the robotic arm coordinate system specifically includes:
calibrating the intrinsic and extrinsic parameters of the depth camera by adopting the Zhang Zhengyou calibration method to obtain the calibrated intrinsic and extrinsic parameters;
and converting the three-dimensional coordinates of each object in the camera coordinate system into three-dimensional coordinates in the robotic arm coordinate system according to the calibrated intrinsic and extrinsic parameters of the depth camera.
Optionally, the performing path planning according to the three-dimensional coordinates, in the robotic arm coordinate system, of the object the user needs to grab to obtain the object grabbing path specifically includes:
performing path planning by adopting an improved RRT algorithm according to those three-dimensional coordinates, to obtain the object grabbing path.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides an object grabbing platform and method based on machine vision, wherein the grabbing platform comprises: a vision system, a target detection module, a coordinate conversion and storage module, a sound pickup, a semantic recognition module, a path planning module and a robotic arm system; the vision system and the target detection module perform environment recognition, the sound pickup and the semantic recognition module acquire the user's voice command, the path planning module performs path planning, and the robotic arm system grabs the object according to the path planning result; through the cooperative work of these modules, the object grabbing function in home service is realized.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a structural component diagram of an object grabbing platform based on machine vision according to the present invention;
FIG. 2 is a flow chart of the training of the neural network model in the target detection module provided by the present invention;
FIG. 3 is a schematic diagram of a path planning algorithm of the path planning module provided by the present invention;
FIG. 4 is a schematic diagram of a depth camera configuration provided by the present invention; fig. 4(a) is a schematic structural diagram of the depth camera, and fig. 4(b) is a working principle diagram of infrared camera ranging;
FIG. 5 is a structural view of the robotic arm provided by the present invention;
fig. 6 is an object grabbing principle diagram of an object grabbing platform based on machine vision provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The invention aims to provide an object grabbing platform and method based on machine vision so as to provide a service robot capable of completing complex tasks in a home environment.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in fig. 1 and fig. 6, the present invention provides an object grabbing platform based on machine vision, comprising: a vision system 101, a target detection module 102, a coordinate conversion and storage module 103, a sound pickup 104, a semantic recognition module 105, a path planning module 106 and a robotic arm system 107. The vision system 101 is connected to the target detection module 102 and is configured to obtain RGB information of objects within the grabbing range of the robotic arm system 107 and depth information between the vision system and the objects, and to send the RGB information and depth information of all the objects to the target detection module 102. The target detection module 102 is configured to classify the RGB information of each object, determine the name of each object, determine the three-dimensional coordinates of each object in the camera coordinate system according to its depth information, and send the name of each object and its three-dimensional coordinates in the camera coordinate system to the coordinate conversion and storage module 103. The coordinate conversion and storage module 103 is configured to convert the three-dimensional coordinates of each object in the camera coordinate system into three-dimensional coordinates in the robotic arm coordinate system, and to store the name of each object together with those coordinates. The sound pickup 104 is connected to the semantic recognition module 105 and is configured to acquire the grabbing-demand voice information uttered by the user and send it to the semantic recognition module 105. The semantic recognition module 105 is connected to the coordinate conversion and storage module 103 and the path planning module 106, respectively, and is configured to recognize the grabbing-demand voice information, obtain the name of the object the user needs to grab, acquire that object's three-dimensional coordinates in the robotic arm coordinate system from the coordinate conversion and storage module, and send them to the path planning module 106. The path planning module 106 is connected to the robotic arm system 107 and is configured to perform path planning according to those coordinates, obtain an object grabbing path, and send it to the robotic arm system 107. The robotic arm system 107 grabs the requested object along the object grabbing path.
The target detection module 102, the coordinate conversion and storage module 103 and the semantic recognition module 105 are all integrated in the ROS system. The path planning module 106 is located in the path planning controller of the robotic arm system.
As shown in fig. 4, the vision system 101 includes a depth camera and a stand; the depth camera includes a centrally located RGB camera and infrared cameras evenly distributed around the RGB camera. The central lens of the depth camera is an ordinary RGB camera used to collect color images of the surrounding environment and obtain RGB information of the objects in the environment; the obtained RGB information is used to classify those objects. The infrared cameras of the depth camera can measure distance, so the depth information from the camera to each object can be obtained.
The depth camera is a Microsoft Kinect v2, which can process data in real time; it comprises an RGB camera and an infrared transmitter and receiver, can acquire infrared images, and completes the computation of depth information automatically. The camera stand of the depth camera allows the camera height, and therefore its visual range, to be adjusted to meet the requirements of different experiments.
The robotic arm system includes an arm base, a robotic arm and an end gripper; the robotic arm is mounted on the arm base, and the end gripper is mounted at the end of the robotic arm. A programmable path planning controller is arranged in the arm base, and by continuously optimizing the path planning algorithm, the grabbing path planned by the planner becomes safer and more humanoid; the end gripper has 3 degrees of freedom and can grab a specified object. Specifically, the robotic arm is a six-degree-of-freedom Jaco arm, shown in fig. 5, which can theoretically reach any position within its range so that the end gripper can perform the grabbing work. In fig. 5, 1 is the controller, 2 is the first actuator, 3 is the shoulder, 4 is the second actuator, 5 is the upper arm, 6 is the third actuator, 7 is the forearm, 8 is the wrist, 9 is the fourth actuator, and 10 is the gripper.
The target detection module 102 comprises an object classification submodule and a coordinate determination submodule. The object classification submodule is used for classifying the RGB information of each object by adopting a convolutional-neural-network-based target detection algorithm and determining the name of each object; specifically, the target detection module classifies objects using a trained convolutional-neural-network target detector, YOLO. The coordinate determination submodule is used for determining the three-dimensional coordinates of each object in the camera coordinate system according to the depth information of each object; specifically, the coordinate determination submodule gives the three-dimensional coordinates of each object in the camera coordinate system. The network model adopted by the convolutional neural network is based on GoogLeNet and comprises 24 convolutional layers, pooling layers and 2 fully connected layers.
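As an illustration of how the coordinate determination submodule can obtain camera-frame coordinates, the following is a minimal sketch that back-projects a detected object's pixel position and depth through the pinhole camera model. The detection interface, the use of the bounding-box center as the measurement point, and the millimeter depth units are assumptions for illustration, not details specified by the patent.

```python
import numpy as np

# Pinhole back-projection: pixel (u, v) plus depth Z -> camera-frame 3-D point.
# fx, fy, cx, cy are the depth camera intrinsics obtained from calibration.
def pixel_to_camera_xyz(u, v, depth_m, fx, fy, cx, cy):
    """Return the 3-D point in the camera coordinate system (meters)."""
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

def object_coordinates(detections, depth_image, fx, fy, cx, cy):
    """detections: list of (name, (x1, y1, x2, y2)) boxes from the detector.
    Uses the depth at the box center as the object's distance (an assumption)."""
    results = {}
    for name, (x1, y1, x2, y2) in detections:
        u, v = (x1 + x2) // 2, (y1 + y2) // 2    # box center pixel
        z = float(depth_image[v, u]) / 1000.0    # Kinect depth maps are in mm
        results[name] = pixel_to_camera_xyz(u, v, z, fx, fy, cx, cy)
    return results
```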
The coordinate conversion and storage module includes a coordinate conversion submodule, which is used for converting the three-dimensional coordinates of each object in the camera coordinate system into three-dimensional coordinates in the robotic arm coordinate system according to the calibrated intrinsic and extrinsic parameters of the depth camera. The intrinsic and extrinsic parameters of the depth camera are calibrated with the Zhang Zhengyou calibration method; this calibration must be performed first, because only after the intrinsic and extrinsic parameters have been calibrated can the three-dimensional coordinates of an object in the camera coordinate system be converted into three-dimensional coordinates in the robotic arm coordinate system, and otherwise the robotic arm system cannot grab the corresponding object.
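A minimal sketch of the Zhang Zhengyou calibration as it is commonly performed with OpenCV: a planar checkerboard is observed in several poses, and the intrinsic matrix and distortion coefficients are solved for. The board size and the image folder are illustrative assumptions; the camera-to-arm extrinsics would be obtained in an additional hand-eye step not shown here.

```python
import cv2
import numpy as np
import glob

# Zhang's method: known planar checkerboard seen from multiple viewpoints.
pattern = (9, 6)  # inner corners per row/column (assumed board geometry)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)  # board frame

obj_points, img_points = [], []
for path in glob.glob("calib_images/*.png"):   # hypothetical image folder
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K is the intrinsic matrix; rvecs/tvecs are per-view extrinsics.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection RMS:", rms)
print("intrinsics:\n", K)
```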
The path planning module 106 includes a path planning submodule, which is used for planning a path by adopting an improved RRT (Rapidly-exploring Random Tree) algorithm according to the three-dimensional coordinates, in the robotic arm coordinate system, of the object the user needs to grab, so as to obtain the object grabbing path. Specifically, the path planning submodule plans safe, convenient and humanoid grabbing path information according to the RRT-based path planning algorithm shown in fig. 3, and then transmits the grabbing path information to the robotic arm system to carry out the grabbing of the specific target object.
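The patent does not detail its improvements to RRT, so the following is a sketch of the basic RRT algorithm in the arm's 3-D workspace, with goal biasing and a user-supplied collision check standing in for those improvements. All parameter values and the obstacle-free example are illustrative assumptions.

```python
import numpy as np

def rrt(start, goal, is_free, bounds, step=0.05, goal_tol=0.05,
        goal_bias=0.1, max_iters=5000, seed=0):
    """Basic RRT in the arm's 3-D workspace. is_free(p) is a user-supplied
    collision check; bounds is a (lower, upper) pair of 3-vectors."""
    rng = np.random.default_rng(seed)
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    nodes, parents = [start], [-1]
    for _ in range(max_iters):
        # With probability goal_bias, steer toward the goal itself.
        sample = goal if rng.random() < goal_bias else rng.uniform(lo, hi)
        i = min(range(len(nodes)),
                key=lambda k: np.linalg.norm(nodes[k] - sample))
        direction = sample - nodes[i]
        new = nodes[i] + step * direction / (np.linalg.norm(direction) + 1e-9)
        if not is_free(new):
            continue
        nodes.append(new)
        parents.append(i)
        if np.linalg.norm(new - goal) < goal_tol:
            path, k = [], len(nodes) - 1
            while k != -1:                     # walk back to the root
                path.append(nodes[k])
                k = parents[k]
            return path[::-1]
    return None                                # no path within the budget

# Example: plan toward an object at (0.4, 0.1, 0.3) with no obstacles.
path = rrt(start=(0, 0, 0), goal=(0.4, 0.1, 0.3), is_free=lambda p: True,
           bounds=((-1, -1, 0), (1, 1, 1)))
```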
It can be seen that the target detection module 102 of the present invention is implemented by a convolutional-neural-network-based target detection algorithm (YOLO), which uses the information obtained by the depth camera to distinguish all object classes within the visual range and to provide the three-dimensional coordinate information corresponding to each object.
The path planning module is realized by a path planning algorithm: after receiving the three-dimensional coordinate information of the specified object, the path planner automatically plans a safe and smooth grabbing path according to the requirements of the algorithm, and the end gripper is thereby delivered to the specified position to grab the specified object.
The ROS system (Robot Operating System) integrates the target detection algorithm and the robotic arm grabbing algorithm, and provides a publish-subscribe communication framework for simply and quickly constructing distributed computing systems. ROS is a framework of distributed processes ("nodes"); processes are encapsulated in packages and function packages that are easy to share and publish. ROS also supports a federated system similar to a code repository, in which project collaboration and release can be achieved; this design allows development decisions, from the file system to the user interface, to be made completely independently. The ROS system includes the visualization tool Rviz, which can visualize sensor data and state information; the robotic arm can be modeled in Rviz and its motion can be controlled from there.
In this platform, the ROS system runs on the Ubuntu 16.04 operating system of the host computer. It integrates the vision system, the robotic arm system and the corresponding software, coordinates them uniformly through code, and provides the communication tools for each part of the whole platform, realizing communication among the different processes.
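A minimal sketch of the publish-subscribe pattern described above, using rospy (the Python client for ROS on Ubuntu 16.04): one node publishes the requested object's name and another subscribes to it. The topic name /target_object is an assumption for illustration, not a topic defined by the patent.

```python
#!/usr/bin/env python
# Minimal ROS publish-subscribe sketch: the semantic recognition side publishes
# the requested object name; the path planning side subscribes to it.
import rospy
from std_msgs.msg import String

def planner_callback(msg):
    rospy.loginfo("planning a grabbing path for: %s", msg.data)
    # ...look up the stored 3-D coordinates and run path planning here...

if __name__ == "__main__":
    rospy.init_node("grasp_demo")
    pub = rospy.Publisher("/target_object", String, queue_size=10)
    rospy.Subscriber("/target_object", String, planner_callback)
    rospy.sleep(1.0)                    # let the connection come up
    pub.publish(String(data="cup"))     # as if the user asked for a cup
    rospy.spin()
```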
Specifically, as shown in fig. 1, the present example provides an object grabbing platform based on machine vision, which includes a vision system 101, a robotic arm system 107 and a host computer system; the host computer runs the ROS system, and the ROS system integrates the target detection module 102, the coordinate conversion and storage module 103 and the semantic recognition module 105.
The vision system 101 is responsible for acquiring images of the objects in the surrounding environment and the corresponding depth information. The target detection module 102 is responsible for classifying the physical objects in the acquired images and attaching labels to them, while giving the three-dimensional space coordinates of each object, and it publishes each object's name and three-dimensional coordinates to the corresponding ROS node. The semantic recognition module 105 receives the voice, translates the keywords in it into the corresponding object name, and publishes the name to the corresponding ROS process. When the path planning module on the ROS system receives the name of a specific object, it plans a safe and humanoid grabbing path for the robotic arm system 107 according to the three-dimensional coordinates of the corresponding object published by the target detection module, and the controller of the robotic arm system 107 controls the arm to deliver the end gripper to the specified position and grab the specified object.
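A minimal sketch of the keyword-translation step performed by the semantic recognition module 105, assuming the speech has already been transcribed to text; the vocabulary and the example command are illustrative assumptions.

```python
# Map a transcribed voice command to one of the object names known to the
# detection module. KNOWN_OBJECTS is a hypothetical vocabulary for illustration.
KNOWN_OBJECTS = {"cup", "bottle", "apple", "book"}

def extract_object_name(transcript):
    """Return the first known object name mentioned in the transcript."""
    for word in transcript.lower().split():
        if word in KNOWN_OBJECTS:
            return word
    return None

print(extract_object_name("please bring me the cup on the table"))  # -> "cup"
```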
To implement the object detection and grabbing work of this example, a neural network based on the YOLO target detection algorithm is first modeled and trained; the network model of this example is based on the GoogLeNet architecture, and the training process of the network is shown in fig. 2.
The working principle of the depth camera of the present invention is shown in fig. 4. To implement the above functions, the depth camera in the vision system 101 must first have its intrinsic and extrinsic parameters calibrated; only after calibration can the three-dimensional coordinates of an object in the camera coordinate system be converted into three-dimensional coordinates in the robotic arm coordinate system, and without it the robotic arm system 107 cannot grab the corresponding object.
The conversion relationship between the three-dimensional coordinates of an object in the depth camera coordinate system and in the robotic arm coordinate system is shown in the following formula:

$$\begin{bmatrix} X_B \\ Y_B \\ Z_B \\ 1 \end{bmatrix} = \begin{bmatrix} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^{\mathsf{T}} & 1 \end{bmatrix} \begin{bmatrix} X_K \\ Y_K \\ Z_K \\ 1 \end{bmatrix}$$

where $[X_B\ Y_B\ Z_B]^{\mathsf{T}}$ are the three-dimensional coordinates in the robotic arm coordinate system, $[X_K\ Y_K\ Z_K]^{\mathsf{T}}$ are the three-dimensional coordinates in the depth camera coordinate system, and $\mathbf{R}$ and $\mathbf{t}$ are the rotation matrix and translation vector obtained from the calibrated extrinsic parameters.
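A minimal numpy sketch of applying this transform; R and t below are placeholders standing in for the rotation and translation recovered by the extrinsic calibration, not calibrated values.

```python
import numpy as np

# Apply the calibrated camera-to-arm rigid transform: p_B = R @ p_K + t.
R = np.eye(3)                       # placeholder rotation
t = np.array([0.20, 0.00, 0.10])    # placeholder translation (meters)

T = np.eye(4)                       # 4x4 homogeneous transform [R t; 0 1]
T[:3, :3], T[:3, 3] = R, t

p_K = np.array([0.05, -0.02, 0.60, 1.0])  # object in camera frame (homogeneous)
p_B = T @ p_K                             # object in robot-arm frame
print(p_B[:3])
```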
The calibration adopts the Zhang Zhengyou calibration method.
Based on the above work, building the whole platform on the ROS system has many advantages. First, each module can be designed independently and the modules are loosely coupled, which removes many unnecessary problems during design. Second, because each module is designed independently, management remains very clear during later code debugging and optimization. Third, the ROS system provides many tools (e.g., the Rviz simulator) and communication packages, and its publish-subscribe framework makes it simple and fast to construct a distributed computing system; since processes ("nodes") are encapsulated in packages and function packages that are easy to share and publish, project collaboration and release can be achieved, and development decisions from the file system to the user interface can be made completely independently.
The invention also provides an object grabbing method based on machine vision, which is applied to the grabbing platform and comprises the following steps.
RGB information of the objects within the grabbing range of the robotic arm system and depth information between the depth camera and the objects are acquired.
The RGB information of each object is classified, and the name of each object is determined.
The classifying of the RGB information of each object and determining of the name of each object specifically includes: classifying the RGB information of each object by adopting a convolutional-neural-network-based target detection algorithm, and determining the name of each object.
The three-dimensional coordinates of each object in the camera coordinate system are determined according to the depth information of each object.
The converting of the three-dimensional coordinates of each object in the camera coordinate system into three-dimensional coordinates in the robotic arm coordinate system specifically includes: calibrating the intrinsic and extrinsic parameters of the depth camera by adopting the Zhang Zhengyou calibration method to obtain the calibrated intrinsic and extrinsic parameters; and converting the three-dimensional coordinates of each object in the camera coordinate system into three-dimensional coordinates in the robotic arm coordinate system according to the calibrated intrinsic and extrinsic parameters of the depth camera.
The three-dimensional coordinates of each object in the camera coordinate system are converted into three-dimensional coordinates in the robotic arm coordinate system.
The name of each object and its three-dimensional coordinates in the robotic arm coordinate system are stored correspondingly.
The grabbing-demand voice information uttered by the user is acquired.
The grabbing-demand voice information is recognized to obtain the name of the object the user needs to grab.
The three-dimensional coordinates, in the robotic arm coordinate system, of the object the user needs to grab are acquired according to the name of that object and the stored correspondence between object names and three-dimensional coordinates in the robotic arm coordinate system.
Path planning is performed according to the three-dimensional coordinates, in the robotic arm coordinate system, of the object the user needs to grab, and an object grabbing path is obtained.
The performing of path planning according to those three-dimensional coordinates to obtain the object grabbing path specifically includes: performing path planning by adopting an improved RRT algorithm according to the three-dimensional coordinates, in the robotic arm coordinate system, of the object the user needs to grab, to obtain the object grabbing path.
The object the user needs to grab is grabbed according to the object grabbing path.
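Putting the steps together, the following is a high-level sketch of the method, chaining the helpers sketched earlier in this description (object_coordinates, extract_object_name, rrt). The detection list, the transform arguments and the obstacle-free collision check are illustrative assumptions, not an API defined by the patent.

```python
import numpy as np

def grab_requested_object(detections, depth_image, transcript, K, R, t):
    """detections: (name, box) pairs from the detector; K: 3x3 intrinsic matrix;
    R, t: calibrated camera-to-arm rotation and translation."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    camera_coords = object_coordinates(detections, depth_image, fx, fy, cx, cy)
    # Convert every detected object into the robot-arm frame, stored by name.
    arm_coords = {name: R @ p + t for name, p in camera_coords.items()}
    name = extract_object_name(transcript)     # voice command -> object name
    if name is None or name not in arm_coords:
        return None                            # nothing recognizable to grab
    path = rrt(start=(0.0, 0.0, 0.0), goal=arm_coords[name],
               is_free=lambda p: True,         # obstacle-free assumption
               bounds=((-1, -1, 0), (1, 1, 1)))
    return path                                # executed by the arm controller
```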
Compared with the prior art, the invention has the beneficial effects that:
the invention provides an object grabbing platform and method based on machine vision, wherein the grabbing platform comprises: a vision system, a target detection module, a coordinate conversion and storage module, a sound pickup, a semantic recognition module, a path planning module and a robotic arm system; the vision system and the target detection module perform environment recognition, the sound pickup and the semantic recognition module acquire the user's voice command, the path planning module performs path planning, and the robotic arm system grabs the object according to the path planning result; through the cooperative work of these modules, the object grabbing function in home service is realized.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the invention. Meanwhile, for a person skilled in the art, the specific embodiments and the scope of application may vary according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. An object grabbing platform based on machine vision, wherein the grabbing platform comprises:
a vision system, a target detection module, a coordinate conversion and storage module, a sound pickup, a semantic recognition module, a path planning module and a robotic arm system;
the vision system is connected with the target detection module and is used for acquiring RGB information of objects within the grabbing range of the robotic arm system and depth information between the vision system and the objects, and for sending the RGB information and depth information of all the objects to the target detection module;
the target detection module is connected with the coordinate conversion and storage module and is used for classifying the RGB information of each object, determining the name of each object, and determining the three-dimensional coordinates of each object in the camera coordinate system according to the depth information of each object, and for sending the name of each object and its three-dimensional coordinates in the camera coordinate system to the coordinate conversion and storage module;
the coordinate conversion and storage module is used for converting the three-dimensional coordinates of each object in the camera coordinate system into three-dimensional coordinates in the robotic arm coordinate system, and for correspondingly storing the name of each object and its three-dimensional coordinates in the robotic arm coordinate system;
the sound pickup is connected with the semantic recognition module and is used for acquiring the grabbing-demand voice information uttered by a user and sending it to the semantic recognition module;
the semantic recognition module is respectively connected with the coordinate conversion and storage module and the path planning module, and is used for recognizing the grabbing-demand voice information, obtaining the name of the object the user needs to grab, acquiring from the coordinate conversion and storage module the three-dimensional coordinates, in the robotic arm coordinate system, of the object the user needs to grab, and sending those three-dimensional coordinates to the path planning module;
the path planning module is connected with the robotic arm system and is used for performing path planning according to the three-dimensional coordinates, in the robotic arm coordinate system, of the object the user needs to grab, obtaining an object grabbing path, and sending the object grabbing path to the robotic arm system;
the robotic arm system is used for grabbing the object the user needs to grab according to the object grabbing path.
2. The object grabbing platform based on machine vision of claim 1, wherein the vision system comprises a depth camera and a stand, the depth camera comprising a centrally located RGB camera and infrared cameras evenly distributed around the RGB camera.
3. The object grabbing platform based on machine vision of claim 1, wherein the robotic arm system comprises an arm base, a robotic arm and an end gripper;
the robotic arm is arranged on the arm base; the end gripper is arranged at the end of the robotic arm.
4. The object grabbing platform based on machine vision of claim 1, wherein the target detection module comprises an object classification submodule and a coordinate determination submodule;
the object classification submodule is used for classifying the RGB information of each object by adopting a convolutional-neural-network-based target detection algorithm and determining the name of each object;
the coordinate determination submodule is used for determining the three-dimensional coordinates of each object in the camera coordinate system according to the depth information of each object.
5. The object grabbing platform based on machine vision of claim 1, wherein the coordinate conversion and storage module comprises:
a coordinate conversion submodule for converting the three-dimensional coordinates of each object in the camera coordinate system into three-dimensional coordinates in the robotic arm coordinate system according to the calibrated intrinsic and extrinsic parameters of the depth camera.
6. The object grabbing platform based on machine vision of claim 1, wherein the path planning module comprises:
a path planning submodule for planning a path by adopting an improved RRT algorithm according to the three-dimensional coordinates, in the robotic arm coordinate system, of the object the user needs to grab, so as to obtain the object grabbing path.
7. The object grabbing platform based on machine vision of claim 1, wherein
the target detection module, the coordinate conversion and storage module and the semantic recognition module are all integrated in a ROS system.
8. An object grabbing method based on machine vision, applied to the grabbing platform of any one of claims 1-7, the method comprising the following steps:
acquiring RGB information of objects within the grabbing range of the robotic arm system and depth information between the depth camera and the objects;
classifying the RGB information of each object, and determining the name of each object;
determining the three-dimensional coordinates of each object in the camera coordinate system according to the depth information of each object;
converting the three-dimensional coordinates of each object in the camera coordinate system into three-dimensional coordinates in the robotic arm coordinate system;
correspondingly storing the name of each object and its three-dimensional coordinates in the robotic arm coordinate system;
acquiring the grabbing-demand voice information uttered by a user;
recognizing the grabbing-demand voice information to obtain the name of the object the user needs to grab;
acquiring the three-dimensional coordinates, in the robotic arm coordinate system, of the object the user needs to grab according to the name of that object and the stored correspondence between object names and three-dimensional coordinates in the robotic arm coordinate system;
performing path planning according to the three-dimensional coordinates, in the robotic arm coordinate system, of the object the user needs to grab, to obtain an object grabbing path;
and grabbing the object the user needs to grab according to the object grabbing path.
9. The object grabbing method based on machine vision of claim 8, wherein the classifying the RGB information of each object and determining the name of each object specifically comprises:
classifying the RGB information of each object by adopting a convolutional-neural-network-based target detection algorithm, and determining the name of each object.
10. The object grabbing method based on machine vision of claim 8, wherein the converting the three-dimensional coordinates of each object in the camera coordinate system into three-dimensional coordinates in the robotic arm coordinate system specifically comprises:
calibrating the intrinsic and extrinsic parameters of the depth camera by adopting the Zhang Zhengyou calibration method to obtain the calibrated intrinsic and extrinsic parameters;
and converting the three-dimensional coordinates of each object in the camera coordinate system into three-dimensional coordinates in the robotic arm coordinate system according to the calibrated intrinsic and extrinsic parameters of the depth camera.
CN202010722155.1A 2020-07-24 2020-07-24 Object grabbing platform and method based on machine vision Pending CN111823277A (en)

Priority Applications (1)

Application Number: CN202010722155.1A; Priority Date: 2020-07-24; Filing Date: 2020-07-24; Title: Object grabbing platform and method based on machine vision (published as CN111823277A (en))

Applications Claiming Priority (1)

Application Number: CN202010722155.1A; Priority Date: 2020-07-24; Filing Date: 2020-07-24; Title: Object grabbing platform and method based on machine vision (published as CN111823277A (en))

Publications (1)

Publication Number: CN111823277A; Publication Date: 2020-10-27

Family

ID=72925383

Family Applications (1)

Application Number: CN202010722155.1A (CN111823277A (en), pending); Title: Object grabbing platform and method based on machine vision; Priority Date: 2020-07-24; Filing Date: 2020-07-24

Country Status (1)

Country Link
CN (1) CN111823277A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114473999A * 2022-01-14 2022-05-13 Zhejiang University of Technology Intelligent service robot system capable of automatically changing infusion bottles

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109262623A * 2018-09-30 2019-01-25 Ningbo University A traction-navigation autonomous mobile robot
CN109483554A * 2019-01-22 2019-03-19 Tsinghua University Robotic dynamic grasping method and system based on global and local visual semantics
CN109910018A * 2019-04-26 2019-06-21 Tsinghua University Robot virtual-real interactive operation execution system and method with visual semantic awareness
WO2020056375A1 * 2018-09-13 2020-03-19 The Charles Stark Draper Laboratory, Inc. Voice modification to robot motion plans
CN111275063A * 2018-12-04 2020-06-12 Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences Robot intelligent grabbing control method and system based on 3D vision

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020056375A1 * 2018-09-13 2020-03-19 The Charles Stark Draper Laboratory, Inc. Voice modification to robot motion plans
CN109262623A * 2018-09-30 2019-01-25 Ningbo University A traction-navigation autonomous mobile robot
CN111275063A * 2018-12-04 2020-06-12 Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences Robot intelligent grabbing control method and system based on 3D vision
CN109483554A * 2019-01-22 2019-03-19 Tsinghua University Robotic dynamic grasping method and system based on global and local visual semantics
CN109910018A * 2019-04-26 2019-06-21 Tsinghua University Robot virtual-real interactive operation execution system and method with visual semantic awareness

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI DEYI et al.: "Introduction to Artificial Intelligence", 30 September 2018, China Science and Technology Press *
DU XUEDAN: "Design and Implementation of a Vision-Based Intelligent Robot Grasping System", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114473999A * 2022-01-14 2022-05-13 Zhejiang University of Technology Intelligent service robot system capable of automatically changing infusion bottles
CN114473999B * 2022-01-14 2023-09-29 Zhejiang University of Technology Intelligent service robot system capable of automatically replacing infusion bottles


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20201027)