Which approach are you using for object selection and detection within the PICK-PLACE project?
Because of the large number of references the system must handle, we are using a deep-learning-based approach for object identification, segmentation and grasping point selection.
Generating a deep learning model involves several steps. First, a dataset with images of different objects needs to be created: we generate scenes with varying numbers of objects in different positions and poses. Then, all the pictures need to be labeled.
We are using an open source image labeling tool to tag all the scenes. Two different labeling approaches are applied: one focused on grasping point area identification, the other on object identification and mask-level segmentation. We are also working with data augmentation techniques to increase the number of pictures in the dataset. For that purpose, we synthetically change some features of the images: rotating them, adding blur, or cropping them.
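The augmentation transforms mentioned above (rotation, blur, crop) can be sketched with a few lines of NumPy. The interview does not name the actual augmentation tool, so these helpers are illustrative stand-ins, not the project's implementation:

```python
import numpy as np

# Illustrative augmentation helpers (NOT the project's actual tooling):
# rotation, a naive blur, and a random crop, as described in the text.

def rotate90(img: np.ndarray) -> np.ndarray:
    """Rotate an HxWxC image by 90 degrees."""
    return np.rot90(img)

def box_blur(img: np.ndarray, k: int = 3) -> np.ndarray:
    """Naive k x k mean filter as a stand-in for a blur effect."""
    h, w = img.shape[:2]
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return (out / (k * k)).astype(img.dtype)

def random_crop(img: np.ndarray, ch: int, cw: int,
                rng: np.random.Generator) -> np.ndarray:
    """Crop a ch x cw window at a random position."""
    h, w = img.shape[:2]
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    return img[y:y + ch, x:x + cw]

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
augmented = [rotate90(image), box_blur(image), random_crop(image, 48, 48, rng)]
```

Each transform yields a new labeled sample at essentially no labeling cost, which is the point of augmentation.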
Our deep learning approach is based on transfer learning: we reuse the weights of a previously trained model as the starting point of our own training process. When training finishes, we evaluate the model and train it again with different hyperparameter settings. We then compare the accuracy of the different training runs and, once the best model is selected, it is ready for inference.
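The train-evaluate-retrain loop described above can be sketched as a simple model-selection sweep. Here `train_and_evaluate` is a hypothetical placeholder for the real fine-tuning code (which would start from pretrained weights); only the selection logic is meant to be representative:

```python
# Sketch of the model-selection loop: train with several hyperparameter
# settings, compare validation accuracy, and keep the best configuration.
# `train_and_evaluate` is a hypothetical stand-in for real fine-tuning.

def train_and_evaluate(learning_rate: float, epochs: int) -> float:
    """Placeholder: would fine-tune pretrained weights and return
    the measured validation accuracy. Here it returns a fake score
    just so the sketch runs end to end."""
    return 1.0 - abs(learning_rate - 1e-3) * 100 - 0.01 / epochs

hyperparams = [
    {"learning_rate": 1e-2, "epochs": 10},
    {"learning_rate": 1e-3, "epochs": 20},
    {"learning_rate": 1e-4, "epochs": 20},
]

# One training run per setting; keep the configuration with the best accuracy.
results = [(hp, train_and_evaluate(**hp)) for hp in hyperparams]
best_hp, best_acc = max(results, key=lambda r: r[1])
```

In a real pipeline the scores would come from the held-out tuning split, and the winning model would then be frozen for inference.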
What’s the current status and how does your research accomplish progress beyond the state of the art?
We have developed the first version of the object identification, segmentation and grasping point selection deep learning model. On the one hand, the grasping point selection algorithm is generic: it can choose the correct grasping point of any object, for either a gripper or suction, and it works with both known and unknown objects by relying on color and shape information. On the other hand, the identification and segmentation algorithms only work with objects that are in our dataset.
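As a minimal illustration of suction grasping (and NOT the project's actual algorithm, which also exploits color and shape cues), one common baseline is to place the suction point at the centroid of the object's segmentation mask:

```python
import numpy as np

# Illustration only: a common baseline places the suction grasp point at the
# centroid of a binary segmentation mask. The project's real algorithm is
# more sophisticated and also uses color and shape information.

def suction_point(mask: np.ndarray) -> tuple:
    """Return the (row, col) centroid of a binary object mask."""
    ys, xs = np.nonzero(mask)
    return int(ys.mean()), int(xs.mean())

mask = np.zeros((10, 10), dtype=bool)
mask[2:6, 3:9] = True          # a 4x6 rectangular "object"
point = suction_point(mask)    # centroid of the rectangle
```

A centroid works for flat, convex objects; irregular shapes are exactly where a learned grasping point model earns its keep.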
The second version of the model will be able to segment any unknown object, but identification will still only be possible for previously trained objects.
What results have you achieved (speed, efficiency, etc.) and how can they be useful in real-world applications?
We used 70% of the dataset to train the model, 15% to tune the hyperparameter values and the remaining 15% to validate the deep learning model. We have achieved an accuracy of 87% with suction and 72% with the gripper.
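The 70/15/15 split described above can be sketched in a few lines; the important detail is that the final 15% is held out and never touched during hyperparameter tuning:

```python
import random

# Sketch of a 70 / 15 / 15 train / tuning / validation split, as described
# in the text. The shuffle makes the split independent of file ordering.

def split_dataset(items, seed=0):
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(0.70 * n)
    n_val = int(0.15 * n)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]   # used for hyperparameter tuning
    test = items[n_train + n_val:]         # held out for final validation
    return train, val, test

train, val, test = split_dataset(range(1000))
```

Reporting accuracy only on the held-out split is what makes the 87% / 72% figures meaningful estimates of real-world performance.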
Concerning inference speed, each call to a model takes 10-50 ms, depending on which model is being used. Since the implemented approach requires multiple inferences, for example to determine the gripper's vertical angle, the whole process takes at most 1 s.
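A quick back-of-the-envelope check of those figures: chaining inferences at 10-50 ms each bounds the pipeline latency. The count of 20 inferences below is an illustrative assumption, not a figure from the project:

```python
# Latency envelope for a pipeline that chains several model inferences.
# Per-inference times (10-50 ms) are from the text; the inference count
# of 20 is an assumed, illustrative value.

per_inference_ms = (10, 50)   # min / max single-inference latency
n_inferences = 20             # assumed number of chained inferences

best_case_s = n_inferences * per_inference_ms[0] / 1000.0
worst_case_s = n_inferences * per_inference_ms[1] / 1000.0
print(f"pipeline latency: {best_case_s:.1f}-{worst_case_s:.1f} s")
```

Under these assumptions the worst case lands at exactly 1 s, consistent with the figure quoted above.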
What’s the hardware and software set-up you’re using? Can the system be used with a different set-up?
Our testing setup is composed of two Intel RealSense D435 cameras (a commercial RGB-D camera that is easy to integrate with ROS) and a Universal Robots UR10, but our solution can work with any robot that supports ROS. The object identification and segmentation algorithms can work in different scenarios and setups.
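Because everything speaks ROS, such a setup is typically wired together in a launch file. The fragment below is a hedged sketch only: the package names (`realsense2_camera`, `ur_robot_driver`), the serial-number placeholders and the IP address are assumptions, not the project's actual configuration:

```
<!-- Hedged sketch of a possible ROS launch file for this setup; package
     names, serial numbers and IP address are illustrative assumptions. -->
<launch>
  <!-- Two RealSense D435 cameras, distinguished by serial number -->
  <include file="$(find realsense2_camera)/launch/rs_camera.launch">
    <arg name="camera" value="cam_1"/>
    <arg name="serial_no" value="CAM1_SERIAL"/>
  </include>
  <include file="$(find realsense2_camera)/launch/rs_camera.launch">
    <arg name="camera" value="cam_2"/>
    <arg name="serial_no" value="CAM2_SERIAL"/>
  </include>

  <!-- UR10 driver; any ROS-supported robot could replace this include -->
  <include file="$(find ur_robot_driver)/launch/ur10_bringup.launch">
    <arg name="robot_ip" value="192.168.1.10"/>
  </include>
</launch>
```

Swapping the robot means swapping one include, which is what makes the solution portable across ROS-supported arms.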