RPN-based architecture for object detection and pose estimation using RGB-D data