# vector_perception_ros

ROS 2 perception stack for generalist robotics.

## Packages

- `track_anything`: EdgeTAM tracking + 3D segmentation with RGBD
- `vector_perception_utils`: image and point cloud utilities
## Installation

```bash
# Create a virtual environment
python3 -m venv ~/vector_venv
source ~/vector_venv/bin/activate

# Install dependencies
cd /home/alex-lin/dev/vector_perception_ros
pip install -r requirements.txt

# Build
source /opt/ros/jazzy/setup.bash
colcon build
```
## Usage
```bash
# Every terminal session, activate environment:
source ~/vector_venv/bin/activate
source /opt/ros/jazzy/setup.bash
source /home/alex-lin/dev/vector_perception_ros/install/setup.bash
# Test EdgeTAM with webcam
python -m track_anything.test_edge_tam
# Run 3D tracking
ros2 launch track_anything track_3d.launch.py
```

## track_3d

Tracks objects in 3D using EdgeTAM + RGBD cameras.
Launch:

```bash
# RealSense D435i (default)
ros2 launch track_anything track_3d.launch.py

# ZED Mini/2i
ros2 launch track_anything track_3d.launch.py \
    depth_scale:=1.0 \
    color_topic:=/zed/zed_node/rgb/color/rect/image \
    depth_topic:=/zed/zed_node/depth/depth_registered \
    camera_info_topic:=/zed/zed_node/rgb/color/rect/camera_info
```

Publishes:

- `/track_3d/detections_2d`: 2D detections with masks
- `/track_3d/detections_3d`: 3D detections with bounding boxes
- `/track_3d/objects_pointcloud`: RGB point cloud of tracked objects
- `/track_3d/tracked_overlay`: visualization image with masks and bounding boxes
- `/track_3d/is_tracking`: `Bool`, true when actively tracking objects
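To illustrate the geometry behind the 3D detections topic, here is a minimal numpy sketch of fitting an axis-aligned bounding box to a masked point cloud. The function name is illustrative only, not the package API:

```python
import numpy as np

def aabb_from_points(points: np.ndarray):
    """Axis-aligned 3D bounding box (center, size) from an (N, 3) point array."""
    lo = points.min(axis=0)  # per-axis minima
    hi = points.max(axis=0)  # per-axis maxima
    return (lo + hi) / 2.0, hi - lo

# Points roughly spanning a 0.2 m cube centered near (0, 0, 1)
pts = np.array([[-0.1, -0.1, 0.9],
                [ 0.1,  0.1, 1.1],
                [ 0.0,  0.0, 1.0]])
center, size = aabb_from_points(pts)
```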
## Python API

```python
from track_anything.edge_tam import EdgeTAMProcessor
from vector_perception_utils.image_utils import draw_bbox, apply_mask_overlay
from vector_perception_utils.pointcloud_utils import rgbd_to_pointcloud, pointcloud_to_bbox3d

# Track a single object
processor = EdgeTAMProcessor()
detections = processor.init_track(image, bboxes=[(100, 100, 300, 300)])

# Or track multiple objects
detections = processor.init_track(
    image,
    bboxes=[(100, 100, 300, 300), (400, 100, 600, 300)]
)

# Propagate tracks to the next frame
detections = processor.process_image(next_frame)

# Convert to 3D
for det in detections:
    points, colors = rgbd_to_pointcloud(
        depth_image, rgb_image, intrinsics,
        depth_scale=1000.0,  # RealSense
        mask=det['mask']
    )
    bbox_3d = pointcloud_to_bbox3d(points)
```

## Camera settings

| Camera | depth_scale | Encoding | Min Depth |
|---|---|---|---|
| RealSense D435i | 1000.0 | uint16 (mm) | 0.3m |
| ZED Mini | 1.0 | float32 (m) | 0.2m |
| ZED 2i | 1.0 | float32 (m) | 0.2m |
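The `depth_scale` column converts raw depth values to meters (`depth_m = raw / depth_scale`). A minimal numpy sketch of that conversion, which also invalidates zero and below-minimum readings per the Min Depth column; the helper name is illustrative, not part of the package:

```python
import numpy as np

def depth_to_meters(depth_raw: np.ndarray, depth_scale: float, min_depth_m: float) -> np.ndarray:
    """Convert a raw depth image to meters; mark too-close/zero readings as NaN."""
    depth_m = depth_raw.astype(np.float32) / depth_scale
    depth_m[depth_m < min_depth_m] = np.nan  # zeros and below-minimum readings are invalid
    return depth_m

# RealSense D435i: uint16 millimeters, depth_scale=1000.0, 0.3 m minimum depth
raw = np.array([[0, 150, 500, 1200]], dtype=np.uint16)
depth_m = depth_to_meters(raw, 1000.0, 0.3)  # 0 and 150 mm invalid; 0.5 m and 1.2 m kept
```

For the ZED cameras the depth image is already float32 meters, so `depth_scale=1.0` leaves values unchanged and only the minimum-depth masking applies.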
## Troubleshooting

- `ModuleNotFoundError`: activate the venv: `source ~/vector_venv/bin/activate`
- No camera info: check that the camera is running: `ros2 topic list | grep camera_info`
- Poor performance: EdgeTAM needs a GPU. Check with `nvidia-smi`
See the package READMEs for details.