
Foundation Models

The Cognitive Brain.

Mount the world's most powerful Vision-Language-Action (VLA) models. Plug-and-play intelligence optimized for robotic inference.

π0

Physical Intelligence Zero

A general-purpose robotic foundation model trained on diverse physical tasks.

Min VRAM: 24GB
Latency: ~120ms on RTX 4090
Observations: RGB, Depth, Proprioception
Diffusion Policy · Transformer
$ rosclaw mount model π0
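The observation spec above (RGB, Depth, Proprioception) maps to a structured input at inference time. A minimal sketch of bundling those three streams, assuming illustrative key names, image resolution, and a 7-DoF arm (none of these are π0's actual interface):

```python
import numpy as np

# Hypothetical observation packet for a π0-style VLA model; the key
# names and shapes here are assumptions for illustration only.
def build_observation(rgb, depth, joint_positions):
    """Bundle the three observation streams the π0 card lists."""
    assert rgb.shape == (224, 224, 3), "RGB image, HWC"
    assert depth.shape == (224, 224), "single-channel depth map"
    assert joint_positions.ndim == 1, "proprioceptive joint vector"
    return {
        "rgb": rgb.astype(np.uint8),
        "depth": depth.astype(np.float32),
        "proprio": joint_positions.astype(np.float32),
    }

obs = build_observation(
    np.zeros((224, 224, 3), dtype=np.uint8),
    np.zeros((224, 224), dtype=np.float32),
    np.zeros(7, dtype=np.float32),  # e.g. a 7-DoF arm
)
```

Keeping the streams in a single dict makes the per-model observation contract explicit before anything reaches the policy.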

RT-2

Robotic Transformer 2

Google DeepMind's vision-language-action model for general robotic control.

Min VRAM: 16GB
Latency: ~85ms on RTX 4090
Observations: RGB, Language Instructions
PaLM-E · End-to-End
$ rosclaw mount model rt-2
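RT-2 can accept language instructions because it represents continuous robot actions as discrete text tokens, letting one transformer emit both words and motor commands. A sketch of the uniform 256-bin discretization this relies on (the bin count matches the RT-2 paper; the action range here is an assumption):

```python
import numpy as np

# RT-2-style action tokenization: each continuous action dimension is
# clipped and mapped to one of 256 uniform bins. The [-1, 1] range is
# an illustrative assumption, not RT-2's actual normalization.
N_BINS = 256

def action_to_tokens(action, low=-1.0, high=1.0):
    """Map each action dimension to an integer token in [0, N_BINS)."""
    clipped = np.clip(action, low, high)
    tokens = np.floor((clipped - low) / (high - low) * (N_BINS - 1) + 0.5)
    return tokens.astype(np.int64)

def tokens_to_action(tokens, low=-1.0, high=1.0):
    """Invert the discretization back to continuous values."""
    return low + tokens.astype(np.float64) / (N_BINS - 1) * (high - low)

a = np.array([0.0, 1.0, -1.0])
t = action_to_tokens(a)  # round-trip error is bounded by half a bin width
```

The round trip loses at most half a bin width per dimension, which is why a few hundred bins suffice for smooth control.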

ACT

Action Chunking with Transformers

Efficient imitation learning with action chunking for smooth robot motions.

Min VRAM: 8GB
Latency: ~45ms on RTX 4090
Observations: RGB, Joint States
Behavior Cloning · Efficient
$ rosclaw mount model act
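ACT's smooth motions come from predicting a chunk of future actions per inference call and then temporally ensembling the overlapping chunks: every prediction ever made for the current timestep is blended with exponential weights that favor older predictions. A sketch of that blending step (the weight decay constant is an illustrative default):

```python
import numpy as np

# ACT-style temporal ensembling: predictions for the same timestep t,
# made by successive overlapping chunks, are averaged with weights
# w_i ∝ exp(-m * i), i = 0 being the oldest prediction. m = 0.01 is
# an assumed default, not a mandated value.
def temporal_ensemble(chunks_for_t, m=0.01):
    """Blend all action predictions made for the same timestep.

    chunks_for_t: list of action vectors, oldest prediction first.
    """
    preds = np.stack(chunks_for_t)
    weights = np.exp(-m * np.arange(len(preds)))
    weights /= weights.sum()
    return (weights[:, None] * preds).sum(axis=0)

# Three successive inference calls each predicted an action for this step:
blended = temporal_ensemble([np.array([0.2]), np.array([0.4]), np.array([0.6])])
```

Because each executed action averages several predictions, single-step jitter from the policy is smoothed out without adding inference latency.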

Diffusion Policy

Diffusion Policy for Visuomotor Learning

State-of-the-art visuomotor policy that denoises actions with a diffusion model, capturing multi-modal action distributions.

Min VRAM: 12GB
Latency: ~95ms on RTX 4090
Observations: RGB, Depth, Force
Diffusion · Multi-modal
$ rosclaw mount model diffusion-policy
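"Multi-modal action distributions" means the policy can represent several equally valid actions (e.g. pass left or right of an obstacle) instead of averaging them into an invalid one. Diffusion Policy achieves this by sampling: it starts from Gaussian noise and iteratively denoises it into an action. A toy DDPM reverse loop showing the mechanism, with a stand-in noise predictor and an illustrative schedule (a real policy conditions the network on observations):

```python
import numpy as np

# Toy diffusion-policy sampler: iteratively denoise Gaussian noise into
# an action. eps_model, the step count, and the beta schedule are all
# illustrative stand-ins for the learned network and its real schedule.
rng = np.random.default_rng(0)
T = 10
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(a_t, t):
    # Stand-in for the learned, observation-conditioned noise predictor.
    return 0.1 * a_t

def sample_action(dim=2):
    a = rng.standard_normal(dim)  # start from pure noise
    for t in reversed(range(T)):  # standard DDPM reverse process
        eps = eps_model(a, t)
        a = (a - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            a += np.sqrt(betas[t]) * rng.standard_normal(dim)
    return a

action = sample_action()
```

Different noise seeds can land in different modes of the action distribution, which is exactly the multi-modality the card advertises; the per-step network passes are also why this card's latency is higher than ACT's.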

More models coming soon. Submit yours via GitHub.