
Foundation Models

The Cognitive Brain.

Mount the world's most powerful Vision-Language-Action (VLA) models. Plug-and-play intelligence optimized for robotic inference.

π0

Physical Intelligence Zero

A general-purpose robotic foundation model trained on diverse physical tasks.

Min VRAM: 24GB
Latency: ~120ms on RTX 4090
Observations: RGB, Depth, Proprioception
Diffusion Policy · Transformer
$ rosclaw mount model π0
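The observation spec above (RGB, Depth, Proprioception) maps to a structured input at inference time. A minimal sketch of bundling those three streams, assuming illustrative key names, image resolution, and a 7-DoF arm (none of these are π0's actual interface):

```python
import numpy as np

# Hypothetical observation packet for a π0-style VLA model; the key
# names and shapes here are assumptions for illustration only.
def build_observation(rgb, depth, joint_positions):
    """Bundle the three observation streams the π0 card lists."""
    assert rgb.shape == (224, 224, 3), "RGB image, HWC"
    assert depth.shape == (224, 224), "single-channel depth map"
    assert joint_positions.ndim == 1, "proprioceptive joint vector"
    return {
        "rgb": rgb.astype(np.uint8),
        "depth": depth.astype(np.float32),
        "proprio": joint_positions.astype(np.float32),
    }

obs = build_observation(
    np.zeros((224, 224, 3), dtype=np.uint8),
    np.zeros((224, 224), dtype=np.float32),
    np.zeros(7, dtype=np.float32),  # e.g. a 7-DoF arm
)
```

Keeping the streams in a single dict makes the per-model observation contract explicit before anything reaches the policy.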

RT-2

Robotic Transformer 2

Google DeepMind's vision-language-action model for general robotic control.

Min VRAM: 16GB
Latency: ~85ms on RTX 4090
Observations: RGB, Language Instructions
PaLM-E · End-to-End
$ rosclaw mount model rt-2
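RT-2 can accept language instructions because it represents continuous robot actions as discrete text tokens, letting one transformer emit both words and motor commands. A sketch of the uniform 256-bin discretization this relies on (the bin count matches the RT-2 paper; the action range here is an assumption):

```python
import numpy as np

# RT-2-style action tokenization: each continuous action dimension is
# clipped and mapped to one of 256 uniform bins. The [-1, 1] range is
# an illustrative assumption, not RT-2's actual normalization.
N_BINS = 256

def action_to_tokens(action, low=-1.0, high=1.0):
    """Map each action dimension to an integer token in [0, N_BINS)."""
    clipped = np.clip(action, low, high)
    tokens = np.floor((clipped - low) / (high - low) * (N_BINS - 1) + 0.5)
    return tokens.astype(np.int64)

def tokens_to_action(tokens, low=-1.0, high=1.0):
    """Invert the discretization back to continuous values."""
    return low + tokens.astype(np.float64) / (N_BINS - 1) * (high - low)

a = np.array([0.0, 1.0, -1.0])
t = action_to_tokens(a)  # round-trip error is bounded by half a bin width
```

The round trip loses at most half a bin width per dimension, which is why a few hundred bins suffice for smooth control.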

ACT

Action Chunking with Transformers

Efficient imitation learning with action chunking for smooth robot motions.

Min VRAM: 8GB
Latency: ~45ms on RTX 4090
Observations: RGB, Joint States
Behavior Cloning · Efficient
$ rosclaw mount model act
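ACT's smooth motions come from predicting a chunk of future actions per inference call and then temporally ensembling the overlapping chunks: every prediction ever made for the current timestep is blended with exponential weights that favor older predictions. A sketch of that blending step (the weight decay constant is an illustrative default):

```python
import numpy as np

# ACT-style temporal ensembling: predictions for the same timestep t,
# made by successive overlapping chunks, are averaged with weights
# w_i ∝ exp(-m * i), i = 0 being the oldest prediction. m = 0.01 is
# an assumed default, not a mandated value.
def temporal_ensemble(chunks_for_t, m=0.01):
    """Blend all action predictions made for the same timestep.

    chunks_for_t: list of action vectors, oldest prediction first.
    """
    preds = np.stack(chunks_for_t)
    weights = np.exp(-m * np.arange(len(preds)))
    weights /= weights.sum()
    return (weights[:, None] * preds).sum(axis=0)

# Three successive inference calls each predicted an action for this step:
blended = temporal_ensemble([np.array([0.2]), np.array([0.4]), np.array([0.6])])
```

Because each executed action averages several predictions, single-step jitter from the policy is smoothed out without adding inference latency.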

Diffusion Policy

Diffusion Policy for Visuomotor Learning

State-of-the-art visuomotor policy that denoises actions with a diffusion model, capturing multi-modal action distributions.

Min VRAM: 12GB
Latency: ~95ms on RTX 4090
Observations: RGB, Depth, Force
Diffusion · Multi-modal
$ rosclaw mount model diffusion-policy
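"Multi-modal action distributions" means the policy can represent several equally valid actions (e.g. pass left or right of an obstacle) instead of averaging them into an invalid one. Diffusion Policy achieves this by sampling: it starts from Gaussian noise and iteratively denoises it into an action. A toy DDPM reverse loop showing the mechanism, with a stand-in noise predictor and an illustrative schedule (a real policy conditions the network on observations):

```python
import numpy as np

# Toy diffusion-policy sampler: iteratively denoise Gaussian noise into
# an action. eps_model, the step count, and the beta schedule are all
# illustrative stand-ins for the learned network and its real schedule.
rng = np.random.default_rng(0)
T = 10
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(a_t, t):
    # Stand-in for the learned, observation-conditioned noise predictor.
    return 0.1 * a_t

def sample_action(dim=2):
    a = rng.standard_normal(dim)  # start from pure noise
    for t in reversed(range(T)):  # standard DDPM reverse process
        eps = eps_model(a, t)
        a = (a - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            a += np.sqrt(betas[t]) * rng.standard_normal(dim)
    return a

action = sample_action()
```

Different noise seeds can land in different modes of the action distribution, which is exactly the multi-modality the card advertises; the per-step network passes are also why this card's latency is higher than ACT's.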

More models coming soon. Submit yours via GitHub.