Robot SDKs, ROS 2, and Simulation Environments

A 2026 robot platform — humanoid or quadruped — ships with three software stories you have to deal with before any of your own code runs. The manufacturer SDK that talks to the joints. The ROS 2 wrapper that talks to everything else. And the simulator you use because flying a $43k Unitree G1 EDU into a wall is an expensive way to debug a typo.

These three layers are not interchangeable. The SDK is real-time and proprietary. ROS 2 is the lingua franca that lets the rest of the world's robotics code work with your platform. The simulator is where you do the work you do not want to do on the real robot — reinforcement learning, large-scale data generation, scene design, headless CI. Most teams underspend on all three.

This article walks through what each layer actually provides in 2026, where the boundaries are, and which combinations are worth standing up on a real robotics-lab budget. The reference platforms are the ones in R01 and R02. The reference compute is the K-AI line (4× or 8× RTX 5090 or RTX Pro 6000 Blackwell, EPYC or Xeon host).

The SDK layer — what manufacturers actually give you

Every credible robot vendor ships a Software Development Kit. The name is the same; the contents are not. Here is what is actually in the box.

| Vendor SDK | Language bindings | Joint control | Sensor streaming | Locomotion API | ROS 2 wrapper |
| --- | --- | --- | --- | --- | --- |
| Unitree SDK2 (G1/H1/B2/Go2) | C++ / Python (DDS-based) | Yes, real-time | Yes (IMU, joint, camera) | Walk, stand, sit, gait params | Official unitree_ros2 (Humble + Jazzy) |
| Booster Robotics SDK | C++ / Python | Yes | Yes | Walk, balance, scripted moves | Open-source, ROS 2 native |
| EngineAI SDK (PM01 / SE01) | C++ / Python | Yes | Yes | Walk, balance | Community + vendor-supplied URDF |
| Boston Dynamics Spot SDK | Python (gRPC), some C++ | High-level only (no joint torque) | Yes (proto streams) | Walk, stand, sit, navigate, Spot Arm | spot_ros2 (community + BD-blessed) |
| ANYbotics ANYmal API | C++ / Python (gRPC) | High-level (locomotion primitives) | Yes | Walk, climb, dock | Internal; Gazebo + URDF for research tier |
| DeepRobotics SDK (X30/Lite3) | C++ / Python | Yes | Yes | Walk, climb | Community + vendor URDF |

Two splits matter more than the rest.

Open vs gated. Unitree, Booster, EngineAI, and DeepRobotics give you joint-level access. You can issue torque commands at 500 Hz to 1 kHz directly to the actuators. That is what you need for RL policy deployment, custom locomotion research, and anything dynamic. Boston Dynamics and ANYbotics keep joint control closed — you get high-level primitives (walk to here, climb that, manipulate this), the SDK exposes those as RPC calls, and the company's own controller does the joint work. That is a deliberate choice. The closed stack is more reliable and the autonomy software is what you are paying $74k–$200k for. It is also the wrong stack if your research is in low-level control.

Real-time vs not. Unitree SDK2 uses Cyclone DDS over a dedicated network interface and gives you sub-millisecond round-trip on the joint loop. Booster's stack is similar. The Spot SDK is gRPC over HTTP/2 — fine for commanding a behavior, useless for closing a control loop at 1 kHz. If your work needs a sub-2-ms loop, the architecture matters more than the language.
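The arithmetic behind that rule of thumb is short. The loop period is 1/rate, and everything inside the loop (transport round trip plus controller compute) has to fit in that period with some headroom for jitter. Below is a minimal sketch. It assumes you have measured the round-trip and compute times yourself, and the 80% margin is an illustrative choice. The numbers in the usage lines are placeholders, not vendor benchmarks.

```python
def loop_period_ms(rate_hz: float) -> float:
    """Period of a fixed-rate control loop in milliseconds."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return 1000.0 / rate_hz


def fits_budget(rate_hz: float, round_trip_ms: float, compute_ms: float,
                margin: float = 0.8) -> bool:
    """True if transport round trip plus compute fits in `margin` of the period.

    The default margin (80% of the period) is an assumption that leaves
    headroom for scheduling jitter; pick your own.
    """
    return round_trip_ms + compute_ms <= margin * loop_period_ms(rate_hz)


if __name__ == "__main__":
    # Placeholder timings; measure your own transport and controller.
    print(loop_period_ms(1000))                              # 1.0 (ms at 1 kHz)
    print(fits_budget(1000, round_trip_ms=0.3, compute_ms=0.2))  # True
    print(fits_budget(1000, round_trip_ms=5.0, compute_ms=0.2))  # False
    print(fits_budget(10, round_trip_ms=5.0, compute_ms=0.2))    # True
```

The last two lines carry the point. A transport that takes a few milliseconds is irrelevant inside a 100 ms behavior-command budget. At 1 kHz, the same transport consumes the whole period before the controller has done any work.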

A note on the Python question. Every vendor offers Python bindings. None of them are the language the control loop runs in. The pattern across the industry is identical: C++ for the real-time controller, Python for everything above it. Vendor SDKs reflect that: the C++ bindings expose more, run faster, and are what the vendor's own examples are written in. Python bindings are first-class for high-level work and a wrapper for the low-level path. Pick C++ when you are writing the inner control loop and Python for orchestration, perception glue, and integration with the inference server.

ROS 2 — why everyone normalizes to it

The native SDK gets you onto the robot. ROS 2 gets you into the ecosystem.

There are three reasons every serious integration ends up speaking ROS 2 even when the manufacturer SDK is sufficient:

  1. Interoperability. Your perception stack, your SLAM, your navigation, your manipulation planner, your teleop — all of these have ROS 2 implementations maintained by hundreds of contributors. None of them speak the Unitree DDS topic format natively. Wrap the SDK once into ROS 2 messages and the entire ecosystem opens up.
  2. Multi-vendor fleets. The day you add a second robot from a different vendor, the native-SDK approach collapses. ROS 2 is the only abstraction that lets a Spot, an ANYmal, and a G1 share the same map server, the same task graph, and the same human-machine interface.
  3. Hiring. People know ROS 2. They do not know Unitree DDS topic naming conventions. A team that builds on ROS 2 can hire from a global pool. A team that builds on a proprietary stack cannot.
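Most of "wrap the SDK once" is translation: vendor-native state goes in, and standard message shapes come out. Below is a minimal sketch built on two assumptions, both labeled hypothetical. The first is a vendor low-state record, `VendorLowState`, with per-motor `q`, `dq`, `tau_est` lists. The second is a plain dataclass that mirrors the `name`/`position`/`velocity`/`effort` fields of ROS 2's `sensor_msgs/msg/JointState`. In a real wrapper, the input would be the vendor's DDS type and the output the rclpy message. Neither is imported here, so the sketch runs anywhere.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class VendorLowState:
    """Hypothetical vendor-native state: one entry per motor slot."""
    q: List[float]        # joint positions [rad]
    dq: List[float]       # joint velocities [rad/s]
    tau_est: List[float]  # estimated joint torques [Nm]


@dataclass
class JointStateLike:
    """Mirrors the data fields of sensor_msgs/msg/JointState (header omitted)."""
    name: List[str] = field(default_factory=list)
    position: List[float] = field(default_factory=list)
    velocity: List[float] = field(default_factory=list)
    effort: List[float] = field(default_factory=list)


def to_joint_state(raw: VendorLowState, joint_names: Sequence[str]) -> JointStateLike:
    """Translate vendor motor slots into named joints.

    joint_names[i] names motor slot i. This mapping is maintained once in the
    wrapper, so downstream nodes never depend on vendor slot order.
    """
    n = len(joint_names)
    if min(len(raw.q), len(raw.dq), len(raw.tau_est)) < n:
        raise ValueError("vendor state has fewer motors than named joints")
    return JointStateLike(
        name=list(joint_names),
        position=list(raw.q[:n]),
        velocity=list(raw.dq[:n]),
        effort=list(raw.tau_est[:n]),
    )


if __name__ == "__main__":
    raw = VendorLowState(q=[0.1, -0.2], dq=[0.0, 0.5], tau_est=[1.2, -0.4])
    msg = to_joint_state(raw, ["left_hip_pitch", "left_knee"])  # hypothetical names
    print(msg.name, msg.position)
```

Perception, navigation, and planning code then keys on joint names rather than vendor slot indices. That is what lets a second robot slot in behind the same interface.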

ROS 2 distro state, May 2026:

| Distro | Released | EOL | Status in 2026 |
| --- | --- | --- | --- |
| Humble Hawksbill | May 2022 | May 2027 | The dominant LTS in production. Most vendor wrappers target this. |
| Iron Irwini | May 2023 | Nov 2024 (EOL) | Skip. Non-LTS, already retired. |
| Jazzy Jalisco | May 2024 | May 2029 | Current LTS for new projects. Migration in progress across the ecosystem. |
| Kilted Kaiju | May 2025 | Nov 2026 | Non-LTS bridge. Used for early adopters. |
| Lyrical Luth | May 2026 | May 2031 | The new LTS, just released. Wait six months before betting production on it. |

The honest read. Production deployments today still run Humble. Humble is the LTS the vendors target — Unitree's unitree_ros2, Booster's stack, the community spot_ros2, and ANYbotics' grid_map all have stable Humble branches. Jazzy is where new builds should start: it has another three years of LTS life and the migration tooling is mature. Lyrical Luth is too new to commit to — give it until late 2026 before standing up a production system on it.

If you are starting a project in May 2026, go Jazzy. If you are extending one started in 2023, stay on Humble until the next robot purchase forces the migration. Skip Iron and Kilted entirely; non-LTS releases are for people who want to track the newest ROS 2 and Gazebo features closely and accept the maintenance tax.
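The "another three years" arithmetic is worth redoing whenever you read this. Below is a small sketch that uses only the release and EOL dates from the table above, at month precision (day of month set to 1). The helper names are illustrative and not part of any ROS tooling. The six-month wait for a new LTS follows the advice in the table.

```python
from datetime import date

# (release, EOL, is_lts) from the distro table; day of month set to 1.
DISTROS = {
    "humble":  (date(2022, 5, 1), date(2027, 5, 1), True),
    "iron":    (date(2023, 5, 1), date(2024, 11, 1), False),
    "jazzy":   (date(2024, 5, 1), date(2029, 5, 1), True),
    "kilted":  (date(2025, 5, 1), date(2026, 11, 1), False),
    "lyrical": (date(2026, 5, 1), date(2031, 5, 1), True),
}


def months_between(start: date, end: date) -> int:
    """Difference in calendar months, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def months_of_support_left(distro: str, today: date) -> int:
    """Months until EOL, floored at zero for retired distros."""
    _, eol, _ = DISTROS[distro]
    return max(0, months_between(today, eol))


def production_ready_lts(today: date, min_age_months: int = 6) -> list:
    """LTS distros still supported and at least `min_age_months` old."""
    return sorted(
        name for name, (released, eol, lts) in DISTROS.items()
        if lts and today < eol and months_between(released, today) >= min_age_months
    )


if __name__ == "__main__":
    today = date(2026, 5, 15)
    for name in DISTROS:
        print(name, months_of_support_left(name, today))
    print(production_ready_lts(today))
```

Run in mid-May 2026, this gives Jazzy 36 months of support to Humble's 12, and it leaves Lyrical out of the production-ready list until November.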

The control-loop split

This is the architectural call that catches most teams off guard.

  • Real-time control loop (500–1000 Hz): C++ on the robot's RT controller; joint torque commands; balance, gait, safety reflexes; IMU + encoder fusion; manufacturer SDK only.
  • Link between the layers: DDS, ~10–50 Hz state.
  • High-level orchestration (10–100 Hz): Python (or C++) in ROS 2; perception, SLAM, planning; command router; gRPC client to inference server; all vendor-neutral logic.

The control-loop split: real-time C++ SDK layer below, ROS 2 Python orchestration above, connected at 10–50 Hz via DDS state messages.

The control loop stays on the robot in C++. The high-level layer runs in ROS 2 nodes, either on the robot's application processor (Jetson Orin AGX) or, for heavier work, on the off-board inference server (see I01). The two layers talk at 10–50 Hz over DDS. They do not share a process.

This split is not optional. A Python ROS 2 node cannot reliably close a 1 kHz joint loop: garbage-collection pauses, interpreter overhead, and executor scheduling jitter routinely exceed a 1 ms budget. Trying to is the most common mistake in first-year humanoid work. The vendor SDK exists precisely so you do not have to.
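The decoupling can be modelled in a few lines. The slow layer writes a setpoint at its own rate. The fast loop always reads the latest value (a zero-order hold) and never waits on the slow side. Below is a minimal sketch in simulated time, with several assumptions: a 50 Hz planner, a 1 kHz PD loop on a single unit-inertia joint, and made-up gains. It illustrates the rate relationship only and is not real-time code. On hardware, the fast loop belongs in C++ inside the vendor SDK.

```python
from dataclasses import dataclass


@dataclass
class SetpointMailbox:
    """Latest-value buffer between the slow and fast layers (zero-order hold)."""
    target: float = 0.0
    writes: int = 0

    def write(self, value: float) -> None:
        self.target = value
        self.writes += 1


def simulate(duration_s: float = 1.0, fast_hz: int = 1000, slow_hz: int = 50,
             kp: float = 100.0, kd: float = 20.0):
    """Run both loops on one simulated clock.

    Returns (final joint angle, fast-loop ticks, planner writes). Gains and the
    unit-inertia joint model are illustrative assumptions.
    """
    if fast_hz % slow_hz != 0:
        raise ValueError("fast_hz must be a multiple of slow_hz")
    dt = 1.0 / fast_hz
    ticks_per_slow = fast_hz // slow_hz     # 20 fast ticks per planner update
    box = SetpointMailbox()
    q, dq = 0.0, 0.0                        # joint angle [rad] and velocity [rad/s]
    fast_ticks = 0
    for k in range(int(duration_s * fast_hz)):
        if k % ticks_per_slow == 0:
            box.write(0.5)                  # planner: hold the joint at 0.5 rad
        tau = kp * (box.target - q) - kd * dq   # PD on the latest setpoint
        dq += tau * dt                      # semi-implicit Euler integration
        q += dq * dt
        fast_ticks += 1
    return q, fast_ticks, box.writes


if __name__ == "__main__":
    print(simulate())
```

Over one simulated second, the fast loop runs 1,000 ticks against 50 planner writes. It keeps tracking the held setpoint between writes, which is exactly the contract the DDS link between the two layers has to honor.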

When you need a simulator (and when you don't)

A simulator is mandatory for four kinds of work:

  • Reinforcement learning. You cannot train a locomotion policy from scratch on a real robot. The robot would break, the trainer would quit, the cost would be obscene. RL happens in sim — millions of episodes, thousands of environments in parallel.
  • Large-scale data generation. Synthetic data for VLM fine-tuning, scene understanding, manipulation grasps. The bleeding-edge teams (R09) generate hundreds of thousands of labelled scenes overnight.
  • Headless CI. You want every git push to verify that the robot still walks. Doing that on hardware is impractical. A nightly sim run that exercises the locomotion and perception stack is.
  • Environment design. Laying out a warehouse, a factory cell, a lab — knowing whether the robot can navigate it before you build it.
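For the headless-CI case, "the robot still walks" needs a machine-checkable definition. Below is a minimal sketch, assuming the nightly sim job logs base height and forward position at each step. The `RolloutSample` record and both thresholds are hypothetical; set them from your own robot and gait.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RolloutSample:
    """Hypothetical per-step log record from a sim rollout."""
    t: float        # sim time [s]
    base_z: float   # base/pelvis height [m]
    base_x: float   # forward position [m]


def check_walk(samples: List[RolloutSample], min_height_m: float,
               min_progress_m: float) -> Tuple[bool, str]:
    """Pass if the base never drops below min_height_m and moves forward enough."""
    if not samples:
        return False, "empty rollout"
    for s in samples:
        if s.base_z < min_height_m:
            return False, f"fell at t={s.t:.2f}s (base_z={s.base_z:.2f} m)"
    progress = samples[-1].base_x - samples[0].base_x
    if progress < min_progress_m:
        return False, f"only {progress:.2f} m of progress"
    return True, "ok"


if __name__ == "__main__":
    # Stand-in for a logged rollout: 10 s at 0.5 m/s with a steady 0.75 m base height.
    log = [RolloutSample(t=i * 0.1, base_z=0.75, base_x=0.05 * i) for i in range(101)]
    print(check_walk(log, min_height_m=0.5, min_progress_m=3.0))
```

In CI, the job would exit non-zero whenever the first element is `False`, using the message as the failure reason.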

You do not need a simulator for:

  • Early prototyping. If you are writing a "make the robot wave" demo, a simulator is overhead. Use the real robot.
  • Simple manipulation. Picking up a box, opening a drawer. Closing the sim-to-real gap on these tasks often takes longer than simply doing the work on the real robot.
  • Teleop development. The whole point of teleop is the human in the loop. The sim does not help.
  • One-off behaviors. Anything you will run twice and throw away.

The mistake teams make is treating the simulator as the default. It is not. It is a tool for the four cases above, plus the obvious safety case of "I do not want to crash the real robot."

The 2026 simulator lineup

There are four simulators that matter for legged robotics in 2026. They are not equivalent and the choice is opinionated.

| Simulator | Physics | Rendering | Parallel envs (one 5090) | Best for |
| --- | --- | --- | --- | --- |
| NVIDIA Isaac Sim / Isaac Lab | PhysX 5 / Omniverse | Photorealistic (RTX) | 4,096–8,192 | Photoreal data gen, large-scale RL, sim-to-real |
| MuJoCo / MJX | MuJoCo (rigid-body, contact-rich) | Basic raster | 4,000–16,000+ | Locomotion RL, dexterous manipulation, deterministic |
| Gazebo Harmonic | DART (default) / Bullet | OGRE 2 | 1–8 | ROS 2-native development, integration tests |
| Genesis | Multi-physics (rigid + soft + fluid) | Path-traced / raster | 10,000+ | Academic-style high-throughput RL, mixed-physics tasks |

The numbers in the parallel-envs column are practical, not theoretical. They depend on the model complexity. A Franka arm runs at higher parallel counts than a 41-DOF humanoid with dex hands.

Isaac Sim and Isaac Lab — NVIDIA's bet

Isaac Sim is the photorealistic, GPU-accelerated simulator built on the Omniverse stack. Isaac Lab is the robot-learning framework that sits on top of it. Together they are NVIDIA's answer to every robotics simulation question.

What it does well:

  • Photorealistic rendering. Ray-traced lighting, accurate materials, real-world camera models. If your perception stack needs to learn what the world actually looks like, this is the only simulator that delivers it at scale.
  • GPU-accelerated physics. Parallel environments live in GPU memory. A single RTX 5090 runs thousands of humanoid instances at hundreds of physics steps per second.
  • Isaac Lab. A clean RL framework with built-in support for RSL RL, RL-Games, SKRL, and Stable Baselines3. The 16+ pre-built robot models include G1, H1, Spot, ANYmal, Franka, and a growing list of newcomers.
  • GR00T integration. NVIDIA's humanoid foundation-model stack lives here. If you want to train a vision-language-action policy and deploy it cross-platform, this is the path.

What it does badly:

  • Omniverse setup pain. The Omniverse runtime is opinionated, heavy, and not friendly to a default Ubuntu install. Expect a day of fighting the launcher, the Nucleus server, and the asset cache before your first simulation runs.
  • GPU-bound. Isaac Sim does not run on a CPU. It does not run well on a single 4090. The reference setup is a 5090 minimum, a Pro 6000 Blackwell preferred, and an EPYC host to feed the data.
  • Opinionated. Asset formats are USD. Physics is PhysX. Rendering is RTX. If you want a different physics engine you are in the wrong simulator.

Compute reality. A K-AI server with 4× RTX 5090 runs Isaac Lab humanoid training comfortably — 4,000–8,000 parallel environments, full photoreal at 30–60 FPS render, policy convergence on a locomotion task in 6–24 hours. With 8× 5090 you scale wider or run multiple experiments in parallel. The Pro 6000 Blackwell is the right call when memory is the binding constraint (large batch sizes, larger humanoid asset libraries).

MuJoCo and MJX — the physics-first answer

MuJoCo is DeepMind's physics simulator. MJX is the XLA / JAX rewrite that runs MuJoCo on the GPU with full batching. MJWarp is a newer NVIDIA-Google joint effort that writes MuJoCo in Warp and scales better for contact-rich scenes.

What it does well:

  • Speed. Pure contact-rich rigid-body physics, written for the GPU from the ground up. On a single RTX 5090, MJX runs thousands of humanoid environments at a physics throughput that Isaac Sim cannot match for the same scene complexity.
  • Determinism. Same seed, same trajectory. Reproducible RL is a feature.
  • DeepMind backing. Every major DeepMind robotics paper for the last five years uses MuJoCo. The model libraries (MuJoCo Menagerie) include G1, H1, Spot, ANYmal, the Franka arm, and the Shadow Hand.
  • Simpler. No Omniverse, no USD, no Nucleus. Pip install, run.

What it does badly:

  • Visual rendering. MuJoCo's renderer is OpenGL-era. Adequate for visualisation, useless for photorealistic data gen.
  • Asset ecosystem. Smaller than Isaac. URDF imports work but the polish is lower.
  • Sensor models. Camera, depth, and LiDAR models are basic. If your policy depends on realistic sensor noise, you will be building those models yourself.

Compute reality. A single RTX 5090 runs 4,000–16,000 parallel MJX humanoid environments depending on contact complexity. A 4× 5090 K-AI server can run 50,000+ humanoid environments in parallel for locomotion RL. This is the math that has DeepMind and most academic labs on MuJoCo for the inner loop of policy training.

Gazebo Harmonic — the ROS 2 native

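Gazebo is the open-source simulator that has been the default for ROS since its early days. Gazebo Harmonic is an LTS release and the official pairing for ROS 2 Jazzy. Humble's official pairing is the older Gazebo Fortress, though Harmonic can be used with Humble through non-default ros_gz packages.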

What it does well:

  • ROS 2 integration. First-class. The ros_gz bridge means your ROS 2 nodes work in sim with no changes.
  • Open and free. No license, no Omniverse, no NVIDIA dependency. Runs on CPU, on AMD GPUs, on anything.
  • Right for integration testing. When your goal is "verify the nav stack and the planner still work after a refactor," Gazebo is the right call.

What it does badly:

  • Slow. Single-threaded physics, software rendering by default. Parallel environments mean spinning up multiple Gazebo processes. You do not train RL policies in Gazebo; you smoke-test them.
  • Not GPU-friendly. No GPU batching. Adding rendering acceleration helps but does not change the fundamental scaling.
  • Sim-to-real gap is worse. The contact model is less rigorous than MuJoCo, the sensor models are less realistic than Isaac.

Compute reality. Gazebo runs on whatever you have. The K-AI server is overkill for it. A laptop is fine. The trade-off is that for any non-trivial RL workload you will outgrow Gazebo within days.

Genesis — the academic high-throughput contender

Genesis is the multi-institution simulator that landed late 2024 with bold performance claims and has since proven most of them. CMU, Stanford, MIT CSAIL, NVIDIA, and Tsinghua all contributed.

What it does well:

  • Throughput. In its launch benchmark, Genesis reports over 40 million FPS on a single RTX 4090 when simulating a Franka arm. For humanoids the number is lower but still in the millions. The architecture is built around mass parallel simulation from the start.
  • Multi-physics. Rigid, fluid, soft body, granular, MPM. If your task involves something other than rigid contact (deformable manipulation, fluid pouring, granular interaction) Genesis is the only simulator in this list that handles it natively.
  • Generative scene tooling. Genesis ships with prompt-driven scene generation. You describe an environment in text, you get a scene.

What it does badly:

  • Newer ecosystem. Fewer pre-built models, fewer tutorials, smaller community than MuJoCo or Isaac.
  • Less battle-tested for sim-to-real. The physics is fast; whether it transfers as well as MuJoCo on hardware is still being established by the community.
  • Documentation gaps. Common with rapidly-evolving research projects.

Where it fits in 2026. Genesis is the right call for academic-style RL where you want maximum environment throughput and you are willing to write more of your own glue. For a production sim-to-real pipeline today, MuJoCo or Isaac is the safer call. Watch this space.

Sim-to-real reality, 2026 edition

The practical state of sim-to-real, May 2026:

  • Domain randomization is mandatory. You cannot train on a single set of physics parameters and expect transfer. Mass, friction, motor delays, sensor noise, latency — all of these get randomized over a range during training.
  • Action delay matters more than people admit. A real robot's actuators have 5–25 ms of delay from command to torque. A sim that runs without this delay produces policies that oscillate on hardware. Bake delay into the sim from day one.
  • Sensor noise must be injected. IMU drift, camera noise, depth-sensor dropouts. The 2026 papers that work include all of these in training.
  • Multi-simulator training is the bleeding edge. PolySim and similar frameworks train across MuJoCo + Isaac + sometimes Gazebo simultaneously. Early results are good. The compute cost is 2–3× single-sim training.
  • Heavy-loaded humanoid motion is still hard. Carrying a payload, recovering from a push while carrying something — these still degrade 50–80% from sim to real on the platforms in R01.
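The first three bullets translate directly into code that wraps the simulator step. Below is a minimal sketch with these assumptions:

  • A 200 Hz physics step, so the 5–25 ms actuator delay above becomes 1–5 steps.
  • Illustrative ranges for mass, friction, and gyro noise.
  • Hypothetical class and function names. Most RL frameworks ship their own randomization hooks, which you would use instead.

Seeding the generator per episode also gives the "same seed, same trajectory" reproducibility described for MuJoCo.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import List

PHYSICS_DT_S = 0.005  # assumed 200 Hz physics step


@dataclass
class EpisodeParams:
    mass_scale: float      # multiplier on nominal link masses
    friction: float        # ground friction coefficient
    delay_steps: int       # actuator command-to-torque delay, in physics steps
    imu_noise_std: float   # gyro noise standard deviation [rad/s]


def sample_params(rng: random.Random) -> EpisodeParams:
    """Draw one episode's physics; ranges are illustrative, tune them per robot."""
    delay_s = rng.uniform(0.005, 0.025)                       # 5-25 ms, per the text
    return EpisodeParams(
        mass_scale=rng.uniform(0.9, 1.1),
        friction=rng.uniform(0.4, 1.2),
        delay_steps=max(1, round(delay_s / PHYSICS_DT_S)),    # 1-5 steps at 200 Hz
        imu_noise_std=rng.uniform(0.0, 0.05),
    )


class DelayedActuator:
    """FIFO that releases each command `delay_steps` physics steps after it is sent."""

    def __init__(self, delay_steps: int, n_joints: int):
        # Pre-fill with zero commands so the first real command waits its turn.
        self.queue = deque([[0.0] * n_joints for _ in range(delay_steps)])

    def step(self, command: List[float]) -> List[float]:
        self.queue.append(list(command))
        return self.queue.popleft()   # the command the motors actually apply now


def noisy_gyro(true_rate: List[float], std: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise to a gyro reading."""
    return [w + rng.gauss(0.0, std) for w in true_rate]
```

In a training loop, you would call `sample_params` at each reset and apply the mass and friction to the sim. Then route every policy action through `DelayedActuator` and every IMU read through `noisy_gyro`.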

Plan for two-thirds of your real-world performance to come from sim. Plan for the last third to be hardware fine-tuning, system identification, and stubborn debugging.

Concrete recipe: G1 + ROS 2 Humble + Isaac Lab on a K-AI server

Hardware:
  - Unitree G1 EDU (Jetson Orin AGX, 23–43 DOF)
  - K-AI 256 Turin Dual / 4× RTX 5090 / 1× RTX Pro 6000 Blackwell
    (the Pro 6000 for Isaac Sim / Isaac Lab, the 5090s for batched MJX runs)
  - Wi-Fi 6E AP, line of sight to working area
  - 10 GbE switch, wired link from K-AI to AP

Software on the K-AI server:
  - Ubuntu 22.04, CUDA 13, Docker
  - Isaac Sim 5.x + Isaac Lab (Omniverse runtime, Nucleus local)
  - ROS 2 Humble (or Jazzy, if starting fresh today; Jazzy targets
    Ubuntu 24.04, so run it in a container or upgrade the host)
  - PyTorch 2.x, JAX, RSL RL
  - vLLM serving a VLM for high-level perception (see I02)
  - MuJoCo + MJX as the second sim, for cross-sim validation

Software on the G1:
  - Unitree SDK2 (C++ joint-control loop)
  - unitree_ros2 (Humble) on the Jetson
  - Custom ROS 2 nodes: command_router, perception_relay
  - gRPC client to vLLM on the K-AI server

Workflow:
  1. Train a locomotion or whole-body policy in Isaac Lab on the
     Pro 6000 (4,000+ parallel G1 instances, RSL RL).
  2. Validate the policy in MJX (4× 5090 batched) with different
     domain randomization seeds.
  3. Deploy the policy via TorchScript to the G1's Jetson, wrapped
     in a ROS 2 node that sends joint targets to the Unitree SDK2
     control loop (a separate process that does the fast tracking).
  4. ROS 2 high-level commands flow from the lab's task graph
     through the command_router into the policy.
  5. Heavy perception (VLM) lives on the K-AI server, reached over
     Wi-Fi 6E + gRPC.
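The command_router in step 4 is the vendor-neutral glue between the task graph and the policy. Below is a minimal sketch of what it might do, assuming the policy was trained on body-velocity commands within fixed limits. It clamps incoming commands to that range and falls back to a zero-velocity command when the task graph goes quiet. The class, the limits, and the timeout are all hypothetical. In the recipe, this logic would sit inside a ROS 2 node subscribed to the task graph's command topic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VelocityCommand:
    vx: float   # forward [m/s]
    vy: float   # lateral [m/s]
    wz: float   # yaw rate [rad/s]


def _clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


class CommandRouter:
    """Hypothetical router: clamp to the policy's training range, stop on stale input."""

    def __init__(self, max_vx: float = 1.0, max_vy: float = 0.5,
                 max_wz: float = 1.0, timeout_s: float = 0.5):
        self.max_vx, self.max_vy, self.max_wz = max_vx, max_vy, max_wz
        self.timeout_s = timeout_s
        self._last: Optional[VelocityCommand] = None
        self._last_time = float("-inf")

    def on_command(self, cmd: VelocityCommand, now_s: float) -> None:
        """Called whenever the task graph sends a command."""
        self._last = VelocityCommand(
            _clamp(cmd.vx, -self.max_vx, self.max_vx),
            _clamp(cmd.vy, -self.max_vy, self.max_vy),
            _clamp(cmd.wz, -self.max_wz, self.max_wz),
        )
        self._last_time = now_s

    def command_for_policy(self, now_s: float) -> VelocityCommand:
        """Called at the policy rate; zero velocity if input is missing or stale."""
        if self._last is None or now_s - self._last_time > self.timeout_s:
            return VelocityCommand(0.0, 0.0, 0.0)
        return self._last
```

Clamping matters because a policy queried outside its training command range can behave unpredictably. The staleness timeout means a dropped Wi-Fi link degrades to standing still rather than walking on an old command.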

The total stand-up time for a team with prior ROS 2 experience is two to three weeks. The total stand-up time for a team learning ROS 2 from scratch is two to three months. Plan accordingly. The compute side of the K-AI server is the easier piece.

The honest take

Three opinions, plainly:

  1. ROS 2 is the right abstraction layer. Native SDKs are necessary but not sufficient. Build on the ROS 2 wrapper, treat the manufacturer SDK as a service the wrapper consumes. Going proprietary buys you nothing and costs you the rest of the ecosystem.
  2. Isaac Sim is the future, but the present is MuJoCo + ROS 2 for most teams. If your work is RL on locomotion and dexterous manipulation, MuJoCo / MJX is the cheaper, faster, easier-to-deploy choice. If your work is photorealistic perception, large-scale data generation, or VLA-style foundation-model fine-tuning, Isaac is the only call. Most labs need both, and the K-AI server has the headroom to run both.
  3. The bleeding-edge labs use all four. Isaac for the rendering and the foundation-model pipeline. MuJoCo for the RL inner loop. Gazebo for the ROS 2 integration tests. Genesis for the multi-physics work. Pretending you will only need one is how teams end up rewriting their stack a year in.

What to do next — decision tree

Question 1: Are you doing RL, or are you doing classical control + perception?

  • RL → you need a simulator. Continue to question 2.
  • Classical only → you can skip the simulator-heavy work. Build on ROS 2 + the manufacturer SDK, use Gazebo for integration tests, and spend your sim budget on hardware.

Question 2: Is photoreal rendering load-bearing for your work?

  • Yes (VLA training, synthetic data for VLMs, visual servoing on textures) → Isaac Sim / Isaac Lab on a 5090 or Pro 6000 Blackwell.
  • No (locomotion RL, contact-rich manipulation, low-level control policies) → MuJoCo / MJX. Cheaper, faster, and the model library covers every legged platform in this article.

Question 3: Do you need multi-physics (deformables, fluids, granular)?

  • Yes → Genesis. Accept the rougher ecosystem in exchange for the only simulator that handles all of it.
  • No → stick with Isaac or MuJoCo.

Question 4: What is your compute reality?

  • Single workstation, one or two GPUs → MuJoCo / MJX. It makes the most of what you have. Isaac Sim will technically run but it will be cramped.
  • 4× to 8× GPU server (K-AI tier) → either Isaac or MuJoCo at full scale, or both in parallel. This is the right compute for the work; see I01 for the build.
  • No dedicated server yet → buy one before you buy the second robot. The compute is what makes the robot a research platform instead of a walking demo.

Question 5: What is your team's ROS 2 baseline?

  • Strong → start on Jazzy. Three years of remaining LTS support (through May 2029), current tooling, and official Gazebo Harmonic integration.
  • Weak → start on Humble. Most tutorials, most vendor wrappers, most community help. Migrate to Jazzy when the next robot or major refactor forces it.
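The five questions reduce to a small function, useful mostly as a checklist. Below is a sketch that encodes the tree above literally. The function and argument names are illustrative. The output is a starting recommendation, not a substitute for the reasoning behind each branch.

```python
from typing import List


def recommend_stack(doing_rl: bool, photoreal_needed: bool,
                    multiphysics_needed: bool, gpu_server: bool,
                    strong_ros2_team: bool) -> List[str]:
    """Walk Questions 1-5 and return a starting stack (illustrative only)."""
    # Question 5: the team's ROS 2 baseline picks the distro on every branch.
    stack = ["ROS 2 Jazzy" if strong_ros2_team else "ROS 2 Humble",
             "manufacturer SDK"]

    # Question 1: classical control + perception skips the heavy simulators.
    if not doing_rl:
        stack.append("Gazebo (integration tests)")
        return stack

    # Questions 2 and 4: photoreal work points to Isaac, which wants a
    # multi-GPU server; otherwise MuJoCo / MJX on whatever GPUs you have.
    if photoreal_needed:
        stack.append("Isaac Sim / Isaac Lab" if gpu_server
                     else "Isaac Sim / Isaac Lab (cramped on a workstation)")
    else:
        stack.append("MuJoCo / MJX")

    # Question 3: deformables, fluids, or granular media add Genesis.
    if multiphysics_needed:
        stack.append("Genesis")
    return stack
```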

The robots are real, the simulators are real, the work is real. The "we'll just deploy from sim" promise is not. Plan for the gap.


This is part of the Kentino Wiki, a reference series on AI compute, robotics, and the systems that connect them. Comments and corrections welcome at info@kentino.com.
