Forward Kinematics
Computes end-effector pose from joint angles. Uses Denavit-Hartenberg (DH) parameters: a series of 4×4 homogeneous transforms T₀ₙ = T₀₁ · T₁₂ · … · Tₙ₋₁ₙ.
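A minimal numpy sketch of this transform chain (standard DH convention; the 2R planar arm and its parameters are illustrative):

```python
import numpy as np

def dh_transform(a, alpha, d, theta):
    """Single 4x4 homogeneous transform from standard DH parameters."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(dh_table):
    """Chain the per-joint transforms: T0n = T01 @ T12 @ ... @ Tn-1,n."""
    T = np.eye(4)
    for a, alpha, d, theta in dh_table:
        T = T @ dh_transform(a, alpha, d, theta)
    return T

# 2R planar arm, both links length 1, both joints at 90 deg:
# end-effector lands at (-1, 1, 0), read from the last column of T
T = forward_kinematics([(1.0, 0.0, 0.0, np.pi / 2),
                        (1.0, 0.0, 0.0, np.pi / 2)])
```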
Inverse Kinematics
Finds joint configurations for a desired end-effector pose. Solved analytically in closed form or numerically via iterative Jacobian methods. Even 6-DOF arms admit multiple discrete solutions (e.g. elbow-up vs elbow-down); redundant robots (DOF > 6) have infinitely many.
Denavit-Hartenberg Params
4 parameters per joint: a (link length), α (link twist), d (link offset), θ (joint angle). Reduces kinematic chain to systematic 4×4 matrix products.
Workspace Analysis
Reachable workspace: all poses reachable with at least one joint config. Dexterous workspace: poses reachable with all orientations. Determined by link lengths, joint limits, and DOF.
Screw Theory (Lie Groups)
Modern alternative to DH. Uses SE(3) Lie group / Lie algebra. Product of Exponentials (PoE) formula: T = e^[S₁]θ₁ · e^[S₂]θ₂ · … · M gives cleaner, singularity-free representation.
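A numpy-only sketch of PoE forward kinematics, using the closed-form Rodrigues expansion of exp([S]θ) instead of a general matrix exponential (the 1R example and home pose M are illustrative):

```python
import numpy as np

def skew(w):
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def exp_twist(S, theta):
    """exp([S]θ) for a unit-rotation twist S = (ω, v), via Rodrigues' formula."""
    w, v = S[:3], S[3:]
    W = skew(w)
    R = np.eye(3) + np.sin(theta) * W + (1 - np.cos(theta)) * W @ W
    G = np.eye(3) * theta + (1 - np.cos(theta)) * W + (theta - np.sin(theta)) * W @ W
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = G @ v
    return T

def poe_fk(screws, thetas, M):
    """Product of Exponentials: T = exp([S1]θ1) · ... · exp([Sn]θn) · M"""
    T = np.eye(4)
    for S, th in zip(screws, thetas):
        T = T @ exp_twist(np.asarray(S, dtype=float), th)
    return T @ M

# 1R example: revolute joint about z through the origin; home pose M is 1 m along x.
# Rotating 90 deg about z carries the end-effector from (1,0,0) to (0,1,0).
M = np.eye(4); M[0, 3] = 1.0
T = poe_fk([[0, 0, 1, 0, 0, 0]], [np.pi / 2], M)
```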
Degrees of Freedom
Grübler's formula for spatial mechanisms: M = 6(n-1) - Σⱼ(6-fⱼ), where n counts all links (including the fixed base) and the sum runs over joints j with fⱼ freedoms each. 3R planar arm = 3 DOF. Most industrial arms = 6 DOF. Redundant manipulators ≥ 7 DOF (e.g. KUKA iiwa).
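The formula reduces to one line of code; a small sketch checking the two examples above (the planar version replaces 6 with 3):

```python
def grubler_spatial(n_links, joint_dofs):
    """M = 6(n-1) - Σ(6 - f_i); n_links includes the fixed base."""
    return 6 * (n_links - 1) - sum(6 - f for f in joint_dofs)

def grubler_planar(n_links, joint_dofs):
    """Planar variant: each free body has 3 DOF instead of 6."""
    return 3 * (n_links - 1) - sum(3 - f for f in joint_dofs)

dof_6r = grubler_spatial(7, [1] * 6)   # 6-axis arm: 7 links, six revolute joints
dof_3r = grubler_planar(4, [1] * 3)    # 3R planar arm: 4 links, three revolute joints
```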
Cyclic Coordinate Descent (CCD)
Iteratively rotates each joint to minimize end-effector error. Fast for real-time applications. May get stuck in local minima. Used in game character animation and lightweight robotics.
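A minimal CCD sketch for a planar serial chain (the 2-link arm, target, and iteration count are illustrative); each sweep rotates joints from tip to base so the joint-to-end-effector ray points at the target:

```python
import numpy as np

def fk_points(angles, lengths):
    """Joint positions of a planar chain (angles are relative per joint)."""
    pts, total = [np.zeros(2)], 0.0
    for a, L in zip(angles, lengths):
        total += a
        pts.append(pts[-1] + L * np.array([np.cos(total), np.sin(total)]))
    return pts

def ccd_sweep(angles, lengths, target):
    """One CCD pass: align each joint's end-effector ray with the target ray."""
    for i in reversed(range(len(angles))):
        pts = fk_points(angles, lengths)
        to_ee = pts[-1] - pts[i]
        to_tg = target - pts[i]
        angles[i] += np.arctan2(to_tg[1], to_tg[0]) - np.arctan2(to_ee[1], to_ee[0])
    return angles

angles, lengths = [0.1, 0.1], [1.0, 1.0]
target = np.array([1.0, 1.0])  # reachable: |target| < L1 + L2
for _ in range(100):
    angles = ccd_sweep(angles, lengths, target)
err = np.linalg.norm(fk_points(angles, lengths)[-1] - target)
```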
FABRIK Algorithm
Forward And Backward Reaching Inverse Kinematics. Operates directly on joint positions, not angles. Very fast convergence, handles constraints well. Popular in real-time character rigs.
Jacobian Transpose
Δq = αJᵀΔx. No matrix inversion needed. Slower convergence than pseudoinverse but computationally cheaper. Avoids singularity issues inherent to pseudoinverse methods.
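A sketch of Jacobian-transpose IK on a planar 2R arm (analytic Jacobian; the gain, target, and iteration count are illustrative). Note the slow, gradient-descent-like convergence the entry describes:

```python
import numpy as np

L1 = L2 = 1.0

def fk(q):
    return np.array([L1*np.cos(q[0]) + L2*np.cos(q[0]+q[1]),
                     L1*np.sin(q[0]) + L2*np.sin(q[0]+q[1])])

def jacobian(q):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0]+q[1]), np.cos(q[0]+q[1])
    return np.array([[-L1*s1 - L2*s12, -L2*s12],
                     [ L1*c1 + L2*c12,  L2*c12]])

q = np.array([0.3, 0.3])
target = np.array([0.5, 1.2])
alpha = 0.1                       # small gain keeps the update stable
for _ in range(5000):
    dx = target - fk(q)
    q = q + alpha * jacobian(q).T @ dx   # Δq = α Jᵀ Δx — no inversion
err = np.linalg.norm(fk(q) - target)
```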
Damped Least Squares
q̇ = Jᵀ(JJᵀ + λ²I)⁻¹ẋ. Adds damping factor λ to avoid singularities. Also called Levenberg-Marquardt. Standard in industrial robots near workspace boundaries.
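The same planar 2R IK problem solved with damped least squares (illustrative model and gains); λ trades convergence speed near singularities against accuracy, and convergence is far faster than the transpose method:

```python
import numpy as np

L1 = L2 = 1.0

def fk(q):
    return np.array([L1*np.cos(q[0]) + L2*np.cos(q[0]+q[1]),
                     L1*np.sin(q[0]) + L2*np.sin(q[0]+q[1])])

def jacobian(q):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0]+q[1]), np.cos(q[0]+q[1])
    return np.array([[-L1*s1 - L2*s12, -L2*s12],
                     [ L1*c1 + L2*c12,  L2*c12]])

q = np.array([0.3, 0.3])
target = np.array([0.5, 1.2])
lam = 0.1                          # damping factor λ
for _ in range(100):
    J = jacobian(q)
    dx = target - fk(q)
    # Δq = Jᵀ (J Jᵀ + λ² I)⁻¹ Δx, via a linear solve instead of explicit inverse
    q = q + J.T @ np.linalg.solve(J @ J.T + lam**2 * np.eye(2), dx)
err = np.linalg.norm(fk(q) - target)
```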
Geometric Jacobian
Maps joint velocities q̇ to end-effector velocities ẋ = J(q)q̇. 6×n matrix (3 linear + 3 angular velocity rows). Rank deficiency indicates a kinematic singularity.
Singularities
Configurations where rank(J) < 6 — the robot loses the ability to move in certain directions. Types: wrist singularity (axes 4 and 6 collinear), elbow singularity (arm fully extended or retracted), shoulder singularity (wrist center on the axis of joint 1).
Manipulability
w = √det(JJᵀ) measures distance from singularity. Yoshikawa manipulability ellipsoid: semi-axes are the singular values σᵢ of J (along the left singular vectors). Used for optimizing redundant motion to maintain dexterity.
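A small sketch computing w for a planar 2R arm (illustrative example; for 2R, w = L₁L₂|sin q₂|, so a right-angle elbow gives w = 1 and a straight arm is singular):

```python
import numpy as np

def jacobian_2r(q, L1=1.0, L2=1.0):
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0]+q[1]), np.cos(q[0]+q[1])
    return np.array([[-L1*s1 - L2*s12, -L2*s12],
                     [ L1*c1 + L2*c12,  L2*c12]])

def manipulability(J):
    # clamp to 0 to guard against tiny negative det from round-off
    return np.sqrt(max(np.linalg.det(J @ J.T), 0.0))

w_bent = manipulability(jacobian_2r(np.array([0.0, np.pi / 2])))
w_straight = manipulability(jacobian_2r(np.array([0.0, 0.0])))  # singular
```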
Null Space Motion
For redundant robots: q̇ = J⁺ẋ + (I - J⁺J)q̇₀. The null space projector (I - J⁺J) allows joint motion without end-effector motion — used for obstacle avoidance, joint limit avoidance.
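A sketch on a redundant planar 3R arm (illustrative model and secondary velocity q̇₀): the projected term N q̇₀ produces joint motion with zero end-effector velocity, while the pseudoinverse term tracks the commanded ẋ:

```python
import numpy as np

L = [1.0, 1.0, 1.0]

def jacobian_3r(q):
    """2x3 positional Jacobian of a planar 3R chain."""
    J = np.zeros((2, 3))
    cum = np.cumsum(q)
    for i in range(3):
        J[0, i] = -sum(L[j] * np.sin(cum[j]) for j in range(i, 3))
        J[1, i] =  sum(L[j] * np.cos(cum[j]) for j in range(i, 3))
    return J

q = np.array([0.2, 0.4, 0.6])
J = jacobian_3r(q)
J_pinv = np.linalg.pinv(J)
N = np.eye(3) - J_pinv @ J            # null-space projector (I - J⁺J)
q0_dot = np.array([1.0, -0.5, 0.3])   # secondary objective, e.g. joint-limit avoidance
x_dot = np.array([0.1, 0.0])
qdot = J_pinv @ x_dot + N @ q0_dot    # q̇ = J⁺ẋ + (I - J⁺J)q̇₀
ee_vel_from_null = J @ (N @ q0_dot)   # ≈ 0: internal motion only
```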
Newton-Euler Equations
Recursive algorithm: O(n) complexity. Forward pass: propagate velocities/accelerations outward. Backward pass: propagate forces/torques inward. Most efficient for real-time control.
Lagrangian Dynamics
τ = M(q)q̈ + C(q,q̇)q̇ + g(q). Mass matrix M, Coriolis/centrifugal C, gravity g. Elegant for analysis and model-based control. Computationally O(n³) but symbolic simplification possible.
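For a 1-DOF point-mass pendulum the three terms are simple enough to write out by hand (illustrative parameters; M = ml² is constant, C = 0 for a single joint, g(q) = mgl·cos q with q measured from horizontal):

```python
import numpy as np

def pendulum_torque(q, qd, qdd, m=1.0, l=0.5, g=9.81):
    """tau = M(q)q̈ + C(q,q̇)q̇ + g(q) for a point mass m at distance l."""
    M = m * l * l            # mass "matrix" (scalar here)
    C = 0.0                  # no Coriolis/centrifugal coupling with 1 DOF
    grav = m * g * l * np.cos(q)
    return M * qdd + C * qd + grav

# holding still at horizontal: torque must exactly cancel gravity (m*g*l)
tau_hold = pendulum_torque(q=0.0, qd=0.0, qdd=0.0)
```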
Inertia Tensor
I ∈ ℝ³ˣ³ describes mass distribution. Diagonal elements: moments of inertia. Off-diagonal: products of inertia. Diagonalized via principal axes transformation. Key for accurate torque prediction.
Trajectory Dynamics
Operational space dynamics: Λ(x)ẍ + μ(x,ẋ) + p(x) = F. Decouples task-space behavior. Khatib's OSF formulation enables force-controlled tasks in Cartesian space.
LiDAR — Light Detection and Ranging
Emits laser pulses and measures Time-of-Flight (ToF). Spinning mechanical LiDARs (Velodyne HDL-64E): 360° coverage, ~10Hz, up to 120m range. Solid-state LiDARs (e.g. Livox): no moving parts, cheaper, limited FoV. Returns dense 3D point clouds (x,y,z,intensity). Key for SLAM and obstacle detection in autonomous vehicles.
RGB-D Camera — Color + Depth
Combines RGB image with per-pixel depth. Methods: Structured Light (Kinect v1 — projects a known IR pattern), Active Stereo (Intel RealSense D435, Luxonis OAK-D — stereo pair aided by an IR texture projector), Indirect ToF (Microsoft Azure Kinect). Depth range: 0.1–10m. Used for 3D object detection, manipulation, surface reconstruction, and volumetric mapping (TSDF).
IMU — Inertial Measurement Unit
Combines 3-axis accelerometer + 3-axis gyroscope (+ optional magnetometer = 9-DOF). Accelerometer: measures proper acceleration (gravity + linear). Gyroscope: angular velocity. Integration gives pose but drifts — fused with vision/LiDAR via EKF/UKF. MEMS IMUs (Bosch BMI088): small, cheap, ~1kHz. Industrial-grade units (Xsens MTi) add calibration and on-board filtering; tactical/navigation-grade IMUs use fiber-optic gyros with drift well below 1°/hr.
Force/Torque Sensor
6-axis F/T sensors (ATI Mini45, OnRobot HEX) measure Fx,Fy,Fz,Tx,Ty,Tz. Based on strain gauges or piezoelectric elements. Resolution: ~0.01N / 0.01Nm. Used in impedance/admittance control, contact detection, assembly tasks. Wrist-mounted or embedded in joints.
Tactile Skin Arrays
Distributed pressure sensing over robot surfaces. Technologies: capacitive, piezoresistive, barometric (TakkTile), optical (GelSight, DIGIT). BioTac: fluid-filled fingertip with 19 electrodes + pressure + temperature sensing. DIGIT: high-res camera + gel for tactile image rendering. Enables slip detection, texture classification, fine manipulation.
Sonar / Ultrasonic
Emits 40kHz sound pulses, measures echo ToF. Range: 2cm–4m. Wide beam angle (15–30°). Low cost, works in fog/dust. Used in robot bumpers, underwater ROVs (acoustic sonar), parking sensors. Not suitable for fast-moving objects or acoustically absorbent and angled specular surfaces.
Joint Encoders (Proprioception)
Optical encoders: disc with slits, counts pulses → angular position. Incremental (relative) vs absolute. Resolution: 4096 to 1M CPR. Magnetic encoders (AS5048): Hall effect, robust to vibration. Used in every servo actuator for position/velocity feedback. Foundation of all robot control loops.
mmWave Radar
77GHz FMCW radar: measures range, velocity (Doppler), angle. Works in rain, dust, fog, darkness. Point cloud sparser than LiDAR. Texas Instruments AWR1843: 3Tx × 4Rx MIMO array. Used for velocity estimation, through-wall detection, people tracking in service robots.
RRT — Rapidly-exploring Random Tree
Incrementally builds a tree by randomly sampling configuration space. O(n log n) typical. RRT-Connect: bi-directional. RRT*: asymptotically optimal. Highly effective in high-dimensional spaces (6+ DOF). Does not require explicit obstacle model.
PRM — Probabilistic Roadmap
Two-phase: learning (build roadmap by random sampling + local planner), query (search roadmap with A*). Multi-query: roadmap reusable. Lazy PRM: defer collision checks. Best for static environments needing many queries.
A* Algorithm
f(n) = g(n) + h(n). Optimal if heuristic h is admissible (never overestimates). Variants: D* (dynamic replanning), Theta* (any-angle), Hybrid A* (non-holonomic vehicles). Operates on discretized grid or graph.
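A compact grid A* sketch (4-connected occupancy grid, Manhattan heuristic — admissible for unit-cost 4-connected motion; the grid is illustrative):

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid of 0 (free) / 1 (occupied) cells."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan
    open_set = [(h(start), 0, start)]   # (f, g, node)
    g = {start: 0}
    came = {}
    while open_set:
        _, gc, cur = heapq.heappop(open_set)
        if cur == goal:                  # reconstruct path by backtracking
            path = [cur]
            while cur in came:
                cur = came[cur]
                path.append(cur)
            return path[::-1]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = (cur[0] + dr, cur[1] + dc)
            if 0 <= nb[0] < rows and 0 <= nb[1] < cols and grid[nb[0]][nb[1]] == 0:
                ng = gc + 1
                if ng < g.get(nb, float("inf")):
                    g[nb] = ng
                    came[nb] = cur
                    heapq.heappush(open_set, (ng + h(nb), ng, nb))
    return None  # no path

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))  # must detour around the wall in row 1
```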
Artificial Potential Fields
Goal creates attractive field, obstacles create repulsive fields. Robot follows gradient descent. Simple, real-time. Major issue: local minima (robot gets stuck). Mitigated by random walks or combined with global planner.
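A minimal gradient-descent sketch (quadratic attractive potential, standard inverse-distance repulsive potential with influence radius ρ₀; gains, geometry, and step size are illustrative — note the obstacle is placed off the straight-line path, since a collinear obstacle is exactly the local-minimum failure case described above):

```python
import numpy as np

def apf_step(pos, goal, obstacles, k_att=1.0, k_rep=0.5, rho0=1.0, step=0.05):
    """One gradient-descent step on U_att + U_rep."""
    grad = k_att * (pos - goal)               # ∇U_att for U = ½ k ||pos - goal||²
    for obs in obstacles:
        d = np.linalg.norm(pos - obs)
        if d < rho0:                          # repulsion only inside influence radius
            # ∇U_rep for U = ½ k (1/d - 1/ρ0)²; points toward the obstacle,
            # so the descent step -∇U pushes away from it
            grad += k_rep * (1/rho0 - 1/d) / d**2 * (pos - obs) / d
    return pos - step * grad

pos, goal = np.array([0.0, 0.0]), np.array([3.0, 0.5])
obstacles = [np.array([1.5, 1.0])]
for _ in range(500):
    pos = apf_step(pos, goal, obstacles)
dist_to_goal = np.linalg.norm(pos - goal)
```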
Trajectory Optimization
CHOMP, STOMP, TrajOpt. Minimize cost functional: C[ξ] = ∫ obstacle_cost + smoothness dt. Gradient-based or stochastic optimization. Supports constraint satisfaction (collision, dynamics, joint limits) as penalty terms.
SLAM — Simultaneous Localization and Mapping
Builds map while tracking pose simultaneously. Graph-SLAM (offline): g2o, GTSAM pose graph optimization. Online: EKF-SLAM, FastSLAM (particle filter). Visual SLAM: ORB-SLAM3; LiDAR-inertial: LIO-SAM. SLAM is the core navigation capability of autonomous mobile robots.
PID Controller
τ = Kₚe + Kᵢ∫e dt + Kd ė. Proportional-Integral-Derivative. Kₚ: reduce error; Kᵢ: eliminate steady-state error; Kd: damp oscillations. Ziegler-Nichols tuning. Most widely deployed industrial controller.
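A minimal discrete PID sketch driving a simulated unit mass (double integrator with Euler integration; the gains and timestep are illustrative, not tuned values):

```python
class PID:
    """Discrete PID: u = Kp·e + Ki·∫e dt + Kd·de/dt."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def update(self, error):
        self.integral += error * self.dt            # Ki term: kills steady-state error
        deriv = (error - self.prev_err) / self.dt   # Kd term: damps oscillation
        self.prev_err = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv

# regulate a unit mass from x=0 toward x=1
pid = PID(kp=40.0, ki=20.0, kd=12.0, dt=0.01)
x, v = 0.0, 0.0
for _ in range(2000):                # simulate 20 s
    f = pid.update(1.0 - x)
    v += f * 0.01                    # a = f / m, m = 1
    x += v * 0.01
```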
Model Predictive Control (MPC)
Solves constrained optimization over prediction horizon at each timestep. Handles state/input constraints explicitly. Real-time MPC (OSQP solver): ~1kHz for legged robots. Used in Boston Dynamics Atlas, quadruped locomotion.
Impedance Control
Regulates relationship between force and motion: M_d ẍ + B_d ẋ + K_d x = F_ext. Target mass-spring-damper behavior. Enables safe human-robot interaction. Hogan 1985. Used in surgical robots, collaborative arms (iiwa, Franka).
Feedback Linearization
Cancels nonlinear robot dynamics through exact model inversion. τ = M(q)u + C(q,q̇)q̇ + g(q) where u is the new input. Transforms the system into a double integrator. Requires an accurate dynamic model — sensitive to model error, so often combined with robust or adaptive terms.
Whole-Body Control (WBC)
Hierarchical task-space QP controller. Solves: min ||J_task·q̈ - ẍ_task||² s.t. contact constraints, dynamics. Priority-based: higher priority tasks take precedence. Foundation of modern humanoid/quadruped control (Atlas, Spot, ANYmal).
Reinforcement Learning Control
Policy π(a|s) trained via PPO/SAC in simulation (Isaac Gym, MuJoCo). Sim-to-real transfer via domain randomization. ETH Zurich ANYmal: RL policy for rough terrain locomotion. Boston Dynamics: RL for parkour behaviors. Replaces hand-tuned controllers.
Embodied Cognition
Intelligence emerges from the interaction between brain, body, and environment — not computation alone. Rodney Brooks' Behavior-based AI (1986): intelligence without explicit world model. Physical grounding enables symbol meaning (Harnad's Symbol Grounding Problem).
Sense–Plan–Act
Traditional AI robot paradigm: perceive world → build symbolic model → plan actions → execute. Clean, analyzable but slow. Model inaccuracies compound. Replaced by reactive and learning-based systems for many tasks.
Imitation Learning (IL)
Robot learns from human demonstrations. Behavior Cloning (BC): supervised learning on (state, action) pairs. DAgger (Dataset Aggregation): corrects distribution shift by querying the expert on states the policy visits. Works well for structured tasks. Limited generalization outside the training distribution.
Reinforcement Learning (RL)
Agent learns policy π maximizing cumulative reward. PPO (Proximal Policy Optimization) and SAC (Soft Actor-Critic) dominate. Sparse rewards: Hindsight Experience Replay (HER). Locomotion: curriculum training. Multi-task: MTRL, PEARL.
Language-Conditioned Policies
RT-2 (Google DeepMind): VLM fine-tuned directly as robot policy — takes image + text instruction, outputs robot action tokens. SayCan: LLM scores feasibility × affordance of actions. OpenVLA: open-source VLA with 7B parameters.
World Models in Robotics
Robot internally simulates future states. RSSM (Dreamer): latent dynamics model, plans in imagination. GR-1: generalist robot policy with video prediction. UniSim: neural simulator learns physical world from video data. Enables model-based RL with limited real-world data.
Diffusion Policy
Models robot action distribution as denoising diffusion process. Learns to reverse Gaussian noise over action sequences. Handles multi-modal action distributions. Chi et al. 2023. State-of-the-art on complex manipulation. DDPM training, DDIM fast inference.
ACT — Action Chunking with Transformers
Zhao et al. 2023 (Stanford). CVAE encoder-decoder with transformer. Predicts L-step action chunks to reduce compounding error. Temporal ensembling at inference. Trained on 50 bimanual demos per task. Achieves ~80% success on fine manipulation.
RT-2 (Robotic Transformer 2)
Google DeepMind 2023. Co-fine-tunes PaLI-X (55B) on robot data. Robot actions as language tokens. Emergent: generalized OOD reasoning, counting, semantic understanding transferred from web scale. Substantially outperforms RT-1 on novel tasks.
π₀ (Pi Zero)
Physical Intelligence 2024. Flow Matching policy on top of PaliGemma VLM (3B). Trained on 10,000+ hours of robot data across diverse tasks/embodiments. Zero-shot cross-task transfer. Dexterous manipulation of deformable objects (folding laundry).
GR-1 / UniPi
Generalist Robot Policy with video prediction backbone. GR-1 (ByteDance 2023): GPT-style backbone predicts future video frames + actions jointly. UniPi: treats robot learning as text-conditioned video generation. Enables planning through imagination.
Action Tokenization
Discretizes continuous actions into vocabulary tokens (e.g., 256 bins per dimension). Enables LLM to directly predict actions. RT-2, OpenVLA use this. Challenges: precision loss, exponential action space. Alternatives: flow matching, regression head, diffusion head.
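A minimal sketch of uniform per-dimension binning and the precision loss it introduces (bin count and ranges are illustrative; the reconstruction error is bounded by half a bin width):

```python
import numpy as np

def tokenize_actions(actions, low, high, n_bins=256):
    """Uniformly bin continuous actions into integer tokens per dimension."""
    clipped = np.clip(actions, low, high)
    scaled = (clipped - low) / (high - low)              # map to [0, 1]
    return np.minimum((scaled * n_bins).astype(int), n_bins - 1)

def detokenize_actions(tokens, low, high, n_bins=256):
    """Map tokens back to bin centers — the 'precision loss' of tokenization."""
    return low + (tokens + 0.5) / n_bins * (high - low)

low, high = np.array([-1.0]), np.array([1.0])
a = np.array([0.123])
tok = tokenize_actions(a, low, high)
a_rec = detokenize_actions(tok, low, high)
quant_err = abs(a_rec[0] - a[0])   # bounded by half a bin: (high-low)/(2*256)
```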
```python
# Minimal Diffusion Policy inference (simplified sketch; obs, policy, sched,
# and robot are placeholders for a trained policy network, a configured
# scheduler, and the hardware interface)
import torch
from diffusers import DDIMScheduler  # Hugging Face diffusers scheduler

horizon, action_dim = 16, 7  # predict 16 steps of 7-dim actions

def diffusion_policy_infer(obs, policy_net, scheduler, n_steps=10):
    # Start from pure noise in action space
    sample = torch.randn((1, horizon, action_dim))
    for t in scheduler.timesteps[:n_steps]:
        # Predict noise residual conditioned on observation
        noise_pred = policy_net(sample, t, obs_cond=obs)
        # DDIM denoising step
        sample = scheduler.step(noise_pred, t, sample).prev_sample
    return sample  # denoised action sequence, shape (1, horizon, action_dim)

# Action chunking: execute first k steps, re-plan
action_chunk = diffusion_policy_infer(obs, policy, sched)[0]
robot.execute(action_chunk[:8])  # execute 8 of 16 steps
```
Grasp Quality Metrics
Epsilon (Ferrari-Canny) metric: radius of the largest wrench ball contained in the grasp wrench space (GWS) — the gold-standard quality measure. Cheaper proxy: Q = min singular value of the grasp matrix G. Force-closure: grasp can resist arbitrary external wrench (requires friction). Form-closure: pure geometry, no friction needed.
Grasp Pose Estimation
GraspNet-1Billion: CNN predicts 6-DOF grasp poses from point clouds. AnyGrasp: foundation model for grasping, 1M+ diverse objects. Contact-GraspNet: estimates parallel-jaw grasps from depth images. GPD (Grasp Pose Detection): evaluates sampled antipodal grasp candidates.
In-Hand Manipulation
OpenAI Dactyl: PPO + LSTM + domain randomization; solved cube reorientation (2018) and a Rubik's cube (2019) with the 24-DOF Shadow Hand. Key: thousands of years of simulated experience via massive parallelization. Grasp state estimated from fingertip sensing + visual tracking. Enables rotation, regrasping, fine assembly.
Task and Motion Planning (TAMP)
Integrates symbolic task planning (PDDL) with continuous motion planning. PDDLStream: streams geometric samplers into task planner. TAMP solves long-horizon manipulation: grasp→place→push sequences. Challenges: combinatorial search, constraint satisfaction.
Deformable Object Manipulation
Cloth, cables, food: complex physical simulation. DiffCloth, PyBullet-Cloth, Flex. Challenges: partial observability, high-dimensional state. Approaches: point cloud tracking (PlasticineLab), graph neural networks, visual imitation. Active research area.
6-DOF Object Pose Estimation
FoundPose, FoundationPose (NVIDIA 2024): generalize to novel objects from a CAD model or a few RGB-D reference images. DenseFusion: color+depth fusion. PVN3D: 3D keypoint voting. OnePose: structure-from-motion based, no CAD model needed (OnePose++ for textureless objects). Critical for precise pick-and-place.
| Algorithm | Completeness | Optimality | High-DOF | Real-Time | Best For | Complexity |
|---|---|---|---|---|---|---|
| RRT (sampling) | Prob. | No | Yes | Medium | Single-query, cluttered | O(n log n) |
| RRT* (sampling) | Prob. | Asympt. | Yes | Slow | Optimal paths, offline | O(n log n) |
| PRM (sampling) | Prob. | No | Yes | Fast (query) | Multi-query, static env | O(n² log n) |
| A* (search) | Yes | Yes | No | Medium | 2D/3D grid navigation | O(b^d) |
| D* Lite (search) | Yes | Yes | No | Fast replan | Dynamic environments | O(k log k) |
| CHOMP (optimization) | Local | Local | Yes | Medium | Smooth, collision-free | O(n·iter) |
| MPPI (sampling) | Prob. | Local | Yes | GPU fast | Nonlinear dynamics, GPU | O(N·H) parallel |
| RL policy (learning) | Stoch. | Approx. | Yes | Fast (infer) | Complex tasks, locomotion | O(1) inference |