---
title: "JAX"
author: "Chao Ma"
date: "2026-04-25"
---

Notes on the JAX ecosystem for machine learning, from the core transformation model to the surrounding training stack.

---

::: {.content-grid}

::: {.content-card}
**[Scaling Up](scaling-up.qmd)**
How distributed data parallelism, fully sharded data parallelism, tensor parallelism, and JAX sharding primitives fit together when scaling training.
:::

::: {.content-card}
**[JAX NumPy](jax-numpy.qmd)**
How JAX NumPy differs from NumPy: compiled execution, immutable arrays, explicit randomness, automatic vectorization, pytrees, and explicit sharding.
:::

::: {.content-card}
**[Introducing Flax NNX](introducing-flax-nnx.qmd)**
How Flax NNX gives JAX a stateful neural-network API while preserving explicit RNG streams, JIT compilation, autodiff, and Optax updates.
:::

::: {.content-card}
**[JAX AI Stack](jax-ai-stack.qmd)**
How JAX, XLA, Flax NNX, Optax, Orbax, and Grain fit together into a modern training stack, plus the role of `jit`, `grad`, and `vmap`.
:::

:::