Ark Compiler v0.4.1

Write scalar. Execute parallel.

Ark is a compute-native language that treats tensors, hardware cost, and distributed state as first-class primitives. You write clear intent—the compiler produces hyper-optimized kernels and deterministic grid dispatch.

Type-level Shapes
Tensor<T, N>
Dims mathematically proven at compile time
Cost-aware Compile
VRAM Tracking
Fails fast before grid deployment
Portable Targets
PTX / WASM / SPIR-V
One pure source, multiple backends
The hard way (CUDA/C++)
Manual thread grids, raw pointers, host-to-device copies, and obscure launch parameters.
matmul.cu
__global__ void matrixMul(float *A, float *B, float *C, int N) {
  int row = blockIdx.y * blockDim.y + threadIdx.y;
  int col = blockIdx.x * blockDim.x + threadIdx.x;
  float sum = 0.0f;
  if (row < N && col < N) {
    for (int i = 0; i < N; i++) {
      sum += A[row * N + i] * B[i * N + col];
    }
    C[row * N + col] = sum;
  }
}
// + 100 lines of brittle memory management...
Compute Native
The Ark way
Write strict mathematical intent; the compiler handles tiling, fusion, and grid scheduling.
matmul.ark
// Matrix multiplication
fn[gpu] matmul(a: Tensor<f32, 2>, b: Tensor<f32, 2>) -> Tensor<f32, 2> {
  // Ark natively handles tiling, shared memory layout,
  // and deterministic kernel dispatch.
  return a @ b;
}
Zero-overhead

Forget malloc and cudaMemcpy.

High-performance kernels shouldn’t require hand-rolled pointer arithmetic, manual barrier synchronization, or endless launch tuning. Ark keeps the source logic mathematically pure, while the compiler and runtime handle optimal memory placement and hardware scheduling.

  • Zero-cost tensor abstractions (no manual launch params)
  • Compile-time verification of multidimensional shapes
  • Deterministic kernels across heterogeneous GPUs
  • Placement hints are explicit, readable, and enforceable
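Put together, a kernel that would need explicit allocation and transfer code in CUDA reduces to its math. The sketch below is illustrative: it reuses the `fn[gpu]` and `Tensor<f32, 2>` forms shown above, and the element-wise `*` and `+` operators are assumed.

```
// Illustrative sketch: no allocation or copy code is written by hand.
// Host-to-device movement and result placement are decided by the compiler.
fn[gpu] scale_and_add(x: Tensor<f32, 2>, y: Tensor<f32, 2>) -> Tensor<f32, 2> {
  // No cudaMalloc / cudaMemcpy, no launch parameters.
  return (x * 2.0) + y;
}
```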
Runtime placement is a one-liner
Keep your functions pure. When you call them, append a runtime hint: either an abstract network preset or a concrete local target.
Abstract Preset
let y = matmul(a, b) @runtime preset("prod");
Explicit Local Target
let y = matmul(a, b) @runtime { target: "gpu:0" };

Implicit parallelism

Write code as if it runs sequentially. The Ark compiler analyzes dependencies and mathematically proves how to parallelize operations across thousands of GPU cores.

  • Dependency analysis
  • Auto-tiling + loop fusion
  • Predictable hardware scheduling
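As a sketch of what dependency analysis buys you, consider two operations that share inputs but not results. The syntax reuses the forms shown above; the scheduling behavior described in the comments is what the compiler claims to prove, not something the source code spells out.

```
// Written as if sequential; the compiler proves c and d are independent,
// so they may run concurrently before the final add joins both chains.
fn[gpu] pipeline(a: Tensor<f32, 2>, b: Tensor<f32, 2>) -> Tensor<f32, 2> {
  let c = a @ b;   // depends only on a, b
  let d = a + b;   // independent of c: parallelizable
  return c + d;    // single join point
}
```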

Resource aware

The type system tracks exact VRAM footprints and hardware constraints. If your requested tensor allocations exceed the target's capacity, compilation fails instantly.

  • VRAM usage estimation
  • Constraint propagation
  • Fail-fast deployment grids
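The arithmetic behind the fail-fast check is simple: a 65536 × 65536 `f32` tensor needs 65536² × 4 bytes, roughly 16 GiB per operand. The sketch below is hypothetical: the `zeros` constructor and the exact diagnostic text are assumptions, while the `@runtime` hint follows the placement syntax shown above.

```
// Hypothetical: zeros(...) and the diagnostic format are illustrative.
let big: Tensor<f32, 2> = zeros(65536, 65536);            // ~16 GiB of f32
let y = matmul(big, big) @runtime { target: "gpu:0" };    // gpu:0 holds 8 GiB

// The compiler propagates the footprint and rejects the build, e.g.:
//   error: requested allocation (~16 GiB per operand) exceeds
//          capacity of target "gpu:0" (8 GiB)
```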

Hardware agnostic

Target Nvidia, AMD, and edge devices from one pure codebase. Ark serves as a universal intermediate representation with deterministic, bit-exact lowering.

  • Multi-backend lowering
  • Stable Abstract IR
  • Portable cryptographic artifacts
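In practice, multi-backend lowering means the same pure function can be dispatched to different hardware by changing only the runtime hint. The target strings below are illustrative assumptions; only the `@runtime { target: ... }` form comes from the examples above.

```
// One pure definition of matmul; only the placement hint changes.
let on_nvidia = matmul(a, b) @runtime { target: "gpu:0" };       // lowers to PTX
let on_edge   = matmul(a, b) @runtime { target: "wasm:local" };  // lowers to WASM
```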
The Ark Pipeline

A compiler built for correctness, then ruthless optimization.

Ark Source
Frontend (HIR)
Optimizer (MIR)
Backends

High-Level Intermediate Representation

The compiler builds an abstract syntax tree and lowers it to HIR. This is where the heavy mathematical lifting happens: verifying multidimensional shapes, applying borrow-checker rules, and enforcing isolation constraints.

  • Shape inference & validation
  • Borrow & lifetime checking
  • Semantic correctness bounds
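Shape checking at the HIR stage means a rank mismatch never reaches the device. A minimal illustrative sketch, reusing only the `Tensor<T, N>` and `@` forms shown earlier:

```
// Declared to return a rank-2 tensor, but a matrix-vector product is rank 1.
fn[gpu] bad(a: Tensor<f32, 2>, v: Tensor<f32, 1>) -> Tensor<f32, 2> {
  return a @ v;  // inferred Tensor<f32, 1>: rejected during HIR checking
}
```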

Ready to port your kernels?

Install the compiler, run through the interactive language tour, and deploy your first mathematically proven kernel to the grid.