Reproducible performance

The speed of metal.

ArkNet runs kernels on bare metal and routes workloads across a competitive compute market. The benchmarks below are designed to be reproduced end to end with published scripts.

Warm p95: 48 ms time to first token
Throughput: 184 t/s on H100 PCIe
Cost: $0.65 per 1M output tokens
Benchmark

Cold start latency

Time to first token for a fresh 70B container. ArkNet uses snapshot-based restoration to reduce initialization overhead.

Provider                      Time (lower is better)
ArkNet (Warm)                 48 ms
ArkNet (Cold)                 850 ms
AWS Lambda (Prov. Conc.)      2,400 ms
Standard Container (K8s)      6,500 ms

Warm = pre-restored snapshot. Cold = fresh scheduling + restore.
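Cold start here is measured request → first byte at the client. A minimal sketch of that measurement, using a stand-in generator in place of a real streaming response (`fake_stream` is hypothetical, for illustration only):

```python
import time

def time_to_first_token_ms(stream):
    """Milliseconds from call until the first chunk arrives (TTFT)."""
    start = time.perf_counter()
    for _chunk in stream:
        return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("stream yielded no tokens")

# Stand-in for a real streaming response: sleeps, then yields one token.
def fake_stream(delay_s):
    time.sleep(delay_s)
    yield "hello"

ttft = time_to_first_token_ms(fake_stream(0.05))  # roughly 50 ms
```

In a real run the stream would come from the serving endpoint, and the clock would start when the request is issued, so scheduling and restore time land inside the measurement.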
Benchmark

Inference cost (Llama-3-70B)

Cost per 1M output tokens. ArkNet’s market pricing is driven by competitive supply and real-time demand.

Provider              Price (lower is better)
ArkNet spot market    $0.65
Together AI           $0.90
OpenAI (proxy)        $2.50

Token pricing varies by region and model; publish your exact run parameters in PRs.
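Cost per 1M output tokens normalizes runs of different lengths to a comparable price. A sketch of the arithmetic (the run figures below are made-up examples, not measured data):

```python
def cost_per_million_output_tokens(total_cost_usd, output_tokens):
    """Normalize a run's total spend to a per-1M-output-token price."""
    return total_cost_usd * 1_000_000 / output_tokens

# Hypothetical run: 128,000 output tokens billed at $0.0832 total.
price = cost_per_million_output_tokens(0.0832, 128_000)  # ≈ $0.65
```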
Benchmark

Throughput (H100 PCIe)

Tokens per second on identical hardware. Ark’s kernel compilation optimizes memory access patterns with predictable scheduling.

Provider            Tokens/sec (higher is better)
Ark runtime         184 t/s
vLLM (optimized)    162 t/s
HuggingFace TGI     145 t/s

Same GPU, same quantization, same input/output token budget.
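One common way to compute decode-side tokens per second from a timed run is to exclude time to first token, so prefill does not dilute the decode rate (the numbers below are illustrative, not the benchmark data):

```python
def decode_tokens_per_second(output_tokens, total_s, ttft_s):
    """Tokens/sec during decode, excluding time to first token."""
    return output_tokens / (total_s - ttft_s)

# Illustrative run: 128 output tokens, 0.744 s total, 0.104 s TTFT.
tps = decode_tokens_per_second(128, 0.744, 0.104)  # ≈ 200 t/s
```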

Methodology

Reproduction details to keep comparisons fair, debuggable, and versioned.

Hardware
  GPU: NVIDIA H100 80GB PCIe
  Region: US-East
  Driver: pinned version (see repo)

Workload
  Model: Llama-3-70B-Instruct
  Quant: FP8
  I/O: 512 input / 128 output tokens

Network + definition
  Cold start: request → first byte at client
  SDK: arknet-js
  Percentile: p95 unless noted

Reproducibility
  Scripts: published + versioned
  Config: pinned parameters
  Outputs: raw logs committed
Submit a PR to add your environment as a baseline. Keep runs pinned and repeatable.
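Latency figures are p95 unless noted. A nearest-rank p95, as a minimal sketch (the exact estimator used in the published scripts may differ):

```python
import math

def p95(samples):
    """Nearest-rank 95th percentile of a list of latency samples."""
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-indexed nearest rank
    return ordered[rank - 1]

latencies_ms = list(range(1, 101))  # 1..100 ms
p95(latencies_ms)  # → 95
```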

Stop paying the virtualization tax.

Deploy kernels to deterministic hardware, validate outputs, and scale on demand.