Reproducible performance
The speed of metal.
ArkNet runs kernels on bare metal and routes workloads across a competitive provider market. The benchmarks below are designed to be reproduced end-to-end with published scripts.
Warm p95: 48ms time to first token
Throughput: 184 t/s on H100 PCIe
Cost: $0.65 per 1M output tokens
Benchmark: Cold start latency
Time to first token for a fresh 70B container. ArkNet uses snapshot-based restoration to reduce initialization overhead.
Best result: ArkNet (Warm)
Provider: Time (lower is better)
ArkNet (Warm): 48ms
ArkNet (Cold): 850ms
AWS Lambda (Provisioned Concurrency): 2400ms
Standard Container (K8s): 6500ms
Warm = pre-restored snapshot. Cold = fresh scheduling + restore.
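For orientation, here is a minimal sketch of one way to measure time to first byte at the client. The endpoint and request shape are placeholders, not the ArkNet API or the published scripts.

```ts
// Hypothetical TTFT probe: endpoint URL and request body are placeholders.
async function timeToFirstByte(endpoint: string, body: unknown): Promise<number> {
  const start = performance.now();
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.body) throw new Error("response has no body stream");
  const reader = res.body.getReader();
  await reader.read();                 // first chunk received = first byte at the client
  const elapsed = performance.now() - start;
  await reader.cancel();               // stop reading; only the first chunk matters here
  return elapsed;
}

// p95 over repeated runs, matching the "p95 unless noted" convention below.
function p95(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))];
}
```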
Benchmark: Inference cost (Llama-3-70B)
Cost per 1M output tokens. ArkNet’s market pricing is driven by competitive supply and real-time demand.
Best result: ArkNet spot market
Provider: Price (lower is better)
ArkNet spot market: $0.65
Together AI: $0.90
OpenAI (proxy): $2.50
Token pricing varies by region and model; publish your exact run parameters in PRs.
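When submitting a pricing baseline, normalize the raw bill to the same unit as the table above. A small sketch of the arithmetic; the sample figures are illustrative, not a rate card.

```ts
// Normalize an observed run to cost per 1M output tokens so providers and
// regions can be compared on the same axis.
function costPerMillionOutputTokens(totalCostUsd: number, outputTokens: number): number {
  return (totalCostUsd / outputTokens) * 1_000_000;
}

// Example: a run billed $0.208 for 320,000 output tokens
// normalizes to $0.65 per 1M output tokens.
console.log(costPerMillionOutputTokens(0.208, 320_000).toFixed(2)); // "0.65"
```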
Benchmark: Throughput (H100 PCIe)
Tokens per second on identical hardware. Ark’s kernel compilation optimizes memory access patterns with predictable scheduling.
Best result: Ark runtime
Provider: Tokens/sec (higher is better)
Ark runtime: 184 t/s
vLLM (optimized): 162 t/s
HuggingFace TGI: 145 t/s
Same GPU, same quantization, same input/output token budget.
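Keep the throughput formula identical across engines so the comparison stays apples-to-apples. A sketch of the arithmetic, with illustrative sample values:

```ts
// Output-token throughput: generated tokens divided by wall-clock decode time.
// Input tokens are excluded so the figure matches the axis above.
function outputTokensPerSecond(outputTokens: number, wallClockSeconds: number): number {
  return outputTokens / wallClockSeconds;
}

// Example: 128 output tokens generated in 0.696 s of decoding ≈ 184 t/s.
console.log(Math.round(outputTokensPerSecond(128, 0.696))); // 184
```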
Methodology
Reproduction details to keep comparisons fair, debuggable, and versioned.
Hardware
GPU: NVIDIA H100 80GB PCIe
Region: US-East
Driver: pinned version (see repo)

Workload
Model: Llama-3-70B-Instruct
Quant: FP8
I/O: 512 in / 128 out tokens

Network + definition
Cold start: request → first byte at client
SDK: arknet-js
Percentile: p95 unless noted

Reproducibility
Scripts: published + versioned
Config: pinned parameters
Outputs: raw logs committed

Submit a PR to add your environment as a baseline. Keep runs pinned and repeatable.
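A pinned run configuration might look like the following sketch; the field names are illustrative and simply mirror the methodology above, not a required schema.

```ts
// Hypothetical pinned run configuration, committed next to the raw logs.
const runConfig = {
  hardware: {
    gpu: "NVIDIA H100 80GB PCIe",
    region: "US-East",
    driver: "pinned (see repo)",
  },
  workload: {
    model: "Llama-3-70B-Instruct",
    quant: "FP8",
    inputTokens: 512,
    outputTokens: 128,
  },
  measurement: {
    sdk: "arknet-js",
    percentile: "p95",
    coldStart: "request to first byte at client",
  },
  reproducibility: {
    scripts: "published + versioned",
    outputs: "raw logs committed",
  },
} as const;

export default runConfig;
```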
Stop paying the virtualization tax.
Deploy kernels to deterministic hardware, validate outputs, and scale on demand.