Reproducible performance
The speed of metal.
ArkNet runs kernels on bare metal and routes workloads across a competitive provider market. The benchmarks below are designed to be reproduced end-to-end with published scripts.
Warm p95: 48ms time to first token
Throughput: 184 t/s on H100 PCIe
Cost: $0.65 per 1M output tokens
Benchmark: Cold start latency
Time to first token for a fresh 70B container. ArkNet uses snapshot-based restoration to reduce initialization overhead.
Best result: ArkNet (Warm)
Provider: Time (lower is better)
ArkNet (Warm): 48ms
ArkNet (Cold): 850ms
AWS Lambda (Provisioned Concurrency): 2400ms
Standard Container (K8s): 6500ms
Warm = pre-restored snapshot. Cold = fresh scheduling + restore.
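For orientation, here is a minimal sketch of one way to measure time to first byte at the client. The endpoint and request shape are placeholders, not the ArkNet API or the published scripts.

```ts
// Hypothetical TTFT probe: endpoint URL and request body are placeholders.
async function timeToFirstByte(endpoint: string, body: unknown): Promise<number> {
  const start = performance.now();
  const res = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.body) throw new Error("response has no body stream");
  const reader = res.body.getReader();
  await reader.read();                 // first chunk received = first byte at the client
  const elapsed = performance.now() - start;
  await reader.cancel();               // stop reading; only the first chunk matters here
  return elapsed;
}

// p95 over repeated runs, matching the "p95 unless noted" convention below.
function p95(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.min(sorted.length - 1, Math.floor(sorted.length * 0.95))];
}
```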
Benchmark: Inference cost (Llama-3-70B)
Cost per 1M output tokens. ArkNet’s market pricing is driven by competitive supply and real-time demand.
Best result: ArkNet spot market
Provider: Price (lower is better)
ArkNet spot market: $0.65
Together AI: $0.90
OpenAI (proxy): $2.50
Token pricing varies by region and model; publish your exact run parameters in PRs.
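When submitting a pricing baseline, normalize the raw bill to the same unit as the table above. A small sketch of the arithmetic; the sample figures are illustrative, not a rate card.

```ts
// Normalize an observed run to cost per 1M output tokens so providers and
// regions can be compared on the same axis.
function costPerMillionOutputTokens(totalCostUsd: number, outputTokens: number): number {
  return (totalCostUsd / outputTokens) * 1_000_000;
}

// Example: a run billed $0.208 for 320,000 output tokens
// normalizes to $0.65 per 1M output tokens.
console.log(costPerMillionOutputTokens(0.208, 320_000).toFixed(2)); // "0.65"
```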
Benchmark: Throughput (H100 PCIe)
Tokens per second on identical hardware. Ark’s kernel compilation optimizes memory access patterns with predictable scheduling.
Best result: Ark runtime
Provider: Tokens/sec (higher is better)
Ark runtime: 184 t/s
vLLM (optimized): 162 t/s
HuggingFace TGI: 145 t/s
Same GPU, same quantization, same input/output token budget.
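Keep the throughput formula identical across engines so the comparison stays apples-to-apples. A sketch of the arithmetic, with illustrative sample values:

```ts
// Output-token throughput: generated tokens divided by wall-clock decode time.
// Input tokens are excluded so the figure matches the axis above.
function outputTokensPerSecond(outputTokens: number, wallClockSeconds: number): number {
  return outputTokens / wallClockSeconds;
}

// Example: 128 output tokens generated in 0.696 s of decoding ≈ 184 t/s.
console.log(Math.round(outputTokensPerSecond(128, 0.696))); // 184
```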
Methodology
Reproduction details to keep comparisons fair, debuggable, and versioned.
Hardware
GPU: NVIDIA H100 80GB PCIe
Region: US-East
Driver: pinned version (see repo)

Workload
Model: Llama-3-70B-Instruct
Quant: FP8
I/O: 512 in / 128 out tokens

Network + definition
Cold start: request → first byte at client
SDK: arknet-js
Percentile: p95 unless noted

Reproducibility
Scripts: published + versioned
Config: pinned parameters
Outputs: raw logs committed

Submit a PR to add your environment as a baseline. Keep runs pinned and repeatable.
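A pinned run configuration might look like the following sketch; the field names are illustrative and simply mirror the methodology above, not a required schema.

```ts
// Hypothetical pinned run configuration, committed next to the raw logs.
const runConfig = {
  hardware: {
    gpu: "NVIDIA H100 80GB PCIe",
    region: "US-East",
    driver: "pinned (see repo)",
  },
  workload: {
    model: "Llama-3-70B-Instruct",
    quant: "FP8",
    inputTokens: 512,
    outputTokens: 128,
  },
  measurement: {
    sdk: "arknet-js",
    percentile: "p95",
    coldStart: "request to first byte at client",
  },
  reproducibility: {
    scripts: "published + versioned",
    outputs: "raw logs committed",
  },
} as const;

export default runConfig;
```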
Stop paying the virtualization tax.
Deploy kernels to deterministic hardware, validate outputs, and scale on demand.