Version: 1.1.0

Benchmarks

These numbers come from a reproducible go test -bench harness that lives in benchmarks/ in the repository. It measures Quark's per-operation overhead against a hand-written database/sql baseline and against three peer libraries: GORM (the reflect-ORM peer) plus ent and sqlc (the code-generation peers). All five implementations run on the same model, schema, data, and operations.

:::note Microbenchmarks, not production timings
The harness runs against in-memory SQLite, so it isolates ORM and driver CPU/allocation overhead, not disk or network I/O. Against a networked database, that overhead is a small fraction of round-trip latency, so do not read these microseconds as production request times. Numbers are also machine- and run-specific: treat the relative ratios as the signal and reproduce locally before drawing conclusions.
:::

What is measured

Five operations, chosen because they exercise the reflect-based hot paths (row scanning, insert/update building) that the optional code generator targets:

| Benchmark | Operation |
|---|---|
| InsertOne | Insert a single row |
| InsertBatch | Insert 100 rows in one batch |
| FindByPK | Select one row by primary key |
| ListWhere | Select up to 50 rows with a WHERE age >= ? filter |
| Update | Update one row (all non-PK columns) by primary key |

Each is implemented five ways against the same bench_users table:

  • Raw — hand-written database/sql with manual Scan/Exec; the floor.
  • Quark — the quark.For[T] API on the current reflect path.
  • GORM — the reflect-ORM peer.
  • ent — a code-generation ORM: a typed client generated from a schema, with a rich runtime (builders, mutations, hooks).
  • sqlc — a code generator that turns annotated SQL into thin typed wrappers over database/sql, with no runtime of its own.

ent and sqlc are the code-generation tier — the same tier Quark's own optional generated scanners/binders (shipped in v0.11.0) belong to.

:::note sqlc batch insert is not a multi-row VALUES
sqlc emits no variadic multi-row INSERT for SQLite (its :copyfrom / :batch helpers are pgx-only), so its InsertBatch is a transaction-wrapped loop of single-row inserts. That is a real API asymmetry versus the multi-row VALUES batch the other four use. Read sqlc's InsertBatch number with that in mind.
:::

How to reproduce

```bash
cd benchmarks
go test -run=^$ -bench=. -benchmem ./...
```

See the harness README for the full methodology, its limits, and how to add another ORM.
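For readers unfamiliar with where ns/op and allocs/op come from, the following is a minimal sketch of a Go benchmark run through the standard library's testing.Benchmark, which executes a benchmark function outside of go test. The workload is a stand-in: the row type, buildRow, and benchBuildRow are hypothetical names, and nothing here touches a database or reproduces the harness's code.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// row is a hypothetical stand-in for a scanned row; it is not the
// harness's real model.
type row struct {
	ID   int64
	Name string
	Age  int
}

// buildRow is a placeholder workload that allocates, so the allocation
// counter has something to report.
func buildRow(i int) *row {
	return &row{ID: int64(i), Name: "user-" + strconv.Itoa(i), Age: 20 + i%50}
}

// sink keeps the result reachable so the compiler cannot discard the work.
var sink *row

// benchBuildRow has the same shape as a BenchmarkXxx function in a _test.go
// file: the body runs b.N times and the framework picks b.N.
func benchBuildRow(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = buildRow(i)
	}
}

func main() {
	res := testing.Benchmark(benchBuildRow)
	fmt.Printf("%d iterations, %d ns/op, %d allocs/op\n",
		res.N, res.NsPerOp(), res.AllocsPerOp())
}
```

The figures printed by this sketch say nothing about any ORM. It only shows the mechanics behind the two metrics reported on this page.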

A representative run

Apple M4 Pro, macOS, go1.26.0 toolchain, modernc.org/sqlite v1.23.1, gorm.io/gorm v1.31.0, entgo.io/ent v0.14.6, sqlc v1.31.1, in-memory SQLite. Medians of -bench=. -benchmem -count=6, summarized with benchstat:

Time per operation (ns/op, lower is better):

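| Operation | Raw | Quark | GORM | ent | sqlc |
|---|---:|---:|---:|---:|---:|
| InsertOne | 6,572 | 12,940 | 19,120 | 13,080 | 6,009 |
| InsertBatch | 175,300 | 263,600 | 265,500 | 302,300 | 279,000 |
| FindByPK | 7,864 | 14,140 | 10,400 | 11,750 | 7,544 |
| ListWhere(50) | 33,900 | 66,540 | 54,360 | 45,330 | 35,770 |
| Update | 2,851 | 4,611 | 8,327 | 21,000 | 3,014 |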

Allocations per operation (allocs/op, lower is better):

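| Operation | Raw | Quark | GORM | ent | sqlc |
|---|---:|---:|---:|---:|---:|
| InsertOne | 20 | 61 | 78 | 77 | 21 |
| InsertBatch | 622 | 1,277 | 1,287 | 3,278 | 2,307 |
| FindByPK | 24 | 65 | 66 | 100 | 25 |
| ListWhere(50) | 365 | 468 | 705 | 756 | 374 |
| Update | 15 | 55 | 84 | 143 | 18 |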

Reading the numbers

  • Code generation alone does not put you at the floor; the absence of a runtime does. sqlc stays within roughly 10% of the raw database/sql floor on InsertOne, FindByPK, ListWhere, and Update because its generated code is thin wrappers with no runtime. Its InsertBatch is the exception at about 1.6× Raw, since it is a loop of single-row inserts rather than one multi-row statement. ent is also code-generated, but it carries a rich runtime (builders, mutations) and lands in the reflect class on writes: its Update is the slowest here (it does the most work per call) and its InsertBatch allocates the most. So the speed difference across these libraries tracks runtime and allocation design, not reflect-vs-codegen.
  • Quark, GORM, and ent are in the same performance class. None dominates: Quark is faster than GORM on inserts and updates (on InsertBatch the gap is under 1%, well inside run-to-run noise), while GORM and ent are faster on the single-row read and the filtered list. sqlc is faster than all three on every operation except InsertBatch, and it trades ergonomics for that (no batch helper on SQLite, hand-written SQL, no model lifecycle).
  • This is exactly why Quark's own code generation (shipped v0.11.0) was reframed. Profiled against this baseline, the generated scanners/binders recover only ~1–5% (benchmarks/PROFILING.md): they remove reflection but the cost is architectural allocation plus the driver round-trip — the same reason ent (codegen + a runtime) stays in the reflect class. So codegen in Quark is a type-safety feature, not a speedup, and the ADR-0002 ≥3× performance gate was retired (ADR-0017).
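The ratios discussed in this list can be recomputed from the ns/op table with a few lines of Go. This is a minimal sketch: the figures are copied from the representative run above, while the names opRow, table, and ratiosToRaw are illustrative and not part of the harness.

```go
package main

import "fmt"

// libs lists the implementations in the table's column order.
var libs = [5]string{"Raw", "Quark", "GORM", "ent", "sqlc"}

// opRow holds one row of the ns/op table (illustrative type).
type opRow struct {
	name string
	ns   [5]float64 // Raw, Quark, GORM, ent, sqlc
}

// table copies the ns/op figures from the representative run.
var table = []opRow{
	{"InsertOne", [5]float64{6572, 12940, 19120, 13080, 6009}},
	{"InsertBatch", [5]float64{175300, 263600, 265500, 302300, 279000}},
	{"FindByPK", [5]float64{7864, 14140, 10400, 11750, 7544}},
	{"ListWhere(50)", [5]float64{33900, 66540, 54360, 45330, 35770}},
	{"Update", [5]float64{2851, 4611, 8327, 21000, 3014}},
}

// ratiosToRaw divides every column by the Raw column, so Raw is always 1.
func ratiosToRaw(ns [5]float64) [5]float64 {
	var out [5]float64
	for i, v := range ns {
		out[i] = v / ns[0]
	}
	return out
}

func main() {
	for _, row := range table {
		r := ratiosToRaw(row.ns)
		fmt.Printf("%-14s", row.name)
		for i, lib := range libs {
			fmt.Printf(" %s=%.2fx", lib, r[i])
		}
		fmt.Println()
	}
}
```

Swapping in the numbers from a local run gives a quick way to check whether the relative picture holds on other hardware.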

These results are not a claim of fastest-in-class. The per-operation figures have run-to-run noise (a few vary ±10–25% between runs); treat the relative ratios as the signal and reproduce locally before drawing conclusions.