
22x Faster Builds: Inside GALA's Compilation Performance Journey

How profiling, batch analysis, and content-addressed caching took multi-file compilation from 44.7s to 2.0s


In GALA 0.24.0, we reduced compilation time for multi-file packages from 44.7 seconds to 2.0 seconds -- a 22x improvement. This post covers what was slow, how we found it, and what we did about it.

The Problem

GALA transpiles .gala files to .go files. For single-file packages, this is fast -- parse, analyze, transform, emit. But real projects have multi-file packages. A server might have 7 files in the same package, each file needing to know about types, methods, and sealed types defined in the other 6.

Before 0.24.0, each file in a package was transpiled as a separate process invocation. The gala transpile command was called once per file, and each invocation:

  1. Parsed the target file
  2. Scanned all sibling files to extract type information
  3. Analyzed the full dependency graph (imports, Go type inference, sealed type metadata)
  4. Transformed the AST to Go
  5. Emitted the output

For a 7-file package, steps 2 and 3 happened 7 times, each time re-parsing the same siblings and re-resolving the same imports. The Go type inference step -- which shells out to go list to resolve return types and method signatures from Go packages -- was particularly expensive, adding hundreds of milliseconds per invocation.
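To make the redundancy concrete, here is a back-of-the-envelope sketch (illustrative arithmetic, not actual GALA code): with file-at-a-time invocation, every file in an n-file package gets parsed once as a target and n-1 more times as someone else's sibling.

```go
package main

import "fmt"

func main() {
	const n = 7 // files in the gala-server package
	// File-at-a-time: each of the n invocations parses the target
	// plus its n-1 siblings, so every file is parsed n times.
	perFile := n * n
	// Batch mode: each file is parsed exactly once per package build.
	batch := n
	fmt.Printf("file-at-a-time: %d parses, batch: %d parses\n", perFile, batch)
}
```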

The gala-server package (7 files) took 44.7 seconds. Most of that was redundant work.

Profiling First

GALA 0.24.0 introduced GALA_PROFILE=1, an environment variable that enables compilation profiling. When set, the transpiler emits timing breakdowns for every phase:

[PROFILE] Parse:     12ms
[PROFILE] Analyze:   340ms
[PROFILE] Transform: 85ms
[PROFILE] Emit:      3ms
[PROFILE] Total:     440ms
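The mechanism behind this kind of output is simple to sketch. The GALA_PROFILE variable is from the post; everything else in this snippet (the helper name, the exact format string) is an assumption, not the actual transpiler internals:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// profiled runs a compilation phase and, when GALA_PROFILE=1 is set,
// prints how long it took. Hypothetical helper for illustration only.
func profiled(name string, phase func()) {
	start := time.Now()
	phase()
	if os.Getenv("GALA_PROFILE") == "1" {
		fmt.Printf("[PROFILE] %-10s %dms\n", name+":", time.Since(start).Milliseconds())
	}
}

func main() {
	// Stub phases standing in for parse/analyze/transform/emit.
	profiled("Parse", func() { time.Sleep(10 * time.Millisecond) })
	profiled("Analyze", func() { time.Sleep(30 * time.Millisecond) })
}
```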

Running this across all 7 files revealed the pattern immediately: analysis dominated. Each file spent ~340ms in analysis, and the analysis for each file was doing nearly identical work -- re-parsing sibling files, re-resolving Go types, re-building sealed type metadata.

The profiling data made the optimization path obvious. Without it, we might have guessed wrong and optimized the transform phase (which was already fast) or the parser (which was negligible).

Lesson: always profile before optimizing. The GALA project enforces this as policy -- the memory file literally says "always profile with GALA_PROFILE=1 before optimizing."

Solution 1: BatchAnalyzer

The first optimization was architectural: share analysis state across files in the same package.

The BatchAnalyzer takes all files in a package at once. It parses each file, then runs a single unified analysis pass that:

  • Extracts type declarations, method signatures, and sealed type definitions from all files
  • Runs Go type inference once for the entire package's import set
  • Builds a shared metadata cache that every file can read from

Instead of 7 analysis passes (one per file), there is now 1 analysis pass for the whole package. The analysis result is then distributed to each file's transformer.

The implementation required refactoring the analyzer's entry point. Previously, NewGalaAnalyzer accepted a single file and optional sibling file paths. The new NewBatchAnalyzer accepts all files upfront and returns a shared analysis context:

Before:  7 files x 1 analyzer each  = 7 full analyses
After:   7 files x 1 shared analyzer = 1 full analysis + 7 lookups
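In code, the API shift looks roughly like the following. This is a hypothetical sketch of the shape described above; the real NewBatchAnalyzer signature and context fields may differ:

```go
package main

import "fmt"

// SharedContext is the package-wide analysis result, built once
// and read by every file's transformer.
type SharedContext struct {
	TypeDecls map[string]string // type name -> declaring file
}

// BatchAnalyzer receives all files in the package upfront
// (here stubbed as file name -> declared type names).
type BatchAnalyzer struct {
	files map[string][]string
}

func NewBatchAnalyzer(files map[string][]string) *BatchAnalyzer {
	return &BatchAnalyzer{files: files}
}

// Analyze runs one unified pass over every file and returns the
// shared context, instead of one full analysis per file.
func (b *BatchAnalyzer) Analyze() *SharedContext {
	ctx := &SharedContext{TypeDecls: map[string]string{}}
	for file, decls := range b.files {
		for _, d := range decls {
			ctx.TypeDecls[d] = file
		}
	}
	return ctx
}

func main() {
	ba := NewBatchAnalyzer(map[string][]string{
		"server.gala":  {"Server"},
		"handler.gala": {"Handler"},
	})
	ctx := ba.Analyze()
	fmt.Println(ctx.TypeDecls["Handler"]) // handler.gala
}
```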

Solution 2: Disk Cache

Even with batch analysis, the Go type inference step (resolving return types, struct fields, and method signatures from Go standard library and third-party packages) was still expensive for the first compilation of a package.

GALA 0.24.0 introduced a disk cache at .gala/cache/. The cache stores analysis results keyed by content hash -- a SHA-256 of the file contents plus the contents of all its imports. If the file and its dependencies haven't changed, the cached analysis is reused.

The cache format is simple: JSON files named by their content hash. On a warm cache, the analysis phase drops from ~340ms to ~5ms per file, because no Go type inference or sibling parsing is needed.

Cache invalidation is content-addressed, so it is always correct: if any source file changes, its hash changes, and the cache misses. No timestamps, no file watchers, no stale cache bugs.
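A content-addressed key can be sketched in a few lines. This follows the scheme described above (SHA-256 over the file contents plus the contents of its imports), though the hashing details in GALA's actual cache may differ:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey hashes a file's contents together with the contents of
// everything it imports, so any upstream change yields a new key.
func cacheKey(file []byte, imports [][]byte) string {
	h := sha256.New()
	h.Write(file)
	for _, imp := range imports {
		h.Write(imp)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	k1 := cacheKey([]byte("type User struct{}"), nil)
	k2 := cacheKey([]byte("type User struct{ ID int }"), nil)
	fmt.Println(k1 != k2) // different contents -> different keys
}
```

Because the key is derived purely from content, a cache entry can never be stale: it is either an exact match for the current inputs or it simply isn't found.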

Solution 3: Batch Transpilation

The final piece was eliminating the process-per-file overhead. Before 0.24.0, the build system (Bazel or gala build) invoked the transpiler binary once per .gala file. Each invocation paid the cost of process startup, Go runtime initialization, and flag parsing.

Batch transpilation runs a single transpiler process for all files in a package. The process:

  1. Parses all input files
  2. Runs the BatchAnalyzer (one shared analysis pass)
  3. Transforms each file using the shared analysis context
  4. Emits all .go files

This eliminated 6 process startups (for a 7-file package) and enabled the shared analysis to happen in-memory rather than being serialized to disk and re-read.
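The four steps above can be sketched as a single in-process driver. The phase functions here are stand-ins, not GALA internals:

```go
package main

import "fmt"

// transpilePackage runs the whole package through one process:
// parse everything, analyze once, then transform and emit each file
// against the shared result (all phases stubbed for illustration).
func transpilePackage(files []string) []string {
	asts := files // 1. parse all inputs (stubbed as identity)
	// 2. one shared analysis pass for the entire package
	shared := fmt.Sprintf("analysis of %d files", len(asts))
	// 3-4. transform and emit each file using the shared context
	out := make([]string, 0, len(asts))
	for _, a := range asts {
		out = append(out, a+".go ("+shared+")")
	}
	return out
}

func main() {
	for _, f := range transpilePackage([]string{"server", "router"}) {
		fmt.Println(f)
	}
}
```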

Results

Benchmarked on the gala-server package (7 .gala files):

Configuration                          Time    Speedup
0.23.x (file-at-a-time, no cache)      44.7s   baseline
Batch analysis, no cache               6.1s    7.3x
Batch analysis + disk cache (cold)     5.8s    7.7x
Batch analysis + disk cache (warm)     2.7s    16.6x
Batch transpilation + warm cache       2.0s    22x

The warm-cache number is the steady-state experience during development: edit a file, rebuild, and the unchanged files hit the cache while only the modified file gets re-analyzed.

For single-file packages (like most examples), compilation time is unchanged at ~200-400ms. The optimization specifically targets the multi-file case where redundant work was the bottleneck.

Lessons Learned

1. Profile before optimizing. The profiling data pointed directly at analysis as the bottleneck. Without it, we might have spent time optimizing the wrong phase.

2. Shared state beats repeated computation. The batch analyzer is conceptually simple -- do the work once, share the result -- but it required restructuring the analyzer's API from "single file in, analysis out" to "all files in, shared context out."

3. Content-addressed caching is worth the investment. It is simpler than timestamp-based caching (no clock skew issues, no "did the file actually change?" ambiguity) and it is always correct. The overhead of computing SHA-256 hashes is negligible compared to the analysis it replaces.

4. Process startup costs add up. For a language tool that gets invoked hundreds of times during a build, the cost of spawning a new process each time is real. Batch mode amortizes this to one process per package.

5. The optimization was straightforward once the data was clear. There was no clever algorithm, no sophisticated caching strategy. The 22x improvement came from eliminating obviously redundant work that the profiler made visible.

Try It

GALA 0.24.0+ is available now. To see compilation timing for your own packages:

GALA_PROFILE=1 gala build ./...

The playground at https://gala-playground.fly.dev transpiles instantly (single-file), but for multi-file projects, the batch transpilation makes a meaningful difference in the edit-compile-run cycle.

GALA From the Ground Up

Part 4 of 9

A nine-part technical blog series exploring GALA — a modern programming language that brings sealed types, pattern matching, monadic error handling, and immutable-by-default semantics to the Go ecosystem. Each post takes a single concept, shows the problem it solves with real code, compares it to idiomatic Go, and is honest about trade-offs. Written for Go developers who want more expressiveness and functional programmers who want Go's runtime.

Up next

Building a Reliable Transpiler: Lessons from 80+ Bug Fixes

What breaks when you map functional programming onto Go's type system, and how testing infrastructure keeps it fixed