Probabilities vs rules
Fine-tuning nudges a model toward your DSL by training on examples. Afterwards, the model is more likely to produce valid syntax, but 'more likely' is not 'always', and in a language with strict structure a single wrong bracket or keyword invalidates the whole output. You're trusting a distribution never to roll the unlucky token.
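A rough back-of-the-envelope sketch shows how quickly 'more likely' erodes over a long output. It assumes, purely for illustration, that every token is independently valid with the same fixed probability. Real models don't behave that neatly, and the rates below are hypothetical, not measurements of any model.

```python
def whole_output_validity(per_token_validity: float, num_tokens: int) -> float:
    """Probability that every token in an output is valid.

    Assumes each token is independently valid with the same probability,
    a simplification made only for illustration.
    """
    return per_token_validity ** num_tokens


# Hypothetical per-token validity rates, not measurements of any model.
for rate in (0.99, 0.999, 0.9999):
    for length in (50, 200, 1000):
        p = whole_output_validity(rate, length)
        print(f"per-token {rate:.4f}, {length:>4} tokens: {p:.1%} of outputs parse")
```

Under these assumptions, a 99% per-token rate leaves only about 13% of 200-token outputs parseable, which is why a small per-token error rate matters so much for strict languages.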
Constrained decoding changes the question from 'how likely' to 'is it allowed'. The grammar becomes a mask applied to the model's choices at every step, so tokens that would break the syntax are removed before sampling. As long as generation runs to completion rather than being cut off by a token limit, the result can't be syntactically wrong, because the wrong paths are never available. This is the same idea behind GBNF grammars in llama.cpp, Outlines, and XGrammar.
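The masking step can be sketched in a few lines. This is a toy illustration, not how dslai or any of the libraries above implement it: the vocabulary, the list grammar (written as a small state machine), and the `fake_logits` stand-in for a model are all hypothetical.

```python
import math
import random

VOCAB = ["[", "]", "item", ",", "<eos>"]

# Toy grammar for lists such as "[ item , item ]", written as a state machine:
# state -> {allowed token: next state}.
GRAMMAR = {
    "start": {"[": "open"},
    "open": {"item": "after_item", "]": "closed"},
    "after_item": {",": "after_comma", "]": "closed"},
    "after_comma": {"item": "after_item"},
    "closed": {"<eos>": "done"},
}


def fake_logits(rng: random.Random) -> list[float]:
    """Stand-in for a model: one arbitrary score per vocabulary token."""
    return [rng.gauss(0.0, 1.0) for _ in VOCAB]


def sample_masked(logits, allowed, rng):
    """Sample a token after removing every token the grammar disallows."""
    masked = [l if tok in allowed else -math.inf for tok, l in zip(VOCAB, logits)]
    weights = [math.exp(l) for l in masked]  # exp(-inf) == 0.0, so it is never drawn
    return rng.choices(VOCAB, weights=weights, k=1)[0]


def generate(seed: int, max_tokens: int = 10_000) -> list[str]:
    """Decode until the grammar accepts; the end-of-sequence token is not returned."""
    rng = random.Random(seed)
    state, out = "start", []
    for _ in range(max_tokens):
        token = sample_masked(fake_logits(rng), GRAMMAR[state], rng)
        state = GRAMMAR[state][token]
        if state == "done":
            return out
        out.append(token)
    # A token limit is the one way a constrained output can still end up incomplete.
    raise RuntimeError("token limit reached before the grammar could finish")


for seed in range(3):
    print(" ".join(generate(seed)))
```

Even though the scores are random noise, every completed output is a well-formed list, because the only tokens ever sampled are ones the current grammar state permits.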
The data problem
Fine-tuning needs a training set, typically hundreds to thousands of examples to learn a grammar well. Most teams with a custom DSL have a grammar file and a handful of snippets, not a labelled corpus, so you'd have to synthesise that data before fine-tuning could do anything.
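One common way to bootstrap such a corpus is to sample random derivations from the grammar itself. The sketch below assumes a hypothetical toy rule language and a simple depth limit; it is not a recipe from dslai, and the shortest-production fallback only guarantees termination for grammars shaped like this one.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of alternative productions.
# Symbols that are not keys of the dict are treated as literal tokens.
TOY_GRAMMAR = {
    "rule": [["when", "cond", "then", "action"]],
    "cond": [["field", "op", "value"], ["(", "cond", "and", "cond", ")"]],
    "field": [["temperature"], ["humidity"]],
    "op": [[">"], ["<"], ["=="]],
    "value": [["10"], ["25"], ["80"]],
    "action": [["alert"], ["log"]],
}


def expand(symbol, rng, depth=0, max_depth=4):
    """Expand a symbol into a list of literal tokens by random derivation."""
    if symbol not in TOY_GRAMMAR:
        return [symbol]
    options = TOY_GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, take the shortest production so recursion ends.
        options = [min(options, key=len)]
    tokens = []
    for part in rng.choice(options):
        tokens.extend(expand(part, rng, depth + 1, max_depth))
    return tokens


def synthesise(n, seed=0):
    """Return n syntactically valid, semantically arbitrary samples."""
    rng = random.Random(seed)
    return [" ".join(expand("rule", rng)) for _ in range(n)]


for sample in synthesise(3):
    print(sample)
```

Samples produced this way are valid by construction but carry no notion of what a sensible rule looks like, and they come without the paired prompts a fine-tuning set usually needs, so this covers only part of the data problem.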
Constraining needs only the grammar you already have. There's no training run, no GPU, and nothing to label — which is why dslai's playground can demonstrate the guarantee in your browser the moment you paste a grammar.
Where fine-tuning still earns its place
Constraint guarantees syntax, not taste. If you need a small, cheap, self-hosted model to produce idiomatic, semantically sensible DSL — good names, sensible defaults, the patterns your team actually uses — fine-tuning on top of the constraint can lift quality. The key is to serve it multi-LoRA (one base model, an adapter swapped in per request) rather than as a dedicated warm GPU per customer, or the economics don't work.
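The multi-LoRA idea can be illustrated with a toy forward pass: one set of base weights is loaded once, and each request adds a small low-rank update chosen by name. Everything here is hypothetical (the 3x3 weights, the rank-1 adapters, the customer names), and production serving stacks batch and schedule adapters far more carefully than this.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Shared base weights, loaded once for every customer (hypothetical values).
BASE_WEIGHTS = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Per-customer low-rank adapters: A is (rank x in), B is (out x rank), rank 1 here.
ADAPTERS = {
    "customer_a": {"A": [[1.0, 0.0, 0.0]], "B": [[0.5], [0.0], [0.0]]},
    "customer_b": {"A": [[0.0, 1.0, 1.0]], "B": [[0.0], [0.0], [2.0]]},
}


def forward(x, adapter_name=None):
    """Compute W x, plus the low-rank update B (A x) if an adapter is requested."""
    y = matvec(BASE_WEIGHTS, x)
    if adapter_name is not None:
        adapter = ADAPTERS[adapter_name]
        delta = matvec(adapter["B"], matvec(adapter["A"], x))
        y = [base + d for base, d in zip(y, delta)]
    return y


request = [1.0, 2.0, 3.0]
print(forward(request))                # base model only
print(forward(request, "customer_a"))  # same base, customer A's adapter
print(forward(request, "customer_b"))  # same base, customer B's adapter
```

The point of the design is that the adapters are tiny compared with the base weights, so many customers can share one resident model instead of each paying for a dedicated GPU.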
Fine-tuning vs constrained decoding for DSL generation
| | Fine-tuning | Constrained decoding (dslai) |
|---|---|---|
| Validity guarantee | More likely, never certain | Valid by construction |
| What you need | Hundreds–thousands of examples | Just your grammar |
| Cost to try | GPU training run | Runs in the browser, free |
| Time to first result | Hours | Seconds |
| Best at | Semantic taste on small models | Syntactic correctness |
Frequently asked
- Does constrained decoding make fine-tuning pointless?
- No. Constraining guarantees syntax; fine-tuning can still improve semantic quality on small self-hosted models. dslai uses the constraint as the foundation and treats fine-tuning as an optional upsell on top.
- Can't I just prompt a frontier model with my grammar?
- Often it'll follow it — but 'often' isn't a guarantee, and the model can still emit a token that breaks your syntax. A decoding constraint removes those tokens entirely, so the output always parses.
- Do I need a GPU to use constrained decoding?
- No. dslai's playground compiles your grammar and demonstrates guaranteed-valid generation in your browser, with no account and no GPU.
- Which models support a grammar constraint?
- Any open model you host through a runtime or library that accepts a GBNF-style grammar (llama.cpp, Outlines, XGrammar, and similar). The same grammar you test in the playground is what you hand the model in production.
Last updated June 7, 2026