Reliably generating structured outputs is a key capability for modern LLM applications. Despite its growing adoption, a systematic evaluation of structured output generation is still lacking. With JSON Schema emerging as the standard format for structured data, we introduce JSONSchemaBench, a benchmark of around 10,000 real-world JSON schemas that capture a wide range of constraints and complexities. JSONSchemaBench helps measure the efficiency and coverage of a given structured output engine.
Figure 1: Comparison across various constrained-decoding frameworks by efficiency (speed of output generation), coverage (support for JSON Schema features), and quality (effects on underlying task accuracy).
Complementing JSONSchemaBench's view of real-world JSON schemas, we also develop MaskBench, a purely performance-oriented benchmark targeting mask computation times. This benchmark emphasizes results relevant to server-side deployments of constrained decoding.
Figure 2: Isolated performance of token mask computation (for server-side scenarios).
See the MaskBench folder.
JSONSchemaBench is built from a collection of real-world JSON schemas drawn from diverse sources, including GitHub, Kubernetes configurations, and API specifications. We started from the collections in json-schema-corpus and curated them heavily to ensure that the schemas are standard-compliant and satisfiable. To increase the diversity of the benchmark, we also added schemas from other sources, such as GlaiveAI function call schemas and Kubernetes schemas. We then categorized the schemas into datasets based on complexity and domain. The datasets are as follows:
Dataset | Category | Count |
---|---|---|
GlaiveAI-2K | Function Call | 1707 |
Github-Trivial | Misc | 444 |
Github-Easy | Misc | 1943 |
Snowplow | Operational API | 403 |
Github-Medium | Misc | 1976 |
Kubernetes | Kubernetes API | 1064 |
Washington Post | Resource Access API | 125 |
Github-Hard | Misc | 1240 |
JSONSchemaStore | Misc | 492 |
Github-Ultra | Misc | 164 |
Total | | 9558 |
For statistics on the datasets and an overview of schema constraint features, please refer to the paper (https://arxiv.org/abs/2501.10868).
JSONSchemaBench is now available on the Hugging Face Hub. You can load it directly using the datasets library:
from datasets import load_dataset
dataset = load_dataset("epfl-dlab/JSONSchemaBench")
print(dataset)
Each dataset split contains:
- "json_schema": The schema definition.
- "unique_id": A unique identifier for the schema.
data
├── Github_easy
├── Github_hard
├── Github_medium
├── Github_trivial
├── Github_ultra
├── Glaiveai2K
├── JsonSchemaStore
├── Kubernetes
├── Snowplow
├── WashingtonPost
Each folder contains the JSON schema files of that dataset:
data
├── Github_easy
│   ├── track_calories_82bd0ec9.json
│   ├── track_calories_93d75421.json
│   ├── track_calories_be81cd27.json
│   ├── track_calories_d61851c9.json
│   ├── track_calories_d9be8839.json
│   ├── track_calories_ecbd9766.json
│   ├── track_expenses_a6fa070d.json
│   ├── track_expenses_c8268204.json
│   ├── track_fitness_activity_2989efaf.json
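For a quick overview of a local checkout, the folder layout above can be walked with the standard library. The following is a minimal sketch, assuming the `data/<dataset>/<schema>.json` layout shown above; `count_schemas` is a hypothetical helper and not part of the repository.

```python
import json
from pathlib import Path


def count_schemas(data_dir):
    """Return {dataset_folder_name: number_of_parseable_json_files}."""
    counts = {}
    for folder in sorted(Path(data_dir).iterdir()):
        if not folder.is_dir():
            continue  # ignore stray files next to the dataset folders
        parsed = 0
        for path in sorted(folder.glob("*.json")):
            try:
                with open(path, encoding="utf-8") as f:
                    json.load(f)
                parsed += 1
            except json.JSONDecodeError:
                # Files that are not valid JSON are skipped, not counted
                pass
        counts[folder.name] = parsed
    return counts
```

Calling `count_schemas("data")` from the repository root should then return one entry per dataset folder, which can be compared against the table above.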
Step 1. Load a JSON Schema from the GlaiveAI Dataset:
import json
with open('data/Glaiveai2K/search_restaurants_d4619845.json') as f:
    schema = json.load(f)
print(schema)
View Example JSON Schema
{
  "properties": {
    "cuisine": {
      "description": "The cuisine to search for",
      "type": "string"
    },
    "location": {
      "description": "The location to search for restaurants",
      "type": "string"
    },
    "price_range": {
      "properties": {
        "max_price": {
          "description": "The maximum price range for restaurants",
          "type": "number"
        },
        "min_price": {
          "description": "The minimum price range for restaurants",
          "type": "number"
        }
      },
      "required": [
        "min_price",
        "max_price"
      ],
      "type": "object"
    }
  },
  "required": [
    "location",
    "cuisine",
    "price_range"
  ],
  "type": "object"
}
Step 2. Create a prompt to generate structured output:
prompt = f"Generate a function call that adheres to the following schema: {schema}"
# Feel free to include few-shot examples or additional context to improve the output.
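Note that interpolating the dict directly into an f-string embeds Python's `repr` of the schema (single quotes, `True`/`False`), which is not valid JSON text. A minimal sketch of an alternative that serializes the schema with `json.dumps` first is given below; `build_prompt` is a hypothetical helper name, and the instruction wording is the one used above.

```python
import json


def build_prompt(schema: dict) -> str:
    """Embed the schema in the prompt as JSON text rather than a Python repr."""
    schema_text = json.dumps(schema, indent=2)
    return (
        "Generate a function call that adheres to the following schema:\n"
        f"{schema_text}"
    )
```

Whether this changes output quality depends on the model; it mainly ensures that the schema shown to the model matches the schema enforced by the engine.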
Step 3. Use a Structured Output Generation Engine to generate output:
Below are examples for various structured output generation engines. For further details, refer to the official documentation of each engine.
View Code Example
Requires the openai library and a valid OpenAI API key. Install the library using:
pip install openai
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
# OpenAI requires "additionalProperties": false in the schema and all nested sub-schemas.
schema["additionalProperties"] = False
schema["properties"]["price_range"]["additionalProperties"] = False
openai_response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "user", "content": prompt}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "default",
            "strict": True,
            "schema": schema
        }
    }
)
output:str = openai_response.choices[0].message.content
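The two manual `additionalProperties` assignments above only cover this particular schema. For arbitrary schemas, the same adjustment can be applied recursively. This is a rough sketch under stated assumptions: `close_objects` is a hypothetical helper, it treats a nested dict as an object sub-schema only when it has `"type": "object"` or a `properties` key, and it does not address any other restriction of OpenAI's strict mode.

```python
import copy


def close_objects(schema):
    """Return a copy of `schema` with additionalProperties=False on every object sub-schema."""

    def visit(node):
        if isinstance(node, dict):
            # Heuristic: a dict with type "object" or a "properties" key is an object schema
            if node.get("type") == "object" or "properties" in node:
                node["additionalProperties"] = False
            for value in node.values():
                visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    result = copy.deepcopy(schema)  # leave the caller's schema untouched
    visit(result)
    return result
```

Because the walk visits every nested dict, data inside `enum`, `const`, or `default` that happens to look like an object schema would also be modified, and a property literally named `properties` would be misread; a production version would need to follow the JSON Schema keywords explicitly.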
View Code Example
Requires the google-generativeai library and a valid Google API key. Install the library using:
pip install google-generativeai
import os
import google.generativeai as genai
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(contents=prompt, generation_config=genai.GenerationConfig(
response_mime_type="application/json", response_schema=schema
))
output:str = response.text
View Code Example
Requires the guidance and transformers libraries. Install them using:
pip install guidance transformers
import guidance
operator = guidance.json(schema=schema, name='json_generation')
guidance_model = guidance.models.Transformers('meta-llama/Llama-3.2-1B-Instruct')
response = guidance_model + prompt + operator
output:str = response["json_generation"]
View Code Example
Requires the xgrammar and transformers libraries. Install them using:
pip install xgrammar transformers
import xgrammar
from transformers import AutoTokenizer, AutoModelForCausalLM
hf_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
hf_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
tokenizer_info = xgrammar.TokenizerInfo.from_huggingface(
hf_tokenizer, vocab_size=hf_model.config.vocab_size
)
grammar_compiler = xgrammar.GrammarCompiler(
tokenizer_info
)
compiled_grammar = grammar_compiler.compile_json_schema(json.dumps(schema))
logit_processor = xgrammar.contrib.hf.LogitsProcessor(compiled_grammar)
model_inputs = hf_tokenizer(prompt, return_tensors="pt")
output_ids = hf_model.generate(**model_inputs, logits_processor=[logit_processor], max_length=200)
generated_ids = output_ids[0][len(model_inputs.input_ids[0]) :]
output:str = hf_tokenizer.decode(generated_ids, skip_special_tokens=True)
View Code Example
Requires the outlines library. Install it using:
pip install outlines
import outlines
model = outlines.models.transformers(
    model_name="meta-llama/Llama-3.2-1B-Instruct"
)
generator = outlines.generate.json(
    model, schema_object=json.dumps(schema)
)
output:str = json.dumps(generator(prompt))
View Code Example
Requires the llama_cpp library. Install it using:
pip install llama-cpp-python
import llama_cpp
from llama_cpp.llama_grammar import LlamaGrammar
model = llama_cpp.Llama.from_pretrained(repo_id="bartowski/Llama-3.2-1B-Instruct-GGUF", filename="*Q8_0.gguf")
compiled_grammar = LlamaGrammar.from_json_schema(json.dumps(schema))
response = model.create_chat_completion(
messages=[
{"role": "user", "content": prompt}
],
grammar=compiled_grammar
)
output:str = response["choices"][0]["message"]["content"]
We validate the schemas using the jsonschema library, ensuring compliance with the JSON Schema Draft 2020-12 specification. Additionally, we enable format validation with a few custom format checkers for stricter validation.
To install the required library, run:
pip install jsonschema
Here's an example of validating a structured output against a schema:
import json
from validation import validate_enhanaced
output:str = '{"cuisine": "Italian", "location": "New York", "price_range": {"max_price": 30, "min_price": 10}}'
validate_enhanaced(json.loads(output), schema)
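Engine output is a raw string, so it may fail to parse before schema validation is even reached. The sketch below separates the two failure modes. `check_output` is a hypothetical helper, and `validate_fn` stands for any validator that raises on failure; the `validate_enhanaced` function imported above is assumed here to behave that way.

```python
import json


def check_output(output, schema, validate_fn):
    """Return (ok, reason): parse `output` as JSON, then validate it against `schema`."""
    try:
        instance = json.loads(output)
    except json.JSONDecodeError as err:
        return False, f"invalid JSON: {err.msg}"
    try:
        validate_fn(instance, schema)
    except Exception as err:  # validator libraries raise their own error types
        return False, f"schema violation: {err}"
    return True, "ok"
```

With the names from the example above, a call would look like `check_output(output, schema, validate_enhanaced)`; keeping the parse failure separate from the schema violation makes it easier to tell truncated generations apart from well-formed but non-compliant ones.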
For more details about the JSON Schema Test Suite used in the paper, visit the official repository. The results of the test suite coverage are shown below.
We provide a feature checklist for each Structured Output Generation Engine based on their documentation and implementation. This provides a comprehensive overview of the supported JSON Schema features.
@misc{geng2025jsonschemabench,
title={Generating Structured Outputs from Language Models: Benchmark and Studies},
author={Saibo Geng and Hudson Cooper and Michał Moskal and Samuel Jenkins and Julian Berman and Nathan Ranchin and Robert West and Eric Horvitz and Harsha Nori},
year={2025},
eprint={2501.10868},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.10868},
}