MQL Benchmark
A 30,000-example open-source benchmark for evaluating natural-language → DSL generation, with a public model leaderboard.
Content tagged with "llms"
An open-source evaluation framework and three benchmark metrics for measuring LLM-generated cybersecurity detection rules.
A benchmark and three metrics for measuring LLM-generated cybersecurity rules — CAMLIS 2025.
Accelerating Adoption of Domain-Specific Languages with Large Language Models.
Open-source natural language to domain-specific language dataset for email security.