MQL Benchmark
A 30,000-example open-source benchmark for evaluating natural-language → DSL generation, with a public model leaderboard.
Frameworks, agents, and open-source work.
A framework for evaluating earned autonomy in deployed AI systems.
How we architected ASA and ADÉ for adversarial production.
A benchmark and three metrics for measuring LLM-generated cybersecurity rules (CAMLIS 2025).
Accelerating Adoption of Domain-Specific Languages with Large Language Models.
Open-source natural language to domain-specific language dataset for email security.
Malware bypass research using reinforcement learning.