Evaluating LLM Generated Detection Rules in Cybersecurity

September 2025 · Anna Bertiger, Bobby Filar, Aryan Luthra, Stefano Meschiari, Aiden Mitchell, Sam Scholten, Vivek Sharath · Conference on Applied Machine Learning in Information Security (CAMLIS) 2025

LLMs are increasingly pervasive in security environments, but there are few measures of their effectiveness. This paper presents an open-source evaluation framework and benchmark metrics for LLM-generated cybersecurity detection rules. The benchmark uses a holdout-set methodology to compare LLM-generated rules against a human-generated corpus, using three metrics inspired by how experts evaluate detection rules: detection accuracy (precision blended with unique true-positive (TP) coverage), the economic cost of syntactic correctness, and query robustness.
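To make the detection accuracy metric more concrete, here is a minimal Python sketch of one way precision could be blended with unique-TP coverage on a holdout set. The abstract does not give the formula, so everything here is an assumption for illustration: the function names, the definition of unique-TP coverage as the share of a rule's true positives that the human-written corpus misses, and the linear blend with a hypothetical `weight` parameter. It is not the paper's implementation.

```python
from typing import Set


def precision(flagged: Set[str], malicious: Set[str]) -> float:
    """Fraction of flagged holdout samples that are truly malicious."""
    if not flagged:
        return 0.0
    return len(flagged & malicious) / len(flagged)


def unique_tp_coverage(
    flagged: Set[str], malicious: Set[str], baseline_flagged: Set[str]
) -> float:
    """Share of the rule's true positives that the baseline corpus misses.

    ASSUMPTION: this definition is illustrative; the paper may define
    unique-TP coverage differently.
    """
    true_positives = flagged & malicious
    if not true_positives:
        return 0.0
    return len(true_positives - baseline_flagged) / len(true_positives)


def blended_score(
    flagged: Set[str],
    malicious: Set[str],
    baseline_flagged: Set[str],
    weight: float = 0.5,  # hypothetical blend weight, not from the paper
) -> float:
    """Linear blend of precision and unique-TP coverage (assumed form)."""
    p = precision(flagged, malicious)
    u = unique_tp_coverage(flagged, malicious, baseline_flagged)
    return weight * p + (1.0 - weight) * u


if __name__ == "__main__":
    # Toy holdout set: sample IDs are made up for the example.
    malicious = {"m1", "m2", "m3", "m4"}
    llm_rule_hits = {"m1", "m2", "m3", "m4", "b1"}  # one false positive
    human_corpus_hits = {"m1", "m2", "m3"}          # misses m4

    print("precision:", precision(llm_rule_hits, malicious))
    print("unique-TP coverage:",
          unique_tp_coverage(llm_rule_hits, malicious, human_corpus_hits))
    print("blended:", blended_score(llm_rule_hits, malicious, human_corpus_hits))
```

Under these assumptions, a rule scores well only if it is both precise and catches malicious samples that the existing human-written rules do not, which is one reading of why the two quantities are blended rather than reported separately.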

The methodology is illustrated on Sublime Security’s Automated Detection Engineer (ADÉ), an agentic system that writes detections in MQL.

arXiv:2509.16749 · See also: project writeup