Bobby Filar

Sublime Security

Head of AI at Sublime Security. Research on agentic systems, LLM evaluation, adversarial ML, and AI governance.

#evaluation

Content tagged with "evaluation"

MQL Benchmark
2026-05-15
#Evaluation #Benchmarks #LLMs

A 30,000-example open-source benchmark for evaluating natural-language → DSL generation, with a public model leaderboard.

Trust, Then Autonomy
2026-05-07
#AI Governance #Evaluation

A framework for evaluating earned autonomy in deployed AI systems.

Evaluating LLM Generated Detection Rules in Cybersecurity
2025-09-20
Anna Bertiger, Bobby Filar, Aryan Luthra, Stefano Meschiari, Aiden Mitchell, Sam Scholten, Vivek Sharath
Conference on Applied Machine Learning in Information Security (CAMLIS) 2025
#LLMs #Evaluation #Cybersecurity #Featured

An open-source evaluation framework and three benchmark metrics for measuring LLM-generated cybersecurity detection rules.

CAMLIS 2025: Evaluating LLM-Generated Detection Rules
2025-09-20
Conference on Applied Machine Learning in Information Security (CAMLIS) 2025
#Paper #Evaluation

Paper accepted at CAMLIS 2025 — an open-source benchmark and three metrics (detection accuracy, economic cost of syntactic correctness, robustness of query) for measuring LLM-generated security rules.

Evaluating LLM-Generated Detection Rules
2025-09-20
#Evaluation #LLMs #Cybersecurity

A benchmark and three metrics for measuring LLM-generated cybersecurity rules — CAMLIS 2025.

© 2026 Bobby Filar.