BabbelPhish Dataset

July 2023

A ~3,000-example dataset pairing natural language descriptions with Message Query Language (MQL) queries, intended for fine-tuning and evaluating LLMs in the email detection-engineering setting.

Superseded by the MQL Benchmark (~30,000 examples, four difficulty tiers, public leaderboard). Kept here as historical context.

Sources used to construct it:

Each example was reviewed in a human-in-the-loop annotation pass.

Dataset: huggingface.co/datasets/sublime-security/babbelphish
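
For readers who want to try the data for fine-tuning, here is a minimal sketch of loading it and turning records into prompt/completion pairs. The dataset ID comes from the URL above and `load_dataset` is the standard entry point of the Hugging Face `datasets` library. Everything else is an assumption made for illustration: the helper names, the default column names `prompt` and `completion`, and the prompt template are hypothetical and are not confirmed by this page.

```python
from typing import Mapping

# Dataset ID taken from the URL above.
DATASET_ID = "sublime-security/babbelphish"


def load_babbelphish():
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the formatting helper below works without the library.
    from datasets import load_dataset  # pip install datasets

    return load_dataset(DATASET_ID)


def to_training_pair(
    record: Mapping[str, str],
    description_key: str = "prompt",    # ASSUMPTION: not a confirmed column name
    query_key: str = "completion",      # ASSUMPTION: not a confirmed column name
) -> dict:
    """Turn one record into a prompt/completion pair for fine-tuning.

    The prompt template is a hypothetical example, not the format used
    by the dataset authors.
    """
    description = record[description_key].strip()
    query = record[query_key].strip()
    return {
        "prompt": f"Write an MQL query for: {description}\nMQL:",
        # Leading space separates the completion from the "MQL:" cue.
        "completion": " " + query,
    }
```

Calling `load_babbelphish()` and printing the result shows the actual split and column names; pass those names as `description_key` and `query_key` if they differ from the assumed defaults.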